Downloader Middlewares
- class kingfisher_scrapy.downloadermiddlewares.ParaguayAuthMiddleware(spider)
Downloader middleware that manages API authentication for Paraguay scrapers.
Both DNCP (procurement authority) and Hacienda (finance ministry) use an authentication protocol based on OAuth 2.
This middleware helps us manage the protocol, which consists of acquiring an access token every few minutes (usually 15) and sending the token with each request. Acquiring the token is delegated to the spider, since each publisher has its own credentials and requirements.
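The "acquire a token every few minutes" rule reduces to an age check on the current token. The sketch below assumes the token's age is measured from the moment its request was scheduled, and that it is refreshed slightly before the usual 15-minute lifetime (14 minutes, the same value as in the spider example). `token_needs_refresh` is a hypothetical helper, not part of kingfisher_scrapy.

```python
from datetime import datetime, timedelta


def token_needs_refresh(access_token, scheduled_at, maximum_age_seconds, now=None):
    """Return True if no token is held yet, or if the held token was requested
    at least ``maximum_age_seconds`` ago (hypothetical helper)."""
    if access_token is None or scheduled_at is None:
        return True
    if now is None:
        now = datetime.now()
    return (now - scheduled_at).total_seconds() >= maximum_age_seconds


# Refresh a little before the usual 15-minute lifetime.
MAXIMUM_AGE = 14 * 60
requested = datetime(2024, 1, 1, 12, 0)
print(token_needs_refresh("token", requested, MAXIMUM_AGE, now=requested + timedelta(minutes=10)))  # False
print(token_needs_refresh("token", requested, MAXIMUM_AGE, now=requested + timedelta(minutes=15)))  # True
```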
A downloader middleware appears to be the best place to set HTTP request headers (see https://docs.scrapy.org/en/latest/topics/architecture.html), but it is not enough in this case. Tokens should be generated and assigned just before a request is sent, but Scrapy provides no way to do this, so we sometimes send expired tokens by accident. For now, the issue seems to be avoided by setting the number of concurrent requests to 1, at the cost of download speed.
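One way to apply the concurrency workaround is through Scrapy settings. The fragment below is a sketch that assumes the settings are scoped to a single spider by assigning the dictionary to the spider's `custom_settings` class attribute (it could equally live in the project's settings module). The order value 543 is an assumption for illustration, not a value taken from kingfisher_scrapy.

```python
# Hypothetical per-spider settings; in Scrapy, this dictionary would be
# assigned to a spider's `custom_settings` class attribute.
custom_settings = {
    "DOWNLOADER_MIDDLEWARES": {
        # 543 is an assumed order value, chosen only for illustration.
        "kingfisher_scrapy.downloadermiddlewares.ParaguayAuthMiddleware": 543,
    },
    # Keep a single request in flight, so fewer requests wait in the queue
    # holding a token that may expire before they are actually sent.
    "CONCURRENT_REQUESTS": 1,
}
```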
A spider that uses this middleware declares the attributes and the token-request method that the middleware relies on, for example:

```python
from datetime import datetime

import scrapy


class Paraguay:
    name = 'paraguay'

    # ParaguayAuthMiddleware
    access_token = None
    access_token_scheduled_at = None
    # The maximum age is less than the API's limit, since we don't precisely control Scrapy's scheduler.
    access_token_maximum_age = 14 * 60
    access_token_request_failed = False
    requests_backlog = []

    def build_access_token_request(self):
        self.access_token_scheduled_at = datetime.now()

        return scrapy.Request("https://example.com")
```
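To illustrate how a downloader middleware could use these spider attributes, here is a minimal sketch of a `process_request` hook. It is not the actual ParaguayAuthMiddleware code: `FakeRequest` stands in for `scrapy.Request` so the example runs without Scrapy, and the `"auth": False` meta flag that exempts the token request from authentication is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class FakeRequest:
    # Hypothetical stand-in for scrapy.Request, so the sketch runs without Scrapy.
    url: str
    headers: dict = field(default_factory=dict)
    meta: dict = field(default_factory=dict)


class FakeSpider:
    # Mirrors attributes from the Paraguay example; the "auth" meta flag is an assumption.
    access_token = None
    access_token_scheduled_at = None
    access_token_maximum_age = 14 * 60

    def __init__(self):
        self.requests_backlog = []

    def build_access_token_request(self):
        self.access_token_scheduled_at = datetime.now()
        return FakeRequest("https://example.com", meta={"auth": False})


class AuthMiddlewareSketch:
    # Hypothetical process_request hook; not the actual ParaguayAuthMiddleware code.
    def process_request(self, request, spider):
        if request.meta.get("auth") is False:
            # The token request itself must not wait for a token.
            return None
        if self._expired(spider):
            # Park the request and schedule a token request in its place.
            spider.requests_backlog.append(request)
            return spider.build_access_token_request()
        # The token is still fresh: attach it and let the request continue.
        request.headers["Authorization"] = spider.access_token
        return None

    @staticmethod
    def _expired(spider):
        if spider.access_token is None or spider.access_token_scheduled_at is None:
            return True
        age = (datetime.now() - spider.access_token_scheduled_at).total_seconds()
        return age >= spider.access_token_maximum_age
```

The sketch leaves out two things a real implementation would need: avoiding duplicate token requests while one is already pending, and replaying `requests_backlog` once the spider has parsed the token response.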
- class kingfisher_scrapy.downloadermiddlewares.OpenOppsAuthMiddleware
Downloader middleware that intercepts requests and adds the token for the OpenOpps scraper.
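A minimal sketch of "intercept requests and add the token", assuming the spider keeps its current token in an `access_token` attribute and sends it in the `Authorization` header. This is not the actual OpenOppsAuthMiddleware code: `FakeRequest` stands in for `scrapy.Request`, and the `token_request` meta flag is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FakeRequest:
    # Hypothetical stand-in for scrapy.Request.
    url: str
    headers: dict = field(default_factory=dict)
    meta: dict = field(default_factory=dict)


class TokenHeaderMiddlewareSketch:
    # Hypothetical sketch, not the actual OpenOppsAuthMiddleware code.
    def process_request(self, request, spider):
        # Assumed meta flag marking the request that obtains the token itself.
        if request.meta.get("token_request"):
            return None
        # Copy the spider's current token onto the outgoing request.
        request.headers["Authorization"] = spider.access_token
        # Returning None tells Scrapy to keep processing the request normally.
        return None
```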