Periodic Spider

class kingfisher_scrapy.base_spiders.periodic_spider.PeriodicSpider(*args, **kwargs)

This class makes it easy to collect data from an API that accepts a year, year-month or date as a query string parameter or URL path component.

Inherit from PeriodicSpider
Set a date_format class attribute to “year”, “year-month” or “date”
Set a pattern class attribute to a URL pattern, with placeholders. If the date_format is “year”, then a year is passed to the placeholder as an int. If the date_format is “year-month”, then the first day of the month is passed to the placeholder as a date, which you can format with Python format-string fields, for example {0.year:d} and {0.month:02d}
Set a formatter class attribute to generate the file name, as in build_request()
Set a default_from_date class attribute to a year (“YYYY”) or year-month (“YYYY-MM”)
If the source stopped publishing, set a default_until_date class attribute to a year or year-month
Optionally, if the date_format is “date”, set a step class attribute to indicate the length of intervals, in days; otherwise, it defaults to 1
Optionally, set a start_requests_callback class attribute to a method’s name as a string; otherwise, it defaults to parse()
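
As a rough illustration of how the placeholders in pattern are filled, the sketch below formats a URL pattern with a date and with an int. It assumes the pattern is filled with Python's str.format. The ExampleAttributes class, the example.com URLs and the chosen dates are hypothetical. A real spider would inherit from PeriodicSpider and also set a formatter, which is omitted here to keep the sketch self-contained.

```python
from datetime import date


class ExampleAttributes:
    # Hypothetical stand-in for a spider class: a real spider would
    # inherit from PeriodicSpider and also set a formatter attribute.
    date_format = 'year-month'
    # The placeholder receives the first day of the month as a date.
    pattern = 'https://example.com/api/releases/{0.year:d}/{0.month:02d}'
    default_from_date = '2019-01'


# With date_format = 'year', the placeholder receives an int instead.
yearly_pattern = 'https://example.com/api/releases/{}'

monthly_url = ExampleAttributes.pattern.format(date(2020, 3, 1))
yearly_url = yearly_pattern.format(2020)

print(monthly_url)  # https://example.com/api/releases/2020/03
print(yearly_url)   # https://example.com/api/releases/2020
```

The 02d format specification zero-pads the month, which matters for APIs that expect 03 rather than 3.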

If sample is set, the data from the most recent year or month is retrieved.
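
One way to picture the sample behaviour for the “year-month” format is sketched below. This is not the class's actual implementation: it assumes the periods are ordered newest first and that a sample keeps only the first one. The function names months_newest_first and periods_to_request are hypothetical.

```python
from datetime import date


def months_newest_first(start, stop):
    """Yield the first day of each month from stop back to start, inclusive."""
    year, month = stop.year, stop.month
    while (year, month) >= (start.year, start.month):
        yield date(year, month, 1)
        month -= 1
        if month == 0:
            # Step back from January to December of the previous year.
            year, month = year - 1, 12


def periods_to_request(start, stop, sample=None):
    """Return every month in the range, or only the newest if sample is set."""
    periods = list(months_newest_first(start, stop))
    return periods[:1] if sample else periods


full = periods_to_request(date(2019, 11, 1), date(2020, 2, 1))
sampled = periods_to_request(date(2019, 11, 1), date(2020, 2, 1), sample=1)
print(len(full))  # 4
print(sampled)    # [datetime.date(2020, 2, 1)]
```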

build_urls(from_date, until_date=None): Yields one or more URLs for the given date.
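
A source that needs more than one URL per date might be handled with a generator along the lines of the sketch below. It is written as a module-level function so that it runs on its own, whereas in a spider it would be a method overriding build_urls. The pattern, the example.com URL and the system names are hypothetical, and until_date is accepted but unused here.

```python
from datetime import date

# Hypothetical pattern with one positional date placeholder and one named field.
pattern = 'https://example.com/api/{system}/releases/{0.year:d}/{0.month:02d}'


def build_urls(from_date, until_date=None):
    """Yield one URL per hypothetical system for the given date."""
    for system in ('central', 'regional'):
        yield pattern.format(from_date, system=system)


urls = list(build_urls(date(2021, 7, 1)))
for url in urls:
    print(url)
# https://example.com/api/central/releases/2021/07
# https://example.com/api/regional/releases/2021/07
```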