Periodic Spider
- class kingfisher_scrapy.base_spiders.periodic_spider.PeriodicSpider(*args, **kwargs)
Collect data from an API that accepts a year, year-month, date or datetime as a query string parameter or URL path component.
1. Inherit from PeriodicSpider.
2. Set a date_format class attribute to “year”, “year-month”, “date” or “datetime”.
3. Set a pattern class attribute to a URL pattern, with placeholders. If the date_format is “year”, then a year is passed to the placeholder as an int. If the date_format is “year-month”, then the first day of the month is passed to the placeholder as a date, which you can format as, for example:
4. Set a formatter class attribute to set the file name, as in build_request().
5. Set a default_from_date class attribute to a year (“YYYY”) or year-month (“YYYY-MM”).
6. If the source stopped publishing, set a default_until_date class attribute to a year or year-month.
7. Optionally, if the date_format is “date”, set a step class attribute to the length of each interval, in days; otherwise, it defaults to 1.
8. Optionally, set a start_callback class attribute to a method's name as a string; otherwise, it defaults to parse().
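Placeholder substitution in step 3 follows Python's str.format: an int year is inserted as-is, and a date accepts strftime codes such as %Y and %m in the format spec. The sketch below is a hypothetical, standalone illustration of how periods between a from-date and an until-date could be enumerated and substituted into a pattern. It is not Kingfisher Collect's implementation. The function build_urls, its parameters, and the example.com URLs are assumptions. For simplicity it takes date objects rather than “YYYY”/“YYYY-MM” strings, omits the “datetime” format, and models the sample behaviour described in the next paragraph as keeping only the last period.

```python
from datetime import date, timedelta


def build_urls(pattern, date_format, from_date, until_date, step=1, sample=False):
    """Hypothetical helper: fill `pattern` once per period in [from_date, until_date].

    - "year": the placeholder receives the year as an int.
    - "year-month": the placeholder receives the first day of each month as a date.
    - "date": the placeholder receives a date, advancing `step` days at a time.
    If `sample` is true, only the most recent period is kept.
    """
    periods = []
    if date_format == "year":
        periods = list(range(from_date.year, until_date.year + 1))
    elif date_format == "year-month":
        current = date(from_date.year, from_date.month, 1)
        while current <= until_date:
            periods.append(current)
            # Advance to the first day of the next month.
            if current.month == 12:
                current = date(current.year + 1, 1, 1)
            else:
                current = date(current.year, current.month + 1, 1)
    elif date_format == "date":
        current = from_date
        while current <= until_date:
            periods.append(current)
            current += timedelta(days=step)
    else:
        # "datetime" is deliberately left out of this sketch.
        raise ValueError(f"unsupported date_format: {date_format!r}")

    if sample:
        # Keep only the most recent period.
        periods = periods[-1:]
    return [pattern.format(period) for period in periods]


if __name__ == "__main__":
    urls = build_urls(
        "https://example.com/ocds/{0:%Y}/{0:%m}",
        "year-month",
        date(2023, 11, 1),
        date(2024, 1, 31),
    )
    print(urls)
    # ['https://example.com/ocds/2023/11', 'https://example.com/ocds/2023/12', 'https://example.com/ocds/2024/01']
```

Passing the first day of the month as a date, rather than separate year and month ints, lets a single placeholder be formatted in whatever layout the API expects.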
If sample is set, the data from the most recent year or month is retrieved.

- date_required = True
- step = 1
- start_callback = 'parse'
- async start()
Yield the initial Request objects to send.

Added in version 2.13.
For example:

    from scrapy import Request, Spider


    class MySpider(Spider):
        name = "myspider"

        async def start(self):
            yield Request("https://toscrape.com/")
The default implementation reads URLs from start_urls and yields a request for each, with dont_filter enabled. It is functionally equivalent to:

    async def start(self):
        for url in self.start_urls:
            yield Request(url, dont_filter=True)
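Because start() is an async generator, you can drain it outside a crawl, for example in a unit test, to check what it yields. The sketch below reproduces the default behaviour described above. To run without Scrapy installed, it uses a hypothetical FakeRequest dataclass in place of scrapy.Request and a plain SketchSpider class in place of scrapy.Spider; both names and the second URL are assumptions for illustration.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class FakeRequest:
    """Hypothetical stand-in for scrapy.Request, so this sketch runs without Scrapy."""

    url: str
    dont_filter: bool = False


class SketchSpider:
    """Hypothetical spider mirroring the documented default start() behaviour."""

    start_urls = ["https://toscrape.com/", "https://example.com/"]

    async def start(self):
        # One request per start URL, with duplicate filtering disabled.
        for url in self.start_urls:
            yield FakeRequest(url, dont_filter=True)


async def collect(agen):
    """Drain an async generator into a list."""
    return [item async for item in agen]


if __name__ == "__main__":
    for request in asyncio.run(collect(SketchSpider().start())):
        print(request.url, request.dont_filter)
```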
You can also yield items. For example:

    async def start(self):
        yield {"foo": "bar"}
To write spiders that also work on Scrapy versions lower than 2.13, also define a synchronous start_requests() method that returns an iterable. For example:

    def start_requests(self):
        yield Request("https://toscrape.com/")
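One possible way to avoid duplicating logic when supporting both APIs is to keep it in start_requests() and have start() delegate to it. The sketch below shows the delegation pattern only. To run without Scrapy, it uses a hypothetical FakeRequest stand-in and a plain CompatSpider class instead of scrapy.Spider; the class names and the example.com URL pattern are assumptions.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class FakeRequest:
    """Hypothetical stand-in for scrapy.Request, so this sketch runs without Scrapy."""

    url: str


class CompatSpider:
    """Hypothetical spider that keeps its start logic in one synchronous method."""

    base_url = "https://example.com/page/{}/"

    def start_requests(self):
        # Called directly by Scrapy versions lower than 2.13.
        for page in range(1, 4):
            yield FakeRequest(self.base_url.format(page))

    async def start(self):
        # Called by Scrapy 2.13 and later; delegates so the logic lives in one place.
        for request in self.start_requests():
            yield request


async def collect(agen):
    """Drain an async generator into a list."""
    return [item async for item in agen]


if __name__ == "__main__":
    spider = CompatSpider()
    sync_urls = [r.url for r in spider.start_requests()]
    async_urls = [r.url for r in asyncio.run(collect(spider.start()))]
    print(sync_urls == async_urls, async_urls)
```

Since both entry points produce the same requests, the spider behaves the same whichever one the running Scrapy version calls.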
See also