Links Spider
- class kingfisher_scrapy.base_spiders.links_spider.LinksSpider(*args, **kwargs)
Collect data from an API that implements the pagination pattern.
1. Inherit from LinksSpider
2. Set a data_type class attribute to the data type of the API responses
3. Set a formatter class attribute to set the file name like in build_request()
4. Set a next_link_formatter class attribute if pagination URLs differ from start URLs
5. Write a start() method to request the first page of API results
6. Optionally, set a next_pointer class attribute to the JSON Pointer for the next link (default “/links/next”)
If the API returns the number of total pages or results in the response, consider using IndexSpider instead.

```python
import scrapy

from kingfisher_scrapy.base_spiders import LinksSpider
from kingfisher_scrapy.util import parameters


class MySpider(LinksSpider):
    name = 'my_spider'

    # SimpleSpider
    data_type = 'release_package'

    # LinksSpider
    formatter = staticmethod(parameters('page'))

    async def start(self):
        # Request the first page; later pages are requested from each response's next link.
        yield scrapy.Request('https://example.com/api/packages.json', meta={'file_name': 'page-1.json'})
```
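To illustrate the pagination pattern itself, the following standalone sketch follows each page's next link until none remains. It does not use kingfisher_scrapy or Scrapy: FAKE_API, file_name_for() and crawl() are hypothetical names invented for this illustration, and the file naming only approximates what a formatter based on the page query parameter might produce.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical in-memory stand-in for a paginated API: URL -> parsed JSON body.
FAKE_API = {
    "https://example.com/api/packages.json": {
        "releases": [{"id": "1"}],
        "links": {"next": "https://example.com/api/packages.json?page=2"},
    },
    "https://example.com/api/packages.json?page=2": {
        "releases": [{"id": "2"}],
        "links": {"next": "https://example.com/api/packages.json?page=3"},
    },
    "https://example.com/api/packages.json?page=3": {
        "releases": [{"id": "3"}],
        "links": {},  # no "next" link, so this is the last page
    },
}


def file_name_for(url, default="page-1"):
    """Hypothetical helper: name a file after the URL's ``page`` query parameter."""
    query = parse_qs(urlsplit(url).query)
    if "page" in query:
        return f"page-{query['page'][0]}.json"
    return f"{default}.json"


def crawl(start_url, fetch=FAKE_API.__getitem__):
    """Yield (file_name, body) for each page, following links.next until it is absent."""
    url = start_url
    seen = set()
    while url and url not in seen:  # guard against a page that links back to a visited page
        seen.add(url)
        body = fetch(url)
        yield file_name_for(url), body
        url = body.get("links", {}).get("next")


pages = list(crawl("https://example.com/api/packages.json"))
file_names = [name for name, _ in pages]
print(file_names)
```

In a real spider the start() method issues only the first request; the loop above corresponds to the follow-up requests the base class is described as making from each response.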
- next_pointer = '/links/next'
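As a rough illustration of what a JSON Pointer such as “/links/next” selects, the sketch below resolves a pointer against a parsed response following RFC 6901. It is a minimal standalone example, not the library's implementation: resolve_pointer() and the sample responses (including the “/pagination/next_page” pointer) are assumptions made for this example.

```python
def resolve_pointer(document, pointer, default=None):
    """Resolve an RFC 6901 JSON Pointer against parsed JSON; return default if it is missing."""
    if pointer == "":
        return document  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError(f"JSON Pointer must be empty or start with '/': {pointer!r}")
    current = document
    for token in pointer[1:].split("/"):
        # Unescape per RFC 6901: "~1" becomes "/" first, then "~0" becomes "~".
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(current, dict):
            if token not in current:
                return default
            current = current[token]
        elif isinstance(current, list):
            # Array members are addressed by a non-negative decimal index.
            if not (token.isascii() and token.isdigit()) or int(token) >= len(current):
                return default
            current = current[int(token)]
        else:
            return default  # cannot descend into a scalar
    return current


# Hypothetical responses: one using the default location, one using a custom location.
default_response = {"releases": [], "links": {"next": "https://example.com/api/packages.json?page=2"}}
custom_response = {"data": [], "pagination": {"next_page": "https://example.com/api/packages.json?page=2"}}
last_page = {"releases": [], "links": {}}

next_default = resolve_pointer(default_response, "/links/next")
next_custom = resolve_pointer(custom_response, "/pagination/next_page")
next_missing = resolve_pointer(last_page, "/links/next")
print(next_default, next_custom, next_missing)
```

Under these assumptions, a spider for the second API would set next_pointer = '/pagination/next_page', and a missing value would mean there are no further pages to request.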