Links Spider
- class kingfisher_scrapy.base_spiders.links_spider.LinksSpider(*args, **kwargs)
This class makes it easy to collect data from an API that implements the pagination pattern:
1. Inherit from LinksSpider.
2. Set a data_type class attribute to the data type of the API responses.
3. Set a formatter class attribute to set the file name, as in build_request().
4. Write a start_requests() method to request the first page of API results.
5. Optionally, set a next_pointer class attribute to the JSON Pointer for the next link (default "/links/next").
If the API returns the number of total pages or results in the response, consider using IndexSpider instead.

    import scrapy

    from kingfisher_scrapy.base_spiders import LinksSpider
    from kingfisher_scrapy.util import parameters


    class MySpider(LinksSpider):
        name = 'my_spider'

        # SimpleSpider
        data_type = 'release_package'

        # LinksSpider
        formatter = staticmethod(parameters('page'))

        def start_requests(self):
            # Request only the first page of API results.
            yield scrapy.Request('https://example.com/api/packages.json', meta={'file_name': 'page-1.json'})
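To make the pagination pattern concrete, here is a minimal sketch of what following "next" links amounts to. It does not use Scrapy or kingfisher_scrapy, and it is not how LinksSpider is implemented. PAGES, fetch() and crawl() are hypothetical stand-ins for an API and an HTTP client, and the response shape (a links object holding a next URL) is assumed from the default next_pointer.

```python
# Hypothetical in-memory "API": each URL maps to a JSON-like response.
PAGES = {
    'https://example.com/api/packages.json': {
        'releases': [{'id': 'a'}],
        'links': {'next': 'https://example.com/api/packages.json?page=2'},
    },
    'https://example.com/api/packages.json?page=2': {
        'releases': [{'id': 'b'}],
        'links': {},  # no "next" link: this is the last page
    },
}


def fetch(url):
    """Stand-in for an HTTP GET that returns parsed JSON."""
    return PAGES[url]


def crawl(start_url):
    """Yield each (url, page) pair, following links.next until it is absent."""
    url = start_url
    while url:
        page = fetch(url)
        yield url, page
        url = page.get('links', {}).get('next')


visited = [url for url, _ in crawl('https://example.com/api/packages.json')]
```

In this sketch the crawl stops on its own once a response carries no next link, which is why only the first request needs to be written by hand.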
- next_pointer = '/links/next'
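The value of next_pointer is a JSON Pointer (RFC 6901), that is, a slash-separated path into the JSON response. As an illustration only, the sketch below resolves such a pointer against a parsed response using the standard library. resolve_pointer() is a hypothetical helper written for this example, not the function LinksSpider uses, and '/pagination/next_page' is an invented pointer for an API that names its next link differently.

```python
def resolve_pointer(document, pointer):
    """Return the value at an RFC 6901 JSON Pointer, or None if it is missing."""
    if pointer == '':
        return document
    if not pointer.startswith('/'):
        raise ValueError(f'invalid JSON Pointer: {pointer!r}')
    current = document
    for token in pointer[1:].split('/'):
        # Unescape in the order RFC 6901 requires: "~1" first, then "~0".
        token = token.replace('~1', '/').replace('~0', '~')
        if isinstance(current, dict) and token in current:
            current = current[token]
        elif isinstance(current, list) and token.isdecimal() and int(token) < len(current):
            current = current[int(token)]
        else:
            return None  # the pointer does not resolve, e.g. on the last page
    return current


# Response shaped for the default pointer.
default_style = {'links': {'next': 'https://example.com/api/packages.json?page=2'}}
# Hypothetical response from an API that names its next link differently.
custom_style = {'pagination': {'next_page': 'https://example.com/api?page=3'}}

next_default = resolve_pointer(default_style, '/links/next')
next_custom = resolve_pointer(custom_style, '/pagination/next_page')
```

Under these assumptions, a spider for the second API would set next_pointer = '/pagination/next_page' as a class attribute.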