Links Spider

class kingfisher_scrapy.base_spiders.links_spider.LinksSpider(*args, **kwargs)

This class makes it easy to collect data from an API that implements the pagination pattern. To use it:

Inherit from LinksSpider
Set a data_type class attribute to the data type of the API responses
Set a formatter class attribute to the function that sets the file name, as in build_request()
Write a start_requests() method to request the first page of API results
Optionally, set a next_pointer class attribute to the JSON Pointer for the next link (default “/links/next”)

If the API returns the number of total pages or results in the response, consider using IndexSpider instead.

import scrapy

from kingfisher_scrapy.base_spiders import LinksSpider
from kingfisher_scrapy.util import parameters

class MySpider(LinksSpider):
    name = 'my_spider'

    # SimpleSpider
    data_type = 'release_package'

    # LinksSpider
    formatter = staticmethod(parameters('page'))

    def start_requests(self):
        yield scrapy.Request('https://example.com/api/packages.json', meta={'file_name': 'page-1.json'})

Parameters:

args (Any)
kwargs (Any)

Return type:

Self

next_pointer = '/links/next'

parse(response)

next_link(response, **kwargs)

If the JSON response has a links.next key, returns a scrapy.Request for the URL.
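The lookup that next_link() performs can be illustrated without Scrapy. The following is a minimal sketch, not the library's implementation: resolve_pointer(), next_url(), crawl() and the in-memory PAGES mapping are hypothetical names introduced here, and the page bodies are made up for illustration. It resolves a JSON Pointer such as '/links/next' against a parsed response and follows the resulting URLs until a page has no next link.

```python
import json


def resolve_pointer(document, pointer, default=None):
    """Resolve a JSON Pointer (RFC 6901) such as '/links/next' against parsed JSON."""
    if pointer == '':
        return document
    current = document
    # Drop the leading '/' and walk one reference token at a time.
    for token in pointer[1:].split('/'):
        # Unescape per RFC 6901: '~1' means '/', '~0' means '~'.
        token = token.replace('~1', '/').replace('~0', '~')
        if isinstance(current, dict) and token in current:
            current = current[token]
        elif isinstance(current, list) and token.isdigit() and int(token) < len(current):
            current = current[int(token)]
        else:
            return default
    return current


def next_url(body, next_pointer='/links/next'):
    """Return the next page's URL from a JSON response body, or None on the last page."""
    return resolve_pointer(json.loads(body), next_pointer)


# Hypothetical in-memory "API": each URL maps to a JSON response body.
PAGES = {
    'https://example.com/api/packages.json':
        '{"releases": [], "links": {"next": "https://example.com/api/packages.json?page=2"}}',
    'https://example.com/api/packages.json?page=2':
        '{"releases": [], "links": {}}',
}


def crawl(start_url, pages=PAGES):
    """Follow next links from start_url until a page has none; return the URLs visited."""
    visited = []
    url = start_url
    while url is not None:
        visited.append(url)
        url = next_url(pages[url])
    return visited
```

In a real spider, the same idea applies with a different next_pointer value when the API puts the link elsewhere in the response, for example a hypothetical '/pagination/next_page'.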