OCDS Kingfisher Collect documentation


Links Spider

class kingfisher_scrapy.base_spiders.links_spider.LinksSpider(*args, **kwargs)

This class makes it easy to collect data from an API that implements the pagination pattern:

  1. Inherit from LinksSpider

  2. Set a data_type class attribute to the data type of the API responses

  3. Set a formatter class attribute to the function that derives each file name from the request URL, as in build_request()

  4. Write a start_requests() method to request the first page of API results

  5. Optionally, set a next_pointer class attribute to the JSON Pointer for the next link (default “/links/next”)

If the API returns the number of total pages or results in the response, consider using IndexSpider instead.

import scrapy

from kingfisher_scrapy.base_spiders import LinksSpider
from kingfisher_scrapy.util import parameters


class MySpider(LinksSpider):
    name = 'my_spider'

    # SimpleSpider
    data_type = 'release_package'

    # LinksSpider
    formatter = staticmethod(parameters('page'))

    def start_requests(self):
        # Request the first page; later pages are reached through the next link.
        yield scrapy.Request('https://example.com/api/packages.json', meta={'file_name': 'page-1.json'})
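
To illustrate the role of the formatter attribute, here is a minimal standalone sketch. It assumes a formatter is a callable that accepts a URL and returns a file name. The function page_file_name is hypothetical and is not part of Kingfisher Collect; it does not claim to reproduce what parameters('page') returns.

```python
from urllib.parse import parse_qs, urlsplit


def page_file_name(url):
    """Hypothetical formatter: build a file name from the 'page' query parameter."""
    query = parse_qs(urlsplit(url).query)
    # Assumption: a URL without a 'page' parameter is treated as the first page.
    page = query.get('page', ['1'])[0]
    return f'page-{page}.json'


first = page_file_name('https://example.com/api/packages.json')
second = page_file_name('https://example.com/api/packages.json?page=2')
```

Under these assumptions, each page is stored under a distinct name, in the same style as the page-1.json file name set on the first request above.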

next_pointer = '/links/next'

parse(response)

next_link(response, **kwargs)

If the JSON response has a links.next key, returns a scrapy.Request for the URL.
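
The pagination pattern can be sketched with the standard library alone, using an in-memory dictionary in place of HTTP responses. The names resolve_pointer, follow_links and PAGES are hypothetical and are not part of Kingfisher Collect. The sketch returns URL strings, whereas next_link() returns a scrapy.Request.

```python
import json

# Hypothetical API responses, keyed by URL. The last page has no next link.
PAGES = {
    'https://example.com/api/packages.json':
        '{"releases": [], "links": {"next": "https://example.com/api/packages.json?page=2"}}',
    'https://example.com/api/packages.json?page=2':
        '{"releases": [], "links": {}}',
}


def resolve_pointer(document, pointer):
    """Resolve a JSON Pointer (RFC 6901) against parsed JSON; return None if absent."""
    if pointer == '':
        return document
    if not pointer.startswith('/'):
        raise ValueError('a non-empty JSON Pointer must start with "/"')
    current = document
    for token in pointer[1:].split('/'):
        # Unescape in the order required by RFC 6901: "~1" first, then "~0".
        token = token.replace('~1', '/').replace('~0', '~')
        if isinstance(current, dict) and token in current:
            current = current[token]
        elif isinstance(current, list) and token.isdigit() and int(token) < len(current):
            current = current[int(token)]
        else:
            return None
    return current


def follow_links(start_url, next_pointer='/links/next'):
    """Visit pages in order until a response has no next link."""
    visited = []
    url = start_url
    while url:
        visited.append(url)
        data = json.loads(PAGES[url])
        url = resolve_pointer(data, next_pointer)
    return visited


visited = follow_links('https://example.com/api/packages.json')
```

Here visited holds both URLs in order, and the loop stops at the second page because its links object has no next key. An API that puts the link elsewhere would only need a different next_pointer value.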

Copyright © 2019, Open Contracting Partnership