Integrate with Kingfisher Process#

Besides storing the scraped data on disk, you can also send them to an instance of Kingfisher Process for processing.

Version 1#

You need to deploy an instance of Kingfisher Process, including its web app. Then, set the following either as environment variables or as Scrapy settings in kingfisher_scrapy.settings.py:

KINGFISHER_API_URI

The URL from which Kingfisher Process’ web app is served. Do not include a trailing slash.

KINGFISHER_API_KEY

One of the API keys in Kingfisher Process’ API_KEYS setting.

To run a spider:

env KINGFISHER_API_URI='http://127.0.0.1:5000' KINGFISHER_API_KEY=1234 scrapy crawl spider_name

To add a note to the collection in Kingfisher Process:

scrapy crawl spider_name -a note='Started by NAME.'