Integrate with Kingfisher Process
See also
- Collect data incrementally, about the keep_collection_open spider argument
- BaseSpider, about the ocds_version class attribute
Kingfisher Collect has optional integration with Kingfisher Process, through the KingfisherProcessAPI2 extension.
After deploying and starting an instance of Kingfisher Process, set the following either as environment variables or as Scrapy settings in kingfisher_scrapy/settings.py:
- KINGFISHER_API2_URL: The URL of Kingfisher Process’ web API, for example:
http://user:pass@localhost:8000
- RABBIT_URL: The URL of the RabbitMQ message broker, for example:
amqp://user:pass@localhost:5672
- RABBIT_EXCHANGE_NAME: The name of the exchange in RabbitMQ, for example:
kingfisher_process_development
- RABBIT_ROUTING_KEY: The routing key for messages sent to RabbitMQ, equal to the exchange name with an _api suffix, for example:
kingfisher_process_development_api
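As an illustration of how these four values fit together, here is a minimal sketch that reads them from an environment-like mapping. The function name kingfisher_process_settings is hypothetical and not part of Kingfisher Collect. The fallback that derives the routing key from the exchange name is an assumption of this sketch, based only on the naming convention described above.

```python
import os


def kingfisher_process_settings(environ=os.environ):
    """Read the four integration settings from an environment-like mapping.

    Hypothetical helper for illustration; not part of Kingfisher Collect.
    """
    exchange_name = environ.get("RABBIT_EXCHANGE_NAME")
    # Assumption of this sketch: if no routing key is set, derive it from the
    # exchange name by appending "_api", following the convention above.
    default_routing_key = f"{exchange_name}_api" if exchange_name else None
    return {
        "KINGFISHER_API2_URL": environ.get("KINGFISHER_API2_URL"),
        "RABBIT_URL": environ.get("RABBIT_URL"),
        "RABBIT_EXCHANGE_NAME": exchange_name,
        "RABBIT_ROUTING_KEY": environ.get("RABBIT_ROUTING_KEY", default_routing_key),
    }


# Example values taken from the settings described above.
example = kingfisher_process_settings({
    "KINGFISHER_API2_URL": "http://user:pass@localhost:8000",
    "RABBIT_URL": "amqp://user:pass@localhost:5672",
    "RABBIT_EXCHANGE_NAME": "kingfisher_process_development",
})
print(example["RABBIT_ROUTING_KEY"])
```

Passing a plain dictionary instead of os.environ keeps the sketch easy to try without changing the real environment.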
Add a note to the collection
Add a note to the collection_note table in Kingfisher Process. For example, to track provenance:
scrapy crawl spider_name -a note='Started by NAME.'
Select which processing steps to run
Kingfisher Process stores OCDS data, and upgrades it if the spider sets a class attribute of ocds_version = '1.0'. It can also perform the optional steps below.
- Run structural checks and create compiled releases
scrapy crawl spider_name -a steps=check,compile
- Run structural checks only
scrapy crawl spider_name -a steps=check
- Create compiled releases only
scrapy crawl spider_name -a steps=compile
- Do neither
scrapy crawl spider_name -a steps=
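The note and steps arguments can be given in the same command, because Scrapy accepts the -a option more than once. The following is a minimal sketch for scripting such commands. The helper build_crawl_command is hypothetical and not part of Kingfisher Collect; it only assembles the command-line arguments shown above.

```python
import shlex


def build_crawl_command(spider_name, note=None, steps=None):
    """Assemble a `scrapy crawl` argument list (hypothetical helper).

    `steps` is a list such as ["check", "compile"]. An empty list produces
    `steps=`, which selects neither optional step. None omits the argument.
    """
    command = ["scrapy", "crawl", spider_name]
    if note is not None:
        command += ["-a", f"note={note}"]
    if steps is not None:
        command += ["-a", "steps=" + ",".join(steps)]
    return command


command = build_crawl_command("spider_name", note="Started by NAME.", steps=["check"])
# Print a shell-quoted version of the command for inspection.
print(shlex.join(command))
```

The returned list can be passed to subprocess.run(command, check=True) from the project directory. Building a list rather than a single string avoids quoting problems when the note contains spaces.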