Kingfisher Process API

class kingfisher_scrapy.extensions.kingfisher_process_api2.Client(**kwargs)
exchange_ready()

Override this method in subclasses. It is called once the exchange is declared.

reset()

Reset the client's state before reconnecting.

Override this method in subclasses if your subclass adds and mutates any attributes.
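
For example, a subclass can add an attribute in its constructor, set it in exchange_ready(), and restore it in reset(). A minimal sketch, using only the two hooks documented above; the TrackingClient name and the ready attribute are illustrative assumptions:

    from kingfisher_scrapy.extensions.kingfisher_process_api2 import Client


    class TrackingClient(Client):
        """Hypothetical client that records whether the exchange has been declared."""

        def __init__(self, **kwargs):
            super().__init__(**kwargs)
            self.ready = False  # added attribute, mutated by exchange_ready()

        def exchange_ready(self):
            # Called once the exchange is declared.
            self.ready = True

        def reset(self):
            # Restore the added attribute before reconnecting.
            super().reset()
            self.ready = False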

class kingfisher_scrapy.extensions.kingfisher_process_api2.KingfisherProcessAPI2(url, stats, rabbit_url, rabbit_exchange_name, rabbit_routing_key)

If the KINGFISHER_API2_URL, RABBIT_URL, RABBIT_EXCHANGE_NAME and RABBIT_ROUTING_KEY environment variables or configuration settings are set, OCDS data is stored in Kingfisher Process incrementally.
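
As a minimal sketch, the extension can be enabled through the project's Scrapy settings; the setting names are those listed above, and every value below is a placeholder assumption for a local development setup:

    # settings.py (sketch): all values are illustrative placeholders.
    KINGFISHER_API2_URL = "http://localhost:8000"
    RABBIT_URL = "amqp://guest:guest@localhost:5672"
    RABBIT_EXCHANGE_NAME = "kingfisher_process_1_0"
    RABBIT_ROUTING_KEY = "api"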

When the spider is opened, a collection is created in Kingfisher Process via its web API. The API also receives the note and steps spider arguments (if set) and the spider’s ocds_version class attribute.

When an item is scraped, a message is published to the exchange for Kingfisher Process in RabbitMQ, with the path to the file written by the FilesStore extension.

When the spider is closed, the collection is closed in Kingfisher Process via its web API, unless the keep_collection_open spider argument was set to 'true'. The API also receives the crawl statistics and the reason why the spider was closed.
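
For example, the note and keep_collection_open spider arguments described above can be passed when starting a crawl. A sketch, assuming a spider named example; the spider name and note text are illustrative:

    from scrapy.crawler import CrawlerProcess
    from scrapy.utils.project import get_project_settings

    process = CrawlerProcess(get_project_settings())
    process.crawl(
        "example",
        note="manual run for testing",  # stored on the collection in Kingfisher Process
        keep_collection_open="true",  # leave the collection open when the spider closes
    )
    process.start()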

Note

If the DATABASE_URL environment variable or configuration setting is set, this extension is disabled and the DatabaseStore extension is enabled.

Note

This extension ignores items generated by the pluck command.

classmethod from_crawler(crawler)
spider_opened(spider)

Sends an API request to create a collection in Kingfisher Process.

spider_closed(spider, reason)

Sends an API request to close the collection in Kingfisher Process.

item_scraped(item, spider)

Publishes a RabbitMQ message to store the file, file item or file error in Kingfisher Process.

disconnect_and_join()

Closes the RabbitMQ connection and joins the client’s thread.