Database Store
class kingfisher_scrapy.extensions.database_store.DatabaseStore(database_url, files_store_directory)
If the DATABASE_URL Scrapy setting and the crawl_time spider argument are set, the OCDS data is stored incrementally in a PostgreSQL database.

This extension stores data in the “data” column of a table named after the spider, or after the table_name spider argument (if set). When the spider is opened, the table is created if it doesn’t exist. The spider’s from_date
attribute is then set, in order of precedence, to:

1. the from_date spider argument, unless it equals the spider’s default_from_date class attribute;
2. the maximum value of the date field of the stored data, if any;
3. the spider’s default_from_date class attribute, if set.

When the spider is closed, this extension reads the data written by the FilesStore extension to the crawl directory that matches the
crawl_time spider argument. If the compile_releases spider argument is set, it merges the individual releases into compiled releases. It then recreates the table and inserts the compiled releases, if the compile_releases spider argument is set. Otherwise, it inserts either the individual releases in the release packages (if the spider returns releases) or the compiled releases in the record packages (if the spider returns records).

Warning
If the compile_releases spider argument is set, spiders that return records without embedded releases are not supported. If it isn’t set, then spiders that return records without compiled releases are not supported.

To perform incremental updates, the OCDS data in the crawl directory must not be deleted between crawls.
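The sketch below shows the two preconditions for storing data and the table naming rule described above. It uses two helper functions, database_store_enabled and resolve_table_name. Both are hypothetical and are not part of kingfisher_scrapy. It assumes that "set" means non-empty. The DATABASE_URL value and the crawl_time format in the usage note are illustrative placeholders.

```python
from typing import Optional


def database_store_enabled(database_url: Optional[str], crawl_time: Optional[str]) -> bool:
    # Hypothetical helper: data is stored only when BOTH the DATABASE_URL
    # setting and the crawl_time spider argument are set (treated as non-empty).
    return bool(database_url) and bool(crawl_time)


def resolve_table_name(spider_name: str, table_name: Optional[str] = None) -> str:
    # Hypothetical helper: the table_name spider argument, if set,
    # overrides the default of naming the table after the spider.
    return table_name if table_name else spider_name


if __name__ == "__main__":
    url = "postgresql://user:password@localhost:5432/kingfisher"  # placeholder
    print(database_store_enabled(url, "2020-01-01T00:00:00"))
    print(resolve_table_name("example_spider", "custom_table"))
```

With Scrapy's own command-line options, a crawl might be started as scrapy crawl example_spider -a crawl_time=2020-01-01T00:00:00 -s DATABASE_URL=postgresql://user:password@localhost:5432/kingfisher. Here -a passes a spider argument and -s overrides a setting. The spider name, timestamp and URL are placeholders.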
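The precedence used to set the spider's from_date attribute can be written as a small pure function. This is a minimal sketch, not the extension's code. The name resolve_from_date is hypothetical. The sketch also assumes that the maximum stored date has already been queried and parsed to a datetime, for example with a query along the lines of SELECT max(data->>'date') FROM the spider's table.

```python
from datetime import datetime
from typing import Optional


def resolve_from_date(
    from_date_arg: Optional[datetime],
    default_from_date: Optional[datetime],
    max_stored_date: Optional[datetime],
) -> Optional[datetime]:
    # 1. The from_date spider argument wins, unless it equals the spider's
    #    default_from_date class attribute.
    if from_date_arg is not None and from_date_arg != default_from_date:
        return from_date_arg
    # 2. Otherwise, resume from the latest "date" already stored (if any).
    if max_stored_date is not None:
        return max_stored_date
    # 3. Otherwise, fall back to default_from_date, which may itself be None.
    return default_from_date
```

Rule 1 explains why an explicit from_date equal to default_from_date does not prevent resuming. In that case, the latest stored date takes precedence.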
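The choice of what to insert when the spider is closed, together with the unsupported cases in the warning, can be sketched as follows. Several parts of this sketch are assumptions:

- The function names are hypothetical.
- The data_type values "release_package" and "record_package" are assumed.
- compile_fn is a hypothetical stand-in for a real OCDS merge routine, which would be responsible for ordering and merging releases.
- A release without an "ocid" is treated as a linked release rather than an embedded one.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List


def individual_releases(packages: List[dict], data_type: str) -> Iterator[dict]:
    """Yield individual releases from release packages or from records' embedded releases."""
    for package in packages:
        if data_type == "release_package":
            yield from package.get("releases", [])
        else:
            for record in package.get("records", []):
                releases = record.get("releases") or []
                # Assumption: linked releases (url/date/tag only) lack an "ocid".
                if not releases or any("ocid" not in r for r in releases):
                    raise ValueError("records without embedded releases are not supported")
                yield from releases


def rows_to_insert(
    packages: Iterable[dict],
    data_type: str,
    compile_releases: bool,
    compile_fn: Callable[[List[dict]], dict],
) -> List[dict]:
    """Return the JSON objects to insert into the table's "data" column."""
    packages = list(packages)
    if compile_releases:
        # Group individual releases by ocid and compile one release per group.
        by_ocid: Dict[str, List[dict]] = defaultdict(list)
        for release in individual_releases(packages, data_type):
            by_ocid[release["ocid"]].append(release)
        return [compile_fn(releases) for releases in by_ocid.values()]
    if data_type == "release_package":
        # Individual releases, as returned by the spider.
        return list(individual_releases(packages, data_type))
    # Record packages: use each record's compiled release.
    rows = []
    for package in packages:
        for record in package.get("records", []):
            if "compiledRelease" not in record:
                raise ValueError("records without compiled releases are not supported")
            rows.append(record["compiledRelease"])
    return rows
```

Passing the merge step in as a parameter keeps the sketch independent of any particular OCDS merging library. It also makes the selection logic easy to test in isolation.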