Database Store
- class kingfisher_scrapy.extensions.database_store.DatabaseStore(database_url, files_store_directory)
If the DATABASE_URL Scrapy setting and the crawl_time spider argument are set, the OCDS data is stored in a PostgreSQL database, incrementally.

This extension stores data in the “data” column of a table named after the spider, or after the table_name spider argument (if set). When the spider is opened, if the table doesn’t exist, it is created. The spider’s from_date attribute is then set, in order of precedence, to: the from_date spider argument (unless equal to the spider’s default_from_date class attribute); the maximum value of the date field of the stored data (if any); the spider’s default_from_date class attribute (if set).

When the spider is closed, this extension reads the data written by the FilesStore extension to the crawl directory that matches the crawl_time spider argument. If the compile_releases spider argument is set, it creates compiled releases, using individual releases. Then, it recreates the table, and inserts either the compiled releases (if the compile_releases spider argument is set), the individual releases in release packages (if the spider returns releases), or the compiled releases in record packages (if the spider returns records).

Warning
If the compile_releases spider argument is set, spiders that return records without embedded releases are not supported. If it isn’t set, then spiders that return records without compiled releases are not supported.

To perform incremental updates, the OCDS data in the crawl directory must not be deleted between crawls.
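
The two decisions described above (which from_date to use when the spider opens, and which data to insert when it closes) can be sketched as plain functions. This is only an illustration of the documented rules, not the extension's actual code: the function names resolve_from_date and select_data_to_insert, their parameters, and the returned labels are all hypothetical.

```python
from datetime import date


def resolve_from_date(from_date_arg, default_from_date, max_stored_date):
    """Hypothetical helper: apply the documented order of precedence.

    from_date_arg: the from_date spider argument, or None if not given.
    default_from_date: the spider's default_from_date, or None if not set.
    max_stored_date: the maximum "date" value already stored, or None.
    """
    # 1. An explicit from_date argument wins, unless it merely repeats the default.
    if from_date_arg is not None and from_date_arg != default_from_date:
        return from_date_arg
    # 2. Otherwise, resume from the most recent date already in the table.
    if max_stored_date is not None:
        return max_stored_date
    # 3. Otherwise, fall back to the class default (which may itself be None).
    return default_from_date


def select_data_to_insert(compile_releases, spider_returns_records):
    """Hypothetical helper: label the data inserted when the spider closes."""
    if compile_releases:
        return "compiled releases created from individual releases"
    if spider_returns_records:
        return "compiled releases in record packages"
    return "individual releases in release packages"


default = date(2015, 1, 1)
stored = date(2019, 6, 30)

# An explicit argument that differs from the default is used as-is.
explicit = resolve_from_date(date(2020, 1, 1), default, stored)
# An argument equal to the default is ignored in favour of the stored data.
resumed = resolve_from_date(default, default, stored)
# With no argument and an empty table, the default applies.
fallback = resolve_from_date(None, default, None)
```

The second rule is what makes a repeated crawl incremental in this sketch: once data is stored, a crawl started without an explicit from_date resumes from the latest stored date rather than from the default.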