Spider Middlewares

kingfisher_scrapy.spidermiddlewares.sample_filled(spider, number)
kingfisher_scrapy.spidermiddlewares.group_size(spider)
kingfisher_scrapy.spidermiddlewares.read_data_from_file_if_any(item)
class kingfisher_scrapy.spidermiddlewares.ConcatenatedJSONMiddleware

If the spider’s concatenated_json class attribute is True, yields each object of the File as a FileItem. Otherwise, yields the original item.

async process_spider_output(response, result, spider)
Returns:

a generator of FileItem objects, in which the data field is parsed JSON
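Splitting concatenated JSON means finding where one top-level value ends and the next begins. The sketch below does this with only the standard library, using json.JSONDecoder.raw_decode, which returns both the parsed value and the index where it stopped. The function name iter_concatenated_json is hypothetical, and the middleware itself may use a different (for example, streaming) parser.

```python
import json
from typing import Any, Iterator


def iter_concatenated_json(text: str) -> Iterator[Any]:
    """Yield each top-level JSON value from a string of concatenated JSON."""
    decoder = json.JSONDecoder()
    index = 0
    length = len(text)
    while True:
        # raw_decode does not skip leading whitespace, so skip it here.
        while index < length and text[index].isspace():
            index += 1
        if index >= length:
            break
        # Parse one value; raw_decode returns it with the index where it ended.
        value, index = decoder.raw_decode(text, index)
        yield value


if __name__ == "__main__":
    print(list(iter_concatenated_json('{"a": 1} {"b": 2}\n[3]')))
```

Each yielded value corresponds to what the middleware emits as the data field of one FileItem.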

class kingfisher_scrapy.spidermiddlewares.LineDelimitedMiddleware

If the spider’s line_delimited class attribute is True, yields each line of the File as a FileItem. Otherwise, yields the original item.

async process_spider_output(response, result, spider)
Returns:

a generator of FileItem objects, in which the data field is bytes
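A minimal sketch of the line-splitting step, assuming the File's data is already available as bytes. The function name iter_lines is hypothetical, and skipping blank lines is an assumption made for illustration; the middleware's own handling of blank lines may differ.

```python
from typing import Iterator


def iter_lines(data: bytes) -> Iterator[bytes]:
    """Yield each non-blank line of line-delimited data, still as bytes."""
    # bytes.splitlines() splits on \n, \r and \r\n and drops the line endings.
    for line in data.splitlines():
        # Assumption: blank lines carry no object and are skipped.
        if line.strip():
            yield line
```

Keeping each line as bytes matches the return type above: the line is not parsed as JSON at this stage.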

class kingfisher_scrapy.spidermiddlewares.ValidateJSONMiddleware

If the spider’s validate_json class attribute is True, checks if the item’s data field is valid JSON. If not, yields nothing. Otherwise, yields the original item.

async process_spider_output(response, result, spider)
Returns:

a generator of File or FileItem objects, in which the data field is valid JSON
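The validation rule can be sketched as "try to parse, and drop the item on failure". This sketch assumes the data fields are str or bytes; keep_valid_json is a hypothetical name, not part of kingfisher_scrapy.

```python
import json
from typing import Iterable, Iterator, Union

Data = Union[str, bytes]


def keep_valid_json(items: Iterable[Data]) -> Iterator[Data]:
    """Yield only the data values that parse as JSON, unchanged."""
    for data in items:
        try:
            json.loads(data)
        except ValueError:
            # json.JSONDecodeError and UnicodeDecodeError both subclass
            # ValueError, so invalid JSON and invalid UTF-8 are dropped.
            continue
        yield data
```

Note that the original value is yielded, not the parsed one, mirroring "yields the original item" above.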

class kingfisher_scrapy.spidermiddlewares.RootPathMiddleware

If the spider’s root_path class attribute is non-empty, replaces the item’s data with the objects at that prefix. If there are multiple releases, records or packages at that prefix, combines them into packages in groups of 100 and updates the item’s data_type if needed. If root_path is empty, yields the original item.

async process_spider_output(response, result, spider)
Returns:

a generator of File or FileItem objects, in which the data field is parsed JSON
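The two steps described above, extracting the objects at a prefix and grouping them into packages of 100, can be sketched on already-parsed data. This assumes root_path is a dot-separated prefix in which item stands for each element of an array (the prefix convention of the ijson library); the source does not specify the syntax. The names objects_at_prefix and in_groups are hypothetical, and the minimal {"releases": [...]} package omits any package metadata.

```python
from typing import Any, Iterable, Iterator, List


def objects_at_prefix(data: Any, prefix: str) -> Iterator[Any]:
    """Yield the objects found at an ijson-style, dot-separated prefix."""
    if not prefix:
        yield data
        return
    head, _, rest = prefix.partition(".")
    if head == "item":
        # "item" matches each element of an array.
        for element in data:
            yield from objects_at_prefix(element, rest)
    else:
        # Any other segment is an object key.
        yield from objects_at_prefix(data[head], rest)


def in_groups(objects: Iterable[Any], size: int = 100) -> Iterator[List[Any]]:
    """Collect objects into lists of at most `size` elements."""
    batch: List[Any] = []
    for obj in objects:
        batch.append(obj)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


if __name__ == "__main__":
    response = {"results": [{"ocid": f"ocds-{i}"} for i in range(250)]}
    releases = objects_at_prefix(response, "results.item")
    # Wrap each group of releases in a minimal release package.
    packages = [{"releases": group} for group in in_groups(releases)]
    print([len(p["releases"]) for p in packages])  # [100, 100, 50]
```

For large responses, a streaming parser such as ijson avoids loading the whole document into memory; this sketch assumes the data has already been parsed.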

class kingfisher_scrapy.spidermiddlewares.AddPackageMiddleware

If the spider’s data_type class attribute is “release” or “record”, wraps the item’s data in a release package or record package, respectively, and updates the item’s data_type accordingly. Otherwise, yields the original item.

async process_spider_output(response, result, spider)
Returns:

a generator of File or FileItem objects, in which the data field is parsed JSON
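The wrapping step can be sketched as a lookup from the input data_type to the package key and the new data_type. The key names releases and records follow OCDS release and record packages; the resulting data_type strings "release_package" and "record_package", the mapping PACKAGE_KEYS and the function add_package are assumptions for illustration.

```python
from typing import Any, Tuple

# Assumed mapping: input data_type -> (package array key, new data_type).
PACKAGE_KEYS = {
    "release": ("releases", "release_package"),
    "record": ("records", "record_package"),
}


def add_package(data: Any, data_type: str) -> Tuple[Any, str]:
    """Wrap a single release or record in a package; pass anything else through."""
    if data_type not in PACKAGE_KEYS:
        # Not a bare release or record: leave data and data_type unchanged.
        return data, data_type
    key, package_type = PACKAGE_KEYS[data_type]
    return {key: [data]}, package_type
```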

class kingfisher_scrapy.spidermiddlewares.ResizePackageMiddleware

If the spider’s resize_package class attribute is True, splits the package into packages of 100 releases or records each. Otherwise, yields the original item.

async process_spider_output(response, result, spider)

The spider must yield items whose data field has package and data keys.

Returns:

a generator of FileItem objects, in which the data field is a string
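Given the package and data keys described above, resizing amounts to copying the package metadata into each output package and attaching one slice of at most 100 releases or records. This sketch assumes package is a dict of package metadata and data is a list; in the middleware these values may arrive in another form (for example, unparsed bytes). The function resize_package and its key parameter are hypothetical.

```python
import json
from typing import Any, Dict, Iterator, List


def resize_package(
    package: Dict[str, Any],
    data: List[Any],
    key: str = "releases",
    size: int = 100,
) -> Iterator[str]:
    """Yield JSON strings of packages holding at most `size` releases or records."""
    for start in range(0, len(data), size):
        # Shallow-copy the metadata so each output package is independent.
        chunk = dict(package)
        chunk[key] = data[start:start + size]
        yield json.dumps(chunk)
```

Serializing each package with json.dumps matches the return type above, in which the data field is a string.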

class kingfisher_scrapy.spidermiddlewares.ReadDataMiddleware

If the item’s data is a file descriptor, replaces the item’s data with the file’s contents and closes the file descriptor. Otherwise, yields the original item.

async process_spider_output(response, result, spider)
Returns:

a generator of File objects, in which the data field is bytes
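The read-and-close behavior can be sketched with duck typing: anything with a read method is treated as an open file. The function read_data is hypothetical, and the hasattr check is an assumption about how a file object might be detected.

```python
import io
from typing import Any


def read_data(data: Any) -> Any:
    """Return a file object's contents and close it; return other data unchanged."""
    if hasattr(data, "read"):
        try:
            return data.read()
        finally:
            # Close the file even if reading fails.
            data.close()
    return data


if __name__ == "__main__":
    f = io.BytesIO(b'{"releases": []}')
    print(read_data(f), f.closed)
```

The try/finally ensures the file is closed even when read raises, so no file handles leak.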

class kingfisher_scrapy.spidermiddlewares.RetryDataErrorMiddleware

Retries a request up to 3 times, either when the spider raises a BadZipFile exception (on the assumption that the response was truncated) or when the spider raises a RetryableError exception.

process_spider_exception(response, exception, spider)
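The retry decision can be sketched independently of Scrapy: retry only the two exception types named above, and stop once 3 retries have been made. RetryableError here is a stand-in class, not the library's definition, and next_retry and MAX_RETRIES are hypothetical names. In a Scrapy spider middleware, process_spider_exception may return an iterable of requests, so a real implementation could, for example, yield response.request.replace(dont_filter=True) with the retry count stored in the request's meta; that wiring is an assumption and is not shown.

```python
from typing import Optional
from zipfile import BadZipFile


class RetryableError(Exception):
    """Stand-in for kingfisher_scrapy's RetryableError (hypothetical definition)."""


MAX_RETRIES = 3


def next_retry(exception: Exception, retries: int) -> Optional[int]:
    """Return the retry count for the next attempt, or None to give up."""
    if not isinstance(exception, (BadZipFile, RetryableError)):
        # Not a data error: leave it to other exception handlers.
        return None
    if retries >= MAX_RETRIES:
        # Already retried 3 times: stop.
        return None
    return retries + 1
```

With retries starting at 0, a request is retried at most 3 times before next_retry returns None.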