Spider Middlewares
- class kingfisher_scrapy.spidermiddlewares.BaseSpiderMiddleware(crawler)
Base class for spider middlewares that need access to the spider instance.
- class kingfisher_scrapy.spidermiddlewares.ConcatenatedJSONMiddleware(crawler)
If the spider’s concatenated_json class attribute is True, yield each object of the File as a FileItem. Otherwise, yield the original item.
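To illustrate what "each object" means for concatenated JSON, here is a minimal sketch that splits a string of back-to-back JSON values using the standard library's json.JSONDecoder.raw_decode. The function name iter_concatenated_json is hypothetical, and this is not necessarily how the middleware parses its input.

```python
import json


def iter_concatenated_json(text):
    """Yield each top-level JSON value found in a string of concatenated JSON."""
    decoder = json.JSONDecoder()
    index = 0
    length = len(text)
    while index < length:
        # raw_decode() rejects leading whitespace, so skip it between values.
        while index < length and text[index].isspace():
            index += 1
        if index >= length:
            break
        # raw_decode() returns the parsed value and the index just past it.
        value, index = decoder.raw_decode(text, index)
        yield value
```

Each yielded value would correspond to the data of one FileItem.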
- class kingfisher_scrapy.spidermiddlewares.LineDelimitedMiddleware(crawler)
If the spider’s line_delimited class attribute is True, yield each line of the File as a FileItem. Otherwise, yield the original item.
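The line-by-line split can be sketched in a few lines of plain Python. The function name iter_numbered_lines is hypothetical, and skipping blank lines is an assumption of this sketch rather than documented behavior.

```python
def iter_numbered_lines(text):
    """Yield (line_number, line) for each non-blank line of line-delimited text."""
    for number, line in enumerate(text.splitlines(), 1):
        # Assumption: blank lines carry no data and are skipped.
        if line.strip():
            yield number, line
```

Keeping the line number alongside each line makes it possible to tell the resulting items apart when they all come from the same file.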
- class kingfisher_scrapy.spidermiddlewares.ValidateJSONMiddleware(crawler)
If the spider’s validate_json class attribute is True, check if the item’s data field is valid JSON. If not, yield nothing. Otherwise, yield the original item.
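A minimal sketch of this kind of filter, assuming items are plain dictionaries with a data key. The names is_valid_json and drop_invalid_json are hypothetical; the real middleware operates on Scrapy items and may log the failure.

```python
import json


def is_valid_json(data):
    """Return True if data (str, bytes or bytearray) parses as JSON."""
    try:
        json.loads(data)
    except (ValueError, TypeError):
        # json.JSONDecodeError is a subclass of ValueError;
        # TypeError covers unsupported input types such as None.
        return False
    return True


def drop_invalid_json(items):
    """Yield only the items whose 'data' field is valid JSON."""
    for item in items:
        if is_valid_json(item["data"]):
            yield item
```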
- class kingfisher_scrapy.spidermiddlewares.RootPathMiddleware(crawler)
If the spider’s root_path class attribute is non-empty, replace the item’s data with the objects at that prefix; if there are multiple releases, records or packages at that prefix, combine them into packages in groups of 100, and update the item’s data_type if needed. Otherwise, yield the original item.
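The idea of "the objects at that prefix" can be sketched on already-parsed data. This sketch assumes a dot-separated path in which the component item stands for each element of an array (the convention used by the ijson library); whether root_path follows exactly this syntax is an assumption, and the function name resolve_root_path is hypothetical.

```python
def resolve_root_path(data, root_path):
    """Return the list of objects found at a dotted prefix in parsed JSON data."""
    if not root_path:
        return [data]
    nodes = [data]
    for part in root_path.split("."):
        if part == "item":
            # Assumption: "item" descends into every element of an array.
            nodes = [element for node in nodes for element in node]
        else:
            # Any other component is treated as an object key.
            nodes = [node[part] for node in nodes]
    return nodes
```

A prefix that passes through an array can return several objects, which is the case where the description above says they are combined into packages.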
- class kingfisher_scrapy.spidermiddlewares.AddPackageMiddleware(crawler)
If the spider’s data_type class attribute is “release” or “record”, wrap the item’s data in an appropriate package, and update the item’s data_type. Otherwise, yield the original item.
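Wrapping can be sketched as follows, assuming parsed data and assuming the updated data_type values are "release_package" and "record_package". The function name add_package is hypothetical, and the sketch emits only the releases or records array, not the other metadata fields a complete OCDS package carries.

```python
def add_package(data, data_type):
    """Wrap a single release or record in a minimal package.

    Returns a (data, data_type) pair; other data types pass through unchanged.
    """
    if data_type == "release":
        return {"releases": [data]}, "release_package"
    if data_type == "record":
        return {"records": [data]}, "record_package"
    return data, data_type
```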
- class kingfisher_scrapy.spidermiddlewares.ResizePackageMiddleware(crawler)
If the spider’s resize_package class attribute is True, split the package into packages of 100 releases or records each. Otherwise, yield the original item.
Optionally, implement an ocid_fallback method on the spider, which accepts a release (or record) and returns an ocid value, to be used if the ocid field is not set.
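Splitting a package into fixed-size packages can be sketched with itertools.islice. The names batched and resize_package are hypothetical, and copying the package's other fields into every smaller package is an assumption of this sketch.

```python
from itertools import islice


def batched(iterable, size=100):
    """Yield successive lists of at most `size` elements."""
    iterator = iter(iterable)
    while batch := list(islice(iterator, size)):
        yield batch


def resize_package(package, key, size=100):
    """Split package[key] ("releases" or "records") into smaller packages."""
    # Assumption: every smaller package repeats the original metadata.
    metadata = {name: value for name, value in package.items() if name != key}
    for batch in batched(package[key], size):
        yield {**metadata, key: batch}
```

For a package of 250 releases, this would produce three packages, the last one holding the remaining 50.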
- class kingfisher_scrapy.spidermiddlewares.ReadDataMiddleware(crawler)
If the item’s data is a file descriptor, replace the item’s data with the file’s contents and close the file descriptor. Otherwise, yield the original item.
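The read-then-close step can be sketched as below, assuming items are dictionaries and that an open file is recognized by having a read method. Both the detection rule and the function name read_data are assumptions of this sketch.

```python
def read_data(item):
    """Replace an open file in item['data'] with its contents, then close it."""
    data = item["data"]
    # Assumption: anything with a read() method is treated as an open file.
    if hasattr(data, "read"):
        try:
            item["data"] = data.read()
        finally:
            # Close the file even if reading fails.
            data.close()
    return item
```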
- class kingfisher_scrapy.spidermiddlewares.HttpErrorMiddleware(crawler)
Handle HTTP errors raised by Scrapy’s HttpErrorMiddleware.
If is_http_retryable() returns True and the number of attempts is less than the spider’s max_attempts class attribute, retries the request, after waiting the number of seconds returned by get_retry_wait_time(). Otherwise, logs an error message.
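The retry decision described above can be sketched as a pure function. The callables is_retryable and wait_time are hypothetical stand-ins for is_http_retryable() and get_retry_wait_time(), whose real signatures are not shown here; the sketch only returns the decision and does not wait or send a request.

```python
def decide_on_http_error(status, attempts, max_attempts, is_retryable, wait_time):
    """Return ("retry", seconds_to_wait) or ("log_error", None).

    is_retryable and wait_time are hypothetical callables supplied by the
    caller; both receive the HTTP status code in this sketch.
    """
    if is_retryable(status) and attempts < max_attempts:
        return "retry", wait_time(status)
    return "log_error", None
```

Separating the decision from the waiting and re-queuing makes the rule easy to check in isolation: a retryable status still results in an error once the attempt limit is reached.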