Big File Spider
- class kingfisher_scrapy.base_spiders.big_file_spider.BigFileSpider(*args, **kwargs)
Collect data from a source that publishes very large packages.
Each package is split into smaller packages of 100 releases or records each, so that the files can be processed without an iterative parser and without running out of memory.
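The splitting works roughly as follows. This is a minimal sketch, not kingfisher_scrapy's actual implementation; the function name and the assumption that releases live under a `releases` key are illustrative:

```python
# Hypothetical sketch of the splitting step, assuming an OCDS-style
# release package whose releases are stored under a 'releases' key.
def split_package(package, size=100):
    """Yield copies of `package`, each with at most `size` releases."""
    metadata = {key: value for key, value in package.items() if key != 'releases'}
    for start in range(0, len(package['releases']), size):
        yield {**metadata, 'releases': package['releases'][start:start + size]}
```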
1. Inherit from `BigFileSpider`
2. Write a `start_requests()` method to request the archive files
```python
from kingfisher_scrapy.base_spiders import BigFileSpider
from kingfisher_scrapy.util import components


class MySpider(BigFileSpider):
    name = 'my_spider'

    def start_requests(self):
        yield self.build_request('https://example.com/api/package.json', formatter=components(-1))
```
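Here, `components(-1)` builds the file name from the last component of the request URL's path (`package.json` in this example).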
Note

`concatenated_json = True`, `line_delimited = True` and `root_path` are not supported, because this spider yields items whose `data` field has `package` and `data` keys.

- Parameters:
  - args (Any)
  - kwargs (Any)
- Return type:
  Self
- resize_package = True
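Because each yielded item stores the package metadata and its releases separately, a downstream consumer could recombine them. The sketch below is hypothetical; `reassemble` is not part of kingfisher_scrapy, and it assumes the `package`/`data` layout described in the note above:

```python
# Hypothetical helper, assuming item['data'] has the shape
# {'package': <package metadata>, 'data': <list of releases>}.
def reassemble(item):
    """Rebuild a full release package from a resized item."""
    package = dict(item['data']['package'])
    package['releases'] = list(item['data']['data'])
    return package
```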