Utilities

kingfisher_scrapy.util.pluck_filename(opts)
kingfisher_scrapy.util.replace_path_separator(string)
kingfisher_scrapy.util.components(start, stop=None)

Return a function that returns the selected non-empty path components, excluding the .json extension.

>>> components(-1)('http://example.com/api/planning.json')
'planning'
>>> components(-2, -1)('http://example.com/api/planning/package.json')
'planning'
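A minimal sketch of how a function like components() could produce the results above. It is not the library's source. It assumes that several selected components are joined with hyphens, which the examples do not show because they only ever select one component.

```python
from urllib.parse import urlsplit


def components(start, stop=None):
    """Sketch: select non-empty path components, dropping a trailing .json extension.

    Assumption: when several components are selected, they are joined with hyphens.
    """
    def wrapper(url):
        path = urlsplit(url).path
        if path.endswith('.json'):
            path = path[:-len('.json')]
        # The comprehension drops the empty strings produced by leading or doubled slashes.
        parts = [part for part in path.split('/') if part]
        return '-'.join(parts[start:stop])

    return wrapper
```

Slicing with start and stop mirrors Python's own slice semantics, which is why components(-2, -1) selects the second-to-last component only.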
kingfisher_scrapy.util.parameters(*keys, parser=None)

Return a function that returns the selected query string parameters as hyphen-joined key-value pairs.

>>> parameters('page')('http://example.com/api/packages.json?page=1')
'page-1'
>>> parameters('year', 'page')('http://example.com/api/packages.json?year=2000&page=1')
'year-2000-page-1'
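A minimal sketch of the behaviour shown in the examples, not the library's source. The role of parser is not documented here; this sketch assumes it is a callable applied to each value before formatting, and that only the first value of a repeated parameter is used.

```python
from urllib.parse import parse_qs, urlsplit


def parameters(*keys, parser=None):
    """Sketch: format the selected query string parameters as key-value pairs.

    Assumption: parser, if given, is applied to each value before formatting.
    """
    def wrapper(url):
        query = parse_qs(urlsplit(url).query)
        parts = []
        for key in keys:
            value = query[key][0]  # first value; raises KeyError if the key is absent
            if parser:
                value = parser(value)
            parts.extend([key, str(value)])
        return '-'.join(parts)

    return wrapper
```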
kingfisher_scrapy.util.join(*functions, extension=None)

Return a function that joins the given functions’ outputs and sets the file extension, if provided.

>>> join(components(-1), parameters('page'))('http://example.com/api/planning.json?page=1')
'planning-page-1'
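A sketch of how join() could combine the outputs, assuming the extension is appended after a dot. The two helper functions are hypothetical stand-ins for components(-1) and parameters('page'), included only to keep the example self-contained.

```python
from urllib.parse import parse_qs, urlsplit


def join(*functions, extension=None):
    """Sketch: hyphen-join the outputs of functions, then append an optional extension."""
    def wrapper(url):
        value = '-'.join(function(url) for function in functions)
        if extension:
            value = f'{value}.{extension}'
        return value

    return wrapper


# Hypothetical stand-in for components(-1): the last path component without .json.
def last_component(url):
    path = urlsplit(url).path
    if path.endswith('.json'):
        path = path[:-len('.json')]
    return path.rstrip('/').rsplit('/', 1)[-1]


# Hypothetical stand-in for parameters('page').
def page_parameter(url):
    return 'page-' + parse_qs(urlsplit(url).query)['page'][0]
```

Because join() only needs callables that take a URL and return a string, any such function can be composed with it.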
kingfisher_scrapy.util.date_range_by_interval(start, stop, step)

Yield date ranges from the start date to the stop date, in intervals of step days, in reverse chronological order.
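A minimal sketch of this kind of generator, not the library's source. It assumes each range is yielded as a (range_start, range_end) tuple, that consecutive ranges share a boundary date, and that the oldest range is clipped so it never begins before start.

```python
from datetime import date, timedelta


def date_range_by_interval(start, stop, step):
    """Sketch: yield (range_start, range_end) pairs of at most step days, newest first.

    Assumptions: ranges are tuples, adjacent ranges share a boundary date,
    and the last (oldest) range is clipped at start.
    """
    delta = timedelta(days=step)
    range_end = stop
    while range_end > start:
        range_start = max(start, range_end - delta)
        yield range_start, range_end
        range_end = range_start


for range_start, range_end in date_range_by_interval(date(2020, 1, 1), date(2020, 1, 10), 4):
    print(range_start, range_end)
# 2020-01-06 2020-01-10
# 2020-01-02 2020-01-06
# 2020-01-01 2020-01-02
```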

kingfisher_scrapy.util.date_range_by_month(start, stop)

Yield the first day of each month, as a date, from the start date to the stop date, in reverse chronological order.
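A minimal sketch of the month iteration, assuming both the start and stop months are included. It is an illustration, not the library's source.

```python
from datetime import date


def date_range_by_month(start, stop):
    """Sketch: yield the first day of each month from stop back to start, inclusive."""
    current = date(stop.year, stop.month, 1)
    first = date(start.year, start.month, 1)
    while current >= first:
        yield current
        # Step back one month, rolling over to December of the previous year.
        if current.month == 1:
            current = date(current.year - 1, 12, 1)
        else:
            current = date(current.year, current.month - 1, 1)
```

Normalizing both ends to the first of the month means the day-of-month of start and stop does not affect which months are yielded.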

kingfisher_scrapy.util.date_range_by_year(start, stop)

Return each year, as an int, from the start year to the stop year, in reverse chronological order.
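A one-line sketch, assuming both the start and stop years are included. It is not the library's source.

```python
def date_range_by_year(start, stop):
    """Sketch: return the years from stop down to start, inclusive, as ints."""
    return reversed(range(start, stop + 1))
```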

kingfisher_scrapy.util.get_parameter_value(url, key)

Return the first value of the query string parameter named key in the URL.
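A minimal sketch using the standard library's query string parser. Returning None for a missing parameter is an assumption; the behaviour in that case is not documented here.

```python
from urllib.parse import parse_qs, urlsplit


def get_parameter_value(url, key):
    """Sketch: return the first value of a query string parameter, or None if absent."""
    values = parse_qs(urlsplit(url).query).get(key)
    if values:
        return values[0]
    return None
```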

kingfisher_scrapy.util.replace_parameters(url, **kwargs)

Return a URL after updating the query string parameters’ values.
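A minimal sketch of updating query string values, not the library's source. It assumes that each keyword argument replaces any existing values for that key, adds the key if it is missing, and leaves the other parameters in place.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def replace_parameters(url, **kwargs):
    """Sketch: set each keyword argument as a query string parameter, keeping the others."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    for key, value in kwargs.items():
        query[key] = [value]  # replaces any existing values; adds the key if missing
    # doseq=True expands each list of values into repeated key=value pairs.
    return parts._replace(query=urlencode(query, doseq=True)).geturl()
```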

kingfisher_scrapy.util.append_path_components(url, path)

Return a URL after appending path components to its path.
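A minimal sketch that appends to the path while leaving the query string untouched. How the library treats leading and trailing slashes is an assumption here: this version collapses them to a single separator.

```python
from urllib.parse import urlsplit


def append_path_components(url, path):
    """Sketch: append path to the URL's path, keeping the query string."""
    parts = urlsplit(url)
    # Assumption: exactly one slash separates the existing path from the appended one.
    new_path = parts.path.rstrip('/') + '/' + path.lstrip('/')
    return parts._replace(path=new_path).geturl()
```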

kingfisher_scrapy.util.add_query_string(method, params)

Return a function that yields the requests yielded by the wrapped method, after updating the query string parameter values in each request’s URL.
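A sketch of the wrapping pattern, under stated assumptions. Request is a hypothetical stand-in for scrapy.Request (which also returns a modified copy from replace(url=...)), params is assumed to be a dict, and start_requests is a hypothetical spider method. The helper _replace_query is a compact, hypothetical version of the replace_parameters() behaviour documented above.

```python
import dataclasses
from urllib.parse import parse_qs, urlencode, urlsplit


@dataclasses.dataclass(frozen=True)
class Request:
    """Hypothetical stand-in for scrapy.Request."""

    url: str

    def replace(self, **kwargs):
        return dataclasses.replace(self, **kwargs)


def _replace_query(url, params):
    # Hypothetical helper: set each parameter, keeping the existing ones.
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    query.update({key: [value] for key, value in params.items()})
    return parts._replace(query=urlencode(query, doseq=True)).geturl()


def add_query_string(method, params):
    """Sketch: wrap a request-yielding method so each request's URL gains params."""
    def wrapper(*args, **kwargs):
        for request in method(*args, **kwargs):
            yield request.replace(url=_replace_query(request.url, params))

    return wrapper


def start_requests():  # hypothetical spider method
    yield Request('http://example.com/api?page=1')
    yield Request('http://example.com/api?page=2')


start_requests_with_size = add_query_string(start_requests, {'page_size': 100})
```

Because the wrapper is itself a generator, requests are still produced lazily, one at a time, as the wrapped method yields them.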

kingfisher_scrapy.util.add_path_components(method, path)

Return a function that yields the requests yielded by the wrapped method, after appending path components to each request’s URL.
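The same wrapping pattern, this time appending path components. As before, Request is a hypothetical stand-in for scrapy.Request, start_requests is a hypothetical spider method, and _append_path is a hypothetical helper assuming one slash between the old and new path.

```python
import dataclasses
from urllib.parse import urlsplit


@dataclasses.dataclass(frozen=True)
class Request:
    """Hypothetical stand-in for scrapy.Request."""

    url: str

    def replace(self, **kwargs):
        return dataclasses.replace(self, **kwargs)


def _append_path(url, path):
    # Hypothetical helper: append path, keeping the query string.
    parts = urlsplit(url)
    return parts._replace(path=parts.path.rstrip('/') + '/' + path.lstrip('/')).geturl()


def add_path_components(method, path):
    """Sketch: wrap a request-yielding method so path is appended to each request's URL."""
    def wrapper(*args, **kwargs):
        for request in method(*args, **kwargs):
            yield request.replace(url=_append_path(request.url, path))

    return wrapper


def start_requests():  # hypothetical spider method
    yield Request('http://example.com/api?page=1')


start_requests_for_releases = add_path_components(start_requests, 'releases')
```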

kingfisher_scrapy.util.items_basecoro(target, prefix, map_type=None, skip_key=None)

Replicate the function of the same name in ijson/common.py.

A skip_key argument is added: if the skip_key is in the current path, the current event is skipped. Otherwise, the function is identical.

kingfisher_scrapy.util.items(events, prefix, map_type=None, skip_key=None)

Replicate the function of the same name in ijson/common.py.

A skip_key argument is added, which is passed as a keyword argument to items_basecoro(). Otherwise, the function is identical.
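The effect of skip_key can be sketched without ijson installed. ijson's items() and items_basecoro() are coroutine-based; the version below is a simplified generator with a small object builder modelled on ijson's ObjectBuilder, run over hand-written (prefix, event, value) tuples of the kind ijson.parse() yields. Two assumptions: "in the current path" is read as "equal to one dot-separated component of the prefix" (the real check may differ, for example a substring test), and records is an arbitrary example key.

```python
class ObjectBuilder:
    """Build a Python object from ijson-style events (modelled on ijson's ObjectBuilder)."""

    def __init__(self, map_type=None):
        def initial_set(value):
            self.value = value

        self.containers = [initial_set]
        self.map_type = map_type or dict

    def event(self, event, value):
        if event == 'map_key':
            self.key = value
        elif event == 'start_map':
            mappable = self.map_type()
            self.containers[-1](mappable)

            def setter(value):
                mappable[self.key] = value

            self.containers.append(setter)
        elif event == 'start_array':
            array = []
            self.containers[-1](array)
            self.containers.append(array.append)
        elif event in ('end_array', 'end_map'):
            self.containers.pop()
        else:
            self.containers[-1](value)


def items(events, prefix, map_type=None, skip_key=None):
    """Sketch: yield the objects at prefix, dropping events whose path contains skip_key."""
    events = iter(events)
    for current, event, value in events:
        if current != prefix:
            continue
        if event in ('start_map', 'start_array'):
            builder = ObjectBuilder(map_type=map_type)
            end_event = event.replace('start', 'end')
            while (current, event) != (prefix, end_event):
                # Assumption: skip_key must equal one dot-separated component of the path.
                if not (skip_key and skip_key in current.split('.')):
                    builder.event(event, value)
                current, event, value = next(events)
            yield builder.value
        else:
            yield value


# Hand-written events for: [{"ocid": "ocds-1", "records": [2]}]
EVENTS = [
    ('', 'start_array', None),
    ('item', 'start_map', None),
    ('item', 'map_key', 'ocid'),
    ('item.ocid', 'string', 'ocds-1'),
    ('item', 'map_key', 'records'),
    ('item.records', 'start_array', None),
    ('item.records.item', 'number', 2),
    ('item.records', 'end_array', None),
    ('item', 'end_map', None),
    ('', 'end_array', None),
]
```

Skipping events, rather than deleting the key afterwards, means the skipped subtree is never built in memory, which matters when it is large.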

kingfisher_scrapy.util.default(obj)

Convert decimals to floats and iterables to lists.
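A function like this is typically passed as the default= hook of json.dumps(), which calls it only for objects the encoder cannot serialize natively. A minimal sketch, not the library's source:

```python
import json
from decimal import Decimal


def default(obj):
    """Sketch of a json.dumps default= hook: Decimal -> float, other iterables -> list."""
    if isinstance(obj, Decimal):
        return float(obj)
    try:
        iterable = iter(obj)
    except TypeError:
        pass
    else:
        return list(iterable)
    # Defer to the base encoder, which raises TypeError for unsupported types.
    return json.JSONEncoder().default(obj)
```

Tuples and lists never reach the hook because json already serializes them; generators and sets do.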

class kingfisher_scrapy.util.TranscodeFile(file, encoding)
read(buf_size)

Re-encode bytes read from the file to UTF-8.
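A sketch of such a file-like wrapper, suitable for a streaming parser that calls read() repeatedly. It uses codecs.getincrementaldecoder() so that a multi-byte character split across two reads is not broken; whether the library's class handles split characters this way is an assumption of this sketch.

```python
import codecs
import io


class TranscodeFile:
    """Sketch: a file-like wrapper whose read() returns UTF-8 bytes."""

    def __init__(self, file, encoding):
        self.file = file
        # An incremental decoder buffers a character that is split across reads.
        self._decoder = codecs.getincrementaldecoder(encoding)()

    def read(self, buf_size):
        while True:
            data = self.file.read(buf_size)
            text = self._decoder.decode(data, final=not data)
            # Read again if the chunk held only part of a character; b'' signals end of file.
            if text or not data:
                return text.encode('utf-8')


transcoded = TranscodeFile(io.BytesIO('café'.encode('utf-16-le')), 'utf-16-le')
```

Looping until some text is decoded keeps read() from returning b'' before the end of the file, which a consumer would otherwise mistake for end of input.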

kingfisher_scrapy.util.transcode_bytes(data, encoding)

Re-encode bytes to UTF-8.
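The whole-buffer counterpart is a decode followed by an encode. A minimal sketch, not the library's source:

```python
def transcode_bytes(data, encoding):
    """Sketch: decode bytes from encoding, then re-encode them as UTF-8."""
    return data.decode(encoding).encode('utf-8')
```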

kingfisher_scrapy.util.transcode(spider, function, data, *args, **kwargs)
kingfisher_scrapy.util.grouper(iterable, n, fillvalue=None)
kingfisher_scrapy.util.get_file_name_and_extension(filename)

Given a filename, return its name and extension in two separate strings.

>>> get_file_name_and_extension('test.json')
('test', 'json')
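A minimal sketch using os.path.splitext(), which splits at the last dot. How the library handles names with several dots or no extension is not documented here, so those results are assumptions of this sketch.

```python
import os.path


def get_file_name_and_extension(filename):
    """Sketch: split filename at its last dot into (name, extension without the dot)."""
    name, extension = os.path.splitext(filename)
    return name, extension[1:]  # drop the leading '.' that splitext keeps
```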