OCDS Kingfisher Collect
Kingfisher Collect is a tool for downloading OCDS data and storing it on disk and/or sending it to an instance of Kingfisher Process for processing.
(If you are viewing this on GitHub, open the full documentation for additional details.)
You can:
- Download data to your computer, by installing Kingfisher Collect
- Download data to a remote server, by using Scrapyd
- Integrate with Kingfisher Process
Instead of installing Kingfisher Collect on your computer, you can follow this interactive step-by-step guide to use Kingfisher Collect in Google Colaboratory.
You can also try using Kingfisher Collect with Scrapy Cloud.
How it works
Kingfisher Collect is built on the Scrapy framework. Using this framework, we have authored “spiders” that you can run in order to “crawl” data sources and extract OCDS data.
When collecting data from a data source, each of its OCDS files will be written to a separate file on your computer. (Depending on the data source, an OCDS file might be a record package, release package, individual record or individual release.)
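The four kinds of OCDS file mentioned above can usually be told apart by their top-level keys. The following is a minimal sketch of such a heuristic, not the way Kingfisher Collect itself determines the type: the function name `guess_ocds_type` is hypothetical, and the key checks assume the standard OCDS layout (packages carry a `records` or `releases` array, while individual records and releases carry an `ocid`).

```python
def guess_ocds_type(data):
    """Guess which kind of OCDS document a parsed JSON object is.

    Heuristic only: it inspects top-level keys and does not validate
    the document against the OCDS schema.
    """
    if not isinstance(data, dict):
        return "unknown"
    # A record package wraps a list of records.
    if "records" in data:
        return "record package"
    # A release package wraps a list of releases and has no top-level ocid.
    if "releases" in data and "ocid" not in data:
        return "release package"
    # An individual record has an ocid plus linked or compiled releases.
    record_keys = ("releases", "compiledRelease", "versionedRelease")
    if "ocid" in data and any(key in data for key in record_keys):
        return "record"
    # Anything else with an ocid is treated as an individual release.
    if "ocid" in data:
        return "release"
    return "unknown"
```

To apply it to a downloaded file, load the file with `json.load` and pass the resulting object to the function.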
By default, these files are written to a data directory (you can change this) within your kingfisher-collect directory (which you will create during installation). Each spider creates its own directory within the data directory, and each crawl of a given spider creates its own directory within its spider’s directory. For example, if you run the zambia spider (learn how), then the directory hierarchy will look like:
kingfisher-collect/
└── data
    └── zambia
        └── 20200102_030405
            ├── <...>.json
            └── <...>
As you can see, the data directory contains a zambia spider directory (matching the spider’s name), which in turn contains a 20200102_030405 crawl directory (matching the time at which you started the crawl – in this case, 2020-01-02 03:04:05). The crawl directory contains .json files – the OCDS data.
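Because the layout is regular, a short script can summarize what has been collected. The sketch below assumes the default layout described above (data directory, then spider directory, then crawl directory named with the `YYYYMMDD_HHMMSS` pattern seen in the example); the function name `list_crawls` is hypothetical and is not part of Kingfisher Collect.

```python
from datetime import datetime
from pathlib import Path


def list_crawls(data_dir):
    """Return (spider, started_at, json_file_count) for each crawl directory."""
    crawls = []
    for spider_dir in sorted(Path(data_dir).iterdir()):
        if not spider_dir.is_dir():
            continue
        for crawl_dir in sorted(spider_dir.iterdir()):
            if not crawl_dir.is_dir():
                continue
            try:
                # Crawl directories are named after the crawl's start time,
                # e.g. 20200102_030405 for 2020-01-02 03:04:05.
                started_at = datetime.strptime(crawl_dir.name, "%Y%m%d_%H%M%S")
            except ValueError:
                continue  # Skip directories that do not look like crawls.
            # Count only the .json files directly inside the crawl directory.
            json_count = sum(1 for _ in crawl_dir.glob("*.json"))
            crawls.append((spider_dir.name, started_at, json_count))
    return crawls
```

Run from within the kingfisher-collect directory, `list_crawls("data")` would return one tuple per crawl, so repeated crawls of the same spider can be compared by their start times.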
Contents
- Download data to your computer
- Download data to a remote server
- Spiders
  - Spider metadata
  - Spider arguments
  - Afghanistan
  - Argentina
  - Armenia
  - Australia
  - Bolivia
  - Canada
  - Chile
  - Colombia
  - Costa Rica
  - Digiwhist
  - Dominican Republic
  - Ecuador
  - France
  - Georgia
  - Honduras
  - India
  - Indonesia
  - Italy
  - Kenya
  - Kosovo
  - Kyrgyzstan
  - Malta
  - Mexico
  - Moldova
  - Nepal
  - Nicaragua
  - Nigeria
  - Openopps
  - Pakistan
  - Paraguay
  - Portugal
  - Scotland
  - Spain
  - Tanzania
  - Uganda
  - Uk
  - Uruguay
  - Zambia
- Log files
- Command-line interface
- Integrate with Kingfisher Process
- Contributing