OCDS Kingfisher Collect
=======================

.. include:: ../README.rst

You can:

-  :doc:`Download data to your computer, by installing Kingfisher Collect`
-  :doc:`Download data to a remote server, by using Scrapyd`
-  :doc:`Integrate with Kingfisher Process`

Instead of installing Kingfisher Collect to your computer, you can `follow this interactive step-by-step guide `__, to use Kingfisher Collect in `Google Colaboratory `__.

You can also try using Kingfisher Collect with `Scrapy Cloud `_.

.. _how-it-works:

How it works
------------

Kingfisher Collect is built on the `Scrapy `_ framework. Using this framework, we have authored "spiders" that you can run in order to "crawl" data sources and extract OCDS data.

When collecting data from a data source, each of its OCDS files will be written to a separate file on your computer. Kingfisher Collect also ensures that the files are always either a `release package `__ or `record package `__, depending on the source.

By default, these files are written to a ``data`` directory (you can :ref:`change this`) within your ``kingfisher-collect`` directory (which you will create :ref:`during installation`). Each spider creates its own directory within the ``data`` directory, and each crawl of a given spider creates its own directory within its spider's directory.

For example, if you run the ``zambia`` spider (:ref:`learn how`), then the directory hierarchy will look like:

.. code-block:: none

   kingfisher-collect/
   └── data
       └── zambia
           └── 20200102_030405
               ├── C8E
               │   ├── <...>.json
               │   └── <...>
               └── D1D
                   ├── <...>.json
                   └── <...>

As you can see, the ``data`` directory contains a ``zambia`` spider directory (matching the spider's name), which in turn contains a ``20200102_030405`` crawl directory (matching the time at which you started the crawl – in this case, 2020-01-02 03:04:05). Within the crawl directory, ``.json`` files – the OCDS data – are split among subdirectories with opaque names, to not exceed filesystem limits.

.. toctree::
   :caption: Contents
   :maxdepth: 2

   local.rst
   scrapyd.rst
   spiders.rst
   logs.rst
   cli.rst
   kingfisher_process.rst
   contributing/index.rst
   history.rst