HDX Python scraper utilities to assemble data from multiple sources
Project description
The HDX Python Scraper Library makes it easy to develop code that assembles data from one or more tabular sources, which can be CSV, XLS, XLSX or JSON. A YAML file specifies what to read from each source and allows some transformations to be performed on the data. The output is written to JSON, Google Sheets and/or Excel and includes the Humanitarian Exchange Language (HXL) hashtags specified in the YAML file. Custom Python scrapers that conform to a defined specification can also be written, and the framework handles the execution of both configurable and custom scrapers.
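To make the idea concrete, here is a minimal standard-library sketch of configuration-driven assembly. It does not use the library's API or its YAML schema: the `CONFIG` dictionary stands in for a YAML file, and its key names (`key_column`, `input_columns`, `output_hxl` and so on), the helper functions and the sample values are all hypothetical, chosen only for illustration.

```python
import csv
import io
import json

# Hypothetical configuration standing in for a YAML file. The key names are
# illustrative assumptions, not the schema used by the HDX scraper library.
CONFIG = {
    "population": {
        "format": "csv",
        "key_column": "iso3",
        "input_columns": ["pop"],
        "output_columns": ["Population"],
        "output_hxl": ["#population"],
        "transform": int,  # CSV values arrive as strings
    },
    "displacement": {
        "format": "json",
        "key_column": "code",
        "input_columns": ["idps"],
        "output_columns": ["IDPs"],
        "output_hxl": ["#affected+idps"],
    },
}

# Inline dummy data so the sketch runs without any downloads.
SOURCES = {
    "population": "iso3,pop\nAFG,100\nSDN,200\n",
    "displacement": '[{"code": "AFG", "idps": 5}]',
}


def read_rows(spec, text):
    """Parse one source into a list of dictionaries, one per row."""
    if spec["format"] == "csv":
        return list(csv.DictReader(io.StringIO(text)))
    if spec["format"] == "json":
        return json.loads(text)
    raise ValueError(f"Unsupported format: {spec['format']}")


def assemble(config, sources):
    """Merge all configured sources into a header row, an HXL row and data rows."""
    headers = ["iso3"]
    hxltags = ["#country+code"]
    values_by_key = {}
    for name, spec in config.items():
        transform = spec.get("transform", lambda value: value)
        for row in read_rows(spec, sources[name]):
            record = values_by_key.setdefault(row[spec["key_column"]], {})
            for in_col, out_col in zip(spec["input_columns"], spec["output_columns"]):
                record[out_col] = transform(row[in_col])
        headers.extend(spec["output_columns"])
        hxltags.extend(spec["output_hxl"])
    # Keys missing from a source get None in that source's columns.
    rows = [
        [key] + [record.get(col) for col in headers[1:]]
        for key, record in sorted(values_by_key.items())
    ]
    return [headers, hxltags] + rows


table = assemble(CONFIG, SOURCES)
print(json.dumps(table, indent=2))
```

The first row of `table` holds the column headers, the second the HXL hashtags, and the remaining rows one record per key. In the real library this structure is described in YAML and the output can also go to Google Sheets or Excel; see the documentation for the actual configuration format.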
For more information, please read the documentation.
This library is part of the Humanitarian Data Exchange (HDX) project. If you have humanitarian-related data, please upload your datasets to HDX.
Built Distribution
Hashes for hdx_python_scraper-2.2.0-py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | 766c4c4fb2e0e7e060953330300daa5deee2bdf9e2ec6235ed2286e3f6704d42 |
MD5 | 6a4d7cd203ba017baa73e736f260d941 |
BLAKE2b-256 | d36e63250a8834d89dd9d212e3602585b9043ececfe6f35f08f6d659b6e73c3a |
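A downloaded wheel can be checked against the SHA256 digest listed above. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `sha256_of_file` and `matches_expected` are hypothetical, and the local file path is assumed to be wherever you saved the wheel.

```python
import hashlib

# SHA256 digest published for hdx_python_scraper-2.2.0-py3-none-any.whl
EXPECTED_SHA256 = "766c4c4fb2e0e7e060953330300daa5deee2bdf9e2ec6235ed2286e3f6704d42"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals the expected digest."""
    return sha256_of_file(path) == expected.lower()
```

For example, `matches_expected("hdx_python_scraper-2.2.0-py3-none-any.whl")` should return `True` for an intact download saved under that name in the current directory.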