HDX Python scraper utilities

Project description

The HDX Python Scraper Library makes it easy to develop code that assembles data from one or more tabular sources in CSV, XLS, XLSX or JSON format. A YAML file specifies, for each source, what needs to be read and which transformations, if any, to apply to the data. The output is written to JSON, Google Sheets and/or Excel and includes the Humanitarian Exchange Language (HXL) hashtags specified in the YAML file. Custom Python scrapers that conform to a defined specification can also be written, and the framework handles the execution of both configurable and custom scrapers.
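The flow described above can be pictured with a minimal sketch that uses only the standard library. It is not this library's API: the class names `BaseScraper`, `ConfigurableScraper` and `CustomScraper`, the function `run_all`, the config keys, the sample data and the hashtags are all hypothetical, and a plain dict stands in for the parsed YAML file. It shows one config-driven scraper that reads selected columns from CSV text, one custom scraper sharing the same interface, and a runner that executes both and emits a header row, an HXL hashtag row and the data rows for each.

```python
import csv
import io
import json
from abc import ABC, abstractmethod

# Hypothetical stand-in for one parsed YAML entry. These keys are illustrative
# only and are NOT the library's real configuration schema.
CONFIG = {
    "population": {
        "input": ["country", "pop"],                   # source columns to read
        "output": ["Country", "Population"],           # output headers
        "output_hxl": ["#country+name", "#population"],  # HXL hashtags
    }
}

# Hypothetical sample source data.
SAMPLE_CSV = "country,pop,notes\nKenya,100,x\nChad,50,y\n"


class BaseScraper(ABC):
    """Hypothetical interface shared by configurable and custom scrapers."""

    def __init__(self, name, headers, hxltags):
        self.name = name
        self.headers = headers
        self.hxltags = hxltags
        self.rows = []

    @abstractmethod
    def run(self):
        """Populate self.rows with lists of output values."""


class ConfigurableScraper(BaseScraper):
    """Reads the columns named in a config entry from CSV text."""

    def __init__(self, name, config, text):
        super().__init__(name, config["output"], config["output_hxl"])
        self.columns = config["input"]
        self.text = text

    def run(self):
        for record in csv.DictReader(io.StringIO(self.text)):
            self.rows.append([record[col] for col in self.columns])


class CustomScraper(BaseScraper):
    """A hand-written scraper; here it just counts another scraper's rows."""

    def __init__(self, source):
        super().__init__("row_count", ["Rows"], ["#meta+rows"])
        self.source = source

    def run(self):
        self.rows.append([len(self.source.rows)])


def run_all(scrapers):
    """Run scrapers in order; return header row, HXL row, then data rows for each."""
    output = {}
    for scraper in scrapers:
        scraper.run()
        output[scraper.name] = [scraper.headers, scraper.hxltags, *scraper.rows]
    return output


config_scraper = ConfigurableScraper("population", CONFIG["population"], SAMPLE_CSV)
result = run_all([config_scraper, CustomScraper(config_scraper)])
print(json.dumps(result, indent=2))
```

Giving both kinds of scraper the same `run()` method is what lets a single runner drive them, and placing the hashtag row directly beneath the header row follows the usual HXL convention. How the real library defines its custom-scraper specification is covered in its documentation.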

For more information, please read the documentation.

This library is part of the Humanitarian Data Exchange (HDX) project. If you have humanitarian-related data, please upload your datasets to HDX.

Download files

Source Distribution

hdx-python-scraper-1.5.0.tar.gz (2.4 MB)

File details

Details for the file hdx-python-scraper-1.5.0.tar.gz.

File metadata

  • Download URL: hdx-python-scraper-1.5.0.tar.gz
  • Upload date:
  • Size: 2.4 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.9 tqdm/4.63.0 importlib-metadata/4.11.3 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.10.2

File hashes

Hashes for hdx-python-scraper-1.5.0.tar.gz:

  • SHA256: bd1c3f86f4a6513cbf7908b9f2c3aed90c51bdb8e95ca1e77d05d709db66998c
  • MD5: 8e538c7e9059446f7f01f2cb8374d6e0
  • BLAKE2b-256: 8159b99dbc15a0b1a5794b4ea13c2648a55a5902d6a6a80f0be9695cd7ddbb19
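A downloaded copy of the source distribution can be checked against the published SHA256 digest with Python's standard `hashlib` module. The minimal sketch below assumes the archive sits in the current directory under its original file name; the helper names `sha256_of` and `verify` are hypothetical.

```python
import hashlib

# SHA256 digest published for hdx-python-scraper-1.5.0.tar.gz.
EXPECTED_SHA256 = "bd1c3f86f4a6513cbf7908b9f2c3aed90c51bdb8e95ca1e77d05d709db66998c"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 digest matches `expected` (case-insensitive)."""
    return sha256_of(path) == expected.lower()
```

For an intact download, `verify("hdx-python-scraper-1.5.0.tar.gz")` should return `True`. pip can enforce the same check at install time through its hash-checking mode, using a requirements line such as `hdx-python-scraper==1.5.0 --hash=sha256:bd1c3f86f4a6513cbf7908b9f2c3aed90c51bdb8e95ca1e77d05d709db66998c`.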
