HDX Python scraper utilities to assemble data from multiple sources

Project description

The HDX Python Scraper Library helps you develop code that assembles data from one or more tabular sources in CSV, XLS, XLSX or JSON format. A YAML file specifies what to read from each source and which transformations to apply to the data. The output is written to JSON, Google Sheets and/or Excel, and includes the Humanitarian Exchange Language (HXL) hashtags specified in the YAML file. You can also write custom Python scrapers that conform to a defined specification; the framework handles the execution of both configurable and custom scrapers.
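To make the idea of configuration-driven assembly concrete, here is a minimal sketch using only the Python standard library. It does not use the library's API: the `CONFIG` keys, the `assemble` function, the column names and the sample data are all hypothetical, chosen only to illustrate how a per-source configuration might select columns, apply a simple transformation and attach HXL hashtags before writing JSON.

```python
import csv
import io
import json

# Hypothetical configuration standing in for what a YAML file might specify.
# These key names are illustrative and are NOT the library's actual schema.
CONFIG = {
    "key_column": "iso3",
    "key_hxltag": "#country+code",
    "columns": {
        "population": {"output": "Population", "hxltag": "#population"},
        "affected": {"output": "PeopleAffected", "hxltag": "#affected"},
    },
}

# Made-up sample input, used only so the sketch is self-contained.
SAMPLE_CSV = """iso3,population,affected
AFG,38928346,1200
SDN,43849260,560
"""


def assemble(csv_text, config):
    """Read tabular text and return headers, HXL hashtags and rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = config["columns"]
    headers = [config["key_column"]] + [c["output"] for c in columns.values()]
    hxltags = [config["key_hxltag"]] + [c["hxltag"] for c in columns.values()]
    rows = []
    for record in reader:
        row = [record[config["key_column"]]]
        # A simple transformation: convert numeric strings to integers.
        row.extend(int(record[name]) for name in columns)
        rows.append(row)
    return {"headers": headers, "hxltags": hxltags, "rows": rows}


result = assemble(SAMPLE_CSV, CONFIG)
print(json.dumps(result, indent=2))
```

Keeping the column selection and hashtags in data rather than in code is what lets one generic reader serve many sources. For the real configuration keys and scraper classes, refer to the library's documentation.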

For more information, please read the documentation.

This library is part of the Humanitarian Data Exchange (HDX) project. If you have humanitarian related data, please upload your datasets to HDX.

Download files

Source Distribution

hdx_python_scraper-2.7.3.tar.gz (6.5 MB view details)

Uploaded Source

Built Distribution

hdx_python_scraper-2.7.3-py3-none-any.whl (56.6 kB view details)

Uploaded Python 3

File details

Details for the file hdx_python_scraper-2.7.3.tar.gz.

File metadata

  • Download URL: hdx_python_scraper-2.7.3.tar.gz
  • Upload date:
  • Size: 6.5 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.11.6 (`uv publish`, run in CI on Ubuntu 24.04 "noble")

File hashes

Hashes for hdx_python_scraper-2.7.3.tar.gz
Algorithm Hash digest
SHA256 70be62289b5a03fd1790d4ba304bf73256cf5a1d21dd9f4498bbe1a052a7a82f
MD5 354af4446fc2428107851cb23cc4cedd
BLAKE2b-256 7dae717a3917dd18bbeab454844ddd6c007f070c37bd76c28a248be0d0fb31d6
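A downloaded file can be checked against the SHA256 digest listed above. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `sha256_of_file` and `matches_expected` are illustrative, and the sketch assumes the archive has already been saved locally.

```python
import hashlib

# SHA256 digest listed above for hdx_python_scraper-2.7.3.tar.gz.
EXPECTED_SHA256 = "70be62289b5a03fd1790d4ba304bf73256cf5a1d21dd9f4498bbe1a052a7a82f"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives are not loaded at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Compare a file's digest with an expected hex string, ignoring case."""
    return sha256_of_file(path) == expected_hex.lower()
```

Calling `matches_expected("hdx_python_scraper-2.7.3.tar.gz", EXPECTED_SHA256)` should return `True` for an intact download, assuming the file sits in the current working directory.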

File details

Details for the file hdx_python_scraper-2.7.3-py3-none-any.whl.

File metadata

  • Download URL: hdx_python_scraper-2.7.3-py3-none-any.whl
  • Upload date:
  • Size: 56.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.11.6 (`uv publish`, run in CI on Ubuntu 24.04 "noble")

File hashes

Hashes for hdx_python_scraper-2.7.3-py3-none-any.whl
Algorithm Hash digest
SHA256 1cb2fc4b5665cd2f657e9dc98793e2cf7d7d2fe37405463518fc823e79567679
MD5 11288f1baa28915658ca2653163b90ca
BLAKE2b-256 4766d69e3606b2b6b68f51883f71c976ecb2387bcaaaf141cbd39efa4b420e4c
