
HDX Python generic geonode scraper


The HDX Scraper Geonode Library enables easy building of scrapers for extracting data from geonode servers.


The library has detailed API documentation which can be found here: The code for the library is here:

Breaking Changes

1.4.0 supports only Python 3.6 and later

GeoNodeToHDX Class

You should create an object of the GeoNodeToHDX class:

geonodetohdx = GeoNodeToHDX('', downloader)

It has high level methods generate_datasets_and_showcases and delete_other_datasets:

# generate datasets and showcases reading country and layer information from the GeoNode
datasets = geonodetohdx.generate_datasets_and_showcases('maintainerid', 'orgid', 'orgname', updatefreq='Adhoc')
# generate datasets and showcases reading layer information ignoring region (country) in layers call
countrydata = {'iso3': 'MMR', 'name': 'Myanmar', 'layers': None}
datasets = geonodetohdx.generate_datasets_and_showcases('maintainerid', 'orgid', 'orgname', updatefreq='Adhoc',
                                                        subnational=True, countrydata=countrydata)
# delete any datasets and associated showcases from HDX that are not in the list datasets
# (assuming matching organisation id, maintainer id and geonode url in the resource url)
geonodetohdx.delete_other_datasets(datasets)
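
The matching rule described in the comments above can be illustrated with a small standalone sketch. This is not the library's implementation: the function name `select_datasets_to_delete`, the record fields (`name`, `owner_org`, `maintainer`, `resource_urls`) and the `geonode.example.org` URL are all assumptions made for illustration.

```python
def select_datasets_to_delete(existing, keep_names, org_id, maintainer_id, geonode_url):
    """Return the names of existing datasets that are candidates for deletion.

    Hypothetical helper: a dataset is selected only if it is not in keep_names,
    belongs to the given organisation and maintainer, and has at least one
    resource URL that contains the GeoNode URL.
    """
    to_delete = []
    for dataset in existing:
        if dataset["name"] in keep_names:
            continue  # freshly generated, so keep it
        if dataset["owner_org"] != org_id or dataset["maintainer"] != maintainer_id:
            continue  # owned by someone else, so leave it alone
        if any(geonode_url in url for url in dataset["resource_urls"]):
            to_delete.append(dataset["name"])
    return to_delete


# Made-up records standing in for datasets already on HDX
existing = [
    {"name": "new-rivers", "owner_org": "orgid", "maintainer": "maintainerid",
     "resource_urls": ["https://geonode.example.org/geoserver/wfs?typename=rivers"]},
    {"name": "old-roads", "owner_org": "orgid", "maintainer": "maintainerid",
     "resource_urls": ["https://geonode.example.org/geoserver/wfs?typename=roads"]},
    {"name": "other-source", "owner_org": "orgid", "maintainer": "maintainerid",
     "resource_urls": ["https://data.example.com/file.csv"]},
    {"name": "other-org", "owner_org": "someone-else", "maintainer": "maintainerid",
     "resource_urls": ["https://geonode.example.org/geoserver/wfs?typename=x"]},
]

stale = select_datasets_to_delete(
    existing, {"new-rivers"}, "orgid", "maintainerid", "https://geonode.example.org"
)
print(stale)
```

Requiring all three conditions to match keeps the cleanup from touching datasets that came from another organisation, maintainer or data source.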

If you need finer-grained control, it has low-level methods get_locationsdata, get_layersdata, generate_dataset_and_showcase:

# get countries where count > 0
countries = geonodetohdx.get_countries(use_count=True)
# get layers for country with ISO 3 code SDN
layers = geonodetohdx.get_layers(countryiso='SDN')
# get layers for all countries
layers = geonodetohdx.get_layers(countryiso=None)
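
To show what "countries where count > 0" might mean in practice, here is a minimal sketch that filters a list of region records. The input field names (`code`, `name_en`, `count`) are assumptions about what a GeoNode server returns, and `countries_with_layers` is a hypothetical helper, not a library function. The output reuses the `iso3`, `name` and `layers` keys of the `countrydata` dictionary shown earlier.

```python
def countries_with_layers(regions, use_count=True):
    """Convert region records to country dictionaries.

    When use_count is True, regions whose layer count is zero or missing
    are skipped.
    """
    countries = []
    for region in regions:
        if use_count and not region.get("count"):
            continue
        countries.append(
            {"iso3": region["code"], "name": region["name_en"], "layers": region["code"]}
        )
    return countries


# Made-up sample records, shaped according to the assumptions above
regions = [
    {"code": "SDN", "name_en": "Sudan", "count": 12},
    {"code": "MMR", "name_en": "Myanmar", "count": 0},
    {"code": "NPL", "name_en": "Nepal"},
]

print(countries_with_layers(regions))
print(len(countries_with_layers(regions, use_count=False)))
```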

There are default terms to be ignored and mapped. These can be overridden by creating a YAML configuration with the new configuration in this format:

ignore_data:
  - deprecated

category_mapping:
  Elevation: 'elevation - topography - altitude'
  'Inland Waters': river

titleabstract_mapping:
    - bridges
    - transportation
    - 'facilities and infrastructure'
      - 'displaced persons locations - camps - shelters'
      - 'internally displaced persons - idp'
      - 'internally displaced persons - idp'

ignore_data are any terms in the abstract that mean that the dataset should not be added to HDX.

category_mapping are mappings from the category field category__gn_description to HDX metadata tags.

titleabstract_mapping are mappings from terms in the title or abstract to HDX metadata tags.
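
The way these three settings could interact is sketched below in plain Python. This is an illustration under stated assumptions, not the library's code: `tags_for_layer` is a hypothetical function, the matching is simple case-insensitive substring search, and the `bridges` key in `titleabstract_mapping` is an assumed term chosen for the example.

```python
def tags_for_layer(title, abstract, category,
                   ignore_data, category_mapping, titleabstract_mapping):
    """Return a sorted list of tags for a layer, or None if it should be ignored."""
    # ignore_data: any matching term in the abstract excludes the layer
    if any(term in abstract.lower() for term in ignore_data):
        return None
    tags = set()
    # category_mapping: the layer's category maps to a single tag
    if category in category_mapping:
        tags.add(category_mapping[category])
    # titleabstract_mapping: terms found in the title or abstract add tags
    text = f"{title} {abstract}".lower()
    for term, term_tags in titleabstract_mapping.items():
        if term in text:
            tags.update(term_tags)
    return sorted(tags)


ignore_data = ["deprecated"]
category_mapping = {
    "Elevation": "elevation - topography - altitude",
    "Inland Waters": "river",
}
# The key "bridges" is an assumption made for this example
titleabstract_mapping = {
    "bridges": ["bridges", "transportation", "facilities and infrastructure"],
}

kept = tags_for_layer("Sudan bridges", "Bridge locations over rivers", "Inland Waters",
                      ignore_data, category_mapping, titleabstract_mapping)
skipped = tags_for_layer("Old layer", "Deprecated, do not use", "Elevation",
                         ignore_data, category_mapping, titleabstract_mapping)
print(kept)
print(skipped)
```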

For more fine-grained tuning of these, you can retrieve the dictionaries and manipulate them directly:

geonodetohdx = GeoNodeToHDX('', downloader)
ignore_data = geonodetohdx.get_ignore_data() 
category_mapping = geonodetohdx.get_category_mapping() 
titleabstract_mapping = geonodetohdx.get_titleabstract_mapping()         

