Skip to main content

HDX Python generic geonode scraper

Project description

Build Status Coverage Status

The HDX Scraper Geonode Library enables easy building of scrapers for extracting data from geonode servers.

Usage

The library has detailed API documentation which can be found here: http://ocha-dap.github.io/hdx-scraper-geonode/. The code for the library is here: https://github.com/ocha-dap/hdx-scraper-geonode.

GeoNodeToHDX Class

You should create an object of the GeoNodeToHDX class which has methods: get_locationsdata, get_layersdata, generate_dataset_and_showcase.

geonodetohdx = GeoNodeToHDX('https://geonode.wfp.org', downloader)
# get countries where count > 0
countries = geonodetohdx.get_countries(use_count=True)
# get layers for country with ISO 3 code SDN
layers = geonodetohdx.get_layers(countryiso='SDN')
# get layers for all countries
geonodetohdx = GeoNodeToHDX('https://geonode.themimu.info', downloader)
layers = get_layers(countryiso=None)

There are default terms to be ignored and mapped. These can be overridden by creating a YAML configuration with the new configuration in this format:

ignore_data:
  - deprecated

category_mapping:
  Elevation: 'elevation - topography - altitude'
  'Inland Waters': river

titleabstract_mapping:
  bridges:
    - bridges
    - transportation
    - 'facilities and infrastructure'
  idp:
    camp:
      - 'displaced persons locations - camps - shelters'
      - 'internally displaced persons - idp'
    else:
      - 'internally displaced persons - idp'

ignore_data are any terms in the abstract that mean that the dataset should not be added to HDX.

category_mapping are mappings from the category field category__gn_description to HDX metadata tags.

titleabstract_mapping are terms in the title or abstract that mean that particular tags should be added to the HDX metadata tags.

For more fine grained tuning of these, you retrieve the dictionaries and manipulate them directly:

geonodetohdx = GeoNodeToHDX('https://geonode.wfp.org', downloader)
ignore_data = geonodetohdx.get_ignore_data() 
category_mapping = geonodetohdx.get_category_mapping() 
titleabstract_mapping = geonodetohdx.get_titleabstract_mapping()         

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

hdx-scraper-geonode-1.0.0.tar.gz (16.8 kB view hashes)

Uploaded Source

Built Distribution

hdx_scraper_geonode-1.0.0-py2.py3-none-any.whl (8.0 kB view hashes)

Uploaded Python 2 Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page