
HDX Python generic geonode scraper


The HDX Scraper Geonode Library enables easy building of scrapers for extracting data from GeoNode servers.

Usage

The library has detailed API documentation which can be found here: http://ocha-dap.github.io/hdx-scraper-geonode/. The code for the library is here: https://github.com/ocha-dap/hdx-scraper-geonode.

GeoNodeToHDX Class

You should create an object of the GeoNodeToHDX class:

geonodetohdx = GeoNodeToHDX('https://geonode.wfp.org', downloader)
geonodetohdx = GeoNodeToHDX('https://geonode.themimu.info', downloader)
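
The downloader argument is a Download object from the HDX Python Utilities library. A minimal sketch of a typical setup (assuming default Download settings):

from hdx.utilities.downloader import Download

# the Download object handles the HTTP requests made to the GeoNode server
with Download() as downloader:
    geonodetohdx = GeoNodeToHDX('https://geonode.wfp.org', downloader)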

It has the high-level methods generate_datasets_and_showcases and delete_other_datasets:

# generate datasets and showcases reading country and layer information from the GeoNode
datasets = geonodetohdx.generate_datasets_and_showcases('maintainerid', 'orgid', 'orgname',
                                                        updatefreq='Adhoc', subnational=True)
# generate datasets and showcases reading layer information ignoring region (country) in layers call
countrydata = {'iso3': 'MMR', 'name': 'Myanmar', 'layers': None}
datasets = geonodetohdx.generate_datasets_and_showcases('maintainerid', 'orgid', 'orgname',
                                                        updatefreq='Adhoc', subnational=True,
                                                        countrydata=countrydata)
# delete any datasets and associated showcases from HDX that are not in the list datasets
# (assuming matching organisation id, maintainer id and geonode url in the resource url)
geonodetohdx.delete_other_datasets(datasets)
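
Since these methods create and delete objects in HDX, an HDX API configuration must be in place before calling them. A minimal sketch using hdx-python-api (the user agent is a placeholder, and the import path may differ between hdx-python-api versions):

from hdx.hdx_configuration import Configuration

# register the HDX site and a user agent; the HDX API key is read from the
# standard configuration location
Configuration.create(hdx_site='stage', user_agent='my-geonode-scraper')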

If you need more fine-grained control, it has the low-level methods get_countries, get_layers and generate_dataset_and_showcase:

# get countries where count > 0
countries = geonodetohdx.get_countries(use_count=True)
# get layers for country with ISO 3 code SDN
layers = geonodetohdx.get_layers(countryiso='SDN')
# get layers for all countries
layers = geonodetohdx.get_layers(countryiso=None)
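
These lookups can be combined to walk every layer of every country. A minimal sketch, assuming each country dictionary returned by get_countries carries an 'iso3' key as in the countrydata example above:

for country in geonodetohdx.get_countries(use_count=True):
    countryiso = country['iso3']  # assumed key, matching the countrydata format shown earlier
    for layer in geonodetohdx.get_layers(countryiso=countryiso):
        # pass each layer to generate_dataset_and_showcase here
        # (see the API documentation for its exact signature)
        ...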

There are default terms to be ignored and default mappings. These can be overridden by supplying a YAML configuration file in this format:

ignore_data:
  - deprecated

category_mapping:
  Elevation: 'elevation - topography - altitude'
  'Inland Waters': river

titleabstract_mapping:
  bridges:
    - bridges
    - transportation
    - 'facilities and infrastructure'
  idp:
    camp:
      - 'displaced persons locations - camps - shelters'
      - 'internally displaced persons - idp'
    else:
      - 'internally displaced persons - idp'

ignore_data lists terms in the abstract that indicate the dataset should not be added to HDX.

category_mapping maps the GeoNode category field (category__gn_description) to HDX metadata tags.

titleabstract_mapping maps terms found in the title or abstract to HDX metadata tags.
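
To apply such a file, pass its path when constructing the object. A hedged sketch, assuming the constructor accepts the override file via a keyword argument (check the API documentation for the exact parameter name):

# hypothetical override file path; the parameter name should be verified against the API docs
geonodetohdx = GeoNodeToHDX('https://geonode.wfp.org', downloader,
                            hdx_geonode_config_yaml='my_geonode_overrides.yml')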

For more fine-grained tuning of these, you can retrieve the dictionaries and manipulate them directly:

geonodetohdx = GeoNodeToHDX('https://geonode.wfp.org', downloader)
ignore_data = geonodetohdx.get_ignore_data() 
category_mapping = geonodetohdx.get_category_mapping() 
titleabstract_mapping = geonodetohdx.get_titleabstract_mapping()         
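
For example (the specific terms and tags below are illustrative, not part of the default configuration):

ignore_data.append('internal use only')                # skip layers whose abstract contains this term
category_mapping['Transportation'] = 'transportation'  # map an extra GeoNode category to an HDX tag
titleabstract_mapping['flood'] = ['floods']            # tag layers mentioning floods in title or abstract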

