Skip to main content

Library to help you work with `WorldPop` data for any region on earth

Project description

WorldPopPy

WorldPopPy is a Python package that helps you work with data from the WorldPop project. WorldPop offers global, gridded geo-datasets on population dynamics, land-cover features, night-light emissions, and several other attributes of human and natural geography. This package streamlines the process of downloading, combining, and cleaning WorldPop data for different geographic regions and years.

Key Features

  • Fetch data for any part of the world by passing GeoDataFrames, country codes, or bounding boxes.
  • Easy handling of annual time-series through integration with xarray.
  • Parallel data downloads with retry mechanism and ability to preview estimated download sizes (dry run).
  • Auto-updating manifest file so you stay up-to-date with WorldPop’s latest available datasets.

Installation

WorldPopPy is available on PyPI and can be installed using pip:

pip install worldpoppy

Quickstart

import matplotlib.pyplot as plt
from matplotlib.colors import LogNorm

from worldpoppy import wp_raster, clean_axis

# Fetch night-light data for the Korean Peninsula.
# Data is returned as an `xarray.DataArray` ready for analysis and plotting
viirs_data = wp_raster(
    product_name='viirs_100m',  # name of WorldPop's night-light product
    aoi=['PRK', 'KOR'],  # three-letter country codes for North and South Korea  
    years=2015,
    masked=True,  # mask missing values with NaN (instead of WorldPop's default fill value),
)  

# Downsample the data to speed-up plotting
lowres = viirs_data.coarsen(x=5, y=5, boundary='trim').mean()

# Plot
lowres.plot(vmin=0.1, cmap='inferno', norm=LogNorm())
clean_axis(title='Night Lights (2015)\nKorean Peninsula')

plt.show()

More detailed example

Below, we visualise population growth in a patch of West Africa from 2000 to 2020. The geographic area of interest is selected with a helper function that can convert a location name into a bounding box. The example below also shows you how to re-project WorldPop data into a different Coordinate Reference System (CRS).

import matplotlib.pyplot as plt
import numpy as np

from worldpoppy import *

# Define the area of interest 
# Note: `bbox_from_location` runs a `Nomatim` query under the hood 
aoi_box = bbox_from_location('Accra', width_km=500)  # returns (min_lon, min_lat, max_lon, max_lat)

# Define the target CRS (optional)
aeqa_africa = "ESRI:102022"  # an Albers Equal Area projection optimised for Africa

# Fetch the population data
pop_data = wp_raster(
    product_name='ppp',  # name of the WorldPop product (here: # of people per raster cell)
    aoi=aoi_box,  # you could also pass a GeoDataFrame or official country codes
    years=[2000, 2020],  # the years of interest (for annual WorldPop products only)
    masked=True,  # mask missing values with NaN (instead of WorldPop's default fill value)
    to_crs=aeqa_africa  # if None is provided, CRS of the source data will be kept (EPSG:4326)
)

# Compute population changes on downsampled data
lowres = pop_data.coarsen(x=10, y=10, year=1, boundary='trim').reduce(np.sum)  # will propagate NaNs
pop_change = lowres.sel(year=2020) - lowres.sel(year=2000)

# Plot
pop_change.plot(cmap='coolwarm', vmax=1_000, cbar_kwargs=dict(shrink=0.85))
clean_axis(title='Estimated population change (2000 to 2020)')

# Add visual references
plot_country_borders(['GHA', 'TOG', 'BEN'], edgecolor='white', to_crs=aeqa_africa)
plot_location_markers(['Accra', 'Kumasi', 'Lomé'], to_crs=aeqa_africa)

plt.show()

Further details

Data dimensions

Calling wp_raster() will always return an xarray.DataArray. The array dimensions, however, depend on the user query. If you request data for more than one year, the returned array will include a year dimension in addition to the raster data's two spatial dimensions (x and y). By contrast, the year dimension will be omitted if you request data for a single year only, or if the WorldPop product in question is static anyway (e.g., when requesting elevation data).

Managing the local cache

By default, downloaded source data from WorldPop will be cached on disk for re-use. To disable caching, set cache_downloads=False when calling wp_raster(). The default cache directory is ~/.cache/worldpoppy. This can be changed by pointing the WORLDPOPPY_CACHE_DIR environment variable to the desired location, as shown here.

Use the following function to delete all cached data or simply check the local cache size:

from worldpoppy import purge_cache

purge_cache(dry_run=True)
# dry run will only print a cache summary and not delete any files

Download dry runs

Before you request data for large geographic areas and/or many years, you may want to check download requirements first. Setting download_dry_run=True will check download requirements and print a summary:

from worldpoppy import wp_raster

_ = wp_raster(
    product_name='ppp',
    aoi='CAN USA MEX'.split(),
    years='all',  # query all available years for the specified product 
    download_dry_run=True  # do not actually download anything and merely print a summary  
)
# Note that `wp_raster` will return `None` in this case

Selecting data with a GeoDataFrame

... is straightforward, as shown in this example.

The WorldPop data manifest

Use the wp_manifest function to load and optionally filter the manifest file listing all available WorldPop datasets:

from worldpoppy import wp_manifest

full_manifest = wp_manifest()  # returns a `pandas.DataFrame`
full_manifest.head(2)

The local manifest file is auto-updated by comparing it against a remote version hosted on WorldPop servers. If needed, the remote manifest is downloaded and cleaned for local use. Note that the remote WorldPop manifest sometimes lists datasets that are not actually available for download. Requesting such datasets will trigger a DownloadError.

Downloads only?

If you are only interested in asynchronous country-data downloads from WorldPop, without any other functionality, use the WorldPopDownloader class:

from worldpoppy import WorldPopDownloader

raster_fpaths = WorldPopDownloader().download(
    product_name='srtm_slope_100m',  # topographic slope
    iso3_codes=['LIE'],  # Liechtenstein
)

Acknowledgements

The implementation of WorldPopPy draws on the World Bank's BlackMarblePy package, which gives users easy access to night-light data from NASA's Black Marble project.

Feedback

If you would like to give feedback, encounter issues, or want to suggest improvements, please open an issue. Since this package is developed and tested on Linux, issues encountered on other platforms may take longer to address.

License

This projects is licensed under the Mozilla Public License. See LICENSE.txt for details.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

worldpoppy-0.1.0.tar.gz (6.0 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

worldpoppy-0.1.0-py3-none-any.whl (6.0 MB view details)

Uploaded Python 3

File details

Details for the file worldpoppy-0.1.0.tar.gz.

File metadata

  • Download URL: worldpoppy-0.1.0.tar.gz
  • Upload date:
  • Size: 6.0 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.17

File hashes

Hashes for worldpoppy-0.1.0.tar.gz
Algorithm Hash digest
SHA256 a44d0ae8740e7513a2e2edfe1818fa5b6bd546fd35cb655e5510537101d24852
MD5 e51de0683324e7fb67dd2491ce9d5340
BLAKE2b-256 cffe3bec5074d4a71a92eb2bb4d3d288f3eeafca6feeb623ce58c10f83ac51bf

See more details on using hashes here.

File details

Details for the file worldpoppy-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: worldpoppy-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 6.0 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.17

File hashes

Hashes for worldpoppy-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 3e4ee8f5f8a5d9fef8f5f1a199c3d33b5ffc7d00967e2be2bd74dd93149d8adc
MD5 06c69928190392f6c220a62a38e10e66
BLAKE2b-256 d2256031f7ff8b065ee8dd482fcc7c4db592281eeb26e3cea425673184251329

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page