A Python package for accessing and analyzing Demographic and Health Survey (DHS) data.

pdhs

Motivation

Access to high-quality, structured, and timely demographic and health data is essential for researchers, policymakers, and public health professionals. The Demographic and Health Surveys (DHS) Program provides a rich repository of standardized datasets across countries and years. However, accessing and using this data programmatically can be cumbersome due to inconsistencies in interfaces, authentication requirements, and data formatting.

The pdhs Python library aims to streamline and simplify interaction with the DHS API. It offers an intuitive, well-documented, and Pythonic interface for querying, retrieving, and managing DHS datasets. By abstracting low-level API complexities, pdhs allows users to focus on analysis and application rather than on data wrangling. It supports reproducible research, integrates smoothly with common data science workflows (e.g., pandas, numpy, matplotlib), and promotes broader usage of DHS data in academic, development, and policy contexts.

In short, pdhs bridges the gap between powerful public data and the tools needed to derive meaningful insights from it.


pdhs is a package for managing and analyzing Demographic and Health Survey (DHS) data. It provides functionality to:

  1. Access standard indicator data (via DHS STATcompiler) using the DHS API.
  2. Identify surveys and datasets relevant to specific analyses.
  3. Download survey datasets from the DHS website.
  4. Load datasets and associated metadata into Python.
  5. Extract variables and combine datasets for pooled multi-survey analyses.

Installation

Install the latest version from PyPI using:

pip install pdhs

Note: Downloading datasets from DHS relies on Playwright. After installing pdhs, also install the Playwright browser binaries:

playwright install

Getting Started

To download survey datasets, you must first create an account with DHS and request access. You’ll need the email, password, and project name associated with your DHS account when using pdhs.

  • Request dataset access here

Basic Functionality

Query the DHS API

The example below retrieves indicator estimates for Albania, filtered to the middle and second wealth quintiles and categorized by region. The call does not specify an indicator, and the sample output begins with age-specific fertility rates:

from pdhs.indicators import GetIndicatorsData

indicators_data = GetIndicatorsData(
    country_ids=["AL"],
    characteristic_category=["wealth quintile", "region"],
    characteristic_label=["middle", "second"],
    breakdown="all"
)

fertility = indicators_data.get_data()
print(fertility.head())

Sample Output

shape: (5, 28)
┌─────────┬───────────┬─────────────┬─────────────┬───┬────────┬─────────┬─────────────┬───────────┐
│ DataId  ┆ SurveyId  ┆ Indicator   ┆ IsPreferred ┆ … ┆ CIHigh ┆ IsTotal ┆ ByVariableI ┆ LevelRank │
│ ---     ┆ ---       ┆ ---         ┆ ---         ┆   ┆ ---    ┆ ---     ┆ d           ┆ ---       │
│ i64     ┆ str       ┆ str         ┆ i64         ┆   ┆ str    ┆ i64     ┆ ---         ┆ str       │
│         ┆           ┆             ┆             ┆   ┆        ┆         ┆ i64         ┆           │
╞═════════╪═══════════╪═════════════╪═════════════╪═══╪════════╪═════════╪═════════════╪═══════════╡
│ 3361769 ┆ AL2008DHS ┆ Age         ┆ 1           ┆ … ┆        ┆ 0       ┆ 0           ┆           │
│         ┆           ┆ specific    ┆             ┆   ┆        ┆         ┆             ┆           │
│         ┆           ┆ fertility   ┆             ┆   ┆        ┆         ┆             ┆           │
│         ┆           ┆ rate: 1…    ┆             ┆   ┆        ┆         ┆             ┆           │
│ 3419763 ┆ AL2008DHS ┆ Age         ┆ 1           ┆ … ┆        ┆ 0       ┆ 0           ┆           │
│         ┆           ┆ specific    ┆             ┆   ┆        ┆         ┆             ┆           │
│         ┆           ┆ fertility   ┆             ┆   ┆        ┆         ┆             ┆           │
│         ┆           ┆ rate: 1…    ┆             ┆   ┆        ┆         ┆             ┆           │
│ 3361770 ┆ AL2008DHS ┆ Age         ┆ 1           ┆ … ┆        ┆ 0       ┆ 0           ┆           │
│         ┆           ┆ specific    ┆             ┆   ┆        ┆         ┆             ┆           │
│         ┆           ┆ fertility   ┆             ┆   ┆        ┆         ┆             ┆           │
│         ┆           ┆ rate: 1…    ┆             ┆   ┆        ┆         ┆             ┆           │
│ 3419764 ┆ AL2008DHS ┆ Age         ┆ 1           ┆ … ┆        ┆ 0       ┆ 0           ┆           │
│         ┆           ┆ specific    ┆             ┆   ┆        ┆         ┆             ┆           │
│         ┆           ┆ fertility   ┆             ┆   ┆        ┆         ┆             ┆           │
│         ┆           ┆ rate: 1…    ┆             ┆   ┆        ┆         ┆             ┆           │
│ 3361764 ┆ AL2008DHS ┆ Age         ┆ 1           ┆ … ┆        ┆ 0       ┆ 0           ┆           │
│         ┆           ┆ specific    ┆             ┆   ┆        ┆         ┆             ┆           │
│         ┆           ┆ fertility   ┆             ┆   ┆        ┆         ┆             ┆           │
│         ┆           ┆ rate: 2…    ┆             ┆   ┆        ┆         ┆             ┆           │
└─────────┴───────────┴─────────────┴─────────────┴───┴────────┴─────────┴─────────────┴───────────┘
...
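
For orientation, the HTTP request behind a query like this can be sketched with the standard library alone. The DHS API accepts filters as query-string parameters on its REST endpoint. How pdhs maps its keyword arguments onto those parameters is an assumption here, as is the comma-joined form of the characteristic filters. `build_dhs_url` is a hypothetical helper and not part of pdhs.

```python
from urllib.parse import urlencode

# Public DHS API endpoint for indicator data.
DHS_DATA_ENDPOINT = "https://api.dhsprogram.com/rest/dhs/data"


def build_dhs_url(country_ids, characteristic_category, characteristic_label, breakdown="all"):
    """Build a DHS API query URL (hypothetical helper, not part of pdhs).

    List-valued filters are joined with commas; this mapping is an assumption.
    """
    params = {
        "countryIds": ",".join(country_ids),
        "characteristicCategory": ",".join(characteristic_category),
        "characteristicLabel": ",".join(characteristic_label),
        "breakdown": breakdown,
        "f": "json",  # ask the API for a JSON response
    }
    return f"{DHS_DATA_ENDPOINT}?{urlencode(params)}"


# Same filters as the GetIndicatorsData example; no request is sent here.
url = build_dhs_url(["AL"], ["wealth quintile", "region"], ["middle", "second"])
print(url)
```

The sketch only builds the URL, so it can be inspected or pasted into a browser without any network access from Python.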

Download Datasets

To download DHS datasets with pdhs, first build a dataframe of the available datasets using the GetDatasets() class, specifying the country and the file format you want.

To determine which datasets to download, refer to the DHS website or use filtering options provided by the library.


Recommendation:

  • Use file_format="SV" for SPSS (.sav) files: slower but fully reliable
  • Use file_format="FL" for flat (.dat) files: faster, but a few old datasets may not load correctly

from pdhs.datasets import GetDatasets

data = GetDatasets(
    country_ids=["NG"],
    file_format="DT"
)


df = data.get_data()

Sample Output

shape: (5, 13)
┌───────────────┬──────────┬─────────────────┬───────────┬───┬────────────┬─────────────────┬──────────────┬─────────────┐
│ FileFormat    ┆ FileSize ┆ DatasetType     ┆ SurveyNum ┆ … ┆ SurveyYear ┆ DHS_CountryCode ┆ FileName     ┆ CountryName │
│ ---           ┆ ---      ┆ ---             ┆ ---       ┆   ┆ ---        ┆ ---             ┆ ---          ┆ ---         │
│ str           ┆ i64      ┆ str             ┆ i64       ┆   ┆ str        ┆ str             ┆ str          ┆ str         │
╞═══════════════╪══════════╪═════════════════╪═══════════╪═══╪════════════╪═════════════════╪══════════════╪═════════════╡
│ Stata dataset ┆ 2563446  ┆ Survey Datasets ┆ 32        ┆ … ┆ 1990       ┆ NG              ┆ NGBR21dt.zip ┆ Nigeria     │
│ (.dta)        ┆          ┆                 ┆           ┆   ┆            ┆                 ┆              ┆             │
│ Stata dataset ┆ 505235   ┆ Survey Datasets ┆ 32        ┆ … ┆ 1990       ┆ NG              ┆ NGHR21DT.ZIP ┆ Nigeria     │
│ (.dta)        ┆          ┆                 ┆           ┆   ┆            ┆                 ┆              ┆             │
│ Stata dataset ┆ 76104    ┆ Survey Datasets ┆ 32        ┆ … ┆ 1990       ┆ NG              ┆ NGHW21DT.ZIP ┆ Nigeria     │
│ (.dta)        ┆          ┆                 ┆           ┆   ┆            ┆                 ┆              ┆             │
│ Stata dataset ┆ 3216090  ┆ Survey Datasets ┆ 32        ┆ … ┆ 1990       ┆ NG              ┆ NGIR21DT.ZIP ┆ Nigeria     │
│ (.dta)        ┆          ┆                 ┆           ┆   ┆            ┆                 ┆              ┆             │
│ Stata dataset ┆ 2067840  ┆ Survey Datasets ┆ 32        ┆ … ┆ 1990       ┆ NG              ┆ NGKR21DT.ZIP ┆ Nigeria     │
│ (.dta)        ┆          ┆                 ┆           ┆   ┆            ┆                 ┆              ┆             │
└───────────────┴──────────┴─────────────────┴───────────┴───┴────────────┴─────────────────┴──────────────┴─────────────┘
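
The values in the FileName column appear to follow a fixed eight-character layout: country code, dataset type, a two-character version code, and the file format code. The sketch below splits a name into those parts, which can help when choosing which files to request. The interpretation of each component is an assumption based on the names in the sample output, and `parse_dataset_name` is a hypothetical helper, not part of pdhs.

```python
import re
from typing import NamedTuple


class DatasetName(NamedTuple):
    country: str  # two-letter DHS country code, e.g. "NG"
    kind: str     # two-letter dataset type, e.g. "HW"
    version: str  # two-character version code, e.g. "21"
    fmt: str      # two-letter file format code, e.g. "DT"


# Assumed layout: CCTTVVFF.zip (country, type, version, format).
_PATTERN = re.compile(r"^([A-Z]{2})([A-Z]{2})([0-9A-Z]{2})([A-Z]{2})\.ZIP$")


def parse_dataset_name(file_name: str) -> DatasetName:
    """Split a dataset file name into its components (illustrative only)."""
    # The listing mixes upper and lower case, so normalize before matching.
    match = _PATTERN.match(file_name.upper())
    if match is None:
        raise ValueError(f"Unrecognized dataset file name: {file_name!r}")
    return DatasetName(*match.groups())


for name in ["NGBR21dt.zip", "NGHW21DT.ZIP"]:
    print(parse_dataset_name(name))
```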

Once access has been granted, create a DHSDownloader() instance and pass the list of dataset file names you want to its .download_all_datasets() method.

import os
import asyncio
from dotenv import load_dotenv
from pdhs.download import DHSDownloader

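load_dotenv()  # read variables from a local .env file

# Read the password from the environment instead of hard-coding it
dhs_password = os.getenv("DHS_PASSWORD")

downloader = DHSDownloader(
    email="<YOUR-DHS-EMAIL>",
    password=dhs_password,
    download_path="my_files",
    project_name="Rural and Urban",
    dataframe=df
)

# File names taken from the FileName column of df
dataset_ids = ['NGHW21DT.ZIP', 'NGBR21dt.zip', 'NGKR21DT.ZIP']

# Top-level await works in a notebook; in a script, wrap the call in asyncio.run()
await downloader.download_all_datasets(dataset_ids)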

Tips:

  • Use .env variables to store credentials securely.
  • Change the download_path argument to set your preferred download folder.
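
As one way to follow the first tip, the sketch below gathers all three credentials from environment variables and fails early if any is missing. `DHS_PASSWORD` is the variable name used in this README; `DHS_EMAIL` and `DHS_PROJECT_NAME` are hypothetical names chosen for illustration, and `read_dhs_credentials` is not part of pdhs.

```python
import os


def read_dhs_credentials(env=os.environ):
    """Collect DHS credentials from environment variables.

    DHS_PASSWORD is the name used in this README; DHS_EMAIL and
    DHS_PROJECT_NAME are hypothetical names chosen for this sketch.
    """
    names = {
        "email": "DHS_EMAIL",
        "password": "DHS_PASSWORD",
        "project_name": "DHS_PROJECT_NAME",
    }
    # Report every missing variable at once rather than one at a time.
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    return {key: env[var] for key, var in names.items()}


# Demonstration with an in-memory mapping instead of the real environment.
example_env = {
    "DHS_EMAIL": "user@example.org",
    "DHS_PASSWORD": "not-a-real-password",
    "DHS_PROJECT_NAME": "Rural and Urban",
}
credentials = read_dhs_credentials(example_env)
print(sorted(credentials))
```

The dictionary keys match the email, password, and project_name keyword arguments of DHSDownloader(), so the result could be unpacked into the constructor alongside download_path and dataframe.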

Note:

  • The DHSDownloader() class takes a dataframe argument, which is the dataframe returned by GetDatasets().get_data(). Pass the list of dataset file names you want to the .download_all_datasets() method to download them.
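
The bare `await` in the download example works in a notebook or any other environment that already has a running event loop. In a plain script the coroutine has to be driven explicitly, for example with asyncio.run(). A minimal sketch follows, using a stand-in class so that it runs without DHS credentials. `StubDownloader` is hypothetical and only mimics the awaitable .download_all_datasets() method.

```python
import asyncio


class StubDownloader:
    """Hypothetical stand-in for DHSDownloader; it only records the request."""

    def __init__(self):
        self.downloaded = []

    async def download_all_datasets(self, dataset_ids):
        for dataset_id in dataset_ids:
            await asyncio.sleep(0)  # placeholder for the real network work
            self.downloaded.append(dataset_id)


async def main(downloader, dataset_ids):
    # Same call as in the download example, wrapped in a coroutine.
    await downloader.download_all_datasets(dataset_ids)


stub = StubDownloader()
# asyncio.run() creates an event loop, runs main() to completion, and closes the loop.
asyncio.run(main(stub, ["NGHW21DT.ZIP", "NGBR21dt.zip", "NGKR21DT.ZIP"]))
print(stub.downloaded)
```

With the real class, the same main() coroutine would receive the DHSDownloader instance in place of the stub.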

Load Downloaded Data

After downloading, load a dataset into memory as a Polars DataFrame:

dataset_id = 'NGHW21DT.ZIP'  # Example ZIP dataset
df_loaded = downloader.load_dataset_as_dataframe(dataset_id)

Sample Output

Downloading dataset: NGHW21DT.ZIP
Country Name: Nigeria
Country Code: NG
Survey ID: 32
File downloaded successfully and saved to my_files/NGHW21DT.ZIP
Downloading dataset: NGBR21dt.zip
Country Name: Nigeria
Country Code: NG
Survey ID: 32
File downloaded successfully and saved to my_files/NGBR21dt.zip
Downloading dataset: NGKR21DT.ZIP
Country Name: Nigeria
Country Code: NG
Survey ID: 32
File downloaded successfully and saved to my_files/NGKR21DT.ZIP
Extracted NGHW21DT.ZIP to my_files
Selected file for loading: my_files/NGHW21FL.DTA
Dataset NGHW21DT.ZIP loaded successfully.
shape: (5, 7)
┌─────────────────┬────────┬─────────┬──────┬──────┬──────┬──────┐
│ hwcaseid        ┆ hwline ┆ hwlevel ┆ hc70 ┆ hc71 ┆ hc72 ┆ hc73 │
│ ---             ┆ ---    ┆ ---     ┆ ---  ┆ ---  ┆ ---  ┆ ---  │
│ str             ┆ i64    ┆ i64     ┆ i64  ┆ i64  ┆ i64  ┆ i64  │
╞═════════════════╪════════╪═════════╪══════╪══════╪══════╪══════╡
│       101 11  2 ┆ 1      ┆ 2       ┆ -74  ┆ -10  ┆ 34   ┆ 47   │
│       101 11  2 ┆ 2      ┆ 2       ┆ -67  ┆ 6    ┆ 58   ┆ 68   │
│       101 19  2 ┆ 1      ┆ 2       ┆ null ┆ null ┆ null ┆ null │
│       101 19  2 ┆ 2      ┆ 2       ┆ null ┆ null ┆ null ┆ null │
│       101 39  2 ┆ 1      ┆ 2       ┆ -258 ┆ -138 ┆ 58   ┆ 20   │
└─────────────────┴────────┴─────────┴──────┴──────┴──────┴──────┘
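
The log lines above suggest two steps before loading: the archive is extracted into the download folder, and one data file inside it is selected. The sketch below illustrates that idea with the standard library. It is not the pdhs implementation; the set of recognized extensions and the `extract_and_select` helper are assumptions made for illustration.

```python
import tempfile
import zipfile
from pathlib import Path

# Extensions treated as loadable data files in this sketch (an assumption).
DATA_SUFFIXES = {".dta", ".sav", ".dat"}


def extract_and_select(zip_path: Path, target_dir: Path) -> Path:
    """Extract a ZIP archive and return the first data file found inside it."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target_dir)
        members = archive.namelist()
    # Keep only members with a data-file extension, compared case-insensitively.
    candidates = sorted(
        target_dir / name for name in members
        if Path(name).suffix.lower() in DATA_SUFFIXES
    )
    if not candidates:
        raise FileNotFoundError(f"No data file found in {zip_path.name}")
    return candidates[0]


# Demonstration on a throwaway archive with placeholder contents.
with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    zip_path = tmp_dir / "NGHW21DT.ZIP"
    with zipfile.ZipFile(zip_path, "w") as archive:
        archive.writestr("NGHW21FL.DTA", b"placeholder")
        archive.writestr("NGHW21FL.DO", b"placeholder")
    selected = extract_and_select(zip_path, tmp_dir / "my_files")
    selected_name = selected.name
    print(selected_name)
```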

Convert to Pandas (Optional)

By default, pdhs returns data as Polars DataFrames for performance. You can easily convert to Pandas:

import pandas as pd

df = df_loaded.to_pandas()
df.head()

Sample Output

   hwcaseid  hwline  hwlevel    hc70    hc71  hc72  hc73
0  101 11 2       1        2   -74.0   -10.0  34.0  47.0
1  101 11 2       2        2   -67.0     6.0  58.0  68.0
2  101 19 2       1        2     NaN     NaN   NaN   NaN
3  101 19 2       2        2     NaN     NaN   NaN   NaN
4  101 39 2       1        2  -258.0  -138.0  58.0  20.0
