Python library for accessing epidemiological datasets worldwide

Epidatasets

Python 3.10+ PyPI License: MIT Code style: Black

Sponsored by
Kwar-AI
AI-powered epidemiological intelligence


A Python library providing unified access to 21 epidemiological data sources from around the world, with a plugin registry, CLI, and optional extras for specialized data.

🎯 Overview

epidatasets provides:

  • Unified interface: A single get_source() API to access 21 data sources worldwide
  • Plugin registry: Sources are discovered at runtime via entry_points, making it easy to extend
  • Optional extras: Install only the dependencies you need (pip install epidatasets[who,brazil])
  • CLI: Command-line tool for listing sources, inspecting metadata, and querying countries
  • Caching & rate limiting: Built-in utilities for responsible API usage
  • Reproducible research: Standardized access to heterogeneous epidemiological datasets
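
As a rough illustration of what caching and rate limiting mean for API access, the sketch below wraps a fetch function with an in-memory cache and a minimum delay between uncached calls. It is not the library's implementation: the names `polite` and `fetch_indicator` are hypothetical, and the fetch function is a stand-in that makes no network request.

```python
import time
from functools import wraps


def polite(min_interval: float):
    """Decorator: cache results by arguments and space out uncached calls."""
    def decorator(func):
        cache = {}
        last_call = [float("-inf")]  # monotonic time of the last real call

        @wraps(func)
        def wrapper(*args):
            if args in cache:  # cache hit: no request, no waiting
                return cache[args]
            wait = min_interval - (time.monotonic() - last_call[0])
            if wait > 0:  # too soon after the previous real call
                time.sleep(wait)
            last_call[0] = time.monotonic()
            cache[args] = func(*args)
            return cache[args]

        return wrapper
    return decorator


calls = []  # records every real (uncached) call


@polite(min_interval=0.05)
def fetch_indicator(code: str, year: int) -> dict:
    # Hypothetical stand-in for a remote API request.
    calls.append((code, year))
    return {"indicator": code, "year": year}


first = fetch_indicator("MALARIA_EST_INCIDENCE", 2022)
again = fetch_indicator("MALARIA_EST_INCIDENCE", 2022)  # served from the cache
other = fetch_indicator("MALARIA_EST_INCIDENCE", 2021)  # waits, then fetches
print(f"{len(calls)} real calls for 3 requests")
```

Keeping the cache in front of the delay means repeated queries cost nothing, while new queries are spaced out regardless of how quickly the caller loops.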

📦 Installation

From PyPI

pip install epidatasets

With optional extras

# Quote the requirement so shells such as zsh do not expand the brackets.

# WHO Global Health Observatory data
pip install "epidatasets[who]"

# Brazilian DATASUS/SINAN data via PySUS
pip install "epidatasets[brazil]"

# Eurostat EU health statistics
pip install "epidatasets[eurostat]"

# Climate/environmental data (Copernicus CDS)
pip install "epidatasets[climate]"

# Geospatial visualization
pip install "epidatasets[geo]"

# Plotting & visualization
pip install "epidatasets[viz]"

# Genomic data (Pathoplexus)
pip install "epidatasets[genomics]"

# CLI support
pip install "epidatasets[cli]"

# World Bank indicators
pip install "epidatasets[worldbank]"

# Install everything
pip install "epidatasets[all]"

Development installation

git clone https://github.com/fccoelho/epidemiological-datasets.git
cd epidemiological-datasets
pip install -e ".[dev,docs]"

🚀 Quick Start

from epidatasets import get_source, list_sources

# Discover available sources
sources = list_sources()
for name, meta in sorted(sources.items()):
    print(f"{name}: {meta['description']}")

# Get a specific source
paho = get_source("paho")
countries = paho.list_countries()
print(f"PAHO covers {len(countries)} countries")

# Get WHO data (requires: pip install epidatasets[who])
who = get_source("who")
malaria = who.get_indicator(
    indicator="MALARIA_EST_INCIDENCE",
    years=[2020, 2021, 2022],
    countries=["BRA", "IND", "NGA"]
)

# Get OWID COVID-19 data
owid = get_source("owid")
covid = owid.get_covid_data(
    countries=["BRA", "USA", "IND"],
    metrics=["cases", "deaths"]
)

Repository Structure

epidemiological-datasets/
├── src/epidatasets/           # Main Python package
│   ├── __init__.py            # Public API (get_source, list_sources)
│   ├── _base.py               # BaseAccessor ABC
│   ├── _registry.py           # Plugin registry (entry_points)
│   ├── cli.py                 # CLI (typer)
│   ├── sources/               # 21 data source accessors
│   │   ├── __init__.py
│   │   ├── africa_cdc.py
│   │   ├── cdc_opendata.py
│   │   ├── china_cdc.py
│   │   ├── colombia_ins.py
│   │   ├── copernicus_cds.py
│   │   ├── datasus_pysus.py
│   │   ├── ecdc_opendata.py
│   │   ├── epipulse.py
│   │   ├── eurostat.py
│   │   ├── global_health.py
│   │   ├── healthdata_gov.py
│   │   ├── india_idsp.py
│   │   ├── infodengue_api.py
│   │   ├── malaria_atlas.py
│   │   ├── owid.py
│   │   ├── paho.py
│   │   ├── pathoplexus.py
│   │   ├── respicast.py
│   │   ├── rki_germany.py
│   │   ├── ukhsa.py
│   │   └── who_ghoclient.py
│   └── utils/                 # Utilities
│       ├── cache.py           # Caching layer
│       ├── rate_limit.py      # API rate limiting
│       ├── geo.py             # Geospatial helpers
│       ├── validation.py      # Data validation
│       └── io.py              # I/O utilities
├── tests/                     # Test suite
│   ├── sources/
│   ├── utils/
│   ├── conftest.py
│   └── ...
├── docs/                      # MkDocs documentation
│   ├── mkdocs.yml
│   └── docs/
│       ├── index.md
│       ├── installation.md
│       ├── quickstart.md
│       ├── sources/           # Per-source API docs (21 pages)
│       ├── api/               # API reference
│       │   ├── base.md
│       │   ├── registry.md
│       │   ├── cli.md
│       │   └── utils.md
│       └── examples/          # Jupyter notebooks
├── mkdocs.yml                 # Docs config
├── .readthedocs.yaml          # ReadTheDocs config
├── pyproject.toml             # Package configuration
└── README.md

Available Datasets

Global

| Dataset | Description | Update Frequency | Access Level | Module |
|---|---|---|---|---|
| WHO Global Health Observatory | Health indicators by country | Annual | Open | epidatasets.sources.who_ghoclient |
| Our World in Data - Health | COVID-19, vaccination, excess mortality | Daily/Weekly | Open | epidatasets.sources.owid |
| Global Health Data Exchange (GHDx) | Catalog of health datasets | Varies | Varies | Catalog only |
| HDX (Humanitarian Data Exchange) | Health in crisis contexts | Real-time | Open | Planned |
| Global.health | Pandemic linelist data | Varies | Open | epidatasets.sources.global_health |
| Malaria Atlas Project | Malaria prevalence & vector data | Annual | Open | epidatasets.sources.malaria_atlas |
| Copernicus Climate Data Store | Environmental & climate data | Varies | Open | epidatasets.sources.copernicus_cds |
| Pathoplexus | Pathogen genomic data | Continuous | Open | epidatasets.sources.pathoplexus |
| InfoDengue | Dengue surveillance (Brazil) | Weekly | Open | epidatasets.sources.infodengue_api |

North America 🇺🇸🇨🇦🇲🇽

| Dataset | Description | Update Frequency | Access Level | Module |
|---|---|---|---|---|
| CDC Open Data | CDC datasets portal (COVID-19, Influenza, NNDSS, CDI) | Varies | Open | epidatasets.sources.cdc_opendata |
| HealthData.gov | US health system data | Weekly | Open | epidatasets.sources.healthdata_gov |
| Statistics Canada - Health | Canadian health data | Quarterly | Open | Planned |

South America 🌎

| Dataset | Description | Update Frequency | Access Level | Module |
|---|---|---|---|---|
| SINAN / DATASUS - Brazil | Brazilian notifiable diseases & health system data | Weekly | Open* | epidatasets.sources.datasus_pysus |
| PAHO/WHO Regional Data | Pan-American health data | Monthly | Open | epidatasets.sources.paho |
| Chile DEIS | Chilean health statistics | Monthly | Open | Planned |
| Colombia INS | Colombian public health data (SIVIGILA) | Weekly | Open | epidatasets.sources.colombia_ins |

*Note: DATASUS access requires pip install epidatasets[brazil] (installs PySUS).

Europe 🇪🇺

| Dataset | Description | Update Frequency | Access Level | Module |
|---|---|---|---|---|
| ECDC EpiPulse | European surveillance portal (53 countries, 50+ diseases) | Daily/Weekly | Registration | epidatasets.sources.epipulse |
| ECDC Open Data | Infectious disease surveillance (50+ diseases, 30 countries) | Weekly | Open | epidatasets.sources.ecdc_opendata |
| ECDC RespiCast | Respiratory disease forecasting hub | Weekly | Open | epidatasets.sources.respicast |
| Eurostat Health | EU health statistics | Annual | Open | epidatasets.sources.eurostat |
| UK Health Security Agency | UK health data | Weekly | Open | epidatasets.sources.ukhsa |
| Robert Koch Institute | German surveillance data | Weekly | Open | epidatasets.sources.rki_germany |

Africa

| Dataset | Description | Update Frequency | Access Level | Module |
|---|---|---|---|---|
| WHO Afro Health Observatory | African region health data | Annual | Open | epidatasets.sources.who_ghoclient |
| Africa CDC | African public health data (55 AU member states) | Weekly | Open | epidatasets.sources.africa_cdc |

Asia

| Dataset | Description | Update Frequency | Access Level | Module |
|---|---|---|---|---|
| China CDC Weekly | Chinese surveillance data | Weekly | Open | epidatasets.sources.china_cdc |
| IDSP India | Indian disease surveillance | Weekly | Open* | epidatasets.sources.india_idsp |
| NIID Japan | Japanese infectious disease data | Weekly | Open | Planned |
| Korea CDC | Korean disease control data | Weekly | Open | Planned |

Oceania 🇦🇺🇳🇿

| Dataset | Description | Update Frequency | Access Level | Module |
|---|---|---|---|---|
| Australian Institute of Health and Welfare | Australian health data | Annual | Open | Planned |
| NZ Ministry of Health | New Zealand health statistics | Annual | Open | Planned |

💻 CLI Usage

The epidatasets CLI provides quick access from the terminal (requires pip install epidatasets[cli]):

# List all available data sources
epidatasets sources

# Show detailed info about a source
epidatasets info who

# List countries covered by a source
epidatasets countries paho

💡 Usage Examples

Example 1: WHO Global Health Data

from epidatasets import get_source

who = get_source("who")

# Get malaria incidence data
data = who.get_indicator(
    indicator="MALARIA_EST_INCIDENCE",
    years=[2020, 2021, 2022],
    countries=["BRA", "IND", "NGA"]
)
print(data.head())

Example 2: PAHO Pan-American Health Data

from epidatasets import get_source

paho = get_source("paho")

# List member countries
countries = paho.list_countries()
print(f"Total countries: {len(countries)}")

# Get immunization coverage
coverage = paho.get_immunization_coverage(
    vaccines=['DTP3', 'MCV1'],
    subregion='Southern Cone',
    years=[2020, 2021, 2022]
)

# Compare health indicators
comparison = paho.compare_countries(
    indicator='LIFE_EXPECTANCY',
    countries=['BRA', 'MEX', 'ARG', 'COL'],
    years=[2019, 2020, 2021]
)

Example 3: Eurostat EU Health Statistics

from epidatasets import get_source

eurostat = get_source("eurostat")

# Healthcare expenditure
expenditure = eurostat.get_healthcare_expenditure(
    countries=['DEU', 'FRA', 'ITA'],
    years=list(range(2015, 2024))
)

# Mortality data by cause
mortality = eurostat.get_mortality_data(
    cause_code='COVID-19',
    countries=['DEU', 'FRA', 'ITA'],
    years=[2020, 2021, 2022]
)

# Life expectancy comparison
life_exp = eurostat.get_life_expectancy(
    countries=['DEU', 'FRA', 'ITA', 'ESP'],
    years=[2019, 2020, 2021]
)

Example 4: Our World in Data

from epidatasets import get_source

owid = get_source("owid")

# COVID-19 data for specific countries
covid = owid.get_covid_data(
    countries=['BRA', 'USA', 'IND'],
    metrics=['cases', 'deaths', 'hospitalizations'],
    start_date='2021-01-01',
    end_date='2021-12-31'
)

# Excess mortality estimates
excess = owid.get_excess_mortality(
    countries=['GBR', 'ITA', 'USA'],
    start_date='2020-03-01'
)

# Global summary
summary = owid.get_global_summary()

Example 5: Brazil DATASUS via PySUS

from epidatasets import get_source

datasus = get_source("datasus")

# Access Brazilian notifiable disease data
dengue = datasus.download(
    disease="Dengue",
    years=[2022, 2023],
    states=["RJ", "SP", "MG"]
)

Example 6: Africa CDC Data

from epidatasets import get_source

africa_cdc = get_source("africa_cdc")

# List all 55 African Union member states
countries = africa_cdc.list_countries()

# Get disease outbreaks
ebola = africa_cdc.get_disease_outbreaks(
    disease='EBOLA',
    countries=['CD', 'UG', 'GN']
)

# Vaccination coverage
vax = africa_cdc.get_vaccination_coverage(
    countries=['NG', 'ET', 'ZA'],
    vaccines=['COVID-19', 'Measles']
)

Example 7: RKI Germany Surveillance

from epidatasets import get_source

rki = get_source("rki")

# COVID-19 nowcasting with R estimates
nowcast = rki.get_covid_nowcast(
    date_range=('2022-01-01', '2022-06-30')
)

# Influenza surveillance
flu = rki.get_influenza_data(seasons=['2022/23', '2023/24'])

Example 8: Multi-source Comparison

from epidatasets import get_source, list_sources

# See all available sources
print(list_sources().keys())

# Compare data across sources
who = get_source("who")
owid = get_source("owid")

who_malaria = who.get_indicator(
    indicator="MALARIA_EST_INCIDENCE",
    years=[2022],
    countries=["BRA"]
)

owid_covid = owid.get_covid_data(
    countries=["BRA"],
    metrics=["cases", "deaths"],
    start_date='2022-01-01',
    end_date='2022-12-31'
)
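
The examples in this section pass ISO 3166-1 alpha-3 codes (BRA, IND) to most sources and alpha-2 codes (CD, NG) to Africa CDC. When combining results across sources, a small lookup can normalize codes first. The sketch below covers only the countries used in Example 6; the helper `to_alpha3` is hypothetical, and which form a given accessor accepts or returns is not specified here.

```python
# ISO 3166-1 alpha-2 -> alpha-3 for the countries used in the Africa CDC example.
ALPHA2_TO_ALPHA3 = {
    "CD": "COD",  # Democratic Republic of the Congo
    "UG": "UGA",  # Uganda
    "GN": "GIN",  # Guinea
    "NG": "NGA",  # Nigeria
    "ET": "ETH",  # Ethiopia
    "ZA": "ZAF",  # South Africa
}


def to_alpha3(code: str) -> str:
    """Return the alpha-3 form of a country code; alpha-3 input passes through."""
    code = code.strip().upper()
    if len(code) == 3:
        return code
    try:
        return ALPHA2_TO_ALPHA3[code]
    except KeyError:
        raise ValueError(f"no alpha-3 mapping for {code!r}") from None


normalized = [to_alpha3(c) for c in ["CD", "ug", "NGA"]]
print(normalized)
```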

📊 Available Sources

| Source Name | Class | Extra | Description |
|---|---|---|---|
| africa_cdc | AfricaCDCAccessor | - | Africa CDC public health data (55 AU states) |
| cdc_opendata | CDCOpenDataAccessor | - | US CDC Open Data portal |
| china_cdc | ChinaCDCAccessor | - | China CDC Weekly surveillance |
| colombia_ins | ColombiaINSAccessor | - | Colombia INS/SIVIGILA surveillance |
| copernicus_cds | CopernicusCDSAccessor | [climate] | Copernicus Climate Data Store |
| datasus | DataSUSAccessor | [brazil] | Brazilian DATASUS/SINAN (via PySUS) |
| ecdc | ECDCOpenDataAccessor | - | ECDC infectious disease data |
| epipulse | EpiPulseAccessor | - | ECDC EpiPulse surveillance portal |
| eurostat | EurostatAccessor | [eurostat] | EU health statistics |
| global_health | GlobalHealthAccessor | - | Global.health pandemic linelist data |
| healthdata_gov | HealthDataGovAccessor | - | US HealthData.gov |
| india_idsp | IndiaIDSPAccessor | - | India IDSP disease surveillance |
| infodengue | InfoDengueAPI | - | InfoDengue dengue surveillance (Brazil) |
| malaria_atlas | MalariaAtlasAccessor | - | Malaria Atlas Project data |
| owid | OWIDAccessor | - | Our World in Data (COVID-19, vaccination) |
| paho | PAHOAccessor | - | PAHO Pan-American health data |
| pathoplexus | PathoplexusAccessor | [genomics] | Pathoplexus pathogen genomic data |
| respicast | RespiCastAccessor | - | ECDC respiratory disease forecasting |
| rki | RKIGermanyAccessor | - | Robert Koch Institute (Germany) |
| ukhsa | UKHSAAccessor | - | UK Health Security Agency |
| who | WHOAccessor | [who] | WHO Global Health Observatory |

FAQ

What is epidatasets?

A Python library providing a unified interface to 21 epidemiological data sources worldwide, installable via pip install epidatasets.

Do I need to install all optional dependencies?

No. The base install covers most sources. Only install extras for sources that need them (e.g., pip install epidatasets[who] for WHO GHO data, pip install epidatasets[brazil] for DATASUS).

How do I discover available sources?

from epidatasets import list_sources
print(list_sources())

Or from the CLI: epidatasets sources

Are all dataset accessors fully implemented?

Most accessors provide working data retrieval. Some are structured placeholders for sources that require registration or have limited public APIs. Check the documentation for each source's status.

Can I contribute a new data source?

Yes! Sources are registered via entry_points in pyproject.toml. See CONTRIBUTING.md for guidelines on adding new accessors.
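
To give a feel for what a new accessor might involve, here is a self-contained sketch. The abstract class below is a stand-in written for this example, not the library's BaseAccessor; the class `ExampleAccessor`, its attributes, and the entry-point group shown in the comment are all hypothetical. CONTRIBUTING.md remains the reference for the actual requirements.

```python
from abc import ABC, abstractmethod


class AccessorSketch(ABC):
    """Stand-in base class; the real BaseAccessor interface may differ."""

    source_name: str
    description: str

    @abstractmethod
    def list_countries(self) -> list[str]:
        """Return the country codes this source covers."""


class ExampleAccessor(AccessorSketch):
    source_name = "example"
    description = "Hypothetical surveillance source"

    def list_countries(self) -> list[str]:
        # A real accessor would query the provider; this returns fixed data.
        return ["BRA", "COL"]


# Registration would go in pyproject.toml, along these lines (group name assumed):
#   [project.entry-points."epidatasets.sources"]
#   example = "my_package.example:ExampleAccessor"

accessor = ExampleAccessor()
print(accessor.source_name, accessor.list_countries())
```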

Contributing

We welcome contributions! Please see CONTRIBUTING.md for detailed guidelines.

Priority Contributions

  1. New data source accessors - Especially from underrepresented regions
  2. Example notebooks - Jupyter notebooks demonstrating data analysis
  3. Documentation - Translations, improvements, and API docs
  4. Bug fixes - Check the issue tracker

📚 Related Projects

| Project | Description | Repository |
|---|---|---|
| PySUS | Brazilian health data (DATASUS) | AlertaDengue/PySUS |
| ghoclient | WHO Global Health Observatory | fccoelho/ghoclient |
| epigrass | Epidemic simulation | EpiGrass/epigrass |
| epimodels | Mathematical epidemiology | fccoelho/epimodels |

📊 Statistics

  • Data sources: 21 registered (via plugin registry)
  • Countries covered: 100+
  • Optional extras: 10 (who, brazil, eurostat, climate, geo, viz, genomics, cli, worldbank, search)
  • Example notebooks: 20+
  • Documentation: epidatasets.readthedocs.io

📚 Citation

If you use this package in your research, please cite:

@misc{fccoelho_epidatasets,
  author = {Coelho, Flávio Codeço},
  title = {Epidatasets: Python Access to Epidemiological Datasets Worldwide},
  year = {2026},
  publisher = {GitHub},
  journal = {GitHub Repository},
  howpublished = {\url{https://github.com/fccoelho/epidemiological-datasets}}
}

For PySUS:

@software{pysus,
  author = {AlertaDengue Team},
  title = {PySUS: Tools for Brazilian Public Health Data},
  url = {https://github.com/AlertaDengue/PySUS}
}

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

💜 Sponsor

This project is sponsored by

Kwar-AI

Kwar-AI: Intelligence for Epidemiology

AI-powered solutions for disease surveillance and outbreak prediction


Acknowledgments

  • PySUS Contributors - For making Brazilian health data accessible
  • WHO - For maintaining the Global Health Observatory
  • All data providers who make epidemiological data openly accessible
  • Global public health community

Made with ❤️ for the epidemiological research community


Disclaimer: This repository is a community effort to catalog open data sources. Please always refer to the original data providers for official statistics and verify data usage terms. The maintainers are not responsible for data quality or availability.
