Python library for accessing epidemiological datasets worldwide
Epidatasets
Sponsored by Kwar-AI: AI-powered epidemiological intelligence
A Python library providing unified access to 21 epidemiological data sources from around the world, with a plugin registry, CLI, and optional extras for specialized data.
Table of Contents
- Overview
- Installation
- Quick Start
- Repository Structure
- Available Datasets
- CLI Usage
- Usage Examples
- Available Sources
- FAQ
- Contributing
- Related Projects
- Citation
- License
Overview
epidatasets provides:
- Unified interface: A single `get_source()` API to access 21 data sources worldwide
- Plugin registry: Sources are discovered at runtime via `entry_points`, making it easy to extend
- Optional extras: Install only the dependencies you need (`pip install epidatasets[who,brazil]`)
- CLI: Command-line tool for listing sources, inspecting metadata, and querying countries
- Caching & rate limiting: Built-in utilities for responsible API usage
- Reproducible research: Standardized access to heterogeneous epidemiological datasets
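The caching and rate-limiting idea can be illustrated with a small standalone sketch. It does not use the library's own `utils/cache.py` or `utils/rate_limit.py` modules, whose APIs are not shown in this README; `RateLimiter`, `make_cached_fetcher`, and `fake_fetch` are hypothetical names chosen for illustration only.

```python
import time
from typing import Callable, Dict, List, Optional


class RateLimiter:
    """Allow at most one call every `min_interval` seconds (hypothetical helper)."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._last_call: Optional[float] = None  # monotonic time of the previous call

    def wait(self) -> None:
        """Sleep just long enough to respect the minimum interval."""
        if self._last_call is not None:
            remaining = self.min_interval - (time.monotonic() - self._last_call)
            if remaining > 0:
                time.sleep(remaining)
        self._last_call = time.monotonic()


def make_cached_fetcher(
    fetch: Callable[[str], str], limiter: RateLimiter
) -> Callable[[str], str]:
    """Wrap `fetch` so repeated keys come from memory and new keys are throttled."""
    cache: Dict[str, str] = {}

    def cached_fetch(key: str) -> str:
        if key not in cache:
            limiter.wait()  # only real requests consume the rate budget
            cache[key] = fetch(key)
        return cache[key]

    return cached_fetch


# Stand-in for a remote request; a real accessor would call an HTTP API here.
calls: List[str] = []


def fake_fetch(key: str) -> str:
    calls.append(key)
    return f"payload for {key}"


fetch = make_cached_fetcher(fake_fetch, RateLimiter(min_interval=0.05))
fetch("BRA")
fetch("BRA")  # served from the cache, no second request
fetch("IND")
print(calls)
```

Placing the limiter inside the cache-miss branch means cached lookups never wait, so only requests that would actually reach a remote server are slowed down.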
Installation
From PyPI
pip install epidatasets
With optional extras
# WHO Global Health Observatory data
pip install epidatasets[who]
# Brazilian DATASUS/SINAN data via PySUS
pip install epidatasets[brazil]
# Eurostat EU health statistics
pip install epidatasets[eurostat]
# Climate/environmental data (Copernicus CDS)
pip install epidatasets[climate]
# Geospatial visualization
pip install epidatasets[geo]
# Plotting & visualization
pip install epidatasets[viz]
# Genomic data (Pathoplexus)
pip install epidatasets[genomics]
# CLI support
pip install epidatasets[cli]
# World Bank indicators
pip install epidatasets[worldbank]
# Install everything
pip install epidatasets[all]
Development installation
git clone https://github.com/fccoelho/epidemiological-datasets.git
cd epidemiological-datasets
pip install -e ".[dev,docs]"
Quick Start
from epidatasets import get_source, list_sources

# Discover available sources
sources = list_sources()
for name, meta in sorted(sources.items()):
    print(f"{name}: {meta['description']}")

# Get a specific source
paho = get_source("paho")
countries = paho.list_countries()
print(f"PAHO covers {len(countries)} countries")

# Get WHO data (requires: pip install epidatasets[who])
who = get_source("who")
malaria = who.get_indicator(
    indicator="MALARIA_EST_INCIDENCE",
    years=[2020, 2021, 2022],
    countries=["BRA", "IND", "NGA"]
)

# Get OWID COVID-19 data
owid = get_source("owid")
covid = owid.get_covid_data(
    countries=["BRA", "USA", "IND"],
    metrics=["cases", "deaths"]
)
Repository Structure
epidemiological-datasets/
├── src/epidatasets/            # Main Python package
│   ├── __init__.py             # Public API (get_source, list_sources)
│   ├── _base.py                # BaseAccessor ABC
│   ├── _registry.py            # Plugin registry (entry_points)
│   ├── cli.py                  # CLI (typer)
│   ├── sources/                # 21 data source accessors
│   │   ├── __init__.py
│   │   ├── africa_cdc.py
│   │   ├── cdc_opendata.py
│   │   ├── china_cdc.py
│   │   ├── colombia_ins.py
│   │   ├── copernicus_cds.py
│   │   ├── datasus_pysus.py
│   │   ├── ecdc_opendata.py
│   │   ├── epipulse.py
│   │   ├── eurostat.py
│   │   ├── global_health.py
│   │   ├── healthdata_gov.py
│   │   ├── india_idsp.py
│   │   ├── infodengue_api.py
│   │   ├── malaria_atlas.py
│   │   ├── owid.py
│   │   ├── paho.py
│   │   ├── pathoplexus.py
│   │   ├── respicast.py
│   │   ├── rki_germany.py
│   │   ├── ukhsa.py
│   │   └── who_ghoclient.py
│   └── utils/                  # Utilities
│       ├── cache.py            # Caching layer
│       ├── rate_limit.py       # API rate limiting
│       ├── geo.py              # Geospatial helpers
│       ├── validation.py       # Data validation
│       └── io.py               # I/O utilities
├── tests/                      # Test suite
│   ├── sources/
│   ├── utils/
│   ├── conftest.py
│   └── ...
├── docs/                       # MkDocs documentation
│   ├── mkdocs.yml
│   └── docs/
│       ├── index.md
│       ├── installation.md
│       ├── quickstart.md
│       ├── sources/            # Per-source API docs (21 pages)
│       ├── api/                # API reference
│       │   ├── base.md
│       │   ├── registry.md
│       │   ├── cli.md
│       │   └── utils.md
│       └── examples/           # Jupyter notebooks
├── mkdocs.yml                  # Docs config
├── .readthedocs.yaml           # ReadTheDocs config
├── pyproject.toml              # Package configuration
└── README.md
Available Datasets
Global
| Dataset | Description | Update Frequency | Access Level | Module |
|---|---|---|---|---|
| WHO Global Health Observatory | Health indicators by country | Annual | Open | epidatasets.sources.who_ghoclient |
| Our World in Data - Health | COVID-19, vaccination, excess mortality | Daily/Weekly | Open | epidatasets.sources.owid |
| Global Health Data Exchange (GHDx) | Catalog of health datasets | Varies | Varies | Catalog only |
| HDX (Humanitarian Data Exchange) | Health in crisis contexts | Real-time | Open | Planned |
| Global.health | Pandemic linelist data | Varies | Open | epidatasets.sources.global_health |
| Malaria Atlas Project | Malaria prevalence & vector data | Annual | Open | epidatasets.sources.malaria_atlas |
| Copernicus Climate Data Store | Environmental & climate data | Varies | Open | epidatasets.sources.copernicus_cds |
| Pathoplexus | Pathogen genomic data | Continuous | Open | epidatasets.sources.pathoplexus |
| InfoDengue | Dengue surveillance (Brazil) | Weekly | Open | epidatasets.sources.infodengue_api |
North America
| Dataset | Description | Update Frequency | Access Level | Module |
|---|---|---|---|---|
| CDC Open Data | CDC datasets portal (COVID-19, Influenza, NNDSS, CDI) | Varies | Open | epidatasets.sources.cdc_opendata |
| HealthData.gov | US health system data | Weekly | Open | epidatasets.sources.healthdata_gov |
| Statistics Canada - Health | Canadian health data | Quarterly | Open | Planned |
South America
| Dataset | Description | Update Frequency | Access Level | Module |
|---|---|---|---|---|
| SINAN / DATASUS - Brazil | Brazilian notifiable diseases & health system data | Weekly | Open* | epidatasets.sources.datasus_pysus |
| PAHO/WHO Regional Data | Pan-American health data | Monthly | Open | epidatasets.sources.paho |
| Chile DEIS | Chilean health statistics | Monthly | Open | Planned |
| Colombia INS | Colombian public health data (SIVIGILA) | Weekly | Open | epidatasets.sources.colombia_ins |
*Note: DATASUS access requires `pip install epidatasets[brazil]` (installs PySUS).
Europe
| Dataset | Description | Update Frequency | Access Level | Module |
|---|---|---|---|---|
| ECDC EpiPulse | European surveillance portal (53 countries, 50+ diseases) | Daily/Weekly | Registration | epidatasets.sources.epipulse |
| ECDC Open Data | Infectious disease surveillance (50+ diseases, 30 countries) | Weekly | Open | epidatasets.sources.ecdc_opendata |
| ECDC RespiCast | Respiratory disease forecasting hub | Weekly | Open | epidatasets.sources.respicast |
| Eurostat Health | EU health statistics | Annual | Open | epidatasets.sources.eurostat |
| UK Health Security Agency | UK health data | Weekly | Open | epidatasets.sources.ukhsa |
| Robert Koch Institute | German surveillance data | Weekly | Open | epidatasets.sources.rki_germany |
Africa
| Dataset | Description | Update Frequency | Access Level | Module |
|---|---|---|---|---|
| WHO Afro Health Observatory | African region health data | Annual | Open | epidatasets.sources.who_ghoclient |
| Africa CDC | African public health data (55 AU member states) | Weekly | Open | epidatasets.sources.africa_cdc |
Asia
| Dataset | Description | Update Frequency | Access Level | Module |
|---|---|---|---|---|
| China CDC Weekly | Chinese surveillance data | Weekly | Open | epidatasets.sources.china_cdc |
| IDSP India | Indian disease surveillance | Weekly | Open* | epidatasets.sources.india_idsp |
| NIID Japan | Japanese infectious disease data | Weekly | Open | Planned |
| Korea CDC | Korean disease control data | Weekly | Open | Planned |
Oceania
| Dataset | Description | Update Frequency | Access Level | Module |
|---|---|---|---|---|
| Australian Institute of Health and Welfare | Australian health data | Annual | Open | Planned |
| NZ Ministry of Health | New Zealand health statistics | Annual | Open | Planned |
CLI Usage
The epidatasets CLI provides quick access from the terminal (requires pip install epidatasets[cli]):
# List all available data sources
epidatasets sources
# Show detailed info about a source
epidatasets info who
# List countries covered by a source
epidatasets countries paho
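The same commands can be driven from a script. The sketch below is only an assumption about how one might wrap the CLI with the standard library; `run_epidatasets` is a hypothetical helper, not part of the package, and it returns `None` when the `epidatasets` executable is not on the PATH.

```python
import shutil
import subprocess
from typing import Optional


def run_epidatasets(*args: str) -> Optional[str]:
    """Run the `epidatasets` CLI and return its stdout, or None if it is not installed."""
    exe = shutil.which("epidatasets")
    if exe is None:
        return None
    result = subprocess.run([exe, *args], capture_output=True, text=True, check=False)
    return result.stdout


# Uses the `sources` subcommand shown above.
output = run_epidatasets("sources")
if output is None:
    print("CLI not found; install it with: pip install 'epidatasets[cli]'")
else:
    print(output)
```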
Usage Examples
Example 1: WHO Global Health Data
from epidatasets import get_source

who = get_source("who")

# Get malaria incidence data
data = who.get_indicator(
    indicator="MALARIA_EST_INCIDENCE",
    years=[2020, 2021, 2022],
    countries=["BRA", "IND", "NGA"]
)
print(data.head())
Example 2: PAHO Pan-American Health Data
from epidatasets import get_source

paho = get_source("paho")

# List member countries
countries = paho.list_countries()
print(f"Total countries: {len(countries)}")

# Get immunization coverage
coverage = paho.get_immunization_coverage(
    vaccines=['DTP3', 'MCV1'],
    subregion='Southern Cone',
    years=[2020, 2021, 2022]
)

# Compare health indicators
comparison = paho.compare_countries(
    indicator='LIFE_EXPECTANCY',
    countries=['BRA', 'MEX', 'ARG', 'COL'],
    years=[2019, 2020, 2021]
)
Example 3: Eurostat EU Health Statistics
from epidatasets import get_source

eurostat = get_source("eurostat")

# Healthcare expenditure
expenditure = eurostat.get_healthcare_expenditure(
    countries=['DEU', 'FRA', 'ITA'],
    years=list(range(2015, 2024))
)

# Mortality data by cause
mortality = eurostat.get_mortality_data(
    cause_code='COVID-19',
    countries=['DEU', 'FRA', 'ITA'],
    years=[2020, 2021, 2022]
)

# Life expectancy comparison
life_exp = eurostat.get_life_expectancy(
    countries=['DEU', 'FRA', 'ITA', 'ESP'],
    years=[2019, 2020, 2021]
)
Example 4: Our World in Data
from epidatasets import get_source

owid = get_source("owid")

# COVID-19 data for specific countries
covid = owid.get_covid_data(
    countries=['BRA', 'USA', 'IND'],
    metrics=['cases', 'deaths', 'hospitalizations'],
    start_date='2021-01-01',
    end_date='2021-12-31'
)

# Excess mortality estimates
excess = owid.get_excess_mortality(
    countries=['GBR', 'ITA', 'USA'],
    start_date='2020-03-01'
)

# Global summary
summary = owid.get_global_summary()
Example 5: Brazil DATASUS via PySUS
from epidatasets import get_source

datasus = get_source("datasus")

# Access Brazilian notifiable disease data
dengue = datasus.download(
    disease="Dengue",
    years=[2022, 2023],
    states=["RJ", "SP", "MG"]
)
Example 6: Africa CDC Data
from epidatasets import get_source

africa_cdc = get_source("africa_cdc")

# List all 55 African Union member states
countries = africa_cdc.list_countries()

# Get disease outbreaks
ebola = africa_cdc.get_disease_outbreaks(
    disease='EBOLA',
    countries=['CD', 'UG', 'GN']
)

# Vaccination coverage
vax = africa_cdc.get_vaccination_coverage(
    countries=['NG', 'ET', 'ZA'],
    vaccines=['COVID-19', 'Measles']
)
Example 7: RKI Germany Surveillance
from epidatasets import get_source

rki = get_source("rki")

# COVID-19 nowcasting with R estimates
nowcast = rki.get_covid_nowcast(
    date_range=('2022-01-01', '2022-06-30')
)

# Influenza surveillance
flu = rki.get_influenza_data(seasons=['2022/23', '2023/24'])
Example 8: Multi-source Comparison
from epidatasets import get_source, list_sources

# See all available sources
print(list_sources().keys())

# Compare data across sources
who = get_source("who")
owid = get_source("owid")

who_malaria = who.get_indicator(
    indicator="MALARIA_EST_INCIDENCE",
    years=[2022],
    countries=["BRA"]
)
owid_covid = owid.get_covid_data(
    countries=["BRA"],
    metrics=["cases", "deaths"],
    start_date='2022-01-01',
    end_date='2022-12-31'
)
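Once two sources have been queried, the results usually need to be aligned on a shared key before they can be compared. The sketch below shows one possible way to do this with plain Python records. The return types and column names of the accessors are not documented in this README, so the `country`, `year`, `malaria_incidence`, and `covid_cases` fields are assumptions, and the numeric values are placeholders rather than real statistics. Daily or weekly series would likely need aggregating to the same time resolution first.

```python
from typing import Dict, Iterable, List, Tuple

Record = Dict[str, object]


def join_on_country_year(left: Iterable[Record], right: Iterable[Record]) -> List[Record]:
    """Inner-join two record lists on the (country, year) pair."""
    index: Dict[Tuple[object, object], Record] = {
        (row["country"], row["year"]): row for row in right
    }
    joined: List[Record] = []
    for row in left:
        key = (row["country"], row["year"])
        if key in index:
            # Values from `left` win if both sides share a field name.
            joined.append({**index[key], **row})
    return joined


# Placeholder rows with hypothetical field names, not real statistics.
malaria_rows = [{"country": "BRA", "year": 2022, "malaria_incidence": 1.0}]
covid_rows = [
    {"country": "BRA", "year": 2022, "covid_cases": 2.0},
    {"country": "BRA", "year": 2021, "covid_cases": 3.0},
]

combined = join_on_country_year(malaria_rows, covid_rows)
print(combined)
```

If the accessors return pandas DataFrames, as the `data.head()` call in Example 1 suggests, the same join could be expressed with `DataFrame.merge` on the shared columns.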
Available Sources
| Source Name | Class | Extra | Description |
|---|---|---|---|
| `africa_cdc` | `AfricaCDCAccessor` | - | Africa CDC public health data (55 AU states) |
| `cdc_opendata` | `CDCOpenDataAccessor` | - | US CDC Open Data portal |
| `china_cdc` | `ChinaCDCAccessor` | - | China CDC Weekly surveillance |
| `colombia_ins` | `ColombiaINSAccessor` | - | Colombia INS/SIVIGILA surveillance |
| `copernicus_cds` | `CopernicusCDSAccessor` | `[climate]` | Copernicus Climate Data Store |
| `datasus` | `DataSUSAccessor` | `[brazil]` | Brazilian DATASUS/SINAN (via PySUS) |
| `ecdc` | `ECDCOpenDataAccessor` | - | ECDC infectious disease data |
| `epipulse` | `EpiPulseAccessor` | - | ECDC EpiPulse surveillance portal |
| `eurostat` | `EurostatAccessor` | `[eurostat]` | EU health statistics |
| `global_health` | `GlobalHealthAccessor` | - | Global.health pandemic linelist data |
| `healthdata_gov` | `HealthDataGovAccessor` | - | US HealthData.gov |
| `india_idsp` | `IndiaIDSPAccessor` | - | India IDSP disease surveillance |
| `infodengue` | `InfoDengueAPI` | - | InfoDengue dengue surveillance (Brazil) |
| `malaria_atlas` | `MalariaAtlasAccessor` | - | Malaria Atlas Project data |
| `owid` | `OWIDAccessor` | - | Our World in Data (COVID-19, vaccination) |
| `paho` | `PAHOAccessor` | - | PAHO Pan-American health data |
| `pathoplexus` | `PathoplexusAccessor` | `[genomics]` | Pathoplexus pathogen genomic data |
| `respicast` | `RespiCastAccessor` | - | ECDC respiratory disease forecasting |
| `rki` | `RKIGermanyAccessor` | - | Robert Koch Institute (Germany) |
| `ukhsa` | `UKHSAAccessor` | - | UK Health Security Agency |
| `who` | `WHOAccessor` | `[who]` | WHO Global Health Observatory |
FAQ
What is epidatasets?
A Python library providing a unified interface to 21 epidemiological data sources worldwide, installable via pip install epidatasets.
Do I need to install all optional dependencies?
No. The base install covers most sources. Only install extras for sources that need them (e.g., pip install epidatasets[who] for WHO GHO data, pip install epidatasets[brazil] for DATASUS).
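To see which extras are already usable in an environment, one option is to test whether the underlying module can be imported. This is a minimal sketch under assumptions: the mapping from extra name to module name (`pysus` for `[brazil]`, `ghoclient` for `[who]`) is inferred from the Related Projects table and is not confirmed by this README, and `is_installed` is a hypothetical helper.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the top-level module `module_name` can be imported here."""
    return importlib.util.find_spec(module_name) is not None


# Assumed mapping from extra name to the module that extra provides.
EXTRA_MODULES = {"brazil": "pysus", "who": "ghoclient"}

for extra, module in EXTRA_MODULES.items():
    if is_installed(module):
        status = "available"
    else:
        status = f"missing (pip install 'epidatasets[{extra}]')"
    print(f"{extra}: {status}")
```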
How do I discover available sources?
from epidatasets import list_sources
print(list_sources())
Or from the CLI: epidatasets sources
Are all dataset accessors fully implemented?
Most accessors provide working data retrieval. Some are structured placeholders for sources that require registration or have limited public APIs. Check the documentation for each source's status.
Can I contribute a new data source?
Yes! Sources are registered via entry_points in pyproject.toml. See CONTRIBUTING.md for guidelines on adding new accessors.
Contributing
We welcome contributions! Please see CONTRIBUTING.md for detailed guidelines.
Quick Links for Contributors
- Contributing Guide - How to get started
- Report a Bug
- Request a Feature
- Request a Data Source
- GitHub Discussions - Ask questions, share ideas
Priority Contributions
- New data source accessors - Especially from underrepresented regions
- Example notebooks - Jupyter notebooks demonstrating data analysis
- Documentation - Translations, improvements, and API docs
- Bug fixes - Check the issue tracker
Related Projects
| Project | Description | Repository |
|---|---|---|
| PySUS | Brazilian health data (DATASUS) | AlertaDengue/PySUS |
| ghoclient | WHO Global Health Observatory | fccoelho/ghoclient |
| epigrass | Epidemic simulation | EpiGrass/epigrass |
| epimodels | Mathematical epidemiology | fccoelho/epimodels |
Statistics
- Data sources: 21 registered (via plugin registry)
- Countries covered: 100+
- Optional extras: 10 (`who`, `brazil`, `eurostat`, `climate`, `geo`, `viz`, `genomics`, `cli`, `worldbank`, `search`)
- Example notebooks: 20+
- Documentation: epidatasets.readthedocs.io
Citation
If you use this package in your research, please cite:
@misc{fccoelho_epidatasets,
  author = {Coelho, Flávio Codeço},
  title = {Epidatasets: Python Access to Epidemiological Datasets Worldwide},
  year = {2026},
  publisher = {GitHub},
  journal = {GitHub Repository},
  howpublished = {\url{https://github.com/fccoelho/epidemiological-datasets}}
}
For PySUS:
@software{pysus,
  author = {AlertaDengue Team},
  title = {PySUS: Tools for Brazilian Public Health Data},
  url = {https://github.com/AlertaDengue/PySUS}
}
License
This project is licensed under the MIT License - see the LICENSE file for details.
Sponsor
This project is sponsored by
Kwar-AI: Intelligence for Epidemiology
AI-powered solutions for disease surveillance and outbreak prediction
Acknowledgments
- PySUS Contributors - For making Brazilian health data accessible
- WHO - For maintaining the Global Health Observatory
- All data providers who make epidemiological data openly accessible
- Global public health community
Contact
- Author: Flávio Codeço Coelho (@fccoelho)
- Website: https://fccoelho.github.io/
- Documentation: https://epidatasets.readthedocs.io
Made with ❤️ for the epidemiological research community
Report Bug • Request Feature • Discussions
Disclaimer: This repository is a community effort to catalog open data sources. Please always refer to the original data providers for official statistics and verify data usage terms. The maintainers are not responsible for data quality or availability.