Japanese IDWR infectious disease database and analytics toolkit built on Polars.
jp-idwr-db
Python access to Japanese infectious disease surveillance data from NIID/JIHS.
jp-idwr-db provides a Polars-first API for filtering and analysis.
Parquet datasets are versioned as GitHub Release assets and downloaded to a local cache on first use.
It is inspired by the R package jpinfect, but it is not an API-parity port and includes independently curated ingestion and coverage.
Install
pip install jp-idwr-db
Data Download Model
- Package wheels do not ship the large parquet tables.
- On first call to jp.load(...) (or jp.get_data(...)), the package downloads versioned data assets from GitHub Releases.
- Cache path defaults to:
  - macOS: ~/Library/Caches/jp_idwr_db/data/<version>/
  - Linux: ~/.cache/jp_idwr_db/data/<version>/
  - Windows: %LOCALAPPDATA%\jp_idwr_db\Cache\data\<version>\
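The default cache locations listed above can be expressed as a small helper. This is a minimal sketch for illustration only, not the package's own resolution logic: the function name default_cache_dir and its parameters are hypothetical, and it ignores JPINFECT_CACHE_DIR and any XDG settings.

```python
from pathlib import PurePosixPath, PureWindowsPath


def default_cache_dir(
    version: str,
    system: str,
    home: str = "~",
    local_appdata: str = "%LOCALAPPDATA%",
) -> str:
    """Build the documented default cache directory for one data version.

    `system` follows platform.system(): "Darwin", "Linux" or "Windows".
    The placeholders in `home` and `local_appdata` are left unexpanded.
    """
    if system == "Darwin":
        return str(PurePosixPath(home, "Library", "Caches", "jp_idwr_db", "data", version))
    if system == "Windows":
        return str(PureWindowsPath(local_appdata, "jp_idwr_db", "Cache", "data", version))
    # Any other value is treated like Linux in this sketch.
    return str(PurePosixPath(home, ".cache", "jp_idwr_db", "data", version))


for name in ("Darwin", "Linux", "Windows"):
    print(name, default_cache_dir("v0.1.0", name))
```

Pure path classes are used so the sketch prints the same strings on every operating system.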
Prefetch explicitly:
python -m jp_idwr_db data download
python -m jp_idwr_db data download --version v0.1.0 --force
Environment overrides:
- JPINFECT_DATA_VERSION: choose a specific release tag (example: v0.1.0)
- JPINFECT_DATA_BASE_URL: override asset host base URL
- JPINFECT_CACHE_DIR: override local cache root
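These overrides can also be set from Python for the current process. The sketch below assumes the package reads the variables when data is first requested, so they are set before the first load; the helper name pin_data_release and the jp_idwr_cache directory name are hypothetical.

```python
import os


def pin_data_release(version: str, cache_dir: str) -> None:
    """Set the documented override variables for the current process only."""
    os.environ["JPINFECT_DATA_VERSION"] = version
    os.environ["JPINFECT_CACHE_DIR"] = cache_dir


# Hypothetical cache location under the user's home directory.
pin_data_release("v0.1.0", os.path.join(os.path.expanduser("~"), "jp_idwr_cache"))

# Assumption: the variables are consulted at first data access, so the
# import and first load come after the call above:
# import jp_idwr_db as jp
# df = jp.load("unified")
```

Setting the variables in the shell before starting Python has the same effect and also covers the command-line prefetch.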
Quick Start
import jp_idwr_db as jp
# Full unified dataset (recommended)
df = jp.load("unified")
print(df.select(["prefecture", "disease", "year", "week", "count", "source"]).head(8))
shape: (8, 6)
┌────────────┬─────────────────────────────────┬──────┬──────┬───────┬───────────────────────┐
│ prefecture ┆ disease                         ┆ year ┆ week ┆ count ┆ source                │
│ ---        ┆ ---                             ┆ ---  ┆ ---  ┆ ---   ┆ ---                   │
│ str        ┆ str                             ┆ i32  ┆ i32  ┆ f64   ┆ str                   │
╞════════════╪═════════════════════════════════╪══════╪══════╪═══════╪═══════════════════════╡
│ Tochigi    ┆ Lyme disease                    ┆ 2011 ┆ 24   ┆ 0.0   ┆ Confirmed cases       │
│ Kochi      ┆ Avian influenza H5N1            ┆ 2008 ┆ 51   ┆ 0.0   ┆ Confirmed cases       │
│ Hokkaido   ┆ Dengue fever                    ┆ 1999 ┆ 28   ┆ 0.0   ┆ Confirmed cases       │
│ Tokyo      ┆ Congenital rubella syndrome     ┆ 2014 ┆ 41   ┆ 0.0   ┆ Confirmed cases       │
│ Nagasaki   ┆ Severe Acute Respiratory Syndr… ┆ 2018 ┆ 4    ┆ 0.0   ┆ Confirmed cases       │
│ Fukushima  ┆ Infectious gastroenteritis (on… ┆ 2019 ┆ 25   ┆ 145.0 ┆ Sentinel surveillance │
│ Nara       ┆ Severe invasive streptococcal … ┆ 2003 ┆ 10   ┆ 0.0   ┆ Confirmed cases       │
│ Mie        ┆ Plague                          ┆ 2006 ┆ 37   ┆ 0.0   ┆ Confirmed cases       │
└────────────┴─────────────────────────────────┴──────┴──────┴───────┴───────────────────────┘
import jp_idwr_db as jp
# Optional: attach ISO prefecture IDs (JP-01 ... JP-47) only when needed
df_with_ids = jp.attach_prefecture_id(df, prefecture_col="prefecture", id_col="prefecture_id")
print(df_with_ids.select(["prefecture", "prefecture_id"]).head())
shape: (5, 2)
┌────────────┬───────────────┐
│ prefecture ┆ prefecture_id │
╞════════════╪═══════════════╡
│ Tochigi    ┆ JP-09         │
│ Kochi      ┆ JP-39         │
│ Hokkaido   ┆ JP-01         │
│ Tokyo      ┆ JP-13         │
│ Nagasaki   ┆ JP-42         │
└────────────┴───────────────┘
Main API
Top-level API exported by jp_idwr_db:
- load(name)
- get_data(...)
- list_diseases(source="all")
- list_prefectures()
- get_latest_week()
- prefecture_map()
- attach_prefecture_id(df, prefecture_col="prefecture", id_col="prefecture_id")
- merge(...), pivot(...)
- configure(...), get_config()
Filtered Access with get_data
import jp_idwr_db as jp
# Tuberculosis rows for a year range
tb = jp.get_data(disease="Tuberculosis", year=(2018, 2023))
print(tb.select(["prefecture", "disease", "year", "week", "count", "source"]).head(8))
shape: (8, 6)
┌────────────┬──────────────┬──────┬──────┬───────┬─────────────────┐
│ prefecture ┆ disease      ┆ year ┆ week ┆ count ┆ source          │
│ ---        ┆ ---          ┆ ---  ┆ ---  ┆ ---   ┆ ---             │
│ str        ┆ str          ┆ i32  ┆ i32  ┆ f64   ┆ str             │
╞════════════╪══════════════╪══════╪══════╪═══════╪═════════════════╡
│ Hokkaido   ┆ Tuberculosis ┆ 2020 ┆ 12   ┆ 5.0   ┆ Confirmed cases │
│ Oita       ┆ Tuberculosis ┆ 2023 ┆ 38   ┆ 6.0   ┆ Confirmed cases │
│ Fukuoka    ┆ Tuberculosis ┆ 2021 ┆ 8    ┆ 12.0  ┆ Confirmed cases │
│ Kagawa     ┆ Tuberculosis ┆ 2020 ┆ 19   ┆ 2.0   ┆ Confirmed cases │
│ Chiba      ┆ Tuberculosis ┆ 2020 ┆ 19   ┆ 9.0   ┆ Confirmed cases │
│ Kanagawa   ┆ Tuberculosis ┆ 2022 ┆ 17   ┆ 25.0  ┆ Confirmed cases │
│ Okinawa    ┆ Tuberculosis ┆ 2021 ┆ 11   ┆ 4.0   ┆ Confirmed cases │
│ Gifu       ┆ Tuberculosis ┆ 2018 ┆ 23   ┆ 7.0   ┆ Confirmed cases │
└────────────┴──────────────┴──────┴──────┴───────┴─────────────────┘
import jp_idwr_db as jp
# Sentinel-only diseases from recent years
sentinel = jp.get_data(source="sentinel", year=(2023, 2026))
print(sentinel.select(["prefecture", "disease", "year", "week", "count", "source"]).head(8))
shape: (8, 6)
┌────────────┬─────────────────────────────────┬──────┬──────┬───────┬───────────────────────┐
│ prefecture ┆ disease                         ┆ year ┆ week ┆ count ┆ source                │
│ ---        ┆ ---                             ┆ ---  ┆ ---  ┆ ---   ┆ ---                   │
│ str        ┆ str                             ┆ i32  ┆ i32  ┆ f64   ┆ str                   │
╞════════════╪═════════════════════════════════╪══════╪══════╪═══════╪═══════════════════════╡
│ Ishikawa   ┆ Respiratory syncytial virus in… ┆ 2024 ┆ 42   ┆ 813.0 ┆ Sentinel surveillance │
│ Nara       ┆ Erythema infection              ┆ 2025 ┆ 31   ┆ 823.0 ┆ Sentinel surveillance │
│ Saga       ┆ Mumps                           ┆ 2024 ┆ 26   ┆ 14.0  ┆ Sentinel surveillance │
│ Hyogo      ┆ Pharyngoconjunctival fever      ┆ 2023 ┆ 19   ┆ 468.0 ┆ Sentinel surveillance │
│ Miyazaki   ┆ Infectious gastroenteritis      ┆ 2026 ┆ 3    ┆ 339.0 ┆ Sentinel surveillance │
│ Kagoshima  ┆ Infectious gastroenteritis (on… ┆ 2024 ┆ 9    ┆ null  ┆ Sentinel surveillance │
│ Osaka      ┆ Mumps                           ┆ 2024 ┆ 49   ┆ 404.0 ┆ Sentinel surveillance │
│ Aomori     ┆ Erythema infection              ┆ 2024 ┆ 10   ┆ 5.0   ┆ Sentinel surveillance │
└────────────┴─────────────────────────────────┴──────┴──────┴───────┴───────────────────────┘
Datasets
Use jp.load(...) with:
- "sex": historical sex-disaggregated surveillance
- "place": historical place-category surveillance
- "bullet": modern all-case weekly reports (rapid zensu)
- "sentinel": sentinel weekly reports (teitenrui; 2012+ in release data assets)
- "unified": deduplicated combined dataset (sex-total + modern bullet/sentinel, recommended)
Detailed schema and coverage are documented in DATASETS.md.
Raw Download and Parsing
Raw file workflows are available in jp_idwr_db.io:
- jp_idwr_db.io.download(...)
- jp_idwr_db.io.download_recent(...)
- jp_idwr_db.io.read(...)
These are useful for refreshing local raw weekly files or debugging parser behavior.
Data Wrangling Examples
See EXAMPLES.md for Polars-first data wrangling recipes (grouping, trends, regional slices, source-aware filtering).
Disease-by-disease temporal coverage is documented in DISEASES.md.
Data Source
NIID/JIHS infectious disease surveillance publications:
- Historical annual archive files (Syu_01_1, Syu_02_1)
- Rapid weekly CSV reports (zensuXX.csv, teitenruiXX.csv)
Development
uv sync --all-extras --dev
uv run ruff check .
uv run mypy src
uv run pytest
Security and Integrity
- Release assets include a jp_idwr_db-manifest.json with SHA256 checksums.
- ensure_data() verifies archive checksum and each extracted parquet checksum before marking cache complete.
- For PyPI publishing, prefer Trusted Publishing (OIDC) over long-lived API tokens.
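The checksum step described above can be illustrated with the standard library. This is a minimal sketch, not the code behind ensure_data(): the helper names sha256_of and verify_files are hypothetical, and the expected digests are passed as a plain filename-to-digest mapping because the real manifest layout is not documented here.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_files(root: Path, expected: dict[str, str]) -> list[str]:
    """Return the names that are missing or whose digest does not match."""
    bad = []
    for name, want in expected.items():
        target = root / name
        if not target.is_file() or sha256_of(target) != want.lower():
            bad.append(name)
    return bad


# Demo on a throwaway file (the content is not a real parquet table).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    sample = root / "example.parquet"
    sample.write_bytes(b"placeholder bytes")
    expected = {"example.parquet": sha256_of(sample)}
    print(verify_files(root, expected))  # nothing reported while the file is intact
    sample.write_bytes(b"tampered bytes")
    print(verify_files(root, expected))  # the modified file is reported
```

Reading in chunks keeps memory use flat for large files, which matters for multi-megabyte data assets.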
License
GPL-3.0-or-later. See LICENSE.
File details
Details for the file jp_idwr_db-0.2.2.tar.gz.
File metadata
- Download URL: jp_idwr_db-0.2.2.tar.gz
- Upload date:
- Size: 41.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.10.0 (uv publish, run in CI on Ubuntu 24.04)
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9a7bbab443569cd2ebf7f229d8e304936bf61a12302d127187d9fe8b19a6ccb3 |
| MD5 | 1cd07371297d71dc65b3b98ee9a54d18 |
| BLAKE2b-256 | 554647c417a3cb31af37592e99dffb4e5531cedcfda9fab5cb36be0345c6e9b7 |
File details
Details for the file jp_idwr_db-0.2.2-py3-none-any.whl.
File metadata
- Download URL: jp_idwr_db-0.2.2-py3-none-any.whl
- Upload date:
- Size: 46.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.10.0 (uv publish, run in CI on Ubuntu 24.04)
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f20faf7be321f0587eeea1cc1a674113aa035e02d8fb951d6401ee75c6f6a8d2 |
| MD5 | f96e363d7641d6cbd7449c4893c219b4 |
| BLAKE2b-256 | 338729f8fcca58c9b8b1df674906fc4f4d7d32215662ec6a440010e843f441b7 |