Japanese IDWR infectious disease database and analytics toolkit built on Polars.

jp-idwr-db

Python access to Japanese infectious disease surveillance data from NIID/JIHS.

jp-idwr-db provides a Polars-first API for filtering and analysis. Parquet datasets are versioned as GitHub Release assets and downloaded to a local cache on first use. It is inspired by the R package jpinfect, but it is not an API-parity port and includes independently curated ingestion and coverage.

NIID/JIHS surveillance data is public, but it is not exposed as a clean analytical API. To reconstruct usable time series, you typically need to navigate multiple archive structures, yearly directories, and week-level files with changing formats (Excel and CSV) across historical and modern reporting systems.

This package exists to remove that friction: it consolidates those heterogeneous sources into standardized, queryable tables so you can move directly to epidemiological analysis instead of file discovery, parsing, and schema harmonization.

Install

pip install jp-idwr-db

Data Download Model

Package wheels do not ship the large Parquet tables.
On the first call to jp.load(...) (or jp.get_data(...)), the package downloads versioned data assets from GitHub Releases.
The cache path defaults to:
- macOS: ~/Library/Caches/jp_idwr_db/data/<version>/
- Linux: ~/.cache/jp_idwr_db/data/<version>/
- Windows: %LOCALAPPDATA%\jp_idwr_db\Cache\data\<version>\
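
The per-platform defaults listed above can be sketched in a few lines of Python. This is only an illustration of the documented layout: `default_cache_dir` is a hypothetical helper, not a function exported by jp_idwr_db, and the fallback used when LOCALAPPDATA is unset is an assumption.

```python
import os
import sys
from pathlib import Path


def default_cache_dir(version: str, platform: str = sys.platform) -> Path:
    """Hypothetical helper mirroring the documented per-OS cache locations."""
    home = Path.home()
    if platform == "darwin":
        root = home / "Library" / "Caches" / "jp_idwr_db"
    elif platform.startswith("win"):
        # Assumption: fall back to the conventional path if LOCALAPPDATA is unset.
        local = os.environ.get("LOCALAPPDATA", str(home / "AppData" / "Local"))
        root = Path(local) / "jp_idwr_db" / "Cache"
    else:
        root = home / ".cache" / "jp_idwr_db"
    return root / "data" / version


print(default_cache_dir("v0.2.2"))
```

Because each release tag gets its own subdirectory, several data versions can sit side by side in the cache.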

Prefetch explicitly:

python -m jp_idwr_db data download
python -m jp_idwr_db data download --version v0.2.2 --force

Environment overrides:

- JPINFECT_DATA_VERSION: choose a specific release tag (example: v0.2.2)
- JPINFECT_DATA_BASE_URL: override asset host base URL
- JPINFECT_CACHE_DIR: override local cache root
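
One way to apply these overrides from Python is to pass them to the documented prefetch command through a subprocess environment. The sketch below assumes the variables are read when the command runs. `prefetch_env` and `prefetch` are hypothetical helpers written for this example, not part of the package.

```python
import os
import subprocess
import sys


def prefetch_env(version: str, cache_dir: str) -> dict[str, str]:
    """Return a copy of the current environment with the overrides set."""
    env = dict(os.environ)
    env["JPINFECT_DATA_VERSION"] = version
    env["JPINFECT_CACHE_DIR"] = cache_dir
    return env


def prefetch(version: str, cache_dir: str) -> int:
    """Run `python -m jp_idwr_db data download` with the overrides applied."""
    cmd = [sys.executable, "-m", "jp_idwr_db", "data", "download"]
    return subprocess.run(cmd, env=prefetch_env(version, cache_dir)).returncode
```

Building a copy of the environment leaves the calling process untouched, which is convenient in tests or CI jobs that pin a data release.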

Quick Start

To fetch the full unified dataset with a single call:

import jp_idwr_db as jp
import polars as pl

df = (
    jp.load("unified")
    .select(["date", "prefecture", "category", "disease", "count", "source"])
)
print(df)

shape: (5_370_477, 6)
┌────────────┬────────────┬──────────┬─────────────────────────────┬───────┬────────────────────┐
│ date       ┆ prefecture ┆ category ┆ disease                     ┆ count ┆ source             │
│ ---        ┆ ---        ┆ ---      ┆ ---                         ┆ ---   ┆ ---                │
│ date       ┆ str        ┆ str      ┆ str                         ┆ f64   ┆ str                │
╞════════════╪════════════╪══════════╪═════════════════════════════╪═══════╪════════════════════╡
│ 1999-04-11 ┆ Aichi      ┆ total    ┆ AIDS                        ┆ 0.0   ┆ Confirmed cases    │
│ 1999-04-11 ┆ Aichi      ┆ total    ┆ Acute poliomyelitis         ┆ 0.0   ┆ Confirmed cases    │
│ 1999-04-11 ┆ Aichi      ┆ total    ┆ Acute viral hepatitis       ┆ 4.0   ┆ Confirmed cases    │
│ 1999-04-11 ┆ Aichi      ┆ total    ┆ Amebiasis                   ┆ 0.0   ┆ Confirmed cases    │
│ 1999-04-11 ┆ Aichi      ┆ total    ┆ Anthrax                     ┆ 0.0   ┆ Confirmed cases    │
│ …          ┆ …          ┆ …        ┆ …                           ┆ …     ┆ …                  │
│ 2026-02-09 ┆ Yamanashi  ┆ total    ┆ Viral hepatitis(excluding   ┆ 0.0   ┆ All-case reporting │
│            ┆            ┆          ┆ hepa…                       ┆       ┆                    │
│ 2026-02-09 ┆ Yamanashi  ┆ total    ┆ West Nile fever             ┆ 0.0   ┆ All-case reporting │
│ 2026-02-09 ┆ Yamanashi  ┆ total    ┆ Western equine encephalitis ┆ 0.0   ┆ All-case reporting │
│ 2026-02-09 ┆ Yamanashi  ┆ total    ┆ Yellow fever                ┆ 0.0   ┆ All-case reporting │
│ 2026-02-09 ┆ Yamanashi  ┆ total    ┆ Zika virus infection        ┆ 0.0   ┆ All-case reporting │
└────────────┴────────────┴──────────┴─────────────────────────────┴───────┴────────────────────┘

You can also filter at the source with jp.get_data(...):

# Fetch only tuberculosis data for 2024 in Tokyo, Osaka, and Hokkaido
tb = (
    jp.get_data(
        disease="Tuberculosis",
        year=2024,
        prefecture=["Tokyo", "Osaka", "Hokkaido"],
    )
    .select(["date", "prefecture", "disease", "count", "source"])
)
print(tb)

shape: (156, 5)
┌────────────┬────────────┬──────────────┬───────┬────────────────────┐
│ date       ┆ prefecture ┆ disease      ┆ count ┆ source             │
│ ---        ┆ ---        ┆ ---          ┆ ---   ┆ ---                │
│ date       ┆ str        ┆ str          ┆ f64   ┆ str                │
╞════════════╪════════════╪══════════════╪═══════╪════════════════════╡
│ 2024-01-01 ┆ Hokkaido   ┆ Tuberculosis ┆ 2.0   ┆ All-case reporting │
│ 2024-01-01 ┆ Osaka      ┆ Tuberculosis ┆ 3.0   ┆ All-case reporting │
│ 2024-01-01 ┆ Tokyo      ┆ Tuberculosis ┆ 15.0  ┆ All-case reporting │
│ 2024-01-08 ┆ Hokkaido   ┆ Tuberculosis ┆ 4.0   ┆ All-case reporting │
│ 2024-01-08 ┆ Osaka      ┆ Tuberculosis ┆ 17.0  ┆ All-case reporting │
│ …          ┆ …          ┆ …            ┆ …     ┆ …                  │
│ 2024-12-16 ┆ Osaka      ┆ Tuberculosis ┆ 17.0  ┆ All-case reporting │
│ 2024-12-16 ┆ Tokyo      ┆ Tuberculosis ┆ 41.0  ┆ All-case reporting │
│ 2024-12-23 ┆ Hokkaido   ┆ Tuberculosis ┆ 5.0   ┆ All-case reporting │
│ 2024-12-23 ┆ Osaka      ┆ Tuberculosis ┆ 16.0  ┆ All-case reporting │
│ 2024-12-23 ┆ Tokyo      ┆ Tuberculosis ┆ 53.0  ┆ All-case reporting │
└────────────┴────────────┴──────────────┴───────┴────────────────────┘

# Sentinel-only diseases from recent years in Tokyo prefecture
sentinel_df = (
    jp.get_data(
        source="sentinel",
        prefecture=["Tokyo"],
        year=(2024, 2026),
    )
    .select(["date", "prefecture", "disease", "count", "per_sentinel"])
)
print(sentinel_df)

shape: (2_052, 5)
┌────────────┬────────────┬─────────────────────────────────┬─────────┬──────────────┐
│ date       ┆ prefecture ┆ disease                         ┆ count   ┆ per_sentinel │
│ ---        ┆ ---        ┆ ---                             ┆ ---     ┆ ---          │
│ date       ┆ str        ┆ str                             ┆ f64     ┆ f64          │
╞════════════╪════════════╪═════════════════════════════════╪═════════╪══════════════╡
│ 2024-01-07 ┆ Tokyo      ┆ Acute hemorrhagic conjunctivit… ┆ null    ┆ null         │
│ 2024-01-07 ┆ Tokyo      ┆ Aseptic meningitis              ┆ null    ┆ null         │
│ 2024-01-07 ┆ Tokyo      ┆ Bacterial meningitis            ┆ null    ┆ null         │
│ 2024-01-07 ┆ Tokyo      ┆ COVID-19                        ┆ 1365.0  ┆ 3.38         │
│ 2024-01-07 ┆ Tokyo      ┆ Chickenpox                      ┆ 31.0    ┆ 0.12         │
│ …          ┆ …          ┆ …                               ┆ …       ┆ …            │
│ 2026-01-25 ┆ Tokyo      ┆ Influenza(excld. avian influen… ┆ 13082.0 ┆ 34.07        │
│ 2026-01-25 ┆ Tokyo      ┆ Mumps                           ┆ 30.0    ┆ 0.12         │
│ 2026-01-25 ┆ Tokyo      ┆ Mycoplasma pneumonia            ┆ 32.0    ┆ 1.28         │
│ 2026-01-25 ┆ Tokyo      ┆ Pharyngoconjunctival fever      ┆ 115.0   ┆ 0.47         │
│ 2026-01-25 ┆ Tokyo      ┆ Respiratory syncytial virus in… ┆ 242.0   ┆ 1.0          │
└────────────┴────────────┴─────────────────────────────────┴─────────┴──────────────┘

Main API

Top-level API exported by jp_idwr_db:

load(name)
get_data(...)
list_diseases(source="all")
list_prefectures()
get_latest_week()
prefecture_map()
attach_prefecture_id(df, prefecture_col="prefecture", id_col="prefecture_id")
merge(...), pivot(...)
configure(...), get_config()

Datasets

Use jp.load(...) with:

- "sex": historical sex-disaggregated surveillance
- "place": historical place-category surveillance
- "bullet": modern all-case weekly reports (rapid zensu)
- "sentinel": sentinel reports (teitenrui; 2012+ in release data assets)
- "unified": deduplicated combined dataset (sex-total + modern bullet/sentinel, recommended)

Note: teitenrui CSVs report year-to-date cumulative counts. jp-idwr-db converts these to weekly incidence (count_t - count_{t-1} within year/prefecture/disease; first week kept as-is).
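
The differencing rule in the note above can be illustrated with a small, self-contained sketch. It is written in plain Python so it runs without the package or its data. The function name and the row keys (year, week, prefecture, disease, count) are illustrative assumptions, not the package's internal schema, and missing values are not handled.

```python
from collections import defaultdict


def cumulative_to_weekly(rows):
    """Convert year-to-date counts to weekly counts (illustrative only).

    rows: iterable of dicts with keys year, week, prefecture, disease, count.
    """
    # Group rows so that differencing never crosses a year, prefecture or disease.
    groups = defaultdict(list)
    for row in rows:
        groups[(row["year"], row["prefecture"], row["disease"])].append(row)

    out = []
    for series in groups.values():
        previous = None
        for row in sorted(series, key=lambda r: r["week"]):
            # First week is kept as-is; later weeks are count_t - count_{t-1}.
            weekly = row["count"] if previous is None else row["count"] - previous
            out.append({**row, "count": weekly})
            previous = row["count"]
    return out


cumulative = [
    {"year": 2024, "week": 1, "prefecture": "Tokyo", "disease": "Mumps", "count": 3},
    {"year": 2024, "week": 2, "prefecture": "Tokyo", "disease": "Mumps", "count": 7},
    {"year": 2024, "week": 3, "prefecture": "Tokyo", "disease": "Mumps", "count": 12},
]
print([r["count"] for r in cumulative_to_weekly(cumulative)])  # [3, 4, 5]
```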

Detailed schema and coverage are documented in DATASETS.md.

Optional Prefecture IDs

Attach ISO 3166-2 prefecture codes (JP-01 ... JP-47) only when needed:

import jp_idwr_db as jp

df_with_ids = (
    jp.get_data(disease="Measles", year=2024)
    .select(["prefecture", "disease", "count"])
    .sort(["prefecture", "count"])
    .unique(subset=["prefecture"], keep="first")
    .pipe(jp.attach_prefecture_id)
    .sort("prefecture")
)
print(df_with_ids)

shape: (48, 4)
┌────────────┬─────────┬───────┬───────────────┐
│ prefecture ┆ disease ┆ count ┆ prefecture_id │
│ ---        ┆ ---     ┆ ---   ┆ ---           │
│ str        ┆ str     ┆ f64   ┆ str           │
╞════════════╪═════════╪═══════╪═══════════════╡
│ Aichi      ┆ Measles ┆ 0.0   ┆ JP-23         │
│ Akita      ┆ Measles ┆ 0.0   ┆ JP-05         │
│ Aomori     ┆ Measles ┆ 0.0   ┆ JP-02         │
│ Chiba      ┆ Measles ┆ 0.0   ┆ JP-12         │
│ Ehime      ┆ Measles ┆ 0.0   ┆ JP-38         │
│ …          ┆ …       ┆ …     ┆ …             │
│ Toyama     ┆ Measles ┆ 0.0   ┆ JP-16         │
│ Wakayama   ┆ Measles ┆ 0.0   ┆ JP-30         │
│ Yamagata   ┆ Measles ┆ 0.0   ┆ JP-06         │
│ Yamaguchi  ┆ Measles ┆ 0.0   ┆ JP-35         │
│ Yamanashi  ┆ Measles ┆ 0.0   ┆ JP-19         │
└────────────┴─────────┴───────┴───────────────┘

Raw Download and Parsing

Raw file workflows are available in jp_idwr_db.io:

jp_idwr_db.io.download(...)
jp_idwr_db.io.download_recent(...)
jp_idwr_db.io.read(...)

These are useful for refreshing local raw weekly files or debugging parser behavior.

Data Wrangling Examples

See EXAMPLES.md for Polars-first data wrangling recipes (grouping, trends, regional slices, source-aware filtering).

Disease-by-disease temporal coverage is documented in DISEASES.md.

Data Source

NIID/JIHS infectious disease surveillance publications:

Historical annual archive files (Syu_01_1, Syu_02_1)
Rapid weekly CSV reports (zensuXX.csv, teitenruiXX.csv)

Development

uv sync --all-extras --dev
uv run ruff check .
uv run mypy src
uv run pytest

Security and Integrity

Release assets include a jp_idwr_db-manifest.json with SHA256 checksums.
ensure_data() verifies archive checksum and each extracted parquet checksum before marking cache complete.
For PyPI publishing, prefer Trusted Publishing (OIDC) over long-lived API tokens.
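
The kind of checksum verification described above can be sketched with the standard library. This is not the package's ensure_data() implementation. `sha256_of` and `verify_files` are hypothetical helpers, and the `expected` mapping of file name to hex digest is an assumed shape, since the layout of jp_idwr_db-manifest.json is not described here.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA256 so large Parquet files are not read at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_files(directory: Path, expected: dict[str, str]) -> list[str]:
    """Return the names that are missing or whose digest does not match."""
    bad = []
    for name, want in expected.items():
        target = directory / name
        if not target.is_file() or sha256_of(target) != want.lower():
            bad.append(name)
    return bad
```

An empty list from `verify_files` means every listed file is present and matches. A cache would only be treated as complete in that case.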

License

GPL-3.0-or-later. See LICENSE.

Release history

2026.5.13

May 13, 2026

2026.4.29

Apr 29, 2026

2026.4.15

Apr 15, 2026

2026.4.1

Apr 1, 2026

2026.3.26

Mar 26, 2026

0.2.6

Mar 26, 2026

0.2.5

Feb 7, 2026

This version

0.2.4

Feb 7, 2026

0.2.3

Feb 6, 2026

0.2.2

Feb 6, 2026

Download files

Source Distribution

jp_idwr_db-0.2.4.tar.gz (42.6 kB view details)

Uploaded Feb 7, 2026 Source

Built Distribution

jp_idwr_db-0.2.4-py3-none-any.whl (47.8 kB view details)

Uploaded Feb 7, 2026 Python 3

File details

Details for the file jp_idwr_db-0.2.4.tar.gz.

File metadata

Download URL: jp_idwr_db-0.2.4.tar.gz
Upload date: Feb 7, 2026
Size: 42.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.10.0

File hashes

Hashes for jp_idwr_db-0.2.4.tar.gz
Algorithm	Hash digest
SHA256	`88e6703b134fbd04bac5fa43174e394b7b0403d8307262649c5e5285a6366c14`
MD5	`6a153834172b5aa4285ead5de5ae896b`
BLAKE2b-256	`f430a2bbf36fae4776280f9d12df3c82171c3207ffb5774ecb991faee4a16322`

File details

Details for the file jp_idwr_db-0.2.4-py3-none-any.whl.

File metadata

Download URL: jp_idwr_db-0.2.4-py3-none-any.whl
Upload date: Feb 7, 2026
Size: 47.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.10.0

File hashes

Hashes for jp_idwr_db-0.2.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`795c182442768c6dff9c53efcd30100dfb1b3ff1c08038ddf2ec018a6a1d27dc`
MD5	`570764ad26d02b1f9f7d55714f4e0133`
BLAKE2b-256	`6cc4c5a101a225b5ff183bc229eddab450ed3c81e47c23a05004a7e67e0d4c60`
