Japanese IDWR infectious disease database and analytics toolkit built on Polars.

jp-idwr-db

jp-idwr-db publishes Japan’s infectious disease surveillance data (NIID/JIHS IDWR) as a versioned, language-agnostic data product: Parquet tables plus a machine-readable manifest.json (and an optional DuckDB file with views).

The Python package adds a convenient API and local caching on top of those release assets. Internally, data wrangling is Polars-first for speed and consistent transforms.

The goal is to spare you from chasing week-by-week files across changing archives and formats, so you can go straight to building time series and doing epidemiology.

The package provides an easier interface to the data, but you can also query the Parquet files directly with any tool that supports them (DuckDB, Arrow, Spark, etc.) using the manifest.json for file locations and schema. Direct-access examples are included below.

Python Install

pip install jp-idwr-db

Quick Start

To fetch the full unified dataset with a single call:

import jp_idwr_db as jp
import polars as pl

df = (
    jp.load("unified", version="latest")
    .select(["date", "prefecture", "category", "disease", "count", "source"])
)
print(df)

shape: (5_370_477, 6)
┌────────────┬────────────┬──────────┬─────────────────────────────┬───────┬────────────────────┐
│ date       ┆ prefecture ┆ category ┆ disease                     ┆ count ┆ source             │
│ ---        ┆ ---        ┆ ---      ┆ ---                         ┆ ---   ┆ ---                │
│ date       ┆ str        ┆ str      ┆ str                         ┆ f64   ┆ str                │
╞════════════╪════════════╪══════════╪═════════════════════════════╪═══════╪════════════════════╡
│ 1999-04-11 ┆ Aichi      ┆ total    ┆ AIDS                        ┆ 0.0   ┆ Confirmed cases    │
│ 1999-04-11 ┆ Aichi      ┆ total    ┆ Acute poliomyelitis         ┆ 0.0   ┆ Confirmed cases    │
│ 1999-04-11 ┆ Aichi      ┆ total    ┆ Acute viral hepatitis       ┆ 4.0   ┆ Confirmed cases    │
│ 1999-04-11 ┆ Aichi      ┆ total    ┆ Amebiasis                   ┆ 0.0   ┆ Confirmed cases    │
│ 1999-04-11 ┆ Aichi      ┆ total    ┆ Anthrax                     ┆ 0.0   ┆ Confirmed cases    │
│ …          ┆ …          ┆ …        ┆ …                           ┆ …     ┆ …                  │
│ 2026-02-09 ┆ Yamanashi  ┆ total    ┆ Viral hepatitis(excluding   ┆ 0.0   ┆ All-case reporting │
│            ┆            ┆          ┆ hepa…                       ┆       ┆                    │
│ 2026-02-09 ┆ Yamanashi  ┆ total    ┆ West Nile fever             ┆ 0.0   ┆ All-case reporting │
│ 2026-02-09 ┆ Yamanashi  ┆ total    ┆ Western equine encephalitis ┆ 0.0   ┆ All-case reporting │
│ 2026-02-09 ┆ Yamanashi  ┆ total    ┆ Yellow fever                ┆ 0.0   ┆ All-case reporting │
│ 2026-02-09 ┆ Yamanashi  ┆ total    ┆ Zika virus infection        ┆ 0.0   ┆ All-case reporting │
└────────────┴────────────┴──────────┴─────────────────────────────┴───────┴────────────────────┘

You can also filter at the source with jp.get_data(...):

# Fetch only tuberculosis data for 2024 in Tokyo, Osaka, and Hokkaido
tb = (
    jp.get_data(
        disease="Tuberculosis",
        year=2024,
        prefecture=["Tokyo", "Osaka", "Hokkaido"],
        version="latest",
    )
    .select(["date", "prefecture", "disease", "count", "source"])
)
print(tb)

shape: (156, 5)
┌────────────┬────────────┬──────────────┬───────┬────────────────────┐
│ date       ┆ prefecture ┆ disease      ┆ count ┆ source             │
│ ---        ┆ ---        ┆ ---          ┆ ---   ┆ ---                │
│ date       ┆ str        ┆ str          ┆ f64   ┆ str                │
╞════════════╪════════════╪══════════════╪═══════╪════════════════════╡
│ 2024-01-01 ┆ Hokkaido   ┆ Tuberculosis ┆ 2.0   ┆ All-case reporting │
│ 2024-01-01 ┆ Osaka      ┆ Tuberculosis ┆ 3.0   ┆ All-case reporting │
│ 2024-01-01 ┆ Tokyo      ┆ Tuberculosis ┆ 15.0  ┆ All-case reporting │
│ 2024-01-08 ┆ Hokkaido   ┆ Tuberculosis ┆ 4.0   ┆ All-case reporting │
│ 2024-01-08 ┆ Osaka      ┆ Tuberculosis ┆ 17.0  ┆ All-case reporting │
│ …          ┆ …          ┆ …            ┆ …     ┆ …                  │
│ 2024-12-16 ┆ Osaka      ┆ Tuberculosis ┆ 17.0  ┆ All-case reporting │
│ 2024-12-16 ┆ Tokyo      ┆ Tuberculosis ┆ 41.0  ┆ All-case reporting │
│ 2024-12-23 ┆ Hokkaido   ┆ Tuberculosis ┆ 5.0   ┆ All-case reporting │
│ 2024-12-23 ┆ Osaka      ┆ Tuberculosis ┆ 16.0  ┆ All-case reporting │
│ 2024-12-23 ┆ Tokyo      ┆ Tuberculosis ┆ 53.0  ┆ All-case reporting │
└────────────┴────────────┴──────────────┴───────┴────────────────────┘

# Sentinel-surveillance diseases in Tokyo from 2024 through 2026
# (year=(2024, 2026) is treated as an inclusive range)
sentinel_df = (
    jp.get_data(
        source="sentinel",
        prefecture="Tokyo",
        year=(2024, 2026),
        version="latest",
    )
    .select(["date", "prefecture", "disease", "count", "per_sentinel"])
)
print(sentinel_df)

shape: (2_052, 5)
┌────────────┬────────────┬─────────────────────────────────┬─────────┬──────────────┐
│ date       ┆ prefecture ┆ disease                         ┆ count   ┆ per_sentinel │
│ ---        ┆ ---        ┆ ---                             ┆ ---     ┆ ---          │
│ date       ┆ str        ┆ str                             ┆ f64     ┆ f64          │
╞════════════╪════════════╪═════════════════════════════════╪═════════╪══════════════╡
│ 2024-01-07 ┆ Tokyo      ┆ Acute hemorrhagic conjunctivit… ┆ null    ┆ null         │
│ 2024-01-07 ┆ Tokyo      ┆ Aseptic meningitis              ┆ null    ┆ null         │
│ 2024-01-07 ┆ Tokyo      ┆ Bacterial meningitis            ┆ null    ┆ null         │
│ 2024-01-07 ┆ Tokyo      ┆ COVID-19                        ┆ 1365.0  ┆ 3.38         │
│ 2024-01-07 ┆ Tokyo      ┆ Chickenpox                      ┆ 31.0    ┆ 0.12         │
│ …          ┆ …          ┆ …                               ┆ …       ┆ …            │
│ 2026-01-25 ┆ Tokyo      ┆ Influenza(excld. avian influen… ┆ 13082.0 ┆ 34.07        │
│ 2026-01-25 ┆ Tokyo      ┆ Mumps                           ┆ 30.0    ┆ 0.12         │
│ 2026-01-25 ┆ Tokyo      ┆ Mycoplasma pneumonia            ┆ 32.0    ┆ 1.28         │
│ 2026-01-25 ┆ Tokyo      ┆ Pharyngoconjunctival fever      ┆ 115.0   ┆ 0.47         │
│ 2026-01-25 ┆ Tokyo      ┆ Respiratory syncytial virus in… ┆ 242.0   ┆ 1.0          │
└────────────┴────────────┴─────────────────────────────────┴─────────┴──────────────┘

Data Download Model

Package wheels do not ship the large parquet tables.
On the first call to jp.load(..., version="latest") (or jp.get_data(..., version="latest")), the package downloads the parquet assets listed in the latest published release manifest.json.
By default, the package uses the data version that matches the installed wheel. Pass version="latest" when you want the most recent published snapshot.
Cache path defaults to:
- macOS: ~/Library/Caches/jp_idwr_db/data/<version>/
- Linux: ~/.cache/jp_idwr_db/data/<version>/
- Windows: %LOCALAPPDATA%\jp_idwr_db\Cache\data\<version>\

Prefetch explicitly:

python -m jp_idwr_db data download
python -m jp_idwr_db data download --version latest --force

Environment overrides:

JPINFECT_DATA_VERSION: choose a specific release tag or latest (example: latest)
JPINFECT_DATA_BASE_URL: override asset host base URL
JPINFECT_CACHE_DIR: override local cache root
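
To make the cache layout above concrete, here is a minimal sketch of how a per-version cache directory could be resolved from the documented defaults and the JPINFECT_CACHE_DIR override. The function name default_cache_dir is hypothetical and not part of the package. The sketch assumes the override replaces the per-OS root while data/<version> is still appended; the package may resolve paths differently.

```python
import os
import sys
from pathlib import Path


def default_cache_dir(version: str, platform: str = sys.platform, env=None) -> Path:
    """Resolve the cache directory for one data version (illustrative only)."""
    env = os.environ if env is None else env

    # Assumption: JPINFECT_CACHE_DIR replaces the per-OS root and the
    # "data/<version>" suffix is still appended underneath it.
    override = env.get("JPINFECT_CACHE_DIR")
    if override:
        root = Path(override)
    elif platform == "darwin":
        # macOS: ~/Library/Caches/jp_idwr_db
        root = Path.home() / "Library" / "Caches" / "jp_idwr_db"
    elif platform.startswith("win"):
        # Windows: %LOCALAPPDATA%\jp_idwr_db\Cache
        # Assumption: fall back to ~/AppData/Local when LOCALAPPDATA is unset.
        local = env.get("LOCALAPPDATA") or str(Path.home() / "AppData" / "Local")
        root = Path(local) / "jp_idwr_db" / "Cache"
    else:
        # Linux and other platforms: ~/.cache/jp_idwr_db
        root = Path.home() / ".cache" / "jp_idwr_db"
    return root / "data" / version


# "vYYYY.M.D" is a placeholder for a real release tag.
print(default_cache_dir("vYYYY.M.D"))
```

Passing platform and env explicitly keeps the function easy to check on any machine without touching the real environment.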

Language-independent data access

Release data assets are published as:

manifest.json
one or more .parquet tables (including unified.parquet)
optional jp_idwr_db.duckdb (views over the parquet files)

Manifest schema reference: docs/manifest.schema.json.

Fetch the manifest:

curl -L "https://github.com/AlFontal/jp-idwr-db/releases/latest/download/manifest.json"

Query with DuckDB CLI (when jp_idwr_db.duckdb and parquet files are in the same directory):

duckdb jp_idwr_db.duckdb -c "SELECT year, week, COUNT(*) AS rows FROM unified GROUP BY 1,2 ORDER BY 1 DESC, 2 DESC LIMIT 5;"

Download assets for any language

BASE="https://github.com/AlFontal/jp-idwr-db/releases/latest/download"

mkdir -p jp-idwr-assets
cd jp-idwr-assets
curl -L -O "${BASE}/manifest.json"
curl -L -O "${BASE}/unified.parquet"
curl -L -O "${BASE}/jp_idwr_db.duckdb"
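
The commands above always follow the latest release. For reproducible analyses you may prefer to pin a specific release tag. The sketch below builds asset URLs for either case. The helper name asset_url is hypothetical. The tagged URL pattern follows the --base-url shown in the Development section, and "vYYYY.M.D" is a placeholder, not a real tag.

```python
LATEST_BASE = "https://github.com/AlFontal/jp-idwr-db/releases/latest/download"
TAGGED_BASE = "https://github.com/AlFontal/jp-idwr-db/releases/download/{tag}"


def asset_url(name: str, tag: str = "latest") -> str:
    """Return the download URL for one release asset (illustrative helper)."""
    base = LATEST_BASE if tag == "latest" else TAGGED_BASE.format(tag=tag)
    return f"{base}/{name}"


# Replace the placeholder tag with a release tag that actually exists.
for name in ("manifest.json", "unified.parquet", "jp_idwr_db.duckdb"):
    print(asset_url(name, tag="vYYYY.M.D"))
```

Any HTTP client (curl, urllib, and so on) can fetch the resulting URLs.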

R example (DuckDB, local)

This example opens the local jp_idwr_db.duckdb artifact (downloaded with the parquet files) and queries the unified view. Run it from the directory where jp_idwr_db.duckdb and the parquet files are located:

con <- DBI::dbConnect(duckdb::duckdb(), "jp_idwr_db.duckdb", read_only = TRUE)

tb <- DBI::dbGetQuery(
  con,
  "SELECT date, prefecture, disease, count, source
   FROM unified
   WHERE year = 2024 AND disease = 'Tuberculosis'
   ORDER BY date, prefecture
   LIMIT 20"
)

print(tb)
DBI::dbDisconnect(con, shutdown = TRUE)

        date prefecture      disease count             source
1 2024-01-01      Aichi Tuberculosis     5 All-case reporting
2 2024-01-01      Akita Tuberculosis     1 All-case reporting
3 2024-01-01     Aomori Tuberculosis     0 All-case reporting
4 2024-01-01      Chiba Tuberculosis     7 All-case reporting
5 2024-01-01      Ehime Tuberculosis     1 All-case reporting
6 2024-01-01      Fukui Tuberculosis     1 All-case reporting
...

R example (Arrow, remote)

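You can also read the parquet files directly from the GitHub Release URL without saving them to disk first. Note that arrow::read_parquet() loads the whole table into memory, so the dplyr filters below run locally after the full file has been fetched: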

library(magrittr)

url <- "https://github.com/AlFontal/jp-idwr-db/releases/latest/download/unified.parquet"

tb <- arrow::read_parquet(url) %>%
  dplyr::filter(year == 2024, disease == "Tuberculosis") %>%
  dplyr::select(date, prefecture, disease, count, source) %>%
  dplyr::arrange(date, prefecture)

print(as.data.frame(tb))

        date prefecture      disease count             source
1 2024-01-01      Aichi Tuberculosis     5 All-case reporting
2 2024-01-01      Akita Tuberculosis     1 All-case reporting
3 2024-01-01     Aomori Tuberculosis     0 All-case reporting
4 2024-01-01      Chiba Tuberculosis     7 All-case reporting
5 2024-01-01      Ehime Tuberculosis     1 All-case reporting
6 2024-01-01      Fukui Tuberculosis     1 All-case reporting
...

Main API

Top-level API exported by jp_idwr_db:

load(name, version=...)
get_data(...)
list_diseases(source="all")
list_prefectures()
get_latest_week()
prefecture_map()
attach_prefecture_id(df, prefecture_col="prefecture", id_col="prefecture_id")
merge(...), pivot(...)
configure(...), get_config()

Datasets

Use jp.load(...) with:

"sex": historical sex-disaggregated surveillance
"place": historical place-category surveillance
"bullet": modern all-case weekly reports (rapid zensu)
"sentinel": sentinel reports (teitenrui; 2012+ in release data assets)
"unified": deduplicated combined dataset (sex-total + modern bullet/sentinel, recommended)

Note: teitenrui CSVs report year-to-date cumulative counts. jp-idwr-db converts these to weekly incidence (count_t - count_{t-1} within year/prefecture/disease; first week kept as-is).
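
The conversion described in the note can be sketched in plain Python. This is an illustration of the differencing rule only, not the package's Polars implementation. The function name and the toy rows are made up, and missing values (null counts) are not handled.

```python
from collections import defaultdict


def cumulative_to_weekly(rows):
    """Convert year-to-date cumulative counts into weekly counts.

    rows: iterable of dicts with keys year, week, prefecture, disease, count.
    """
    # Differencing restarts for every year/prefecture/disease series.
    groups = defaultdict(list)
    for row in rows:
        groups[(row["year"], row["prefecture"], row["disease"])].append(row)

    out = []
    for series in groups.values():
        series.sort(key=lambda r: r["week"])
        previous = None
        for row in series:
            # The first week is kept as-is; later weeks are count_t - count_{t-1}.
            weekly = row["count"] if previous is None else row["count"] - previous
            out.append({**row, "count": weekly})
            previous = row["count"]
    return out


# Toy year-to-date values, invented for illustration.
ytd = [
    {"year": 2024, "week": 1, "prefecture": "Tokyo", "disease": "Mumps", "count": 30},
    {"year": 2024, "week": 2, "prefecture": "Tokyo", "disease": "Mumps", "count": 45},
    {"year": 2024, "week": 3, "prefecture": "Tokyo", "disease": "Mumps", "count": 45},
]
print([r["count"] for r in cumulative_to_weekly(ytd)])  # [30, 15, 0]
```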

Detailed schema and coverage are documented in DATASETS.md.

Raw Download and Parsing

Raw file workflows are available in jp_idwr_db.io:

jp_idwr_db.io.download(...)
jp_idwr_db.io.download_recent(...)
jp_idwr_db.io.read(...)

These are useful for refreshing local raw weekly files or debugging parser behavior.

Data Wrangling Examples

See EXAMPLES.md for data wrangling recipes (grouping, trends, regional slices, source-aware filtering).

Disease-by-disease temporal coverage is documented in DISEASES.md.

Data Source

NIID/JIHS infectious disease surveillance publications:

Historical annual archive files (Syu_01_1, Syu_02_1)
Rapid weekly CSV reports (zensuXX.csv, teitenruiXX.csv)

Development

uv sync --all-extras --dev
uv run ruff check .
uv run mypy src
uv run pytest

# Build release data assets (manifest + duckdb + parquet metadata)
uv run --with duckdb --with jsonschema jp-idwr-db-build-assets \
  --data-dir data/parquet \
  --release-tag vYYYY.M.D \
  --base-url https://github.com/AlFontal/jp-idwr-db/releases/download/vYYYY.M.D \
  --schema-path docs/manifest.schema.json

Security and Integrity

Release assets include a manifest.json with SHA256 checksums and file sizes.
ensure_data() verifies each downloaded parquet checksum and size before marking cache complete.
For PyPI publishing, prefer Trusted Publishing (OIDC) over long-lived API tokens.
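
If you download assets yourself, you can run the same kind of size and SHA256 check described above. The sketch below is a standalone illustration, not the package's ensure_data() code. The function name verify_asset is hypothetical, and the expected values are passed in as arguments because the manifest field names are not shown here.

```python
import hashlib
from pathlib import Path


def verify_asset(path: Path, expected_sha256: str, expected_size: int) -> bool:
    """Return True when the file's size and SHA256 digest both match."""
    # Cheap check first: a size mismatch means no hashing is needed.
    if path.stat().st_size != expected_size:
        return False
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Hash in 1 MiB chunks so large parquet files are not loaded at once.
        for chunk in iter(lambda: fh.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_sha256.lower()
```

Take the expected checksum and size for each file from manifest.json and treat a False result as a signal to download the file again.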

License

GPL-3.0-or-later. See LICENSE.

Release history

- 2026.5.13 (May 13, 2026; this version)
- 2026.4.29 (Apr 29, 2026)
- 2026.4.15 (Apr 15, 2026)
- 2026.4.1 (Apr 1, 2026)
- 2026.3.26 (Mar 26, 2026)
- 0.2.6 (Mar 26, 2026)
- 0.2.5 (Feb 7, 2026)
- 0.2.4 (Feb 7, 2026)
- 0.2.3 (Feb 6, 2026)
- 0.2.2 (Feb 6, 2026)

Download files

- Source Distribution: jp_idwr_db-2026.5.13.tar.gz (53.6 kB), uploaded May 13, 2026
- Built Distribution: jp_idwr_db-2026.5.13-py3-none-any.whl (60.7 kB), uploaded May 13, 2026, Python 3

File details

Details for the file jp_idwr_db-2026.5.13.tar.gz.

File metadata

Download URL: jp_idwr_db-2026.5.13.tar.gz
Upload date: May 13, 2026
Size: 53.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.11.14 {"installer":{"name":"uv","version":"0.11.14","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for jp_idwr_db-2026.5.13.tar.gz
Algorithm	Hash digest
SHA256	`dd04437fdf9aaf2da0178d323bfccb0a4074d5ab37a715a95831d68f3f2bf329`
MD5	`5b7b9b824d5e11ca4054e6be1f83be97`
BLAKE2b-256	`85303bea3a483dc2f5be5ba7f33041f6ba609f0c4293a3ef65c51dc4ae3573d9`

File details

Details for the file jp_idwr_db-2026.5.13-py3-none-any.whl.

File metadata

Download URL: jp_idwr_db-2026.5.13-py3-none-any.whl
Upload date: May 13, 2026
Size: 60.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.11.14 {"installer":{"name":"uv","version":"0.11.14","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for jp_idwr_db-2026.5.13-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a849040b767e459e68807c9ac503aa319245aa3ac183163a541dc2e49bdf0166`
MD5	`2036f9afc894e8f7c440573c5c8f34a1`
BLAKE2b-256	`601cb6441659b3c3da9ea23bd0f9be10eb6e69268475a71d3bb5e1b15b38183a`
