Japanese IDWR infectious disease database and analytics toolkit built on Polars.

jp-idwr-db

Python access to Japanese infectious disease surveillance data from NIID/JIHS.

jp-idwr-db provides a Polars-first API for filtering and analysis. Parquet datasets are versioned as GitHub Release assets and downloaded to a local cache on first use. It is inspired by the R package jpinfect, but it is not an API-parity port and includes independently curated ingestion and coverage.

Install

pip install jp-idwr-db

Data Download Model

  • Package wheels do not ship the large parquet tables.
  • On the first call to jp.load(...) (or jp.get_data(...)), the package downloads versioned data assets from GitHub Releases.
  • Cache path defaults to:
    • macOS: ~/Library/Caches/jp_idwr_db/data/<version>/
    • Linux: ~/.cache/jp_idwr_db/data/<version>/
    • Windows: %LOCALAPPDATA%\jp_idwr_db\Cache\data\<version>\
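The per-platform defaults listed above can be sketched as a small pure function. This is only an illustration: default_cache_dir and its parameters are hypothetical names, and the code mirrors the three documented paths rather than the resolver used inside jp_idwr_db.

```python
from pathlib import PurePosixPath, PureWindowsPath


def default_cache_dir(version, platform, home="~", local_app_data="%LOCALAPPDATA%"):
    """Return the documented default cache directory for one platform.

    Hypothetical helper: it only mirrors the three paths listed above and is
    not the resolver used inside jp_idwr_db.
    """
    if platform == "darwin":  # macOS
        return PurePosixPath(home) / "Library" / "Caches" / "jp_idwr_db" / "data" / version
    if platform == "win32":  # Windows
        return PureWindowsPath(local_app_data) / "jp_idwr_db" / "Cache" / "data" / version
    # Linux and other POSIX systems
    return PurePosixPath(home) / ".cache" / "jp_idwr_db" / "data" / version


print(default_cache_dir("v0.1.0", "linux"))
```

The platform strings follow the values of sys.platform, so passing sys.platform picks the branch for the current machine.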

Prefetch explicitly:

python -m jp_idwr_db data download
python -m jp_idwr_db data download --version v0.1.0 --force
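If the prefetch needs to run from a script (for example in a container build), the same command line can be assembled in Python. A minimal sketch: build_download_command is a hypothetical helper, and it uses only the --version and --force flags shown above.

```python
import sys


def build_download_command(version=None, force=False):
    """Build the argv list for `python -m jp_idwr_db data download`."""
    # sys.executable keeps the call inside the current interpreter/virtualenv.
    cmd = [sys.executable, "-m", "jp_idwr_db", "data", "download"]
    if version is not None:
        cmd += ["--version", version]
    if force:
        cmd.append("--force")
    return cmd


print(build_download_command(version="v0.1.0", force=True)[1:])
```

The resulting list can be passed to subprocess.run(cmd, check=True) in an environment where jp-idwr-db is installed.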

Environment overrides:

  • JPINFECT_DATA_VERSION: choose a specific release tag (example: v0.1.0)
  • JPINFECT_DATA_BASE_URL: override asset host base URL
  • JPINFECT_CACHE_DIR: override local cache root
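One way to apply these overrides for a single block of code, without changing the shell environment, is a small context manager. This is a sketch: temporary_env is a hypothetical helper and /tmp/jp_idwr_cache is a placeholder path. It also assumes the package reads the variables when data is requested; if they are read at import time, set them before importing jp_idwr_db.

```python
import os
from contextlib import contextmanager


@contextmanager
def temporary_env(**overrides):
    """Set environment variables for the duration of a with-block, then restore them."""
    previous = {name: os.environ.get(name) for name in overrides}
    os.environ.update(overrides)
    try:
        yield
    finally:
        for name, old in previous.items():
            if old is None:
                os.environ.pop(name, None)  # variable did not exist before
            else:
                os.environ[name] = old  # put the earlier value back


with temporary_env(JPINFECT_DATA_VERSION="v0.1.0", JPINFECT_CACHE_DIR="/tmp/jp_idwr_cache"):
    # Call jp.load(...) or jp.get_data(...) here so the overrides are visible.
    pinned = os.environ["JPINFECT_DATA_VERSION"]

print(pinned)
```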

Quick Start

import jp_idwr_db as jp

# Full unified dataset (recommended)
df = jp.load("unified")
print(df.select(["prefecture", "disease", "year", "week", "count", "source"]).head(8))

shape: (8, 6)
┌────────────┬─────────────────────────────────┬──────┬──────┬───────┬───────────────────────┐
│ prefecture ┆ disease                         ┆ year ┆ week ┆ count ┆ source                │
│ ---        ┆ ---                             ┆ ---  ┆ ---  ┆ ---   ┆ ---                   │
│ str        ┆ str                             ┆ i32  ┆ i32  ┆ f64   ┆ str                   │
╞════════════╪═════════════════════════════════╪══════╪══════╪═══════╪═══════════════════════╡
│ Tochigi    ┆ Lyme disease                    ┆ 2011 ┆ 24   ┆ 0.0   ┆ Confirmed cases       │
│ Kochi      ┆ Avian influenza H5N1            ┆ 2008 ┆ 51   ┆ 0.0   ┆ Confirmed cases       │
│ Hokkaido   ┆ Dengue fever                    ┆ 1999 ┆ 28   ┆ 0.0   ┆ Confirmed cases       │
│ Tokyo      ┆ Congenital rubella syndrome     ┆ 2014 ┆ 41   ┆ 0.0   ┆ Confirmed cases       │
│ Nagasaki   ┆ Severe Acute Respiratory Syndr… ┆ 2018 ┆ 4    ┆ 0.0   ┆ Confirmed cases       │
│ Fukushima  ┆ Infectious gastroenteritis (on… ┆ 2019 ┆ 25   ┆ 145.0 ┆ Sentinel surveillance │
│ Nara       ┆ Severe invasive streptococcal … ┆ 2003 ┆ 10   ┆ 0.0   ┆ Confirmed cases       │
│ Mie        ┆ Plague                          ┆ 2006 ┆ 37   ┆ 0.0   ┆ Confirmed cases       │
└────────────┴─────────────────────────────────┴──────┴──────┴───────┴───────────────────────┘

import jp_idwr_db as jp

# Optional: attach ISO prefecture IDs (JP-01 ... JP-47) only when needed
df_with_ids = jp.attach_prefecture_id(df, prefecture_col="prefecture", id_col="prefecture_id")
print(df_with_ids.select(["prefecture", "prefecture_id"]).head())

shape: (5, 2)
┌────────────┬───────────────┐
│ prefecture ┆ prefecture_id │
╞════════════╪═══════════════╡
│ Tochigi    ┆ JP-09         │
│ Kochi      ┆ JP-39         │
│ Hokkaido   ┆ JP-01         │
│ Tokyo      ┆ JP-13         │
│ Nagasaki   ┆ JP-42         │
└────────────┴───────────────┘

Main API

Top-level API exported by jp_idwr_db:

  • load(name)
  • get_data(...)
  • list_diseases(source="all")
  • list_prefectures()
  • get_latest_week()
  • prefecture_map()
  • attach_prefecture_id(df, prefecture_col="prefecture", id_col="prefecture_id")
  • merge(...), pivot(...)
  • configure(...), get_config()

Filtered Access with get_data

import jp_idwr_db as jp

# Tuberculosis rows for a year range
tb = jp.get_data(disease="Tuberculosis", year=(2018, 2023))
print(tb.select(["prefecture", "disease", "year", "week", "count", "source"]).head(8))

shape: (8, 6)
┌────────────┬──────────────┬──────┬──────┬───────┬─────────────────┐
│ prefecture ┆ disease      ┆ year ┆ week ┆ count ┆ source          │
│ ---        ┆ ---          ┆ ---  ┆ ---  ┆ ---   ┆ ---             │
│ str        ┆ str          ┆ i32  ┆ i32  ┆ f64   ┆ str             │
╞════════════╪══════════════╪══════╪══════╪═══════╪═════════════════╡
│ Hokkaido   ┆ Tuberculosis ┆ 2020 ┆ 12   ┆ 5.0   ┆ Confirmed cases │
│ Oita       ┆ Tuberculosis ┆ 2023 ┆ 38   ┆ 6.0   ┆ Confirmed cases │
│ Fukuoka    ┆ Tuberculosis ┆ 2021 ┆ 8    ┆ 12.0  ┆ Confirmed cases │
│ Kagawa     ┆ Tuberculosis ┆ 2020 ┆ 19   ┆ 2.0   ┆ Confirmed cases │
│ Chiba      ┆ Tuberculosis ┆ 2020 ┆ 19   ┆ 9.0   ┆ Confirmed cases │
│ Kanagawa   ┆ Tuberculosis ┆ 2022 ┆ 17   ┆ 25.0  ┆ Confirmed cases │
│ Okinawa    ┆ Tuberculosis ┆ 2021 ┆ 11   ┆ 4.0   ┆ Confirmed cases │
│ Gifu       ┆ Tuberculosis ┆ 2018 ┆ 23   ┆ 7.0   ┆ Confirmed cases │
└────────────┴──────────────┴──────┴──────┴───────┴─────────────────┘

import jp_idwr_db as jp

# Sentinel-only diseases from recent years
sentinel = jp.get_data(source="sentinel", year=(2023, 2026))
print(sentinel.select(["prefecture", "disease", "year", "week", "count", "source"]).head(8))

shape: (8, 6)
┌────────────┬─────────────────────────────────┬──────┬──────┬───────┬───────────────────────┐
│ prefecture ┆ disease                         ┆ year ┆ week ┆ count ┆ source                │
│ ---        ┆ ---                             ┆ ---  ┆ ---  ┆ ---   ┆ ---                   │
│ str        ┆ str                             ┆ i32  ┆ i32  ┆ f64   ┆ str                   │
╞════════════╪═════════════════════════════════╪══════╪══════╪═══════╪═══════════════════════╡
│ Ishikawa   ┆ Respiratory syncytial virus in… ┆ 2024 ┆ 42   ┆ 813.0 ┆ Sentinel surveillance │
│ Nara       ┆ Erythema infection              ┆ 2025 ┆ 31   ┆ 823.0 ┆ Sentinel surveillance │
│ Saga       ┆ Mumps                           ┆ 2024 ┆ 26   ┆ 14.0  ┆ Sentinel surveillance │
│ Hyogo      ┆ Pharyngoconjunctival fever      ┆ 2023 ┆ 19   ┆ 468.0 ┆ Sentinel surveillance │
│ Miyazaki   ┆ Infectious gastroenteritis      ┆ 2026 ┆ 3    ┆ 339.0 ┆ Sentinel surveillance │
│ Kagoshima  ┆ Infectious gastroenteritis (on… ┆ 2024 ┆ 9    ┆ null  ┆ Sentinel surveillance │
│ Osaka      ┆ Mumps                           ┆ 2024 ┆ 49   ┆ 404.0 ┆ Sentinel surveillance │
│ Aomori     ┆ Erythema infection              ┆ 2024 ┆ 10   ┆ 5.0   ┆ Sentinel surveillance │
└────────────┴─────────────────────────────────┴──────┴──────┴───────┴───────────────────────┘

Datasets

Use jp.load(...) with:

  • "sex": historical sex-disaggregated surveillance
  • "place": historical place-category surveillance
  • "bullet": modern all-case weekly reports (rapid zensu)
  • "sentinel": sentinel weekly reports (teitenrui; 2012+ in release data assets)
  • "unified": deduplicated combined dataset (sex-total + modern bullet/sentinel, recommended)

Detailed schema and coverage are documented in DATASETS.md.

Raw Download and Parsing

Raw file workflows are available in jp_idwr_db.io:

  • jp_idwr_db.io.download(...)
  • jp_idwr_db.io.download_recent(...)
  • jp_idwr_db.io.read(...)

These are useful for refreshing local raw weekly files or debugging parser behavior.

Data Wrangling Examples

See EXAMPLES.md for Polars-first data wrangling recipes (grouping, trends, regional slices, source-aware filtering).

Disease-by-disease temporal coverage is documented in DISEASES.md.

Data Source

NIID/JIHS infectious disease surveillance publications:

  • Historical annual archive files (Syu_01_1, Syu_02_1)
  • Rapid weekly CSV reports (zensuXX.csv, teitenruiXX.csv)
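As an illustration of the weekly naming pattern, the sketch below builds both file names for a given week. It assumes that the XX placeholder is a zero-padded two-digit week number, which this README does not state explicitly; weekly_report_names is a hypothetical helper, not part of jp_idwr_db.io.

```python
def weekly_report_names(week):
    """Return the (all-case, sentinel) CSV names for one reporting week.

    Assumption: the XX placeholder is a zero-padded two-digit week number.
    """
    if not 1 <= week <= 53:
        raise ValueError(f"week must be between 1 and 53, got {week}")
    return f"zensu{week:02d}.csv", f"teitenrui{week:02d}.csv"


print(weekly_report_names(3))
```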

Development

uv sync --all-extras --dev
uv run ruff check .
uv run mypy src
uv run pytest

Security and Integrity

  • Release assets include a jp_idwr_db-manifest.json with SHA256 checksums.
  • ensure_data() verifies the archive checksum and the checksum of each extracted parquet file before marking the cache complete.
  • For PyPI publishing, prefer Trusted Publishing (OIDC) over long-lived API tokens.
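For readers who want to re-check a cache by hand, the verification idea can be sketched with the standard library. The manifest layout used here ({"files": {"<name>": "<sha256 hex>"}}) is an assumption made for the example, not the documented structure of jp_idwr_db-manifest.json, and both function names are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large parquet files are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_against_manifest(manifest_path, data_dir):
    """Return the names of files whose SHA256 does not match the manifest.

    Assumes a hypothetical manifest layout: {"files": {"<name>": "<sha256 hex>"}}.
    """
    manifest = json.loads(Path(manifest_path).read_text(encoding="utf-8"))
    mismatched = []
    for name, expected in manifest["files"].items():
        target = Path(data_dir) / name
        # A missing file counts as a mismatch.
        if not target.is_file() or sha256_of(target) != expected.lower():
            mismatched.append(name)
    return mismatched
```

An empty list means every file named in the manifest was present and matched its recorded digest.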

License

GPL-3.0-or-later. See LICENSE.
