Rust-first medical imaging data loading and training utilities.

medkit-rs

Rust-first medical imaging data tooling for dataset validation, deterministic preprocessing caches, and training-time batch access.

Workspace

  • medkit-core: spatial image contracts, geometry, dtype, axes, metadata, and provenance.
  • medkit-io: metadata readers for imaging formats such as NIfTI.
  • medkit-dataset: dataset scanning, case pairing, validation, manifests, and reports.
  • medkit-transform: deterministic preprocessing plans and kernels.
  • medkit-cache: content-addressed preprocessing cache.
  • medkit-sampler: foreground-balanced patch planning and extraction.
  • medkit-cxr: CXR manifests, validation, patient-safe splits, and 2D cache creation.
  • medkit-python: PyO3 extension for Rust-owned batch extraction.
  • medkit-python-ffi: C-ABI bridge retained as a baseline.
  • medkit-cli: command-line workflows exposed through the medkit binary.

Core Workflows

Validate an nnU-Net-style NIfTI segmentation dataset:

cargo run -p medkit-cli -- dataset validate ./data \
  --images imagesTr \
  --labels labelsTr \
  --layout nnunet \
  --out manifest.json \
  --report report.txt
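
As a rough illustration of the case-pairing step, the sketch below matches image channels to labels using the usual nnU-Net file naming (`<case>_<4-digit channel>.nii.gz` for images, `<case>.nii.gz` for labels). The function name `pair_cases` and its return shape are hypothetical and are not taken from medkit-dataset.

```python
import re
from collections import defaultdict

# nnU-Net naming: images are "<case>_<4-digit channel>.nii.gz", labels are "<case>.nii.gz".
IMAGE_RE = re.compile(r"^(?P<case>.+)_(?P<channel>\d{4})\.nii\.gz$")
LABEL_SUFFIX = ".nii.gz"


def pair_cases(image_names, label_names):
    """Group image channels by case and report cases missing an image or a label."""
    channels = defaultdict(list)
    for name in image_names:
        match = IMAGE_RE.match(name)
        if match:
            channels[match["case"]].append(name)

    labels = {
        name[: -len(LABEL_SUFFIX)]: name
        for name in label_names
        if name.endswith(LABEL_SUFFIX)
    }

    # Only cases present on both sides are considered paired.
    paired = {
        case: (sorted(channels[case]), labels[case])
        for case in sorted(channels.keys() & labels.keys())
    }
    missing_labels = sorted(channels.keys() - labels.keys())
    missing_images = sorted(labels.keys() - channels.keys())
    return paired, missing_labels, missing_images
```

A validator built this way can list unpaired cases in its report instead of failing on the first mismatch.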

Prepare a deterministic cache:

cargo run -p medkit-cli -- prepare ./data \
  --manifest manifest.json \
  --plan ct-segmentation.toml \
  --cache .medkit/cache \
  --chunk 96,96,96
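
The idea behind a content-addressed cache can be sketched as follows: the key is derived from the input bytes and a canonical encoding of the preprocessing plan, so identical inputs and plans always map to the same entry. This is a minimal sketch under assumptions; the key layout, the `cache_key` name, and the plan fields are hypothetical and may differ from what medkit-cache does.

```python
import hashlib
import json


def cache_key(source_bytes: bytes, plan: dict) -> str:
    """Derive a key from input content plus a canonical encoding of the plan."""
    hasher = hashlib.sha256()
    hasher.update(hashlib.sha256(source_bytes).digest())
    # sort_keys and fixed separators make the encoding independent of dict order.
    canonical_plan = json.dumps(plan, sort_keys=True, separators=(",", ":"))
    hasher.update(canonical_plan.encode("utf-8"))
    return hasher.hexdigest()


# Hypothetical plan fields, for illustration only.
plan_a = {"spacing": [1.0, 1.0, 1.0], "clip": [-1000, 1000]}
plan_b = {"clip": [-1000, 1000], "spacing": [1.0, 1.0, 1.0]}
same_key = cache_key(b"volume", plan_a) == cache_key(b"volume", plan_b)
```

Because the plan is part of the key, changing any preprocessing parameter produces a new entry rather than silently reusing stale data.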

Sample and benchmark training patches:

cargo run -p medkit-cli -- sample .medkit/cache \
  --patch 96,96,96 \
  --strategy foreground-balanced \
  --count 10000 \
  --seed 123 \
  --epoch 0 \
  --worker 0 \
  --out patches.jsonl
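
To show how `--seed`, `--epoch`, and `--worker` can combine into a reproducible plan, here is a minimal sketch of foreground-balanced patch planning. It assumes a fixed foreground fraction and a hash-derived random stream per (seed, epoch, worker) triple; the names `stream_rng`, `plan_patches`, and `fg_fraction` are hypothetical, and the actual medkit-sampler strategy may work differently.

```python
import hashlib
import random


def stream_rng(seed: int, epoch: int, worker: int) -> random.Random:
    """Build a reproducible RNG for one (seed, epoch, worker) triple."""
    digest = hashlib.sha256(f"{seed}:{epoch}:{worker}".encode()).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))


def plan_patches(shape, patch, foreground, count, seed, epoch=0, worker=0, fg_fraction=0.5):
    """Return patch start corners; about fg_fraction are centered on a foreground voxel.

    Assumes every dimension of `shape` is at least as large as `patch`.
    """
    rng = stream_rng(seed, epoch, worker)
    max_start = [s - p for s, p in zip(shape, patch)]
    starts = []
    for _ in range(count):
        if foreground and rng.random() < fg_fraction:
            center = rng.choice(foreground)
            # Center on the voxel, then clamp so the patch stays inside the volume.
            start = tuple(
                min(max(c - p // 2, 0), m)
                for c, p, m in zip(center, patch, max_start)
            )
        else:
            start = tuple(rng.randint(0, m) for m in max_start)
        starts.append(start)
    return starts
```

Rerunning with the same three values reproduces the plan exactly, while changing the worker or epoch yields a different stream.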

cargo run -p medkit-cli -- bench-plan .medkit/cache \
  --patches patches.jsonl \
  --workers 8 \
  --samples 10000

Prepare a CXR (chest X-ray) cache and train with the Python drop-in loader:

cargo run -p medkit-cli -- cxr manifest --images data/cxr/files --metadata metadata.csv --labels labels.csv --out data/cxr/manifest.jsonl
cargo run -p medkit-cli -- cxr validate data/cxr/manifest.jsonl --require-frontal --check-patient-leakage --check-duplicates --report data/cxr/validation.md
cargo run -p medkit-cli -- cxr split data/cxr/manifest.jsonl --by patient_id --train 0.8 --val 0.1 --test 0.1 --seed 0 --out data/cxr/splits.json
cargo run -p medkit-cli -- cxr cache data/cxr/manifest.jsonl --splits data/cxr/splits.json --plan recipes/cxr-512.toml --cache data/cxr/.medkit/cache
cargo run -p medkit-cli -- cxr validate-cache data/cxr/.medkit/cache --split train --plan recipes/cxr-512.toml --report data/cxr/cache-validation.md
uv run --with torch examples/cxr_dropin_pytorch_train.py --cache-dir data/cxr/.medkit/cache --batch-size 32
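
The patient-safe split in the `cxr split` step can be illustrated with a small sketch: patients, not images, are assigned to splits, so no patient appears in more than one split. The ordering scheme (a seeded hash of the patient ID) and the function name `split_by_patient` are assumptions for illustration, not the medkit-cxr implementation.

```python
import hashlib


def split_by_patient(patient_ids, train=0.8, val=0.1, seed=0):
    """Assign each patient (not each image) to exactly one split."""

    def rank(pid):
        # A seeded hash gives a stable pseudo-random order independent of input order.
        return hashlib.sha256(f"{seed}:{pid}".encode()).hexdigest()

    patients = sorted(set(patient_ids), key=rank)
    n_train = round(len(patients) * train)
    n_val = round(len(patients) * val)

    assignment = {}
    for index, pid in enumerate(patients):
        if index < n_train:
            assignment[pid] = "train"
        elif index < n_train + n_val:
            assignment[pid] = "val"
        else:
            assignment[pid] = "test"
    return assignment
```

Each image then inherits the split of its patient, which is what prevents leakage when one patient has several studies.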

CXR H100 Benchmark Recipe

To benchmark the current source tree on Modal, use the local package build until the published PyPI package includes the latest CXR prefetch arguments:

MEDKIT_MODAL_USE_PYPI=0 python crates/medkit-benchmarks/scripts/modal_cxr_parallel_matrix.py \
  --batch-id cxr-confirm-h100-pinned-d2-rw4-local \
  --baselines pytorch_raw,medkit_native_prefetch_pinned \
  --cache-dtypes float32,uint8 \
  --read-modes stream \
  --image-size 512 \
  --batch-size 32 \
  --workers 8 \
  --max-samples 6000 \
  --max-train 4096 \
  --max-val 1024 \
  --max-test 1024 \
  --epochs 1 \
  --loader-batches 64 \
  --warmup-batches 4 \
  --profile-batches 128 \
  --drop-last-train \
  --prefetch-depth 2 \
  --prefetch-read-workers 4 \
  --no-include-metadata \
  --max-eval-batches 1 \
  --modal-gpu H100

The fastest confirmed CXR path so far is native prefetch with pinned batches, stream reads, prefetch_depth=2, and prefetch_read_workers=4. In the May 20, 2026 H100 confirmation run on the public NIH ChestX-ray14 cache, raw PyTorch reached 194.7 train samples/s. The medkit pinned stream path reached 379.8 samples/s with float32 cache data and 377.4 samples/s with uint8 cache data, about 1.95x and 1.94x the raw PyTorch rate. Both medkit rows used about 64 MB of estimated pinned batch memory and reported near-zero cache-image PSS.
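
The arithmetic behind these figures can be checked with a short script. The throughput numbers come from the run described above. The pinned-memory formula is an assumption: the source does not say how the estimate was computed, but single-channel float32 batches with one pinned buffer per prefetch slot happen to give a figure consistent with "about 64 MB".

```python
# Train samples/s reported for the H100 confirmation run.
RAW_PYTORCH = 194.7
MEDKIT_FLOAT32 = 379.8
MEDKIT_UINT8 = 377.4

speedup_float32 = MEDKIT_FLOAT32 / RAW_PYTORCH
speedup_uint8 = MEDKIT_UINT8 / RAW_PYTORCH

# Assumption: single-channel float32 batches, one pinned buffer per prefetch slot.
batch_size, height, width, bytes_per_value, prefetch_depth = 32, 512, 512, 4, 2
pinned_bytes = batch_size * height * width * bytes_per_value * prefetch_depth

print(f"float32 speedup: {speedup_float32:.2f}x")   # 1.95x
print(f"uint8 speedup:   {speedup_uint8:.2f}x")     # 1.94x
print(f"pinned estimate: {pinned_bytes / 2**20:.0f} MiB")  # 64 MiB
```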

DICOM Decoder Policy

The default DICOM pixel backend is medkit-native. It keeps normal builds small and covers the initial CXR-focused support matrix: uncompressed little-endian and big-endian data, RLE Lossless, and JPEG Baseline.

An opt-in DICOM-rs backend is available for broader pure-Rust codec coverage:

cargo test -p medkit-dicom --features dicom-rs-codecs
cargo run -p medkit-cli --features dicom-rs-codecs -- dicom pixels --explain image.dcm --decoder-backend dicom-rs
cargo run -p medkit-cli --features dicom-rs-codecs -- cxr cache manifest.jsonl --splits splits.json --plan recipes/cxr-512.toml --cache .medkit/cxr-cache --dicom-decoder-backend auto

Native codec stacks for JPEG-LS or JPEG 2000 are intentionally not enabled in the default package until real fixtures and packaging tradeoffs are verified.
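
One way to picture an `auto` backend policy is a lookup on the DICOM transfer syntax UID: use the native decoder when the syntax is in its support matrix, otherwise fall back to the opt-in backend if it was compiled in. The UIDs below are the standard DICOM identifiers; mapping "uncompressed little and big endian" to these three UIDs, the `choose_backend` function, and the fallback rule itself are assumptions, since the source does not describe how `auto` decides.

```python
# Standard DICOM transfer syntax UIDs assumed to match the native support matrix.
NATIVE_TRANSFER_SYNTAXES = {
    "1.2.840.10008.1.2": "Implicit VR Little Endian",
    "1.2.840.10008.1.2.1": "Explicit VR Little Endian",
    "1.2.840.10008.1.2.2": "Explicit VR Big Endian",
    "1.2.840.10008.1.2.5": "RLE Lossless",
    "1.2.840.10008.1.2.4.50": "JPEG Baseline (Process 1)",
}


def choose_backend(transfer_syntax_uid: str, dicom_rs_enabled: bool) -> str:
    """Hypothetical 'auto' policy: prefer native, fall back to dicom-rs when built in."""
    if transfer_syntax_uid in NATIVE_TRANSFER_SYNTAXES:
        return "medkit-native"
    if dicom_rs_enabled:
        # Whether dicom-rs can decode this syntax is not checked in this sketch.
        return "dicom-rs"
    raise ValueError(f"no enabled backend for transfer syntax {transfer_syntax_uid}")
```

For example, a JPEG 2000 file (`1.2.840.10008.1.2.4.90`) would be routed to the opt-in backend in this sketch, and would raise if the `dicom-rs-codecs` feature were absent.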

Python Surface

The CXR drop-in API exposes PyTorch-style dataset and loader helpers:

import medkit_rs as medkit

train_ds = medkit.cxr.Dataset("data/cxr/.medkit/cache", split="train")
train_loader = medkit.cxr.DataLoader(
    train_ds,
    batch_size=32,
    shuffle=True,
    prefetch=True,
)

Batches use stable keys: image, labels, mask, and metadata sidecars such as sample_id, patient_id, study_id, and image_id.
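
Training code that relies on these keys can guard against a mismatched cache with a small check. The sketch below uses a plain dict as a stand-in for a batch and assumes the sidecars appear as top-level keys and are optional; both points are assumptions, and `missing_batch_keys` is a hypothetical helper rather than part of medkit_rs.

```python
REQUIRED_KEYS = ("image", "labels", "mask")
SIDECAR_KEYS = ("sample_id", "patient_id", "study_id", "image_id")


def missing_batch_keys(batch: dict, require_sidecars: bool = False) -> list:
    """Return the expected keys that are absent from a batch, in a stable order."""
    expected = REQUIRED_KEYS + (SIDECAR_KEYS if require_sidecars else ())
    return [key for key in expected if key not in batch]


# Stand-in batch; real batches would hold tensors rather than None.
example_batch = {"image": None, "labels": None, "mask": None, "sample_id": ["a"]}
```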

Development

Create the development environment and build the native Python extension:

uv sync --dev
uv run maturin develop --release

Run the full check suite (formatting, lints, Rust tests, and Python tests):

cargo fmt --all --check
cargo clippy --workspace --all-targets --locked -- -D warnings
cargo test --workspace --locked --exclude medkit-python
uv run python scripts/run_medkit_python_rust_tests.py
uv run python scripts/check_python_api.py
uv run python -m compileall python tests scripts examples crates/medkit-benchmarks/scripts
uv run pytest tests/python -q

Internal planning, benchmark notes, and generated reports are intentionally ignored by git.

Download files

Source Distribution

medkit_rs-0.1.1.tar.gz (185.8 kB)

Uploaded Source

Built Distributions

medkit_rs-0.1.1-cp311-abi3-win_amd64.whl (526.1 kB)

Uploaded CPython 3.11+, Windows x86-64

medkit_rs-0.1.1-cp311-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (762.0 kB)

Uploaded CPython 3.11+, manylinux: glibc 2.17+ x86-64

medkit_rs-0.1.1-cp311-abi3-macosx_11_0_arm64.whl (616.7 kB)

Uploaded CPython 3.11+, macOS 11.0+ ARM64

medkit_rs-0.1.1-cp311-abi3-macosx_10_12_x86_64.whl (632.4 kB)

Uploaded CPython 3.11+, macOS 10.12+ x86-64

File details

Details for the file medkit_rs-0.1.1.tar.gz.

File metadata

  • Download URL: medkit_rs-0.1.1.tar.gz
  • Upload date:
  • Size: 185.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for medkit_rs-0.1.1.tar.gz
Algorithm Hash digest
SHA256 e27c63ca61cb8837351e4e3d23864d4430b33d609c6f2e3fd1a1828d5b861b77
MD5 1b6bf643c174661b2c54c85e7fbe576f
BLAKE2b-256 4ce92a5df7ab8c1ffc9bf4296d4f69ebaca0102746d66d2633ec3c602c6dbe4a

See more details on using hashes here.

Provenance

The following attestation bundles were made for medkit_rs-0.1.1.tar.gz:

Publisher: release.yml on ainergiz/medkit-rs

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file medkit_rs-0.1.1-cp311-abi3-win_amd64.whl.

File metadata

  • Download URL: medkit_rs-0.1.1-cp311-abi3-win_amd64.whl
  • Upload date:
  • Size: 526.1 kB
  • Tags: CPython 3.11+, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for medkit_rs-0.1.1-cp311-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 c9a818f81df3103ed9c308a14c0642e306cfd9d4c35b010ac0c0f41cfc079fbc
MD5 0835b9b81e5ebdd46bf4468ee77f91d1
BLAKE2b-256 dc1ae39be11c20a2ca609cd12060bb1252065d22bd17579eea4e80533bffc46f

Provenance

The following attestation bundles were made for medkit_rs-0.1.1-cp311-abi3-win_amd64.whl:

Publisher: release.yml on ainergiz/medkit-rs

File details

Details for the file medkit_rs-0.1.1-cp311-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for medkit_rs-0.1.1-cp311-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 d2eb13e1f1807fba24da12513ab349b98c78208989dc5630013a0b8e8f507228
MD5 a7c0d0a32b682fd409a89fd526c83244
BLAKE2b-256 e78ceefb55ad6917f5d39b1b6d6bd5eecc227bc9bfd655721aa5d31c4e65adbf

Provenance

The following attestation bundles were made for medkit_rs-0.1.1-cp311-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: release.yml on ainergiz/medkit-rs

File details

Details for the file medkit_rs-0.1.1-cp311-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for medkit_rs-0.1.1-cp311-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 064bc4a3624387f1a71c0bb9f363b7f819a528159acc7f6f3a7c34d49664db93
MD5 e1f62c80d6042afdd6a27e06931d283e
BLAKE2b-256 98de9f8ad59fc6aa8b6428841875b5a176de388cdc26304b9c6cd334d45c4f89

Provenance

The following attestation bundles were made for medkit_rs-0.1.1-cp311-abi3-macosx_11_0_arm64.whl:

Publisher: release.yml on ainergiz/medkit-rs

File details

Details for the file medkit_rs-0.1.1-cp311-abi3-macosx_10_12_x86_64.whl.

File hashes

Hashes for medkit_rs-0.1.1-cp311-abi3-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 7799bade8c0a5ad4970b6ae9a77d98aed5e0e089df5fae5a044a3d1c679d8973
MD5 4eaf11d4456a6a0cc7bbb7ad730be5e7
BLAKE2b-256 d6b39ce1b7a940cbe72b7599412608dd92321cd2e93695eb271a6724dd9bfdc6

Provenance

The following attestation bundles were made for medkit_rs-0.1.1-cp311-abi3-macosx_10_12_x86_64.whl:

Publisher: release.yml on ainergiz/medkit-rs
