Rust-first medical imaging data loading and training utilities.
medkit-rs
Rust-first medical imaging data tooling for dataset validation, deterministic preprocessing caches, and training-time batch access.
Workspace
- medkit-core: spatial image contracts, geometry, dtype, axes, metadata, and provenance.
- medkit-io: metadata readers for imaging formats such as NIfTI.
- medkit-dataset: dataset scanning, case pairing, validation, manifests, and reports.
- medkit-transform: deterministic preprocessing plans and kernels.
- medkit-cache: content-addressed preprocessing cache.
- medkit-sampler: foreground-balanced patch planning and extraction.
- medkit-cxr: CXR manifests, validation, patient-safe splits, and 2D cache creation.
- medkit-python: PyO3 extension for Rust-owned batch extraction.
- medkit-python-ffi: C-ABI bridge retained as a baseline.
- medkit-cli: command-line workflows exposed through the medkit binary.
Core Workflows
Validate an nnU-Net-style NIfTI segmentation dataset:
cargo run -p medkit-cli -- dataset validate ./data \
--images imagesTr \
--labels labelsTr \
--layout nnunet \
--out manifest.json \
--report report.txt
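To make the case-pairing step concrete, here is a minimal Python sketch of how nnU-Net-style file names can be paired. It relies only on the public nnU-Net naming convention (images are `<case>_<4-digit channel>.nii.gz`, labels are `<case>.nii.gz`). The function `pair_cases` and the example file names are hypothetical, and this is not necessarily how medkit-dataset implements pairing.

```python
import re
from collections import defaultdict

# nnU-Net naming convention: images are "<case>_<4-digit channel>.nii.gz",
# labels are "<case>.nii.gz".
IMAGE_RE = re.compile(r"^(?P<case>.+)_(?P<channel>\d{4})\.nii\.gz$")
LABEL_RE = re.compile(r"^(?P<case>.+)\.nii\.gz$")


def pair_cases(image_names, label_names):
    """Group image channels by case and report files without a counterpart."""
    channels = defaultdict(list)
    for name in image_names:
        match = IMAGE_RE.match(name)
        if match:
            channels[match["case"]].append(name)

    labels = {}
    for name in label_names:
        match = LABEL_RE.match(name)
        if match:
            labels[match["case"]] = name

    paired = {
        case: {"images": sorted(files), "label": labels[case]}
        for case, files in channels.items()
        if case in labels
    }
    missing_label = sorted(set(channels) - set(labels))
    missing_image = sorted(set(labels) - set(channels))
    return paired, missing_label, missing_image


# Hypothetical file names, for illustration only.
paired, missing_label, missing_image = pair_cases(
    ["liver_001_0000.nii.gz", "liver_002_0000.nii.gz"],
    ["liver_001.nii.gz", "liver_003.nii.gz"],
)
print(paired, missing_label, missing_image)
```

A validator would typically report the two "missing" lists as errors instead of silently dropping those cases.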
Prepare a deterministic cache:
cargo run -p medkit-cli -- prepare ./data \
--manifest manifest.json \
--plan ct-segmentation.toml \
--cache .medkit/cache \
--chunk 96,96,96
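The idea behind a deterministic, content-addressed cache can be sketched in a few lines of Python. The key is derived from everything that influences the output, so the same input and plan always map to the same entry. The function `cache_key`, the plan fields, and the choice of SHA-256 over canonical JSON are assumptions made for illustration. The actual key layout used by medkit-cache is not described here.

```python
import hashlib
import json


def cache_key(source_sha256, plan, chunk):
    """Derive a cache key from the input digest, the plan, and the chunk shape."""
    # Canonical JSON (sorted keys, fixed separators) keeps the key independent
    # of dictionary insertion order.
    payload = json.dumps(
        {"source": source_sha256, "plan": plan, "chunk": list(chunk)},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Hypothetical plan fields; a real plan file may look different.
plan = {"spacing": [1.0, 1.0, 1.0], "clip": [-1000, 1000]}
reordered_plan = {"clip": [-1000, 1000], "spacing": [1.0, 1.0, 1.0]}
source = hashlib.sha256(b"example volume bytes").hexdigest()

key_a = cache_key(source, plan, (96, 96, 96))
key_b = cache_key(source, reordered_plan, (96, 96, 96))
key_c = cache_key(source, plan, (128, 128, 128))
print(key_a == key_b, key_a == key_c)
```

Changing the plan or the chunk shape produces a new key, so stale entries are never reused by accident.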
Sample and benchmark training patches:
cargo run -p medkit-cli -- sample .medkit/cache \
--patch 96,96,96 \
--strategy foreground-balanced \
--count 10000 \
--seed 123 \
--epoch 0 \
--worker 0 \
--out patches.jsonl
cargo run -p medkit-cli -- bench-plan .medkit/cache \
--patches patches.jsonl \
--workers 8 \
--samples 10000
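As a rough illustration of foreground-balanced planning with `--seed`, `--epoch`, and `--worker`, the Python sketch below draws patch origins from one random stream per (seed, epoch, worker) triple. The function `plan_patches`, the `fg_fraction` parameter, and the way the stream is seeded are assumptions. medkit-sampler may balance and seed differently.

```python
import random


def plan_patches(shape, foreground, patch, count, seed, epoch, worker, fg_fraction=0.5):
    """Plan patch origins; roughly fg_fraction of them are centered on foreground.

    Assumes every patch dimension is no larger than the matching volume dimension.
    """
    # One RNG stream per (seed, epoch, worker) makes each plan reproducible.
    rng = random.Random(f"{seed}:{epoch}:{worker}")
    origins = []
    for _ in range(count):
        if foreground and rng.random() < fg_fraction:
            center = rng.choice(foreground)
        else:
            center = tuple(rng.randrange(dim) for dim in shape)
        # Clamp the origin so the whole patch stays inside the volume.
        origin = tuple(
            min(max(c - p // 2, 0), dim - p)
            for c, p, dim in zip(center, patch, shape)
        )
        origins.append(origin)
    return origins


# Hypothetical volume shape and foreground voxel coordinates.
shape = (128, 160, 160)
patch = (96, 96, 96)
foreground = [(40, 60, 60), (41, 60, 61), (42, 61, 60)]

run_a = plan_patches(shape, foreground, patch, 8, seed=123, epoch=0, worker=0)
run_b = plan_patches(shape, foreground, patch, 8, seed=123, epoch=0, worker=0)
print(run_a == run_b)
```

Because the plan depends only on its inputs, it can be written once and replayed by a benchmark or a training run.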
Prepare a CXR cache for the Python drop-in loader:
cargo run -p medkit-cli -- cxr manifest --images data/cxr/files --metadata metadata.csv --labels labels.csv --out data/cxr/manifest.jsonl
cargo run -p medkit-cli -- cxr validate data/cxr/manifest.jsonl --require-frontal --check-patient-leakage --check-duplicates --report data/cxr/validation.md
cargo run -p medkit-cli -- cxr split data/cxr/manifest.jsonl --by patient_id --train 0.8 --val 0.1 --test 0.1 --seed 0 --out data/cxr/splits.json
cargo run -p medkit-cli -- cxr cache data/cxr/manifest.jsonl --splits data/cxr/splits.json --plan recipes/cxr-512.toml --cache data/cxr/.medkit/cache
cargo run -p medkit-cli -- cxr validate-cache data/cxr/.medkit/cache --split train --plan recipes/cxr-512.toml --report data/cxr/cache-validation.md
uv run --with torch examples/cxr_dropin_pytorch_train.py --cache-dir data/cxr/.medkit/cache --batch-size 32
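The patient-safe split in the `cxr split` step can be pictured with the following Python sketch. It assigns whole patients to train, val, and test so that no patient contributes images to more than one split. The function `split_by_patient` and the record fields are hypothetical stand-ins, not the medkit-cxr implementation.

```python
import random
from collections import defaultdict


def split_by_patient(records, train=0.8, val=0.1, test=0.1, seed=0):
    """Assign whole patients to splits so no patient appears in more than one."""
    if abs(train + val + test - 1.0) > 1e-9:
        raise ValueError("split fractions must sum to 1")

    by_patient = defaultdict(list)
    for record in records:
        by_patient[record["patient_id"]].append(record["image_id"])

    patients = sorted(by_patient)          # stable order before shuffling
    random.Random(seed).shuffle(patients)  # deterministic for a given seed

    n_train = int(len(patients) * train)
    n_val = int(len(patients) * val)
    groups = {
        "train": patients[:n_train],
        "val": patients[n_train:n_train + n_val],
        "test": patients[n_train + n_val:],  # test receives the remainder
    }
    return {
        name: [image for patient in ids for image in by_patient[patient]]
        for name, ids in groups.items()
    }


# Ten hypothetical patients with two images each.
records = [
    {"patient_id": f"p{p:02d}", "image_id": f"p{p:02d}_img{i}"}
    for p in range(10)
    for i in range(2)
]
splits = split_by_patient(records, seed=0)
print({name: len(images) for name, images in splits.items()})
```

Splitting by image instead of by patient would let near-duplicate studies leak between train and test, which is what `--check-patient-leakage` is meant to catch.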
CXR H100 Benchmark Recipe
For Modal benchmark runs against the current source, use the local package build until the published PyPI package includes the latest CXR prefetch arguments:
MEDKIT_MODAL_USE_PYPI=0 python crates/medkit-benchmarks/scripts/modal_cxr_parallel_matrix.py \
--batch-id cxr-confirm-h100-pinned-d2-rw4-local \
--baselines pytorch_raw,medkit_native_prefetch_pinned \
--cache-dtypes float32,uint8 \
--read-modes stream \
--image-size 512 \
--batch-size 32 \
--workers 8 \
--max-samples 6000 \
--max-train 4096 \
--max-val 1024 \
--max-test 1024 \
--epochs 1 \
--loader-batches 64 \
--warmup-batches 4 \
--profile-batches 128 \
--drop-last-train \
--prefetch-depth 2 \
--prefetch-read-workers 4 \
--no-include-metadata \
--max-eval-batches 1 \
--modal-gpu H100
The current fastest confirmed CXR path is native prefetch with pinned batches,
stream reads, prefetch_depth=2, and prefetch_read_workers=4. In the May 20,
2026 H100 confirmation run on the public NIH ChestX-ray14 cache, raw PyTorch
reached 194.7 train samples/s, while medkit pinned stream reached 379.8
samples/s with float32 cache data and 377.4 samples/s with uint8 cache data.
Both medkit rows used about 64 MB of estimated pinned batch memory and reported
near-zero cache-image PSS.
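The reported throughput numbers translate into speedups with a line of arithmetic each, shown here in Python. The second half of the sketch is an assumption, not a statement from the benchmark: it shows that a single-channel 512x512 float32 batch of 32 samples, with two batches in flight (`prefetch_depth=2`), would occupy 64 MiB. How the benchmark actually estimates pinned memory is not documented here.

```python
# Train samples/s from the H100 confirmation run described above.
RAW_PYTORCH = 194.7
MEDKIT_FLOAT32 = 379.8
MEDKIT_UINT8 = 377.4

speedup_float32 = MEDKIT_FLOAT32 / RAW_PYTORCH
speedup_uint8 = MEDKIT_UINT8 / RAW_PYTORCH
print(f"float32: {speedup_float32:.2f}x, uint8: {speedup_uint8:.2f}x")

# ASSUMPTION: single-channel images, float32 batch tensors, two batches in flight.
batch_size, height, width, bytes_per_value, in_flight = 32, 512, 512, 4, 2
pinned_bytes = batch_size * height * width * bytes_per_value * in_flight
pinned_mib = pinned_bytes / 2**20
print(f"{pinned_mib:.0f} MiB")
```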
DICOM Decoder Policy
The default DICOM pixel backend is medkit-native. It keeps normal builds
small and covers the initial CXR-focused support matrix: uncompressed little and
big endian, RLE Lossless, and JPEG Baseline.
An opt-in DICOM-rs backend is available for broader pure-Rust codec coverage:
cargo test -p medkit-dicom --features dicom-rs-codecs
cargo run -p medkit-cli --features dicom-rs-codecs -- dicom pixels --explain image.dcm --decoder-backend dicom-rs
cargo run -p medkit-cli --features dicom-rs-codecs -- cxr cache manifest.jsonl --splits splits.json --plan recipes/cxr-512.toml --cache .medkit/cxr-cache --dicom-decoder-backend auto
Native codec stacks for JPEG-LS or JPEG 2000 are intentionally not enabled in the default package until real fixtures and packaging tradeoffs are verified.
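One way to read this policy is as a lookup on the DICOM transfer syntax UID. The Python sketch below is only an illustration. The UIDs are the standard DICOM values, but the function `choose_backend`, the exact contents of the native set (including Implicit VR Little Endian), and the behavior of `auto` are assumptions, not the documented behavior of the medkit binary.

```python
# Standard DICOM transfer syntax UIDs for the formats named above.
# ASSUMPTION: this is one reading of the native support matrix.
NATIVE_TRANSFER_SYNTAXES = {
    "1.2.840.10008.1.2": "Implicit VR Little Endian",
    "1.2.840.10008.1.2.1": "Explicit VR Little Endian",
    "1.2.840.10008.1.2.2": "Explicit VR Big Endian",
    "1.2.840.10008.1.2.5": "RLE Lossless",
    "1.2.840.10008.1.2.4.50": "JPEG Baseline (Process 1)",
}


def choose_backend(transfer_syntax_uid, requested="auto"):
    """Pick a decoder backend name for a file's transfer syntax.

    ASSUMPTION: "auto" prefers the native backend and falls back to dicom-rs.
    """
    if requested != "auto":
        return requested  # an explicit request is honored as given
    if transfer_syntax_uid in NATIVE_TRANSFER_SYNTAXES:
        return "medkit-native"
    return "dicom-rs"


print(choose_backend("1.2.840.10008.1.2.1"))     # Explicit VR Little Endian
print(choose_backend("1.2.840.10008.1.2.4.90"))  # JPEG 2000 (lossless only)
```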
Python Surface
The CXR drop-in API exposes PyTorch-style dataset and loader helpers:
import medkit_rs as medkit
train_ds = medkit.cxr.Dataset("data/cxr/.medkit/cache", split="train")
train_loader = medkit.cxr.DataLoader(
train_ds,
batch_size=32,
shuffle=True,
prefetch=True,
)
Batches use stable keys: image, labels, mask, and metadata sidecars such
as sample_id, patient_id, study_id, and image_id.
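Training code can guard against schema drift by checking these keys before using a batch. The Python sketch below uses a plain dictionary in place of a real batch. The helper `check_batch` is hypothetical, and treating the sidecar fields as optional top-level keys is an assumption about the batch layout.

```python
REQUIRED_KEYS = ("image", "labels", "mask")
# ASSUMPTION: sidecars are optional top-level keys of the batch mapping.
SIDECAR_KEYS = ("sample_id", "patient_id", "study_id", "image_id")


def check_batch(batch):
    """Raise if a required key is missing; return the sidecars that are present."""
    missing = [key for key in REQUIRED_KEYS if key not in batch]
    if missing:
        raise KeyError(f"batch is missing required keys: {missing}")
    return {key: batch[key] for key in SIDECAR_KEYS if key in batch}


# A plain dict standing in for one batch from the loader.
batch = {
    "image": [[0.0]],
    "labels": [0],
    "mask": [1],
    "patient_id": ["p01"],
}
sidecars = check_batch(batch)
print(sidecars)
```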
Development
Create the development environment and build the native Python extension:
uv sync --dev
uv run maturin develop --release
Run the formatting, lint, and test checks:
cargo fmt --all --check
cargo clippy --workspace --all-targets --locked -- -D warnings
cargo test --workspace --locked --exclude medkit-python
uv run python scripts/run_medkit_python_rust_tests.py
uv run python scripts/check_python_api.py
uv run python -m compileall python tests scripts examples crates/medkit-benchmarks/scripts
uv run pytest tests/python -q
Internal planning, benchmark notes, and generated reports are intentionally ignored by git.