Rust-first medical imaging data loading and training utilities.

medkit-rs

Rust-first medical imaging data tooling for dataset validation, deterministic preprocessing caches, and training-time batch access.

Workspace

  • medkit-core: spatial image contracts, geometry, dtype, axes, metadata, and provenance.
  • medkit-io: metadata readers for imaging formats such as NIfTI.
  • medkit-dataset: dataset scanning, case pairing, validation, manifests, and reports.
  • medkit-transform: deterministic preprocessing plans and kernels.
  • medkit-cache: content-addressed preprocessing cache.
  • medkit-sampler: foreground-balanced patch planning and extraction.
  • medkit-cxr: chest X-ray (CXR) manifests, validation, patient-safe splits, and 2D cache creation.
  • medkit-python: PyO3 extension for Rust-owned batch extraction.
  • medkit-python-ffi: C-ABI bridge retained as a baseline.
  • medkit-cli: command-line workflows exposed through the medkit binary.

Core Workflows

Validate an nnU-Net-style NIfTI segmentation dataset:

cargo run -p medkit-cli -- dataset validate ./data \
  --images imagesTr \
  --labels labelsTr \
  --layout nnunet \
  --out manifest.json \
  --report report.txt
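
For orientation, the nnU-Net layout names image files `<case>_<channel>.nii.gz` (four-digit channel index) under imagesTr and label files `<case>.nii.gz` under labelsTr. The sketch below shows the kind of case-pairing check a validator can run on those names. `pair_cases` and the sample file names are hypothetical and only illustrate the idea; this is not medkit-dataset's implementation.

```python
import re
from collections import defaultdict

# nnU-Net convention: imagesTr/<case>_<4-digit channel>.nii.gz, labelsTr/<case>.nii.gz
IMAGE_RE = re.compile(r"^(?P<case>.+)_(?P<channel>\d{4})\.nii\.gz$")
LABEL_RE = re.compile(r"^(?P<case>.+)\.nii\.gz$")


def pair_cases(image_names, label_names):
    """Group image channels by case and report cases missing an image or a label."""
    channels = defaultdict(list)
    for name in image_names:
        match = IMAGE_RE.match(name)
        if match:
            channels[match["case"]].append(int(match["channel"]))
    labels = set()
    for name in label_names:
        match = LABEL_RE.match(name)
        if match:
            labels.add(match["case"])
    paired = {case: sorted(chs) for case, chs in channels.items() if case in labels}
    missing_label = sorted(set(channels) - labels)
    missing_image = sorted(labels - set(channels))
    return paired, missing_label, missing_image


# Hypothetical file names, for illustration only.
paired, missing_label, missing_image = pair_cases(
    ["liver_001_0000.nii.gz", "liver_002_0000.nii.gz"],
    ["liver_001.nii.gz", "liver_003.nii.gz"],
)
print(paired, missing_label, missing_image)
```

Here liver_001 pairs cleanly, liver_002 has an image but no label, and liver_003 has a label but no image.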

Prepare a deterministic cache:

cargo run -p medkit-cli -- prepare ./data \
  --manifest manifest.json \
  --plan ct-segmentation.toml \
  --cache .medkit/cache \
  --chunk 96,96,96
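
A content-addressed cache derives each entry's key from the inputs that determine its output, so the same image, plan, and chunk shape always map to the same entry, and any change yields a new key. The sketch below illustrates that idea only. The key layout, the `cache_key` helper, the choice of SHA-256, and the sample plan text are all assumptions, not medkit-cache's actual scheme.

```python
import hashlib


def cache_key(image_bytes: bytes, plan_text: str, chunk: tuple) -> str:
    """Hash the inputs that determine the preprocessed output."""
    hasher = hashlib.sha256()
    fields = (image_bytes, plan_text.encode("utf-8"), repr(chunk).encode("utf-8"))
    for field in fields:
        # Length-prefix each field so adjacent fields cannot run into one another.
        hasher.update(len(field).to_bytes(8, "little"))
        hasher.update(field)
    return hasher.hexdigest()


# The plan text is a made-up placeholder, not a real medkit plan.
key_a = cache_key(b"voxels", "spacing = [1.0, 1.0, 1.0]", (96, 96, 96))
key_b = cache_key(b"voxels", "spacing = [1.0, 1.0, 1.0]", (96, 96, 96))
key_c = cache_key(b"voxels", "spacing = [2.0, 2.0, 2.0]", (96, 96, 96))
print(key_a == key_b, key_a == key_c)
```

Identical inputs reuse the same key, while editing the plan produces a different key and therefore a fresh cache entry.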

Sample and benchmark training patches:

cargo run -p medkit-cli -- sample .medkit/cache \
  --patch 96,96,96 \
  --strategy foreground-balanced \
  --count 10000 \
  --seed 123 \
  --epoch 0 \
  --worker 0 \
  --out patches.jsonl

cargo run -p medkit-cli -- bench-plan .medkit/cache \
  --patches patches.jsonl \
  --workers 8 \
  --samples 10000
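
The sample command takes --seed, --epoch, and --worker. One common way to make such a plan reproducible is to derive an independent random stream from that triple and then draw patch centers from foreground voxels with a fixed probability. The sketch below assumes that approach. `stream_rng`, `plan_patch_centers`, the 0.5 foreground fraction, and the toy coordinates are hypothetical and do not describe medkit-sampler's internals.

```python
import hashlib
import random


def stream_rng(seed: int, epoch: int, worker: int) -> random.Random:
    """Derive a reproducible RNG for one (seed, epoch, worker) triple."""
    digest = hashlib.sha256(f"{seed}:{epoch}:{worker}".encode("utf-8")).digest()
    return random.Random(int.from_bytes(digest[:8], "little"))


def plan_patch_centers(foreground, background, count, rng, foreground_fraction=0.5):
    """Pick patch centers, drawing from foreground voxels with the given probability."""
    centers = []
    for _ in range(count):
        pool = foreground if rng.random() < foreground_fraction else background
        centers.append(rng.choice(pool))
    return centers


# Toy voxel coordinates standing in for a labeled volume.
foreground = [(40, 52, 48), (41, 52, 48)]
background = [(10, 10, 10), (80, 80, 80), (5, 90, 30)]

first = plan_patch_centers(foreground, background, 8, stream_rng(123, 0, 0))
again = plan_patch_centers(foreground, background, 8, stream_rng(123, 0, 0))
print(first == again)
```

Re-running with the same seed, epoch, and worker reproduces the plan exactly, which is what makes a saved patch list usable as a benchmark input.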

Build, validate, split, and cache a CXR dataset, then train with the Python drop-in loader:

cargo run -p medkit-cli -- cxr manifest --images data/cxr/files --metadata metadata.csv --labels labels.csv --out data/cxr/manifest.jsonl
cargo run -p medkit-cli -- cxr validate data/cxr/manifest.jsonl --require-frontal --check-patient-leakage --check-duplicates --report data/cxr/validation.md
cargo run -p medkit-cli -- cxr split data/cxr/manifest.jsonl --by patient_id --train 0.8 --val 0.1 --test 0.1 --seed 0 --out data/cxr/splits.json
cargo run -p medkit-cli -- cxr cache data/cxr/manifest.jsonl --splits data/cxr/splits.json --plan recipes/cxr-512.toml --cache data/cxr/.medkit/cache
cargo run -p medkit-cli -- cxr validate-cache data/cxr/.medkit/cache --split train --plan recipes/cxr-512.toml --report data/cxr/cache-validation.md
uv run --with torch examples/cxr_dropin_pytorch_train.py --cache-dir data/cxr/.medkit/cache --batch-size 32
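
Splitting by patient_id keeps every image from one patient in a single split, which is what prevents patient leakage between train, val, and test. A minimal sketch of that idea follows, using the same fractions and seed as the command above. `split_by_patient` and the record fields are assumptions for illustration, not the algorithm medkit-cxr uses.

```python
import random
from collections import defaultdict


def split_by_patient(records, train=0.8, val=0.1, test=0.1, seed=0):
    """Assign whole patients to splits so no patient appears in more than one."""
    if abs(train + val + test - 1.0) > 1e-9:
        raise ValueError("split fractions must sum to 1")
    by_patient = defaultdict(list)
    for record in records:
        by_patient[record["patient_id"]].append(record["image_id"])
    patients = sorted(by_patient)  # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(patients)
    n_train = round(len(patients) * train)
    n_val = round(len(patients) * val)
    groups = {
        "train": patients[:n_train],
        "val": patients[n_train:n_train + n_val],
        "test": patients[n_train + n_val:],  # remainder goes to test
    }
    return {
        name: [image for patient in ids for image in by_patient[patient]]
        for name, ids in groups.items()
    }


# Ten hypothetical patients with two images each.
records = [{"patient_id": f"p{i // 2}", "image_id": f"img{i}"} for i in range(20)]
splits = split_by_patient(records)
print({name: len(images) for name, images in splits.items()})
# → {'train': 16, 'val': 2, 'test': 2}
```

The split is made over patients, not images, so the image counts follow from how many images each assigned patient has.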

DICOM Decoder Policy

The default DICOM pixel backend is medkit-native. It keeps default builds small and covers the initial CXR-focused support matrix: uncompressed little-endian and big-endian pixel data, RLE Lossless, and JPEG Baseline.

An opt-in DICOM-rs backend is available for broader pure-Rust codec coverage:

cargo test -p medkit-dicom --features dicom-rs-codecs
cargo run -p medkit-cli --features dicom-rs-codecs -- dicom pixels --explain image.dcm --decoder-backend dicom-rs
cargo run -p medkit-cli --features dicom-rs-codecs -- cxr cache manifest.jsonl --splits splits.json --plan recipes/cxr-512.toml --cache .medkit/cxr-cache --dicom-decoder-backend auto

Native codec stacks for JPEG-LS and JPEG 2000 are intentionally left out of the default package until they have been verified against real fixtures and the packaging tradeoffs are understood.
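
To make the policy concrete, the sketch below maps DICOM transfer syntax UIDs to a backend choice. The UIDs are the standard ones for the formats listed above. The selection rule itself (prefer medkit-native, fall back to dicom-rs only when that feature is compiled in) is an assumption about how an `auto` mode could behave, and `choose_backend` is a hypothetical helper, not the CLI's real logic.

```python
# Standard DICOM transfer syntax UIDs for the default support matrix.
# Treating both uncompressed little-endian variants as covered is an assumption.
NATIVE_SYNTAXES = {
    "1.2.840.10008.1.2": "Implicit VR Little Endian",
    "1.2.840.10008.1.2.1": "Explicit VR Little Endian",
    "1.2.840.10008.1.2.2": "Explicit VR Big Endian",
    "1.2.840.10008.1.2.5": "RLE Lossless",
    "1.2.840.10008.1.2.4.50": "JPEG Baseline (Process 1)",
}


def choose_backend(transfer_syntax_uid: str, dicom_rs_enabled: bool) -> str:
    """Prefer the native decoder; fall back to dicom-rs only if it was compiled in."""
    if transfer_syntax_uid in NATIVE_SYNTAXES:
        return "medkit-native"
    if dicom_rs_enabled:
        return "dicom-rs"
    raise ValueError(
        f"transfer syntax {transfer_syntax_uid} needs the dicom-rs-codecs feature"
    )


print(choose_backend("1.2.840.10008.1.2.5", dicom_rs_enabled=False))
# 1.2.840.10008.1.2.4.70 is JPEG Lossless (Process 14, Selection Value 1).
print(choose_backend("1.2.840.10008.1.2.4.70", dicom_rs_enabled=True))
```

Failing loudly for an uncovered syntax, instead of silently skipping the file, keeps gaps in codec coverage visible during cache creation.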

Python Surface

The CXR drop-in API exposes PyTorch-style dataset and loader helpers:

import medkit_rs as medkit

train_ds = medkit.cxr.Dataset("data/cxr/.medkit/cache", split="train")
train_loader = medkit.cxr.DataLoader(
    train_ds,
    batch_size=32,
    shuffle=True,
    pin_memory=True,
    prefetch=True,
)

Each batch is keyed by stable names: image, labels, and mask, plus metadata sidecars such as sample_id, patient_id, study_id, and image_id.
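
Because the keys are stable, a training loop can check a batch before using it. The sketch below runs on a stand-in dictionary and does not import medkit_rs. `check_batch` is a hypothetical helper, and it assumes that all four sidecars named above are present and that every value has one entry per sample, which the loader may not guarantee.

```python
REQUIRED_KEYS = ("image", "labels", "mask")
SIDECAR_KEYS = ("sample_id", "patient_id", "study_id", "image_id")


def check_batch(batch: dict) -> int:
    """Return the batch size after checking keys and per-sample lengths agree."""
    expected = REQUIRED_KEYS + SIDECAR_KEYS
    missing = [key for key in expected if key not in batch]
    if missing:
        raise KeyError(f"batch is missing keys: {missing}")
    sizes = {key: len(batch[key]) for key in expected}
    if len(set(sizes.values())) != 1:
        raise ValueError(f"inconsistent per-sample lengths: {sizes}")
    return sizes["image"]


# Stand-in batch of two samples; a real batch would hold tensors instead of lists.
fake_batch = {
    "image": [[0.0], [1.0]],
    "labels": [[0, 1], [1, 0]],
    "mask": [[1, 1], [1, 0]],
    "sample_id": ["s0", "s1"],
    "patient_id": ["p0", "p1"],
    "study_id": ["st0", "st1"],
    "image_id": ["i0", "i1"],
}
print(check_batch(fake_batch))
# → 2
```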

Development

Create the development environment and build the native Python extension:

uv sync --dev
uv run maturin develop --release

Run the full set of formatting, lint, and test checks:

cargo fmt --all --check
cargo clippy --workspace --all-targets --locked -- -D warnings
cargo test --workspace --locked --exclude medkit-python
uv run python scripts/run_medkit_python_rust_tests.py
uv run python scripts/check_python_api.py
uv run python -m compileall python tests scripts examples crates/medkit-benchmarks/scripts
uv run pytest tests/python -q

Internal planning, benchmark notes, and generated reports are intentionally ignored by git.

Download files

Source Distribution

  • medkit_rs-0.1.0.tar.gz (181.1 kB): Source

Built Distributions

  • medkit_rs-0.1.0-cp311-abi3-win_amd64.whl (521.4 kB): CPython 3.11+, Windows x86-64
  • medkit_rs-0.1.0-cp311-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (754.9 kB): CPython 3.11+, manylinux (glibc 2.17+), x86-64
  • medkit_rs-0.1.0-cp311-abi3-macosx_11_0_arm64.whl (608.7 kB): CPython 3.11+, macOS 11.0+, ARM64
  • medkit_rs-0.1.0-cp311-abi3-macosx_10_12_x86_64.whl (624.3 kB): CPython 3.11+, macOS 10.12+, x86-64

File details

medkit_rs-0.1.0.tar.gz

  • Size: 181.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing: Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12
  • SHA256: 76aed369f3185e4968a53b96abaa36cd797a7704d27428e95a4d7f58354c3798
  • MD5: 18e730bf876d0fb0dea322d1693bfcbe
  • BLAKE2b-256: a3c5e497b2b2b8642561fdff0ec1a5a18a4af43254a9c84b8f2dfb260acd5ace
  • Provenance: attestation bundle published by release.yml on ainergiz/medkit-rs

medkit_rs-0.1.0-cp311-abi3-win_amd64.whl

  • Size: 521.4 kB
  • Tags: CPython 3.11+, Windows x86-64
  • Uploaded using Trusted Publishing: Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12
  • SHA256: 026e2d4c372a4dd90065d2dd87a4cefad786b088abdb24ffca039d293639fa5a
  • MD5: 1ff4f573aa69dc853d64f0a21f955687
  • BLAKE2b-256: 30c96f620e905fe623fe172e594427bebaca7e129466ee32aa855a1e9b485ef1
  • Provenance: attestation bundle published by release.yml on ainergiz/medkit-rs

medkit_rs-0.1.0-cp311-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl

  • SHA256: 49051787d8fdf515203b97c771551d4384f2f6aeae5a6a9e0f0f4df49e7f4c8c
  • MD5: faf24a0ede4ce16afa13ab3d646ead2b
  • BLAKE2b-256: 4b31737406e50539e363a73e4eb6df12b257440d9c843b7c86e2e88cfaabde79
  • Provenance: attestation bundle published by release.yml on ainergiz/medkit-rs

medkit_rs-0.1.0-cp311-abi3-macosx_11_0_arm64.whl

  • SHA256: 25838d2b3627f6b6c4699625e56154bab2f3792ca2336469ea3142450e3b2568
  • MD5: 0df42b1b0b9d32c0a18526cbcf2c05c5
  • BLAKE2b-256: 1315878b0db4a9e11d06f81864470a862f38a286b0c4ce5952c87eaadb4e064f
  • Provenance: attestation bundle published by release.yml on ainergiz/medkit-rs

medkit_rs-0.1.0-cp311-abi3-macosx_10_12_x86_64.whl

  • SHA256: e7a8f971112c48bb4ad885477fefc1f20fff8f24341b4a7eea576d95457f524b
  • MD5: 5de1ff3557ce998b09045ce261a299cd
  • BLAKE2b-256: b182039e77715ee9e69974e82c8d6757e3b256c4e914f409e882721675ddf0ff
  • Provenance: attestation bundle published by release.yml on ainergiz/medkit-rs

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
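
The SHA256 digests listed above can be checked against a downloaded file before installing it. The sketch below uses only the Python standard library. `sha256_of` and `matches` are hypothetical helper names, and the demonstration runs on a temporary file rather than a real download.

```python
import hashlib
import hmac
import os
import tempfile


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream the file in chunks so large archives need not fit in memory."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches(path: str, expected_hex: str) -> bool:
    """Compare the file's digest with the published one."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())


# Demonstration on a temporary file with a known payload.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    handle.write(b"example payload")
    demo_path = handle.name
expected = hashlib.sha256(b"example payload").hexdigest()
ok = matches(demo_path, expected)
bad = matches(demo_path, "0" * 64)
os.remove(demo_path)
print(ok, bad)
# → True False
```

For a real download, pass the local path of the file and the SHA256 value listed for it, for example the digest given for medkit_rs-0.1.0.tar.gz.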
