Dataset validation and preprocessing toolkit for neurology brain imaging (NIfTI)
NeuroTK: Dataset Validation for Neurology Brain Imaging
Motivation
Neurology brain imaging datasets are heterogeneous and frequently contain inconsistencies. Geometry, spacing, orientation, and annotation issues occur commonly across CT and MRI collections. These problems often surface late in modeling, when remediation is costly and compromises reproducibility. NeuroTK surfaces issues early, explicitly, and reproducibly to support dataset hygiene prior to analysis.
Scope
NeuroTK focuses on dataset quality assurance prior to downstream analysis. It provides:
- Dataset-level and file-level validation
- Structural and geometric consistency checks
- Annotation presence and integrity assessment
NeuroTK does not modify scientific data.
Installation
pip install neurotk
Quickstart
neurotk validate --images imagesTr --labels labelsTr --out report.json
The validate command scans the given directories recursively for .nii and .nii.gz files. An image is paired with a label only when their filenames match exactly. Example layout:
dataset/
  imagesTr/
    case_001.nii.gz
    case_002.nii.gz
  labelsTr/
    case_001.nii.gz
    case_002.nii.gz
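The pairing rule above can be sketched in a few lines of Python. This is an illustration only, not NeuroTK's implementation: the helper names `find_nifti` and `pair_by_filename` are hypothetical, and the sketch assumes that pairing is keyed on the bare file name rather than on the path relative to the scanned directory.

```python
from pathlib import Path

NIFTI_SUFFIXES = (".nii", ".nii.gz")


def find_nifti(root):
    """Recursively map file name -> path for every .nii/.nii.gz file under root."""
    return {
        p.name: p
        for p in sorted(Path(root).rglob("*"))
        if p.is_file() and p.name.endswith(NIFTI_SUFFIXES)
    }


def pair_by_filename(images_dir, labels_dir):
    """Return (pairs, unlabeled) using exact file-name matching (assumed rule)."""
    images = find_nifti(images_dir)
    labels = find_nifti(labels_dir)
    # An image is paired only when a label with the identical name exists.
    pairs = {name: (images[name], labels[name]) for name in images if name in labels}
    unlabeled = sorted(name for name in images if name not in labels)
    return pairs, unlabeled
```

Running a check like this before validation can help spot naming mismatches (for example case_001.nii versus case_001.nii.gz) that would otherwise leave an image without a label.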
CLI Reference
Validate:
neurotk validate \
--images imagesTr \
--labels labelsTr \
--out report.json \
--max-samples 10 \
--html report.html \
--summary-only
Key options:
- --images (required): directory of input NIfTI images.
- --labels (optional): directory of label NIfTI files.
- --out (required): output JSON report path.
- --max-samples (optional): limit number of images processed.
- --html (optional): write HTML report.
- --summary-only (optional): print text summary to stdout.
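When validation is driven from a script, the command line can be assembled programmatically. The sketch below uses only the flags listed above; the function name `build_validate_command` is hypothetical and is not part of NeuroTK.

```python
import shlex


def build_validate_command(images, out, labels=None, max_samples=None,
                           html=None, summary_only=False):
    """Build an argv list for `neurotk validate` from the documented options."""
    cmd = ["neurotk", "validate", "--images", str(images), "--out", str(out)]
    if labels is not None:
        cmd += ["--labels", str(labels)]
    if max_samples is not None:
        cmd += ["--max-samples", str(int(max_samples))]
    if html is not None:
        cmd += ["--html", str(html)]
    if summary_only:
        cmd.append("--summary-only")
    return cmd


cmd = build_validate_command("imagesTr", "report.json", labels="labelsTr", max_samples=10)
# Show the shell-quoted command; nothing is executed here.
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once NeuroTK is installed; building a list rather than a string avoids shell quoting problems with paths that contain spaces.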
Preprocess:
neurotk preprocess \
--images imagesTr \
--labels labelsTr \
--out preprocessed/ \
--spacing 1.0 1.0 1.0 \
--orientation RAS \
--copy-metadata
Key options:
- --images (required): directory of input NIfTI images.
- --labels (optional): directory of label NIfTI files.
- --out (required): output directory for preprocessed files.
- --spacing (required): target spacing as 3 floats.
- --orientation (optional): target orientation (default RAS).
- --dry-run (optional): preview preprocessing without writing outputs.
- --copy-metadata (optional): preserve metadata when applicable.
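To get a feel for what a given --spacing value implies, the usual resampling arithmetic can be worked out by hand: each axis keeps its physical extent, so the voxel count scales by the ratio of original to target spacing. The sketch below shows that arithmetic only. The function name `resampled_shape` is hypothetical, and NeuroTK's actual rounding and interpolation behavior is not documented here.

```python
def resampled_shape(shape, spacing, target_spacing):
    """Estimate the voxel grid after resampling to target_spacing.

    Physical extent per axis (voxels * spacing) is preserved, so the new
    voxel count is n * spacing / target_spacing, rounded to an integer.
    """
    if not (len(shape) == len(spacing) == len(target_spacing)):
        raise ValueError("shape, spacing and target_spacing must have equal length")
    if any(s <= 0 for s in spacing) or any(t <= 0 for t in target_spacing):
        raise ValueError("spacings must be positive")
    return tuple(
        max(1, round(n * s / t))
        for n, s, t in zip(shape, spacing, target_spacing)
    )


# A 256 x 256 x 40 volume at 0.5 x 0.5 x 3.0 mm resampled to 1 mm isotropic.
print(resampled_shape((256, 256, 40), (0.5, 0.5, 3.0), (1.0, 1.0, 1.0)))
# -> (128, 128, 120)
```

An estimate like this is useful alongside --dry-run to judge how much a target spacing will grow or shrink the dataset before any files are written.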
Inference (MONAI bundles)
NeuroTK can run inference from external MONAI bundles via the optional inference extras:
pip install neurotk[inference]
Single image:
neurotk infer \
--bundle-dir /path/to/bundle \
--input image.nii.gz \
--output-dir outputs/
Default bundle (uses NEUROTK_DEFAULT_BUNDLE or UMNSHAMLAB/segresnet):
neurotk infer \
--input image.nii.gz \
--output-dir outputs/
Default HF bundle repo: UMNSHAMLAB/segresnet.
From Hugging Face (auto-download + cache full bundle):
neurotk infer \
--bundle-dir hf:UMNSHAMLAB/segresnet \
--input image.nii.gz \
--output-dir outputs/
You can also pass a Hugging Face repo URL:
neurotk infer \
--bundle-dir https://huggingface.co/UMNSHAMLAB/segresnet \
--input image.nii.gz \
--output-dir outputs/
Batch mode:
neurotk infer \
--bundle-dir /path/to/bundle \
--input-list images.txt \
--output-dir outputs/
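The file passed to --input-list is plain text with one image path per line, so it can be generated with a short script. A minimal sketch follows; the function name `write_input_list` is hypothetical. It writes absolute paths because the source does not say how relative paths in the list are resolved.

```python
from pathlib import Path


def write_input_list(images_dir, list_path):
    """Write one absolute NIfTI path per line to list_path; return the count."""
    paths = sorted(
        p.resolve()
        for p in Path(images_dir).rglob("*")
        if p.is_file() and p.name.endswith((".nii", ".nii.gz"))
    )
    Path(list_path).write_text("".join(f"{p}\n" for p in paths), encoding="utf-8")
    return len(paths)
```

For example, `write_input_list("imagesTr", "images.txt")` produces a list file that can then be passed as `--input-list images.txt`.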
Key options:
- --bundle-dir (optional): local MONAI bundle path, org/model, hf:org/model, or HF URL.
- --input (optional): one NIfTI file or a directory of NIfTI files.
- --input-list (optional): text file with one image path per line.
- Use exactly one of --input or --input-list.
- --output-dir (required): output directory for predictions.
- --device (optional): inference device (for example cuda, cuda:0, mps, cpu).
- --save-probs (optional): save probability output (*_prob.nii.gz) instead of segmentation (*_seg.nii.gz).
- --force (optional): recompute outputs even if prediction files already exist.
- --labels-dir (optional): labels directory used to compute Dice during inference.
- --reference-image (optional): image whose affine/header are used for saved outputs.
Device selection:
# CUDA
neurotk infer --device cuda --input image.nii.gz --output-dir outputs/
# Apple Silicon
neurotk infer --device mps --input image.nii.gz --output-dir outputs/
# CPU
neurotk infer --device cpu --input image.nii.gz --output-dir outputs/
If inference runs on CPU (explicitly or via fallback), NeuroTK prints a warning because runtime may be significantly slower.
Dice during inference:
- neurotk infer computes Dice and writes outputs/dice_scores.csv only when labels are available.
- If --labels-dir is omitted and --input is a directory, NeuroTK auto-detects sibling labels directories such as images -> labels and imagesTr -> labelsTr.
- If labels are not present, Dice is skipped.
- If the --input path does not exist, inference fails fast with a clear error.
- Existing prediction outputs are skipped by default; pass --force to recompute.
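The sibling-directory convention above (images -> labels, imagesTr -> labelsTr) can be approximated with a simple prefix swap. The sketch below is a guess at the rule based only on those two examples, not NeuroTK's actual detection logic, and the function name `guess_labels_dir` is hypothetical.

```python
from pathlib import Path


def guess_labels_dir(images_dir):
    """Return the sibling labels directory for images_dir, or None.

    Assumed rule: replace a leading "images" in the directory name with
    "labels" and keep the remainder (e.g. the "Tr" suffix) unchanged.
    """
    images_dir = Path(images_dir)
    name = images_dir.name
    if not name.startswith("images"):
        return None
    candidate = images_dir.with_name("labels" + name[len("images"):])
    return candidate if candidate.is_dir() else None
```

If a dataset does not follow this naming pattern, passing --labels-dir explicitly avoids relying on auto-detection.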
Dice after inference:
neurotk dice \
--preds outputs/ \
--labels-dir labels/ \
--output outputs/dice_scores.csv
Key options:
- --preds (optional): one prediction NIfTI file or a directory of predictions.
- --preds-list (optional): text file with one prediction path per line.
- Use exactly one of --preds or --preds-list.
- --labels-dir (required): labels directory.
- --output (required): CSV output path for Dice/Hausdorff metrics.
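For readers unfamiliar with the metric, the Dice coefficient of a binary prediction A and label B is 2|A ∩ B| / (|A| + |B|). The sketch below computes it over flattened masks in plain Python. It illustrates the formula rather than NeuroTK's implementation; the function name `dice_score` is hypothetical, and returning 1.0 when both masks are empty is an assumed convention.

```python
def dice_score(pred, label):
    """Dice coefficient for two flattened binary masks of equal length."""
    pred = [bool(v) for v in pred]
    label = [bool(v) for v in label]
    if len(pred) != len(label):
        raise ValueError("pred and label must have the same number of voxels")
    intersection = sum(p and l for p, l in zip(pred, label))
    total = sum(pred) + sum(label)
    if total == 0:
        # Both masks are empty; treated as perfect agreement (assumption).
        return 1.0
    return 2.0 * intersection / total


# One overlapping voxel, two foreground voxels in each mask.
print(dice_score([1, 1, 0, 0], [1, 0, 1, 0]))  # -> 0.5
```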
Note: for full-bundle HF usage, the repo must contain a valid MONAI bundle layout (e.g., configs/ with inference/evaluate config and models/ checkpoints).
Output
NeuroTK emits a JSON report containing a dataset-level summary, per-file diagnostics, and explicit listings of detected issues. For validate+preprocess runs, the report includes a processed summary and preprocess traceability so original and processed states are unambiguous.
{
"summary": {"scope": "original_inputs", "num_images": 100, "files_with_issues": 7},
"summary_processed": {"scope": "processed_outputs", "num_images": 100},
"files": {"case_001.nii.gz": {"issues": ["label_missing"]}}
}
Validate vs preprocess semantics
- summary always reflects original inputs.
- summary_processed is present only for validate+preprocess runs and reflects outputs after preprocessing.
- run_mode indicates whether preprocessing was requested.
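A report with this shape is straightforward to consume from a script. The sketch below parses the sample report shown in the Output section and relies only on the keys that appear there (summary, summary_processed, files, issues, scope); the helper names are hypothetical, and real reports may contain additional fields.

```python
import json

SAMPLE_REPORT = """
{
  "summary": {"scope": "original_inputs", "num_images": 100, "files_with_issues": 7},
  "summary_processed": {"scope": "processed_outputs", "num_images": 100},
  "files": {"case_001.nii.gz": {"issues": ["label_missing"]}}
}
"""


def files_with_issues(report):
    """Map file name -> list of issues, skipping files with no issues."""
    return {
        name: entry["issues"]
        for name, entry in report.get("files", {}).items()
        if entry.get("issues")
    }


def summaries_by_scope(report):
    """Map each scope value to its summary block.

    summary is always present; summary_processed only exists for
    validate+preprocess runs, so it is treated as optional.
    """
    blocks = [report["summary"]]
    if "summary_processed" in report:
        blocks.append(report["summary_processed"])
    return {block["scope"]: block for block in blocks}


report = json.loads(SAMPLE_REPORT)
for name, issues in files_with_issues(report).items():
    print(f"{name}: {', '.join(issues)}")
print(sorted(summaries_by_scope(report)))
```

Keying on the scope field rather than on the presence of summary_processed alone keeps downstream code explicit about whether it is reading original or processed statistics.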
Upgrading to v0.3.0
Reports now include explicit scope fields and preprocess traceability blocks. These additions are backward-compatible for validation-only users.
Web UI
The FastAPI app in webapp/ is the primary landing page and execution interface. The older site/ Next.js prototype is deprecated and should not be used for deployment.
Citation
If you use NeuroTK in your research, please cite it as follows:
@software{neurotk,
title = {NeuroTK: Dataset Validation for Neurology Brain Imaging},
author = {Sakshi Rathi},
year = {2026},
doi = {10.5281/zenodo.18252017},
url = {https://github.com/SakshiRa/neurotk},
note = {Open-source toolkit for dataset validation and quality assurance in neurology brain imaging}
}