Dataset validation and preprocessing toolkit for neurology brain imaging (NIfTI)

NeuroTK: Dataset Validation for Neurology Brain Imaging

Motivation

Neurology brain imaging datasets are heterogeneous and frequently contain inconsistencies. Geometry, spacing, orientation, and annotation issues occur commonly across CT and MRI collections. These problems often surface late in modeling, when remediation is costly and compromises reproducibility. NeuroTK surfaces issues early, explicitly, and reproducibly to support dataset hygiene prior to analysis.

Scope

NeuroTK focuses on dataset quality assurance prior to downstream analysis. It provides:

  • Dataset-level and file-level validation
  • Structural and geometric consistency checks
  • Annotation presence and integrity assessment

NeuroTK does not modify scientific data.

Installation

pip install neurotk

Quickstart

neurotk validate --images imagesTr --labels labelsTr --out report.json

Inputs are expected as flat directories of NIfTI files, and filenames must match exactly for image–label pairing.

dataset/
  imagesTr/
    case_001.nii.gz
    case_002.nii.gz
  labelsTr/
    case_001.nii.gz
    case_002.nii.gz
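
The exact-match pairing rule can be illustrated with a short Python sketch. This is not NeuroTK's internal code; the function name `pair_by_filename` and the `suffix` parameter are hypothetical, and the sketch assumes only the flat directory layout shown above.

```python
from pathlib import Path


def pair_by_filename(images_dir, labels_dir, suffix=".nii.gz"):
    """Pair images and labels whose filenames match exactly.

    Hypothetical helper for illustration; NeuroTK's own pairing logic
    may differ in details such as accepted extensions.
    """
    images = {p.name for p in Path(images_dir).iterdir() if p.name.endswith(suffix)}
    labels = {p.name for p in Path(labels_dir).iterdir() if p.name.endswith(suffix)}
    paired = sorted(images & labels)                # present in both directories
    images_without_label = sorted(images - labels)  # image has no label file
    labels_without_image = sorted(labels - images)  # label has no image file
    return paired, images_without_label, labels_without_image
```

Calling `pair_by_filename("dataset/imagesTr", "dataset/labelsTr")` on the layout above would return both cases as paired and two empty lists. A name that differs in any character, such as `case_001_seg.nii.gz`, would land in the unmatched lists.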

Inference (MONAI bundles)

NeuroTK can run inference from external MONAI bundles via the optional inference extras:

pip install "neurotk[inference]"

Single image:

neurotk infer \
  --bundle-dir /path/to/bundle \
  --input image.nii.gz \
  --output-dir outputs/

Default bundle (uses NEUROTK_DEFAULT_BUNDLE, or the Hugging Face repo UMNSHAMLAB/segresnet):

neurotk infer \
  --input image.nii.gz \
  --output-dir outputs/

From Hugging Face (auto-download + cache full bundle):

neurotk infer \
  --bundle-dir hf:UMNSHAMLAB/segresnet \
  --input image.nii.gz \
  --output-dir outputs/

You can also pass a Hugging Face repo URL:

neurotk infer \
  --bundle-dir https://huggingface.co/UMNSHAMLAB/segresnet \
  --input image.nii.gz \
  --output-dir outputs/

Batch mode:

neurotk infer \
  --bundle-dir /path/to/bundle \
  --input-list images.txt \
  --output-dir outputs/
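
The format of `images.txt` is not documented here. The sketch below assumes one image path per line, which is a common convention but should be checked against `neurotk infer --help`. The helper name `write_input_list` is hypothetical.

```python
from pathlib import Path


def write_input_list(images_dir, list_path, suffix=".nii.gz"):
    """Write one image path per line to list_path and return the count.

    Assumption: the batch list is a plain-text file with one path per
    line. This is not confirmed by the NeuroTK documentation.
    """
    paths = sorted(p for p in Path(images_dir).iterdir() if p.name.endswith(suffix))
    Path(list_path).write_text("".join(f"{p}\n" for p in paths))
    return len(paths)
```

For example, `write_input_list("dataset/imagesTr", "images.txt")` would list every `.nii.gz` file in that directory in sorted order.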

Dice after inference:

neurotk dice \
  --preds outputs/ \
  --labels-dir labels/ \
  --output outputs/dice_scores.csv
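
For reference, the Dice coefficient for a single foreground class is 2|P ∩ L| / (|P| + |L|), where P and L are the predicted and labeled voxel sets. The pure-Python sketch below computes it over flattened voxel sequences. It is an illustration only: how `neurotk dice` handles multiple classes or empty masks is not stated here, and returning 1.0 when both masks are empty is an assumed convention.

```python
def dice_score(pred, label, foreground=1):
    """Dice coefficient for one class over two flat voxel sequences."""
    if len(pred) != len(label):
        raise ValueError("pred and label must have the same number of voxels")
    pred_count = sum(1 for v in pred if v == foreground)
    label_count = sum(1 for v in label if v == foreground)
    overlap = sum(
        1 for p, l in zip(pred, label) if p == foreground and l == foreground
    )
    if pred_count + label_count == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * overlap / (pred_count + label_count)
```

With `pred = [1, 1, 0, 0]` and `label = [1, 0, 0, 0]`, the overlap is 1 voxel and the mask sizes are 2 and 1, so the score is 2/3.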

Note: for full-bundle HF usage, the repo must contain a valid MONAI bundle layout (e.g., configs/ with inference/evaluate config and models/ checkpoints).
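
A local pre-flight check for that layout might look like the following sketch. It only verifies that `configs/` and `models/` exist and are non-empty; it does not validate the config contents or checkpoint files, and the helper name `missing_bundle_parts` is hypothetical.

```python
from pathlib import Path


def missing_bundle_parts(bundle_dir):
    """Return the expected bundle subdirectories that are absent or empty.

    Checks only the directory names mentioned in the note above; exact
    file names inside them are not assumed.
    """
    root = Path(bundle_dir)
    missing = []
    for name in ("configs", "models"):
        sub = root / name
        if not sub.is_dir() or not any(sub.iterdir()):
            missing.append(f"{name}/")
    return missing
```

An empty return value means both directories are present and contain at least one entry.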

Output

NeuroTK emits a JSON report containing a dataset-level summary, per-file diagnostics, and explicit listings of detected issues. For validate+preprocess runs, the report includes a processed summary and preprocess traceability so original and processed states are unambiguous.

{
  "summary": {"scope": "original_inputs", "num_images": 100, "files_with_issues": 7},
  "summary_processed": {"scope": "processed_outputs", "num_images": 100},
  "files": {"case_001.nii.gz": {"issues": ["label_missing"]}}
}
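
A report with this shape can be consumed with the standard library alone. The sketch below assumes only the keys shown in the example (`files` and per-file `issues`); the function name `files_with_issues` is hypothetical and other fields in a real report are ignored.

```python
import json


def files_with_issues(report):
    """Map each filename to its issue list, skipping files with no issues."""
    return {
        name: entry["issues"]
        for name, entry in report.get("files", {}).items()
        if entry.get("issues")
    }


# The example report from above, parsed from a JSON string.
example = json.loads("""
{
  "summary": {"scope": "original_inputs", "num_images": 100, "files_with_issues": 7},
  "summary_processed": {"scope": "processed_outputs", "num_images": 100},
  "files": {"case_001.nii.gz": {"issues": ["label_missing"]}}
}
""")

for name, issues in files_with_issues(example).items():
    print(name, ", ".join(issues))
```

For a report written by `neurotk validate`, replace the inline string with `json.load(open("report.json"))`.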

Validate vs preprocess semantics

  • summary always reflects original inputs.
  • summary_processed is present only for validate+preprocess runs and reflects outputs after preprocessing.
  • run_mode indicates whether preprocessing was requested.
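
Because `summary_processed` appears only for validate+preprocess runs, a consumer can branch on its presence. The sketch below does that without relying on `run_mode`, whose possible values are not listed here. The function name `summaries_by_scope` and the output keys `original` and `processed` are hypothetical.

```python
def summaries_by_scope(report):
    """Return the summaries available in a report, keyed by state.

    "original" is always present; "processed" is included only when the
    report contains a summary_processed block.
    """
    result = {"original": report["summary"]}
    if "summary_processed" in report:
        result["processed"] = report["summary_processed"]
    return result
```

A validation-only report yields a single `original` entry, so downstream code never mistakes original statistics for processed ones.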

Upgrading to v0.3.0

Reports now include explicit scope fields and preprocess traceability blocks. These additions are backward-compatible for validation-only users.

Web UI

The FastAPI app in webapp/ is the primary landing page and execution interface. The older site/ Next.js prototype is deprecated and should not be used for deployment.

Citation

If you use NeuroTK in your research, please cite it as follows:

@software{neurotk,
  title  = {NeuroTK: Dataset Validation for Neurology Brain Imaging},
  author = {Sakshi Rathi},
  year   = {2026},
  doi    = {10.5281/zenodo.18252017},
  url    = {https://github.com/SakshiRa/neurotk},
  note   = {Open-source toolkit for dataset validation and quality assurance in neurology brain imaging}
}
