MedScan with CUDA 11.6 (PyTorch GPU support)

MedScan – End‑to‑End Medical Imaging Training Pipeline

MedScan is a self‑contained Python package that lets you pre‑process and augment data, train deep‑learning models, evaluate with plots/metrics, and save / reload checkpoints – all from one intuitive API or a single command‑line call.


Table of Contents

  1. Key Features
  2. Repository Layout
  3. Installation
  4. Quick‑Start Notebook
  5. Command‑Line Usage
  6. Prediction CLI
  7. API Overview
  8. Training Pipeline Walk‑Through
  9. Saving, Loading & Plot Outputs
  10. Troubleshooting
  11. License

Key Features

  • One‑line data split with Data.split() (stratified, group‑aware, or plain).

  • Config‑driven augmentation, balancing, masking & context‐feature handling via PreprocessConfig.

  • Flexible training:

    • Multi‑head model (shared backbone, one head per target) or
    • Single‑head‑per‑label models (optionally across multiple backbones).
  • Torch AMP support (mixed precision) & automatic GPU/CPU selection.

  • Automatic early stopping per head or per model.

  • Rich evaluation with confusion matrices, loss & LR curves, AUC/Accuracy/F1.

  • Full CLI runner (train.py) – reproduce your notebook runs headless, without a display.

  • Saved plots are dropped into a plots/ folder automatically when using the CLI.


Repository Layout

medscan/                  # Package root
├── __init__.py           # Exposes Data, PreprocessConfig, TrainConfig, Pipeline
├── config.py             # @dataclass configs used throughout
├── data.py               # Data.split utility (train/val/test)
├── pipeline.py           # Core training/inference/evaluation pipeline
└── transform.py          # Augmentation & class‑balancing logic
examples/                 # Example notebooks + sample CSV
  └── merged_dataframe.csv
train.py                  # CLI interface wrapping the Pipeline
README.md                 # (this file)

Installation

Prerequisites: Python ≥ 3.9

Target          Command
CPU (default)   pip install medscan
cu116 build     pip install medscan-cu116
cu117 build     pip install medscan-cu117
cu118 build     pip install medscan-cu118
cu121 build     pip install medscan-cu121
cu124 build     pip install medscan-cu124

Pick exactly one line: the build that matches your CUDA toolkit, or the CPU line if you have no GPU. No extra wheels are needed, since each tag bundles the correct Torch + torchvision wheels.
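As a rough illustration, the table above can be expressed as a small helper that turns a CUDA version string into the matching package name. The function name medscan_package_for is hypothetical and is not part of MedScan; only the package names come from the table.

```python
from typing import Optional

# Build tags copied from the installation table above.
SUPPORTED_CUDA_TAGS = ("cu116", "cu117", "cu118", "cu121", "cu124")


def medscan_package_for(cuda_version: Optional[str]) -> str:
    """Return the pip package name for a CUDA version such as "11.8".

    Hypothetical helper for illustration only; it is not shipped with MedScan.
    Pass None (or an empty string) to get the CPU build.
    """
    if not cuda_version:
        return "medscan"
    parts = cuda_version.strip().split(".")
    if len(parts) < 2:
        raise ValueError(f"expected a version like '11.8', got {cuda_version!r}")
    # "11.8" -> "cu118": major and minor digits are simply concatenated.
    tag = "cu" + parts[0] + parts[1]
    if tag not in SUPPORTED_CUDA_TAGS:
        raise ValueError(f"no MedScan build listed for CUDA {cuda_version}")
    return f"medscan-{tag}"


if __name__ == "__main__":
    print("pip install", medscan_package_for("11.6"))
```

If PyTorch is already installed, torch.version.cuda reports the CUDA version that build was compiled against (None for CPU-only builds), which is one way to obtain the input string.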

Environment     Requirements file
CPU (default)   pip install -r requirements_cpu.txt
cu116           pip install -r requirements_cu116.txt
cu117           pip install -r requirements_cu117.txt
cu118           pip install -r requirements_cu118.txt
cu121           pip install -r requirements_cu121.txt
cu124           pip install -r requirements_cu124.txt

Tip – if you already have PyTorch installed, make sure its version and CUDA build match the wheels for the target you chose above.


Quick‑Start Notebook

Open examples/medscan_quickstart.ipynb and follow the annotated steps — the only manual preparation is to create a single DataFrame that already contains img_path plus all target columns. Everything afterwards (split, augment, train, evaluate) is automated by the pipeline.

import pandas as pd
from medscan import Pipeline, Data, PreprocessConfig, TrainConfig

# 1️⃣  Prepare your own merged DataFrame -> df_merged (img_path + labels)
# 2️⃣  Split, configure, train, evaluate (see the full notebook in examples/ for detailed comments)

The notebook in examples shows how to build a toy df_merged, configure preprocessing/training, train for one epoch, evaluate, and save the best model.
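To make the expected table shape concrete, here is a minimal sketch that writes a toy merged CSV using only the standard library. The image paths and label values are made-up placeholders, the helper names are hypothetical, and the two label columns are borrowed from the CLI example in this README; substitute your own targets.

```python
import csv
import tempfile
from pathlib import Path

# Label columns borrowed from the CLI example in this README; use your own.
LABEL_COLUMNS = ["Neuro_Imaging", "Hemisphere"]


def write_toy_merged_csv(path: Path) -> None:
    """Write a tiny table with img_path plus one column per target."""
    # Placeholder rows: the paths do not point at real images.
    rows = [
        {"img_path": "images/scan_001.png", "Neuro_Imaging": 1, "Hemisphere": 0},
        {"img_path": "images/scan_002.png", "Neuro_Imaging": 1, "Hemisphere": 1},
        {"img_path": "images/scan_003.png", "Neuro_Imaging": 0, "Hemisphere": 0},
    ]
    with path.open("w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["img_path", *LABEL_COLUMNS])
        writer.writeheader()
        writer.writerows(rows)


def read_header(path: Path) -> list:
    """Return the column names stored in the first line of the CSV."""
    with path.open(newline="") as handle:
        return next(csv.reader(handle))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        csv_path = Path(tmp) / "merged_dataframe.csv"
        write_toy_merged_csv(csv_path)
        print(read_header(csv_path))
```

Loading such a file with pandas.read_csv should give a DataFrame with img_path and the label columns, which is the layout described above.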


Command‑Line Usage

train.py wraps every step so you can train from the shell – no notebook needed.

Minimal run

python train.py \
  --data_path "path/to/merged_dataframe.csv"

This uses all defaults: CPU, 70 / 15 / 15 train‑val‑test split, no augmentation, single resnet34 backbone, multi‑head mode, 10 epochs, and saves plots to ./plots/.
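The 70 / 15 / 15 default means the test set receives whatever is left after the train and validation fractions. A small sketch of that arithmetic follows; the function name split_sizes is hypothetical and the use of round() is an assumption, since the package may round differently.

```python
def split_sizes(n_rows: int, train_frac: float = 0.7, val_frac: float = 0.15):
    """Return (train, val, test) row counts; test receives the remainder.

    Illustrative only: round() is an assumption made for this sketch.
    """
    if n_rows < 0:
        raise ValueError("n_rows must be non-negative")
    if not 0 < train_frac < 1 or not 0 <= val_frac < 1:
        raise ValueError("fractions must lie between 0 and 1")
    if train_frac + val_frac >= 1:
        raise ValueError("train_frac + val_frac must leave room for a test set")
    n_train = round(n_rows * train_frac)
    n_val = round(n_rows * val_frac)
    # Whatever is not assigned to train or val goes to test.
    n_test = n_rows - n_train - n_val
    return n_train, n_val, n_test


if __name__ == "__main__":
    print(split_sizes(200))  # (140, 30, 30)
```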

Full run (every flag)

# Repeat --filter_column to apply multiple conditions.
python train.py \
  --data_path "path/to/merged_dataframe.csv" \
  --filter_column "Neuro_Imaging=1" \
  --filter_column "Hemisphere=0" \
  --targets "Neuro_Imaging,Motion_Artefact,Skull_Visibility,Projection,Contrast_fluid,DSA,Hemisphere,ICA_Top_visible,MCA_visible" \
  --train_frac 0.7  \
  --val_frac 0.15  \
  --seed 123 \
  --augment  \
  --augment_factor 2  \
  --balance_on \
  --augmented_image_path augmented_images \
  --elastic_alpha 34.0  \
  --elastic_sigma 4.0 \
  --contrast_min 0.4 \
  --contrast_max 0.9 \
  --input_size 224 \
  --batch_size 32 \
  --epochs 10 \
  --early_stopping_patience 3 \
  --learning_rate 0.001 \
  --optimizer AdamW \
  --dropout \
  --dropout_rate 0.5 \
  --mixed_precision \
  --save_best_model \
  --checkpoint_dir checkpoints \
  --metric val_loss \
  --metric_mode min \
  --confidence_score \
  --pretrained_models "resnet34,resnet50" \
  --train_per_label \
  --force_cpu \
  --save_model_path best_model.pt \
  --plots "confusion_matrix,loss_vs_epoch,lr_vs_epoch" \
  --metrics "AUC,accuracy,F1"

All arguments are documented via python train.py -h.

Plots are saved to ./plots/plot_*.png (auto‑created).

Augmented images are saved under the folder you specify in --augmented_image_path (default: augmented_images/).
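For readers wondering how a repeatable flag such as --filter_column can work, the following is a minimal sketch using argparse. It is not train.py's actual code: combining conditions with a logical AND and comparing values as strings are both assumptions, and the helper names are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="filter_column sketch")
    # action="append" collects every occurrence of the flag into one list.
    parser.add_argument(
        "--filter_column",
        action="append",
        default=[],
        metavar="COLUMN=VALUE",
        help="keep only rows where COLUMN equals VALUE (repeatable)",
    )
    return parser


def parse_condition(text: str):
    """Split "COLUMN=VALUE" into a (column, value) pair."""
    column, sep, value = text.partition("=")
    if not sep or not column.strip():
        raise ValueError(f"expected COLUMN=VALUE, got {text!r}")
    return column.strip(), value.strip()


def apply_filters(rows, conditions):
    """Keep rows that satisfy every condition (logical AND, an assumption)."""
    parsed = [parse_condition(c) for c in conditions]
    return [
        row for row in rows
        if all(str(row.get(col)) == val for col, val in parsed)
    ]


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--filter_column", "Neuro_Imaging=1", "--filter_column", "Hemisphere=0"]
    )
    rows = [
        {"img_path": "a.png", "Neuro_Imaging": 1, "Hemisphere": 0},
        {"img_path": "b.png", "Neuro_Imaging": 1, "Hemisphere": 1},
    ]
    print(apply_filters(rows, args.filter_column))
```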


Prediction CLI

predict.py lets you run inference on a folder of images using a saved pipeline directory or checkpoint.

python medscan/predict.py \
  --img_dir path/to/image_folder \
  --model_path path/to/pipeline_dir \
  --labels all \
  --output_csv predictions.csv \
  --force_cpu

Use --labels to restrict targets, --confidence to output probability scores, and omit --force_cpu to use a GPU if available.


API Overview

Class / Function                Role
Data.split                      Stratified (or group-aware) train/val/test split in one call.
PreprocessConfig                Declarative augmentation & balance settings.
TrainConfig                     Training hyper-parameters, device, backbone list, etc.
Pipeline                        Orchestrates training, prediction, evaluation, save/load.
transform.augment_and_balance   Internal helper to expand/upsample data.

All objects live under medscan and are re‑exported via __init__.py for convenience:

from medscan import Data, PreprocessConfig, TrainConfig, Pipeline

Training Pipeline Walk‑Through

  1. Provide a merged DataFrame — user‑supplied, must include img_path and all label columns.

  2. Split into train / val / test with Data.split() (checks that each class appears in every subset; see require_all_classes under Troubleshooting).

  3. Preprocess (PreprocessConfig)

    • Optional augmentation: elastic, contrast, contrast + elastic.
    • Optional class balancing (upsampling) – requires augment=True.
    • Optional mask handling & context features.
  4. Model training (Pipeline.fit)

    • Multi‑head (default): one backbone + multiple classification heads.
    • Per‑label: one backbone per target (single head each).
    • Early stopping is tracked individually per head.
  5. Prediction (Pipeline.predict): adds Label_<target> columns.

  6. Evaluation (Pipeline.evaluate): calculates metrics & shows / saves plots.

  7. Save / Load (Pipeline.save / Pipeline.load): preserves all weights and head mapping.
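Per-head early stopping, mentioned in step 4, can be pictured as one patience counter per target. The sketch below is a generic illustration and not MedScan's implementation: the class EarlyStopper and the function first_stop_epoch are hypothetical, and the loss values are invented. The patience and mode parameters mirror the --early_stopping_patience and --metric_mode flags.

```python
import math


class EarlyStopper:
    """Track one metric and signal a stop after `patience` epochs without improvement."""

    def __init__(self, patience: int = 3, mode: str = "min"):
        if mode not in ("min", "max"):
            raise ValueError("mode must be 'min' or 'max'")
        self.patience = patience
        self.mode = mode
        self.best = math.inf if mode == "min" else -math.inf
        self.bad_epochs = 0

    def update(self, value: float) -> bool:
        """Record a new metric value; return True when training should stop."""
        improved = value < self.best if self.mode == "min" else value > self.best
        if improved:
            self.best = value
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def first_stop_epoch(values, patience: int = 3, mode: str = "min"):
    """Return the 0-based epoch at which the stopper fires, or None."""
    stopper = EarlyStopper(patience=patience, mode=mode)
    for epoch, value in enumerate(values):
        if stopper.update(value):
            return epoch
    return None


if __name__ == "__main__":
    # Invented validation losses, one series per head.
    val_loss = {
        "DSA": [0.90, 0.80, 0.85, 0.86, 0.87],
        "Hemisphere": [0.70, 0.60, 0.50, 0.40, 0.30],
    }
    for head, losses in val_loss.items():
        print(head, first_stop_epoch(losses, patience=3, mode="min"))
```

In this toy run the DSA head stops after three epochs without a new best loss, while the Hemisphere head keeps improving and never triggers a stop.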


Saving, Loading & Plot Outputs

  • Saving: model.save("best_model.pt"), where model is a Pipeline instance, stores either a single multi‑head state or a dict of per‑label states.
  • Loading: supply the same PreprocessConfig / TrainConfig (device can differ) and call model.load().
  • Plots: when run via the CLI, every plt.show() call is monkey‑patched to dump PNGs under plots/. In notebooks they still display inline.

Troubleshooting

Issue                       Fix
CUDA library not found      Install from the matching requirements_cuXXX.txt or pass --force_cpu.
Class missing in test/val   Lower train_frac / val_frac or disable require_all_classes.
No plots on server          Use the CLI; plots are saved to disk instead of shown.
OOM on GPU                  Reduce --batch_size, --input_size, or train on CPU.
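When diagnosing the "Class missing in test/val" case, it can help to list which classes are absent from which subset. The helper below is a hypothetical sketch for that check and does not reproduce the logic behind require_all_classes; it only compares label lists that you supply.

```python
def missing_classes(labels_by_split):
    """Map each split name to the classes seen elsewhere but absent from it.

    Splits that contain every class are left out of the result.
    """
    all_classes = set()
    for labels in labels_by_split.values():
        all_classes.update(labels)

    report = {}
    for name, labels in labels_by_split.items():
        absent = all_classes - set(labels)
        if absent:
            report[name] = sorted(absent)
    return report


if __name__ == "__main__":
    splits = {
        "train": [0, 1, 1, 0],
        "val": [0, 0],
        "test": [1, 0],
    }
    print(missing_classes(splits))  # {'val': [1]}
```

An empty result means every class occurs in every subset for that target column.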

License

This project is released under the MIT License. See LICENSE for details.
