
MedScan with CUDA 11.8 (PyTorch GPU support)


MedScan – End‑to‑End Medical Imaging Training Pipeline

MedScan is a self‑contained Python package that lets you pre‑process and augment data, train deep‑learning models, evaluate with plots/metrics, and save / reload checkpoints – all from one intuitive API or a single command‑line call.


Table of Contents

  1. Key Features
  2. Repository Layout
  3. Installation
  4. Quick‑Start Notebook
  5. Command‑Line Usage
  6. API Overview
  7. Training Pipeline Walk‑Through
  8. Saving, Loading & Plot Outputs
  9. Troubleshooting
  10. License

Key Features

  • One‑line data split with Data.split() (stratified, group‑aware, or plain).

  • Config‑driven augmentation, balancing, masking & context‐feature handling via PreprocessConfig.

  • Flexible training:

    • Multi‑head model (shared backbone, one head per target) or
    • Single‑head‑per‑label models (optionally across multiple backbones).
  • Torch AMP support (mixed precision) & automatic GPU/CPU selection.

  • Automatic early stopping per head or per model.

  • Rich evaluation with confusion matrices, loss & LR curves, AUC/Accuracy/F1.

  • Full CLI runner (train.py) – reproduce your notebook runs head‑less.

  • Saved plots are dropped into a plots/ folder automatically when using the CLI.


Repository Layout

medscan/                  # Package root
├── __init__.py           # Exposes Data, PreprocessConfig, TrainConfig, Pipeline
├── config.py             # @dataclass configs used throughout
├── data.py               # Data.split utility (train/val/test)
├── pipeline.py           # Core training/inference/evaluation pipeline
└── transform.py          # Augmentation & class‑balancing logic
examples/                 # Example notebooks + sample CSV
  └── merged_dataframe.csv
train.py                  # CLI interface wrapping the Pipeline
README.md                 # (this file)

Installation

Prerequisites: Python ≥ 3.9

| Target        | Command                   |
|---------------|---------------------------|
| CPU (default) | pip install medscan       |
| cu116 build   | pip install medscan-cu116 |
| cu117 build   | pip install medscan-cu117 |
| cu118 build   | pip install medscan-cu118 |
| cu121 build   | pip install medscan-cu121 |
| cu124 build   | pip install medscan-cu124 |

Pick exactly one command: the one that matches your CUDA toolkit, or the default package for CPU-only machines. No extra wheels are needed; each tag bundles the correct Torch and torchvision wheels.

Alternatively, install the dependencies from the matching requirements file:

| Target        | Command                               |
|---------------|---------------------------------------|
| CPU (default) | pip install -r requirements_cpu.txt   |
| cu116         | pip install -r requirements_cu116.txt |
| cu117         | pip install -r requirements_cu117.txt |
| cu118         | pip install -r requirements_cu118.txt |
| cu121         | pip install -r requirements_cu121.txt |
| cu124         | pip install -r requirements_cu124.txt |

Tip – if you already have PyTorch installed, make sure the wheel versions match the list above.
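If PyTorch is already installed, one way to find the matching tag is to read the CUDA version that build was compiled against. The sketch below is illustrative only: `cuda_tag`, `suggested_package`, and `SUPPORTED_TAGS` are hypothetical helpers and not part of MedScan. `torch.version.cuda` is the standard PyTorch attribute; it is `None` on CPU-only builds.

```python
from typing import Optional

# Tags copied from the installation table; anything else falls back to CPU.
SUPPORTED_TAGS = {"cu116", "cu117", "cu118", "cu121", "cu124"}


def cuda_tag(cuda_version: Optional[str]) -> Optional[str]:
    """Map a CUDA version string such as '11.8' to a tag such as 'cu118'.

    Returns None for CPU-only builds or versions without a listed build.
    """
    if not cuda_version:
        return None
    parts = cuda_version.split(".")
    if len(parts) < 2:
        return None
    tag = f"cu{parts[0]}{parts[1]}"
    return tag if tag in SUPPORTED_TAGS else None


def suggested_package() -> str:
    """Suggest a package name from the locally installed PyTorch, if any."""
    try:
        import torch  # optional: only used to read the build's CUDA version
    except ImportError:
        return "medscan"
    tag = cuda_tag(torch.version.cuda)
    return f"medscan-{tag}" if tag else "medscan"
```

Calling `suggested_package()` returns the name to pass to `pip install`; treat the result as a hint and confirm it against the table.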


Quick‑Start Notebook

Open examples/medscan_quickstart.ipynb and follow the annotated steps — the only manual preparation is to create a single DataFrame that already contains img_path plus all target columns. Everything afterwards (split, augment, train, evaluate) is automated by the pipeline.

import pandas as pd
from medscan import Pipeline, Data, PreprocessConfig, TrainConfig

# 1️⃣  Prepare your own merged DataFrame -> df_merged (img_path + labels)
# 2️⃣  Split, configure, train, evaluate: see the full notebook in examples/ for detailed comments

The notebook in examples shows how to build a toy df_merged, configure preprocessing/training, train for one epoch, evaluate, and save the best model.
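As a rough illustration of the expected input, a toy df_merged could be assembled as follows. This is a sketch under assumptions: the image paths are made up, and the two label columns are borrowed from the CLI example; only the requirement of an img_path column plus one column per target comes from the description above.

```python
import pandas as pd

# Hypothetical image paths; the label columns are borrowed from the CLI example.
df_merged = pd.DataFrame(
    {
        "img_path": [f"images/scan_{i:03d}.png" for i in range(6)],
        "Neuro_Imaging": [1, 1, 0, 1, 0, 1],
        "Hemisphere": [0, 1, 0, 1, 1, 0],
    }
)

# Basic sanity checks before handing the frame to the pipeline.
targets = ["Neuro_Imaging", "Hemisphere"]
missing = [c for c in ["img_path", *targets] if c not in df_merged.columns]
assert not missing, f"missing columns: {missing}"
assert df_merged["img_path"].is_unique, "duplicate image paths"
assert not df_merged[targets].isna().any().any(), "labels contain NaN"
```

Running the checks up front tends to surface missing or empty label columns before any training time is spent.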


Command‑Line Usage

train.py wraps every step so you can train from the shell – no notebook needed.

Minimal run

python /home/medscan/train.py \
  --data_path "/home/medscan/merged_dataframe.csv"

This uses all defaults: CPU, 70 / 15 / 15 train‑val‑test split, no augmentation, single resnet34 backbone, multi‑head mode, 10 epochs, and saves plots to ./plots/.
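The 70 / 15 / 15 default can be pictured as a per-class split. The following is a minimal sketch and not MedScan's implementation of Data.split(): `stratified_split` is a hypothetical helper, it handles a single label column only, and the seed is arbitrary.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_frac=0.70, val_frac=0.15, seed=0):
    """Return (train, val, test) index lists, splitting each class separately."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(len(indices) * train_frac)
        n_val = round(len(indices) * val_frac)
        train += indices[:n_train]
        val += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # the remainder goes to test
    return train, val, test


labels = [0] * 20 + [1] * 20
train, val, test = stratified_split(labels)
print(len(train), len(val), len(test))  # prints: 28 6 6
```

With very small classes the rounded counts can leave a subset without a given class, which is the situation the "Class missing in test/val" entry in Troubleshooting refers to.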

Full run (every flag)

# --filter_column can be repeated to apply several filters
python /home/medscan/train.py \
  --data_path "/home/medscan/merged_dataframe.csv" \
  --filter_column "Neuro_Imaging=1" \
  --filter_column "Hemisphere=0" \
  --targets "Neuro_Imaging,Motion_Artefact,Skull_Visibility,Projection,Contrast_fluid,DSA,Hemisphere,ICA_Top_visible,MCA_visible" \
  --train_frac 0.7 \
  --val_frac 0.15 \
  --seed 123 \
  --augment \
  --augment_factor 2 \
  --balance_on \
  --augmented_image_path augmented_images \
  --elastic_alpha 34.0 \
  --elastic_sigma 4.0 \
  --contrast_min 0.4 \
  --contrast_max 0.9 \
  --input_size 224 \
  --batch_size 32 \
  --epochs 10 \
  --early_stopping_patience 3 \
  --learning_rate 0.001 \
  --optimizer Adam \
  --dropout \
  --dropout_rate 0.5 \
  --mixed_precision \
  --save_best_model \
  --checkpoint_dir checkpoints \
  --metric val_loss \
  --metric_mode min \
  --confidence_score \
  --pretrained_models "resnet34,resnet50" \
  --train_per_label \
  --force_cpu \
  --save_model_path best_model.pt \
  --plots "confusion_matrix,loss_vs_epoch,lr_vs_epoch" \
  --metrics "AUC,accuracy,F1"

All arguments are documented via python train.py -h.

Plots are saved to ./plots/plot_*.png (auto‑created).

Augmented images are saved under the folder you specify in --augmented_image_path (default: augmented_images/).
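Repeatable flags such as --filter_column and comma-separated flags such as --targets follow a common argparse pattern. The sketch below shows that pattern only; it is not the parser in train.py, and `parse_filter` and `split_commas` are hypothetical helpers.

```python
import argparse


def parse_filter(expr):
    """Split 'COLUMN=VALUE' into a (column, value) pair; the value stays a string."""
    column, sep, value = expr.partition("=")
    if not sep or not column.strip() or not value.strip():
        raise argparse.ArgumentTypeError(f"expected COLUMN=VALUE, got {expr!r}")
    return column.strip(), value.strip()


def split_commas(text):
    """Turn 'a,b,c' into ['a', 'b', 'c'], ignoring empty entries."""
    return [item.strip() for item in text.split(",") if item.strip()]


parser = argparse.ArgumentParser()
# action="append" collects every occurrence of the flag into one list.
parser.add_argument("--filter_column", action="append", type=parse_filter, default=[])
parser.add_argument("--targets", type=split_commas, default=[])

args = parser.parse_args([
    "--filter_column", "Neuro_Imaging=1",
    "--filter_column", "Hemisphere=0",
    "--targets", "Neuro_Imaging,Hemisphere",
])
print(args.filter_column)  # prints: [('Neuro_Imaging', '1'), ('Hemisphere', '0')]
print(args.targets)        # prints: ['Neuro_Imaging', 'Hemisphere']
```

Keeping the filter value as a string leaves the type conversion to whatever code compares it against the DataFrame column.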


API Overview

Class / Function Role
Data.split Stratified (or group‑aware) train/val/test split in one call.
PreprocessConfig Declarative augmentation & balance settings.
TrainConfig Training hyper‑parameters, device, backbone list, etc.
Pipeline Orchestrates training, prediction, evaluation, save/load.
transform.augment_and_balance Internal helper to expand/upsample data.

All objects live under medscan and are re‑exported via __init__.py for convenience:

from medscan import Data, PreprocessConfig, TrainConfig, Pipeline

Training Pipeline Walk‑Through

  1. Provide a merged DataFrame — user‑supplied, must include img_path and all label columns.

  2. Split into train / val / test with Data.split() (ensures each class appears in every subset).

  3. Preprocess (PreprocessConfig)

    • Optional augmentation: elastic, contrast, contrast + elastic.
    • Optional class balancing (upsampling) – requires augment=True.
    • Optional mask handling & context features.
  4. Model training (Pipeline.fit)

    • Multi‑head (default): one backbone + multiple classification heads.
    • Per‑label: one backbone per target (single head each).
    • Early stopping is tracked individually per head.
  5. Prediction (Pipeline.predict): adds Label_<target> columns.

  6. Evaluation (Pipeline.evaluate): calculates metrics & shows / saves plots.

  7. Save / Load (Pipeline.save / Pipeline.load): preserves all weights and head mapping.
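Per-head early stopping, as described in step 4, can be sketched with one patience counter per head. This is an illustration under assumptions and not MedScan's code: `EarlyStopper` is a hypothetical class, the loss values are invented, and the patience of 3 and the "min" mode mirror the --early_stopping_patience and --metric_mode values from the CLI example.

```python
class EarlyStopper:
    """Track one metric and signal a stop after `patience` epochs without improvement."""

    def __init__(self, patience=3, mode="min"):
        if mode not in ("min", "max"):
            raise ValueError("mode must be 'min' or 'max'")
        self.patience = patience
        self.mode = mode
        self.best = None
        self.bad_epochs = 0

    def update(self, value):
        """Record one epoch's metric; return True when training should stop."""
        if self.best is None:
            improved = True
        elif self.mode == "min":
            improved = value < self.best
        else:
            improved = value > self.best
        if improved:
            self.best = value
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# One stopper per head; the validation losses are made-up numbers.
val_losses = {
    "Hemisphere": [0.90, 0.80, 0.85, 0.84, 0.83, 0.70],
    "DSA": [0.60, 0.50, 0.40, 0.30, 0.20, 0.10],
}
stoppers = {head: EarlyStopper(patience=3, mode="min") for head in val_losses}
stopped_at = {}
for epoch in range(6):
    for head, stopper in stoppers.items():
        if head in stopped_at:
            continue  # this head has already stopped
        if stopper.update(val_losses[head][epoch]):
            stopped_at[head] = epoch
print(stopped_at)  # prints: {'Hemisphere': 4}
```

Here the Hemisphere head stops after three epochs without beating its best loss of 0.80, while the DSA head keeps improving and runs to the end.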


Saving, Loading & Plot Outputs

  • Saving: model.save("best_model.pt"), where model is a Pipeline instance, stores either a single multi‑head state or a dict of per‑label states.
  • Loading: supply the same PreprocessConfig / TrainConfig (device can differ) and call model.load().
  • Plots: when run via the CLI, every plt.show() call is monkey‑patched to dump PNGs under plots/. In notebooks they still display inline.
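The plt.show() redirection mentioned above can be approximated as follows. This is a sketch of the general technique and not the code in train.py: `save_plots_on_show` is a hypothetical helper, and the plot_N.png numbering is an assumption that merely fits the plot_*.png pattern.

```python
import itertools
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, safe on servers without a display
import matplotlib.pyplot as plt


def save_plots_on_show(out_dir="plots"):
    """Replace plt.show so that every open figure is written to out_dir as a PNG."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    counter = itertools.count(1)

    def show(*args, **kwargs):
        # Save each open figure, then close them so they are not saved twice.
        for num in plt.get_fignums():
            plt.figure(num).savefig(out / f"plot_{next(counter)}.png")
        plt.close("all")

    plt.show = show
```

After calling `save_plots_on_show()`, existing plotting code that ends with `plt.show()` writes files instead of opening a window, which is why the same evaluation code can serve both notebook and headless runs.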

Troubleshooting

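| Issue                     | Fix                                                                                                                       |
|---------------------------|---------------------------------------------------------------------------------------------------------------------------|
| CUDA library not found    | Install the dependencies from the matching requirements_cuXXX.txt (or the matching medscan-cuXXX package), or pass --force_cpu. |
| Class missing in test/val | Lower train_frac / val_frac or disable require_all_classes.                                                               |
| No plots on server        | Use the CLI; plots are saved to disk instead of shown.                                                                    |
| OOM on GPU                | Reduce --batch_size, --input_size, or train on CPU.                                                                       |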
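When diagnosing CUDA problems it can help to check what PyTorch itself reports. The helper below is a hypothetical sketch of GPU/CPU selection with a force-CPU override; MedScan's own selection logic may differ. `torch.cuda.is_available()` is the standard PyTorch call.

```python
def pick_device(force_cpu: bool = False) -> str:
    """Return 'cuda' when a usable GPU is visible to PyTorch, otherwise 'cpu'."""
    if force_cpu:
        return "cpu"
    try:
        import torch  # imported lazily so the check also works without PyTorch
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


print(pick_device())
```

If this prints cpu on a machine with a GPU, the installed Torch build and the local CUDA setup most likely do not match.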

License

This project is released under the MIT License. See LICENSE for details.

