ZACH-ViT: compact permutation-invariant Vision Transformer (MedMNIST v3.0.2 edition, arXiv:2602.17929). Includes legacy SSDA lung ultrasound pipeline.

ZACH-ViT (MedMNIST Edition): Regime-Dependent Inductive Bias in Compact Vision Transformers

New arXiv preprint (Feb 2026): ZACH-ViT: Regime-Dependent Inductive Bias in Compact Vision Transformers for Medical Imaging
➡️ arXiv: 2602.17929 (primary reference for this repository)
➡️ Code: this repository

What this repo provides (v2 / MedMNIST):

  • ZACH-ViT: positional-embedding-free, [CLS]-free compact ViT (~0.25M params)
  • Regime-spectrum evaluation across 7 MedMNIST datasets (few-shot protocol)
  • Baselines + efficiency analysis (params / disk footprint / inference time)

ZACH-ViT should be interpreted less as a lightweight alternative to standard ViTs and more as an architectural probe for studying inductive-bias alignment under varying spatial-structure regimes.


Citation (preferred)

If you use this code, please cite the MedMNIST-edition paper (arXiv:2602.17929):

@article{angelakis2026zachvit,
  title={ZACH-ViT: Regime-Dependent Inductive Bias in Compact Vision Transformers for Medical Imaging},
  author={Angelakis, Athanasios},
  journal={arXiv preprint arXiv:2602.17929},
  year={2026}
}

⚠️ Historical note:
The sections below describe the earlier lung ultrasound pipeline and ShuffleStrides Data Augmentation (SSDA), which represent the original exploratory version of ZACH-ViT. The current canonical validation and conclusions are reported in arXiv:2602.17929.


🧩 Legacy Pipeline: Lung Ultrasound + SSDA (Exploratory Version)

Official implementation of ZACH-ViT, a lightweight Vision Transformer for robust classification of lung ultrasound videos, and the ShuffleStrides Data Augmentation (SSDA) algorithm.

Introduced in Angelakis et al., "ZACH-ViT: A Zero-Token Vision Transformer with ShuffleStrides Data Augmentation for Robust Lung Ultrasound Classification" (arXiv:2510.17650).


📘 Overview

ZACH-ViT redefines Vision Transformer design for small, heterogeneous medical datasets.

  • No positional embeddings or class tokens — zero-token paradigm for order-agnostic feature extraction
  • ⚙️ Adaptive hierarchical residuals for stable feature learning
  • 🌍 Global pooling for invariant image-level representations
  • 🔄 ShuffleStrides Data Augmentation (SSDA) — permutation-based semi-supervised augmentation preserving clinical plausibility
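The zero-token design has a directly checkable consequence: with no positional embeddings and global average pooling in place of a [CLS] token, the pooled representation cannot depend on patch order. A minimal NumPy sketch of this permutation-invariance property (illustrative only, not the model code; shapes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
tokens = rng.normal(size=(16, 64))   # 16 patch tokens, 64-dim each

# Global average pooling over the token axis stands in for a [CLS] token.
pooled = tokens.mean(axis=0)

# With no positional embeddings, shuffling the patch order leaves the
# pooled representation unchanged: the order-agnostic property.
shuffled = tokens[rng.permutation(16)]
assert np.allclose(pooled, shuffled.mean(axis=0))
```

Self-attention without positional embeddings is likewise permutation-equivariant, so the invariance survives the full token-mixing stack, not just the pooling step.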

🧠 Full Pipeline

This repository provides a fully reproducible pipeline for preprocessing, training, and evaluation, available as both Jupyter notebooks and pure Python scripts:

  1. ROI extraction from raw TALOS DICOM ultrasound recordings
  2. VIS (Video Image Sequence) creation per patient, concatenating frame strides from all probe positions
  3. ShuffleStrides semi-supervised data augmentation (0-SSDA) for robust domain generalization
  4. ShuffleStrides semi-supervised data augmentation (SSDA_p) for permutation-based learning enhancement
  5. ZACH-ViT training, validation, and testing with automatic time and metric reporting

📂 Data Directory Structure

The ../Data directory evolves from raw patient data to fully structured training datasets.

🧩 Before Preprocessing

../Data/
├── TALOS100/
└── TALOS122/

Description:

  • Each folder contains the raw ultrasound recordings (.dcm format) for one patient across the four transducer positions
  • Data is stored in DICOM format, which is standard for medical imaging

🔄 After Preprocessing

../Data/
├── 0_SSDA/             # Dataset with all 4! stride permutations (first SSDA regime)
├── 2_3_SSDA/           # Second-level SSDA with partial stride reordering
├── imgs/               # Auto-saved training and validation plots (timestamped)
├── Processed_ROI/      # Extracted pleural ROI frames per position
├── TALOS100/           # Original raw DICOMs (kept for reference)
├── TALOS122/           # Original raw DICOMs (kept for reference)
├── VIS/                # Generated VIS images per patient (concatenated stride representation)
├── train/
│   ├── 0/              # Non-CPE
│   └── 1/              # CPE
├── val/
│   ├── 0/
│   └── 1/
└── test/
    ├── 0/
    └── 1/

🧠 Notes

  • VIS images represent one patient by vertically stacking the four position-specific stride sequences.
  • SSDA folders contain automatically generated semi-supervised augmentations.
  • train, val, and test directories follow the standard Keras ImageDataGenerator convention with subfolders 0 and 1 for binary classes.
  • All training curves from the ZACH-ViT notebook are automatically saved in ../Data/imgs/ with a date-time prefix (e.g., ZACH_ViT_training_20251014_183502.png).
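The vertical stacking behind a VIS image can be sketched with NumPy. The strip shapes and fill values below are illustrative placeholders, not the pipeline's actual dimensions:

```python
import numpy as np

# Four hypothetical position-specific stride strips for one patient,
# each a grayscale (height, width) array; real shapes will differ.
strips = [np.full((32, 128), fill_value=i, dtype=np.uint8) for i in range(4)]

# A VIS image stacks the four strips vertically into one image.
vis = np.vstack(strips)
assert vis.shape == (4 * 32, 128)
```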

⚙️ Installation

ZACH-ViT provides both Jupyter notebook and Command-Line Interface (CLI) execution for full reproducibility.

📓 Using Jupyter Notebooks

  1. Run preprocessing: open and run the notebook Preprocessing_ROI_VIS_0_SSDA_SSDA_p.

    This will:

    • Extract and crop the DICOM ROIs
    • Generate VIS images
    • Create 0-SSDA and SSDA_p datasets

  2. Train and evaluate ZACH-ViT: open and run the notebook ZACH-ViT.

    This will:

    • Train the model
    • Report training/inference times
    • Save learning curves automatically in ../Data/imgs/

💻 Using CLI

# Clone the repository
git clone https://github.com/Bluesman79/ZACH-ViT.git
cd ZACH-ViT

# (Optional) Create a clean virtual environment
python -m venv venv
source venv/bin/activate   # On Windows: venv\Scripts\activate

# Install in editable/development mode
pip install -e .

# Verify installation
python -c 'import zachvit; print("✅ ZACH-ViT installed successfully!")'

This installs two command-line tools into the active environment:

  • zachvit-preprocess: runs the entire preprocessing and data augmentation pipeline
  • zachvit-train: runs training and evaluation of the ZACH-ViT model

🧩 CLI Usage

🧠 Preprocessing Pipeline

The preprocessing CLI zachvit-preprocess automatically runs all four modules:

  1. ROI extraction and height compression
  2. VIS (Video Image Sequence) creation
  3. 0-SSDA (stride permutation augmentation)
  4. SSDAₚ (semi-supervised prime-based augmentation)
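The stride-permutation idea behind 0-SSDA (module 3) can be sketched in a few lines of stdlib Python; the labels are placeholders, not the pipeline's real identifiers:

```python
from itertools import permutations

# Placeholder labels for the four position-specific stride blocks.
strides = ["pos1", "pos2", "pos3", "pos4"]

# 0-SSDA enumerates all 4! = 24 orderings of the stride blocks, each of
# which yields one augmented VIS variant for the same patient.
orderings = list(permutations(strides))
print(len(orderings))   # 24
```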

Example

zachvit-preprocess \
  --talos_path ../Data/TALOS \
  --output_dir ../Data \
  --patient_start 100 \
  --patient_end 122 \
  --primes 2 3
Argument          Description
--talos_path      Path to the folder containing the raw TALOS DICOM patient directories (TALOS100/, TALOS122/, etc.)
--output_dir      Base directory where all processed data will be saved (../Data/)
--patient_start   Starting patient ID (inclusive)
--patient_end     Ending patient ID (inclusive)
--primes          (Optional) Prime numbers used as SSDAₚ augmentation seeds (default: 2 3)

The CLI will automatically generate:

../Data/
├── Processed_ROI/
├── VIS/
├── 0_SSDA/
├── 2_3_SSDA/
└── imgs/           # Training curves and logs

🧩 Training ZACH-ViT

The training CLI zachvit-train runs end-to-end training, validation, and testing of ZACH-ViT on the prepared datasets. It also reports total training time, mean inference time per batch, and saves ROC-AUC/accuracy curves automatically.

Example

zachvit-train \
  --base_dir ../Data \
  --epochs 23 \
  --batch_size 16 \
  --threshold 53 \
  --class_weights 1.0 2.5
Argument          Description
--base_dir        Root data directory containing train/, val/, and test/
--epochs          Number of training epochs (default: 23)
--batch_size      Batch size for training (default: 16)
--threshold       Intensity threshold (0–255) for background removal (default: 53)
--class_weights   Optional class weights for labels 0 and 1 (e.g. --class_weights 1.0 2.5)
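As a rough sketch of what --threshold implies, pixels at or below the intensity cutoff would be zeroed as background. This is an assumed interpretation for illustration; the pipeline's exact rule may differ:

```python
import numpy as np

rng = np.random.default_rng(1)
frame = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)  # fake frame

threshold = 53  # matches the CLI default
# Zero out pixels at or below the cutoff (assumed behavior, not the
# pipeline's verified implementation).
cleaned = np.where(frame > threshold, frame, 0).astype(np.uint8)
assert (cleaned[frame <= threshold] == 0).all()
```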

📊 Output

After training:

  • All performance plots (loss, accuracy, AUC) are saved in ../Data/imgs/
  • Model metrics (AUC, sensitivity, specificity, F1-score) are printed at the end
  • Inference time (validation/test) and average epoch duration are reported

💡 Example Workflow

# Step 1: Run preprocessing
zachvit-preprocess --talos_path ../Data/TALOS --output_dir ../Data --patient_start 100 --patient_end 122 --primes 2 3

# Step 2: Train and evaluate ZACH-ViT
zachvit-train --base_dir ../Data --epochs 23 --batch_size 16 --threshold 53 --class_weights 1.0 2.5

Both scripts mirror the logic of the notebooks and save identical output structures.

🔁 Data Flow Overview

TALOS DICOM
      ▼
ROI Extraction
      ▼
VIS Image Generation
      ▼
ShuffleStrides Data Augmentation (SSDA)
      ▼
Train / Val / Test Sets
      ▼
ZACH-ViT Training and Evaluation

🧾 Citation (legacy exploratory manuscript)

@article{angelakis2025zachvit,
  author    = {Angelakis, Athanasios and others},
  title     = {ZACH-ViT: A Zero-Token Vision Transformer with ShuffleStrides Data Augmentation for Robust Lung Ultrasound Classification},
  journal   = {arXiv preprint arXiv:2510.17650},
  year      = {2025},
  doi       = {10.48550/arXiv.2510.17650},
  url       = {https://arxiv.org/abs/2510.17650}
}
