ZACH-ViT: compact permutation-invariant Vision Transformer (MedMNIST v3.0.2 edition, arXiv:2602.17929). Includes legacy SSDA lung ultrasound pipeline.
ZACH-ViT (MedMNIST Edition): Regime-Dependent Inductive Bias in Compact Vision Transformers
New arXiv preprint (Feb 2026): ZACH-ViT: Regime-Dependent Inductive Bias in Compact Vision Transformers for Medical Imaging
➡️ arXiv: 2602.17929 (primary reference for this repository)
➡️ Code: this repository
What this repo provides (v2 / MedMNIST):
- ZACH-ViT: positional-embedding-free, [CLS]-free compact ViT (~0.25M params)
- Regime-spectrum evaluation across 7 MedMNIST datasets (few-shot protocol)
- Baselines + efficiency analysis (params / disk footprint / inference time)
ZACH-ViT should be interpreted less as a lightweight alternative to standard ViTs and more as an architectural probe for studying inductive-bias alignment under varying spatial-structure regimes.
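The order-agnostic design can be illustrated with a minimal, self-contained sketch (not the actual implementation, whose layers and dimensions differ): when a shared embedding is applied identically to every patch, no positional embedding is added, and global average pooling replaces a [CLS] token, the image-level representation is invariant to patch order.

```python
import random

def embed_patch(patch, weights):
    # Shared linear embedding applied identically to every patch;
    # no positional embedding is added, so patch order carries no signal.
    return [sum(p * w for p, w in zip(patch, row)) for row in weights]

def global_pool(embedded):
    # Global average pooling serves as the [CLS]-free image-level summary.
    dim, n = len(embedded[0]), len(embedded)
    return [sum(e[d] for e in embedded) / n for d in range(dim)]

# Toy example: 4 patches of 3 features, embedded to 2 dimensions.
patches = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0], [0.5, 0.5, 0.5]]
weights = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]

pooled = global_pool([embed_patch(p, weights) for p in patches])

shuffled = patches[:]
random.shuffle(shuffled)
pooled_shuffled = global_pool([embed_patch(p, weights) for p in shuffled])

# The pooled representation is identical for any patch ordering.
assert all(abs(a - b) < 1e-9 for a, b in zip(pooled, pooled_shuffled))
```

This permutation invariance is also what makes stride-reordering augmentations such as SSDA label-preserving for the model.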
Citation (preferred)
If you use this code, please cite the MedMNIST-edition paper:
@article{angelakis2026zachvit,
title={ZACH-ViT: Regime-Dependent Inductive Bias in Compact Vision Transformers for Medical Imaging},
author={Angelakis, Athanasios},
journal={arXiv preprint arXiv:2602.17929},
year={2026}
}
⚠️ Historical note:
The sections below describe the earlier lung ultrasound pipeline and ShuffleStrides Data Augmentation (SSDA), which represent the original exploratory version of ZACH-ViT. The current canonical validation and conclusions are reported in arXiv:2602.17929.
🧩 Legacy Pipeline: Lung Ultrasound + SSDA (Exploratory Version)
Official implementation of ZACH-ViT, a lightweight Vision Transformer for robust classification of lung ultrasound videos, and the ShuffleStrides Data Augmentation (SSDA) algorithm.
Introduced in Angelakis et al., "ZACH-ViT: A Zero-Token Vision Transformer with ShuffleStrides Data Augmentation for Robust Lung Ultrasound Classification" (arXiv:2510.17650).
📘 Overview
ZACH-ViT redefines Vision Transformer design for small, heterogeneous medical datasets.
- ❌ No positional embeddings or class tokens — zero-token paradigm for order-agnostic feature extraction
- ⚙️ Adaptive hierarchical residuals for stable feature learning
- 🌍 Global pooling for invariant image-level representations
- 🔄 ShuffleStrides Data Augmentation (SSDA) — permutation-based semi-supervised augmentation preserving clinical plausibility
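The core combinatorics behind SSDA can be sketched in a few lines (illustrative only; the real pipeline operates on image stride blocks, not strings): with four transducer positions there are 4! = 24 stride orderings, each a plausible rearrangement of the same patient's data.

```python
from itertools import permutations

# Four transducer-position stride blocks for one patient (labels illustrative).
strides = ["pos_A", "pos_B", "pos_C", "pos_D"]

# Every ordering of the four position blocks is a candidate augmentation.
orderings = list(permutations(strides))
print(len(orderings))  # 4! = 24
```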
🧠 Full Pipeline
This repository provides a fully reproducible pipeline for preprocessing, training, and evaluation, available as both Jupyter notebooks and pure Python scripts:
- ROI extraction from raw TALOS DICOM ultrasound recordings
- VIS (Video Image Sequence) creation per patient, concatenating frame strides from all probe positions
- ShuffleStrides semi-supervised data augmentation (0-SSDA) for robust domain generalization
- ShuffleStrides semi-supervised data augmentation (SSDA_p) for permutation-based learning enhancement
- ZACH-ViT training, validation, and testing with automatic time and metric reporting
📂 Data Directory Structure
The ../Data directory evolves from raw patient data to fully structured training datasets.
🧩 Before Preprocessing
../Data/
├── TALOS100/
└── TALOS122/
Description:
- Each folder contains the raw ultrasound recordings (`.dcm` format) for one patient across the four transducer positions
- Data is stored in DICOM format, which is standard for medical imaging
🔄 After Preprocessing
../Data/
├── 0_SSDA/ # Dataset with all 4! stride permutations (first SSDA regime)
├── 2_3_SSDA/ # Second-level SSDA with partial stride reordering
├── imgs/ # Auto-saved training and validation plots (timestamped)
├── Processed_ROI/ # Extracted pleural ROI frames per position
├── TALOS100/ # Original raw DICOMs (kept for reference)
├── TALOS122/ # Original raw DICOMs (kept for reference)
├── VIS/ # Generated VIS images per patient (concatenated stride representation)
├── train/
│ ├── 0/ # Non-CPE
│ └── 1/ # CPE
├── val/
│ ├── 0/
│ └── 1/
└── test/
├── 0/
└── 1/
🧠 Notes
- VIS images represent one patient by vertically stacking the four position-specific stride sequences.
- SSDA folders contain automatically generated semi-supervised augmentations.
- train, val, and test directories follow the standard Keras `ImageDataGenerator` convention, with subfolders `0` and `1` for the binary classes.
- All training curves from the ZACH-ViT notebook are automatically saved in `../Data/imgs/` with a date-time prefix (e.g. `ZACH_ViT_training_20251014_183502.png`).
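The VIS construction described above can be sketched as follows (array sizes and pixel values are illustrative, not the repository's actual dimensions): the four position-specific stride sequences are concatenated vertically into one image.

```python
# Each stride block: a (rows x cols) grayscale array as nested lists.
def make_block(rows, cols, value):
    return [[value] * cols for _ in range(rows)]

# Four probe-position blocks with distinct fill values for illustration.
blocks = [make_block(8, 32, v) for v in (10, 20, 30, 40)]

# Vertical stacking: concatenate the row lists of each block in order.
vis = [row for block in blocks for row in block]

print(len(vis), len(vis[0]))  # 32 rows (4 blocks x 8), 32 columns
```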
⚙️ Installation
ZACH-ViT provides both Jupyter notebook and Command-Line Interface (CLI) execution for full reproducibility.
📓 Using Jupyter Notebooks
- Run preprocessing: open and run the notebook `Preprocessing_ROI_VIS_0_SSDA_SSDA_p`. This will:
- Extract and crop the DICOM ROIs
- Generate VIS images
- Create 0-SSDA and SSDA_p datasets
- Train and evaluate ZACH-ViT: open and run the notebook `ZACH-ViT`. This will:
- Train the model
- Report training/inference times
- Save learning curves automatically in `../Data/imgs/`
💻 Using CLI
# Clone the repository
git clone https://github.com/Bluesman79/ZACH-ViT.git
cd ZACH-ViT
# (Optional) Create a clean virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install in editable/development mode
pip install -e .
# Verify installation
python -c 'import zachvit; print("✅ ZACH-ViT installed successfully!")'
This installs two CLI tools globally in the environment:
- `zachvit-preprocess`: runs the entire preprocessing and data augmentation pipeline
- `zachvit-train`: runs training and evaluation of the ZACH-ViT model
🧩 CLI Usage
🧠 Preprocessing Pipeline
The preprocessing CLI zachvit-preprocess automatically runs all four modules:
- ROI extraction and height compression
- VIS (Video Image Sequence) creation
- 0-SSDA (stride permutation augmentation)
- SSDAₚ (semi-supervised prime-based augmentation)
Example
zachvit-preprocess \
--talos_path ../Data/TALOS \
--output_dir ../Data \
--patient_start 100 \
--patient_end 122 \
--primes 2 3
| Argument | Description |
|---|---|
| `--talos_path` | Path to the folder containing raw TALOS DICOM patient directories (`TALOS100/`, `TALOS122/`, etc.) |
| `--output_dir` | Base directory where all processed data will be saved (`../Data/`) |
| `--patient_start` | Starting patient ID (inclusive) |
| `--patient_end` | Ending patient ID (inclusive) |
| `--primes` | (Optional) Prime numbers used as SSDAₚ augmentation seeds (default: `2 3`) |
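How the primes enter SSDAₚ is defined in the preprocessing code; one plausible reading, sketched here purely for illustration (this is NOT the actual algorithm), is that each prime deterministically seeds a reproducible selection of stride permutations, so the same `--primes` always regenerate the same augmented dataset.

```python
import random
from itertools import permutations

def ssda_p_subset(primes, k=4):
    """Illustrative only: each prime seeds a reproducible choice of
    stride permutations (hypothetical stand-in for SSDA_p)."""
    all_perms = list(permutations(range(4)))  # 24 orderings of the 4 positions
    subsets = {}
    for p in primes:
        rng = random.Random(p)        # prime-based seed => deterministic
        subsets[p] = rng.sample(all_perms, k)
    return subsets

subsets = ssda_p_subset([2, 3])
# Same primes always yield the same permutations (reproducibility).
assert subsets == ssda_p_subset([2, 3])
```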
The CLI will automatically generate:
../Data/
├── Processed_ROI/
├── VIS/
├── 0_SSDA/
├── 2_3_SSDA/
└── imgs/ # Training curves and logs
🧩 Training ZACH-ViT
The training CLI zachvit-train runs end-to-end training, validation, and testing of ZACH-ViT on the prepared datasets.
It also reports total training time, mean inference time per batch, and saves ROC-AUC/accuracy curves automatically.
Example
zachvit-train \
--base_dir ../Data \
--epochs 23 \
--batch_size 16 \
--threshold 53 \
--class_weights 1.0 2.5
| Argument | Description |
|---|---|
| `--base_dir` | Root data directory containing `train/`, `val/`, and `test/` |
| `--epochs` | Number of training epochs (default: 23) |
| `--batch_size` | Batch size for training (default: 16) |
| `--threshold` | Intensity threshold (0–255) for background removal (default: 53) |
| `--class_weights` | Optional class weights for labels 0 and 1 (e.g. `--class_weights 1.0 2.5`) |
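The `--threshold` behavior can be understood from a small sketch (illustrative, not the packaged code): pixels whose 0–255 intensity falls below the threshold are treated as background and zeroed, keeping only the brighter ultrasound structures.

```python
def remove_background(image, threshold=53):
    # Zero out pixels darker than the threshold (0-255 grayscale);
    # brighter pixels pass through unchanged.
    return [[px if px >= threshold else 0 for px in row] for row in image]

img = [[0, 52, 53], [120, 255, 10]]
print(remove_background(img))  # [[0, 0, 53], [120, 255, 0]]
```

Similarly, `--class_weights 1.0 2.5` presumably maps to a Keras-style `class_weight` dict such as `{0: 1.0, 1: 2.5}`, up-weighting the minority class during training.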
📊 Output
After training:
- All performance plots (loss, accuracy, AUC) are saved in ../Data/imgs/
- Model metrics (AUC, sensitivity, specificity, F1-score) are printed at the end
- Inference time (validation/test) and average epoch duration are reported
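The printed metrics can be recomputed from a binary confusion matrix; a self-contained sketch of the standard formulas (not the repository's evaluation code, and the counts below are made up):

```python
def binary_metrics(tp, fp, tn, fn):
    # Sensitivity = recall on the positive (CPE) class.
    sensitivity = tp / (tp + fn)
    # Specificity = recall on the negative (non-CPE) class.
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    # F1 is the harmonic mean of precision and sensitivity.
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return sensitivity, specificity, f1

sens, spec, f1 = binary_metrics(tp=40, fp=10, tn=45, fn=5)
print(round(sens, 3), round(spec, 3), round(f1, 3))  # 0.889 0.818 0.842
```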
💡 Example Workflow
# Step 1: Run preprocessing
zachvit-preprocess --talos_path ../Data/TALOS --output_dir ../Data --patient_start 100 --patient_end 122 --primes 2 3
# Step 2: Train and evaluate ZACH-ViT
zachvit-train --base_dir ../Data --epochs 23 --batch_size 16 --threshold 53 --class_weights 1.0 2.5
Both scripts mirror the logic of the notebooks and save identical output structures.
🔁 Data Flow Overview
TALOS DICOM
│
▼
ROI Extraction
│
▼
VIS Image Generation
│
▼
ShuffleStrides Data Augmentation (SSDA)
│
▼
Train / Val / Test Sets
│
▼
ZACH-ViT Training and Evaluation
🧾 Citation (legacy exploratory manuscript)
@article{angelakis2025zachvit,
author = {Angelakis, A. and others},
title = {ZACH-ViT: A Zero-Token Vision Transformer with ShuffleStrides Data Augmentation for Robust Lung Ultrasound Classification},
journal = {arXiv preprint arXiv:2510.17650},
year = {2025},
doi = {10.48550/arXiv.2510.17650},
url = {https://arxiv.org/abs/2510.17650}
}