Modular experimentation framework for computational pathology

soma

soma is a modular framework to streamline computational pathology research.

It provides a unified API to go from a dataset of slides and labels to a full, reproducible result report. Along the way, it makes it easy to sweep core design choices such as preprocessing (spacing, field-of-view), encoding (foundation models), and aggregation (MIL) so you can quickly find the strongest configuration for your data.

You can use it either as a full end-to-end pipeline or as a set of composable building blocks for custom experiment orchestration.

Install

pip install soma-pathology

The PyPI distribution is soma-pathology; the import package and CLI remain soma.

API Overview

The package root exports the main entry points:

Dataset and Splits for loading data
FeatureExtractor for preprocessing slides and extracting embeddings
train() and train_one_fold() for training directly from features
Pipeline for the full preprocessing + feature extraction + training workflow

Quick Start

1. Prepare dataset and splits

dataset.csv should contain one row per slide with at least sample_id, image_path, and label. sample_id must be unique, image_path should point to the slide file, and label can be either a string class name or an integer target.
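The requirements above can be checked before handing the file to soma. The following is a minimal sketch using only the Python standard library; the function name check_dataset_csv and the example rows are hypothetical and not part of soma, and soma may run its own, stricter validation.

```python
import csv
import io

# Columns the text above lists as the minimum for dataset.csv.
REQUIRED_COLUMNS = {"sample_id", "image_path", "label"}


def check_dataset_csv(text):
    """Return a list of problems found in dataset.csv content (empty if none)."""
    reader = csv.DictReader(io.StringIO(text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        return [f"missing columns: {sorted(missing)}"]

    problems = []
    seen = set()
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        sample_id = row["sample_id"]
        if sample_id in seen:
            problems.append(f"line {line_no}: duplicate sample_id {sample_id!r}")
        seen.add(sample_id)
        if not row["label"]:
            problems.append(f"line {line_no}: empty label")
    return problems


# Hypothetical example content; the paths and labels are made up.
example = """sample_id,image_path,label
s001,slides/s001.tif,tumor
s002,slides/s002.tif,normal
s002,slides/s002_copy.tif,normal
"""

print(check_dataset_csv(example))
```

The example deliberately repeats s002, so the check reports one duplicate on line 4 of the file.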

splits.csv should assign each sample_id to train, tune, or a test* split for every fold. Each fold must contain at least one test split. Fixing these assignments in a file keeps evaluation reproducible and prevents leakage between training and evaluation data.
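The split rules can be sketched as a small check. This sketch does not assume any particular column layout for splits.csv: it works on assignments that have already been parsed into a dict of fold -> {sample_id: split name}. The function check_splits is hypothetical, and reading test* as "any name starting with test" is an assumption about soma's naming.

```python
def check_splits(assignments, sample_ids):
    """Return a list of problems in parsed split assignments (empty if none).

    assignments maps fold -> {sample_id: split_name}.
    """
    problems = []
    expected = set(sample_ids)
    for fold, mapping in assignments.items():
        # Every sample should be assigned in every fold.
        unassigned = expected - set(mapping)
        if unassigned:
            problems.append(f"fold {fold}: unassigned samples {sorted(unassigned)}")
        # Assumption: valid names are train, tune, or anything starting with "test".
        unknown = sorted(
            {name for name in mapping.values()
             if name not in ("train", "tune") and not name.startswith("test")}
        )
        if unknown:
            problems.append(f"fold {fold}: unknown split names {unknown}")
        # Each fold needs at least one test split.
        if not any(name.startswith("test") for name in mapping.values()):
            problems.append(f"fold {fold}: no test split")
    return problems


# Hypothetical assignments for three samples and three folds.
sample_ids = ["s001", "s002", "s003"]
assignments = {
    0: {"s001": "train", "s002": "tune", "s003": "test"},
    1: {"s001": "test_internal", "s002": "train", "s003": "train"},
    2: {"s001": "train", "s002": "tune"},  # s003 missing, no test split
}

for problem in check_splits(assignments, sample_ids):
    print(problem)
```

Because each fold maps a sample_id to exactly one split name, a sample cannot appear in both train and test within the same fold, which is the leakage the text warns about.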

from soma import Dataset, Splits

dataset = Dataset("dataset.csv")
splits = Splits("splits.csv", dataset)

print(len(dataset.sample_ids))  # number of samples in dataset.csv
print(sorted({s.label for s in dataset.samples.values()}))  # distinct labels
print(splits.num_folds)  # number of folds defined in splits.csv

2. Extract once, cache, and reuse features across experiments

FeatureExtractor handles preprocessing and embedding extraction. The cache lets you reuse the same extracted features across multiple training runs, which is especially useful when comparing several MIL aggregators or heads against the same encoder output.

from soma import Dataset, Splits, FeatureExtractor, train
from soma import CacheConfig, EncoderConfig, AggregatorConfig, TaskConfig, TrainingConfig

# Extract features once

dataset = Dataset("dataset.csv")
extractor = FeatureExtractor(
    dataset=dataset,
    encoder=EncoderConfig(name="uni2"),
    output_root="output",
    cache=CacheConfig(enabled=True, root_dir="shared/feature_cache"),
)

store = extractor.extract(feature_dir="output/features/uni2")

# Train multiple model variants on the same features

splits = Splits("splits.csv", dataset)
task = TaskConfig(name="binary_classification")

abmil_result = train(
    feature_store=store,
    dataset=dataset,
    splits=splits,
    aggregator=AggregatorConfig(name="abmil", params={"hidden_dim": 256}),
    task=task,
    training=TrainingConfig(learning_rate=1e-4, epochs=50),
    run_dir="output/abmil/uni2",
)

clam_result = train(
    feature_store=store,
    dataset=dataset,
    splits=splits,
    aggregator=AggregatorConfig(name="clam_sb", params={"hidden_dim": 256, "attn_dim": 128}),
    task=task,
    training=TrainingConfig(learning_rate=1e-4, epochs=50),
    run_dir="output/clam_sb/uni2",
)
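When comparing more than two variants, it can help to generate the run list up front rather than copy each train() call. The sketch below builds such a list with plain dicts and does not import soma; the helper build_runs is hypothetical, and the directory layout simply mirrors the paths used in the example above.

```python
from itertools import product


def build_runs(encoders, aggregators, output_root="output"):
    """Build one run description per (encoder, aggregator) combination."""
    runs = []
    for encoder, (name, params) in product(encoders, aggregators.items()):
        runs.append(
            {
                "encoder": encoder,
                # Features are extracted once per encoder and shared by all runs.
                "feature_dir": f"{output_root}/features/{encoder}",
                "aggregator": name,
                "params": dict(params),
                "run_dir": f"{output_root}/{name}/{encoder}",
            }
        )
    return runs


# Encoder and aggregator settings taken from the example above.
runs = build_runs(
    encoders=["uni2"],
    aggregators={
        "abmil": {"hidden_dim": 256},
        "clam_sb": {"hidden_dim": 256, "attn_dim": 128},
    },
)

for run in runs:
    print(run["run_dir"], run["params"])
```

Each entry corresponds to one train() call: the aggregator name and params go into AggregatorConfig, and run_dir is passed as is. All runs for an encoder point at the same feature_dir, so extraction happens only once.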

3. Run a full pipeline in one call

Pipeline(config).run() handles preprocessing, feature extraction, training across folds, and metric aggregation in a single call.

from soma import Pipeline, PipelineConfig
from soma import EncoderConfig, AggregatorConfig, TaskConfig, TrainingConfig

config = PipelineConfig(
    dataset_csv="dataset.csv",
    splits_csv="splits.csv",
    output_root="output",
    dataset_type="slide",
    encoder=EncoderConfig(name="uni2"),
    aggregator=AggregatorConfig(name="abmil", params={"hidden_dim": 256}),
    task=TaskConfig(name="binary_classification"),
    training=TrainingConfig(learning_rate=1e-4, epochs=50),
)

result = Pipeline(config).run()

The returned PipelineResult includes:

fold_results: one entry per fold, each with training, tune, and test reports
summary: aggregated metrics across folds
run_dir: the resolved run directory containing the saved artifacts
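How soma computes summary is not described here. As an illustration only, one common way to aggregate per-fold metrics is to report the mean and sample standard deviation of each metric. The function summarize, the metric names, and the numbers below are placeholders, not soma's API or real results.

```python
from statistics import mean, stdev


def summarize(fold_metrics):
    """Aggregate a list of per-fold metric dicts into mean/std per metric."""
    names = sorted(set().union(*(m.keys() for m in fold_metrics)))
    summary = {}
    for name in names:
        values = [m[name] for m in fold_metrics if name in m]
        summary[name] = {
            "mean": mean(values),
            # stdev needs at least two values; report 0.0 for a single fold.
            "std": stdev(values) if len(values) > 1 else 0.0,
            "n": len(values),
        }
    return summary


# Placeholder test metrics for three folds (not real results).
fold_metrics = [
    {"auc": 0.5, "accuracy": 0.25},
    {"auc": 1.0, "accuracy": 0.75},
    {"auc": 0.75, "accuracy": 0.5},
]

for name, stats in summarize(fold_metrics).items():
    print(name, stats)
```

Reporting n alongside the mean makes it visible when a metric was missing from some folds.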

CLI

soma ships a command-line interface that runs a full pipeline from a YAML config file:

soma /path/to/config.yaml
python -m soma /path/to/config.yaml

The YAML layout is grouped by concern: run, data, preprocessing, encoder, aggregation, task, evaluation, training, execution, cache, and reports. soma merges your file on top of the bundled soma/configs/default.yaml, so you usually only need to edit the blocks you want to change.
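Merging a user file on top of defaults is typically a recursive dictionary merge. The sketch below shows that idea with plain dicts, assuming nested blocks are merged key by key; whether soma merges recursively or replaces whole blocks is not stated here, and the keys inside each block are borrowed from the Python config objects above as an assumption about the YAML field names.

```python
import copy


def deep_merge(base, override):
    """Return a new dict with override applied on top of base, recursively."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars, lists, and new keys replace the default outright.
            merged[key] = copy.deepcopy(value)
    return merged


# Hypothetical defaults and user overrides, as they might look once parsed.
defaults = {
    "encoder": {"name": "uni2"},
    "training": {"learning_rate": 1e-4, "epochs": 50},
}
user = {"training": {"epochs": 100}}

config = deep_merge(defaults, user)
print(config)
```

Under this merge, the user file only overrides training.epochs; training.learning_rate and the encoder block keep their default values, and defaults itself is left unmodified.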

You can also inspect the available presets directly from the terminal:

soma list encoders --level tile
soma list aggregators
soma list tasks

examples/ contains a reference.yaml documenting every available field, and focused per-task starting points (slide_binary_classification.yaml, slide_ordinal_classification.yaml, slide_regression.yaml, tile_classification.yaml).

Docs

License

This repository is available under AGPL-3.0.

Release history

1.4.0 (this version): Jun 16, 2026
1.3.0: Jun 12, 2026
1.2.0: Jun 11, 2026
1.1.3: Jun 8, 2026
1.1.2: Jun 4, 2026
1.1.1: May 30, 2026
1.1.0: May 28, 2026
1.0.1: May 20, 2026

Download files

Source Distribution

soma_pathology-1.4.0.tar.gz (266.5 kB)

Uploaded Jun 16, 2026 Source

Built Distribution

soma_pathology-1.4.0-py3-none-any.whl (319.7 kB)

Uploaded Jun 16, 2026 Python 3

File details

Details for the file soma_pathology-1.4.0.tar.gz.

File metadata

Download URL: soma_pathology-1.4.0.tar.gz
Upload date: Jun 16, 2026
Size: 266.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for soma_pathology-1.4.0.tar.gz
Algorithm	Hash digest
SHA256	`85b55ee6ba607bac9f1981da1a311950c9fc1ec70089e36568d8cf28a47df70b`
MD5	`2e37dda2c2cc589a714b471d4e6208ee`
BLAKE2b-256	`6aa23149385fc718613debb4ff1d60b626cff653738bdbb520411b85781d935f`

File details

Details for the file soma_pathology-1.4.0-py3-none-any.whl.

File metadata

Download URL: soma_pathology-1.4.0-py3-none-any.whl
Upload date: Jun 16, 2026
Size: 319.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for soma_pathology-1.4.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ab1b83a30428a9cdf002140d49f17b4c7536dff34bc280ba213f85a58a2aa557`
MD5	`0476505d6129a86fc1bbc71bd5c8dc2d`
BLAKE2b-256	`31c05f8f5efd44dc7adcbf3e142fb1e091a135b312621a5eeaf8456c9dda6984`
