Fast and accurate cell segmentation for single-molecule spatial omics (Stereo-seq)

StereoSegger: Fast and Accurate Cell Segmentation for Stereo-seq

Note: This project is heavily inspired by the original Segger implementation by Elyas Heidari. You can find the original repository at EliHei2/segger_dev. This version is specifically optimized and refactored for Stereo-seq (SAW bin1) workflows.

Installation

StereoSegger requires CUDA 12 for GPU acceleration; the pinned dependencies target CUDA 12.4.

Quick Install (One-Liner)

pip install stereosegger --extra-index-url https://download.pytorch.org/whl/cu124 --extra-index-url https://pypi.nvidia.com

Option 1: Automated Setup (Recommended for HPC/Conda)

We provide a setup script that handles the complex dependency chain (PyTorch 2.5.1, RAPIDS 24.08, CUDA 12.4) automatically inside a clean Conda environment.

# Clone the repository
git clone https://github.com/nrclaudio/stereosegger.git
cd stereosegger

# Run the setup script (requires Conda)
bash scripts/setup_segger_env.sh

# Activate the environment
conda activate segger_env
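
After installation it can help to confirm that PyTorch was built against CUDA 12.x and can see a GPU. The check below is a minimal sketch and is not part of StereoSegger; the helper name cuda_report is hypothetical, and it relies only on torch.version.cuda and torch.cuda.is_available().

```python
import importlib.util


def cuda_report():
    """Summarize whether PyTorch is installed and which CUDA build it uses."""
    report = {"torch_installed": False, "cuda_build": None, "gpu_visible": False}
    if importlib.util.find_spec("torch") is None:
        return report  # PyTorch is not importable in this environment
    import torch

    report["torch_installed"] = True
    report["cuda_build"] = torch.version.cuda  # None for CPU-only builds
    report["gpu_visible"] = bool(torch.cuda.is_available())
    return report


if __name__ == "__main__":
    for key, value in cuda_report().items():
        print(f"{key}: {value}")
```

If cuda_build does not start with "12" or gpu_visible is False, the GPU-accelerated steps are unlikely to work as described.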

Inputs & Outputs

StereoSegger operates on Parquet files. We provide a built-in command to convert raw Stereo-seq H5AD files into this format.

1. Raw Input (SAW Output)

Format: h5ad (AnnData)
Source: Output from the SAW pipeline (Stereo-seq Analysis Workflow).
Conversion: Use stereosegger convert_saw to prepare this for the pipeline.
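
To picture what the conversion produces, the sketch below flattens a small bin-by-gene count matrix into long-form rows, one per gene observed at a location. It is an illustration only, not the convert_saw implementation; the function name and the column names x, y, gene_id, and count are assumptions.

```python
def to_long_form(counts, coords, gene_ids):
    """Flatten a dense bin-by-gene count matrix into long-form records.

    counts   -- list of rows, one per bin; counts[i][j] is the count of gene j in bin i
    coords   -- list of (x, y) tuples, one per bin
    gene_ids -- list of gene identifiers, one per column
    """
    rows = []
    for (x, y), bin_counts in zip(coords, counts):
        for gene_id, count in zip(gene_ids, bin_counts):
            if count > 0:  # long form stores only observed occurrences
                rows.append({"x": x, "y": y, "gene_id": gene_id, "count": count})
    return rows


example = to_long_form(
    counts=[[2, 0], [0, 1]],
    coords=[(10, 20), (11, 20)],
    gene_ids=["g0", "g1"],
)
for row in example:
    print(row)
```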

2. Processed Input (Parquet)

The core pipeline expects a directory containing:

transcripts.parquet: Long-form table of gene-location occurrences.
genes.parquet: Mapping of gene_id to gene_name.
boundaries.parquet: Polygons (required for training, optional for prediction).
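
A quick pre-flight check of that directory can catch a missing file before a long run starts. The following is a minimal sketch using only the file names listed above; the helper name missing_inputs is hypothetical and not part of the package.

```python
from pathlib import Path

REQUIRED = ("transcripts.parquet", "genes.parquet")
TRAINING_ONLY = ("boundaries.parquet",)  # optional for prediction


def missing_inputs(input_dir, for_training=False):
    """Return the expected Parquet files that are absent from input_dir."""
    expected = REQUIRED + (TRAINING_ONLY if for_training else ())
    base = Path(input_dir)
    return [name for name in expected if not (base / name).is_file()]
```

For example, missing_inputs("./raw_data_labeled", for_training=True) returns an empty list when the directory is ready for training.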

1. Prepare Data

Path A: For Training (Kidneys)

Training requires ground-truth labels, so you must provide a label TIFF (e.g., ssdna_mask.tif).

# 1. Convert with labels
stereosegger convert_saw \
  --h5ad kidney_sample.h5ad \
  --labels_tif ssdna_mask.tif \
  --out_dir ./raw_data_labeled

# 2. Build Dataset
stereosegger create_dataset \
  --base_dir ./raw_data_labeled \
  --data_dir ./dataset_labeled

Path B: For Prediction (Whole Chip)

Prediction on new data uses a pre-trained model and does not require a mask.

# 1. Convert without labels
stereosegger convert_saw \
  --h5ad whole_chip.h5ad \
  --out_dir ./raw_data_unlabeled

# 2. Build Dataset
stereosegger create_dataset \
  --base_dir ./raw_data_unlabeled \
  --data_dir ./dataset_unlabeled
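
When the two preparation steps are scripted, the commands for Path A and Path B differ only in the --labels_tif flag. The sketch below builds the argument lists for both cases using only the flags shown above; the helper name prepare_commands is hypothetical, and nothing is executed unless the lists are passed to subprocess.run.

```python
import shlex


def prepare_commands(h5ad, raw_dir, dataset_dir, labels_tif=None):
    """Build argv lists for convert_saw followed by create_dataset."""
    convert = ["stereosegger", "convert_saw", "--h5ad", h5ad]
    if labels_tif is not None:  # Path A: labeled data for training
        convert += ["--labels_tif", labels_tif]
    convert += ["--out_dir", raw_dir]
    create = [
        "stereosegger", "create_dataset",
        "--base_dir", raw_dir,
        "--data_dir", dataset_dir,
    ]
    return [convert, create]


commands = prepare_commands(
    "whole_chip.h5ad", "./raw_data_unlabeled", "./dataset_unlabeled"
)
for argv in commands:
    # To run a step: subprocess.run(argv, check=True)
    print(shlex.join(argv))
```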

2. Train Model (Requires Labeled Data)

Training requires that you passed --labels_tif during the convert_saw step (Path A above).

stereosegger train_model \
  --dataset_dir ./dataset_labeled \
  --models_dir ./models \
  --sample_tag my_sample \
  --max_epochs 300 \
  --devices 1

3. Run Segmentation (Predict)

stereosegger predict_fast \
  --segger_data_dir ./dataset_unlabeled \
  --models_dir ./models \
  --benchmarks_dir ./results \
  --transcripts_file ./raw_data_unlabeled/transcripts.parquet \
  --model_version 0

Command Reference

1. `convert_saw`

Converts Stereo-seq SAW pipeline output (H5AD) into Parquet format.

Options:

--h5ad PATH: Path to SAW bin1 h5ad file.
--out_dir PATH: Output directory.
--labels_tif PATH: (Optional) Label TIFF used to derive boundary polygons. Required if you intend to train.
--bin_pitch FLOAT: Bin pitch for rounding. Default: 1.0.
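
One plausible reading of "bin pitch for rounding" is that coordinates are snapped to the nearest multiple of the pitch. The sketch below shows that arithmetic only; whether convert_saw rounds in exactly this way is an assumption, and the helper name snap_to_pitch is hypothetical.

```python
def snap_to_pitch(value, bin_pitch=1.0):
    """Round a coordinate to the nearest multiple of bin_pitch."""
    if bin_pitch <= 0:
        raise ValueError("bin_pitch must be positive")
    return round(value / bin_pitch) * bin_pitch


print(snap_to_pitch(3.4))       # nearest multiple of 1.0
print(snap_to_pitch(3.6, 0.5))  # nearest multiple of 0.5
```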

2. `create_dataset`

Creates the graph-based dataset used for training and inference.

Options:

--base_dir PATH: Directory containing raw parquet files.
--data_dir PATH: Directory to save the processed dataset.
--tx_graph_mode [kdtree|grid_bins]: Transcript edge strategy. Default: "grid_bins".
--grid_connectivity INT: Grid connectivity (4 or 8). Default: 8.
--within_bin_edges [none|star]: Within-bin edge strategy. Default: "star".
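
The grid options can be pictured with a small sketch. It assumes that transcripts are grouped by integer bin coordinates, that --grid_connectivity selects the 4- or 8-neighborhood between occupied bins, and that "star" links one hub transcript in each bin to the others in that bin. These readings and the function names are assumptions, not the package's actual graph construction.

```python
# Half of each neighborhood, so every undirected edge is emitted once.
HALF_OFFSETS = {
    4: [(1, 0), (0, 1)],
    8: [(1, 0), (0, 1), (1, 1), (1, -1)],
}


def grid_edges(occupied_bins, connectivity=8):
    """Connect occupied bins that are grid neighbors (4- or 8-connectivity)."""
    if connectivity not in HALF_OFFSETS:
        raise ValueError("connectivity must be 4 or 8")
    occupied = set(occupied_bins)
    edges = []
    for x, y in sorted(occupied):
        for dx, dy in HALF_OFFSETS[connectivity]:
            neighbor = (x + dx, y + dy)
            if neighbor in occupied:
                edges.append(((x, y), neighbor))
    return edges


def star_edges(transcript_ids):
    """Connect the first transcript in a bin (the hub) to every other one."""
    if not transcript_ids:
        return []
    hub, *others = transcript_ids
    return [(hub, other) for other in others]


bins = [(0, 0), (1, 0), (1, 1)]
print(len(grid_edges(bins, connectivity=4)))  # axis-aligned neighbors only
print(len(grid_edges(bins, connectivity=8)))  # diagonal neighbors included
print(star_edges([5, 7, 9]))
```

With 8-connectivity the diagonal pair (0, 0) and (1, 1) gains an edge that 4-connectivity omits, which is the practical difference between the two settings.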

3. `train_model`

Trains the Segger model. Training stops if the dataset is unlabeled.

Options:

--dataset_dir PATH: Processed dataset directory.
--models_dir PATH: Directory to save the model.
--sample_tag TEXT: Unique tag for the sample.
--max_epochs INT: Number of training epochs. Default: 300.

4. `predict_fast`

Runs fast segmentation inference for large grid-based datasets.

Options:

--segger_data_dir PATH: Processed dataset directory.
--models_dir PATH: Trained models directory.
--benchmarks_dir PATH: Output results directory.
--transcripts_file PATH: Original transcripts parquet file.
--model_version INT: Version of the model to load. Default: 0.

Technical Details

Architecture

StereoSegger employs a Heterogeneous Graph Attention Network (GATv2) to segment transcripts based on their spatial neighborhood and identity.

Transcript Nodes (tx): Each node represents a specific gene at a spatial location.
Boundary Nodes (bd): Each node represents a polygon boundary (e.g., a nucleus).
Supervision: During training, the model learns to predict "belongs" edges between transcripts and ground-truth boundaries.
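
For readers unfamiliar with GATv2, the sketch below computes its attention weights for one target node over its neighbors in plain Python: score_j = a . LeakyReLU(W_target h_target + W_source h_j), followed by a softmax over j. It illustrates the general GATv2 scoring rule only; the feature sizes, weights, and function names are made up for the example, and the real model presumably uses a library implementation with learned parameters.

```python
import math


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def matvec(matrix, vector):
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def gatv2_attention(h_target, h_neighbors, w_target, w_source, a):
    """Return softmax-normalized GATv2 attention weights over the neighbors."""
    target_part = matvec(w_target, h_target)
    scores = []
    for h_j in h_neighbors:
        source_part = matvec(w_source, h_j)
        hidden = [leaky_relu(t + s) for t, s in zip(target_part, source_part)]
        scores.append(sum(a_k * z_k for a_k, z_k in zip(a, hidden)))
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Toy case: one boundary node attending over two transcript neighbors.
identity = [[1.0, 0.0], [0.0, 1.0]]
weights = gatv2_attention(
    h_target=[0.0, 0.0],
    h_neighbors=[[1.0, 0.0], [0.0, 0.0]],
    w_target=identity,
    w_source=identity,
    a=[1.0, 1.0],
)
print(weights)
```

Separate projection matrices for target and source fit the heterogeneous setting, where transcript and boundary nodes carry different feature types.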

Release history

0.2.3: Jan 29, 2026
0.2.2: Jan 29, 2026 (this version)
0.2.1: Jan 29, 2026
0.2.0: Jan 29, 2026
0.1.3: Jan 29, 2026
0.1.1: Jan 28, 2026
0.1.0: Jan 27, 2026

Download files

Source Distribution

stereosegger-0.2.2.tar.gz (79.4 kB)

Upload date: Jan 29, 2026
Size: 79.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7
SHA256: `92ebb57f4b6b16a280b158f032f054c9c960c177ea2720d2d523799129bfc3cb`
MD5: `a66aa3908a0fc887405e10aba99b2bc4`
BLAKE2b-256: `d7a7eebdac068b7f57ed6232e44f243ec41419b88be941cfd20dcd3abf59b9aa`

Built Distribution

stereosegger-0.2.2-py3-none-any.whl (88.6 kB)

Upload date: Jan 29, 2026
Size: 88.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.7
SHA256: `45c4fe38430b2e713603a2d90b0ecc64e9fc5728fe929b7318968a77114712d1`
MD5: `8e8cdac205c0ae5f19b18919c89d1cc6`
BLAKE2b-256: `865f1ac9bbf6131eea0fe49a0cf3d644def8bd2005ef59312b7c49b2a01a3e77`
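
A downloaded file can be checked against the SHA256 digests listed above with Python's standard hashlib module. This is a minimal sketch; the helper names sha256_of and matches are hypothetical, and the file path is whatever location the archive was saved to.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Return True when the file's SHA256 equals the published digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, matches("stereosegger-0.2.2.tar.gz", "92ebb57f...") should return True for an intact download when given the full digest.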
