Stack is a single-cell foundation model that enables in-context learning at inference time.
Stack: In-context learning of single-cell biology
Stack is a large-scale encoder-decoder foundation model trained on 150 million uniformly preprocessed single cells. It introduces a novel tabular attention architecture that enables both intra- and inter-cellular information flow, using cell-by-gene matrix chunks as the basic unit of input data. Through in-context learning, Stack substantially improves generalization of biological effects and can generate unseen cell profiles in novel contexts.
Installation
Using pip
# Install from PyPI
pip install arc-stack
# Or install from source for development
git clone https://github.com/ArcInstitute/stack.git
cd stack
pip install -e .
Using uv
# Install from PyPI
uv pip install arc-stack
# Or install from source for development
git clone https://github.com/ArcInstitute/stack.git
cd stack
uv pip install -e .
Quick Start
- Use Stack to embed your single-cell data: Notebook
- Use Stack to zero-shot predict unseen perturbation/observation profiles: Notebook
Training Stack from Scratch
# Once installed, the console entry point becomes available
stack-train \
--dataset_configs "/path/to/data:false:gene_symbols" \
--genelist_path "hvg_genes.pkl" \
--save_dir "./checkpoints" \
--sample_size 256 \
--batch_size 32 \
--n_hidden 100 \
--token_dim 16 \
--n_layers 9 \
--max_epochs 10
# Alternatively, invoke the module directly when working from a cloned repo
python -m stack.cli.launch_training [args...]
Fine-tuning Stack with Frozen Teacher
stack-finetune \
--checkpoint_path "./checkpoints/pretrained.ckpt" \
--dataset_configs "human:/path/to/data:donor_id:cell_type:false" \
--genelist_path "hvg_genes.pkl" \
--save_dir "./finetuned_checkpoints" \
--sample_size 512 \
--batch_size 8 \
--replacement_ratio 0.75 \
--max_epochs 8
# Or use uv run
uv run stack-finetune [args...]
# Repository wrapper remains available for local development
python -m stack.cli.launch_finetuning [args...]
Running Stack with configuration files
Both launch_training.py and launch_finetuning.py (the stack-train and stack-finetune entry points) accept a --config flag that points to a YAML or JSON file. Arguments not given on the command line take their values from the file, while flags given on the command line override it. Example configs mirroring the provided Slurm scripts live under configs/:
# Train with the preset configuration
stack-train --config configs/training/bc_large.yaml
# Override a single hyperparameter without editing the file
stack-train --config configs/training/bc_large.yaml --learning_rate 5e-5
# Fine-tune using a config file
stack-finetune --config configs/finetuning/ft_parsecg.yaml
# Direct module invocation is still supported if you prefer python -m
python -m stack.cli.launch_training --config configs/training/bc_large.yaml
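The precedence rule described above (file values fill in omitted flags, explicit flags win) can be sketched with the standard library's argparse. This is a minimal illustration, not Stack's actual loader: the helper name parse_with_config, the three flags and their defaults, and JSON-only support are all assumptions.

```python
import argparse
import json
import tempfile
from pathlib import Path


def parse_with_config(argv):
    """Parse CLI flags, filling unspecified ones from a JSON config file.

    Hypothetical helper. Precedence (highest first): explicit command-line
    flag, then config file value, then argparse default.
    """
    # First pass: only look for --config so we know which file to load.
    pre = argparse.ArgumentParser(add_help=False)
    pre.add_argument("--config")
    known, _ = pre.parse_known_args(argv)

    # Illustrative subset of flags; the real CLIs define many more.
    parser = argparse.ArgumentParser(parents=[pre])
    parser.add_argument("--learning_rate", type=float, default=1e-4)
    parser.add_argument("--batch_size", type=int, default=32)
    parser.add_argument("--max_epochs", type=int, default=10)

    if known.config:
        # Parser-level defaults override the argument-level defaults above...
        file_values = json.loads(Path(known.config).read_text())
        parser.set_defaults(**file_values)
    # ...and anything passed explicitly on the command line wins over both.
    return parser.parse_args(argv)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cfg = Path(tmp) / "train.json"
        cfg.write_text(json.dumps({"learning_rate": 1e-4, "batch_size": 64}))
        args = parse_with_config(["--config", str(cfg), "--learning_rate", "5e-5"])
        # learning_rate comes from the CLI, batch_size from the file,
        # max_epochs from the argparse default.
        print(args.learning_rate, args.batch_size, args.max_epochs)
```

The two-pass design lets the config file be read before the full parser runs, so file values act as defaults while still going through argparse's normal handling of explicit flags.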
Note: YAML configs require pyyaml. Install it with pip install pyyaml or use a JSON config file.
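To avoid the pyyaml dependency, the stack-train example from the training section could be saved as a JSON config from Python. This sketch assumes that config keys are the command-line flag names without the leading dashes, which the override behavior suggests but the description does not spell out; the helper name write_json_config is hypothetical.

```python
import json
from pathlib import Path

# Values copied from the stack-train example above.
# Assumption: config keys are the CLI flag names without the leading "--".
TRAIN_CONFIG = {
    "dataset_configs": "/path/to/data:false:gene_symbols",
    "genelist_path": "hvg_genes.pkl",
    "save_dir": "./checkpoints",
    "sample_size": 256,
    "batch_size": 32,
    "n_hidden": 100,
    "token_dim": 16,
    "n_layers": 9,
    "max_epochs": 10,
}


def write_json_config(config, path):
    """Write a config dict as indented JSON, creating parent directories."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n")
    return path
```

For example, write_json_config(TRAIN_CONFIG, "configs/training/my_run.json") followed by stack-train --config configs/training/my_run.json; the file name is illustrative.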
Extracting Stack Embeddings
stack-embedding \
--checkpoint "./checkpoints/pretrained.ckpt" \
--adata "data.h5ad" \
--genelist "hvg_genes.pkl" \
--output "embeddings.h5ad" \
--batch-size 32
# Or use uv run
uv run stack-embedding \
--checkpoint "./checkpoints/pretrained.ckpt" \
--adata "data.h5ad" \
--genelist "hvg_genes.pkl" \
--output "embeddings.h5ad" \
--batch-size 32
In-Context Generation with Stack
stack-generation \
--checkpoint "./checkpoints/pretrained.ckpt" \
--base-adata "base_data.h5ad" \
--test-adata "test_data.h5ad" \
--genelist "hvg_genes.pkl" \
--output-dir "./generations" \
--split-column "donor_id"
# Or use uv run
uv run stack-generation \
--checkpoint "./checkpoints/pretrained.ckpt" \
--base-adata "base_data.h5ad" \
--test-adata "test_data.h5ad" \
--genelist "hvg_genes.pkl" \
--output-dir "./generations" \
--split-column "donor_id"
Model Architecture
- Tabular Attention: Alternating cell-wise and gene-wise attention layers
- Token Dimension: Configurable token embedding dimension (default: 16)
- Hidden Dimension: Gene dimension reduction (default: 100)
- Masking Strategy: Rectangular masking with variable rates (0.1-0.8)
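A minimal, pure-Python sketch of alternating attention over one cells × genes × token_dim chunk. It assumes a single head, no learned query/key/value projections, no layer norm or feed-forward sublayer, and it reads "gene-wise" as attention among the genes of one cell (intra-cellular) and "cell-wise" as attention among cells for each gene (inter-cellular). The gene axis stands in for the reduced n_hidden dimension. All function names are hypothetical; this illustrates the idea, not Stack's layers.

```python
import math
import random


def self_attention(tokens):
    """Single-head scaled dot-product self-attention.

    Uses the tokens directly as queries, keys and values (no learned
    projections). tokens: list of equal-length float vectors.
    """
    d = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in tokens]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        out.append([sum(w * v[i] for w, v in zip(weights, tokens)) / total for i in range(d)])
    return out


def attend_within_cells(chunk):
    """Intra-cellular step: in each cell, the gene tokens attend to each other."""
    return [self_attention(cell) for cell in chunk]


def attend_across_cells(chunk):
    """Inter-cellular step: for each gene, the tokens of all cells attend to each other."""
    n_cells, n_genes = len(chunk), len(chunk[0])
    per_gene = [self_attention([chunk[c][g] for c in range(n_cells)]) for g in range(n_genes)]
    # per_gene is indexed [gene][cell]; transpose back to [cell][gene].
    return [[per_gene[g][c] for g in range(n_genes)] for c in range(n_cells)]


def add(a, b):
    """Element-wise sum of two [cell][gene][dim] nested lists (residual connection)."""
    return [[[x + y for x, y in zip(ta, tb)] for ta, tb in zip(ra, rb)] for ra, rb in zip(a, b)]


def tabular_layer(chunk):
    """One alternating layer: within-cell attention, then across-cell attention."""
    chunk = add(chunk, attend_within_cells(chunk))
    return add(chunk, attend_across_cells(chunk))


if __name__ == "__main__":
    random.seed(0)
    # Toy stand-ins for sample_size (cells), n_hidden (genes) and token_dim.
    n_cells, n_genes, token_dim = 4, 5, 3
    chunk = [[[random.gauss(0, 1) for _ in range(token_dim)]
              for _ in range(n_genes)] for _ in range(n_cells)]
    for _ in range(2):
        chunk = tabular_layer(chunk)
    print(len(chunk), len(chunk[0]), len(chunk[0][0]))  # 4 5 3
```

Alternating the two axes keeps each attention call small (one row or one column of the chunk) while still letting information flow between any gene of any cell after two steps.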
Data Preparation
Computing Highly Variable Genes (HVGs)
from stack.data.datasets import DatasetConfig, compute_hvg_union
configs = [DatasetConfig(path="/data/path", filter_organism=True)]
hvg_genes = compute_hvg_union(configs, n_top_genes=1000, output_path="hvg.pkl")
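The internals of compute_hvg_union are not documented here; its name suggests it takes the union of each dataset's highly variable genes. The stdlib sketch below illustrates that idea, ranking genes by raw variance within each dataset. That ranking is a simplification: established HVG methods such as Scanpy's highly_variable_genes use dispersion-based or Seurat-style criteria. The helpers top_variable_genes and hvg_union are hypothetical, and whether Stack pickles a sorted list or another container is an assumption.

```python
import pickle
import statistics


def top_variable_genes(expr, n_top):
    """Return the n_top gene names with the highest variance across cells.

    expr: dict mapping gene name -> list of per-cell expression values.
    """
    ranked = sorted(expr, key=lambda g: statistics.pvariance(expr[g]), reverse=True)
    return ranked[:n_top]


def hvg_union(datasets, n_top_genes, output_path=None):
    """Union of per-dataset top-variance genes, optionally pickled to disk."""
    union = set()
    for expr in datasets:
        union.update(top_variable_genes(expr, n_top_genes))
    genes = sorted(union)
    if output_path is not None:
        with open(output_path, "wb") as fh:
            pickle.dump(genes, fh)
    return genes


if __name__ == "__main__":
    ds1 = {"A": [0, 0, 0], "B": [0, 5, 10], "C": [1, 2, 3]}
    ds2 = {"A": [0, 9, 0], "B": [1, 1, 1], "D": [2, 3, 2]}
    print(hvg_union([ds1, ds2], n_top_genes=1))  # ['A', 'B']
```

Taking a union rather than an intersection keeps genes that are informative in only one dataset, at the cost of a longer shared gene list.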
Dataset Configuration Format
- Human datasets: human:/path:donor_col:cell_type_col[:filter_organism[:gene_col]]
- Drug datasets: drug:/path:condition_col:cell_line_col:control_condition[:filter_organism[:gene_col]]
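A sketch of how these colon-separated specs could be parsed, where bracketed fields are optional and nested (gene_col is only allowed after filter_organism). The DatasetSpec class, parse_dataset_spec, and the accepted boolean spellings are hypothetical and unrelated to Stack's own DatasetConfig. A naive split on ":" also breaks for paths that contain colons. The shorter three-field form used in the stack-train example is not handled here.

```python
from dataclasses import dataclass
from typing import Optional

# Required fields after "kind:path" for each dataset type, per the formats above.
REQUIRED = {
    "human": ["donor_col", "cell_type_col"],
    "drug": ["condition_col", "cell_line_col", "control_condition"],
}


@dataclass
class DatasetSpec:
    kind: str                               # "human" or "drug"
    path: str
    columns: dict                           # required field name -> value
    filter_organism: Optional[bool] = None  # None when omitted
    gene_col: Optional[str] = None          # None when omitted


def parse_bool(text):
    """Interpret common true/false spellings (an assumed set)."""
    lowered = text.lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise ValueError(f"expected a boolean, got {text!r}")


def parse_dataset_spec(spec):
    """Parse 'kind:path:required...[:filter_organism[:gene_col]]'."""
    kind, *rest = spec.split(":")
    if kind not in REQUIRED:
        raise ValueError(f"unknown dataset type {kind!r}")
    names = REQUIRED[kind]
    n_required = 1 + len(names)  # path plus the required columns
    if not n_required <= len(rest) <= n_required + 2:
        raise ValueError(
            f"{kind} spec needs {n_required} to {n_required + 2} fields "
            f"after the type, got {len(rest)}"
        )
    path, *values = rest
    columns = dict(zip(names, values[: len(names)]))
    optional = values[len(names):]
    filter_organism = parse_bool(optional[0]) if optional else None
    gene_col = optional[1] if len(optional) == 2 else None
    return DatasetSpec(kind, path, columns, filter_organism, gene_col)
```

For example, the fine-tuning spec "human:/path/to/data:donor_id:cell_type:false" parses to donor_col="donor_id", cell_type_col="cell_type", and filter_organism=False.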
Key Features
- In-Context Learning: Zero-shot generalization to new biological contexts
- Multi-Dataset Training: Simultaneous training on multiple single-cell datasets
- Frozen Teacher Fine-tuning: Novel fine-tuning procedure with stable teacher targets
- Efficient Data Loading: Optimized HDF5 loading with sparse matrix support
Note: scShiftAttentionModel remains available as an alias for backward compatibility.
Citation
If you use Stack in your research, please cite the Stack paper.
Licenses
Stack code is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 (CC BY-NC-SA 4.0).
The model weights and output are licensed under the Arc Research Institute Stack Model Non-Commercial License and subject to the Arc Research Institute Stack Model Acceptable Use Policy.