code for scConcept
scConcept
This repository contains the Python package for training and using scConcept (Single-cell contrastive cell pre-training), a method for single-cell transcriptomics.
Installation
You need to have Python 3.12 or newer installed on your system. If you don't have Python installed, we recommend installing uv.
Default installation
Install the latest release of sc-concept from PyPI:
pip install sc-concept
Latest development version
To install the latest development version directly from GitHub:
pip install git+https://github.com/theislab/scConcept.git@main
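After either installation route, a quick sanity check can confirm that the interpreter meets the stated minimum version and that the distribution is visible to Python. The following is a minimal sketch using only the standard library; the helper names `python_is_supported` and `installed_version` are illustrative and not part of scConcept.

```python
import sys
from importlib import metadata

MIN_PYTHON = (3, 12)  # minimum version stated in the installation notes


def python_is_supported(version_info=None):
    """Return True if the interpreter meets the stated minimum version."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= MIN_PYTHON


def installed_version(dist_name="sc-concept"):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python OK:", python_is_supported())
    print("sc-concept version:", installed_version())
```

If `installed_version()` returns `None`, the package was most likely installed into a different environment than the one running the check.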
Training from scratch with Flash Attention
The standard installation is enough for loading pretrained models, extracting embeddings, and light adaptation. If you want to train scConcept from scratch with Flash Attention, use one of the following options.
- Recommended: cd to the project root and run ./scripts/setup_env.sh, which installs uv if needed and creates a virtual environment with the training dependencies.
- Manual: make sure a CUDA-enabled version of PyTorch is installed. More information is available in the PyTorch installation guide. Then install Flash Attention:
MAX_JOBS=4 pip install "flash-attn>=2.7" --no-build-isolation
The build can take up to an hour, depending on your system specifications and on whether a pre-built release of flash-attn is available for your exact versions of Python, PyTorch, and CUDA. If the build takes too long, we recommend using the setup script instead.
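Before starting a long training run, it can help to confirm that the manual steps above actually produced a usable environment. The sketch below reports whether PyTorch is importable, whether it sees a CUDA device, and whether a `flash_attn` package can be located. The function name `training_env_report` is hypothetical; `torch.cuda.is_available()` and `importlib.util.find_spec` are the only library calls used.

```python
import importlib.util


def training_env_report():
    """Report whether the optional training dependencies appear usable."""
    report = {"torch": False, "cuda": False, "flash_attn": False}
    try:
        import torch  # imported lazily so the check also runs without PyTorch
    except ImportError:
        pass
    else:
        report["torch"] = True
        report["cuda"] = bool(torch.cuda.is_available())
    # find_spec only locates the package; it does not import or compile it.
    report["flash_attn"] = importlib.util.find_spec("flash_attn") is not None
    return report


if __name__ == "__main__":
    for name, ok in training_env_report().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```

A report with `torch` found but `cuda` missing usually points to a CPU-only PyTorch build, in which case installing Flash Attention is unlikely to succeed.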
How to use
scConcept provides a simple API to load and adapt pre-trained models and extract embeddings from scRNA-seq data. Here's a basic example:
from concept import scConcept
import scanpy as sc
# Load your single-cell data
adata = sc.read_h5ad("your_data.h5ad")
# Initialize scConcept and load a pretrained model
concept = scConcept(cache_dir='./cache/')
# Option 1: Load a model directly from HuggingFace
concept.load_config_and_model(model_name='corpus40M-model30M')
# Option 2: Load any local model
concept.load_config_and_model(
config='<path-to-config.yaml>',
model_path='<path-to-model.ckpt>',
gene_mappings_path='<path-to-gene-mapping.pkl>',
)
# Extract embeddings --> adata.var['gene_id']: ENSGXXXXXXXXXXX
result = concept.extract_embeddings(adata=adata, gene_id_column='gene_id')
# Use embeddings for downstream analysis
adata.obsm['X_scConcept'] = result['cls_cell_emb']
# Adapt a pre-trained model on your own data
concept.train(adata, max_steps=10000, batch_size=128)
# Important: For multiple datasets pass them separately
concept.train([adata1, adata2, ...], max_steps=20000, batch_size=128)
result = concept.extract_embeddings(adata=adata, gene_id_column='gene_id')
adata.obsm['X_scConcept_adapted'] = result['cls_cell_emb']
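The example above expects Ensembl gene IDs in adata.var['gene_id']. Public datasets sometimes carry version suffixes (for example ".17") or gene symbols in that column, so a small pre-check may save a failed run. The sketch below assumes human IDs of the form "ENSG" followed by 11 digits and assumes that unversioned IDs are what the model's gene mapping expects; both are assumptions, not documented scConcept behavior. The helper `normalize_gene_ids` is hypothetical.

```python
import re

# Assumption: human Ensembl gene IDs ("ENSG" + 11 digits), optionally followed
# by a version suffix such as ".17".
_ENSG = re.compile(r"^(ENSG\d{11})(?:\.\d+)?$")


def normalize_gene_ids(gene_ids):
    """Strip version suffixes and collect entries that are not Ensembl IDs.

    Returns (cleaned_ids, invalid_ids). Invalid entries are passed through
    unchanged in cleaned_ids so the output length matches the input length.
    """
    cleaned, invalid = [], []
    for gid in gene_ids:
        match = _ENSG.match(str(gid).strip())
        if match:
            cleaned.append(match.group(1))
        else:
            cleaned.append(gid)
            invalid.append(gid)
    return cleaned, invalid
```

One possible use is to call `normalize_gene_ids(adata.var['gene_id'])`, inspect the second return value, and only write the cleaned list back to adata.var['gene_id'] once the invalid list is empty or understood.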
Large-scale pre-training from scratch
scConcept.train() is intended only for light adaptation of pretrained models or for small on-the-fly training runs. Use train.py for distributed pre-training from scratch over a large corpus of data.
Before using train.py, follow the lamindb instructions for setting up a lamin instance.
Troubleshooting
If you encounter an error when loading a pre-trained model, try the following:
- Remove the repository and clone the most recent version
- Remove the cache directory (cache/ by default)
- Run again
This will force a fresh download of the pre-trained model and should resolve most loading issues.
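The cache-removal step can also be done from Python, which is convenient inside a notebook. This is a minimal sketch using the standard library; `clear_cache` is a hypothetical helper, and its default path mirrors the cache_dir='./cache/' used in the usage example.

```python
import shutil
from pathlib import Path


def clear_cache(cache_dir="./cache/"):
    """Delete the cache directory if it exists; return True if it was removed."""
    path = Path(cache_dir)
    if not path.is_dir():
        return False
    shutil.rmtree(path)
    return True


if __name__ == "__main__":
    print("Cache removed:", clear_cache())
```

If you passed a different cache_dir when constructing scConcept, pass the same path to `clear_cache`.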
Citation
Bahrami, M., Tejada-Lapuerta, A., Becker, S., Hashemi G, F.S. and Theis, F.J., 2025. scConcept: Contrastive pretraining for technology-agnostic single-cell representations beyond reconstruction. bioRxiv, pp.2025-10. doi: https://doi.org/10.1101/2025.10.14.682419