scConcept

This repository contains the Python package for training and using scConcept (Single-cell contrastive cell pre-training), a method for single-cell transcriptomics.

Installation

scConcept requires Python 3.12 or newer. If you don't have Python installed, we recommend installing uv, which can install and manage Python versions for you.
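To confirm that the active interpreter meets this requirement before installing, a small check like the following can help. This is only an illustrative sketch: `meets_minimum` and `MIN_PYTHON` are hypothetical names and are not part of scConcept.

```python
import sys

# Minimum version stated in the installation notes above.
MIN_PYTHON = (3, 12)


def meets_minimum(version=None, minimum=MIN_PYTHON):
    """Return True if `version` (major, minor, ...) is at least `minimum`.

    When `version` is None, the running interpreter's version is used.
    """
    if version is None:
        version = sys.version_info[:2]
    return tuple(version[:2]) >= tuple(minimum)


if __name__ == "__main__":
    if meets_minimum():
        print("Python version is sufficient.")
    else:
        print("Python 3.12 or newer is required.")
```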

Default installation

Install the latest release of sc-concept from PyPI:

pip install sc-concept

Latest development version

To install the latest development version directly from GitHub:

pip install git+https://github.com/theislab/scConcept.git@main

Training from scratch with Flash Attention

The standard installation is enough for loading pretrained models, extracting embeddings, and light adaptation. If you want to train scConcept from scratch with Flash Attention, use one of the following options.

  1. Recommended: cd to the project root and run ./scripts/setup_env.sh, which installs uv if needed and creates a virtual environment with the training dependencies.

  2. Manual: make sure a CUDA-enabled version of PyTorch is installed. More information is available in the PyTorch installation guide. Then install Flash Attention:

MAX_JOBS=4 pip install "flash-attn>=2.7" --no-build-isolation

The build can take up to an hour, depending on your system and on whether a pre-built release of flash-attn is available for your exact versions of Python, PyTorch, and CUDA. If it takes too long, we recommend using the setup script instead.
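After either option, it can be useful to check that the training dependencies are importable before starting a long run. The sketch below uses only the standard library and looks for the `torch` and `flash_attn` modules (the import names of PyTorch and flash-attn). The helper name `missing_training_deps` is hypothetical and not part of scConcept, and the check does not verify that PyTorch was built with CUDA support.

```python
import importlib.util


def missing_training_deps(modules=("torch", "flash_attn")):
    """Return the names in `modules` that cannot be found in this environment.

    Only top-level module names are supported; nothing is imported.
    """
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_training_deps()
    if missing:
        print("Missing training dependencies:", ", ".join(missing))
    else:
        print("PyTorch and flash-attn are both installed.")
```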

How to use

scConcept provides a simple API to load and adapt pre-trained models and extract embeddings from scRNA-seq data. Here's a basic example:

from concept import scConcept
import scanpy as sc

# Load your single-cell data
adata = sc.read_h5ad("your_data.h5ad")

# Initialize scConcept and load a pretrained model
concept = scConcept(cache_dir='./cache/')

# Option 1: Load a model directly from HuggingFace
concept.load_config_and_model(model_name='corpus40M-model30M') 

# Option 2: Load any local model
concept.load_config_and_model(
    config='<path-to-config.yaml>',
    model_path='<path-to-model.ckpt>',
    gene_mappings_path='<path-to-gene-mappings-directory>',
)

# scConcept expects Ensembl gene IDs as input. Use the built-in helper to map gene names to IDs if needed:
adata.var['gene_id'] = concept.map_gene_names_to_ids(
    species='hsapiens', # see concept.species for available species names
    gene_names=adata.var_names.tolist(),
)

# Extract embeddings; adata.var['gene_id'] must hold Ensembl gene IDs of the form ENSGXXXXXXXXXXX
result = concept.extract_embeddings(adata=adata, gene_id_column='gene_id')

# Use embeddings for downstream analysis
adata.obsm['X_scConcept'] = result['cls_cell_emb']
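If your data already carries gene IDs, it may be worth checking that they look like Ensembl gene IDs before extracting embeddings. The sketch below is an illustration only: `normalize_ensembl_ids` is a hypothetical helper and not part of scConcept. It handles human IDs only (`ENSG` followed by 11 digits; other species use different prefixes) and strips a trailing version suffix such as `.17`. Whether scConcept itself accepts versioned IDs is not stated here, so stripping them is an assumption.

```python
import re

# Human Ensembl gene ID: "ENSG" + 11 digits, optionally followed by ".<version>".
ENSG_PATTERN = re.compile(r"(ENSG\d{11})(\.\d+)?")


def normalize_ensembl_ids(gene_ids):
    """Strip version suffixes and flag entries that are not human Ensembl gene IDs.

    Returns (normalized, invalid): `normalized` has one entry per input
    (None where the input did not match), and `invalid` lists the rejected inputs.
    """
    normalized, invalid = [], []
    for gene_id in gene_ids:
        match = ENSG_PATTERN.fullmatch(str(gene_id))
        if match:
            normalized.append(match.group(1))
        else:
            normalized.append(None)
            invalid.append(gene_id)
    return normalized, invalid


if __name__ == "__main__":
    ids, bad = normalize_ensembl_ids(["ENSG00000141510.17", "TP53"])
    print(ids, bad)
```

Entries reported as invalid are typically gene symbols, which can be mapped with `concept.map_gene_names_to_ids` as shown above.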

Model adaptation

# Adapt a pre-trained model on your own data
concept.train(adata, max_steps=10000, batch_size=128) 

# Important: for multiple datasets, pass them separately as a list
concept.train([adata1, adata2, ...], max_steps=20000, batch_size=128) 

result = concept.extract_embeddings(adata=adata, gene_id_column='gene_id')
adata.obsm['X_scConcept_adapted'] = result['cls_cell_emb']

Large-scale pre-training from scratch

scConcept.train() is intended only for light adaptation of pretrained models or small on-the-fly training runs. Use train.py for distributed pre-training from scratch over a large corpus of data.

Before using train.py, follow the lamindb instructions for setting up a lamin instance.

Troubleshooting

If you encounter an error when loading a pre-trained model, try the following:

  1. Remove the repository and clone the most recent version
  2. Remove the cache directory (cache/ by default)
  3. Run again

This will force a fresh download of the pre-trained model and should resolve most loading issues.
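Step 2 can also be done from Python. The following is a minimal sketch using only the standard library. `clear_cache` is a hypothetical helper, not part of scConcept, and it assumes the cache lives at the path passed as `cache_dir` when `scConcept` was created (`./cache/` in the example above).

```python
import shutil
from pathlib import Path


def clear_cache(cache_dir="./cache/"):
    """Delete `cache_dir` and everything in it.

    Returns True if a directory was removed, False if none existed.
    """
    path = Path(cache_dir)
    if path.is_dir():
        shutil.rmtree(path)
        return True
    return False


if __name__ == "__main__":
    if clear_cache("./cache/"):
        print("Cache removed; the model will be downloaded again on next load.")
    else:
        print("No cache directory found.")
```

Because this deletes the whole directory, check the path before running it if the cache is shared with other data.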

Citation

Bahrami, M., Tejada-Lapuerta, A., Becker, S., Hashemi G, F.S. and Theis, F.J., 2025. scConcept: Contrastive pretraining for technology-agnostic single-cell representations beyond reconstruction. bioRxiv, pp.2025-10. doi: https://doi.org/10.1101/2025.10.14.682419
