
Run embedding comparisons for single-cell data


Comparing embeddings for single-cell and spatial data


Single-cell RNA-sequencing (scRNA-seq) measures gene expression in individual cells and generates large datasets. Typically, these datasets consist of several samples, each corresponding to a combination of covariates (e.g. patient, time point, disease status, or technology). Analyzing these datasets, which often contain millions of cells and thousands of genes, is facilitated by data integration approaches, which learn lower-dimensional representations that remove the effects of unwanted covariates (such as the experimental batch or the chip the data was run on).

Here, we use slurm_sweep to efficiently parallelize and track different data integration approaches, and we compare their performance in terms of scIB metrics. For each data integration method, we compute a shared latent space, quantify integration performance in terms of batch correction and bio conservation, visualize the latent space with UMAP, and store the model and embedding coordinates. All relevant data is logged to wandb so that it can be retrieved after the sweep.

scembed consists of shallow wrappers around commonly used integration tools, a class to facilitate scIB comparisons, and another class to retrieve and aggregate sweep results.
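
To illustrate what aggregating sweep results can look like, here is a minimal sketch in plain Python. It does not use the scembed API: the function `best_run_per_method`, the record layout, and the placeholder method names and scores are all assumptions made for illustration. Only the metric key `scib_total_score` is taken from the summary metrics described in this README.

```python
# Hypothetical sweep records: one dict per run. The method names and scores
# are placeholders for illustration, not real benchmarking results.
runs = [
    {"method": "method_a", "run_id": 1, "scib_total_score": 0.55},
    {"method": "method_a", "run_id": 2, "scib_total_score": 0.62},
    {"method": "method_b", "run_id": 3, "scib_total_score": 0.58},
]


def best_run_per_method(runs, key="scib_total_score"):
    """Keep the highest-scoring run for each method, ranked best first."""
    best = {}
    for run in runs:
        current = best.get(run["method"])
        # Replace the stored run only if this one scores strictly higher.
        if current is None or run[key] > current[key]:
            best[run["method"]] = run
    return sorted(best.values(), key=lambda r: r[key], reverse=True)


for run in best_run_per_method(runs):
    print(run["method"], run["run_id"], run["scib_total_score"])
```

In a real sweep, the records would come from the runs tracked on wandb rather than from a hard-coded list.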

Methods included

  • GPU-based methods: scVI, scANVI, scPoli, ResolVI, scVIVA
  • CPU-based methods: Harmony, LIGER, Scanorama, HVG, Pre-computed embeddings

Evaluation

  • scIB metrics: Standardized benchmarking for integration quality
  • UMAP visualization: Visual assessment of integration
  • Artifact tracking: Models and embeddings stored in wandb

Outputs

Per Method

  • Integration embedding: Stored in wandb as table
  • scIB metrics: Comprehensive benchmarking scores
  • UMAP plots: Visualization by cell type and batch
  • Model weights: For deep learning methods

Summary Metrics

  • scib_total_score: Overall integration quality
  • scib_bio_conservation: Preservation of biological signal
  • scib_batch_correction: Removal of batch effects
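
As a rough guide to how the three summary metrics relate, the scIB benchmark convention combines the two sub-scores into an overall score as a weighted mean, with a weight of 0.6 on bio conservation and 0.4 on batch correction. The sketch below assumes scembed follows this convention, which is not stated here; the function name `scib_total_score` and its `bio_weight` parameter are illustrative only.

```python
def scib_total_score(bio_conservation, batch_correction, bio_weight=0.6):
    """Weighted mean of the two sub-scores (scIB convention: 0.6 bio, 0.4 batch).

    Assumption: both inputs are already aggregated scores in [0, 1].
    """
    if not 0.0 <= bio_weight <= 1.0:
        raise ValueError("bio_weight must lie in [0, 1]")
    return bio_weight * bio_conservation + (1.0 - bio_weight) * batch_correction


# Placeholder inputs for illustration, not real results.
print(round(scib_total_score(0.8, 0.5), 2))
```

With these weights, a method that preserves biological signal well but corrects batch effects only moderately can still rank above one with the opposite profile.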

Getting started

Please refer to the documentation, in particular, the API documentation.

Installation

You need to have Python 3.10 or newer installed on your system. If you don't have Python installed, we recommend installing uv.

There are several alternative options to install scembed:

  1. Install the latest development version:
pip install git+https://github.com/quadbio/scembed.git@main

Note: If you encounter C++ compilation errors (e.g., with louvain or annoy), install those packages via conda first:

mamba install louvain python-annoy

Dependency Groups

The package uses optional dependency groups to minimize installation overhead:

  • Base: Core functionality (scanpy, scib-metrics, wandb)
  • [cpu]: CPU-based methods (e.g. Harmony, LIGER, Scanorama)
  • [gpu]: GPU-based methods (e.g. scVI, scANVI, scPoli)
  • [fast_metrics]: Accelerated evaluation with faiss and RAPIDS
  • [all]: All optional dependencies
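
To check which optional groups are usable in a given environment, one option is to test whether the relevant packages can be imported. The sketch below uses only the standard library. The mapping from group names to import names (`harmonypy`, `scanorama`, `scvi`, `faiss`) is an assumption about what each group pulls in, not the package's declared metadata, and `available_groups` is a hypothetical helper.

```python
from importlib.util import find_spec

# Assumed mapping from dependency group to top-level import names.
# Adjust it to match the groups actually declared by the package.
OPTIONAL_GROUPS = {
    "cpu": ["harmonypy", "scanorama"],
    "gpu": ["scvi"],
    "fast_metrics": ["faiss"],
}


def available_groups(groups=OPTIONAL_GROUPS):
    """Return {group: True/False}, True if every listed module is importable."""
    return {
        name: all(find_spec(module) is not None for module in modules)
        for name, modules in groups.items()
    }


for group, ok in available_groups().items():
    print(f"[{group}]: {'available' if ok else 'missing'}")
```

Because `find_spec` only locates modules without importing them, the check is fast and does not initialize any GPU libraries.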

Release notes

See the changelog.

Contact

For questions and help requests, you can reach out in the scverse discourse. If you found a bug, please use the issue tracker.

Citation

To be announced.
