Gene Expression Decomposition for Integration - scverse-compliant single-cell RNA-seq batch correction
gedi2py
Gene Expression Decomposition for Integration
A scverse-compliant Python package for single-cell RNA-seq batch correction and dimensionality reduction using the GEDI algorithm.
Overview
gedi2py implements a latent variable model for integrating single-cell RNA sequencing data across multiple samples and batches. It learns shared gene expression patterns while correcting for technical batch effects, producing batch-corrected cell embeddings suitable for downstream analysis.
Installation
pip (recommended)
pip install gedi2py
From source
git clone https://github.com/csglab/gedi2py.git
cd gedi2py
pip install -e .
Requirements
- Python >= 3.10
- C++14 compiler
- Eigen3 >= 3.3.0
- CMake >= 3.15
See the Installation Guide for detailed instructions.
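Before building from source, it can help to confirm that the toolchain meets the minimums listed above. The following is a minimal sketch, not part of gedi2py: the helper names `parse_version` and `check_build_requirements` are hypothetical, and it only checks Python and CMake. The C++ compiler and Eigen3 are not probed because their locations vary by platform.

```python
import re
import shutil
import subprocess
import sys


def parse_version(text):
    """Return the first dotted version number in `text` as a tuple of ints, or None."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.group(0).split("."))


def check_build_requirements():
    """Report whether Python (>= 3.10) and CMake (>= 3.15) meet the listed minimums."""
    report = {"python": sys.version_info[:2] >= (3, 10)}

    cmake = shutil.which("cmake")
    if cmake is None:
        # CMake is not on PATH at all.
        report["cmake"] = False
    else:
        # `cmake --version` prints a line such as "cmake version 3.22.1".
        out = subprocess.run(
            [cmake, "--version"], capture_output=True, text=True
        ).stdout
        version = parse_version(out)
        report["cmake"] = version is not None and version[:2] >= (3, 15)
    return report


if __name__ == "__main__":
    for name, ok in check_build_requirements().items():
        print(f"{name}: {'ok' if ok else 'missing or too old'}")
```

A `False` entry means only that the tool was not found on `PATH` or reported an older version. It does not by itself mean `pip install gedi2py` will fail.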
Quick Start
import gedi2py as gd
import scanpy as sc
# Load data
adata = sc.read_h5ad("data.h5ad")
# Preprocess
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)
# Run GEDI batch correction
gd.tl.gedi(adata, batch_key="sample", n_latent=10)
# Visualize
gd.tl.umap(adata)
gd.pl.embedding(adata, color=["sample", "cell_type"])
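To make the preprocessing step concrete, here is a dependency-free sketch of what the two scanpy calls do to a toy count matrix. It assumes scanpy's defaults: with no `target_sum`, `normalize_total` scales each cell to the median of the per-cell totals, and `log1p` applies the natural log of one plus the value. The functions below are illustrative re-implementations on plain lists, not scanpy or gedi2py code.

```python
import math
from statistics import median


def normalize_total(counts):
    """Scale each cell (row) so its total equals the median total across cells."""
    totals = [sum(row) for row in counts]
    # Cells with zero total are ignored when picking the target.
    target = median(t for t in totals if t > 0)
    return [
        [value * target / total if total > 0 else 0.0 for value in row]
        for row, total in zip(counts, totals)
    ]


def log1p(matrix):
    """Apply log(1 + x) element-wise."""
    return [[math.log1p(value) for value in row] for row in matrix]


# Three cells (rows) by three genes (columns).
counts = [
    [10, 0, 30],  # total 40
    [5, 5, 10],   # total 20
    [0, 20, 60],  # total 80
]

normalized = normalize_total(counts)  # every row now sums to 40, the median total
logged = log1p(normalized)
```

After normalization the three cells have the same total, so differences in sequencing depth no longer dominate the values passed on to the model.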
Features
- Memory-efficient: C++ backend keeps large matrices in native memory
- Fast: OpenMP parallelization for multi-threaded optimization
- scverse-compliant: Works seamlessly with AnnData and scanpy
- Flexible: Supports counts, log-transformed data, paired data (e.g., CITE-seq), and binary indicators
- Comprehensive: Includes projections, embeddings, imputation, and differential analysis
Paired Data Mode (M_paired)
gedi2py supports paired count data stored in two AnnData layers, useful for:
- CITE-seq (ADT vs RNA)
- Dual-modality assays
- Ratio-based analyses
# Two layers: 'm1' (numerator counts) and 'm2' (denominator counts)
# GEDI models: Yi = log((M1+1)/(M2+1))
gd.tl.gedi(
    adata,
    batch_key="sample",
    layer="m1",   # First count matrix
    layer2="m2",  # Second count matrix
    n_latent=10
)
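The transform quoted in the comment above, Yi = log((M1+1)/(M2+1)), can be checked by hand on toy counts. The sketch below is a plain-Python illustration of that formula only. `paired_log_ratio` is a hypothetical helper, not a gedi2py function, and it says nothing about how the package computes the ratio internally.

```python
import math


def paired_log_ratio(m1, m2):
    """Return log((m1 + 1) / (m2 + 1)) element-wise for two same-shaped count matrices."""
    if len(m1) != len(m2) or any(len(a) != len(b) for a, b in zip(m1, m2)):
        raise ValueError("m1 and m2 must have the same shape")
    return [
        [math.log((a + 1) / (b + 1)) for a, b in zip(row1, row2)]
        for row1, row2 in zip(m1, m2)
    ]


# Toy numerator and denominator counts (2 cells x 2 features).
m1 = [[0, 3], [9, 1]]
m2 = [[0, 1], [4, 3]]

y = paired_log_ratio(m1, m2)
# y[0][0] is log(1/1) = 0: equal counts give a log ratio of zero.
# y[0][1] is log(4/2) and y[1][1] is log(2/4): same magnitude, opposite sign.
```

The added pseudocount of 1 keeps the ratio defined when either count is zero, which is why the two layers should hold raw counts rather than already log-transformed values.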
Documentation
Full documentation is available at csglab.github.io/gedi2py.
API Overview
gedi2py follows the scanpy convention with submodules:
| Module | Description |
|---|---|
| gd.tl | Tools: model training, projections, embeddings, imputation, differential |
| gd.pl | Plotting: embeddings, convergence, features |
| gd.io | I/O: H5AD, 10X formats, model persistence |
import gedi2py as gd
# Tools
gd.tl.gedi(adata, batch_key="sample")
gd.tl.umap(adata)
# Plotting
gd.pl.embedding(adata, color="cell_type")
gd.pl.convergence(adata)
# I/O
adata = gd.read_h5ad("data.h5ad")
gd.io.save_model(adata, "model.h5")
Citation
If you use gedi2py in your research, please cite:
Mikaeili Namini, A., & Najafabadi, H.S. (2024). GEDI: Gene Expression Decomposition for Integration of single-cell RNA-seq data.
License
MIT License - see LICENSE for details.