
Gene Expression Decomposition for Integration - scverse-compliant single-cell RNA-seq batch correction


gedi2py


Gene Expression Decomposition for Integration

A scverse-compliant Python package for single-cell RNA-seq batch correction and dimensionality reduction using the GEDI algorithm.

Overview

gedi2py implements a latent variable model for integrating single-cell RNA sequencing data across multiple samples and batches. It learns shared gene expression patterns while correcting for technical batch effects, producing batch-corrected cell embeddings suitable for downstream analysis.
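For intuition only, the sketch below illustrates the general idea of separating shared latent structure from batch-specific shifts. It uses a much simpler technique than GEDI: subtract each batch's gene-wise mean, then take a truncated SVD of the pooled matrix. It is not the GEDI model and not gedi2py's implementation. The function `toy_batch_corrected_embedding` and the simulated data are hypothetical, and NumPy is assumed to be installed.

```python
import numpy as np


def toy_batch_corrected_embedding(X, batches, n_latent=2):
    """Toy stand-in, NOT the GEDI model: per-batch centering + truncated SVD.

    X is a cells-by-genes expression matrix; batches holds one label per cell.
    """
    X = np.asarray(X, dtype=float)
    batches = np.asarray(batches)
    centered = np.empty_like(X)
    for b in np.unique(batches):
        mask = batches == b
        # Remove this batch's gene-wise mean (an additive batch offset)
        centered[mask] = X[mask] - X[mask].mean(axis=0)
    # Shared low-rank structure across all batches
    U, S, _ = np.linalg.svd(centered, full_matrices=False)
    return U[:, :n_latent] * S[:n_latent]  # cell embeddings


# Simulated data: one cell-type gene program plus a large shift in batch 1
rng = np.random.default_rng(0)
n_cells, n_genes = 200, 50
cell_type = rng.integers(0, 2, n_cells)
batch = rng.integers(0, 2, n_cells)
program = rng.normal(size=n_genes)
offset = rng.normal(scale=3.0, size=n_genes)
X = (np.outer(cell_type, program) + np.outer(batch, offset)
     + rng.normal(scale=0.1, size=(n_cells, n_genes)))

emb = toy_batch_corrected_embedding(X, batch, n_latent=2)
print(np.corrcoef(emb[:, 0], cell_type)[0, 1])  # close to +1 or -1 (SVD sign is arbitrary)
```

Because every batch is mean-centered before the SVD, the embedding has zero mean within each batch and carries no batch-mean signal, while the simulated cell-type program survives in the first component. Unlike a full integration model, this toy removes only additive per-batch offsets and would also erase genuine differences in cell-type composition between batches.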

Installation

pip (recommended)

pip install gedi2py

From source

git clone https://github.com/csglab/gedi2py.git
cd gedi2py
pip install -e .

Requirements

  • Python >= 3.10
  • C++14 compiler
  • Eigen3 >= 3.3.0
  • CMake >= 3.15

See the Installation Guide for detailed instructions.
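Because the package builds a C++ backend, a quick pre-flight check can catch missing tools before running pip. The helper below is a hypothetical script, not part of gedi2py. It checks the running Python version and looks for `cmake` and a C++ compiler on `PATH`; it does not verify the CMake version, C++14 support, or the Eigen3 installation.

```python
import shutil
import sys


def check_build_prereqs():
    """Report which build prerequisites are visible on this machine.

    Only cheap checks: Python version, and whether `cmake` and a C++
    compiler executable can be found on PATH.
    """
    compilers = ("c++", "g++", "clang++", "cl")  # "cl" is MSVC on Windows
    return {
        "python_ok": sys.version_info >= (3, 10),
        "cmake": shutil.which("cmake"),
        "cxx": next((c for c in compilers if shutil.which(c)), None),
    }


if __name__ == "__main__":
    for name, value in check_build_prereqs().items():
        print(f"{name}: {value}")
```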

Quick Start

import gedi2py as gd
import scanpy as sc

# Load data
adata = sc.read_h5ad("data.h5ad")

# Preprocess
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)

# Run GEDI batch correction
gd.tl.gedi(adata, batch_key="sample", n_latent=10)

# Visualize
gd.tl.umap(adata)
gd.pl.embedding(adata, color=["sample", "cell_type"])
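The two preprocessing calls in the Quick Start can be approximated in plain NumPy, which may help when checking what GEDI receives as input. This sketch assumes a dense cells-by-genes count matrix and scanpy's default `target_sum=None` for `sc.pp.normalize_total`, under which each cell is rescaled to the median per-cell total before taking `log(1 + x)`. `normalize_total_log1p` is a hypothetical helper; scanpy's own functions also handle sparse matrices and further options.

```python
import numpy as np


def normalize_total_log1p(counts):
    """Approximate sc.pp.normalize_total (target_sum=None) then sc.pp.log1p."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1)
    nonzero = totals[totals > 0]
    # Target total per cell: the median of the nonzero per-cell totals
    target = float(np.median(nonzero)) if nonzero.size else 0.0
    # Per-cell scale factor; empty cells get 0 instead of dividing by zero
    scale = np.divide(target, totals, out=np.zeros_like(totals), where=totals > 0)
    # Natural log of 1 + x
    return np.log1p(counts * scale[:, None])


counts = np.array([[1, 1], [4, 4]])
print(normalize_total_log1p(counts))  # both rows become [log1p(2.5), log1p(2.5)]
```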

Features

  • Memory-efficient: C++ backend keeps large matrices in native memory
  • Fast: OpenMP parallelization for multi-threaded optimization
  • scverse-compliant: Operates on AnnData objects and follows scanpy's tl/pl/io module layout
  • Flexible: Supports counts, log-transformed data, paired data (e.g., CITE-seq), and binary indicators
  • Comprehensive: Includes projections, embeddings, imputation, and differential analysis

Paired Data Mode (M_paired)

gedi2py supports paired count data stored in two AnnData layers, useful for:

  • CITE-seq (ADT vs RNA)
  • Dual-modality assays
  • Ratio-based analyses
# Two layers: 'm1' (numerator counts) and 'm2' (denominator counts)
# GEDI models: Yi = log((M1+1)/(M2+1))
gd.tl.gedi(
    adata,
    batch_key="sample",
    layer="m1",      # First count matrix
    layer2="m2",     # Second count matrix
    n_latent=10
)
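The log-ratio in the comment above can be written out directly. The NumPy sketch below computes it elementwise for two equally shaped count matrices, assuming the natural logarithm (the base is not stated). `paired_log_ratio` is a hypothetical helper for illustration, not a gedi2py function, and gedi2py may compute this quantity differently internally.

```python
import numpy as np


def paired_log_ratio(m1, m2):
    """Elementwise Y = log((M1 + 1) / (M2 + 1)) for two count matrices."""
    m1 = np.asarray(m1, dtype=float)
    m2 = np.asarray(m2, dtype=float)
    if m1.shape != m2.shape:
        raise ValueError("m1 and m2 must have the same shape")
    # log1p(a) - log1p(b) == log((a + 1) / (b + 1)), without forming the ratio
    return np.log1p(m1) - np.log1p(m2)


print(paired_log_ratio([[0, 3]], [[1, 1]]))  # log(0.5) and log(2.0)
```

Adding 1 to both counts keeps the ratio finite when either measurement is zero, and equal counts in both layers map to 0.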

Documentation

Full documentation is available at csglab.github.io/gedi2py.

API Overview

gedi2py follows the scanpy convention of organizing its API into submodules:

| Module | Description |
|--------|-------------|
| gd.tl  | Tools: model training, projections, embeddings, imputation, differential |
| gd.pl  | Plotting: embeddings, convergence, features |
| gd.io  | I/O: H5AD, 10X formats, model persistence |

import gedi2py as gd

# I/O: load data first so `adata` exists for the calls below
adata = gd.read_h5ad("data.h5ad")

# Tools
gd.tl.gedi(adata, batch_key="sample")
gd.tl.umap(adata)

# Plotting
gd.pl.embedding(adata, color="cell_type")
gd.pl.convergence(adata)

# I/O: persist the trained model
gd.io.save_model(adata, "model.h5")

Citation

If you use gedi2py in your research, please cite:

Mikaeili Namini, A., & Najafabadi, H.S. (2024). GEDI: Gene Expression Decomposition for Integration of single-cell RNA-seq data.

License

MIT License - see LICENSE for details.



Download files


Source Distribution

gedi2py-0.2.0.tar.gz (106.4 kB)

File details

Details for the file gedi2py-0.2.0.tar.gz.

File metadata

  • Download URL: gedi2py-0.2.0.tar.gz
  • Size: 106.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for gedi2py-0.2.0.tar.gz
| Algorithm   | Hash digest |
|-------------|-------------|
| SHA256      | 3594a1e4f88596a843436f9b9d418ec3c5d43793ba2d644b193ef9985e49e0ef |
| MD5         | 84253950f6040dfa50e7dec40a0d99d9 |
| BLAKE2b-256 | 248ef78aae3637ab15c3761696cce6438f4222f664fc8e7664fe89c5cf4345a5 |
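To check a downloaded archive against the SHA256 digest listed above, Python's standard `hashlib` module is enough. The helper names below are hypothetical; the file is read in chunks so large archives need not fit in memory.

```python
import hashlib

# SHA256 digest listed above for gedi2py-0.2.0.tar.gz
EXPECTED_SHA256 = "3594a1e4f88596a843436f9b9d418ec3c5d43793ba2d644b193ef9985e49e0ef"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path, expected=EXPECTED_SHA256):
    """True if the file's SHA-256 matches `expected` (case-insensitive)."""
    return sha256_of_file(path) == expected.strip().lower()
```

For an intact download, `verify_sha256("gedi2py-0.2.0.tar.gz")` should return `True`. pip can also enforce the digest itself in hash-checking mode, using a requirements line such as `gedi2py==0.2.0 --hash=sha256:<digest>` with `pip install --require-hashes -r requirements.txt`; that mode then requires pinned hashes for every dependency as well.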


Provenance

The following attestation bundles were made for gedi2py-0.2.0.tar.gz:

Publisher: publish.yml on csglab/gedi2py

