Gene Expression Decomposition for Integration - scverse-compliant single-cell RNA-seq batch correction

gedi2py

Gene Expression Decomposition for Integration

A scverse-compliant Python package for single-cell RNA-seq batch correction and dimensionality reduction using the GEDI algorithm.

Overview

gedi2py implements a latent variable model for integrating single-cell RNA sequencing data across multiple samples and batches. It learns shared gene expression patterns while correcting for technical batch effects, producing batch-corrected cell embeddings suitable for downstream analysis.
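To make "correcting for technical batch effects" concrete, here is a toy illustration that uses per-batch mean centering on synthetic data. This is a deliberately crude stand-in and is not the GEDI model; the function name `center_per_batch` and the simulated data are assumptions made for illustration only.

```python
import numpy as np

# Toy data: 6 cells x 3 genes. Batch "b" carries an additive technical shift.
rng = np.random.default_rng(0)
shared = rng.normal(size=(6, 3))                      # signal common to both batches
batch = np.array(["a", "a", "a", "b", "b", "b"])
offset = np.where(batch[:, None] == "b", 2.0, 0.0)    # batch-specific shift
observed = shared + offset


def center_per_batch(x, labels):
    """Subtract each batch's per-gene mean (a crude stand-in, NOT the GEDI model)."""
    corrected = np.asarray(x, dtype=float).copy()
    for label in np.unique(labels):
        mask = labels == label
        corrected[mask] -= corrected[mask].mean(axis=0)
    return corrected


corrected = center_per_batch(observed, batch)

# Distance between the two batch centroids, before and after centering.
gap_before = np.abs(observed[batch == "a"].mean(0) - observed[batch == "b"].mean(0)).max()
gap_after = np.abs(corrected[batch == "a"].mean(0) - corrected[batch == "b"].mean(0)).max()
print(f"max centroid gap before: {gap_before:.3f}, after: {gap_after:.3f}")
```

Mean centering also removes any real biological difference between batches, which is one reason a model-based approach such as a latent variable model is used instead.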

Installation

pip (recommended)

pip install gedi2py

From source

git clone https://github.com/csglab/gedi2py.git
cd gedi2py
pip install -e .

Requirements

  • Python >= 3.10
  • C++14 compiler
  • Eigen3 >= 3.3.0
  • CMake >= 3.15

See the Installation Guide for detailed instructions.
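Before building from source, it can help to confirm that the basic toolchain is visible on your `PATH`. The following is a minimal sketch using only the Python standard library; the helper name `check_build_prerequisites` and the list of compiler executables are assumptions, and the script does not check the CMake or Eigen3 versions.

```python
import shutil
import sys


def check_build_prerequisites(min_python=(3, 10)):
    """Return a dict mapping each prerequisite to True (found) or False (missing)."""
    # Common C++ compiler executable names; adjust for your platform.
    compilers = ("c++", "g++", "clang++", "cl")
    return {
        "python": sys.version_info >= min_python,
        "cmake": shutil.which("cmake") is not None,
        "cxx_compiler": any(shutil.which(c) is not None for c in compilers),
    }


if __name__ == "__main__":
    for name, ok in check_build_prerequisites().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```

Eigen3 is a header-only library with no executable to look up, so its presence is best confirmed through your package manager or CMake itself.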

Quick Start

import gedi2py as gd
import scanpy as sc

# Load data
adata = sc.read_h5ad("data.h5ad")

# Preprocess
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)

# Run GEDI batch correction
gd.tl.gedi(adata, batch_key="sample", n_latent=10)

# Visualize
gd.tl.umap(adata)
gd.pl.embedding(adata, color=["sample", "cell_type"])
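The Quick Start passes `batch_key="sample"`, which by AnnData convention presumably names a column in `adata.obs`. A small pre-flight check on that column can catch a mislabeled or single-batch dataset before a long run. This is a sketch only: `summarize_batches` is a hypothetical helper, not part of gedi2py, and whether gedi2py enforces a minimum number of batches is not stated here.

```python
from collections import Counter


def summarize_batches(labels):
    """Count cells per batch label and reject inputs with fewer than two batches."""
    counts = Counter(str(label) for label in labels)
    if len(counts) < 2:
        raise ValueError(f"expected at least two batches, found {len(counts)}")
    return dict(counts)


# With AnnData this would be called as summarize_batches(adata.obs["sample"]);
# a plain list stands in here so the sketch runs without a data file.
print(summarize_batches(["s1", "s1", "s2", "s3", "s2"]))
```

Very small batches in the resulting counts are worth inspecting by hand before integration.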

Features

  • Memory-efficient: C++ backend keeps large matrices in native memory
  • Fast: OpenMP parallelization for multi-threaded optimization
  • scverse-compliant: Works seamlessly with AnnData and scanpy
  • Flexible: Supports counts, log-transformed data, and binary indicators
  • Comprehensive: Includes projections, embeddings, imputation, and differential analysis
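Because the backend is parallelized with OpenMP, the standard `OMP_NUM_THREADS` environment variable is one way to cap the number of threads on a shared machine. The sketch below assumes the compiled extension uses the default OpenMP runtime settings; whether gedi2py also exposes its own thread-count parameter is not covered here, and `limit_openmp_threads` is a hypothetical helper.

```python
import os


def limit_openmp_threads(n_threads):
    """Set OMP_NUM_THREADS unless it is already set; return the effective value."""
    if n_threads < 1:
        raise ValueError("n_threads must be >= 1")
    # setdefault keeps any value the user or scheduler has already exported.
    return os.environ.setdefault("OMP_NUM_THREADS", str(n_threads))


# Call this before importing the compiled extension so the OpenMP runtime
# reads the value when it initializes.
effective = limit_openmp_threads(4)
print(f"OMP_NUM_THREADS={effective}")
```

Using `setdefault` rather than plain assignment means a value set by a job scheduler or the shell takes precedence over the script's default.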

Documentation

Full documentation is available at csglab.github.io/gedi2py.

API Overview

gedi2py follows the scanpy convention with submodules:

Module  Description
gd.tl   Tools: model training, projections, embeddings, imputation, differential analysis
gd.pl   Plotting: embeddings, convergence, features
gd.io   I/O: H5AD, 10X formats, model persistence

import gedi2py as gd

# Tools
gd.tl.gedi(adata, batch_key="sample")
gd.tl.umap(adata)

# Plotting
gd.pl.embedding(adata, color="cell_type")
gd.pl.convergence(adata)

# I/O
adata = gd.read_h5ad("data.h5ad")
gd.io.save_model(adata, "model.h5")

Citation

If you use gedi2py in your research, please cite:

Mikaeili Namini, A., & Najafabadi, H.S. (2024). GEDI: Gene Expression Decomposition for Integration of single-cell RNA-seq data.

License

MIT License - see LICENSE for details.

