Python implementation of DecontX for removing ambient RNA contamination from single-cell RNA-seq data

DecontX Python

A Python implementation of DecontX for removing ambient RNA contamination from single-cell RNA-seq data, designed for seamless integration with scanpy workflows.

Overview

DecontX is a Bayesian method to estimate and remove cross-contamination from ambient RNA in droplet-based single-cell RNA-seq data. This Python implementation provides near-perfect parity with the original R version (correlation > 0.999) while enabling pure Python workflows without R dependencies.

Key Features:

  • 🐍 Pure Python implementation (no R required)
  • 🔬 Seamless scanpy integration
  • ⚡ Numba-accelerated performance
  • 📊 Bayesian contamination estimation per cell
  • 🎯 Validated against original R implementation

Installation

pip install decontx-python

Quick Start

import scanpy as sc
import decontx

# Load and filter raw counts with scanpy
adata = sc.read_h5ad("pbmc.h5ad")
sc.pp.filter_cells(adata, min_genes=200)
sc.pp.filter_genes(adata, min_cells=3)

# Cluster on a normalized copy so that adata.X keeps the raw counts DecontX expects
clustered = adata.copy()
sc.pp.normalize_total(clustered)
sc.pp.log1p(clustered)
sc.pp.highly_variable_genes(clustered)
sc.pp.pca(clustered)
sc.pp.neighbors(clustered)
sc.tl.leiden(clustered)
adata.obs["leiden"] = clustered.obs["leiden"]

# Remove ambient RNA contamination
decontx.decontx(adata, cluster_key="leiden")

# Access results
contamination = adata.obs['decontX_contamination']
clean_counts = adata.layers['decontX_counts']

print(f"Mean contamination: {contamination.mean():.1%}")
print(f"Highly contaminated cells (>50%): {(contamination > 0.5).sum()}")

Why DecontX?

Ambient RNA contamination occurs when mRNA released by lysed or stressed cells is captured in droplets together with other cells, causing:

  • Cross-contamination between cell types
  • Blurred cell type boundaries
  • False positive marker gene expression
  • Reduced clustering quality

DecontX models each cell as a mixture of:

  1. Native transcripts from the cell's true type
  2. Contaminating transcripts from other cell types in the sample
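The two-component mixture above can be illustrated with a small toy example. This is a simplified sketch, not the package's actual algorithm: it assumes the native and contaminating gene distributions are already known and estimates only the contaminated fraction of a single cell with a basic EM loop. The function names, the four-gene distributions, and the 20% contamination level are all hypothetical.

```python
import random

def simulate_cell(native, ambient, theta, n_counts, rng):
    """Draw n_counts transcripts; each one comes from `ambient` with probability theta."""
    genes = range(len(native))
    counts = [0] * len(native)
    for _ in range(n_counts):
        source = ambient if rng.random() < theta else native
        gene = rng.choices(genes, weights=source)[0]
        counts[gene] += 1
    return counts

def estimate_theta(counts, native, ambient, n_iter=200, tol=1e-6):
    """EM estimate of the contaminated fraction when both distributions are known."""
    theta = 0.5
    total = sum(counts)
    for _ in range(n_iter):
        # E-step: expected number of transcripts that came from the ambient source
        expected_ambient = 0.0
        for c, p_nat, p_amb in zip(counts, native, ambient):
            denom = theta * p_amb + (1 - theta) * p_nat
            if denom > 0:
                expected_ambient += c * theta * p_amb / denom
        # M-step: the new fraction is the expected ambient share of all transcripts
        new_theta = expected_ambient / total
        if abs(new_theta - theta) < tol:
            return new_theta
        theta = new_theta
    return theta

rng = random.Random(0)
native = [0.70, 0.20, 0.05, 0.05]   # hypothetical expression profile of the cell's own type
ambient = [0.05, 0.05, 0.20, 0.70]  # hypothetical profile of the contaminating cell types
counts = simulate_cell(native, ambient, theta=0.2, n_counts=5000, rng=rng)
theta_hat = estimate_theta(counts, native, ambient)
print(f"Estimated contamination: {theta_hat:.2f}")
```

The estimate should land close to the simulated fraction because the two profiles are well separated; when native and contaminating profiles overlap heavily, the fraction is much harder to identify.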

Method Comparison

Based on our benchmarking study:

Method     | Ambient RNA Removed | Precision   | Conservativeness
SoupX      | ~65%                | High        | Very conservative
DecontX    | ~90%                | Medium-High | Balanced
CellBender | ~90%                | Medium      | More aggressive

Recommendation:

  • Use SoupX for maximum safety and minimal false positives
  • Use DecontX for balanced contamination removal in standard workflows
  • Use CellBender when you can replace your entire preprocessing pipeline

API Reference

Main Function

decontx.decontx(
    adata,
    cluster_key="leiden",
    max_iter=500,
    delta=(10.0, 10.0),
    estimate_delta=True,
    convergence=0.001,
    copy=False
)

Parameters:

  • adata: AnnData object with raw counts in .X
  • cluster_key: Column in .obs containing cluster labels
  • max_iter: Maximum EM iterations (default: 500)
  • delta: Beta prior parameters for contamination (default: (10.0, 10.0))
  • estimate_delta: Whether to estimate delta parameters (default: True)
  • convergence: Convergence threshold (default: 0.001)
  • copy: If True, return a modified copy; if False, modify adata in place (default: False)
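To get a feel for what the delta prior implies, the sketch below summarizes a Beta distribution by its mean and standard deviation using the standard closed-form formulas. It is illustrative only: the helper name is hypothetical, and treating the second element as the contamination weight is an assumption borrowed from the convention of the R package, which is not confirmed here for this implementation.

```python
import math

def beta_summary(native_weight, contamination_weight):
    """Mean and standard deviation of the contaminated fraction under a Beta prior.

    Assumption: delta is ordered (native, contamination), so the contaminated
    fraction follows Beta(contamination_weight, native_weight).
    """
    a, b = contamination_weight, native_weight
    mean = a / (a + b)
    variance = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(variance)

for delta in [(1.0, 1.0), (10.0, 10.0), (30.0, 10.0)]:
    mean, sd = beta_summary(*delta)
    print(f"delta={delta}: prior mean={mean:.2f}, prior sd={sd:.3f}")
```

Larger values concentrate the prior more tightly around its mean, so the data must be more informative to move the per-cell estimate away from it. With estimate_delta=True the starting values matter less, since delta is re-estimated during fitting.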

Returns: Results are stored in adata (or in the returned copy when copy=True):

  • adata.obs['decontX_contamination']: Per-cell contamination estimates
  • adata.layers['decontX_counts']: Decontaminated count matrix
  • adata.uns['decontX']: Model parameters and metadata

Utility Functions

# Get decontaminated counts as array
clean_counts = decontx.get_decontx_counts(adata)

# Get contamination estimates
contamination = decontx.get_decontx_contamination(adata)

# Simple simulation for testing
sim_data = decontx.simulate_contamination(n_cells=1000, n_genes=2000)

Performance Notes

  • The Python implementation is ~5-6x slower than the R version
  • Performance is acceptable for typical datasets (<50k cells)
  • Numba JIT compilation adds overhead to the first call; subsequent calls in the same session run significantly faster
  • Memory usage scales linearly with dataset size
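As a rough way to reason about linear memory scaling, the following back-of-the-envelope sketch estimates the size of one dense cells-by-genes matrix. It assumes dense float64 storage (8 bytes per value); whether the package keeps its layers dense or sparse is not stated here, so treat the numbers as an upper-bound illustration rather than a measurement. The helper name and the 20,000-gene figure are hypothetical.

```python
def dense_matrix_gib(n_cells, n_genes, bytes_per_value=8):
    """Size in GiB of a dense n_cells x n_genes matrix (float64 by default)."""
    return n_cells * n_genes * bytes_per_value / 2**30

for n_cells in (10_000, 25_000, 50_000):
    size = dense_matrix_gib(n_cells, 20_000)
    print(f"{n_cells:>6} cells x 20,000 genes: {size:.1f} GiB")
```

Doubling the number of cells doubles the estimate, which is what linear scaling means in practice when deciding whether a dataset fits in memory.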

Integration with Existing Workflows

DecontX fits naturally into scanpy workflows:

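# Standard scanpy analysis (on log-normalized data)
sc.tl.leiden(adata, resolution=0.5)
sc.tl.rank_genes_groups(adata, 'leiden')

# Restore raw counts before decontamination
# (assumes they were saved earlier in adata.layers['counts'])
adata.X = adata.layers['counts'].copy()

# Add decontamination
decontx.decontx(adata, cluster_key='leiden')

# Continue with decontaminated data
adata.X = adata.layers['decontX_counts']
sc.pp.normalize_total(adata)  # Re-normalize the cleaned counts
sc.pp.log1p(adata)  # Re-apply the log transform
sc.pp.scale(adata)
sc.tl.pca(adata)
sc.pl.pca_variance_ratio(adata, n_pcs=50)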

Citation

If you use DecontX in your research, please cite:

Yang, S., Corbett, S.E., Koga, Y. et al. Decontamination of ambient RNA in single-cell RNA-seq with DecontX. Genome Biol 21, 57 (2020). https://doi.org/10.1186/s13059-020-1950-6

License

MIT License - see LICENSE file for details.
