Python implementation of DecontX for removing ambient RNA contamination from single-cell RNA-seq data

DecontX Python

A Python implementation of DecontX for removing ambient RNA contamination from single-cell RNA-seq data, designed for seamless integration with scanpy workflows.

Overview

DecontX is a Bayesian method to estimate and remove cross-contamination from ambient RNA in droplet-based single-cell RNA-seq data. This Python implementation provides near-perfect parity with the original R version (correlation > 0.999) while enabling pure Python workflows without R dependencies.

Key Features:

🐍 Pure Python implementation (no R required)
🔬 Seamless scanpy integration
⚡ Numba-accelerated performance
📊 Bayesian contamination estimation per cell
🎯 Validated against original R implementation

Installation

pip install decontx-python

Quick Start

import scanpy as sc
import decontx

# Load and preprocess data with scanpy
adata = sc.read_h5ad("pbmc.h5ad")
sc.pp.filter_cells(adata, min_genes=200)
sc.pp.filter_genes(adata, min_cells=3)
adata.layers["counts"] = adata.X.copy()  # Keep raw counts; decontx expects them in .X
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata)
sc.pp.pca(adata)
sc.pp.neighbors(adata)
sc.tl.leiden(adata)

# Remove ambient RNA contamination (restore raw counts to .X first)
adata.X = adata.layers["counts"].copy()
decontx.decontx(adata, cluster_key="leiden")

# Access results
contamination = adata.obs['decontX_contamination']
clean_counts = adata.layers['decontX_counts']

print(f"Mean contamination: {contamination.mean():.1%}")
print(f"Highly contaminated cells (>50%): {(contamination > 0.5).sum()}")

Why DecontX?

Ambient RNA contamination occurs when mRNA released by lysed or stressed cells is captured in droplets along with other cells, causing:

Cross-contamination between cell types
Blurred cell type boundaries
False positive marker gene expression
Reduced clustering quality

DecontX models each cell as a mixture of:

Native transcripts from the cell's true type
Contaminating transcripts from other cell types in the sample
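
The two-component view above can be illustrated with a toy calculation. This is a minimal sketch, not the package's code: the function name `mixture_profile` and the three-gene profiles are hypothetical, and the contaminating profile is simply given here rather than estimated from the other clusters.

```python
def mixture_profile(native, contaminant, contamination_fraction):
    """Expected gene distribution of a cell as a convex mix of two distributions.

    native / contaminant: per-gene probabilities, each summing to 1.
    contamination_fraction: share of the cell's transcripts that are contamination.
    """
    if not 0.0 <= contamination_fraction <= 1.0:
        raise ValueError("contamination_fraction must lie in [0, 1]")
    if len(native) != len(contaminant):
        raise ValueError("profiles must cover the same genes")
    return [
        (1.0 - contamination_fraction) * n + contamination_fraction * c
        for n, c in zip(native, contaminant)
    ]


# Hypothetical 3-gene example: gene 0 marks the cell's own type,
# gene 2 marks a different cell type present in the sample.
native = [0.8, 0.2, 0.0]
contaminant = [0.1, 0.2, 0.7]
expected = mixture_profile(native, contaminant, contamination_fraction=0.2)
```

With 20% contamination, the foreign marker (gene 2) receives an expected share of about 0.14 even though the native profile never expresses it, which is how false positive marker expression can arise.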

Method Comparison

Based on our benchmarking study:

Method	Ambient RNA Removed	Precision	Conservativeness
SoupX	~65%	High	Very conservative
DecontX	~90%	Medium-High	Balanced
CellBender	~90%	Medium	More aggressive

Recommendation:

Use SoupX for maximum safety and minimal false positives
Use DecontX for balanced contamination removal in standard workflows
Use CellBender when you can replace your entire preprocessing pipeline

API Reference

Main Function

decontx.decontx(
    adata,
    cluster_key="leiden",
    max_iter=500,
    delta=(10.0, 10.0),
    estimate_delta=True,
    convergence=0.001,
    copy=False
)

Parameters:

adata: AnnData object with raw counts in .X
cluster_key: Column in .obs containing cluster labels
max_iter: Maximum EM iterations (default: 500)
delta: Beta prior parameters for contamination (default: (10.0, 10.0))
estimate_delta: Whether to estimate delta parameters (default: True)
convergence: Convergence threshold (default: 0.001)
copy: Return a copy instead of modifying adata in place (default: False)
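
To get a feel for the default `delta=(10.0, 10.0)`, the sketch below computes the mean and standard deviation of a Beta distribution from the standard closed-form expressions. It is an illustration only: `beta_summary` is a hypothetical helper, and the description above does not say which of the two parameters corresponds to the native versus the contaminating fraction (for a symmetric prior this does not change the result).

```python
import math


def beta_summary(a, b):
    """Return (mean, standard deviation) of a Beta(a, b) distribution."""
    if a <= 0 or b <= 0:
        raise ValueError("Beta parameters must be positive")
    mean = a / (a + b)
    variance = (a * b) / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(variance)


# Default prior from the signature above
mean, sd = beta_summary(10.0, 10.0)
print(f"prior mean = {mean:.3f}, prior sd = {sd:.3f}")
```

A symmetric (10, 10) prior is centred on 0.5 with a standard deviation of roughly 0.11. Larger values with the same ratio would concentrate the prior more tightly around the same mean.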

Returns: Results stored in adata:

adata.obs['decontX_contamination']: Per-cell contamination estimates
adata.layers['decontX_counts']: Decontaminated count matrix
adata.uns['decontX']: Model parameters and metadata
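
Once the per-cell estimates are available, a per-cluster summary is often a useful first check. The sketch below uses only the standard library; `contamination_by_cluster` is a hypothetical helper, not part of the package, and the toy values stand in for `adata.obs["leiden"]` and `adata.obs["decontX_contamination"]`.

```python
from collections import defaultdict
from statistics import mean, median


def contamination_by_cluster(labels, contamination):
    """Group per-cell contamination estimates by cluster label.

    Returns {label: {"n_cells", "mean", "median"}}.
    """
    grouped = defaultdict(list)
    for label, value in zip(labels, contamination):
        grouped[label].append(float(value))
    return {
        label: {"n_cells": len(values), "mean": mean(values), "median": median(values)}
        for label, values in grouped.items()
    }


# Toy values standing in for the two adata.obs columns
labels = ["0", "0", "1", "1", "1"]
contamination = [0.05, 0.15, 0.40, 0.50, 0.60]
summary = contamination_by_cluster(labels, contamination)
```

With a real AnnData object, the two columns of `adata.obs` can be passed directly, since pandas Series are iterable. A pandas equivalent would be `adata.obs.groupby("leiden")["decontX_contamination"].describe()`.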

Utility Functions

# Get decontaminated counts as array
clean_counts = decontx.get_decontx_counts(adata)

# Get contamination estimates
contamination = decontx.get_decontx_contamination(adata)

# Simple simulation for testing
sim_data = decontx.simulate_contamination(n_cells=1000, n_genes=2000)

Performance Notes

The Python implementation is roughly 5-6x slower than the R version
Performance is acceptable for typical datasets (<50k cells)
Numba JIT compilation provides a significant speedup after the first run, which pays the compilation cost
Memory usage scales linearly with dataset size
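
One way to see the JIT warm-up effect is to time the same call several times and compare the first duration with the later ones. The helper below is a generic, hypothetical sketch using only the standard library; it is not part of the package and is demonstrated on a stand-in workload.

```python
import time


def time_repeated_calls(fn, *args, repeats=3, **kwargs):
    """Call fn(*args, **kwargs) `repeats` times; return each wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(*args, **kwargs)
        durations.append(time.perf_counter() - start)
    return durations


# Stand-in workload; replace with the function you want to profile.
durations = time_repeated_calls(sum, range(100_000), repeats=3)
print([f"{d:.4f}s" for d in durations])
```

For DecontX itself, something like `time_repeated_calls(decontx.decontx, adata, cluster_key="leiden", copy=True, repeats=2)` should work, assuming `copy=True` leaves `adata` unchanged between calls as the parameter description suggests. Whether compiled code is cached across Python sessions depends on how the package configures Numba, which is not stated here.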

Integration with Existing Workflows

DecontX fits naturally into scanpy workflows:

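import scanpy as sc
import decontx

# Standard scanpy analysis (adata already preprocessed, neighbors computed)
sc.tl.leiden(adata, resolution=0.5)
sc.tl.rank_genes_groups(adata, 'leiden')

# Add decontamination (expects raw counts in adata.X; restore them first if .X was normalized)
decontx.decontx(adata, cluster_key='leiden')

# Continue with decontaminated data
adata.X = adata.layers['decontX_counts'].copy()
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)  # Re-normalize and log-transform clean counts
sc.pp.scale(adata)
sc.tl.pca(adata)
sc.pl.pca_variance_ratio(adata, n_pcs=50)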

Citation

If you use DecontX in your research, please cite:

Yang, S., Corbett, S.E., Koga, Y. et al. Decontamination of ambient RNA in single-cell RNA-seq with DecontX. Genome Biol 21, 57 (2020). https://doi.org/10.1186/s13059-020-1950-6

Issues and Support

🐛 Report bugs: GitHub Issues
📖 Documentation: Read the Docs
💬 Questions: GitHub Discussions

License

MIT License - see LICENSE file for details.

Project details

Version 0.2.0, released Sep 8, 2025
