Python implementation of DecontX for removing ambient RNA contamination from single-cell RNA-seq data
Project description
DecontX Python
A Python implementation of DecontX for removing ambient RNA contamination from single-cell RNA-seq data, designed for seamless integration with scanpy workflows.
Overview
DecontX is a Bayesian method to estimate and remove cross-contamination from ambient RNA in droplet-based single-cell RNA-seq data. This Python implementation provides near-perfect parity with the original R version (correlation > 0.999) while enabling pure Python workflows without R dependencies.
Key Features:
- 🐍 Pure Python implementation (no R required)
- 🔬 Seamless scanpy integration
- ⚡ Numba-accelerated performance
- 📊 Bayesian contamination estimation per cell
- 🎯 Validated against original R implementation
Installation
pip install decontx-python
Quick Start
import scanpy as sc
import decontx
# Load and preprocess data with scanpy
adata = sc.read_h5ad("pbmc.h5ad")
sc.pp.filter_cells(adata, min_genes=200)
sc.pp.filter_genes(adata, min_cells=3)
sc.pp.normalize_total(adata)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata)
sc.pp.pca(adata)
sc.pp.neighbors(adata)
sc.tl.leiden(adata)
# Remove ambient RNA contamination
decontx.decontx(adata, cluster_key="leiden")
# Access results
contamination = adata.obs['decontX_contamination']
clean_counts = adata.layers['decontX_counts']
print(f"Mean contamination: {contamination.mean():.1%}")
print(f"Highly contaminated cells (>50%): {(contamination > 0.5).sum()}")
Why DecontX?
Ambient RNA contamination occurs when mRNA from lysed/stressed cells gets captured in droplets with other cells, causing:
- Cross-contamination between cell types
- Blurred cell type boundaries
- False positive marker gene expression
- Reduced clustering quality
DecontX models each cell as a mixture of:
- Native transcripts from the cell's true type
- Contaminating transcripts from other cell types in the sample
Method Comparison
Based on our benchmarking study:
| Method | Ambient RNA Removed | Precision | Conservativeness |
|---|---|---|---|
| SoupX | ~65% | High | Very conservative |
| DecontX | ~90% | Medium-High | Balanced |
| CellBender | ~90% | Medium | More aggressive |
Recommendation:
- Use SoupX for maximum safety and minimal false positives
- Use DecontX for balanced contamination removal in standard workflows
- Use CellBender when you can replace your entire preprocessing pipeline
API Reference
Main Function
decontx.decontx(
adata,
cluster_key="leiden",
max_iter=500,
delta=(10.0, 10.0),
estimate_delta=True,
convergence=0.001,
copy=False
)
Parameters:
adata: AnnData object with raw counts in.Xcluster_key: Column in.obscontaining cluster labelsmax_iter: Maximum EM iterations (default: 500)delta: Beta prior parameters for contamination (default: (10,10))estimate_delta: Whether to estimate delta parameters (default: True)convergence: Convergence threshold (default: 0.001)copy: Return copy or modify in place (default: False)
Returns:
Results stored in adata:
adata.obs['decontX_contamination']: Per-cell contamination estimatesadata.layers['decontX_counts']: Decontaminated count matrixadata.uns['decontX']: Model parameters and metadata
Utility Functions
# Get decontaminated counts as array
clean_counts = decontx.get_decontx_counts(adata)
# Get contamination estimates
contamination = decontx.get_decontx_contamination(adata)
# Simple simulation for testing
sim_data = decontx.simulate_contamination(n_cells=1000, n_genes=2000)
Performance Notes
- Python implementation is ~5-6x slower than R version
- Performance acceptable for typical datasets (<50k cells)
- Numba JIT compilation provides significant speedup after first run
- Memory usage scales linearly with dataset size
Integration with Existing Workflows
DecontX fits naturally into scanpy workflows:
# Standard scanpy analysis
sc.tl.leiden(adata, resolution=0.5)
sc.tl.rank_genes_groups(adata, 'leiden')
# Add decontamination
decontx.decontx(adata, cluster_key='leiden')
# Continue with decontaminated data
adata.X = adata.layers['decontX_counts']
sc.pp.log1p(adata) # Re-log transform clean counts
sc.pp.scale(adata)
sc.tl.pca(adata)
sc.pl.pca_variance_ratio(adata, n_pcs=50)
Citation
If you use DecontX in your research, please cite:
Yang, S., Corbett, S.E., Koga, Y. et al. Decontamination of ambient RNA in single-cell RNA-seq with DecontX. Genome Biol 21, 57 (2020). https://doi.org/10.1186/s13059-020-1950-6
Issues and Support
- 🐛 Report bugs: GitHub Issues
- 📖 Documentation: Read the Docs
- 💬 Questions: GitHub Discussions
License
MIT License - see LICENSE file for details.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file decontx_python-0.2.0.tar.gz.
File metadata
- Download URL: decontx_python-0.2.0.tar.gz
- Upload date:
- Size: 18.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.11
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
74b1a049ae9633fa7052b291069caa06d71a80ac98636abc6d92d207f6dd8a10
|
|
| MD5 |
6d9ec1ec7ff75fe0405d2ba28fbc4a74
|
|
| BLAKE2b-256 |
fbe942942e5895bcf8adf76a63cec9de98ef06d4b6faa1960f1032d34cfaf315
|
File details
Details for the file decontx_python-0.2.0-py3-none-any.whl.
File metadata
- Download URL: decontx_python-0.2.0-py3-none-any.whl
- Upload date:
- Size: 15.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.11
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
6675845cc3ad856c4c878640260529c101ad0964decbc62343c991a8926616d2
|
|
| MD5 |
abef0e7bb7205473b5b55ac2f5cdec1b
|
|
| BLAKE2b-256 |
30ecb6c047c2933c8937f6c353fc82a1e2fa7f8fb2101030a128850cb3d2c0da
|