Pure-Python port of the R package SoupX — removal of ambient (soup) RNA contamination from droplet-based single-cell RNA-seq data.
pysoupx
A pure-Python re-implementation of SoupX (Young & Behjati, GigaScience 2020, 9(12):giaa151) for removing ambient ("soup") mRNA contamination from droplet-based single-cell RNA-seq data.
- AnnData-native: drop-in for the scanpy ecosystem (`load_10x`, `SoupChannel.from_anndata`, `to_anndata`)
- No `rpy2`, no R install: soup-profile estimation, the tf-idf marker search, the `autoEstCont` posterior, and the constrained `adjustCounts` subtraction are all implemented directly in NumPy/SciPy
- Same function surface as the R workflow (`estimateSoup` → `setClusters` → `autoEstCont` → `adjustCounts`)
- Bit-for-bit reproducibility against the R reference on the deterministic kernels (see `tests/test_r_parity.py`)
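The tf-idf marker search mentioned above scores how specific a gene is to a cluster. A minimal sketch of the idea follows, assuming the classic definitions (tf = fraction of a cluster's cells in which the gene is detected, idf = log of total cells over cells detecting the gene). `tfidf_markers` is a hypothetical helper, not part of the pysoupx API, and it omits the q-values that `quick_markers` also reports.

```python
import numpy as np
import scipy.sparse as sp


def tfidf_markers(counts, clusters):
    """Hypothetical sketch of a per-(gene, cluster) tf-idf marker score.

    counts   : genes x cells matrix of UMI counts (dense or scipy sparse)
    clusters : one cluster label per cell
    Returns (labels, score) with score shaped genes x clusters.
    """
    expressed = (sp.csr_matrix(counts) > 0).astype(np.float64)  # detected or not
    n_cells = expressed.shape[1]
    n_obs = np.asarray(expressed.sum(axis=1)).ravel()  # cells detecting each gene
    with np.errstate(divide="ignore"):
        idf = np.log(n_cells / n_obs)  # widely detected genes get idf near 0
    idf[~np.isfinite(idf)] = 0.0  # genes never detected score 0
    clusters = np.asarray(clusters)
    labels = np.unique(clusters)
    score = np.zeros((expressed.shape[0], labels.size))
    for j, lab in enumerate(labels):
        idx = np.flatnonzero(clusters == lab)
        # tf: fraction of this cluster's cells in which the gene is detected
        tf = np.asarray(expressed[:, idx].sum(axis=1)).ravel() / idx.size
        score[:, j] = tf * idf
    return labels, score
```

A gene detected in every cell gets idf = 0 and can never be a marker, which is why housekeeping genes drop out of the ranking.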
This is a standalone mirror of the canonical implementation that lives in
omicverse. All algorithmic work is developed upstream in omicverse and synced here for users who want SoupX without the full omicverse stack.
Install
```bash
pip install pysoupx
```
Dependencies: numpy, scipy, pandas, anndata, statsmodels (and matplotlib for the optional diagnostic plot).
Quick-start
SoupX needs the raw unfiltered droplet matrix — the soup profile is estimated from the empty droplets — plus the filtered cell matrix.
```python
import pysoupx as soup

# --- from a 10x CellRanger output folder -------------------------
sc = soup.load_10x("path/to/cellranger/outs")  # raw + filtered

# --- or from AnnData objects -------------------------------------
# filtered = cells x genes ; raw = droplets x genes (both user-supplied)
sc = soup.SoupChannel.from_anndata(filtered, raw=raw, cluster_key="leiden")

# 1) soup profile is estimated automatically on construction
sc.soup_profile.head()

# 2) clusters + automatic contamination estimate
# cell_to_cluster: user-supplied cluster label for every cell
sc = soup.set_clusters(sc, cell_to_cluster)  # dict or sequence
sc = soup.auto_est_cont(sc)  # sets meta_data['rho']

# 3) corrected count matrix (genes x cells, scipy sparse)
corrected = soup.adjust_counts(sc, round_to_int=True)
adata_corrected = soup.to_anndata(sc, corrected=corrected)
```
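The soup profile in step 1 comes from droplets that are almost certainly empty. The sketch below illustrates that idea under stated assumptions: droplets whose total UMI count lies strictly between 0 and 100 are pooled (this mirrors the default `soupRange` of R's `estimateSoup`; treat the cutoff as an assumption for pysoupx), and each gene's pooled count is divided by the grand total. `estimate_soup_profile` is a hypothetical name; in practice use `estimate_soup` or let `SoupChannel` do it on construction.

```python
import numpy as np
import scipy.sparse as sp


def estimate_soup_profile(raw, soup_range=(0, 100)):
    """Hypothetical sketch: pool near-empty droplets, normalise gene totals.

    raw        : genes x droplets UMI matrix (unfiltered, dense or sparse)
    soup_range : droplets with lo < total UMIs < hi are treated as empty
    Returns (est, counts): per-gene soup fraction and pooled counts.
    """
    raw = sp.csc_matrix(raw)
    n_umi = np.asarray(raw.sum(axis=0)).ravel()  # total UMIs per droplet
    lo, hi = soup_range
    empty = np.flatnonzero((n_umi > lo) & (n_umi < hi))
    counts = np.asarray(raw[:, empty].sum(axis=1)).ravel()  # pooled per gene
    est = counts / counts.sum()  # fractions sum to 1
    return est, counts
```

Droplets with zero UMIs carry no information and droplets at or above the upper cutoff may hold real cells, so both are excluded from the pool.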
Low-level functional API (mirrors R one-to-one)
```python
from pysoupx import (
    estimate_soup, set_soup_profile, set_clusters,
    set_contamination_fraction, quick_markers,
    estimate_non_expressing_cells, calculate_contamination_fraction,
    auto_est_cont, adjust_counts, alloc, expand_clusters,
)

# Manual contamination fraction instead of autoEstCont
sc = set_contamination_fraction(sc, 0.10)

# Estimate rho from a user-supplied non-expressed gene set
# (e.g. haemoglobin genes, which non-erythroid cells should not express)
ute = estimate_non_expressing_cells(sc, gene_set)
calculate_contamination_fraction(sc, gene_set, ute)
```
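When rho is estimated from a non-expressed gene set, the cells flagged by `estimate_non_expressing_cells` can only contain those genes via the soup. A minimal sketch, assuming an intercept-only Poisson model in which each usable cell's expected soup count is its nUMI times the summed soup fraction of the gene set: the maximum-likelihood rho of that model reduces to total observed over total expected counts. `contamination_fraction` is hypothetical and ignores the confidence interval and multiple gene sets that the real function may handle.

```python
import numpy as np


def contamination_fraction(gene_counts, n_umi, soup_est, use_to_est):
    """Hypothetical sketch of rho from one non-expressed gene set.

    gene_counts : per-cell observed UMIs summed over the gene set
    n_umi       : per-cell total UMIs
    soup_est    : summed soup fraction of the gene set
    use_to_est  : boolean mask of cells judged NOT to express the set
    """
    mask = np.asarray(use_to_est, dtype=bool)
    obs = np.asarray(gene_counts, dtype=float)[mask]
    expected = np.asarray(n_umi, dtype=float)[mask] * soup_est  # if rho were 1
    # Intercept-only Poisson GLM with offset log(expected): setting the
    # score to zero gives exp(intercept) = sum(obs) / sum(expected) = rho.
    return obs.sum() / expected.sum()
```

Cells that do express the genes are masked out first; including them would inflate the numerator and overestimate rho.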
What's included
| Python | R counterpart | Purpose |
|---|---|---|
| `SoupChannel` / `SoupChannel.from_anndata` | `SoupChannel` | bundles droplets / counts / soup profile / metadata |
| `estimate_soup` | `estimateSoup` | per-gene soup fraction from empty droplets |
| `set_soup_profile` | `setSoupProfile` | set a soup profile manually |
| `set_clusters` | `setClusters` | attach a cell→cluster mapping |
| `set_contamination_fraction` | `setContaminationFraction` | set rho manually |
| `quick_markers` | `quickMarkers` | tf-idf cluster-marker genes |
| `estimate_non_expressing_cells` | `estimateNonExpressingCells` | which cells truly lack a gene set |
| `calculate_contamination_fraction` | `calculateContaminationFraction` | rho from non-expressed gene sets |
| `auto_est_cont` | `autoEstCont` | fully automatic rho estimate |
| `adjust_counts` | `adjustCounts` | soup-subtracted corrected matrix |
| `alloc` / `expand_clusters` | `alloc` / `expandClusters` | the constrained redistribution primitives |
| `load_10x` | `load10X` | read a 10x CellRanger folder |
| `to_anndata` / `make_soup_channel` | (AnnData helpers) | round-trip with the scanpy ecosystem |
| `plot_contamination_fraction` | `autoEstCont(doPlot=TRUE)` | diagnostic posterior plot |
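`alloc` is listed as a constrained redistribution primitive. The sketch below shows one way to implement capped proportional allocation by iterative water-filling: a total is split across buckets in proportion to weights, no bucket receives more than its limit, and any excess is re-spread over the rest. Applying it to per-cell subtraction (remove rho × nUMI counts, weighted by the soup profile, never more than a gene's observed count) is an assumption about how `adjust_counts` uses `alloc`; `capped_alloc` is a hypothetical name and is not guaranteed to match pysoupx's `alloc` numerically.

```python
import numpy as np


def capped_alloc(target, limits, weights):
    """Hypothetical sketch of capped proportional allocation.

    Splits `target` over buckets in proportion to `weights`; no bucket may
    exceed its entry in `limits`, and any excess is re-spread over the
    buckets that still have room (iterative water-filling).
    """
    limits = np.asarray(limits, dtype=float)
    weights = np.asarray(weights, dtype=float)
    out = np.zeros_like(limits)
    free = weights > 0  # buckets that can still receive counts
    remaining = float(target)
    while remaining > 1e-12 and free.any():
        share = np.zeros_like(limits)
        share[free] = remaining * weights[free] / weights[free].sum()
        room = limits - out
        capped = free & (share >= room)
        if not capped.any():
            out += share  # every proportional share fits: done
            break
        # Buckets that would overflow are filled to their limit, and the
        # leftover is re-split among the rest on the next pass.
        out[capped] = limits[capped]
        remaining -= room[capped].sum()
        free &= ~capped
    return out


# Hypothetical per-cell use: remove rho * nUMI soup counts, weighted by the
# soup profile and never more than each gene's observed count.
observed = np.array([2.0, 100.0, 100.0])
soup_est = np.array([0.25, 0.25, 0.5])
rho = 0.05
corrected = observed - capped_alloc(rho * observed.sum(), observed, soup_est)
```

Capping is what keeps corrected counts non-negative: the first gene can only give up the 2 counts it has, and the shortfall is taken from the other genes in proportion to their soup fractions.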
adjust_counts supports all three SoupX methods: subtraction (default), soupOnly and multinomial.
Reproducing R results exactly
SoupX's core kernels are deterministic, so feeding both ports identical raw + filtered matrices yields bit-for-bit agreement:
| Quantity | Result |
|---|---|
| Soup profile (`estimateSoup`) | bit-exact (max abs diff ~1e-16) |
| `quickMarkers` tf-idf / qvals / idf | bit-exact (max abs diff ~1e-16) |
| `adjustCounts` cluster-level, fixed rho | bit-exact (max abs diff 0) |
| `adjustCounts` cell-level, fixed rho | bit-exact (max abs diff ~1e-13) |
| `autoEstCont` rho | exact (identical posterior mode) |
`tests/test_r_parity.py` runs the R reference (`r_reference_driver.R`) inside the CMAP R env on the same synthetic raw + filtered matrices that the Python side uses, and checks that the soup profile, marker table, rho and corrected matrices match. `examples/compare_R_vs_Python.ipynb` does the same on the real SoupX toyData 10x dataset and visualises the agreement with omicverse.
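A parity check of this kind reduces to the maximum absolute element-wise difference between the Python and R outputs, which is the quantity the "max abs diff" column reports. A minimal sketch; `max_abs_diff` is a hypothetical helper, and the actual test file may compare outputs differently.

```python
import numpy as np
import scipy.sparse as sp


def max_abs_diff(a, b):
    """Hypothetical helper: largest |a - b| over all entries.

    Accepts dense arrays or scipy sparse matrices of the same shape, e.g. a
    Python-corrected matrix and one exported from the R reference.
    """
    if a.shape != b.shape:
        raise ValueError(f"shape mismatch: {a.shape} vs {b.shape}")
    diff = sp.csr_matrix(a) - sp.csr_matrix(b)
    # Sparse max() accounts for implicit zeros; an empty diff means equality.
    return float(abs(diff).max()) if diff.nnz else 0.0
```

Working on the sparse difference avoids densifying a genes x cells matrix just to compare it.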
The only intrinsically stochastic steps are the optional integer rounding (round_to_int) and the multinomial method's tie-breaking, both seedable via the seed argument.
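Seeded integer rounding can be sketched as stochastic rounding: each value is rounded up with probability equal to its fractional part, so counts stay unbiased on average, and a fixed seed makes the result reproducible. That `round_to_int` works exactly this way is an assumption; `stochastic_round` is a hypothetical helper.

```python
import numpy as np


def stochastic_round(x, seed=None):
    """Hypothetical sketch: round non-negative values to integers, rounding
    up with probability equal to the fractional part (unbiased on average).
    """
    rng = np.random.default_rng(seed)  # same seed -> same rounding
    x = np.asarray(x, dtype=float)
    floor = np.floor(x)
    frac = x - floor
    return (floor + (rng.random(x.shape) < frac)).astype(np.int64)
```

Values that are already integers have a fractional part of 0 and are never rounded up, so only the counts touched by the correction become random.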
Relationship to omicverse
Developed upstream in omicverse:
- Canonical implementation lives in omicverse
- Standalone mirror (this repo): same code, same API, minus the omicverse packaging
Citation
If you use this package, please cite the original SoupX paper:
Young, M.D. & Behjati, S. SoupX removes ambient RNA contamination from droplet-based single-cell RNA sequencing data. GigaScience 9, giaa151 (2020).
and acknowledge omicverse / this repo for the Python port.
License
Apache-2.0.