Pure-Python DoubletFinder — computational doublet detection in scRNA-seq via artificial-doublet pANN scoring, AnnData-native.
Project description
pydoubletfinder
A pure-Python re-implementation of DoubletFinder (McGinnis et al., Cell Systems 2019) for computational doublet detection in single-cell RNA-seq data.
- AnnData-native — drop-in for the scanpy ecosystem
- No
rpy2, no R install — the full pN/pK sweep, bimodality coefficient, BCmvn, and pANN scoring are all implemented directly in NumPy/SciPy - Same function surface as the R workflow (
paramSweep→summarizeSweep→find.pK→doubletFinder) - Bit-for-bit reproducibility against the R reference when fed matching PCA embeddings + artificial-doublet cell pairs (see
tests/test_exact_match.py)
This is a standalone mirror of the canonical implementation that lives in
omicverse(omicverse.single.DoubletFinder). All algorithmic work is developed upstream in omicverse and synced here for users who want DoubletFinder without the full omicverse stack.
Install
pip install pydoubletfinder
Quick-start (class API)
import anndata as ad
from pydoubletfinder import DoubletFinder
adata = ad.read_h5ad("mydata.h5ad") # cells × genes, raw counts in .X
df = DoubletFinder(adata)
# 1) pN/pK parameter sweep
df.param_sweep(PCs=10)
# 2) Bimodality coefficient summary
df.summarize_sweep()
# 3) Optimal pK via BCmvn
bcmvn = df.find_pK()
# 4) Final scoring + classification
df.run(pN=0.25, nExp=round(0.075 * adata.n_obs))
adata.obs[[c for c in adata.obs.columns if c.startswith("DF.")]]
Low-level functional API (mirrors R one-to-one)
from pydoubletfinder import (
param_sweep, summarize_sweep, find_pK,
doublet_finder, model_homotypic,
bimodality_coefficient,
)
# Per-real-cell pANN (needs a PCA embedding of [real + artificial] cells)
result = doublet_finder(
pca_coord=my_pca, # (n_real + n_doublets, n_PCs)
n_real_cells=n_real,
pN=0.25, pK=0.09, nExp=250,
)
result.pANN # np.ndarray
result.classifications # {"Singlet", "Doublet"} per real cell
result.column_name_DF # "DF.classifications_0.25_0.09_250"
# Homotypic-doublet proportion (match R modelHomotypic)
homotypic = model_homotypic(adata.obs["cluster"])
What's included
| Python | R counterpart | Purpose |
|---|---|---|
DoubletFinder class |
— | AnnData-native lifecycle wrapper (like Milo, Monocle) |
param_sweep |
paramSweep |
pN/pK sweep, one SweepEntry per (pN, pK) |
summarize_sweep |
summarizeSweep |
bimodality coefficient per sweep entry, optional AUC |
find_pK |
find.pK |
BCmvn + optimal-pK table |
doublet_finder |
doubletFinder |
pANN + Doublet/Singlet classification |
model_homotypic |
modelHomotypic |
homotypic-doublet proportion from cluster freqs |
bimodality_coefficient, skewness, kurtosis |
same | exported for direct use/testing |
bkde, approxfun |
KernSmooth::bkde, stats::approxfun |
KernSmooth-compatible KDE + R approxfun |
sample_artificial_doublets |
internal | expose doublet-pair sampling for reproducibility |
Reproducing R results exactly
The pipeline's randomness has two sources: which cell pairs become artificial doublets, and the PCA embedding of the merged matrix. To get identical outputs to an R run, provide both directly:
from pydoubletfinder import doublet_finder
result = doublet_finder(
pca_coord=r_pca_embedding, # from Seurat's reductions$pca@cell.embeddings
n_real_cells=len(real_cells),
pN=0.25, pK=0.09, nExp=250,
)
tests/test_exact_match.py runs the R reference (DoubletFinder::paramSweep + doubletFinder) inside the CMAP conda env, saves PCA coords and cell-pair indices, and checks that the Python port reproduces the pANN, BCreal, and classification vectors bit-for-bit.
Relationship to omicverse
Developed upstream in omicverse:
- Canonical implementation:
omicverse.single.DoubletFinder - Standalone mirror (this repo): same code, same API, minus the omicverse packaging
Citation
If you use this package, please cite the original DoubletFinder paper:
McGinnis, C.S., Murrow, L.M. & Gartner, Z.J. DoubletFinder: Doublet Detection in Single-Cell RNA Sequencing Data Using Artificial Nearest Neighbors. Cell Systems 8, 329–337 (2019).
and acknowledge omicverse / this repo for the Python port.
License
CC0 — matches the upstream R package.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file pydoubletfinder-0.1.0.tar.gz.
File metadata
- Download URL: pydoubletfinder-0.1.0.tar.gz
- Upload date:
- Size: 25.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.20
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
324f021ff74b4d3d78e8c806782c6544fd02b025fadd799f48dbefc38adbc1cf
|
|
| MD5 |
36bcb1ded5a3fed9a84fa30d6a9677ae
|
|
| BLAKE2b-256 |
43586e01f28ef6f2a26aa07c52a18428f4008302661cbed14a9f2aff5ed58d80
|
File details
Details for the file pydoubletfinder-0.1.0-py3-none-any.whl.
File metadata
- Download URL: pydoubletfinder-0.1.0-py3-none-any.whl
- Upload date:
- Size: 22.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.20
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
26f333b50ab35d2ab3362bfba6b20b55c7c4928f05aa0812bd6a6d2d2b459da3
|
|
| MD5 |
d0b357ab3bb012d5ce00ad473b533290
|
|
| BLAKE2b-256 |
b7e15e190cf39fe3cdd4418346932cd5ea655a3e27998d6c0c8f4f6496dd453a
|