Pure-Python DoubletFinder — computational doublet detection in scRNA-seq via artificial-doublet pANN scoring, AnnData-native.

pydoubletfinder

A pure-Python re-implementation of DoubletFinder (McGinnis et al., Cell Systems 2019) for computational doublet detection in single-cell RNA-seq data.

AnnData-native — drop-in for the scanpy ecosystem
No rpy2, no R install — the full pN/pK sweep, bimodality coefficient, BCmvn, and pANN scoring are all implemented directly in NumPy/SciPy
Same function surface as the R workflow (paramSweep → summarizeSweep → find.pK → doubletFinder)
Bit-for-bit reproducibility against the R reference when fed matching PCA embeddings + artificial-doublet cell pairs (see tests/test_exact_match.py)
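As a rough illustration of the bimodality coefficient named in the list above, here is a minimal sketch using the standard sample-corrected formula, BC = (skewness² + 1) / (excess kurtosis + 3(n−1)² / ((n−2)(n−3))). The function name `bimodality_coefficient_sketch` is hypothetical, and the code is not taken from the package, whose exported `bimodality_coefficient` may differ in detail.

```python
import numpy as np
from scipy import stats


def bimodality_coefficient_sketch(x):
    """Sample bimodality coefficient (illustrative, not the package's code)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if n < 4:
        raise ValueError("need at least 4 values for the finite-sample correction")
    g = stats.skew(x, bias=False)                     # bias-corrected sample skewness
    k = stats.kurtosis(x, fisher=True, bias=False)    # bias-corrected excess kurtosis
    return (g ** 2 + 1.0) / (k + 3.0 * (n - 1) ** 2 / ((n - 2) * (n - 3)))


# Toy data (made up for illustration): one unimodal and one bimodal sample.
rng = np.random.default_rng(0)
unimodal = rng.normal(0.0, 1.0, 5000)
bimodal = np.concatenate([rng.normal(-3.0, 1.0, 2500), rng.normal(3.0, 1.0, 2500)])

bc_unimodal = bimodality_coefficient_sketch(unimodal)
bc_bimodal = bimodality_coefficient_sketch(bimodal)
print(bc_unimodal, bc_bimodal)
```

A normal sample gives a value near 1/3, while a well-separated two-mode sample lands above 5/9, the value for a uniform distribution that is commonly used as the bimodality cut-off.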

This is a standalone mirror of the canonical implementation that lives in omicverse (omicverse.single.DoubletFinder). All algorithmic work is developed upstream in omicverse and synced here for users who want DoubletFinder without the full omicverse stack.

Install

pip install pydoubletfinder

Quick-start (class API)

import anndata as ad
from pydoubletfinder import DoubletFinder

adata = ad.read_h5ad("mydata.h5ad")          # cells × genes, raw counts in .X

df = DoubletFinder(adata)

# 1) pN/pK parameter sweep
df.param_sweep(PCs=10)

# 2) Bimodality coefficient summary
df.summarize_sweep()

# 3) Optimal pK via BCmvn
bcmvn = df.find_pK()

# 4) Final scoring + classification
df.run(pN=0.25, nExp=round(0.075 * adata.n_obs))

adata.obs[[c for c in adata.obs.columns if c.startswith("DF.")]]
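Step 3 picks pK by BCmvn, the mean of the bimodality coefficient across pN values divided by its variance, computed per pK. The sketch below shows that ratio on made-up numbers. `bcmvn_sketch` and the toy table are hypothetical, and the return value of the package's `find_pK` is not shown in this README, so treat this as an explanation of the metric and not of the API.

```python
import numpy as np


def bcmvn_sketch(bc_by_pk):
    """Map each pK to mean(BC) / var(BC) across pN and return the best pK.

    bc_by_pk: dict of pK -> list of bimodality coefficients, one per pN.
    Uses the sample variance (ddof=1), matching R's sd()^2.
    """
    table = {}
    for pk, values in bc_by_pk.items():
        v = np.asarray(values, dtype=float)
        table[pk] = v.mean() / v.var(ddof=1)
    best_pk = max(table, key=table.get)
    return table, best_pk


# Toy bimodality coefficients (invented for illustration only).
toy = {
    0.01: [0.50, 0.60, 0.40],
    0.09: [0.70, 0.71, 0.69],
    0.20: [0.55, 0.45, 0.50],
}
table, best_pk = bcmvn_sketch(toy)
print(table, best_pk)
```

The pK whose bimodality coefficient is both high and stable across pN wins, which is why the low-variance 0.09 row dominates in the toy table.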

Low-level functional API (mirrors R one-to-one)

from pydoubletfinder import (
    param_sweep, summarize_sweep, find_pK,
    doublet_finder, model_homotypic,
    bimodality_coefficient,
)

# Per-real-cell pANN (needs a PCA embedding of [real + artificial] cells)
result = doublet_finder(
    pca_coord=my_pca,              # (n_real + n_doublets, n_PCs)
    n_real_cells=n_real,
    pN=0.25, pK=0.09, nExp=250,
)
result.pANN                          # np.ndarray
result.classifications               # {"Singlet", "Doublet"} per real cell
result.column_name_DF                # "DF.classifications_0.25_0.09_250"

# Homotypic-doublet proportion (match R modelHomotypic)
homotypic = model_homotypic(adata.obs["cluster"])
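For orientation, the homotypic proportion in the R workflow is the sum of squared cluster frequencies, and the expected doublet count is then scaled by one minus that proportion. A minimal sketch follows, with a hypothetical helper name and invented cluster labels. It does not call the package, and the 7.5% doublet rate is simply the figure used in the quick-start.

```python
from collections import Counter


def homotypic_proportion_sketch(annotations):
    """Sum of squared cluster frequencies (illustrative helper)."""
    counts = Counter(annotations)
    n = sum(counts.values())
    return sum((c / n) ** 2 for c in counts.values())


# Invented labels: 600 cells in cluster "A", 400 in cluster "B".
labels = ["A"] * 600 + ["B"] * 400

homotypic = homotypic_proportion_sketch(labels)
n_exp = round(0.075 * len(labels))            # expected doublets at an assumed 7.5% rate
n_exp_adj = round(n_exp * (1 - homotypic))    # discount doublets formed within one cluster
print(homotypic, n_exp, n_exp_adj)
```

The adjusted count is what the R workflow typically passes as nExp in a second classification pass, since homotypic doublets are hard to detect by this method.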

What's included

Python	R counterpart	Purpose
`DoubletFinder` class	—	AnnData-native lifecycle wrapper (like `Milo`, `Monocle`)
`param_sweep`	`paramSweep`	pN/pK sweep, one `SweepEntry` per (pN, pK)
`summarize_sweep`	`summarizeSweep`	bimodality coefficient per sweep entry, optional AUC
`find_pK`	`find.pK`	BCmvn + optimal-pK table
`doublet_finder`	`doubletFinder`	pANN + `Doublet`/`Singlet` classification
`model_homotypic`	`modelHomotypic`	homotypic-doublet proportion from cluster freqs
`bimodality_coefficient`, `skewness`, `kurtosis`	same	exported for direct use/testing
`bkde`, `approxfun`	`KernSmooth::bkde`, `stats::approxfun`	KernSmooth-compatible KDE + R approxfun
`sample_artificial_doublets`	internal	expose doublet-pair sampling for reproducibility
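To make the pANN row in the table concrete: for each real cell, pANN is the fraction of its k nearest neighbours in PCA space that are artificial doublets, with k = round(n_total × pK), and the nExp cells with the highest pANN are labelled doublets. The sketch below follows that description under simplifying assumptions (dense Euclidean distances, self excluded by masking). `pann_sketch` is a hypothetical name, the toy embedding is invented, and this is not the package's `doublet_finder`.

```python
import numpy as np
from scipy.spatial.distance import cdist


def pann_sketch(pca_coord, n_real_cells, pK, nExp):
    """Return (pANN, labels) for the real cells; rows after n_real_cells are artificial."""
    pca_coord = np.asarray(pca_coord, dtype=float)
    n_total = pca_coord.shape[0]
    k = max(1, int(round(n_total * pK)))

    # Distances from every real cell to every cell (real + artificial).
    dist = cdist(pca_coord[:n_real_cells], pca_coord)
    idx = np.arange(n_real_cells)
    dist[idx, idx] = np.inf                      # a cell is not its own neighbour

    neighbors = np.argsort(dist, axis=1, kind="stable")[:, :k]
    pann = (neighbors >= n_real_cells).mean(axis=1)

    labels = np.full(n_real_cells, "Singlet")
    top = np.argsort(-pann, kind="stable")[:nExp]  # highest pANN first
    labels[top] = "Doublet"
    return pann, labels


# Toy 2-D "embedding": two real clusters, 5 real cells between them,
# and 28 artificial doublets that also sit between the clusters.
rng = np.random.default_rng(0)
real = np.vstack([
    rng.normal([-5.0, 0.0], 0.3, (40, 2)),
    rng.normal([5.0, 0.0], 0.3, (40, 2)),
    rng.normal([0.0, 0.0], 0.3, (5, 2)),
])
artificial = rng.normal([0.0, 0.0], 0.3, (28, 2))
coords = np.vstack([real, artificial])

pann, labels = pann_sketch(coords, n_real_cells=len(real), pK=0.1, nExp=5)
print(pann[-5:], labels[-5:])
```

The dense distance matrix keeps the sketch short but scales quadratically in memory, so a real dataset would need a neighbour search structure instead.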

Reproducing R results exactly

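The pipeline's randomness has two sources: which cell pairs become artificial doublets, and the PCA embedding of the merged (real + artificial) matrix. To get outputs identical to an R run, reuse both from that run. With the functional API this means passing the R-computed embedding, which was built from R's sampled pairs and therefore fixes both: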

from pydoubletfinder import doublet_finder

result = doublet_finder(
    pca_coord=r_pca_embedding,      # from Seurat's reductions$pca@cell.embeddings
    n_real_cells=len(real_cells),
    pN=0.25, pK=0.09, nExp=250,
)

tests/test_exact_match.py runs the R reference (DoubletFinder::paramSweep + doubletFinder) inside the CMAP conda env, saves PCA coords and cell-pair indices, and checks that the Python port reproduces the pANN, BCreal, and classification vectors bit-for-bit.
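The cell-pair step can be pictured as follows: the number of artificial doublets is round(n_real / (1 − pN) − n_real), pairs are drawn with replacement, and each doublet is the average of its two parent profiles. This is a minimal sketch with hypothetical helper names and an invented count matrix. It does not show the signature of the package's `sample_artificial_doublets`, which this README does not document.

```python
import numpy as np


def sample_doublet_pairs_sketch(n_real_cells, pN, rng):
    """Draw parent indices for the artificial doublets (with replacement)."""
    n_doublets = int(round(n_real_cells / (1.0 - pN) - n_real_cells))
    first = rng.integers(0, n_real_cells, size=n_doublets)
    second = rng.integers(0, n_real_cells, size=n_doublets)
    return first, second


def make_artificial_doublets_sketch(counts, first, second):
    """Average the two parent count profiles (counts is cells x genes, dense)."""
    return (counts[first] + counts[second]) / 2.0


# Invented count matrix: 300 cells x 50 genes.
counts = np.random.default_rng(1).poisson(2.0, size=(300, 50)).astype(float)

first, second = sample_doublet_pairs_sketch(300, pN=0.25, rng=np.random.default_rng(42))
doublets = make_artificial_doublets_sketch(counts, first, second)
merged = np.vstack([counts, doublets])   # real cells first, artificial doublets after
print(doublets.shape, merged.shape)
```

Seeding NumPy makes the Python side repeatable, but it will not reproduce R's `sample()` stream, which is presumably why the exact-match test reuses the pair indices saved from the R run.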

Relationship to omicverse

Developed upstream in omicverse:

Canonical implementation: omicverse.single.DoubletFinder
Standalone mirror (this repo): same code, same API, minus the omicverse packaging

Citation

If you use this package, please cite the original DoubletFinder paper:

McGinnis, C.S., Murrow, L.M. & Gartner, Z.J. DoubletFinder: Doublet Detection in Single-Cell RNA Sequencing Data Using Artificial Nearest Neighbors. Cell Systems 8, 329–337 (2019).

and acknowledge omicverse / this repo for the Python port.

License

CC0 — matches the upstream R package.

Project details

This version

0.1.0

Apr 18, 2026

Download files

Source Distribution

pydoubletfinder-0.1.0.tar.gz (25.6 kB)

Uploaded Apr 18, 2026 Source

Built Distribution

pydoubletfinder-0.1.0-py3-none-any.whl (22.3 kB)

Uploaded Apr 18, 2026 Python 3

File details

Details for the file pydoubletfinder-0.1.0.tar.gz.

File metadata

Download URL: pydoubletfinder-0.1.0.tar.gz
Upload date: Apr 18, 2026
Size: 25.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.20

File hashes

Hashes for pydoubletfinder-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`324f021ff74b4d3d78e8c806782c6544fd02b025fadd799f48dbefc38adbc1cf`
MD5	`36bcb1ded5a3fed9a84fa30d6a9677ae`
BLAKE2b-256	`43586e01f28ef6f2a26aa07c52a18428f4008302661cbed14a9f2aff5ed58d80`

File details

Details for the file pydoubletfinder-0.1.0-py3-none-any.whl.

File metadata

Download URL: pydoubletfinder-0.1.0-py3-none-any.whl
Upload date: Apr 18, 2026
Size: 22.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.20

File hashes

Hashes for pydoubletfinder-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`26f333b50ab35d2ab3362bfba6b20b55c7c4928f05aa0812bd6a6d2d2b459da3`
MD5	`d0b357ab3bb012d5ce00ad473b533290`
BLAKE2b-256	`b7e15e190cf39fe3cdd4418346932cd5ea655a3e27998d6c0c8f4f6496dd453a`
