Pure-Python port of the R package FastCAR — fast, deterministic per-gene ambient-RNA correction for droplet-based single-cell RNA-seq.
pyfastcar
A pure-Python re-implementation of FastCAR (Gocht / Berg et al., BMC Genomics 2023) for fast, deterministic per-gene correction of ambient RNA in droplet-based single-cell RNA-seq data.
- AnnData-native: drop-in for the scanpy ecosystem
- No rpy2, no R install: the empty-droplet detection, per-gene gMax/frC profiling, and count subtraction are all implemented directly in NumPy/SciPy
- Same function surface as the R workflow (determine.background.to.remove → remove.background, plus describe.ambient.RNA.sequence / recommend.empty.cutoff)
- Bit-for-bit reproducibility against the R reference: FastCAR is fully deterministic (no RNG anywhere), so pyfastcar matches R FastCAR exactly (see tests/test_exact_match.py)
This is a standalone mirror of the canonical implementation that lives in omicverse. All algorithmic work is developed upstream in omicverse and synced here for users who want FastCAR without the full omicverse stack.
Install
pip install pyfastcar
Dependencies: numpy, scipy, pandas, anndata.
What it does
Droplet-based scRNA-seq libraries are contaminated by ambient RNA — transcripts released by lysed cells that end up in every droplet. FastCAR estimates this contamination from the empty droplets of the raw, unfiltered count matrix and subtracts it, gene by gene, from the real cells.
The algorithm is fully deterministic — a handful of vectorised operations on the raw count matrix:
- Empty droplets are the libraries whose total UMI count is below an empty_droplet_cutoff (thE, default 100). All-zero "unused barcodes" are excluded from the empty-droplet population.
- For each gene, compute gMax, the highest count of that gene in any single empty droplet, and frC, the fraction of (non-zero) empty droplets that contain the gene.
- A gene is corrected only when its frC clears the allowable contamination fraction contamination_chance_cutoff (frAA, default 0.005). For those genes, gMax is subtracted from every cell's count, with negative results floored at zero.
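The three steps above can be sketched in plain NumPy. This is a minimal illustration on a dense genes × droplets array, not the package's actual code: the function names `ambient_profile_sketch` and `subtract_profile_sketch` are hypothetical, and treating "clears the cutoff" as `frC >= cutoff` is an assumption.

```python
import numpy as np

def ambient_profile_sketch(full, empty_droplet_cutoff=100,
                           contamination_chance_cutoff=0.005):
    """Per-gene amount to subtract, from a genes x droplets count array."""
    full = np.asarray(full).astype(np.int64)
    totals = full.sum(axis=0)
    # Empty droplets: below the cutoff, but not all-zero unused barcodes.
    empty = (totals < empty_droplet_cutoff) & (totals > 0)
    n_empty = int(empty.sum())
    if n_empty == 0:
        return np.zeros(full.shape[0], dtype=np.int64)
    empties = full[:, empty]
    g_max = empties.max(axis=1)                    # highest count in any empty droplet
    fr_c = (empties != 0).sum(axis=1) / n_empty    # fraction of empties with the gene
    # Assumption: a gene is corrected when frC is at or above the cutoff.
    return np.where(fr_c >= contamination_chance_cutoff, g_max, 0)

def subtract_profile_sketch(cells, profile):
    """Subtract the per-gene profile from every cell, flooring at zero."""
    cells = np.asarray(cells).astype(np.int64)
    return np.maximum(cells - profile[:, None], 0)

# Toy data: 3 genes x 6 droplets; the last two droplets are real cells.
full = np.array([
    [2, 1, 0, 0, 50, 1],
    [0, 0, 1, 0, 30, 0],
    [0, 0, 0, 0, 20, 60],
])
profile = ambient_profile_sketch(full, empty_droplet_cutoff=10,
                                 contamination_chance_cutoff=0.5)
corrected = subtract_profile_sketch(full[:, [4, 5]], profile)
print(profile.tolist())    # [2, 0, 0]
print(corrected.tolist())  # [[48, 0], [30, 0], [20, 60]]
```

In the toy data only the first gene appears in at least half of the three non-zero empty droplets, so it alone is corrected; its count of 1 in the second cell is floored at zero rather than going negative.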
Quick-start (AnnData API)
import anndata as ad
from pyfastcar import correct_anndata
# raw = unfiltered AnnData (cells × genes): real cells + empty droplets
raw = ad.read_h5ad("raw_feature_bc_matrix.h5ad")
cells = ad.read_h5ad("filtered_feature_bc_matrix.h5ad") # the real cells
corrected = correct_anndata(
    raw, cells,
    empty_droplet_cutoff=100, contamination_chance_cutoff=0.005,
)
corrected.uns["fastcar"]["ambient_rna_profile"] # per-gene subtract amounts
corrected.uns["fastcar"]["diagnostics"] # gMax / frC table
corrected.uns["fastcar"]["n_genes_corrected"]
The wrapper transposes internally, so you always work in the natural cells × genes AnnData orientation.
Low-level functional API (mirrors R one-to-one)
from pyfastcar import (
    determine_background_to_remove, remove_background,
    describe_ambient_rna_sequence, recommend_empty_cutoff,
)
# full = unfiltered genes × droplets matrix (real cells + empty droplets)
# cells = filtered genes × cells matrix of real cells
profile = determine_background_to_remove(
    full, cells, empty_droplet_cutoff=100, contamination_chance_cutoff=0.005)
corrected = remove_background(cells, profile) # ambient-corrected matrix
# pass return_table=True for a per-gene diagnostic DataFrame:
profile, table = determine_background_to_remove(full, cells, return_table=True)
Choosing the empty-droplet cutoff
desc = describe_ambient_rna_sequence(
    full, start=50, stop=500, by=25, contamination_chance_cutoff=0.005)
cutoff = recommend_empty_cutoff(desc)
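To give a feel for what such a cutoff sweep involves, here is a rough NumPy sketch, not the package's implementation. The function names, the row fields, and the recommendation rule (the first cutoff at which the number of genes passing the contamination threshold reaches its maximum) are all assumptions made for illustration.

```python
import numpy as np

def describe_cutoffs_sketch(full, start, stop, by,
                            contamination_chance_cutoff=0.005):
    """Summarise the empty-droplet population at each candidate cutoff."""
    full = np.asarray(full)
    totals = full.sum(axis=0)
    rows = []
    for cutoff in range(start, stop + 1, by):
        # Same empty-droplet definition as the correction step.
        empty = (totals < cutoff) & (totals > 0)
        n_empty = int(empty.sum())
        occurrences = (full[:, empty] != 0).sum(axis=1)
        fr_c = occurrences / n_empty if n_empty else np.zeros(full.shape[0])
        rows.append({
            "cutoff": cutoff,
            "n_empty": n_empty,
            "genes_in_background": int((occurrences > 0).sum()),
            "genes_contaminating": int((fr_c >= contamination_chance_cutoff).sum()),
        })
    return rows

def recommend_cutoff_sketch(rows):
    """Assumed rule: first cutoff where genes_contaminating hits its maximum."""
    best = max(r["genes_contaminating"] for r in rows)
    return next(r["cutoff"] for r in rows if r["genes_contaminating"] == best)

# Toy genes x droplets matrix (3 genes, 6 droplets).
full = np.array([
    [2, 1, 0, 0, 50, 1],
    [0, 0, 1, 0, 30, 0],
    [0, 0, 0, 0, 20, 60],
])
rows = describe_cutoffs_sketch(full, start=2, stop=4, by=1,
                               contamination_chance_cutoff=0.6)
suggested = recommend_cutoff_sketch(rows)
```

Each row records how many droplets count as empty at that cutoff and how many genes would be corrected, which is the information needed to see where the ambient profile stabilises.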
What's included
| Python | R counterpart | Purpose |
|---|---|---|
| determine_background_to_remove | determine.background.to.remove | per-gene ambient-RNA profile (gMax/frC) |
| remove_background | remove.background | subtract the profile, floor at zero |
| correct_anndata | (new) | AnnData-native one-shot wrapper |
| describe_ambient_rna_sequence | describe.ambient.RNA.sequence | empty-droplet cutoff profiling |
| recommend_empty_cutoff | recommend.empty.cutoff | suggest a cutoff from the profile |
| read_cell_matrix / read_full_matrix | read.cell.matrix / read.full.matrix | 10x CellRanger loaders |
Reproducing R results exactly
FastCAR has no stochastic component — given the same raw matrix and the same
empty_droplet_cutoff / contamination_chance_cutoff, the output is fully
determined. pyfastcar therefore reproduces R FastCAR bit-for-bit: the
per-gene subtraction amounts, the corrected integer matrix and the
threshold-profiling table are all identical.
tests/test_exact_match.py runs the R reference (FastCAR::determine.background.to.remove, remove.background, and describe.ambient.RNA.sequence) inside the CMAP environment on a deterministic synthetic raw matrix, saves every intermediate, and checks that the Python port reproduces them element-for-element.
pip install -e ".[dev]"
pytest tests/ -q # smoke tests + R-parity tests
pytest tests/test_smoke.py -q # smoke tests only (no R required)
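An element-for-element check of this kind can be as simple as the sketch below. It is an illustration only and is not taken from tests/test_exact_match.py; the helper name `assert_exact_match` is hypothetical, and it assumes both results have already been loaded as dense arrays.

```python
import numpy as np

def assert_exact_match(python_result, r_result, name="matrix"):
    """Raise AssertionError unless both arrays agree in shape and every entry."""
    py = np.asarray(python_result)
    ref = np.asarray(r_result)
    if py.shape != ref.shape:
        raise AssertionError(f"{name}: shape {py.shape} != {ref.shape}")
    mismatches = np.argwhere(py != ref)
    if mismatches.size:
        first = tuple(int(i) for i in mismatches[0])
        raise AssertionError(
            f"{name}: {len(mismatches)} mismatching entries, first at {first}")

# Identical integer matrices pass silently.
assert_exact_match([[48, 0], [30, 0]], [[48, 0], [30, 0]], name="corrected")

# A single differing entry is reported with its position.
try:
    assert_exact_match([[48, 0], [30, 0]], [[48, 1], [30, 0]], name="corrected")
    message = ""
except AssertionError as err:
    message = str(err)
```

Because the counts are integers and the algorithm has no random component, exact equality is the appropriate comparison here; no floating-point tolerance is needed for the corrected matrix.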
examples/compare_R_vs_Python.ipynb runs both implementations on a real raw
10x dataset (data/pbmc1k_raw.h5ad, a subset of 10x Genomics' pbmc_1k_v3
raw feature-barcode matrix that still contains empty droplets) and visualizes
the bit-exact agreement with omicverse.
Relationship to omicverse
Developed upstream in omicverse:
- Canonical implementation: lives in the omicverse single-cell preprocessing stack
- Standalone mirror (this repo): same code, same API, minus the omicverse packaging
Citation
If you use this package, please cite the original FastCAR paper:
Gocht A.M., Berg M., et al. FastCAR: fast correction for ambient RNA to facilitate differential gene expression analysis in single-cell RNA-sequencing datasets. BMC Genomics 24, 2023.
and acknowledge omicverse / this repo for the Python port.
License
Apache-2.0. The upstream R package FastCAR is GPL-3; this is an independent re-implementation of its published algorithm.