Provide a probability of false negatives for compositional count-based data
Zeroish
Usage:
zeroish -I <input_file> -O <output_file>
Additional details can be retrieved via
zeroish -h
Installation
Via pypi / pip:
pip install zeroish
The problem it solves:
Compositional data can be generated from a variety of sources, including (single-cell) RNA-sequencing, 16S or whole genome shotgun microbiome data, and others. In biological systems, a log-normal distribution is typical for the relative contributions of each feature in a given specimen. A few features will comprise the bulk of the counts for a specimen, with a long and often deep tail. Some of the most functionally important features (such as transcription factors in RNA data, or methanogens in microbiomes) are only found at very low levels relative to other features. Further, it is typical that the total observations per specimen will vary, often by almost an order of magnitude, even within the same batch of specimens processed with the same methods.
This results in a variable limit of detection between specimens that often obscures some of the most functionally important features. Sticking with the example of a critical transcription factor in single-cell RNA sequencing (e.g., LGR5 in the intestine, as the critical transcription factor for intestinal stem cells), it is expected that only a handful (1-5) of reads will be assigned to LGR5 out of 10,000 total reads when LGR5 is expressed. If 'on' is only a few reads out of thousands, how far can zero recovered reads be trusted to establish that LGR5 was not expressed in that cell? For each observation and each feature, what is the probability of that feature being expressed at its 'typical' level in that observation (even if zero counts were assigned)?
How zeroish solves this problem:
The probability of observing zero reads when the actual relative abundance is $m$ is:
$p(0|m) = e^{-n_{total} \cdot m}$
where $n_{total}$ is the total number of reads observed for the specimen.
This is in turn derived from the Poisson distribution, $p(k|m) = e^{-n_{total} \cdot m} \cdot \frac{(n_{total} \cdot m)^k}{k!}$, evaluated at $k = 0$.
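As a worked example of the formula above, the following sketch computes the chance of seeing zero reads for a feature whose true relative abundance is 3 reads per 10,000, at a few sequencing depths. The helper name `p_zero` and the depths chosen are illustrative assumptions and are not part of zeroish.

```python
import math


def p_zero(n_total: int, m: float) -> float:
    """Poisson probability of observing zero reads, given n_total reads and
    a true relative abundance m (illustrative helper, not the zeroish API)."""
    return math.exp(-n_total * m)


m = 3 / 10_000  # 'on' is roughly 3 reads per 10,000

for n_total in (1_000, 10_000, 100_000):
    print(f"n_total={n_total:>7}: p(0|m)={p_zero(n_total, m):.4f}")
```

At 1,000 total reads the expected count is 0.3, so a zero is the most likely outcome even when the feature is present; at 10,000 reads the chance of a false zero drops to roughly 5%.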
Using these concepts, zeroish:
- Accepts a raw specimen-feature-count matrix.
- Generates the relative / fractional abundance matrix.
- For each feature, determines the log-transformed distribution of the relative abundances observed in the specimens where it was detected.
- Sets a per-feature expected minimum relative observation rate as the n-th percentile of this distribution (default: 2.5).
- Uses the total counts per specimen, combined with the expected minimum relative observation rate, to establish the probability that the feature was actually present in this observation even though zero counts were observed.
- Returns a probability-detected matrix.
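The steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the zeroish implementation: the function names are hypothetical, the percentile uses simple linear interpolation, and the returned value is assumed to be $1 - p(0|m)$, i.e. the chance that a feature present at its expected minimum abundance would have produced at least one read.

```python
import math


def percentile(values, q):
    """Linear-interpolation percentile (q in 0..100) of a non-empty list."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def probability_detected(counts, q=2.5):
    """counts: one row per specimen, one column per feature (raw counts).

    Hypothetical sketch; ASSUMES the output is 1 - exp(-n_total * m_min).
    """
    totals = [sum(row) for row in counts]
    n_features = len(counts[0])
    m_min = []
    for j in range(n_features):
        # Log relative abundances, only where the feature was detected.
        logs = [math.log(row[j] / t)
                for row, t in zip(counts, totals) if row[j] > 0]
        # Features never detected have no distribution to draw from.
        m_min.append(math.exp(percentile(logs, q)) if logs else float("nan"))
    return [[1 - math.exp(-t * m) for m in m_min] for t in totals]


counts = [
    [98, 2],       # 100 total reads
    [4900, 100],   # 5,000 total reads
    [50, 0],       # 50 total reads, second feature not observed
]
for row in probability_detected(counts):
    print([round(p, 3) for p in row])
```

In this toy matrix the second feature sits at a relative abundance of 0.02 whenever it is detected, so the shallow third specimen (50 reads) has only about a 63% chance of having picked it up. The sketch fills in every cell, including cells with nonzero counts.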
Inputs:
Can be:
- `.csv` (comma-separated values), `.tsv` (tab-separated), or `.txt` (whitespace-delimited) specimen-feature-count matrices. These must be in strict 'wide' format, with one header row of features and one row per observation, where the first column holds the observation IDs. These can be gzipped, as indicated by a `.gz` file suffix.

OR
- AnnData in `.h5ad` format, with either `.X` or a layer containing the raw untransformed count data.
- AnnData inputs can be sparse matrices, but zeroish will always return a dense `probability_detected` matrix.
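For the delimited-text inputs, the 'wide' layout described above (a header listing the features, then one row per observation led by its ID) can be sketched with the standard library. The observation IDs, feature names, counts, and the `observation_id` column label below are all hypothetical; zeroish may not require any particular label for the first column.

```python
import csv
import gzip
import io

# Hypothetical feature names and raw counts, for illustration only.
header = ["observation_id", "LGR5", "ACTB", "GAPDH"]
rows = [
    ["cell_1", 0, 5120, 4880],
    ["cell_2", 3, 4200, 5797],
]

# Build the wide-format CSV text in memory.
buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\n")
writer.writerow(header)   # features across the top
writer.writerows(rows)    # one row per observation, ID first
text = buf.getvalue()
print(text)

# Gzip-compressed bytes, suitable for saving under a name ending in .csv.gz
gz_bytes = gzip.compress(text.encode("utf-8"))
```

Writing `gz_bytes` to a file whose name ends in `.csv.gz` would give a gzipped comma-separated matrix in the layout described.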
Outputs:
- `.csv`, `.tsv`, or `.txt`. If the filename ends in `.gz`, the output will be gzipped as well.
- `.h5ad` for AnnData output. If provided an `.h5ad` AnnData as input, the probability matrix will be added as a layer to the existing object, which is then saved as a new AnnData file.
Who to blame
Jonathan Golob, MD PhD. j-dev@golob.org