Provide a probability of false negatives for compositional count-based data

Zeroish

Usage:

zeroish -I <input_file> -O <output_file>

Additional details can be retrieved via zeroish -h

Installation

Via PyPI / pip: pip install zeroish

The problem it solves:

Compositional data can be generated from a variety of sources, including (single-cell) RNA sequencing, 16S or whole-genome shotgun microbiome data, and others. In biological systems, a log-normal distribution is typical for the relative contributions of each feature in a given specimen. A few features will comprise the bulk of the counts for a specimen, followed by a long and often deep tail. Some of the most functionally important features (such as transcription factors in RNA-sequencing data, or methanogens in microbiomes) are only found at very low levels relative to other features. Further, it is typical that the total observations per specimen will vary, often by almost an order of magnitude, even within the same batch of specimens processed with the same methods.

This results in a limit of detection that varies between specimens and often obscures some of the most functionally important features. Take a critical but lowly expressed gene in single-cell RNA sequencing (e.g., LGR5, the marker gene of intestinal stem cells): it is expected that only a handful (1-5) of reads will be assigned to LGR5 out of 10,000 total reads when LGR5 is expressed. If 'on' is only a few reads out of thousands, how far can zero recovered reads be trusted to show that LGR5 was not expressed in that cell? For each observation and each feature, what is the probability of that feature being expressed at its 'typical' level in that observation (even if zero counts were assigned)?

How zeroish solves this problem:

The probability of observing zero reads if the actual relative abundance is $m$ is:

$p(0|m) = e^{-n_{total} * m}$

where $n_{total}$ is the total number of reads in the specimen.

This is derived from the Poisson distribution with expected count $n_{total} * m$: $p(k|m) = e^{-n_{total} * m} * \frac{(n_{total} * m)^k}{k!}$, evaluated at $k = 0$.
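As a small worked illustration of the formula above, the sketch below evaluates the zero-read probability for a feature whose typical level is 3 reads per 10,000 (a made-up value in the range of the LGR5 example) at several sequencing depths. The function name prob_zero_reads is hypothetical and is not part of the zeroish package.

```python
import math


def prob_zero_reads(n_total: int, m: float) -> float:
    """Poisson probability of seeing zero reads for a feature whose
    true relative abundance is m, given n_total reads in the specimen."""
    return math.exp(-n_total * m)


# Hypothetical feature: about 3 reads per 10,000 when it is present.
m = 3 / 10_000

for n_total in (1_000, 10_000, 100_000):
    print(n_total, round(prob_zero_reads(n_total, m), 4))
# 1000 0.7408
# 10000 0.0498
# 100000 0.0
```

At 1,000 total reads a zero says very little (the feature would be missed about 74% of the time even when present), while at 10,000 reads a zero is much more informative.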

Using these concepts, zeroish:

  • Accepts a raw specimen-feature-count matrix.
  • Generates the relative / fractional abundance matrix.
  • For each feature, determines the distribution of log-transformed relative abundances across the specimens in which the feature was detected.
  • Sets a per-feature expected minimum relative observation rate at the n-th percentile of this distribution (default: 2.5).
  • Combines the total counts per specimen with the expected minimum relative observation rate to estimate the probability that the feature was actually present in a specimen even though zero counts were observed.
  • Returns a probability-detected matrix.
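The steps above can be sketched in plain Python. This is only an illustration of the described procedure, not the code zeroish runs: the function names, the linear-interpolation percentile, the NaN handling for never-detected features, and the choice to report 1 - p(0|m) as the "probability detected" are all assumptions made for this example.

```python
import math


def percentile(values, q):
    """q-th percentile (0-100) using linear interpolation (an assumed method)."""
    ordered = sorted(values)
    if len(ordered) == 1:
        return ordered[0]
    pos = (len(ordered) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def probability_detected(counts, q=2.5):
    """counts: list of rows (one per specimen) of raw counts per feature.

    Returns a matrix of the same shape holding 1 - exp(-n_total * m_min),
    i.e. the chance of at least one read if the feature were present at
    its expected minimum relative abundance m_min.
    """
    totals = [sum(row) for row in counts]
    n_features = len(counts[0])
    # Relative (fractional) abundance per specimen.
    rel = [[c / t if t else 0.0 for c in row] for row, t in zip(counts, totals)]
    # Per-feature minimum expected abundance from the log-scale distribution
    # of non-zero observations; NaN if the feature was never detected.
    m_min = []
    for j in range(n_features):
        logs = [math.log(r[j]) for r in rel if r[j] > 0]
        m_min.append(math.exp(percentile(logs, q)) if logs else float("nan"))
    # Combine specimen depth with the per-feature minimum abundance.
    return [[1.0 - math.exp(-t * m) for m in m_min] for t in totals]


# Three specimens, two features; the shallow second specimen has a zero.
example = [[998, 2], [100, 0], [9990, 10]]
for row in probability_detected(example):
    print([round(p, 3) for p in row])
```

In this toy matrix the second specimen has only 100 reads, so its zero for the second feature comes with a low detection probability, whereas the same zero in a specimen with 10,000 reads would be far more convincing.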

Inputs:

Can be:

  • .csv (comma-separated values), .tsv (tab-separated), or .txt (whitespace-delimited) specimen-feature-count matrices. These must be in strict 'wide' format: one header row of feature names, then one row per observation, with the observation IDs in the first column. These can be gzipped, as indicated by a .gz file suffix.

OR

  • AnnData in .h5ad format, with either .X or a layer containing the raw, untransformed count data.
  • AnnData inputs can hold sparse matrices, but the returned probability_detected matrix is always dense.
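To make the 'wide' text layout concrete, here is a minimal sketch that builds a tiny gzipped CSV in memory and parses it with the standard library. It assumes a header row of feature names with observation IDs in the first column; the specimen and feature names and the helper read_wide_counts are invented for the example and are not part of zeroish.

```python
import csv
import gzip
import io

# Hypothetical wide-format matrix: header row of features,
# one row per observation, observation ID in the first column.
wide_csv = (
    "specimen,featA,featB,featC\n"
    "cell_1,120,3,0\n"
    "cell_2,950,0,12\n"
)
payload = gzip.compress(wide_csv.encode("utf-8"))  # stands in for a .csv.gz file


def read_wide_counts(fileobj):
    """Parse a wide count table into IDs, feature names, and integer counts."""
    reader = csv.reader(fileobj)
    header = next(reader)
    features = header[1:]  # first header cell labels the ID column
    specimen_ids, counts = [], []
    for row in reader:
        specimen_ids.append(row[0])
        counts.append([int(value) for value in row[1:]])
    return specimen_ids, features, counts


with gzip.open(io.BytesIO(payload), mode="rt", newline="") as handle:
    specimen_ids, features, counts = read_wide_counts(handle)

print(specimen_ids, features, counts)
```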

Outputs:

  • .csv, .tsv, or .txt. If the output file name ends in .gz, the output will be gzipped as well.
  • .h5ad for AnnData output. If an .h5ad AnnData file is provided as input, the probability matrix will be added as a layer to the existing object, which is then saved as a new AnnData file.

Who to blame

Jonathan Golob, MD PhD. j-dev@golob.org
