Strain disambiguation methods for mixed DNA samples

Introduction

StrainPycon is a Python 3 package that can be used to disambiguate multiple strains in mixed samples of DNA. Mathematically, StrainPycon can solve binary blind source separation problems and compute certain high-dimensional integrals involving binary variables. The connection between these mathematical concepts and strain identification is discussed in the following journal article:

L. Mustonen, X. Gao, A. Santana, R.M. Mitchell, Y. Vigfusson, and L. Ruthotto,
A Bayesian framework for molecular strain identification from mixed diagnostic samples,
Inverse Problems 34(10), 105009, 2018,
https://doi.org/10.1088/1361-6420/aad7cd

StrainPycon builds on the StrainRecon.jl package written in Julia: https://github.com/lruthotto/StrainRecon.jl

Motivation

As a motivating example, suppose you have a blood sample infected by multiple Plasmodium falciparum malaria parasites. Assuming you have done PCR on chosen SNP sites, the number of calls that differ from the reference genome indicates what proportion of the strains carry a mutation at that SNP. StrainPycon identifies the strains in the sample through disambiguation (deconvolution) without requiring any prior knowledge about the sample or the parasite. The process can also help assess the multiplicity of infection in the sample, which can aid malaria surveillance efforts, for instance.
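
To make the mixing idea concrete, here is a minimal sketch in plain Python. It assumes a simple linear model in which each strain is a binary vector over the SNP sites (1 means the strain differs from the reference) and the measurement at a site is the frequency-weighted sum of the strains. The function name mix and the example values are hypothetical and are not part of StrainPycon.

```python
def mix(strains, freq):
    """Return the expected proportion of non-reference calls at each SNP site.

    strains: list of binary lists, one per strain (1 = differs from reference).
    freq:    list of strain proportions in the sample, summing to 1.
    Assumed model (illustration only): measurement_j = sum_k freq_k * strains_k[j].
    """
    n_sites = len(strains[0])
    return [sum(f * s[j] for f, s in zip(freq, strains)) for j in range(n_sites)]


# Hypothetical sample: three strains observed at four SNP sites.
strains = [
    [1, 0, 1, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 0],
]
freq = [0.6, 0.3, 0.1]
measurements = mix(strains, freq)
print(measurements)
```

Up to floating-point rounding, the printed values are 0.7, 0.0, 0.9 and 0.3. Disambiguation is the reverse task: recovering strains and freq when only measurements is known.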

Citation

If you use StrainPycon in your project, please cite the journal article above.

Full documentation

Please refer to the full documentation of StrainPycon at: https://www.ymsir.com/strainpycon/

Requirements

StrainPycon was tested in the following environment:

  • 64-bit Linux
  • Python 3.6.5 with NumPy 1.14.3

Basic usage

Most users need only a few methods of the StrainRecon class:

import strainpycon
S = strainpycon.StrainRecon()

Let us generate synthetic measurement data with three strains and 24 SNP sites and solve the inverse problem:

(measurements, strains, freq) = S.random_data(24, 3)
(strains_recon, freq_recon) = S.compute(measurements, 3)

Because this data is generated without noise, strains_recon should equal strains and freq_recon should equal freq.
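
For intuition about why the noiseless problem can be solved exactly, the following sketch handles a simplified sub-problem: the strain frequencies are assumed to be known, and each SNP site is matched by exhaustive search to the binary pattern whose weighted sum is closest to the measurement. This is an illustration under that assumption, not the algorithm StrainPycon uses, and best_patterns is a hypothetical helper name.

```python
from itertools import product


def best_patterns(measurements, freq):
    """For each SNP site, pick the binary pattern closest to the measurement.

    Assumes a linear mixing model and known frequencies (illustration only).
    Returns one binary list per strain.
    """
    # All 2**K possible mutation patterns across the K strains.
    patterns = list(product((0, 1), repeat=len(freq)))
    columns = []
    for m in measurements:
        closest = min(
            patterns,
            key=lambda p: abs(m - sum(f * b for f, b in zip(freq, p))),
        )
        columns.append(closest)
    # Transpose from per-site patterns to per-strain vectors.
    return [list(row) for row in zip(*columns)]


freq = [0.6, 0.3, 0.1]
measurements = [0.7, 0.0, 0.9, 0.3]
strains_recon = best_patterns(measurements, freq)
print(strains_recon)
```

The search is unambiguous here because every subset of the frequencies has a distinct sum. If two subsets summed to the same value (for example 0.5 and 0.3 + 0.2), the corresponding patterns could not be told apart from a noiseless measurement.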

Next, let us draw another random measurement, now with Gaussian additive noise. We compute the misfit, or negative log-likelihood, when the number of strains in the reconstruction varies from one to seven. Moreover, we compute posterior statistics to quantify uncertainty:

gamma = 0.1  # standard deviation of Gaussian noise
(measurements, strains, freq) = S.random_data(18, 4, gamma=gamma)
# misfit for each candidate number of strains, from 1 through 7
misfits = S.misfits(measurements, range(1, 8))
(strains_mean, freq_mean, strains_dev, freq_dev) = S.posterior_stats(measurements, 4, gamma)
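
One way to turn a list of misfits into an estimate of the number of strains is to take the smallest count whose misfit is close to the best one, since adding strains beyond the true number tends to improve the fit only marginally. The sketch below shows that heuristic in plain Python. The helper pick_num_strains, the tolerance, and the misfit values are all hypothetical; the numbers are made up for illustration and are not output from StrainPycon.

```python
def pick_num_strains(counts, misfits, tol):
    """Return the smallest strain count whose misfit is within tol of the best.

    counts:  candidate numbers of strains, in increasing order.
    misfits: one misfit value per candidate (lower means a better fit).
    tol:     how much worse than the best misfit is still acceptable.
    """
    best = min(misfits)
    for n, value in zip(counts, misfits):
        if value - best <= tol:
            return n


# Made-up misfit values for 1 through 7 strains, for illustration only.
counts = list(range(1, 8))
example_misfits = [9.1, 4.0, 1.6, 0.42, 0.40, 0.39, 0.39]
chosen = pick_num_strains(counts, example_misfits, tol=0.05)
print(chosen)
```

With these example values the misfit stops improving meaningfully at four strains, so the helper returns 4. A suitable tolerance depends on the noise level and on how the misfit is scaled, so it would need to be chosen per data set.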

A complete description of the methods and detailed examples can be found at: https://www.ymsir.com/strainpycon/

Known issues

StrainPycon does not support multi-threading yet.

Contacts

Please direct questions to: Ymir Vigfusson, Emory University, ymir.vigfusson@emory.edu
