Discrete mixture model based ploidy inference tool

These details have not been verified by PyPI

Project links

Homepage

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

PloidPy

Introduction

PloidPy is a program written in Python designed to infer ploidy from next-generation reads aligned to a haploid reference genome. The program makes use of a stastical model representing the distribution of specific nucleotide counts and selects the most probable ploidy on the basis of a minimum AIC.

Installation

Installation of PloidPy is relatively simple and can be done easily using pip. The most recent stable version can be installed using the following command:

pip install PloidPy

And the developmental one can be installed directly from the git repository:

pip install git+git://github.com/floutt/PloidPy

Dependencies

In order to run PloidPy, the following dependencies are required:

Python 3.6+
NumPy
SciPy
Statsmodels
matplotlib
seaborn

Getting Started

Once installed, PloidPy can be run on an indexed BAM file. For this tutorial we will download some BAM files created using simulated data from the ploidyNGS repository.

wget https://github.com/diriano/ploidyNGS/raw/master/test_data/simulatedDiploidGenome/Ploidy2.bowtie2.sorted.bam
wget https://github.com/diriano/ploidyNGS/raw/master/test_data/simulatedTriploidGenome/Ploidy3.bowtie2.sorted.bam
wget https://github.com/diriano/ploidyNGS/raw/master/test_data/simulatedTetraploidGenome/Ploidy4.bowtie2.sorted.bam

After downloading these files, we have to index all of these files using samtools.

samtools index Ploidy2.bowtie2.sorted.bam
samtools index Ploidy3.bowtie2.sorted.bam
samtools index Ploidy4.bowtie2.sorted.bam

From here we can extract the minor allele count (MAC) and total read coverage from each of the BAM files using PloidPy process_bam. In the following example the Phred-like mapping threshold is set to 15

PloidPy process_bam --bam Ploidy2.bowtie2.sorted.bam --out diploid.count --quality 15
PloidPy process_bam --bam Ploidy3.bowtie2.sorted.bam --out triploid.count --quality 15
PloidPy process_bam --bam Ploidy4.bowtie2.sorted.bam --out tetraploid.count --quality 15

These commands will produce two different files per command - a gzip-compressed archive containing the count data (*count.gz) and a file containing important metadata needed for ploidy evaluation (*count.info).

After this has been done, one of two steps can be taken. Either inferring ploidy using the unfiltered data with an error component incorporated into the model or filtering the data and running the model with no error component. One can filter out the data from each individual using the following commands:

PloidPy filter --count_file diploid.count.gz --out diploid.count.filtered
PloidPy filter  --count_file triploid.count.gz --out triploid.count.filtered
PloidPy filter  --count_file tetraploid.count.gz --out tetraploid.count.filtered

This will create gzip-compressed files with count data filtering out data resulting from sequencing error and, with this, we can evaluate the ploidy of each individual using the PloidPy assess subcommand. PloidPy automatically detects whether or not the count file is filtered (by the presence or absence of the *info file) and adjusts the model accordingly. In this example, ploidies of 2n to 8n will be evaluated, although this can be done with any set of ploidies greater than or equal to 2n. This can be done with the following commands:

PloidPy assess --count_file diploid.count.filtered.gz --out diploid.filtered.tsv --ploidies 2 3 4 5 6 7 8
PloidPy assess --count_file triploid.count.filtered.gz --out triploid.filtered.tsv --ploidies 2 3 4 5 6 7 8
PloidPy assess --count_file tetraploid.count.filtered.gz --out tetraploid.filtered.tsv --ploidies 2 3 4 5 6 7 8

The evaluation on unfiltered can similarly be done with the following commands:

PloidPy assess --count_file diploid.count.gz --out diploid.tsv --ploidies 2 3 4 5 6 7 8
PloidPy assess --count_file triploid.count.gz --out triploid.tsv --ploidies 2 3 4 5 6 7 8
PloidPy assess --count_file tetraploid.count.gz --out tetraploid.tsv --ploidies 2 3 4 5 6 7 8

As you can see, in both cases all of the predictions were correct! Additional information can be found in the *tsv files. Each column represents the following:

Column	Meaning
Ploidy	Ploidy model
Log_Likelihood	Log likelihood of ploidy model
AIC	AIC value of ploidy model
Het_Weights	Weight parameter of each heterozygous state component
Uniform Weight	Weight parameter of the uniform component of the model
Binomial_Err_Weight	Weight parameter of the binomial error component (only present in unfiltered data)

Additionally, the filtered data can be used to produce helpful figures to visualize the joint distribution of TRC and MAC values. This can be done with the following commands:

PloidPy jdist --count_file diploid.count.filtered --out diploid
PloidPy jdist --count_file triploid.count.filtered --out triploid
PloidPy jdist --count_file tetraploid.count.filtered --out tetraploid

A visual overview of this process can be found here as well as some example joint distribution figures.

Project details

These details have not been verified by PyPI

Project links

Homepage

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Release history Release notifications | RSS feed

This version

1.2.1

Jun 1, 2020

1.2.0

May 23, 2020

1.1.0

May 18, 2020

1.0.0

Apr 14, 2020

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

PloidPy-1.2.1.tar.gz (1.6 MB view details)

Uploaded Jun 1, 2020 Source

File details

Details for the file PloidPy-1.2.1.tar.gz.

File metadata

Download URL: PloidPy-1.2.1.tar.gz
Upload date: Jun 1, 2020
Size: 1.6 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.4.0 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.8.2

File hashes

Hashes for PloidPy-1.2.1.tar.gz
Algorithm	Hash digest
SHA256	`9c6d8f106d4638eae60505cc9e97897108784472b6b156ce98912eb7cb378b39`
MD5	`5319a35a5397041e0afdc0d1f1c66323`
BLAKE2b-256	`4a5c21db09bf3700df75f9baa2a261cc06cdf8af143193a18fe36ded4e43479b`

See more details on using hashes here.

PloidPy 1.2.1

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

PloidPy

Introduction

Installation

Dependencies

Getting Started

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

File details

File metadata

File hashes