Haploidy with Python
HapPy
Haploidy using Python.
Easy haploidy estimation.
1. General
This tool assesses the haploidy H of a given assembly. H is defined as the fraction of the bases of the genome that are in the collapsed peak C. This metric is calculated as H = C/(C + U/2), where C is the size (area) of the collapsed peak and U is the size of the uncollapsed peak in the per-base coverage histogram of the assembly.
For more information, see: Overcoming uncollapsed haplotypes in long-read assemblies of non-model organisms, Nadège Guiglielmoni, Antoine Houtain, Alessandro Derzelle, Karine Van Doninck, Jean-François Flot, bioRxiv 2020, doi: https://doi.org/10.1101/2020.03.16.993428
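As a quick illustration of the formula above, here is a minimal sketch that computes H from two peak areas. The function name `haploidy` and the example areas are hypothetical and are not taken from the HapPy code base; the sketch only reproduces the arithmetic H = C/(C + U/2).

```python
def haploidy(collapsed_area, uncollapsed_area):
    """Return H = C / (C + U/2) for peak areas C and U.

    Hypothetical helper for illustration only; not HapPy's own code.
    """
    denominator = collapsed_area + uncollapsed_area / 2
    if denominator == 0:
        raise ValueError("both peak areas are zero, H is undefined")
    return collapsed_area / denominator


# Made-up areas: 80 Mb in the collapsed peak, 40 Mb in the uncollapsed peak.
h_mixed = haploidy(80e6, 40e6)
# A fully collapsed assembly has no uncollapsed peak.
h_collapsed = haploidy(100e6, 0)
print(h_mixed, h_collapsed)
```

With these made-up areas, H is 0.8 for the first assembly and 1.0 for the fully collapsed one.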
Requirements:
sambamba
scipy
pandas
numpy
matplotlib
$ python HapPy/main.py -h
Estimate assembly haploidy based on base depth of coverage histogram.
usage:
HapPy [-hv] <command> [<args>...]
options:
-h, --help shows the help
-v, --version shows the version
The subcommands are:
coverage Compute coverage histogram.
estimate Finds peaks and modality, then computes scores of haploidy.
2. Module coverage
This module runs sambamba on a read alignment file, then reads the output depth file to obtain a coverage histogram.
$ python HapPy/main.py coverage -h
Coverage histogram command
Compute coverage histogram for mapping file.
usage:
coverage [--threads=1] --outdir=DIR <mapping.bam>
arguments:
mapping.bam Sorted BAM file after mapping reads to the assembly.
options:
-t, --threads=INT Number of parallel threads allocated for
sambamba [default: 1].
-d, --outdir=DIR Path where the .cov and .hist files are written.
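To make the idea of a per-base coverage histogram concrete, here is a minimal sketch that counts how many bases are covered at each depth. It assumes the depths are already available as a sequence of integers; the layout of the actual .cov and .hist files is not described here, so the function `coverage_histogram` is a hypothetical illustration, not the module's implementation.

```python
from collections import Counter


def coverage_histogram(depths):
    """Return a list where index d holds the number of bases with depth d.

    Hypothetical helper: assumes `depths` is an iterable of non-negative
    integers, one per base of the assembly.
    """
    counts = Counter(depths)
    if not counts:
        return []
    max_depth = max(counts)
    return [counts.get(d, 0) for d in range(max_depth + 1)]


# Six bases with made-up depths.
hist = coverage_histogram([0, 3, 3, 4, 3, 1])
print(hist)
```

In a real assembly, peaks in this histogram correspond to the collapsed and uncollapsed regions that the estimate module works with.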
3. Module estimate
Takes the .hist output file of module coverage and outputs metrics in a text file and, optionally, as a graph. The size is given as a value followed by a unit, e.g. G for gigabases or M for megabases.
Usage:
$ python HapPy/main.py estimate -h
Estimate command
Compute haploidy from coverage histogram.
usage:
estimate [--max-contaminant=INT] [--max-diploid=INT] [--min-peak=INT]
--size=INT --outstats=FILE [--plot] <coverage.hist>
arguments:
coverage.hist Coverage histogram.
options:
-C, --max-contaminant=INT Maximum coverage of contaminants.
-D, --max-diploid=INT Maximum coverage of the diploid peak.
-M, --min-peak=INT Minimum peak height.
-S, --size=INT Estimated haploid genome size.
-O, --outstats=FILE Path where the AUC ratio and TSS values are written.
-p, --plot Generate histogram plot.
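The --size option takes a value with a unit suffix such as M or G. The sketch below shows one way such a string could be converted to a number of bases. The function `parse_genome_size` is hypothetical, and the K suffix is an assumption added for illustration; how HapPy itself parses this value is not shown here.

```python
# Multipliers for the unit suffixes; K is an assumption, M and G are
# the units mentioned in the text.
UNIT_FACTORS = {"K": 10**3, "M": 10**6, "G": 10**9}


def parse_genome_size(text):
    """Convert a size string such as '102M' into a number of bases.

    Hypothetical helper for illustration only; not HapPy's own parser.
    """
    text = text.strip().upper()
    if text and text[-1] in UNIT_FACTORS:
        value = float(text[:-1])
        return int(round(value * UNIT_FACTORS[text[-1]]))
    return int(text)  # no suffix: value is already in bases


print(parse_genome_size("102M"))
print(parse_genome_size("1.5G"))
```

Under these assumptions, "102M" is read as 102,000,000 bases.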
4. Example
Here is an example of how to use HapPy. HapPy requires a sorted BAM file as input. Here the PacBio long reads are mapped to the assembly with minimap2, and the output is sorted with samtools. The sorted BAM file is also indexed with samtools. The module coverage computes the coverage histogram from the BAM file, and the module estimate then computes the haploidy metric H. Here the maximum x value for the contaminant peak is set to 35, the maximum x value for the diploid peak is set to 120, and the size is set to 102 Mb.
minimap2 -ax map-pb assembly.fasta.gz pacbio_reads.fasta.gz --secondary=no | samtools sort -o mapping_LR.map-pb.bam -T tmp.ali
samtools index mapping_LR.map-pb.bam
Hap.py coverage -d happy_output mapping_LR.map-pb.bam
Hap.py estimate --max-contaminant 35 --max-diploid 120 -S 102M -O happy_stats.txt -p happy_output/mapping_LR.map-pb.bam.hist