Haploidy with Python

HapPy

Haploidy using Python.

Easy haploidy estimation.

1. General

This tool assesses the haploidy H of a given assembly. H is defined as the fraction of the bases of the genome that are in the collapsed peak C. This metric is calculated as H = C/(C + U/2), where C is the size (area) of the collapsed peak and U is the size (area) of the uncollapsed peak in the per-base coverage histogram of the assembly. U is halved because uncollapsed regions are present twice in the assembly, once per haplotype.
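As a quick illustration of the formula, here is a minimal sketch that computes H from two peak areas. The function name and the example areas are hypothetical and are not taken from HapPy itself.

```python
def haploidy(collapsed_area: float, uncollapsed_area: float) -> float:
    """Return H = C / (C + U/2) for the two peak areas C and U."""
    denominator = collapsed_area + uncollapsed_area / 2
    if denominator <= 0:
        raise ValueError("peak areas must add up to a positive value")
    return collapsed_area / denominator


# Hypothetical areas, in bases: 90 Mb collapsed, 20 Mb uncollapsed.
print(haploidy(90e6, 20e6))  # 0.9
# A fully collapsed assembly has no uncollapsed peak, so H is 1.
print(haploidy(100e6, 0))    # 1.0
```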

For more information, see: Overcoming uncollapsed haplotypes in long-read assemblies of non-model organisms, Nadège Guiglielmoni, Antoine Houtain, Alessandro Derzelle, Karine Van Doninck, Jean-François Flot, bioRxiv 2020, doi: https://doi.org/10.1101/2020.03.16.993428

Requirements:

  • sambamba
  • scipy
  • pandas
  • numpy
  • matplotlib

$ python HapPy/main.py -h
Estimate assembly haploidy based on base depth of coverage histogram.

usage:
    HapPy [-hv] <command> [<args>...]

options:
    -h, --help                  shows the help
    -v, --version               shows the version

The subcommands are:
    coverage    Compute coverage histogram.
    estimate    Finds peaks and modality, then computes scores of haploidy.

2. Module coverage

This module runs sambamba on a read alignment file then reads the output depth file to obtain a coverage histogram.

$ python HapPy/main.py coverage -h

Coverage histogram command
    Compute coverage histogram for mapping file.

    usage:
        coverage [--threads=1] --outdir=DIR <mapping.bam>

    arguments:
        mapping.bam              Sorted BAM file after mapping reads to the assembly.

    options:
        -t, --threads=INT        Number of parallel threads allocated for 
                                 sambamba [default: 1].
        -d, --outdir=DIR         Path where the .cov and .hist files are written.
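
To make the idea of a per-base coverage histogram concrete, the sketch below counts how many bases have each depth. It only illustrates the concept: the exact layout of the .cov and .hist files is not described here, and the function name and input values are hypothetical.

```python
from collections import Counter


def coverage_histogram(depths):
    """Return a list where index d holds the number of bases with depth d."""
    counts = Counter(depths)
    max_depth = max(counts, default=-1)
    return [counts[d] for d in range(max_depth + 1)]


# Hypothetical per-base depths for a 7-base region.
per_base_depth = [0, 1, 1, 2, 2, 2, 4]
print(coverage_histogram(per_base_depth))  # [1, 2, 3, 0, 1]
```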

3. Module estimate

This module takes the .hist file produced by the coverage module and writes the metrics to a text file and, optionally, a plot. The genome size is given as a value followed by a unit, e.g. G for gigabases or M for megabases.
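
The size notation can be read as in the following sketch. This is an assumption about how a value such as 102M maps to a number of bases, not HapPy's own parsing code, and the function name is hypothetical.

```python
def parse_size(text: str) -> int:
    """Convert a size such as '102M' or '1.5G' to a number of bases."""
    units = {"M": 10**6, "G": 10**9}  # the two units named in this README
    text = text.strip().upper()
    if text and text[-1] in units:
        return int(float(text[:-1]) * units[text[-1]])
    return int(text)  # assumed fallback: a plain integer number of bases


print(parse_size("102M"))  # 102000000
```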

Usage:

$ python HapPy/main.py estimate -h 
Estimate command
    Compute haploidy from coverage histogram.

    usage:
        estimate [--max-contaminant=INT] [--max-diploid=INT] [--min-peak=INT] 
                 --size=INT --outstats=FILE [--plot] <coverage.hist>

    arguments:
        coverage.hist               Coverage histogram.

    options:
        -C, --max-contaminant=INT   Maximum coverage of contaminants.
        -D, --max-diploid=INT       Maximum coverage of the diploid peak.
        -M, --min-peak=INT          Minimum peak height.
        -S, --size=INT              Estimated haploid genome size.
        -O, --outstats=FILE         Path where the AUC ratio and TSS values are written.
        -p, --plot                  Generate histogram plot.
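
The role of the two coverage thresholds can be sketched as follows. This is a simplified illustration, not the method used by the estimate module: it assumes that coverage up to --max-contaminant is ignored, that coverage above it and up to --max-diploid forms the uncollapsed peak U, that anything higher forms the collapsed peak C, and that each area is the plain sum of the histogram bins in its window. The function name and the toy histogram are hypothetical, and peak detection and --min-peak are left out.

```python
def estimate_haploidy(hist, max_contaminant, max_diploid):
    """hist[d] is the number of bases with coverage d; returns H = C/(C + U/2)."""
    # Assumption: bins in (max_contaminant, max_diploid] are the uncollapsed peak U.
    uncollapsed = sum(hist[max_contaminant + 1 : max_diploid + 1])
    # Assumption: bins above max_diploid are the collapsed peak C.
    collapsed = sum(hist[max_diploid + 1 :])
    denominator = collapsed + uncollapsed / 2
    if denominator == 0:
        raise ValueError("no bases found above the contaminant threshold")
    return collapsed / denominator


# Toy histogram for coverage 0..9: contaminants at 0-2, a peak near 4, a peak near 8.
toy_hist = [5, 3, 0, 10, 20, 10, 0, 40, 80, 40]
h = estimate_haploidy(toy_hist, max_contaminant=2, max_diploid=6)
print(round(h, 3))  # 0.889, from U = 40 and C = 160
```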

4. Example

Here is an example of how to use HapPy. HapPy requires a sorted BAM file as input. Here the PacBio long reads are mapped to the assembly with minimap2, and the output is sorted and then indexed with samtools. The module coverage computes the coverage histogram from the BAM file, and the module estimate then computes the haploidy metric H. Here the maximum coverage for the contaminant peak is set to 35, the maximum coverage for the diploid peak is set to 120, and the haploid genome size is set to 102 Mb.

minimap2 -ax map-pb assembly.fasta.gz pacbio_reads.fasta.gz --secondary=no | samtools sort -o mapping_LR.map-pb.bam -T tmp.ali
samtools index mapping_LR.map-pb.bam
Hap.py coverage -d happy_output mapping_LR.map-pb.bam 
Hap.py estimate --max-contaminant 35 --max-diploid 120 -S 102M -O happy_stats.txt -p happy_output/mapping_LR.map-pb.bam.hist
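
When the two HapPy steps above need to be scripted, the commands can be assembled in Python as in this sketch. It only rebuilds the command lines shown in the example. The helper name is hypothetical, and the histogram path (output directory plus BAM file name plus .hist) is an assumption inferred from that example.

```python
import os
import shlex


def happy_commands(bam, outdir, max_contaminant, max_diploid, size, stats):
    """Build the coverage and estimate command lines as argument lists."""
    # Assumption from the example: the histogram is <outdir>/<BAM file name>.hist
    hist = f"{outdir}/{os.path.basename(bam)}.hist"
    coverage = ["Hap.py", "coverage", "-d", outdir, bam]
    estimate = [
        "Hap.py", "estimate",
        "--max-contaminant", str(max_contaminant),
        "--max-diploid", str(max_diploid),
        "-S", size, "-O", stats, "-p", hist,
    ]
    return coverage, estimate


commands = happy_commands(
    "mapping_LR.map-pb.bam", "happy_output", 35, 120, "102M", "happy_stats.txt"
)
for command in commands:
    # Each list could be run with subprocess.run(command, check=True).
    print(shlex.join(command))
```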
