Tool & python package for calculating genome wide pileup mappability

License: MIT

Pupmapper: A Pileup Mappability Calculator

Motivation

The Pileup Mappability metric can be used to quickly identify regions where variant calling with short-read WGS data may be more difficult. pupmapper was created to let users quickly convert k-mer mappability scores to pileup mappability.

The first step of the pupmapper pipeline is to calculate k-mer uniqueness scores using the GenMap software. pupmapper then summarizes the pileup mappability of each genomic position using the k-mer uniqueness of all overlapping k-mers.

How is pileup mappability calculated from individual k-mer uniqueness/mappability scores?

The pileup mappability of a position is calculated as the mean k-mer mappability of all k-mers overlapping that position.

A pileup mappability score of 1 indicates that all k-mers overlapping a position are unique within the genome (under the user-defined parameters of uniqueness).

Pileup mappability is useful because it gives a sense of the uniqueness of all possible reads (of a defined length) that could align to a given position.
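As a minimal sketch of the averaging described above (not pupmapper's actual implementation), the following Python function converts per-k-mer mappability scores into per-position pileup mappability. The function name `pileup_mappability`, the input layout (one score per k-mer start position, so `n - k + 1` values for a sequence of length `n`), and the handling of positions near the sequence ends (averaging only over the k-mers that actually exist) are assumptions made for illustration.

```python
from itertools import accumulate


def pileup_mappability(kmer_scores, k):
    """Mean k-mer mappability over all k-mers overlapping each position.

    kmer_scores[i] is the mappability of the k-mer starting at 0-based
    position i (assumed layout: one score per k-mer start).
    """
    if k < 1 or not kmer_scores:
        raise ValueError("k must be >= 1 and kmer_scores must be non-empty")

    n_kmers = len(kmer_scores)
    seq_len = n_kmers + k - 1

    # prefix[j] holds the sum of kmer_scores[0:j], so each window sum is O(1).
    prefix = [0.0] + list(accumulate(kmer_scores))

    pileup = []
    for pos in range(seq_len):
        first = max(0, pos - k + 1)    # first k-mer start that covers pos
        last = min(pos, n_kmers - 1)   # last k-mer start that covers pos
        window_sum = prefix[last + 1] - prefix[first]
        pileup.append(window_sum / (last - first + 1))
    return pileup


scores = [1.0, 1.0, 0.5, 0.5]   # hypothetical k-mer scores for k = 2
print(pileup_mappability(scores, k=2))   # prints [1.0, 1.0, 0.75, 0.5, 0.5]
```

In this toy example, position 2 is covered by one unique k-mer (score 1.0) and one repeated k-mer (score 0.5), so its pileup mappability is 0.75.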

Useful reading for k-mer mappability and pileup mappability:

Derrien T, et al. (2012). Fast Computation and Applications of Genome Mappability. PLOS ONE 7(1): e30377. https://doi.org/10.1371/journal.pone.0030377

Pockrandt C, et al. (2020). GenMap: ultra-fast computation of genome mappability. Bioinformatics, Volume 36, Issue 12, June 2020, Pages 3687–3692. https://doi.org/10.1093/bioinformatics/btaa222

Lee H, Schatz MC. (2012). Genomic dark matter: the reliability of short read mapping illustrated by the genome mappability score. Bioinformatics, Volume 28, Issue 16, August 2012, Pages 2097–2105. https://doi.org/10.1093/bioinformatics/bts330

Installation

You will need to install the pupmapper Python package and ensure that the genmap software is installed and available on your $PATH environment variable.
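One way to confirm both requirements before running the pipeline is a quick check from Python. This is only a sketch: the helper name `missing_executables` is hypothetical, and it assumes the two tools are exposed as executables named `genmap` and `pupmapper`.

```python
import shutil


def missing_executables(names):
    """Return the executables from `names` that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


# Assumed command names: "genmap" and "pupmapper".
missing = missing_executables(["genmap", "pupmapper"])
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("genmap and pupmapper are both available.")
```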

Install locally

pupmapper can be installed by cloning this repository and installing with pip.

git clone git@github.com:maxgmarin/pupmapper.git

cd pupmapper

pip install . 

pip

pip install pupmapper

conda

🚧 Check back soon 🚧

Basic usage

1) run_all - Run the full pipeline starting with an input genome

pupmapper run_all -i Input.Genome.fasta -o output_directory/ -k 50 -e 1

The above command will first use genmap to calculate k-mer mappability scores for the input genome and then calculate pileup mappability scores.

Arguments:

-i, --in_genome_fa: Input genome FASTA file.
-o, --outdir: Directory for output files.
-k, --kmer_len: K-mer length (e.g., 50 bp).
-e, --errors: Number of allowed mismatches in k-mer mapping.
-g, --gff: (Optional) Input genome annotations in GFF format.
--save-numpy: (Optional) Save results as compressed numpy arrays.
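If you prefer to launch the pipeline from a Python script, the arguments above can be assembled into a command list and passed to `subprocess`. This is a sketch that only wraps the command-line interface documented here. The helper names `build_run_all_command` and `run_all` are hypothetical and are not part of the pupmapper package.

```python
import subprocess


def build_run_all_command(genome_fa, outdir, kmer_len, errors,
                          gff=None, save_numpy=False):
    """Assemble the argument list for `pupmapper run_all`."""
    cmd = ["pupmapper", "run_all",
           "-i", str(genome_fa),
           "-o", str(outdir),
           "-k", str(kmer_len),
           "-e", str(errors)]
    if gff is not None:
        cmd += ["-g", str(gff)]       # optional GFF annotations
    if save_numpy:
        cmd.append("--save-numpy")    # optional numpy output
    return cmd


def run_all(*args, **kwargs):
    """Run the pipeline; raises CalledProcessError on a non-zero exit."""
    subprocess.run(build_run_all_command(*args, **kwargs), check=True)


# Show the command without executing it.
print(" ".join(build_run_all_command("Input.Genome.fasta", "output_directory/", 50, 1)))
```

Passing the command as a list (rather than a single shell string) avoids quoting problems with file paths that contain spaces.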

Analyzing included test sequence

If you wish to run pupmapper on a small test sequence (15 bp), you can run the following commands:

cd tests/data/Genmap_Ex1/

pupmapper run_all -i Ex1.genome.fasta -o Ex1_OutputDir -k 4 -e 0

This command will analyze the pileup mappability of the test sequence with a k-mer size of 4 bp and a max mismatch of 0 (K=4,E=0).
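To build intuition for what K=4, E=0 means, the sketch below computes exact-match k-mer mappability for a toy sequence as the inverse of the number of times each k-mer occurs, following the frequency-based definition in the GenMap paper. It is an illustration only: the sequence is a made-up 15 bp example (not the contents of Ex1.genome.fasta), the function name `exact_kmer_mappability` is hypothetical, and the code considers a single sequence on the forward strand with no mismatches.

```python
from collections import Counter


def exact_kmer_mappability(seq, k):
    """Return 1 / (occurrence count) for each k-mer start position in seq."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(kmers)
    return [1.0 / counts[kmer] for kmer in kmers]


toy_seq = "ACGTACGTTTGCAAT"   # hypothetical 15 bp sequence
scores = exact_kmer_mappability(toy_seq, k=4)

print(len(scores))            # prints 12 (a 15 bp sequence has 12 4-mers)
print(scores[0], scores[1])   # prints 0.5 1.0 (ACGT occurs twice, CGTA once)
```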

Full usage

pupmapper run_all --help
usage: pupmapper run_all [-h] -i IN_GENOME_FA -o OUTDIR -k KMER_LEN -e ERRORS [-g GFF] [--save-numpy]

optional arguments:
  -h, --help            show this help message and exit
  -i IN_GENOME_FA, --in_genome_fa IN_GENOME_FA
                        Input genome fasta file (.fasta)
  -o OUTDIR, --outdir OUTDIR
                        Directory for all outputs of k-mer and pileup mappability processing.
  -k KMER_LEN, --kmer_len KMER_LEN
                        k-mer length (bp) used to generate the k-mer mappability values
  -e ERRORS, --errors ERRORS
                        Number of errors (mismatches) allowed in Genmap's k-mer mappability calculation
  -g GFF, --gff GFF     GFF formatted genome annotations for input genome (.gff) (Optional)
  --save-numpy          If enabled, all pileup mappability scores will be output as compressed numpy arrays (.npz).
