Plot allele frequencies in VCF files

afplot

This is a tool to plot allele frequencies in VCF files.

The two main subcommands are:

  • regions: Plot single regions or regions from a bed file, optionally with a margin.

  • whole-genome: Create a single image for every chromosome on the genome.

Both subcommands have three modes:

  • histogram: This will create a histogram with kernel density plot of allele frequencies.

  • scatter: Create a scatter plot of allele frequencies, along the region or chromosome.

  • distance: Create a scatter plot of distances to theoretical allele frequencies, along the region or chromosome. This only makes sense for autosomes of diploid organisms.
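
The distance mode can be illustrated with a small sketch. It assumes a diploid autosome, where the theoretical allele frequencies are 0, 0.5 and 1, and it assumes the distance is taken to the nearest of those values. The function name and the nearest-value rule are illustrative assumptions, not necessarily how afplot computes it.

```python
# Theoretical allele frequencies for a diploid autosome:
# hom_ref -> 0.0, het -> 0.5, hom_alt -> 1.0
EXPECTED_DIPLOID = (0.0, 0.5, 1.0)


def distance_to_expected(af, expected=EXPECTED_DIPLOID):
    """Return the absolute distance from af to the nearest expected frequency.

    Hypothetical helper; taking the nearest expected value is an assumption.
    """
    return min(abs(af - e) for e in expected)


if __name__ == "__main__":
    for af in (0.0, 0.25, 0.5, 0.9):
        print(af, distance_to_expected(af))
```

Under this reading, a well-behaved diploid sample gives distances close to zero, while consistently large distances along a region may hint at copy-number changes or contamination.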

By default, colors correspond to call type (hom_alt, het, or hom_ref).

Multiple VCF files can be supplied simultaneously for the whole-genome subcommand, in which case they can be grouped by label. When multiple VCF files are supplied, plots are colored by the label of each VCF file.

Only one sample per VCF file can be plotted.

We currently assume the presence of an AD column in the FORMAT field. This column should contain the depth per allele, with the reference allele being first.
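
As a minimal sketch of how allele frequencies follow from AD values, each depth can be divided by the total depth at the site. The function names below are hypothetical, and treating missing depths as zero is an assumption rather than documented afplot behaviour.

```python
def allele_frequencies(ad):
    """Return per-allele frequencies for an AD list (reference depth first).

    Returns None when no reads support any allele.
    """
    # Assumption: a missing depth (None) counts as zero reads.
    depths = [0 if d is None else d for d in ad]
    total = sum(depths)
    if total == 0:
        return None
    return [d / total for d in depths]


def alt_allele_frequency(ad):
    """Return the combined frequency of all non-reference alleles."""
    depths = [0 if d is None else d for d in ad]
    total = sum(depths)
    if total == 0:
        return None
    return sum(depths[1:]) / total


if __name__ == "__main__":
    print(allele_frequencies([15, 5]))    # reference first, then one alt allele
    print(alt_allele_frequency([15, 5]))
```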

All VCFs should be indexed with tabix, and should contain contigs in the header.
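
Both requirements can be checked before running afplot. The sketch below uses only the Python standard library. It assumes the index sits next to the VCF with a .tbi suffix (tabix can also write .csi indexes, which this check ignores), and the function name is hypothetical.

```python
import gzip
import os


def check_vcf_ready(path):
    """Return a list of detected problems; an empty list means none were found."""
    problems = []

    # Assumption: the tabix index is stored as <vcf>.tbi next to the VCF.
    if not os.path.exists(path + ".tbi"):
        problems.append("missing tabix index (.tbi)")

    # bgzip output is valid multi-member gzip, so gzip can read the header.
    has_contig = False
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if not line.startswith("##"):
                break  # end of the meta-information lines
            if line.startswith("##contig="):
                has_contig = True
                break
    if not has_contig:
        problems.append("no ##contig lines in header")

    return problems
```

A call such as check_vcf_ready("my.vcf.gz") returns an empty list when neither problem is detected.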

Installation

afplot is available from PyPI: pip install afplot

Requirements

  • Python 3.4+

  • click

  • numpy

  • matplotlib

  • pandas

  • seaborn

  • progressbar2

  • pysam

  • pyvcf

Usage

Usage: afplot [OPTIONS] COMMAND [ARGS]...

  Plot allele frequencies in VCF files.

  Two basic modes exist:
    - regions: Plot histogram, scatter or distance plots per
      user-specified region.
    - whole-genome: Plot histogram, scatter or distance plots over the
      entire genome.

Options:
  --help  Show this message and exit.

Commands:
  regions       Region plots
  whole-genome  Whole-genome plots

Examples

Single VCF on a single region

  • afplot regions histogram -v my.vcf.gz -o output_dir -R chr1:100-200

Single VCF on a bed file

  • afplot regions histogram -v my.vcf.gz -o output_dir -L regions.bed
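
To relate bed records to region strings like the one in the single-region example, the sketch below converts bed lines into chr:start-end strings with an optional margin. It relies on the standard bed convention (0-based start, exclusive end). How afplot itself applies a margin is not documented here, so the function name and the clamping at position 1 are assumptions.

```python
def bed_to_regions(lines, margin=0):
    """Convert bed lines to 1-based inclusive 'chrom:start-end' strings.

    Hypothetical helper; the margin is added on both sides of each region.
    """
    regions = []
    for line in lines:
        line = line.strip()
        # Skip blank lines, comments and bed header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        # bed start is 0-based and end is exclusive, so add 1 to the start only.
        first = max(1, int(start) + 1 - margin)
        # Assumption: the end is not clamped to the contig length.
        last = int(end) + margin
        regions.append("{}:{}-{}".format(chrom, first, last))
    return regions


if __name__ == "__main__":
    print(bed_to_regions(["chr1\t99\t200"], margin=10))
```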

Single VCF whole genome

  • afplot whole-genome histogram -v my.vcf.gz -l my_label -s my_sample -o mysample.histogram.png

Multiple VCFs whole genome

  • afplot whole-genome histogram -v my1.vcf.gz -l my_label1 -s my_sample1 -v my2.vcf.gz -l my_label2 -s my_sample2 -o both_samples.histogram.png

Samples can be grouped by giving them identical labels. For example:

  • afplot whole-genome histogram -v 1.vcf.gz -v 2.vcf.gz -v 3.vcf.gz -v 4.vcf.gz -l group1 -l group1 -l group2 -l group2 [...]
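
The grouping in the command above can be pictured as pairing each VCF with the label in the same position. The sketch below assumes that positional pairing, which the example suggests but does not state outright. The function name is hypothetical.

```python
def group_by_label(vcfs, labels):
    """Group VCF paths by label, pairing the n-th VCF with the n-th label."""
    if len(vcfs) != len(labels):
        raise ValueError("expected exactly one label per VCF")
    groups = {}
    for vcf, label in zip(vcfs, labels):
        groups.setdefault(label, []).append(vcf)
    return groups


if __name__ == "__main__":
    vcfs = ["1.vcf.gz", "2.vcf.gz", "3.vcf.gz", "4.vcf.gz"]
    labels = ["group1", "group1", "group2", "group2"]
    print(group_by_label(vcfs, labels))
```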

Excluding contigs on whole genome

In certain cases, you may not want to plot all contigs, for instance when your VCF header contains many small unplaced contigs. You can exclude contigs by supplying a regex pattern to the -e flag. For example, all contigs containing “gl” can be filtered out with:

  • afplot whole-genome [...] -e '.*gl.*'
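
To preview which contigs a pattern would exclude, the same regex can be tried in Python first. This sketch assumes the pattern is matched from the start of the contig name and is case-sensitive. Whether afplot uses exactly these semantics is an assumption, and the function name is hypothetical.

```python
import re


def keep_contigs(contigs, exclude_pattern):
    """Return the contigs whose names do not match exclude_pattern."""
    pattern = re.compile(exclude_pattern)
    # Assumption: matching is anchored at the start of the name (re.match).
    return [c for c in contigs if not pattern.match(c)]


if __name__ == "__main__":
    contigs = ["chr1", "chrUn_gl000211", "chrX", "chr4_gl000193_random"]
    print(keep_contigs(contigs, ".*gl.*"))
```

Because the match is case-sensitive in this sketch, a contig named GL000192.1 would not be excluded by '.*gl.*'.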

Changelog

0.2

The entire command line interface was changed to use click instead of argparse. This allows a more complex CLI. Instead of having flags for plot mode, afplot now uses subcommands.

While the CLI has changed, and the internals of afplot have been refactored, the old-style (version 0.1) API remains in place for now. This may be deprecated in the future.

Support for plotting regions was added. Region plotting writes its output to a directory, rather than to a single file.

License

MIT
