Build a consensus sequence from a VCF and ref sequence masking low and no coverage positions.

VCF Consensus Builder

Build a consensus sequence from a VCF and reference sequence masking low and no coverage positions.

You could use bcftools consensus instead, but you would then need to mask the low and no coverage positions yourself after bcftools has generated the consensus, which can be tricky.

Features

  • Masks low coverage (<5X by default) and no coverage (0X by default) positions in the reference with N and -, respectively; both thresholds and both characters are configurable

  • No need to bgzip or index the VCF file, as bcftools consensus requires.

Usage

Install

Install from PyPI with pip:

pip install vcf_consensus_builder

Show Help

Help message:

$ vcf_consensus_builder --help
Usage: vcf_consensus_builder [OPTIONS]

  Build a consensus sequence from a VCF and ref sequence masking low and no
  coverage positions.

Options:
  -v, --vcf-file PATH      VCF file path (v4)  [required]
  -d, --depths-file PATH   samtools depth output file (no headers)  [required]
  -r, --ref-fasta PATH     Reference sequence FASTA file (single sequence
                           entry only!)  [required]
  -o, --output-fasta TEXT  Output consensus sequence FASTA file path (default
                           write to stdout)
  --low-coverage INTEGER   Low coverage threshold; replace positions with less
                           than this depth with "N" by default
  --no-coverage INTEGER    No coverage threshold; replace positions with less
                           than or equal this depth with "-" by default
  --low-cov-char TEXT      Low coverage character ("N" by default)
  --no-cov-char TEXT       No coverage character ("-" by default)
  --sample-name TEXT       Optional sample name for output fasta header ID
  --help                   Show this message and exit.

Basic usage

Run on the test data included in the repo:

# Clone this repo and enter it
$ git clone https://github.com/peterk87/vcf_consensus_builder.git --depth=1
$ cd vcf_consensus_builder/
# run vcf_consensus_builder on test data
$ vcf_consensus_builder -v tests/data/test.vcf \
    -d tests/data/test-depths.tsv \
    -r tests/data/ref.fa
# produces the following to stdout
>sample1 ref="ref ref"
NACCGTANACAATAN--

Masking of no and low coverage positions in reference sequence

vcf_consensus_builder first masks no and low coverage positions in the reference sequence file and then applies the ALT variants in the VCF.
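
To illustrate the second step only, here is a minimal sketch of applying ALT alleles on top of an already masked sequence. It is not the package's own code: `apply_snvs` and the `(position, REF, ALT)` tuples are hypothetical names introduced for this example, positions are assumed to be 1-based as in VCF, and only single-base substitutions are handled (indels would shift downstream coordinates and need extra bookkeeping).

```python
def apply_snvs(masked_seq, variants):
    """Substitute single-base ALT alleles into an already masked sequence.

    `variants` is an iterable of (pos, ref, alt) tuples with 1-based
    positions. This is an illustrative sketch, not the tool's implementation.
    """
    seq = list(masked_seq)
    for pos, ref, alt in variants:
        if len(ref) != 1 or len(alt) != 1:
            # Indels are out of scope for this sketch
            raise ValueError("only single-base substitutions are supported")
        # Because this runs after masking, the ALT base replaces whatever
        # character is at that position, masked or not
        seq[pos - 1] = alt
    return "".join(seq)


print(apply_snvs("NCGT-", [(2, "C", "T")]))  # NTGT-
```

Because substitution runs after masking in this sketch, a variant at a masked position would overwrite the mask character; whether that is desirable depends on how the VCF was filtered beforehand.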

NOTE: vcf_consensus_builder does not perform any VCF variant filtering. It assumes that the input VCF file contains only the variants you wish to see in your consensus sequence. Use bcftools filter with appropriate filtering/exclusion expressions to select those variants beforehand (see https://samtools.github.io/bcftools/howtos/filtering.html for more information on filtering a VCF file).

Given this reference sequence

>ref
NGCCAAGTCTNCGACATN-

And this samtools depth output

sample1     ref     1       4
sample1     ref     2       9
sample1     ref     3       9
sample1     ref     4       9
sample1     ref     5       9
sample1     ref     6       9
sample1     ref     7       10
sample1     ref     8       10
sample1     ref     9       10
sample1     ref     10      10
sample1     ref     11      3
sample1     ref     12      9
sample1     ref     13      9
sample1     ref     14      9
sample1     ref     15      9
sample1     ref     16      9
sample1     ref     17      5
sample1     ref     18      4
sample1     ref     19      0
sample1     ref     20      0

The low (below 5X) and no (0X) coverage positions in the reference sequence will be replaced with N and -, respectively.

The masked reference sequence will be:

>ref
NGCCAAGTCTNCGACATN-

This masked sequence will be used for generating the consensus sequence.
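
The masking rule described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the package's actual code: `parse_depths` and `mask_reference` are hypothetical helper names, the position and depth are assumed to be the last two whitespace-separated columns of each row (as in the rows shown above), and positions missing from the depth file are assumed to have 0X coverage. The defaults mirror the `--low-coverage`, `--no-coverage`, `--low-cov-char` and `--no-cov-char` options, and the short input sequence is made up for the example.

```python
def parse_depths(lines):
    """Return {1-based position: depth} from samtools depth style rows.

    Assumes position and depth are the last two columns of each row.
    """
    depths = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        depths[int(fields[-2])] = int(fields[-1])
    return depths


def mask_reference(seq, depths, low_coverage=5, no_coverage=0,
                   low_cov_char="N", no_cov_char="-"):
    """Mask no coverage (depth <= no_coverage) and low coverage
    (depth < low_coverage) positions of `seq`."""
    masked = []
    for pos, base in enumerate(seq, start=1):
        depth = depths.get(pos, 0)  # assumption: unlisted positions are 0X
        if depth <= no_coverage:
            masked.append(no_cov_char)
        elif depth < low_coverage:
            masked.append(low_cov_char)
        else:
            masked.append(base)
    return "".join(masked)


# Made-up four-base reference with depths 4, 9, 9 and 0
rows = [
    "sample1 ref 1 4",
    "sample1 ref 2 9",
    "sample1 ref 3 9",
    "sample1 ref 4 0",
]
print(mask_reference("ACGT", parse_depths(rows)))  # NCG-
```

The no coverage check comes first so that a 0X position gets "-" rather than "N", even though 0 is also below the low coverage threshold.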

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.

History

0.1.0 (2019-12-24)

  • First release on PyPI.
