VCF Consensus Builder
Build a consensus sequence from a VCF and reference sequence masking low and no coverage positions.
You could use bcftools consensus, but you would then need to apply the low and no coverage masking after bcftools has generated the consensus, which may be tricky.
Free software: MIT license
Masks low coverage (<5X by default) and no coverage (0X by default) positions in the reference with N and -, respectively (thresholds and characters are configurable)
No need to bgzip or index the VCF file, as bcftools consensus requires.
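Since the VCF is used as-is, it can simply be read as plain text, line by line. The sketch below only illustrates that idea and is not the package's own parser: `VariantRecord` and `read_vcf_records` are hypothetical names, and it assumes a VCF v4 file on a single reference sequence where only CHROM, POS, REF and the first ALT allele are needed.

```python
import io
from dataclasses import dataclass
from typing import Iterator, TextIO


@dataclass
class VariantRecord:
    """Hypothetical minimal view of one VCF data line."""
    chrom: str
    pos: int  # 1-based position, as stored in the VCF POS column
    ref: str
    alt: str  # first ALT allele only (simplifying assumption)


def read_vcf_records(handle: TextIO) -> Iterator[VariantRecord]:
    """Yield records from a plain-text (not bgzipped, not indexed) VCF."""
    for line in handle:
        # Skip blank lines, "##" meta-information lines and the "#CHROM" header.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, _id, ref, alt = fields[:5]
        yield VariantRecord(chrom, int(pos), ref, alt.split(",")[0])


if __name__ == "__main__":
    # Made-up records for illustration only.
    vcf_text = (
        "##fileformat=VCFv4.2\n"
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
        "ref\t3\t.\tA\tG\t50\tPASS\t.\n"
        "ref\t8\t.\tTC\tT\t40\tPASS\t.\n"
    )
    for rec in read_vcf_records(io.StringIO(vcf_text)):
        print(rec)
```

Skipping every line that starts with `#` drops both the `##` meta-information lines and the `#CHROM` column header in one test.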
Install from PyPI with pip:
pip install vcf_consensus_builder
$ vcf_consensus_builder --help
Usage: vcf_consensus_builder [OPTIONS]

  Build a consensus sequence from a VCF and ref sequence masking low and no
  coverage positions.

Options:
  -v, --vcf-file PATH        VCF file path (v4)  [required]
  -d, --depths-file PATH     samtools depth output file (no headers)  [required]
  -r, --ref-fasta PATH       Reference sequence FASTA file (single sequence entry only!)  [required]
  -o, --output-fasta TEXT    Output consensus sequence FASTA file path (default write to stdout)
  --low-coverage INTEGER     Low coverage threshold; replace positions with less than this depth with "N" by default
  --no-coverage INTEGER      No coverage threshold; replace positions with less than or equal this depth with "-" by default
  --low-cov-char TEXT        Low coverage character ("N" by default)
  --no-cov-char TEXT         No coverage character ("-" by default)
  --sample-name TEXT         Optional sample name for output fasta header ID
  --help                     Show this message and exit.
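The `--ref-fasta` option accepts a single sequence entry only. A minimal sketch of a reader that enforces that restriction follows; `read_single_fasta` is a hypothetical helper, not part of vcf_consensus_builder, and it assumes an uncompressed FASTA file.

```python
import io
from typing import List, Optional, TextIO, Tuple


def read_single_fasta(handle: TextIO) -> Tuple[str, str]:
    """Return (header, sequence) from a FASTA containing exactly one entry."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                raise ValueError("expected a single FASTA entry, found more than one")
            header = line[1:]
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)  # multi-line sequences are concatenated
    if header is None:
        raise ValueError("no FASTA entry found")
    return header, "".join(chunks)


if __name__ == "__main__":
    fasta = io.StringIO(">ref example reference\nACGTACGTAC\nGTACGTACGT\n")
    print(read_single_fasta(fasta))
```

Failing loudly on a second `>` line is safer than silently using only the first entry.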
Run on the test data included in the repo:
# Clone this repo and enter it
$ git clone https://github.com/peterk87/vcf_consensus_builder.git --depth=1
$ cd vcf_consensus_builder/
# run vcf_consensus_builder on test data
$ vcf_consensus_builder -v tests/data/test.vcf \
    -d tests/data/test-depths.tsv \
    -r tests/data/ref.fa
# produces the following to stdout
>sample1 ref="ref ref"
NACCGTANACAATAN--
Masking of no and low coverage positions in reference sequence
vcf_consensus_builder first masks no and low coverage positions in the reference sequence file and then applies the ALT variants in the VCF.
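Because masking happens before the ALT alleles are applied, the masking step can work purely in reference coordinates. One common way to then apply variants, including indels, is to edit the sequence from the last variant to the first, so that length changes never shift positions still to be edited. The sketch below assumes filtered, non-overlapping variants given as hypothetical `(pos, ref, alt)` tuples with 1-based VCF positions; `apply_variants` is an illustrative name and is not necessarily how vcf_consensus_builder does it internally.

```python
from typing import List, Tuple

# Hypothetical variant representation: (1-based POS, REF, ALT).
Variant = Tuple[int, str, str]


def apply_variants(masked_seq: str, variants: List[Variant]) -> str:
    """Apply ALT alleles to an already-masked reference sequence.

    Assumes variants are filtered and non-overlapping. REF is not checked
    against the sequence because masked positions no longer hold the
    reference base; in this sketch an ALT allele simply overwrites any
    N or - at its position.
    """
    seq = list(masked_seq)
    # Work from the highest POS down so indels do not shift later edits.
    for pos, ref, alt in sorted(variants, key=lambda v: v[0], reverse=True):
        start = pos - 1  # 1-based VCF POS -> 0-based index
        seq[start:start + len(ref)] = list(alt)
    return "".join(seq)


if __name__ == "__main__":
    masked = "NACGTACGTA"  # made-up masked sequence
    variants = [(3, "C", "T"), (6, "AC", "A"), (9, "T", "TGG")]
    print(apply_variants(masked, variants))  # NATGTAGTGGA
```

Here the SNP at position 3, the 1-base deletion at position 6 and the 2-base insertion at position 9 are all placed correctly because the insertion and deletion are applied before anything to their left.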
NOTE: vcf_consensus_builder does not perform any VCF variant filtering. It assumes that the input VCF contains only the variants you want to see in your consensus sequence. Use bcftools filter with appropriate filtering/exclusion expressions to select those variants (see https://samtools.github.io/bcftools/howtos/filtering.html for more info about filtering your VCF file).
Given this reference sequence
And this samtools depth output
sample1 ref 1 4
sample1 ref 2 9
sample1 ref 3 9
sample1 ref 4 9
sample1 ref 5 9
sample1 ref 6 9
sample1 ref 7 10
sample1 ref 8 10
sample1 ref 9 10
sample1 ref 10 10
sample1 ref 11 3
sample1 ref 12 9
sample1 ref 13 9
sample1 ref 14 9
sample1 ref 15 9
sample1 ref 16 9
sample1 ref 17 5
sample1 ref 18 4
sample1 ref 19 0
sample1 ref 20 0
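The low (below 5X) and no (0X) coverage positions in the reference sequence will be replaced with N and -, respectively. With the depths above, positions 1, 11 and 18 (4X, 3X and 4X) become N, positions 19 and 20 (0X) become -, and position 17 (exactly 5X) is left unmasked.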
The masked reference sequence will be:
This masked sequence will be used for generating the consensus sequence.
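The masking rule can be sketched in a few lines of Python. `parse_depths` and `mask_reference` are hypothetical names; the thresholds and characters mirror the CLI defaults (`--low-coverage 5`, `--no-coverage 0`, `N`, `-`), positions absent from the depth file are assumed to have 0X depth, and the 20-nt reference used below is made up for illustration.

```python
from typing import Dict, Iterable


def parse_depths(lines: Iterable[str]) -> Dict[int, int]:
    """Map 1-based position -> depth from lines like 'sample1 ref 1 4'.

    Assumes the last two whitespace-separated columns are position and depth.
    """
    depths: Dict[int, int] = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        depths[int(fields[-2])] = int(fields[-1])
    return depths


def mask_reference(ref_seq: str, depths: Dict[int, int], low_coverage: int = 5,
                   no_coverage: int = 0, low_cov_char: str = "N",
                   no_cov_char: str = "-") -> str:
    """Replace no/low coverage positions in ref_seq with mask characters."""
    masked = []
    for pos, base in enumerate(ref_seq, start=1):
        depth = depths.get(pos, 0)  # assumption: missing positions count as 0X
        if depth <= no_coverage:    # "less than or equal" this depth
            masked.append(no_cov_char)
        elif depth < low_coverage:  # "less than" this depth
            masked.append(low_cov_char)
        else:
            masked.append(base)
    return "".join(masked)


if __name__ == "__main__":
    depth_values = [4, 9, 9, 9, 9, 9, 10, 10, 10, 10, 3, 9, 9, 9, 9, 9, 5, 4, 0, 0]
    depth_lines = [f"sample1\tref\t{i}\t{d}" for i, d in enumerate(depth_values, start=1)]
    made_up_ref = "ACGT" * 5  # hypothetical reference, for illustration only
    print(mask_reference(made_up_ref, parse_depths(depth_lines)))  # NCGTACGTACNTACGTAN--
```

The no-coverage test (`<=`) is inclusive while the low-coverage test (`<`) is strict, matching the `--help` text, so a position at exactly 5X keeps its reference base.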
First release on PyPI.