VCF Consensus Builder
Build a consensus sequence from a VCF and reference sequence masking low and no coverage positions.
You could use bcftools consensus, but you would then need to mask the low and no coverage positions yourself after bcftools has generated the consensus, which can be tricky.
Free software: MIT license
Documentation: https://vcf-consensus-builder.readthedocs.io.
Features
Masks low coverage (<5X by default) and no coverage (0X by default) positions in the reference with N and -, respectively
No need to bgzip or index the VCF file, as bcftools consensus requires.
Usage
Install
Install from PyPI with pip:
pip install vcf_consensus_builder
Show Help
Help message:
$ vcf_consensus_builder --help
Usage: vcf_consensus_builder [OPTIONS]

  Build a consensus sequence from a VCF and ref sequence masking low and no
  coverage positions.

Options:
  -v, --vcf-file PATH      VCF file path (v4)  [required]
  -d, --depths-file PATH   samtools depth output file (no headers)  [required]
  -r, --ref-fasta PATH     Reference sequence FASTA file (single sequence
                           entry only!)  [required]
  -o, --output-fasta TEXT  Output consensus sequence FASTA file path (default
                           write to stdout)
  --low-coverage INTEGER   Low coverage threshold; replace positions with less
                           than this depth with "N" by default
  --no-coverage INTEGER    No coverage threshold; replace positions with less
                           than or equal this depth with "-" by default
  --low-cov-char TEXT      Low coverage character ("N" by default)
  --no-cov-char TEXT       No coverage character ("-" by default)
  --sample-name TEXT       Optional sample name for output fasta header ID
  --help                   Show this message and exit.
Basic usage
Run on the test data included in the repo
# Clone this repo and enter it
$ git clone https://github.com/peterk87/vcf_consensus_builder.git --depth=1
$ cd vcf_consensus_builder/
# run vcf_consensus_builder on test data
$ vcf_consensus_builder -v tests/data/test.vcf \
-d tests/data/test-depths.tsv \
-r tests/data/ref.fa
# produces the following to stdout
>sample1 ref="ref ref"
NACCGTANACAATAN--
Masking of no and low coverage positions in reference sequence
vcf_consensus_builder first masks no and low coverage positions in the reference sequence file and then applies the ALT variants in the VCF.
NOTE: vcf_consensus_builder does not perform any VCF variant filtering. It assumes that the input VCF file contains only the variants you wish to see in your consensus sequence. Use bcftools filter with appropriate filtering/exclusion expressions to select those variants beforehand (see https://samtools.github.io/bcftools/howtos/filtering.html for more information on filtering a VCF file).
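The "mask first, then apply ALT variants" order can be illustrated with a minimal Python sketch. This is not the package's actual implementation: the function name apply_variants, the (position, REF, ALT) tuple layout and the example variants are all hypothetical, chosen only to show the idea. Variants are applied from the highest position to the lowest so that insertions and deletions do not shift the coordinates of variants still to be applied.

```python
from typing import Iterable, Tuple

# Hypothetical variant record: (1-based position, REF allele, ALT allele)
Variant = Tuple[int, str, str]


def apply_variants(masked_seq: str, variants: Iterable[Variant]) -> str:
    """Replace the REF span of each variant with its ALT allele.

    Illustrative sketch only. REF is used solely for its length and is not
    compared with the sequence, since the sequence may already be masked.
    """
    result = masked_seq
    # Work right to left so earlier coordinates stay valid after indels.
    for pos, ref, alt in sorted(variants, key=lambda v: v[0], reverse=True):
        start = pos - 1          # convert 1-based VCF position to 0-based index
        end = start + len(ref)   # end of the REF span (exclusive)
        result = result[:start] + alt + result[end:]
    return result


# Hypothetical variants: a SNV at position 3 and an insertion at position 6
example_variants = [(3, "C", "T"), (6, "A", "AGG")]
consensus = apply_variants("NGCCAAGTCTNCGACATN-", example_variants)
print(consensus)  # NGTCAAGGGTCTNCGACATN-
```

Because the sketch performs no filtering, every variant passed in ends up in the result, which mirrors the note above: filter the VCF before building the consensus.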
Given this reference sequence
>ref
NGCCAAGTCTNCGACATN-
And this samtools depth output
sample1 ref 1 4
sample1 ref 2 9
sample1 ref 3 9
sample1 ref 4 9
sample1 ref 5 9
sample1 ref 6 9
sample1 ref 7 10
sample1 ref 8 10
sample1 ref 9 10
sample1 ref 10 10
sample1 ref 11 3
sample1 ref 12 9
sample1 ref 13 9
sample1 ref 14 9
sample1 ref 15 9
sample1 ref 16 9
sample1 ref 17 5
sample1 ref 18 4
sample1 ref 19 0
sample1 ref 20 0
The low (below 5X) and no (0X) coverage positions in the reference sequence will be replaced with N and -, respectively.
The masked reference sequence will be:
>ref
NGCCAAGTCTNCGACATN-
This masked sequence will be used for generating the consensus sequence.
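The masking rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the package's own code: the function names read_depths and mask_reference are hypothetical, the unmasked input sequence AGCCAAGTCTACGACATAG is made up for the example, and positions missing from the depths file are assumed to have 0X coverage. The thresholds follow the help text: depth less than or equal to the no coverage threshold gives "-", and depth below the low coverage threshold gives "N".

```python
from typing import Dict


def read_depths(text: str) -> Dict[int, int]:
    """Parse 4-column rows (sample, reference, position, depth) into {position: depth}."""
    depths: Dict[int, int] = {}
    for line in text.splitlines():
        fields = line.split()  # handles tab- or space-separated columns
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        _sample, _ref, pos, depth = fields
        depths[int(pos)] = int(depth)
    return depths


def mask_reference(seq: str, depths: Dict[int, int],
                   low_coverage: int = 5, no_coverage: int = 0,
                   low_cov_char: str = "N", no_cov_char: str = "-") -> str:
    """Mask no coverage (depth <= no_coverage) and low coverage (depth < low_coverage) positions."""
    masked = []
    for pos, base in enumerate(seq, start=1):  # depth positions are 1-based
        depth = depths.get(pos, 0)  # assumption: missing positions count as 0X
        if depth <= no_coverage:
            masked.append(no_cov_char)
        elif depth < low_coverage:
            masked.append(low_cov_char)
        else:
            masked.append(base)
    return "".join(masked)


# Depth values from the listing above, rebuilt as 4-column rows
depth_values = [4, 9, 9, 9, 9, 9, 10, 10, 10, 10, 3, 9, 9, 9, 9, 9, 5, 4, 0, 0]
depth_text = "\n".join(f"sample1\tref\t{i}\t{d}" for i, d in enumerate(depth_values, start=1))

# Hypothetical unmasked reference, used only for illustration
masked = mask_reference("AGCCAAGTCTACGACATAG", read_depths(depth_text))
print(masked)  # NGCCAAGTCTNCGACATN-
```

Note that position 17 has a depth of exactly 5, so it is kept: only depths strictly below the low coverage threshold are masked.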
Credits
This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.
History
0.1.0 (2019-12-24)
First release on PyPI.