FluViewer
A tool for generating influenza A virus genome sequences from FASTQ data
Installation
- FluViewer requires the following dependencies. Installing them in a dedicated FluViewer virtual environment is recommended (the versions listed were tested; later versions will likely work):
  - python v3.8.5
  - pandas v1.3.5
  - spades v3.15.3
  - blast v2.12.0
  - bwa v0.7.17
  - samtools v1.14
  - bcftools v1.14
  - bedtools v2.30.0
  - seqtk v1.3
- Once the dependencies have been installed, install the latest FluViewer release via PyPI:
pip3 install FluViewer
- Download and unzip the default FluViewer DB (FluViewer_db.fa.gz) from this repository. Custom DBs can be created and used as well (instructions below).
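If a command-line gunzip is not at hand, the unzip step can be sketched in Python with the standard library only. The helper name `gunzip` and the assumption that the archive sits in the working directory are illustrative and not part of FluViewer.

```python
import gzip
import shutil
from pathlib import Path


def gunzip(src, dest=None):
    """Decompress a .gz file; by default drop the trailing '.gz' from the name."""
    src = Path(src)
    # "FluViewer_db.fa.gz" -> "FluViewer_db.fa" when no destination is given
    dest = Path(dest) if dest is not None else src.with_suffix("")
    with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)  # stream in chunks, no full read into memory
    return dest


# Hypothetical usage, once the archive has been downloaded:
# db_path = gunzip("FluViewer_db.fa.gz")
```

The returned path is what would then be passed to the -d argument described below.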
Usage
FluViewer -f <path_to_fwd_reads> -r <path_to_rev_reads> -d <path_to_db_file> -o <output_name> -m <mode> [-D <min_depth> -q <min_qual> -c <min_cov> -i <min_id>] [-g]
Required arguments:
-f : path to FASTQ file containing forward reads
-r : path to FASTQ file containing reverse reads
-d : path to FASTA file containing FluViewer database (details below)
-o : output name (creates an output directory with this name and includes the name in output file names and consensus sequence headers)
-m : FluViewer run mode (align or assemble)
Optional arguments:
-D : Minimum read depth for base calling (default = 20)
-q : Minimum PHRED score for base quality and mapping quality (default = 30)
-c : Minimum coverage of database reference sequence by contig (percentage, default = 25)
-i : Minimum nucleotide sequence identity between database reference sequence and contig (percentage, default = 95)
Optional flags:
-g : Set this flag to deactivate garbage collection and retain intermediate files
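As an illustration of how the arguments above fit together, the following sketch builds a FluViewer command line in Python. It only uses the flags documented here; the function name `build_fluviewer_command`, its parameter names, and the file names are hypothetical. Optional arguments left as None are omitted so that FluViewer's documented defaults apply.

```python
import shlex


def build_fluviewer_command(fwd_reads, rev_reads, db_path, output_name, mode,
                            min_depth=None, min_qual=None, min_cov=None,
                            min_id=None, keep_intermediate=False):
    """Return the FluViewer invocation as a list of arguments."""
    if mode not in ("align", "assemble"):
        raise ValueError("mode must be 'align' or 'assemble'")
    cmd = ["FluViewer", "-f", fwd_reads, "-r", rev_reads,
           "-d", db_path, "-o", output_name, "-m", mode]
    # Optional arguments: only added when a value is supplied
    optional = (("-D", min_depth), ("-q", min_qual),
                ("-c", min_cov), ("-i", min_id))
    for flag, value in optional:
        if value is not None:
            cmd += [flag, str(value)]
    if keep_intermediate:
        cmd.append("-g")  # retain intermediate files
    return cmd


# Hypothetical file names, for illustration only
cmd = build_fluviewer_command("sample_R1.fastq", "sample_R2.fastq",
                              "FluViewer_db.fa", "sample01", "assemble",
                              min_depth=30)
print(shlex.join(cmd))
```

With FluViewer installed, the list could be handed to `subprocess.run(cmd, check=True)`; building it as a list avoids shell quoting problems with unusual file paths.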
FluViewer Database
FluViewer requires a curated FASTA file "database" of influenza A virus reference sequences. Headers for these sequences must be formatted and annotated as follows:
>unique_id|strain_name|segment|subtype
For example:
>MF599463|A/swine/Kansas/A01378028/2017|HA|H3
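When building a custom database, it may help to check headers against this format before running FluViewer. The sketch below is one possible check, not FluViewer's own validation: the function name `parse_db_header` is hypothetical, and the set of accepted segment names is assumed to match the segments listed for the report file.

```python
# Assumed segment names, taken from the report column description
SEGMENTS = {"PB2", "PB1", "PA", "HA", "NP", "NA", "M", "NS"}


def parse_db_header(header):
    """Split a '>unique_id|strain_name|segment|subtype' header into a dict."""
    fields = header.strip().lstrip(">").split("|")
    if len(fields) != 4 or not all(fields):
        raise ValueError(f"expected 4 non-empty '|'-separated fields: {header!r}")
    unique_id, strain_name, segment, subtype = fields
    if segment not in SEGMENTS:
        raise ValueError(f"unrecognized segment {segment!r} in {header!r}")
    return {"unique_id": unique_id, "strain_name": strain_name,
            "segment": segment, "subtype": subtype}


record = parse_db_header(">MF599463|A/swine/Kansas/A01378028/2017|HA|H3")
print(record["segment"], record["subtype"])
```

Strain names contain slashes, so splitting on "|" alone keeps them intact.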
FluViewer Output
FluViewer generates three output files:
- A FASTA file containing consensus sequences for influenza A virus genome segments
- A sorted BAM file with reads mapped to either the chosen reference sequences (align mode) or the assembled contigs (assemble mode)
- A report TSV file describing segment, subtype, and sequencing metrics for each consensus sequence
Headers in the FASTA file have the following format:
>output_name_unique_sequence_number|segment|subject
The report TSV file contains the following columns:
consensus_seq : the name of the consensus sequence described by this row
segment : influenza A virus genome segment (PB2, PB1, PA, HA, NP, NA, M, NS)
subtype : HA or NA subtype ("none" for internal segments)
mapped reads : the number of sequencing reads mapped to this segment
seq_length : the length (in nucleotides) of the consensus sequence generated by FluViewer
sequenced_bases : the number of nucleotide positions in the consensus sequence with sufficient depth of coverage (set by -D argument) and a successful base call (e.g. A, T, G, or C)
segment_cov : the length of the consensus sequence divided by the typical length of this genome segment (as a percentage)
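The report columns can be post-processed with a few lines of Python. This is a sketch under stated assumptions: it assumes the TSV starts with a header row using the column names listed above, and the example rows, the 90% threshold, and the typical segment length of 2000 are made up for illustration. The typical lengths FluViewer actually uses are not given here.

```python
import csv
import io


def segment_cov(seq_length, typical_length):
    """Consensus length as a percentage of the segment's typical length."""
    return 100.0 * seq_length / typical_length


def load_report(handle):
    """Read report rows and add the fraction of positions with a base call."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        row["seq_length"] = int(row["seq_length"])
        row["sequenced_bases"] = int(row["sequenced_bases"])
        row["segment_cov"] = float(row["segment_cov"])
        length = row["seq_length"]
        row["called_fraction"] = row["sequenced_bases"] / length if length else 0.0
        rows.append(row)
    return rows


# Made-up report contents; real names and values will differ
EXAMPLE_REPORT = (
    "consensus_seq\tsegment\tsubtype\tmapped reads\tseq_length\tsequenced_bases\tsegment_cov\n"
    "sample01_1\tHA\tH3\t5000\t1500\t1200\t75.0\n"
    "sample01_2\tNP\tnone\t8000\t1000\t1000\t95.0\n"
)

rows = load_report(io.StringIO(EXAMPLE_REPORT))
# Keep consensus sequences covering at least 90% of the typical segment length
passing = [r["consensus_seq"] for r in rows if r["segment_cov"] >= 90.0]
print(passing)
print(segment_cov(1500, 2000))
```

For a real run, `io.StringIO(EXAMPLE_REPORT)` would be replaced by an open handle on the report file in the output directory.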