
FluViewer

A tool for generating influenza A virus genome sequences from FASTQ data

Installation

  1. Install the following dependencies, preferably in a dedicated FluViewer virtual environment (the indicated versions were tested, but later versions will likely work as well):
  • python v3.8.5
  • pandas v1.3.5
  • spades v3.15.3
  • blast v2.12.0
  • bwa v0.7.17
  • samtools v1.14
  • bcftools v1.14
  • bedtools v2.30.0
  • seqtk v1.3
  2. Once the dependencies have been installed, install the latest FluViewer release from PyPI:
pip3 install FluViewer
  3. Download the default FluViewer database (FluViewer_db.fa.gz) from the FluViewer source repository and decompress it (e.g. gunzip FluViewer_db.fa.gz). Custom databases can also be created and used (see FluViewer Database below).
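Before running FluViewer, it can help to confirm that each dependency is reachable on the PATH of the active environment. The Python sketch below does this with shutil.which. The executable names it checks (for example spades.py for SPAdes and blastn for BLAST+) are assumptions based on the usual command names these packages install, not a list taken from FluViewer itself.

```python
import shutil

# Assumed command names for each dependency (typical names installed by each
# package; not taken from FluViewer's source).
REQUIRED_EXECUTABLES = {
    "spades": "spades.py",
    "blast": "blastn",
    "bwa": "bwa",
    "samtools": "samtools",
    "bcftools": "bcftools",
    "bedtools": "bedtools",
    "seqtk": "seqtk",
}


def missing_dependencies(executables=REQUIRED_EXECUTABLES):
    """Return the sorted dependency names whose executable is not found on PATH."""
    return sorted(name for name, exe in executables.items() if shutil.which(exe) is None)


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Missing dependencies:", ", ".join(missing))
    else:
        print("All dependencies found on PATH.")
```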

Usage

FluViewer -f <path_to_fwd_reads> -r <path_to_rev_reads> -d <path_to_db_file> -o <output_name> -m <mode> [-D <min_depth> -q <min_qual> -c <min_cov> -i <min_id>] [-g]

Required arguments:

-f : path to FASTQ file containing forward reads

-r : path to FASTQ file containing reverse reads

-d : path to FASTA file containing FluViewer database (details below)

-o : output name (FluViewer creates an output directory with this name and uses the name in output file names and consensus sequence headers)

-m : FluViewer run mode (align or assemble)

Optional arguments:

-D : Minimum read depth for base calling (default = 20)

-q : Minimum PHRED score for base quality and mapping quality (default = 30)

-c : Minimum percentage of the database reference sequence covered by a contig (default = 25)

-i : Minimum nucleotide sequence identity between a database reference sequence and a contig (percentage, default = 95)
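The thresholds above can be pictured as simple filters. The following is a hypothetical sketch, not FluViewer's code: it assumes a position is callable when at least -D reads cover it with both base quality and mapping quality at or above -q, and that a contig is kept for a reference only if it covers at least -c percent of the reference length with at least -i percent identity. All helper names are illustrative.

```python
def phred_to_error_prob(q):
    """Convert a Phred quality score to the probability that the call is wrong."""
    return 10 ** (-q / 10)


def position_callable(observations, min_depth=20, min_qual=30):
    """observations: (base_qual, map_qual) pairs, one per read covering a position.
    Assumption: only reads passing -q on both qualities count toward -D."""
    depth = sum(1 for base_qual, map_qual in observations
                if base_qual >= min_qual and map_qual >= min_qual)
    return depth >= min_depth


def contig_passes(aligned_ref_bases, ref_length, identity_pct, min_cov=25, min_id=95):
    """Keep a contig/reference match if it covers enough of the reference
    (as a percentage of reference length) and is sufficiently identical."""
    coverage_pct = 100 * aligned_ref_bases / ref_length
    return coverage_pct >= min_cov and identity_pct >= min_id
```

Because Phred scores are logarithmic, the default -q 30 corresponds to an error probability of 10^-3, i.e. at most about one incorrect call in 1,000.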

Optional flags:

-g : Set this flag to deactivate garbage collection and retain intermediate files
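When FluViewer is called from a Python pipeline, the usage line above can be assembled as an argument list. This is a minimal sketch assuming the FluViewer command is on PATH; build_fluviewer_command is a hypothetical helper that only passes -D, -q, -c and -i when a value is given, so FluViewer's documented defaults apply otherwise.

```python
VALID_MODES = ("align", "assemble")


def build_fluviewer_command(fwd_reads, rev_reads, db_path, output_name, mode,
                            min_depth=None, min_qual=None, min_cov=None, min_id=None,
                            keep_intermediates=False):
    """Build a FluViewer argument list from the documented flags."""
    if mode not in VALID_MODES:
        raise ValueError("mode must be 'align' or 'assemble', got %r" % mode)
    # Required arguments.
    cmd = ["FluViewer", "-f", fwd_reads, "-r", rev_reads, "-d", db_path,
           "-o", output_name, "-m", mode]
    # Optional thresholds: omitted when None so FluViewer's defaults are used.
    optional = (("-D", min_depth), ("-q", min_qual), ("-c", min_cov), ("-i", min_id))
    for flag, value in optional:
        if value is not None:
            cmd += [flag, str(value)]
    # -g disables garbage collection and keeps intermediate files.
    if keep_intermediates:
        cmd.append("-g")
    return cmd
```

The returned list can be passed to subprocess.run(cmd, check=True); passing a list rather than a shell string avoids quoting problems with file paths that contain spaces.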

FluViewer Database

FluViewer requires a curated FASTA file "database" of influenza A virus reference sequences. Headers for these sequences must be formatted and annotated as follows:

>unique_id|strain_name|segment|subtype

For example:

>MF599463|A/swine/Kansas/A01378028/2017|HA|H3
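When building a custom database, checking the header format up front can catch malformed entries before a run. The sketch below uses hypothetical helpers (not part of FluViewer) that split a header on | into the four required fields and check the segment against the eight influenza A segments listed in the report description. It assumes that no field itself contains a | character.

```python
# The eight influenza A genome segments, as listed in the report description.
SEGMENTS = {"PB2", "PB1", "PA", "HA", "NP", "NA", "M", "NS"}


def parse_db_header(header):
    """Split a '>unique_id|strain_name|segment|subtype' header into its fields."""
    fields = header.lstrip(">").rstrip("\n").split("|")
    if len(fields) != 4:
        raise ValueError("expected 4 '|'-separated fields, got %d" % len(fields))
    unique_id, strain_name, segment, subtype = fields
    if segment not in SEGMENTS:
        raise ValueError("unknown segment %r" % segment)
    return {"unique_id": unique_id, "strain_name": strain_name,
            "segment": segment, "subtype": subtype}


def format_db_header(unique_id, strain_name, segment, subtype):
    """Build a header line for a custom database entry."""
    return ">" + "|".join([unique_id, strain_name, segment, subtype])
```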

FluViewer Output

FluViewer generates three output files:

  1. A FASTA file containing consensus sequences for influenza A virus genome segments
  2. A sorted BAM file with reads mapped to either the chosen reference sequences (align mode) or the assembled contigs (assemble mode)
  3. A report TSV file describing segment, subtype, and sequencing metrics for each consensus sequence

Headers in the FASTA file have the following format:

>output_name_unique_sequence_number|segment|subtype
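To work with the consensus FASTA downstream, each header can be split back into its parts. This sketch assumes the third field holds the subtype (matching the subtype column of the report) and that the unique sequence number is an integer appended to the output name after the last underscore; rsplit is used so that output names containing underscores still parse. Both helpers are hypothetical.

```python
def parse_consensus_header(header):
    """Split '>output_name_unique_sequence_number|segment|subtype' into fields."""
    name_and_number, segment, subtype = header.lstrip(">").rstrip("\n").split("|")
    # Split on the last underscore only, so 'my_sample_3' -> ('my_sample', '3').
    output_name, seq_number = name_and_number.rsplit("_", 1)
    return {"output_name": output_name, "seq_number": int(seq_number),
            "segment": segment, "subtype": subtype}


def iter_consensus(lines):
    """Yield (parsed_header, sequence) pairs from the lines of a FASTA file."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield parse_consensus_header(header), "".join(chunks)
            header, chunks = line, []
        else:
            chunks.append(line)
    if header is not None:
        yield parse_consensus_header(header), "".join(chunks)
```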

The report TSV file contains the following columns:

consensus_seq : the name of the consensus sequence described by this row

segment : influenza A virus genome segment (PB2, PB1, PA, HA, NP, NA, M, NS)

subtype : HA or NA subtype ("none" for internal segments)

mapped_reads : the number of sequencing reads mapped to this segment

seq_length : the length (in nucleotides) of the consensus sequence generated by FluViewer

sequenced_bases : the number of nucleotide positions in the consensus sequence with sufficient depth of coverage (set by the -D argument) and a successful base call (i.e. A, T, G, or C)

segment_cov : the number of sequenced bases in the consensus sequence divided by the typical length of this genome segment, expressed as a percentage. The typical segment length is the median length of the reference sequences for this segment/subtype whose alignments to the contigs have the highest bitscore.
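The definitions of sequenced_bases and segment_cov reduce to a short calculation. In the sketch below, the lengths of the best-bitscore reference sequences are supplied by the caller (how FluViewer selects them is described above but not reproduced here), and any character other than A, T, G or C, such as N, counts as an unsequenced position. Function names are illustrative.

```python
from statistics import median

# Definite nucleotide calls; anything else (e.g. N) is treated as not sequenced.
CALLED_BASES = set("ATGC")


def sequenced_bases(consensus_seq):
    """Count positions holding a definite nucleotide call."""
    return sum(1 for base in consensus_seq if base in CALLED_BASES)


def typical_segment_length(best_hit_ref_lengths):
    """Median length of the best-scoring reference sequences for a segment/subtype."""
    return median(best_hit_ref_lengths)


def segment_cov(consensus_seq, best_hit_ref_lengths):
    """sequenced_bases as a percentage of the typical segment length."""
    return 100 * sequenced_bases(consensus_seq) / typical_segment_length(best_hit_ref_lengths)
```

For example, the 10-nt consensus ACGTNNACGT has 8 sequenced bases; if the best-scoring references have lengths 8, 10 and 16 (median 10), segment_cov is 80.0.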



Download files

Source distribution: FluViewer-0.0.2.tar.gz (11.3 kB)

Built distribution: FluViewer-0.0.2-py3-none-any.whl (10.4 kB, Python 3)
