
An automated host- and subtype-agnostic tool for generating influenza A virus genome sequences from FASTQ data

Project description

# FluViewer

FluViewer is an automated pipeline for generating influenza A virus (IAV) genome sequences from FASTQ data. If provided with a sufficiently diverse and representative database of IAV reference sequences, it can generate sequences regardless of host and subtype without any human intervention required.

Here is a brief description of the FluViewer process. First, the provided reads are normalized and downsampled using a k-mer-based approach to reduce excessive coverage of certain genome regions. Next, the normalized/downsampled reads are assembled de novo into contigs. The contigs are then aligned to a database of IAV reference sequences. These alignments are used to trim contigs and roughly position them within their respective genome segments. Afterwards, a multiple sequence alignment is conducted on the trimmed/positioned contigs, generating scaffold sequences for each IAV genome segment. Next, these scaffolds are aligned to the IAV reference sequence database to find their best matches. These best matches are used to fill in any missing regions in the scaffolds, creating mapping references. The normalized/downsampled reads are mapped to these mapping references, then variants are called and the final consensus genomes are produced.
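The scaffold-filling step described above can be pictured with a small sketch. This is not FluViewer's actual code: the function name `fill_scaffold`, the gap characters, and the assumption that the scaffold and its best-matching reference are already aligned to the same length are all hypothetical, chosen only to illustrate the idea of building a mapping reference.

```python
def fill_scaffold(scaffold: str, reference: str, gap_chars: str = "-N") -> str:
    """Fill gap positions in an aligned scaffold with bases from an aligned reference.

    Assumption (not from FluViewer itself): both strings are already aligned
    and have equal length, and gaps are marked with '-' or 'N'.
    """
    if len(scaffold) != len(reference):
        raise ValueError("scaffold and reference must be aligned to the same length")
    filled = [
        ref_base if scaf_base in gap_chars else scaf_base
        for scaf_base, ref_base in zip(scaffold, reference)
    ]
    return "".join(filled)


# Made-up toy sequences: assembled bases are kept, gaps come from the reference.
print(fill_scaffold("ATG--CCAN", "ATGAACCAT"))
```

Positions assembled from the reads are left untouched, so the mapping reference stays as close as possible to the sequenced virus and borrows from the database only where the scaffold is missing.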

## Installation

1. Create a virtual environment and install the following:

   - bbmap v39.01
   - bcftools v1.17
   - blast v2.14.1
   - bwa v0.7.17
   - clustalw v2.1
   - freebayes v1.3.6
   - pandas v2.0.3
   - python 3.8.5
   - samtools v1.17
   - seaborn v0.12.2
   - spades v3.15.3

2. Install the latest FluViewer release via PyPI:

   ```
   pip3 install FluViewer
   ```

3. Download and unzip the default FluViewer DB (FluViewer_db.fa.gz) from this repository. Custom DBs can be created and used as well (instructions below).

## Usage

```
FluViewer -f <path_to_fwd_reads> -r <path_to_rev_reads> -d <path_to_db_file> -n <output_name> [ <optional_args> ]
```

Required arguments:

-f : path to FASTQ file containing forward reads (trim sequencing adapters/primers before analysis)

-r : path to FASTQ file containing reverse reads (trim sequencing adapters/primers before analysis)

-d : path to FASTA file containing FluViewer database (details below)

-n : output name (creates a directory with this name for output, and includes this name in output file names and consensus sequence headers)
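
For scripted runs, the command line above can be assembled in Python. The sketch below uses only the four required arguments documented here; the helper name `build_fluviewer_command` and the file names are hypothetical, and the example only prints the command rather than running it.

```python
import shlex


def build_fluviewer_command(fwd_reads, rev_reads, db_file, output_name, extra_args=None):
    """Build the FluViewer argument list from the four required arguments.

    extra_args is an optional list of further command-line tokens that is
    appended unchanged.
    """
    command = [
        "FluViewer",
        "-f", str(fwd_reads),
        "-r", str(rev_reads),
        "-d", str(db_file),
        "-n", str(output_name),
    ]
    if extra_args:
        command.extend(str(arg) for arg in extra_args)
    return command


# Hypothetical file names, for illustration only.
command = build_fluviewer_command(
    "sample01_R1.fastq", "sample01_R2.fastq", "FluViewer_db.fa", "sample01"
)
print(shlex.join(command))
# To launch the run: import subprocess; subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.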

Optional arguments:

-i : Minimum sequence identity between database reference sequences and contigs (percentage, default = 90, min = 0, max = 100)

-l : Minimum length of alignment between database reference sequences and contigs (int, default = 50, min = 32)

-D : Minimum read depth for base calling (int, default = 20, min = 1)

-q : Minimum PHRED score for mapping quality and base quality during variant calling (int, default = 20, min = 0)

-v : Variant allele fraction threshold for calling variants (float, default = 0.95, min = 0, max = 1)

-V : Variant allele fraction threshold for masking ambiguous variants (float, default = 0.25, min = 0, max = 1)

-N : Target depth for pre-normalization of reads (int, default = 200, min = 1)

-T : Threads used for BLAST alignments (int, default = 1, min = 1)

Optional flags:

-g : Disable garbage collection and retain intermediate analysis files
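
One way to read the depth and allele-fraction thresholds (-D, -v, -V) together is sketched below. This is an interpretation, not FluViewer's implementation: the function `call_base`, the use of `N` for masked positions, and the inclusive comparisons at each threshold are assumptions made for illustration.

```python
def call_base(ref_base, alt_base, depth, alt_count,
              min_depth=20, vaf_call=0.95, vaf_ambig=0.25):
    """Decide the consensus base at one position.

    Assumed logic (not confirmed by the FluViewer source):
    - depth below min_depth (-D): no base call, position masked as 'N'
    - alt fraction >= vaf_call (-v): the variant is called
    - alt fraction >= vaf_ambig (-V) but below vaf_call: ambiguous, masked as 'N'
    - otherwise: the mapping reference base is kept
    """
    if depth == 0 or depth < min_depth:
        return "N"
    alt_fraction = alt_count / depth
    if alt_fraction >= vaf_call:
        return alt_base
    if alt_fraction >= vaf_ambig:
        return "N"
    return ref_base


# Made-up counts showing the three outcomes at adequate depth.
for alt_count in (97, 50, 10):
    print(alt_count, call_base("A", "G", depth=100, alt_count=alt_count))
```

Under these assumptions, lowering -v calls more mixed positions as variants, while raising -V masks fewer positions as ambiguous.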

## FluViewer Database

FluViewer requires a curated FASTA file "database" of IAV reference sequences. Headers for these sequences must be formatted and annotated as follows:

```
>unique_id|strain_name(strain_subtype)|sequence_segment|sequence_subtype
```

Here are some example entries:

```
>CY230322|A/Washington/32/2017(H3N2)|PB2|none
TCAATTATATTCAGCATGGAAAGAATAAAAGAACTACGGAATCTAATGTCGCAGTCTCGCACTCGCGA...

>JX309816|A/Singapore/TT454/2010(H1N1)|HA|H1
CAAAAGCAACAAAAATGAAGGCAATACTAGTAGTTCTGCTATATACATTTACAACCGCAAATGCAGACA...

>MH669720|A/Iowa/52/2018(H3N2)|NA|N2
AGGAAAGATGAATCCAAATCAAAAGATAATAACGATTGGCTCTGTTTCTCTCACCATTTCCACAATATG...
```

For HA and NA segments, strain_subtype should reflect the HA and NA subtypes of the isolate (e.g. H1N1), but sequence_subtype should only indicate the HA or NA subtype of the segment sequence of the entry (e.g. H1 for an HA sequence or N1 for an NA sequence).

For internal segments (i.e. PB2, PB1, PA, NP, M, and NS), strain_subtype should reflect the HA/NA subtypes of the isolate, but 'none' should be entered for sequence_subtype. If strain_subtype is unknown, 'none' should be entered there as well.

FluViewer will only accept reference sequences composed entirely of uppercase canonical nucleotides (i.e. A, T, G, and C).
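
When building a custom database, it can help to check entries against these rules before running FluViewer. The following is a minimal validation sketch, not part of FluViewer: the names `parse_db_header` and `is_valid_sequence` are hypothetical, and the check that HA/NA subtypes look like `H` or `N` followed by digits is an assumption based on the examples above.

```python
import re
from typing import NamedTuple

SEGMENTS = {"PB2", "PB1", "PA", "HA", "NP", "NA", "M", "NS"}

# >unique_id|strain_name(strain_subtype)|sequence_segment|sequence_subtype
HEADER_PATTERN = re.compile(
    r"^>(?P<unique_id>[^|]+)"
    r"\|(?P<strain_name>[^|]+)\((?P<strain_subtype>[^|()]+)\)"
    r"\|(?P<segment>[^|]+)"
    r"\|(?P<sequence_subtype>[^|]+)$"
)


class DbHeader(NamedTuple):
    unique_id: str
    strain_name: str
    strain_subtype: str
    segment: str
    sequence_subtype: str


def parse_db_header(header: str) -> DbHeader:
    """Parse and validate one database FASTA header line."""
    match = HEADER_PATTERN.match(header.strip())
    if match is None:
        raise ValueError(f"malformed header: {header!r}")
    parsed = DbHeader(**match.groupdict())
    if parsed.segment not in SEGMENTS:
        raise ValueError(f"unknown segment: {parsed.segment!r}")
    if parsed.segment in ("HA", "NA"):
        # Assumed subtype shape: 'H' or 'N' followed by digits (e.g. H1, N2).
        prefix = parsed.segment[0]
        if not re.fullmatch(prefix + r"\d+", parsed.sequence_subtype):
            raise ValueError(f"unexpected subtype for {parsed.segment}: {parsed.sequence_subtype!r}")
    elif parsed.sequence_subtype != "none":
        raise ValueError("internal segments must use 'none' as sequence_subtype")
    return parsed


def is_valid_sequence(sequence: str) -> bool:
    """True if the sequence is non-empty and uses only uppercase A, T, G, C."""
    return bool(sequence) and set(sequence) <= set("ATGC")


print(parse_db_header(">JX309816|A/Singapore/TT454/2010(H1N1)|HA|H1"))
print(is_valid_sequence("CAAAAGCAACAAAAATGAAGG"), is_valid_sequence("CAAANGG"))
```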

## FluViewer Output

FluViewer generates four main output files for each library:

1. A FASTA file containing consensus sequences for the IAV genome segments
2. A sorted BAM file with reads mapped to the mapping references generated for that library (the mapping reference is also retained)
3. A report TSV file describing segment, subtype, and sequencing metrics for each consensus sequence generated
4. Depth of coverage plots for each segment

Headers in the FASTA file have the following format:

```
>output_name|segment|subject
```

The report TSV files contain the following columns:

seq_name : the name of the consensus sequence described by this row

segment : IAV genome segment (PB2, PB1, PA, HA, NP, NA, M, NS)

subtype : HA or NA subtype ("none" for internal segments)

reads_mapped : the number of sequencing reads mapped to this segment (post-normalization/downsampling)

seq_length : the length (in nucleotides) of the consensus sequence generated by FluViewer

scaffold_completeness : the number of nucleotide positions in the scaffold that were assembled from the provided reads (post-normalization/downsampling)

consensus_completeness : the number of nucleotide positions in the consensus with a successful base call (i.e. A, T, G, or C)

ref_seq_used : the unique ID and strain name of the scaffold's best-matching reference sequence used for filling in missing regions in the scaffold (if the scaffold completeness was 100%, this is provided pro forma, as none of it was used to create the mapping reference)
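
Because consensus_completeness and seq_length are both counts of positions, their ratio gives the fraction of each segment with a successful base call. The sketch below assumes the report TSV has a header row using the column names listed above; the function name `completeness_fractions` and the example row are made up for illustration.

```python
import csv
import io


def completeness_fractions(tsv_handle):
    """Return {seq_name: consensus_completeness / seq_length} for each row.

    Assumes a tab-separated file whose header row uses the column names
    described in this README.
    """
    fractions = {}
    for row in csv.DictReader(tsv_handle, delimiter="\t"):
        seq_length = int(row["seq_length"])
        called = int(row["consensus_completeness"])
        fractions[row["seq_name"]] = called / seq_length if seq_length else 0.0
    return fractions


# Made-up report content; real values come from a FluViewer run.
example_report = io.StringIO(
    "seq_name\tsegment\tsubtype\treads_mapped\tseq_length\t"
    "scaffold_completeness\tconsensus_completeness\tref_seq_used\n"
    "sample01|HA|H1\tHA\tH1\t5000\t1700\t1700\t1683\t"
    "JX309816|A/Singapore/TT454/2010(H1N1)\n"
)
for name, fraction in completeness_fractions(example_report).items():
    print(f"{name}: {fraction:.2%}")
```

For a real run, replace the `io.StringIO` object with an open file handle to the report TSV.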

The depth of coverage plots contain the following elements:

- A black line indicating the depth of coverage pre-variant calling
- A grey line indicating the depth of coverage post-variant calling
- Red shading covering positions where coverage was too low for base calling
- Orange lines indicating positions where excess variation resulted in an ambiguous base call
- Blue lines indicating positions where a variant was called

