
Extensible in silico serotyping for Flavobacterium psychrophilum


FlavoTyper

License: Apache-2.0

FlavoTyper is a command-line bioinformatics tool that performs in silico serotyping of Flavobacterium psychrophilum genome assemblies.


Introduction

Flavobacteriosis is a bacterial disease with significant impact on the global aquaculture industry, particularly affecting salmonid fish such as rainbow trout and Atlantic salmon. It causes substantial economic losses in fish farms worldwide, manifesting as coldwater disease, rainbow trout fry syndrome, and gill disease depending on the host and stage of life.

The causative agent is Flavobacterium psychrophilum (Smith 1964), a Gram-negative, rod-shaped psychrotrophic bacterium belonging to the family Flavobacteriaceae, phylum Bacteroidota.

Phenotypic characterisation of this pathogen (including serotyping based on the O-polysaccharide antigen) provides critical information for epidemiological surveillance, outbreak investigation, and the design of effective vaccines. FlavoTyper enables this characterisation directly from genome assemblies, making serotyping scalable, reproducible, and independent of wet-lab assays.


Installation

Dependencies

FlavoTyper requires Python and two external bioinformatics tools that must be installed manually:

| Dependency | Minimum version | Purpose |
|---|---|---|
| BLAST+ (blastn, makeblastdb) | 2.12 | Marker alignment and locus comparison |
| fastANI | 1.3 | Species verification (ANI-based QC) |

Install external tools with conda:

conda install -c bioconda blast fastani

Verify they are available:

blastn -version
fastANI --version
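
The same check can be scripted, for example at the start of a pipeline. The sketch below is not part of FlavoTyper; it only uses Python's standard `shutil.which` to look for the three executables named in the dependency table, and the function name `missing_tools` is made up for illustration.

```python
import shutil

# Executables named in the dependency table above.
REQUIRED_TOOLS = ("blastn", "makeblastdb", "fastANI")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All external tools found.")
```

This only confirms that the executables are discoverable; it does not check the minimum versions.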

Option 1: From source

git clone https://forge.inrae.fr/eric.duchaud/flavotyper.git
cd flavotyper
python3 -m venv .venv
source .venv/bin/activate
pip install .

Option 2: From PyPI

pip install flavotyper

Option 3: From Bioconda

conda install -c bioconda flavotyper

Verify the installation

flavotyper --version
flavotyper data-dir

Quickstart

  1. Place the genome assembly FASTA file(s) you want to type in one directory.
  2. Run FlavoTyper:
flavotyper type \
  --genomes path/to/genomes/*.fasta \
  --outdir results/
  3. View results in the output directory; the main output is results/typing_results.tsv.

Data input

FlavoTyper accepts genome assemblies for F. psychrophilum in FASTA format (.fa, .fna, .fasta, .fas, optionally gzip-compressed). Both single-genome and multi-genome runs are supported:

# Single genome
flavotyper type --genomes genome.fasta --outdir results/

# Multiple genomes
flavotyper type --genomes genomes/*.fasta --outdir results/ --threads 4

Sample identifiers are derived automatically from input filename stems.
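
To anticipate which sample identifiers a run will produce, and whether two inputs would collide, the derivation can be approximated as below. This is a sketch under an assumption: it strips an optional `.gz` and one of the listed FASTA extensions, which may differ in detail from FlavoTyper's own logic. The helper names `sample_id` and `duplicate_ids` are hypothetical.

```python
from collections import Counter
from pathlib import Path

# Extensions listed in the "Data input" section.
FASTA_SUFFIXES = {".fa", ".fna", ".fasta", ".fas"}


def sample_id(path):
    """Approximate a sample ID: filename minus optional .gz and FASTA extension."""
    name = Path(path).name
    if name.endswith(".gz"):
        name = name[:-3]
    stem, dot, ext = name.rpartition(".")
    if dot and stem and f".{ext.lower()}" in FASTA_SUFFIXES:
        return stem
    return name


def duplicate_ids(paths):
    """Return the IDs that more than one input path would map to."""
    counts = Counter(sample_id(p) for p in paths)
    return sorted(s for s, n in counts.items() if n > 1)
```

Inputs such as `run1/s1.fasta` and `run2/s1.fna.gz` would share the identifier `s1` under this approximation, which is the situation the `--allow-duplicate-sample-names` option addresses.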


Data output

All output files are written to the directory specified with --outdir.

1. Tabular format (TSV)

typing_results.tsv — the main output table, one row per sample.

Key columns include the assigned serotype, call state (Resolved / Partial / Ambiguous / NotTyped), detected markers, QC metrics, typing warnings, and a reference sentence for known serotypes. For the full column reference see Results_Dictionary.md.
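
For a quick summary of a run, the table can be read with Python's standard `csv` module. The sketch below assumes only that the call-state column is named `Call_state`, as in the "Interpreting results" table; other column names should be taken from Results_Dictionary.md. The function name `count_call_states` is illustrative.

```python
import csv
from collections import Counter


def count_call_states(lines, column="Call_state"):
    """Count samples per call state from the lines of a typing_results.tsv file."""
    reader = csv.DictReader(lines, delimiter="\t")
    return Counter(row[column] for row in reader)


# Usage (path as in the Quickstart):
# with open("results/typing_results.tsv", newline="", encoding="utf-8") as handle:
#     print(count_call_states(handle))
```

Passing an open file handle works because `csv.DictReader` accepts any iterable of lines.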

2. JSON format

typing_results.jsonl — one complete JSON record per sample (same data as the TSV, machine-readable).

run_metadata.json — run-level provenance: tool version, database name and checksums, parameters, run ID, and timestamp.

input_manifest.json — per-input manifest: source path, file size, and SHA-256 checksum.
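
A recorded checksum can be compared against a file on disk using `hashlib` from the standard library. This is a minimal sketch: the JSON key names inside input_manifest.json are not documented here, so the example only computes the digest and leaves the comparison to the reader. It also assumes the checksum covers the file bytes as stored (for example, the compressed bytes of a `.gz` input).

```python
import hashlib


def sha256_of_stream(stream, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a binary stream, read in chunks."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def sha256_of_file(path):
    """Return the hex SHA-256 digest of the file at `path`."""
    with open(path, "rb") as handle:
        return sha256_of_stream(handle)
```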

Optionally, when a call is "Resolved" and locus analysis is enabled, the tool produces the following outputs:

3. FASTA format

<sample>_locus_sequence.fasta — the O-antigen biosynthesis locus sequence extracted from the input genome, generated when locus analysis is enabled and the call is Resolved.

4. PNG format

<sample>_locus_map.png — a two-track locus map showing the reference locus alongside the aligned sample region, with annotated marker positions.

5. Text format

<sample>_locus_alignment.txt — pairwise BLASTN alignment of the sample genome against the reference locus.

Locus analysis outputs are written to a per-sample subdirectory: <outdir>/<sample>_locus_analysis/.

Typically, the output directory layout is as follows:

results/
├── typing_results.tsv
├── typing_results.jsonl
├── run_metadata.json
├── input_manifest.json
├── sample1_locus_analysis/          # only when --locus-analysis is enabled
│   ├── sample1_locus_map.png
│   ├── sample1_locus_alignment.txt
│   └── sample1_locus_sequence.fasta
└── sample2_locus_analysis/
    ├── sample2_locus_map.png
    ├── sample2_locus_alignment.txt
    └── sample2_locus_sequence.fasta

FlavoTyper Modules

QC module

The purpose of this first module is to ensure that:

  1. The input genome corresponds to the species Flavobacterium psychrophilum.
  2. The genome assembly quality allows a reliable serotype assignment.

Samples that fail QC are recorded as NotTyped in the output and skip the typing step.

  1. Species check (enabled by default)

An ANI-based species verification step using fastANI is run before typing. The input genome is compared against a bundled F. psychrophilum type-strain reference (NCIMB 1947T). Genomes below the ANI threshold (default: 95 %) are blocked from typing.

This step can be disabled with --no-species-check when species identity has been confirmed independently.
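
The gate itself is a simple threshold comparison. The sketch below illustrates it on fastANI's standard tab-delimited output (query, reference, ANI, mapped fragments, total fragments). It is not FlavoTyper's implementation; in particular, treating an empty output as a failure is an assumption based on fastANI reporting nothing for very divergent genome pairs. The function name `species_check` is hypothetical.

```python
def species_check(fastani_output, threshold=95.0):
    """Return (ani, passed) from the text of a one-to-one fastANI output file."""
    lines = [line for line in fastani_output.splitlines() if line.strip()]
    if not lines:
        # No line reported: the genomes are too divergent for fastANI to score.
        return None, False
    ani = float(lines[0].split("\t")[2])
    return ani, ani >= threshold
```

A genome with an ANI of exactly 95.0 passes here, since only genomes below the threshold are described as blocked.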

  2. Assembly quality check

Before typing, FlavoTyper evaluates assembly quality.

  • Genome size: flagged if outside the expected interval [2,619,202 – 3,122,663 bp] derived from a curated reference set.
  • Contig count: advisory warning issued above 300 contigs; high-severity warning above 500 contigs.
  • GC percent: GC content is calculated for each input genome.

The assembly quality check is advisory only, and provides informative warnings to the user about metrics that might affect the reliability of serotype assignment.
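
The three metrics can be reproduced from an uncompressed FASTA file with a few lines of Python, which is useful when deciding whether an assembly is worth re-running. The sketch uses the thresholds quoted above; the warning texts are invented for illustration, and computing GC over all bases (including any N) is an assumption rather than FlavoTyper's documented behaviour.

```python
# Expected genome size interval (bp) quoted in the list above.
SIZE_RANGE = (2_619_202, 3_122_663)


def assembly_stats(fasta_lines):
    """Compute total size, contig count and GC percent from FASTA lines."""
    contigs = length = gc = 0
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            contigs += 1
        else:
            seq = line.upper()
            length += len(seq)
            gc += seq.count("G") + seq.count("C")
    gc_percent = 100.0 * gc / length if length else 0.0
    return {"size": length, "contigs": contigs, "gc_percent": gc_percent}


def quality_warnings(stats):
    """Return advisory warnings (hypothetical wording) for the given stats."""
    warnings = []
    if not SIZE_RANGE[0] <= stats["size"] <= SIZE_RANGE[1]:
        warnings.append("genome size outside expected interval")
    if stats["contigs"] > 500:
        warnings.append("contig count above 500 (high severity)")
    elif stats["contigs"] > 300:
        warnings.append("contig count above 300 (advisory)")
    return warnings
```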

Typing module

The core module detects serotype-associated marker genes with BLASTN against the bundled marker database, then applies a declarative rule engine to assign serotype components independently:

  • O-type — assigned from the exclusive detection of one O-antigen marker (wzy gene).
  • R-type — assigned from base-group marker presence (R1, R2, R3 and R4) and optional inter-marker distance rules for variant confirmation (R1V1, R1V2 and R1V3).
  • S-type — assigned independently from the S1 marker; S0 when absent, S1 when the marker is present.

The combined serotype is reported as O:X-Sy-Rz (e.g. O:1-S0-R1V1).
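
Downstream scripts often need the three components separately. The sketch below splits a fully resolved label of the form shown above with a regular expression. How Undefined or ambiguous components are written in the output is not covered here, so such labels simply return `None`. The name `parse_serotype` is illustrative.

```python
import re

# Matches labels such as "O:1-S0-R1V1" or "O:0-S1-R0".
SEROTYPE_RE = re.compile(r"^O:(\d+)-(S\d+)-(R\d+(?:V\d+)?)$")


def parse_serotype(label):
    """Split a resolved combined serotype into its O, S and R components."""
    match = SEROTYPE_RE.match(label.strip())
    if match is None:
        return None
    o_number, s_type, r_type = match.groups()
    return {"O": f"O:{o_number}", "S": s_type, "R": r_type}
```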

Locus analysis module (optional, --locus-analysis)

When a call is Resolved and this module is enabled, a second BLASTN search aligns each genome against the full O-antigen biosynthesis locus. This produces:

  • a pairwise alignment text file,
  • the extracted locus FASTA sequence,
  • a publication-ready two-track PNG locus map.

Enable with --locus-analysis. Novel serotypes (not yet in the reference locus database) are flagged with a warning in Typing_warnings but are not blocked from receiving a type call.


FlavoTyper Databases

All reference data is bundled inside the FlavoTyper package. The bundled data directory can be retrieved with:

flavotyper data-dir

Flavotyper_markers.fasta

This file contains the nucleotide sequences of all marker genes used by the typing module. The BLAST database is built from this file at runtime.

A marker is considered present when its BLASTN hit meets both thresholds: percent identity ≥ 97 % and marker coverage ≥ 94 % (adjustable via --min-identity and --min-coverage).
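
The two-threshold test can be written out as follows. This sketch assumes that marker coverage means the aligned length divided by the full marker length; the exact definition used by FlavoTyper (for example, how gaps or multiple hits are handled) may differ. The function name `marker_present` is hypothetical.

```python
def marker_present(pident, aligned_length, marker_length,
                   min_identity=97.0, min_coverage=94.0):
    """Return True when a hit meets both the identity and coverage thresholds."""
    coverage = 100.0 * aligned_length / marker_length
    return pident >= min_identity and coverage >= min_coverage
```

For a 1,200 bp marker, a hit with 98.5 % identity over 1,180 bp passes, whereas a hit with 99 % identity over only 1,100 bp (about 91.7 % coverage) does not.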

Markers currently covered:

O-type: each type (O:0–O:7) is detected by a unique wzy gene (wzy0–wzy7).

R-type: R0 is the default assignment for O:0 when no R markers are detected. R1 variants share a common r1_core marker and are further distinguished by: wfpF (R1V1); Rieske + wfpF_p within a distance of −6 to +6 bp of each other (R1V2); wfpF_pp (R1V3). R2, R3, and R4 are each assigned from a single marker: wfpH, wfpI, and r4_core respectively.

S-type: S1 is assigned when s1_core is detected; S0 when it is absent.
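
As a reading aid, the R-type rules above can be sketched as a small function. This is an interpretation of the description, not the bundled rule engine (which is declarative and lives in the rules YAML). Several choices are assumptions: reporting plain `R1` when r1_core is found without a confirming variant marker, returning every matching candidate so the caller can treat more than one as ambiguous, and the function and parameter names.

```python
# Single-marker base groups, as described above.
SINGLE_MARKER_GROUPS = {"wfpH": "R2", "wfpI": "R3", "r4_core": "R4"}


def r_type_candidates(detected, o_type=None, rieske_wfpf_p_distance=None):
    """Return the R-types consistent with the detected marker names."""
    detected = set(detected)
    candidates = []
    if "r1_core" in detected:
        variants = []
        if "wfpF" in detected:
            variants.append("R1V1")
        if (
            {"Rieske", "wfpF_p"} <= detected
            and rieske_wfpf_p_distance is not None
            and -6 <= rieske_wfpf_p_distance <= 6
        ):
            variants.append("R1V2")
        if "wfpF_pp" in detected:
            variants.append("R1V3")
        # Assumption: fall back to the base group when no variant is confirmed.
        candidates.extend(variants or ["R1"])
    for marker, group in SINGLE_MARKER_GROUPS.items():
        if marker in detected:
            candidates.append(group)
    if not candidates and o_type == "O:0":
        candidates.append("R0")
    return candidates
```

A single candidate corresponds to a unique assignment, an empty list to an undefined R-type, and several candidates to a case that would need the Alternative_serotypes field.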

Flavotyper_reference_loci.fasta

This file includes full nucleotide sequences of reference O-antigen biosynthesis loci for each known serotype, with embedded metadata (reference strain, genome coordinates, GenBank accession, PMID, and per-marker positions). Used by the locus analysis module.


Command reference

Run flavotyper type --help for the full CLI reference.

| Option | Default | Description |
|---|---|---|
| --genomes | required | One or more input genome FASTA files |
| --outdir | required | Output directory |
| --db | bundled | Path to the serotyping rules YAML |
| --species-refs | bundled | Reference FASTA for fastANI species check |
| --no-species-check | off | Disable ANI-based species verification |
| --ani-threshold | 95.0 | Minimum ANI to pass the species gate |
| --min-identity | 97.0 | Minimum BLASTN percent identity for marker hits |
| --min-coverage | 94.0 | Minimum marker coverage (%) for marker hits |
| --threads | 1 | Threads passed to BLASTN and fastANI |
| --locus-analysis | off | Enable locus comparison and PNG map generation |
| --locus-db | bundled | Override the bundled reference-locus FASTA |
| --allow-duplicate-sample-names | off | Allow duplicate IDs from filename stems |
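
When FlavoTyper is called from a Python workflow, the command line can be assembled from these options and run with `subprocess`. The wrapper below is a sketch that uses only flags listed in the table; the function names `build_type_command` and `run_typing` are hypothetical and not part of the package.

```python
import subprocess


def build_type_command(genomes, outdir, threads=1,
                       locus_analysis=False, species_check=True):
    """Build the argument list for a `flavotyper type` run."""
    cmd = ["flavotyper", "type", "--genomes", *map(str, genomes),
           "--outdir", str(outdir), "--threads", str(threads)]
    if locus_analysis:
        cmd.append("--locus-analysis")
    if not species_check:
        cmd.append("--no-species-check")
    return cmd


def run_typing(genomes, outdir, **options):
    """Run FlavoTyper and raise CalledProcessError on a non-zero exit code."""
    subprocess.run(build_type_command(genomes, outdir, **options), check=True)
```

Passing the arguments as a list avoids shell quoting issues with unusual file names; wildcard expansion then has to be done in Python, for example with `glob.glob`.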

Interpreting results

| Call_state | Meaning |
|---|---|
| Resolved | O-type and R-type were both uniquely assigned |
| Partial | One of O or R is Undefined; check Typing_warnings and assembly quality |
| Ambiguous | One of O or R matched multiple valid interpretations; check Alternative_serotypes |
| NotTyped | QC blocked typing; check QC_warnings and species fields |

Troubleshooting

For common errors and questions — installation failures, QC warnings, partial or ambiguous calls, locus analysis not running — see Troubleshooting.md.


Citation

If you use FlavoTyper in a publication or report, please cite the software metadata in CITATION.cff.


License

Apache-2.0. See LICENSE.
