In silico serotyping for Flavobacterium psychrophilum

FlavoTyper

FlavoTyper is a command-line bioinformatics tool that performs in silico serotyping of Flavobacterium psychrophilum genome assemblies.


Introduction

Flavobacteriosis is a bacterial disease with significant impact on the global aquaculture industry, particularly affecting salmonids such as rainbow trout and Atlantic salmon. It causes substantial economic losses in fish farms worldwide.

The causative agent is Flavobacterium psychrophilum, a Gram-negative, rod-shaped psychrotrophic bacterium belonging to the family Flavobacteriaceae of the phylum Bacteroidota.

Phenotypic characterization of this pathogen (including serotyping based on the structural variations in the O-polysaccharide moiety of cell surface lipopolysaccharide) provides critical information for epidemiological surveillance, outbreak investigation, and the design of effective vaccines. FlavoTyper enables this characterization directly from genome assemblies, making serotyping scalable, reproducible, and independent of wet-lab assays.

FlavoTyper builds on previously published work: the multiplex PCR serotyping scheme of Rochat et al., 2017 (https://doi.org/10.3389/fmicb.2017.01752) and the functional characterization of the O-polysaccharide-encoding locus in a subset of strains by Cisar et al., 2019 (https://doi.org/10.3389/fmicb.2019.01041).


Installation

Multiple installation options are available depending on the user context and needs. We recommend Bioconda, which installs FlavoTyper and its external tools (BLAST+, fastANI) in a single step. PyPI and from-source installs will require you to install those external tools yourself.

Option 1 — Bioconda (recommended)

We recommend installing a conda package manager, then creating and activating a dedicated environment before installing and running FlavoTyper. New to conda? Follow the "First time with conda?" walkthrough for detailed steps.

conda create -n flavotyper -c conda-forge -c bioconda flavotyper
conda activate flavotyper

Option 2 — PyPI

This option requires a working Python installation, then creating a virtual environment and activating it before installing FlavoTyper. First time with Python/pip? See the step-by-step setup guide.

python3 -m venv .venv
source .venv/bin/activate
pip install flavotyper

Option 3 — From source

git clone https://forge.inrae.fr/eric.duchaud/flavotyper.git
cd flavotyper
python3 -m venv .venv
source .venv/bin/activate
pip install .

External dependencies

Required only for the PyPI and from-source installs:

| Dependency | Minimum version | Purpose |
| --- | --- | --- |
| BLAST+ (blastn, makeblastdb) | 2.12 | Marker alignment and locus comparison |
| fastANI | 1.3 | Species validation (ANI-based QC) |

The simplest way to get them is conda:

conda install -c conda-forge -c bioconda blast fastani

Verify the installation

flavotyper --version
flavotyper data-dir
blastn -version
fastANI --version
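
The same check can be scripted, for example at the start of a pipeline. The following is a minimal sketch that only looks for the executables named above on PATH; it does not check their versions, and the helper name `missing_tools` is an illustrative choice, not part of FlavoTyper.

```python
import shutil

# Executables named in the installation section above.
REQUIRED_TOOLS = ("flavotyper", "blastn", "makeblastdb", "fastANI")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH (hypothetical helper)."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All required executables were found on PATH.")
```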

Quickstart

  1. Place the genome assembly FASTA file(s) you want to type in one directory.
  2. Run FlavoTyper:
flavotyper type --genomes path/to/genomes/ --outdir results/
  3. View the results in the output directory; the main output is results/typing_results.tsv.

Data input

FlavoTyper accepts genome assemblies for F. psychrophilum in FASTA format (.fa, .fna, .fasta, .fas, optionally gzip-compressed). Both single-genome and multi-genome runs are supported:

# Single genome
flavotyper type --genomes genome.fasta --outdir results/

# Multiple genomes from a directory (all supported extensions are discovered automatically)
flavotyper type --genomes genomes/ --outdir results/ --threads 4

Sample identifiers are derived automatically from input filename stems.
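
Because identifiers come from filename stems, two files such as `a/s1.fa` and `b/s1.fasta` may collide. The sketch below shows one plausible way to derive a stem and spot collisions before a run. It assumes that the optional `.gz` suffix and then one supported FASTA extension are stripped; the exact rule FlavoTyper applies may differ, and both function names are hypothetical.

```python
from collections import Counter
from pathlib import Path

# Extensions listed in the "Data input" section.
FASTA_EXTENSIONS = (".fa", ".fna", ".fasta", ".fas")


def sample_id(path):
    """Derive a sample ID from a filename (assumed rule, not FlavoTyper's code)."""
    name = Path(path).name
    if name.endswith(".gz"):
        name = name[: -len(".gz")]
    for ext in FASTA_EXTENSIONS:
        if name.endswith(ext):
            return name[: -len(ext)]
    return name


def duplicate_ids(paths):
    """Return the sample IDs that would be produced by more than one input file."""
    counts = Counter(sample_id(p) for p in paths)
    return sorted(s for s, n in counts.items() if n > 1)
```

If `duplicate_ids` returns anything, renaming the files is the simplest fix; the `--allow-duplicate-sample-names` option listed in the command reference is the alternative.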


Data output

All output files are written to the directory specified with --outdir.

1. Tabular format (TSV)

typing_results.tsv — the main output table, one row per sample.

Key columns include the assigned serotype, call state (Resolved / Partial / Ambiguous / NotTyped), detected markers, QC metrics, typing warnings, and a reference sentence for known serotypes. For the full column reference see Results_Dictionary.md.

2. JSON format

typing_results.jsonl — one complete JSON record per sample (same data as the TSV, machine-readable).

run_metadata.json — run-level provenance: tool version, database name and checksums, parameters, run ID, and timestamp.

input_manifest.json — per-input manifest: source path, file size, and SHA-256 checksum.
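
Since typing_results.jsonl holds one JSON object per line, it can be streamed without loading the whole file. The sketch below tallies call states across a run. It assumes that the JSON key is spelled `Call_state`, like the column name used in the results table; check Results_Dictionary.md for the actual field names.

```python
import json
from collections import Counter


def count_call_states(lines):
    """Count records per call state from an iterable of JSONL lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        # "Call_state" is an assumed key name; see the lead-in.
        counts[record.get("Call_state", "missing")] += 1
    return counts


# Usage (path as in the Quickstart):
# with open("results/typing_results.jsonl", encoding="utf-8") as handle:
#     print(count_call_states(handle))
```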

Optionally, when a call is "Resolved" and locus analysis is enabled, the tool produces the following outputs:

3. FASTA format

<sample>_locus_sequence.fasta — the O-antigen biosynthesis locus sequence extracted from the input genome, generated when locus analysis is enabled and the call is Resolved.

4. PNG format

<sample>_locus_map.png — a two-track locus map showing the reference locus alongside the aligned sample region, with annotated marker positions.

5. Text format

<sample>_locus_alignment.txt — pairwise BLASTN alignment of the sample genome against the reference locus.

Locus analysis outputs are written to a per-sample subdirectory: <outdir>/<sample>_locus_analysis/.

Typically, the output directory layout is as follows:

results/
├── typing_results.tsv
├── typing_results.jsonl
├── run_metadata.json
├── input_manifest.json
├── sample1_locus_analysis/          # only when --locus-analysis is enabled
│   ├── sample1_locus_map.png
│   ├── sample1_locus_alignment.txt
│   └── sample1_locus_sequence.fasta
└── sample2_locus_analysis/
    ├── sample2_locus_map.png
    ├── sample2_locus_alignment.txt
    └── sample2_locus_sequence.fasta

FlavoTyper Modules

QC module

The purpose of this first module is to ensure that:

  1. The input genome corresponds to the species Flavobacterium psychrophilum.
  2. The genome assembly quality allows a reliable assignment of the serotype.

Samples that fail QC are recorded as NotTyped in the output and skip the typing step.

  1. Species check (enabled by default)

An ANI-based species validation step using fastANI is run before typing. The input genome is compared against the F. psychrophilum type-strain reference genome (NCIMB 1947T). Genomes below the ANI threshold (default: 95 %) are blocked from further typing steps.

This step can be disabled with --no-species-check when species identity has been confirmed independently.

  2. Assembly quality check

Before typing, FlavoTyper evaluates assembly quality.

  • Genome size: flagged if outside the expected interval [2,619,202 – 3,122,663 bp] derived from a curated reference set.
  • Contig count: advisory warning issued above 300 contigs; high-severity warning above 500 contigs.
  • GC percent: the GC content of each input genome is computed and reported.

The assembly quality check is advisory only, and provides informative warnings to the user about metrics that might affect the reliability of serotype assignment.
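
To make the three metrics concrete, here is a minimal sketch that computes them from FASTA text and applies the thresholds quoted above. It is not FlavoTyper's implementation: the function names and warning strings are illustrative, and GC percent is computed over all bases, including any N, which is an assumption.

```python
def assembly_metrics(fasta_text):
    """Compute genome size, contig count, and GC percent from FASTA text."""
    contigs = []
    current = None
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = []          # start a new contig
            contigs.append(current)
        elif current is not None:
            current.append(line.upper())
    sequences = ["".join(parts) for parts in contigs]
    size = sum(len(seq) for seq in sequences)
    gc = sum(seq.count("G") + seq.count("C") for seq in sequences)
    return {
        "genome_size": size,
        "contig_count": len(sequences),
        "gc_percent": 100 * gc / size if size else 0.0,
    }


def quality_warnings(metrics):
    """Return advisory warnings using the thresholds stated in this section."""
    warnings = []
    if not 2_619_202 <= metrics["genome_size"] <= 3_122_663:
        warnings.append("genome size outside expected interval")
    if metrics["contig_count"] > 500:
        warnings.append("high severity: more than 500 contigs")
    elif metrics["contig_count"] > 300:
        warnings.append("advisory: more than 300 contigs")
    return warnings
```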

Typing module

The core module detects serotype-associated marker genes with BLASTN against the built-in marker database, then applies a declarative rule engine to assign serotype components independently:

  • O-type: assigned from the exclusive detection of a single O-antigen marker (a wzy gene or putative wzy gene).
  • R-type: assigned from base-group marker presence (R1, R2, R3 and R4) and optional inter-marker distance rules for variant confirmation (R1V1, R1V2 and R1V3).
  • S-type: assigned independently from the S1 marker; S0 when absent, S1 when present.

The combined serotype is reported as O:X-Sy-Rz (e.g. O:1-S0-R1V1).
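
For downstream analysis it can be handy to split a combined label back into its components. The sketch below handles fully assigned labels of the form shown above, such as O:1-S0-R1V1. How partial or ambiguous calls are written is not covered here, so the function simply returns None for anything that does not match; the function name is hypothetical.

```python
import re

# Matches labels of the form O:X-Sy-Rz, e.g. "O:1-S0-R1V1".
SEROTYPE_PATTERN = re.compile(r"^(O:[^-]+)-(S[^-]+)-(R[^-]+)$")


def parse_serotype(label):
    """Split a combined serotype into O, S, and R components, or return None."""
    match = SEROTYPE_PATTERN.match(label.strip())
    if match is None:
        return None
    o_type, s_type, r_type = match.groups()
    return {"O": o_type, "S": s_type, "R": r_type}
```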

Locus analysis module (optional, --locus-analysis)

When a call is Resolved and the user enabled this module, a second BLASTN is run to align the genome(s) against a full O-antigen biosynthesis locus retrieved from a reference strain. This produces:

  • a pairwise alignment text file,
  • the extracted locus FASTA sequence,
  • a two-track PNG locus map.

Enable with --locus-analysis. Novel serotypes (not yet in the reference locus database) are flagged with a warning in Typing_warnings but are not blocked from receiving a type call.


FlavoTyper Databases

All reference data is integrated in the FlavoTyper package. The built-in data directory can be retrieved with:

flavotyper data-dir

Flavotyper_markers.fasta

This file includes nucleotide sequences for all marker genes used by the typing module. The BLAST database is built from this file at runtime.

A marker is considered present when its BLASTN hit meets both thresholds: percent identity ≥ 97 % and marker coverage ≥ 94 % (adjustable via --min-identity and --min-coverage).
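
As an illustration of the two thresholds, the sketch below evaluates a single hit. It assumes that marker coverage means the aligned portion of the marker divided by the full marker length; how FlavoTyper computes coverage internally (for example across several hits) is not specified here, and the function name is illustrative.

```python
def marker_present(percent_identity, aligned_length, marker_length,
                   min_identity=97.0, min_coverage=94.0):
    """Apply both presence thresholds to one BLASTN hit (assumed coverage definition)."""
    coverage = 100.0 * aligned_length / marker_length
    return percent_identity >= min_identity and coverage >= min_coverage
```

With the defaults, a 1,000 bp marker aligned over 950 bp at 98.5 % identity passes, whereas the same hit over 930 bp fails on coverage.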

Markers currently covered:

O-type: each type (O:0–O:7) is detected by a unique wzy gene (wzy0–wzy7).

R-type: R0 is the default assignment when no R markers are detected (so far observed only in O:0). R1 variants share a common r1_core marker and are further distinguished by: wfpF (R1V1); Rieske + wfpF_p within a distance of −6 to +6 bp of each other (R1V2); wfpF_pp (R1V3). R2, R3, and R4 are each assigned from a single marker: wfpH, wfpI, and r4_core respectively.

S-type: S1 is assigned when s1_core is detected; S0 when it is absent.
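
The marker rules above can be read as a small decision procedure. The following sketch is one interpretation, not the tool's rule engine: it assumes that marker names match those written in this section, that every R1 variant also requires r1_core, and that any case without exactly one candidate is left unassigned (returned as None). The function and parameter names are hypothetical.

```python
R_MARKERS = {"r1_core", "wfpF", "Rieske", "wfpF_p", "wfpF_pp",
             "wfpH", "wfpI", "r4_core"}


def assign_o_type(markers):
    """Return e.g. 'O:1' when exactly one wzy marker is detected, else None."""
    hits = sorted(m for m in markers if m.startswith("wzy"))
    return "O:" + hits[0][len("wzy"):] if len(hits) == 1 else None


def assign_s_type(markers):
    """S1 when s1_core is detected, S0 otherwise."""
    return "S1" if "s1_core" in markers else "S0"


def assign_r_type(markers, rieske_to_wfpF_p_bp=None):
    """Return the R-type, 'R0' when no R marker is present, or None if unclear."""
    markers = set(markers)
    if not markers & R_MARKERS:
        return "R0"
    candidates = []
    if "r1_core" in markers:  # assumption: all R1 variants require r1_core
        if "wfpF" in markers:
            candidates.append("R1V1")
        if ({"Rieske", "wfpF_p"} <= markers
                and rieske_to_wfpF_p_bp is not None
                and -6 <= rieske_to_wfpF_p_bp <= 6):
            candidates.append("R1V2")
        if "wfpF_pp" in markers:
            candidates.append("R1V3")
    for marker, label in (("wfpH", "R2"), ("wfpI", "R3"), ("r4_core", "R4")):
        if marker in markers:
            candidates.append(label)
    return candidates[0] if len(candidates) == 1 else None
```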

Flavotyper_reference_loci.fasta

This file includes full nucleotide sequences of reference O-antigen biosynthesis loci for each known serotype, with embedded metadata (reference strain, genome coordinates, GenBank accession, PMID, and per-marker positions). Used by the locus analysis module.


Command reference

Run flavotyper type --help for the full CLI reference.

| Option | Default | Description |
| --- | --- | --- |
| --genomes | required | One or more genome FASTA files, or a directory (.fa, .fna, .fasta, .fas, optionally .gz; discovered automatically) |
| --outdir | required | Output directory |
| --db | built-in | Path to the serotyping rules YAML |
| --species-refs | built-in | Reference FASTA for fastANI species check |
| --no-species-check | off | Disable F. psychrophilum species validation |
| --ani-threshold | 95.0 | Minimum ANI to pass the species gate |
| --min-identity | 97.0 | Minimum BLASTN percent identity for marker hits |
| --min-coverage | 94.0 | Minimum marker coverage (%) for marker hits |
| --threads | 1 | Threads passed to BLASTN and fastANI |
| --locus-analysis | off | Enable locus comparison and PNG map generation |
| --locus-db | built-in | Override the built-in reference-locus FASTA |
| --allow-duplicate-sample-names | off | Allow duplicate IDs from filename stems |
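
When FlavoTyper is called from a Python workflow, building the argument list explicitly avoids shell-quoting problems. The sketch below uses only options listed in the table above; the wrapper functions themselves are hypothetical and assume that `flavotyper` is on PATH.

```python
import subprocess


def build_command(genomes, outdir, threads=1, locus_analysis=False,
                  species_check=True):
    """Assemble a `flavotyper type` command line as a list of arguments."""
    command = ["flavotyper", "type",
               "--genomes", str(genomes),
               "--outdir", str(outdir),
               "--threads", str(threads)]
    if locus_analysis:
        command.append("--locus-analysis")
    if not species_check:
        command.append("--no-species-check")
    return command


def run_typing(genomes, outdir, **options):
    """Run FlavoTyper and raise CalledProcessError on a non-zero exit status."""
    return subprocess.run(build_command(genomes, outdir, **options), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no side effects.
    print(" ".join(build_command("genomes/", "results/", threads=4)))
```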

Interpreting results

| Call_state | Meaning |
| --- | --- |
| Resolved | O-type and R-type were both uniquely assigned |
| Partial | One of O or R is Undefined; check Typing_warnings and assembly quality |
| Ambiguous | One of O or R matched multiple valid interpretations; check Alternative_serotypes |
| NotTyped | QC blocked typing; check QC_warnings and species fields |
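
The table above maps each non-resolved state to the field worth inspecting first, which lends itself to a quick triage script. The sketch below reads TSV text and returns the relevant field for every sample that is not Resolved. The column names Call_state, Typing_warnings, Alternative_serotypes, and QC_warnings are taken from this section; the sample column name `Sample` is an assumption, so adjust it after checking Results_Dictionary.md.

```python
import csv
import io

# Which column to inspect first for each non-resolved call state.
FOLLOW_UP = {
    "Partial": "Typing_warnings",
    "Ambiguous": "Alternative_serotypes",
    "NotTyped": "QC_warnings",
}


def triage(tsv_text, sample_column="Sample"):
    """Return (sample, call state, follow-up value) for non-resolved samples."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    flagged = []
    for row in reader:
        column = FOLLOW_UP.get(row.get("Call_state"))
        if column is not None:
            flagged.append((row.get(sample_column), row["Call_state"],
                            row.get(column, "")))
    return flagged


# Usage (path as in the Quickstart):
# with open("results/typing_results.tsv", encoding="utf-8") as handle:
#     for item in triage(handle.read()):
#         print(item)
```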

Troubleshooting

For common errors and questions — installation failures, QC warnings, partial or ambiguous calls, locus analysis not running — see Troubleshooting.md.


Citation

If you use FlavoTyper in a publication or report, please cite the software metadata in CITATION.cff.


License

Apache-2.0. See LICENSE.
