In silico serotyping for Flavobacterium psychrophilum
FlavoTyper
FlavoTyper is a command-line bioinformatics tool that performs in silico serotyping of Flavobacterium psychrophilum genome assemblies.
Introduction
Flavobacteriosis is a bacterial disease with a significant impact on the global aquaculture industry, particularly affecting salmonid fish such as rainbow trout and Atlantic salmon. It causes substantial economic losses in fish farms worldwide and manifests as coldwater disease, rainbow trout fry syndrome, or gill disease, depending on the host and its life stage.
The causative agent is Flavobacterium psychrophilum, a Gram-negative, rod-shaped psychrotrophic bacterium belonging to the family Flavobacteriaceae, phylum Bacteroidota.
Phenotypic characterisation of this pathogen (including serotyping based on the O-polysaccharide antigen) provides critical information for epidemiological surveillance, outbreak investigation, and the design of effective vaccines. FlavoTyper enables this characterisation directly from genome assemblies, making serotyping scalable, reproducible, and independent of wet-lab assays.
Installation
Dependencies
FlavoTyper requires Python and two external bioinformatics tools that must be installed manually:
| Dependency | Minimum version | Purpose |
|---|---|---|
| BLAST+ (blastn, makeblastdb) | 2.12 | Marker alignment and locus comparison |
| fastANI | 1.3 | Species verification (ANI-based QC) |
Install external tools with conda:
conda install -c bioconda blast fastani
Verify they are available:
blastn -version
fastANI --version
Option 1: From source
git clone https://forge.inrae.fr/eric.duchaud/flavotyper.git
cd flavotyper
python3 -m venv .venv
source .venv/bin/activate
pip install .
Option 2: From PyPI
pip install flavotyper
Option 3: From Bioconda
conda install -c bioconda flavotyper
Verify the installation
flavotyper --version
flavotyper data-dir
Quickstart
- Place the genome assembly FASTA file(s) you want to type in one directory.
- Run FlavoTyper:
flavotyper type --genomes path/to/genomes/ --outdir results/
- View results in the output directory; the main output is results/typing_results.tsv.
Data input
FlavoTyper accepts genome assemblies for F. psychrophilum in FASTA format (.fa, .fna, .fasta, .fas, optionally gzip-compressed). Both single-genome and multi-genome runs are supported:
# Single genome
flavotyper type --genomes genome.fasta --outdir results/
# Multiple genomes from a directory (all supported extensions are discovered automatically)
flavotyper type --genomes genomes/ --outdir results/ --threads 4
Sample identifiers are derived automatically from input filename stems.
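The discovery and naming behaviour described above can be pictured with a minimal sketch. It assumes that the sample ID is the filename with an optional .gz suffix and one supported FASTA extension stripped, and that duplicate IDs are rejected unless explicitly allowed. The function names `sample_id` and `discover_genomes` are hypothetical and are not part of FlavoTyper's API.

```python
from __future__ import annotations

from pathlib import Path

# Extensions listed in the text; an optional ".gz" suffix is also accepted.
FASTA_EXTENSIONS = (".fa", ".fna", ".fasta", ".fas")


def sample_id(path: Path) -> str | None:
    """Return the filename stem for a supported FASTA name, else None."""
    name = path.name
    if name.endswith(".gz"):
        name = name[: -len(".gz")]
    for ext in FASTA_EXTENSIONS:
        if name.endswith(ext) and len(name) > len(ext):
            return name[: -len(ext)]
    return None


def discover_genomes(directory: Path) -> dict[str, Path]:
    """Map sample IDs to FASTA paths, refusing duplicate IDs (assumed default)."""
    samples: dict[str, Path] = {}
    for path in sorted(directory.iterdir()):
        sid = sample_id(path) if path.is_file() else None
        if sid is None:
            continue  # not a supported FASTA file
        if sid in samples:
            raise ValueError(f"duplicate sample ID {sid!r}: {samples[sid]} and {path}")
        samples[sid] = path
    return samples
```

Under these assumptions, strainA.fasta and strainA.fna.gz in the same directory would collide on the ID strainA, which is the situation --allow-duplicate-sample-names appears intended to cover.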
Data output
All output files are written to the directory specified with --outdir.
1. Tabular format (TSV)
typing_results.tsv — the main output table, one row per sample.
Key columns include the assigned serotype, call state (Resolved / Partial / Ambiguous / NotTyped), detected markers, QC metrics, typing warnings, and a reference sentence for known serotypes. For the full column reference see Results_Dictionary.md.
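As a rough illustration of working with the table downstream, the sketch below tallies call states using Python's standard csv module. The column name `Call_state` is taken from the "Interpreting results" table; the other column name used in the example rows is hypothetical, so check Results_Dictionary.md for the actual headers.

```python
import csv
from collections import Counter


def count_call_states(lines, column="Call_state"):
    """Count the values of one column in a tab-separated table.

    `lines` is any iterable of text lines, for example an open file handle
    on typing_results.tsv. The default column name is an assumption.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    return Counter(row[column] for row in reader)


# Example with in-memory rows; the "Sample" header is illustrative only.
example = [
    "Sample\tCall_state\n",
    "strainA\tResolved\n",
    "strainB\tNotTyped\n",
    "strainC\tResolved\n",
]
print(count_call_states(example))
```

With a real run, the same function could be called as `count_call_states(open("results/typing_results.tsv", newline=""))`.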
2. JSON format
typing_results.jsonl — one complete JSON record per sample (same data as the TSV, machine-readable).
run_metadata.json — run-level provenance: tool version, database name and checksums, parameters, run ID, and timestamp.
input_manifest.json — per-input manifest: source path, file size, and SHA-256 checksum.
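The SHA-256 checksums recorded in the manifest make it possible to confirm later that an input file has not changed. The sketch below shows one way to recompute such a digest with hashlib. Because the key names inside input_manifest.json are not documented here, the expected digest is passed in by the caller rather than read from the manifest; `sha256_of` and `matches_checksum` are hypothetical helper names.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path: Path, expected_hex: str) -> bool:
    """Compare a file's digest with an expected value, ignoring letter case."""
    return sha256_of(path) == expected_hex.strip().lower()
```

Reading in chunks keeps memory use flat, which matters little for a bacterial genome but costs nothing.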
Optionally, when a call is "Resolved" and locus analysis is enabled, the tool produces the following outputs:
3. FASTA format
<sample>_locus_sequence.fasta — the O-antigen biosynthesis locus sequence extracted from the input genome, generated when locus analysis is enabled and the call is Resolved.
4. PNG format
<sample>_locus_map.png — a two-track locus map showing the reference locus alongside the aligned sample region, with annotated marker positions.
5. Text format
<sample>_locus_alignment.txt — pairwise BLASTN alignment of the sample genome against the reference locus.
Locus analysis outputs are written to a per-sample subdirectory: <outdir>/<sample>_locus_analysis/.
Typically, the output directory layout is as follows:
results/
├── typing_results.tsv
├── typing_results.jsonl
├── run_metadata.json
├── input_manifest.json
├── sample1_locus_analysis/ # only when --locus-analysis is enabled
│ ├── sample1_locus_map.png
│ ├── sample1_locus_alignment.txt
│ └── sample1_locus_sequence.fasta
└── sample2_locus_analysis/
    ├── sample2_locus_map.png
    ├── sample2_locus_alignment.txt
    └── sample2_locus_sequence.fasta
FlavoTyper Modules
QC module
The purpose of this first module is to ensure that:
- The input genome corresponds to the species Flavobacterium psychrophilum.
- The genome assembly quality allows a reliable assignment of serotype.
Samples that fail QC are recorded as NotTyped in the output and skip the typing step.
- Species check (enabled by default)
An ANI-based species verification step using fastANI is run before typing. The input genome is compared against a bundled F. psychrophilum type-strain reference (NCIMB 1947T). Genomes below the ANI threshold (default: 95 %) are blocked from typing.
This step can be disabled with --no-species-check when species identity has been confirmed independently.
- Assembly quality check
Before typing, FlavoTyper evaluates assembly quality.
- Genome size: flagged if outside the expected interval [2,619,202 – 3,122,663 bp] derived from a curated reference set.
- Contig count: advisory warning issued above 300 contigs; high-severity warning above 500 contigs.
- GC percent: the GC content of each provided genome is calculated.
The assembly quality check is advisory only, and provides informative warnings to the user about metrics that might affect the reliability of serotype assignment.
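The two QC steps can be sketched as follows, using the thresholds quoted above. This is an illustration under stated assumptions, not FlavoTyper's implementation: it assumes fastANI's standard tab-delimited output (query, reference, ANI, mapped fragments, total fragments), treats a missing ANI line as a failed species gate, and computes GC content over the total assembly length. The warning labels and function names are hypothetical.

```python
from __future__ import annotations

# Thresholds quoted in the text above.
ANI_THRESHOLD = 95.0
SIZE_RANGE_BP = (2_619_202, 3_122_663)
CONTIG_ADVISORY = 300
CONTIG_HIGH = 500


def best_ani(fastani_output: str) -> float | None:
    """Return the highest ANI value in fastANI output, or None if there is none."""
    values = []
    for line in fastani_output.splitlines():
        fields = line.split("\t")
        if len(fields) >= 3:
            values.append(float(fields[2]))  # third column holds the ANI estimate
    return max(values) if values else None


def passes_species_gate(fastani_output: str, threshold: float = ANI_THRESHOLD) -> bool:
    """Blocking check: only genomes at or above the ANI threshold go on to typing."""
    ani = best_ani(fastani_output)
    return ani is not None and ani >= threshold


def gc_percent(contigs: list[str]) -> float:
    """GC content as a percentage of total length (assumed definition)."""
    total = sum(len(c) for c in contigs)
    gc = sum(c.upper().count("G") + c.upper().count("C") for c in contigs)
    return 100.0 * gc / total if total else 0.0


def assembly_warnings(contigs: list[str]) -> list[str]:
    """Advisory checks: return warning labels without blocking typing."""
    warnings = []
    size = sum(len(c) for c in contigs)
    if not SIZE_RANGE_BP[0] <= size <= SIZE_RANGE_BP[1]:
        warnings.append("genome_size_out_of_range")
    if len(contigs) > CONTIG_HIGH:
        warnings.append("contig_count_high")
    elif len(contigs) > CONTIG_ADVISORY:
        warnings.append("contig_count_advisory")
    return warnings
```

Keeping the blocking check and the advisory checks in separate functions mirrors the distinction drawn in the text: only the species gate can turn a sample into NotTyped.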
Typing module
The core module detects serotype-associated marker genes with BLASTN against the bundled marker database, then applies a declarative rule engine to assign serotype components independently:
- O-type — assigned from the exclusive detection of one O-antigen marker (wzy gene).
- R-type — assigned from base-group marker presence (R1, R2, R3 and R4) and optional inter-marker distance rules for variant confirmation (R1V1, R1V2 and R1V3).
- S-type — assigned independently from the S1 marker; S0 when absent, S1 when the marker is present.
The combined serotype is reported as O:X-Sy-Rz (e.g. O:1-S0-R1V1).
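For downstream scripts, a combined label such as O:1-S0-R1V1 can be split back into its three components. The sketch below assumes the label always follows the O:X-Sy-Rz pattern shown above with numeric indices and an optional V suffix on the R component. How Partial or Ambiguous calls are written is not specified here, so anything else is returned as None. The `Serotype` type and `parse_serotype` function are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class Serotype(NamedTuple):
    o_type: str
    s_type: str
    r_type: str


# Assumed label grammar: O:<n>-S<n>-R<n>[V<n>]
_PATTERN = re.compile(r"^(O:\d+)-(S\d+)-(R\d+(?:V\d+)?)$")


def parse_serotype(label: str) -> Optional[Serotype]:
    """Split a combined serotype label into O, S and R parts, or return None."""
    match = _PATTERN.match(label.strip())
    return Serotype(*match.groups()) if match else None
```

For example, `parse_serotype("O:1-S0-R1V1")` yields the components O:1, S0 and R1V1.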
Locus analysis module (optional, --locus-analysis)
When a call is Resolved and the user has enabled this module, a second BLASTN search is run to align the genome(s) against the full O-antigen biosynthesis locus. This produces:
- a pairwise alignment text file,
- the extracted locus FASTA sequence,
- a publication-ready two-track PNG locus map.
Enable with --locus-analysis. Novel serotypes (not yet in the reference locus database) are flagged with a warning in Typing_warnings but are not blocked from receiving a type call.
FlavoTyper Databases
All reference data is bundled inside the FlavoTyper package. The bundled data directory can be retrieved with:
flavotyper data-dir
Flavotyper_markers.fasta
This file includes nucleotide sequences for all marker genes used by the typing module. The BLAST database is built from this file at runtime.
A marker is considered present when its BLASTN hit meets both thresholds: percent identity ≥ 97 % and marker coverage ≥ 94 % (adjustable via --min-identity and --min-coverage).
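The presence test amounts to two comparisons. The sketch below assumes that marker coverage means the number of marker bases included in the alignment divided by the full marker length; the exact definition used by FlavoTyper is not given here, and `marker_present` is a hypothetical name.

```python
def marker_present(
    percent_identity: float,
    aligned_marker_bases: int,
    marker_length: int,
    min_identity: float = 97.0,
    min_coverage: float = 94.0,
) -> bool:
    """Return True when a hit meets both the identity and coverage thresholds."""
    if marker_length <= 0:
        raise ValueError("marker_length must be positive")
    # Assumed definition: share of the marker covered by the alignment, in percent.
    coverage = 100.0 * aligned_marker_bases / marker_length
    return percent_identity >= min_identity and coverage >= min_coverage
```

Both thresholds are inclusive, so a hit at exactly 97 % identity and 94 % coverage counts as present.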
Markers currently covered:
O-type — each type (O:0–O:7) is detected by a unique wzy gene (wzy0–wzy7).
R-type — R0 is the default assignment for O:0 when no R markers are detected. R1 variants share a common r1_core marker and are further distinguished by: wfpF (R1V1); Rieske + wfpF_p within a distance of −6 to +6 bp of each other (R1V2); wfpF_pp (R1V3). R2, R3, and R4 are each assigned from a single marker: wfpH, wfpI, and r4_core respectively.
S-type — S1 is assigned when s1_core is detected; S0 when it is absent.
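The inter-marker distance rule for R1V2 can be illustrated with a short sketch. It assumes that distance is the signed gap between the end of the upstream hit and the start of the downstream hit on the same contig, so that a negative value means the hits overlap, and that hits on different contigs cannot satisfy the rule. Both points are assumptions, as are the `Hit` type and function names.

```python
from typing import NamedTuple, Optional


class Hit(NamedTuple):
    contig: str
    start: int  # 1-based, inclusive, with start <= end
    end: int    # 1-based, inclusive


def signed_gap(a: Hit, b: Hit) -> Optional[int]:
    """Bases between two hits on one contig; negative when they overlap."""
    if a.contig != b.contig:
        return None  # distance is undefined across contigs
    first, second = sorted((a, b), key=lambda hit: hit.start)
    return second.start - first.end - 1


def within_distance(a: Hit, b: Hit, low: int = -6, high: int = 6) -> bool:
    """Check the signed gap against an inclusive window (default -6 to +6 bp)."""
    gap = signed_gap(a, b)
    return gap is not None and low <= gap <= high
```

In a fragmented assembly the two markers may fall on separate contigs; in this sketch the rule then simply fails, which is one reason contig count matters for variant confirmation.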
Flavotyper_reference_loci.fasta
This file includes full nucleotide sequences of reference O-antigen biosynthesis loci for each known serotype, with embedded metadata (reference strain, genome coordinates, GenBank accession, PMID, and per-marker positions). Used by the locus analysis module.
Command reference
Run flavotyper type --help for the full CLI reference.
| Option | Default | Description |
|---|---|---|
| --genomes | required | One or more genome FASTA files, or a directory (.fa, .fna, .fasta, .fas, optionally .gz; discovered automatically) |
| --outdir | required | Output directory |
| --db | bundled | Path to the serotyping rules YAML |
| --species-refs | bundled | Reference FASTA for fastANI species check |
| --no-species-check | off | Disable ANI-based species verification |
| --ani-threshold | 95.0 | Minimum ANI to pass the species gate |
| --min-identity | 97.0 | Minimum BLASTN percent identity for marker hits |
| --min-coverage | 94.0 | Minimum marker coverage (%) for marker hits |
| --threads | 1 | Threads passed to BLASTN and fastANI |
| --locus-analysis | off | Enable locus comparison and PNG map generation |
| --locus-db | bundled | Override the bundled reference-locus FASTA |
| --allow-duplicate-sample-names | off | Allow duplicate IDs from filename stems |
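When FlavoTyper is called from a pipeline, building the argument list in code avoids shell quoting problems. The sketch below uses only flags listed in the table above; the wrapper functions `build_type_command` and `run_typing` are hypothetical and assume the flavotyper executable is on the PATH.

```python
import subprocess


def build_type_command(genomes, outdir, threads=1, locus_analysis=False):
    """Assemble the argument list for `flavotyper type`."""
    command = [
        "flavotyper", "type",
        "--genomes", str(genomes),
        "--outdir", str(outdir),
        "--threads", str(threads),
    ]
    if locus_analysis:
        command.append("--locus-analysis")
    return command


def run_typing(genomes, outdir, **options):
    """Run the command and raise CalledProcessError on a non-zero exit status."""
    return subprocess.run(build_type_command(genomes, outdir, **options), check=True)
```

Passing the arguments as a list rather than a single string means paths containing spaces need no extra quoting.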
Interpreting results
| Call_state | Meaning |
|---|---|
| Resolved | O-type and R-type were both uniquely assigned |
| Partial | One of O or R is Undefined; check Typing_warnings and assembly quality |
| Ambiguous | One of O or R matched multiple valid interpretations; check Alternative_serotypes |
| NotTyped | QC blocked typing; check QC_warnings and species fields |
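The table can be read as a small decision procedure. The sketch below is one possible reading, not the tool's rule engine: it assumes each component comes with a list of candidate assignments, that an empty list means Undefined, and that Ambiguous takes precedence over Partial when both conditions hold. The precedence and the function name `call_state` are assumptions.

```python
def call_state(o_candidates, r_candidates, qc_passed=True):
    """Derive a call state from candidate O-type and R-type assignments."""
    if not qc_passed:
        return "NotTyped"   # QC blocked typing
    if len(o_candidates) > 1 or len(r_candidates) > 1:
        return "Ambiguous"  # multiple valid interpretations for a component
    if not o_candidates or not r_candidates:
        return "Partial"    # at least one component is Undefined
    return "Resolved"       # both components uniquely assigned
```

For example, `call_state(["O:1"], ["R1V1", "R1V2"])` would give Ambiguous under these assumptions, pointing the user to Alternative_serotypes.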
Troubleshooting
For common errors and questions — installation failures, QC warnings, partial or ambiguous calls, locus analysis not running — see Troubleshooting.md.
Citation
If you use FlavoTyper in a publication or report, please cite the software metadata in CITATION.cff.
License
Apache-2.0. See LICENSE.