Tool for the integration of viral consensus sequences obtained by de novo and mapping strategies, supported by prior information.


PriorCons

Prior‑guided consensus integration for viral genomes

🧭 Introduction

PriorCons improves viral consensus sequences by filling masked regions (Ns) in a high-confidence consensus while preserving its reliability.

The software integrates three inputs:

A high‑confidence consensus sequence (FASTA) generated using a stringent pipeline. This sequence is trusted but may contain masked regions (Ns).
The reference genome used during assembly.
A candidate consensus sequence that is less conservative but potentially more informative (for example, produced with relaxed filtering or alternative assembly).

The objective is to fill gaps in the high-confidence consensus with information from the candidate sequence, but only where that information is supported by evolutionary evidence, so that coverage increases without introducing sequencing artefacts.

To achieve this, PriorCons uses evolutionary priors derived from a large collection of genomes of the same virus or subtype, aligned to the reference. These priors describe the variation expected at each position and provide the statistical thresholds that drive integration decisions.

📦 Installation

PriorCons can be installed via Conda (recommended for bioinformatics) or PyPI:

Using Conda

```bash
conda install -c bioconda priorcons
```


Using Pip

```bash
pip install priorcons
```


⚡ Quickstart + CLI Examples

Follow these steps to generate an integrated consensus using PriorCons.

1. Prepare the Priors Database

You need a collection of viral genomes (e.g., from GISAID or NCBI) of the same virus or subtype as your sample.

Alignment is critical: align the collection with MAFFT in reference-anchored mode (e.g. `--add --keeplength`) so that all coordinates stay consistent with the reference when building priors.
Include the reference: make sure your reference sequence is included in this FASTA file.
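Before building priors it can help to confirm that the alignment really is reference-anchored. The Python sketch below is not part of PriorCons; the function names and the reference ID `my_reference` are hypothetical. It checks that the reference record is present, that its row contains no gap characters, and that every record has the reference's length, which is what a `--keeplength` alignment against the reference should produce.

```python
from pathlib import Path


def read_fasta(path):
    """Parse a FASTA file into a dict {record_id: sequence}, keeping file order."""
    records = {}
    name, chunks = None, []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks).upper()
            # record ID = header text up to the first whitespace
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks).upper()
    return records


def check_priors_alignment(path, reference_id):
    """Return (n_records, alignment_length) or raise ValueError."""
    records = read_fasta(path)
    if reference_id not in records:
        raise ValueError(f"reference {reference_id!r} not found in {path}")
    ref_seq = records[reference_id]
    # with --keeplength against the reference, its row should stay ungapped
    if "-" in ref_seq:
        raise ValueError("reference row contains gaps; coordinates are not reference-anchored")
    bad = [rid for rid, seq in records.items() if len(seq) != len(ref_seq)]
    if bad:
        raise ValueError(f"{len(bad)} record(s) differ from the reference length {len(ref_seq)}")
    return len(records), len(ref_seq)
```

For example, `check_priors_alignment("database_aligned.fasta", "my_reference")` raises an error instead of letting a misaligned database reach `build-priors`.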

2. Build the Priors

Run the build-priors command to create the empirical distribution of variation.

```bash
priorcons build-priors --input database_aligned.fasta --output virus_priors.json
```

3. Run integrate-consensus

Once you have the priors, align your three sequences (Trusted, Candidate, and Reference) and run the integration.
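The three sequences first need to be in a single FASTA file. A minimal Python sketch for that, assuming each input is a single-record FASTA file; the function name and file names are hypothetical, and the record order (trusted, candidate, reference, as listed above) is an assumption, since this page does not say how PriorCons tells the three records apart.

```python
from pathlib import Path


def concatenate_fasta(paths, out="input.fasta"):
    """Validate single-record FASTA files, then write them in order to one file."""
    texts = []
    for path in paths:
        text = Path(path).read_text()
        n_records = sum(1 for line in text.splitlines() if line.startswith(">"))
        if n_records != 1:
            raise ValueError(f"{path}: expected 1 record, found {n_records}")
        # guarantee a newline between records
        texts.append(text if text.endswith("\n") else text + "\n")
    Path(out).write_text("".join(texts))
    return out


# Hypothetical file names:
# concatenate_fasta(["trusted.fasta", "candidate.fasta", "reference.fasta"])
```

All inputs are validated before anything is written, so a malformed file never leaves a half-written `input.fasta`.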

Alignment recommendation: because only three sequences are aligned, a slow, high-accuracy strategy is affordable. We recommend MAFFT's L-INS-i mode:

```bash
mafft --localpair --maxiterate 1000 input.fasta > aligned_input.fasta
```

Running the integration:

```bash
priorcons integrate-consensus \
    --aligned-fasta aligned_input.fasta \
    --priors virus_priors.json \
    --output integrated_consensus.fasta
```

🔬 Workflow Overview

PriorCons uses a window-based approach to statistically validate and fill gaps in viral assemblies.

Slide overlapping windows across the genome.
Detect windows with missing regions (Ns) in the trusted consensus.
Evaluate the corresponding candidate window using the priors.
Accept the candidate window only if its score is evolutionarily plausible, i.e. it does not exceed the statistical threshold.
Produce an integrated consensus with increased completeness and maintained accuracy.
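The workflow above can be sketched in Python as follows. This is not PriorCons' implementation: the function name `integrate`, the window and step sizes, and the rule that only positions masked as N in the trusted consensus are filled (and only with unambiguous candidate bases) are assumptions for illustration. Scoring and thresholds are passed in as callables so the loop does not depend on how the priors are stored.

```python
from typing import Callable

VALID = set("ACGT")


def integrate(trusted: str, candidate: str,
              score: Callable[[int, str], float],
              threshold: Callable[[int], float],
              window: int = 100, step: int = 50) -> str:
    """Fill Ns in `trusted` from `candidate` in windows whose score passes."""
    if len(trusted) != len(candidate):
        raise ValueError("sequences must be aligned to the same length")
    result = list(trusted)
    last_start = max(len(trusted) - window, 0)
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)  # make sure the end of the genome is covered
    for start in starts:
        end = start + window
        if "N" not in trusted[start:end]:
            continue  # nothing to fill in this window
        if score(start, candidate[start:end]) > threshold(start):
            continue  # atypical candidate window: reject it
        for i in range(start, min(end, len(trusted))):
            if result[i] == "N" and candidate[i] in VALID:
                result[i] = candidate[i]
    return "".join(result)
```

In this sketch an N is filled as soon as any accepted window covers it; a stricter variant could require every overlapping window that covers the position to pass.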

🧮 Methodology

1. Probability distributions per position

For each window of size $W$ bases, and each position $j$:

$$P_j(b)=\frac{c_j(b)+\alpha}{\sum_{x\in\{A,C,G,T\}}\left(c_j(x)+\alpha\right)}$$

Where:

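$c_j(b)$ is the number of aligned prior genomes carrying base $b$ at position $j$.
$\alpha$ is a pseudocount that keeps bases never observed at a position from receiving zero probability.
Ambiguous bases (N) are not counted.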
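A direct Python transcription of $P_j(b)$, as a sketch. Assumptions: `window_seqs` holds the prior genomes' sequences for one window, already cut from the reference-anchored alignment; `alpha=0.5` is an illustrative default, not necessarily PriorCons' value; and besides N, this sketch also skips gaps and IUPAC codes, which the text above does not specify.

```python
BASES = "ACGT"


def position_probabilities(window_seqs, alpha=0.5):
    """Return one {base: P_j(base)} dict per position j of the window."""
    if not window_seqs:
        raise ValueError("need at least one sequence")
    width = len(window_seqs[0])
    if any(len(s) != width for s in window_seqs):
        raise ValueError("all window sequences must have the same length")
    probs = []
    for j in range(width):
        counts = {b: 0 for b in BASES}
        for seq in window_seqs:
            base = seq[j].upper()
            if base in counts:  # N, gaps and other symbols are not counted
                counts[base] += 1
        total = sum(counts[b] + alpha for b in BASES)
        probs.append({b: (counts[b] + alpha) / total for b in BASES})
    return probs
```

For example, with `alpha=1` and window sequences `["AC", "AC", "AN"]`, the first position has three As, so $P(A) = (3+1)/(3+4) = 4/7$; at the second position the N is skipped, so $P(C) = (2+1)/(2+4) = 0.5$.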

2. Log‑likelihood of a sequence

Given a sequence $Q$:

$$\log L(Q \mid \text{window}) = \sum_j \log P_j(q_j)$$

Normalized negative log‑likelihood:

$$\text{nLL}(Q) = -\frac{1}{N_{\text{valid}}} \sum_j \log P_j(q_j)$$

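In both sums, $j$ runs over the $N_{\text{valid}}$ positions of the window at which $q_j$ is an unambiguous base (A, C, G or T); positions with N are skipped. Lower values indicate that $Q$ is more consistent with the variation observed in the priors.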
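The normalized score can be sketched in Python as below. Assumptions: `probs` is a list of per-position `{base: probability}` dicts for the window, as produced by $P_j(b)$; the natural logarithm is used because the document does not state the base; and returning infinity when a window has no valid positions (so it can never pass a threshold) is a design choice of this sketch.

```python
import math


def normalized_nll(query, probs):
    """nLL(Q): mean negative log-probability over unambiguous positions."""
    if len(query) != len(probs):
        raise ValueError("query and probability table must have equal length")
    total = 0.0
    n_valid = 0
    for q, p_j in zip(query.upper(), probs):
        if q not in "ACGT":
            continue  # N (and other symbols) do not contribute
        total += math.log(p_j[q])
        n_valid += 1
    if n_valid == 0:
        return math.inf
    return -total / n_valid
```

With a uniform $P_j$ of 0.25, every valid base contributes $\ln 4 \approx 1.386$, so the nLL is $\ln 4$ whatever $N_{\text{valid}}$ is; a window that matches the dominant base at each position scores lower.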

3. Empirical thresholds

Every sequence in the priors database is scored to obtain an empirical nLL distribution, and its 95th percentile is used as the cutoff: a candidate window whose nLL exceeds this threshold is considered atypical and is rejected during integration.
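A sketch of the cutoff and the accept/reject rule in Python. Assumptions: `prior_scores` are the nLL values of the prior-database sequences for one window; the percentile uses linear interpolation via `statistics.quantiles(..., method="inclusive")`, which may differ from the interpolation PriorCons actually uses; and infinite scores (e.g. from windows with no unambiguous bases) are dropped before computing it.

```python
import statistics


def nll_threshold(prior_scores, percentile=95):
    """Empirical percentile (1-99) of the finite prior nLL scores."""
    finite = [s for s in prior_scores if s != float("inf")]
    if len(finite) < 2:
        raise ValueError("need at least two finite scores")
    # quantiles(n=100) returns the 99 cut points P1..P99
    cut_points = statistics.quantiles(finite, n=100, method="inclusive")
    return cut_points[percentile - 1]


def accept_window(candidate_nll, prior_scores, percentile=95):
    """Accept the candidate window unless its nLL exceeds the cutoff."""
    return candidate_nll <= nll_threshold(prior_scores, percentile)
```

For prior scores 0, 1, ..., 100 the cutoff is 95.0, so a candidate scoring 94.0 is accepted and one scoring 95.5 is rejected.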

📊 Outputs

Integrated consensus FASTA: The final integrated sequence.
Window‑level QC trace: A file containing scores for each window.
Summary QC metrics: metrics on coverage and on the changes made to the trusted consensus.
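To compare the trusted and integrated consensus yourself, a simple completeness check can be sketched in Python. Assumptions: both sequences are position-matched strings of the same length, and "completeness" here means the fraction of positions that are A, C, G or T; this is illustrative and not necessarily how PriorCons defines coverage in its summary metrics.

```python
def completeness(seq):
    """Fraction of positions that are unambiguous bases."""
    if not seq:
        return 0.0
    return sum(1 for b in seq.upper() if b in "ACGT") / len(seq)


def summarize(trusted, integrated):
    """Completeness before/after integration and the number of Ns filled."""
    if len(trusted) != len(integrated):
        raise ValueError("sequences must be position-matched")
    filled = sum(1 for t, i in zip(trusted.upper(), integrated.upper())
                 if t == "N" and i in "ACGT")
    return {
        "trusted_completeness": completeness(trusted),
        "integrated_completeness": completeness(integrated),
        "positions_filled": filled,
    }
```

For example, `summarize("ACNN", "ACGN")` reports completeness rising from 0.5 to 0.75 with one position filled.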

📚 Citing

This software was developed by Germán Vallejo Palma at the Instituto de Salud Carlos III (ISCIII) — National Centre of Microbiology, Respiratory Viruses and Influenza Unit.

If you use this software in a publication, report, or product, please cite the appropriate authors and include the above attribution.

Project details

Release history

0.1.4 (this version): Feb 25, 2026
0.1.3: Feb 23, 2026
0.1.2: Jan 23, 2026
0.1.1: Dec 4, 2025
0.1.0: Oct 23, 2025

Download files

Source distribution: priorcons-0.1.4.tar.gz (18.5 kB), uploaded Feb 25, 2026
Built distribution: priorcons-0.1.4-py3-none-any.whl (18.8 kB, Python 3), uploaded Feb 25, 2026

Both files were uploaded via twine/6.2.0 CPython/3.13.2, without Trusted Publishing.

File hashes

Hashes for priorcons-0.1.4.tar.gz
SHA256: `c75580bb09ab44a35fd8bf634c24f59731624aca41b69f4eda5e49e54d56f27a`
MD5: `ee897da9b2561793d1a9ba2b805b104c`
BLAKE2b-256: `fd68da75b47bcf83a5128a0ff4f3adee3293219d99540bbef1566de6a0931635`

Hashes for priorcons-0.1.4-py3-none-any.whl
SHA256: `cd936693b792ec6b2bbf846e2d939bd753e8f2bc33e33bf7c81af1f2db91885a`
MD5: `29bb8ff7578175427740d83722c25303`
BLAKE2b-256: `4c7f0c571e8cc97f3a7c54f95c770958578f690ce8eaa7a1fed71731ede43767`
