VICON - Viral Conserved Sequence Extraction Toolkit

VICON - Viral Sequence Analysis Toolkit

VICON is a Python package for processing and analyzing viral sequence data, with specialized tools for viral genome coverage analysis and sequence alignment.

Features

  • Viral sequence alignment and coverage analysis
  • K-mer analysis and sliding window coverage calculations
  • Visualization tools for coverage plots
  • Wrapper scripts for vsearch and viralmsa

Quick Install (pip)

vicon can be installed directly from PyPI.


1. Install external dependencies

Before installing vicon, make sure you have the following tools installed and available in your PATH:

  • minimap2
  • vsearch
  • ViralMSA

Ubuntu / Debian

sudo apt-get update
sudo apt-get install -y minimap2 vsearch

macOS (Homebrew)

brew install minimap2 vsearch

ViralMSA

ViralMSA can be installed by downloading the script:

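# Create the download and link directories (~/.local/bin must be on your PATH)
mkdir -p ~/bin ~/.local/bin && cd ~/bin
wget "https://raw.githubusercontent.com/niemasd/ViralMSA/master/ViralMSA.py"
chmod +x ViralMSA.py
# Expose the script as the `viralmsa` command
ln -sf "$PWD/ViralMSA.py" ~/.local/bin/viralmsa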
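
Once the tools are installed, a quick check that they are visible on your PATH can save a failed pipeline run later. The snippet below is a minimal sketch, not part of vicon: the function name `missing_tools` is hypothetical, and the executable names are taken from the install steps above (`viralmsa` being the symlink created in the last command).

```python
import shutil

# Executable names as used in the install steps above.
REQUIRED_TOOLS = ("minimap2", "vsearch", "viralmsa")


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Return the tools that cannot be found on PATH (hypothetical helper)."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All external dependencies found.")
```

The `which` parameter exists only so the lookup can be replaced in tests; in normal use the default `shutil.which` is sufficient.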

2. Install vicon from PyPI

python -m pip install --upgrade pip
python -m pip install vicon

Standard Installation

  1. Create and activate a conda environment:

    conda create -n vicon python=3.11
    conda activate vicon
    
  2. Install VICON and its dependencies:

    conda install -c conda-forge -c bioconda -c eka97 vicon
    
  3. Set required permissions:

    chmod +x "$CONDA_PREFIX/bin/vicon-run"
    chmod +x "$CONDA_PREFIX/bin/viralmsa"
    chmod +x "$CONDA_PREFIX/bin/minimap2"
    

Usage

To run the VICON pipeline, use the following command:

vicon-run --config path/to/your/config.yaml

Input FASTA Preprocessing

Note:
When you run the pipeline, VICON will automatically preprocess your input FASTA files (both sample and reference) before any analysis.
This step:

  • Converts all sequences to uppercase
  • Cleans and standardizes FASTA headers
  • Replaces any non-ATCG characters in sequences with 'N'

The cleaned files are used for all downstream analysis, so you do not need to manually edit or check your FASTA files for these issues.
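
To make the three preprocessing steps concrete, here is a minimal sketch of what such a cleaning pass could look like. It is not VICON's actual implementation: the function name `preprocess_fasta_lines` is hypothetical, and the header cleanup shown (trimming and collapsing whitespace) is an assumption, since the exact header rules are not described here.

```python
import re
from typing import Iterable, Iterator

# Anything other than A, T, C or G is replaced (this includes existing N and gaps).
NON_ATCG = re.compile(r"[^ATCG]")


def preprocess_fasta_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield cleaned FASTA lines (hypothetical sketch of the steps above)."""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # drop blank lines
        if line.startswith(">"):
            # Assumed header cleanup: trim and collapse runs of whitespace.
            yield ">" + " ".join(line[1:].split())
        else:
            # Uppercase first, then mask every non-ATCG character with 'N'.
            yield NON_ATCG.sub("N", line.upper())


cleaned = list(preprocess_fasta_lines([">  sample_2020   x ", "acgtRYn-", "", "ATCG"]))
print(cleaned)
```

Note that uppercasing happens before masking, so lowercase bases such as `acgt` are kept rather than replaced.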

Example Configuration

Here's an example of what your configuration file (config.yaml) should look like:

project_path: "project_path"
virus_name: "orov"
input_sample: "data/orov/samples/samples.fasta"
input_reference: "data/orov/reference/reference.fasta"
email: "email@address.com"
kmer_size: 150
threshold: 147 # allows a tolerance of 150 - 147 = 3 degenerate positions
l_gene_start: 8000
l_gene_end: 16000
coverage_ratio: 0.5
min_year: 2020
threshold_ratio: 0.01
drop_old_samples: false
drop_mischar_samples: true
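
The relationship between `kmer_size` and `threshold` can be checked before a run. The sketch below works on a plain dictionary holding some of the values from the example above. The helper names and the specific sanity checks (ratios between 0 and 1, gene start before gene end) are assumptions for illustration; VICON may validate its configuration differently or not at all.

```python
from typing import Dict, List

# A subset of the example configuration, as a plain dictionary.
EXAMPLE_CONFIG = {
    "kmer_size": 150,
    "threshold": 147,
    "l_gene_start": 8000,
    "l_gene_end": 16000,
    "coverage_ratio": 0.5,
    "threshold_ratio": 0.01,
}


def tolerance(cfg: Dict) -> int:
    """Tolerance as described in the config comment: kmer_size - threshold."""
    return cfg["kmer_size"] - cfg["threshold"]


def check_config(cfg: Dict) -> List[str]:
    """Return a list of problems found (assumed checks, not VICON's own)."""
    problems = []
    if not 0 < cfg["threshold"] <= cfg["kmer_size"]:
        problems.append("threshold must be positive and at most kmer_size")
    if not cfg["l_gene_start"] < cfg["l_gene_end"]:
        problems.append("l_gene_start must be smaller than l_gene_end")
    for key in ("coverage_ratio", "threshold_ratio"):
        if not 0 <= cfg[key] <= 1:
            problems.append(f"{key} must be between 0 and 1")
    return problems


print(tolerance(EXAMPLE_CONFIG), check_config(EXAMPLE_CONFIG))
```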

FASTA Header Year Extraction: Supported Formats

The pipeline automatically extracts years from FASTA headers using a two-step approach:

  1. Priority extraction: Years following separators (|, _, /, -)
  2. Fallback extraction: Any standalone 4-digit number between 1850-2030

Header Example          Year Extracted?   Extracted Year   Reason
>sample|2021            ✅ Yes            2021             After pipe separator
>sample_2020            ✅ Yes            2020             After underscore separator
>sample/2019/data       ✅ Yes            2019             After slash separator
>sample-2022-final      ✅ Yes            2022             After dash separator
>data 2021 sequence     ✅ Yes            2021             Standalone 4-digit number
>sample.2020.version    ✅ Yes            2020             Standalone 4-digit number
>test2021extra          ✅ Yes            2021             Standalone 4-digit number
>sample|202             ❌ No             -                Only 3 digits
>sample_1800_old        ❌ No             -                Outside valid range (1850-2030)
>sample20213long        ❌ No             -                5 consecutive digits

Algorithm Details:

  • Step 1: Searches for years immediately following separators (|, _, /, -)
  • Step 2: If no separator-based year found, searches for any standalone 4-digit number
  • Validation: All extracted years must be between 1850-2030
  • Word boundaries: Ensures 4-digit numbers are standalone (letter→digit or digit→letter transitions count as word boundaries)

Best Practice: Use |YYYY, _YYYY, /YYYY, or -YYYY patterns for reliable year extraction.
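
The two-step approach described above can be sketched with two regular expressions. This is an illustrative reimplementation based only on the rules listed in this section, not VICON's source code; the name `extract_year` and the exact patterns are assumptions.

```python
import re
from typing import Optional

MIN_YEAR, MAX_YEAR = 1850, 2030

# Step 1: four digits directly after |, _, / or -, not followed by another digit.
SEPARATOR_YEAR = re.compile(r"[|_/\-](\d{4})(?!\d)")
# Step 2: any run of exactly four digits (no digit immediately before or after).
STANDALONE_YEAR = re.compile(r"(?<!\d)(\d{4})(?!\d)")


def extract_year(header: str) -> Optional[int]:
    """Return the first valid year in a FASTA header, or None (sketch)."""
    for pattern in (SEPARATOR_YEAR, STANDALONE_YEAR):
        for match in pattern.finditer(header):
            year = int(match.group(1))
            if MIN_YEAR <= year <= MAX_YEAR:
                return year
    return None


for header in (">sample_2020", ">test2021extra", ">sample_1800_old", ">sample20213long"):
    print(header, extract_year(header))
```

The digit lookarounds are what reject `>sample20213long`: neither `2021` nor `0213` is a run of exactly four digits, while letters on either side of `2021` in `>test2021extra` do not block the match.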

License

This project is licensed under the terms of the MIT license.
