VICON - Viral Conserved Sequence Extraction Toolkit
VICON - Viral Sequence Analysis Toolkit
VICON is a Python package for processing and analyzing viral sequence data, with specialized tools for viral genome coverage analysis and sequence alignment.
Features
- Viral sequence alignment and coverage analysis
- K-mer analysis and sliding window coverage calculations
- Visualization tools for coverage plots
- Wrapper scripts for vsearch and ViralMSA
Installation
Option 1: Conda (Recommended)
The easiest way to install VICON with all dependencies:
# Create and activate a new environment
conda create -n vicon python=3.11
conda activate vicon
# Install VICON and all dependencies
conda install -c conda-forge -c bioconda -c eka97 vicon
# Set required permissions
chmod +x "$CONDA_PREFIX/bin/vicon-run"
chmod +x "$CONDA_PREFIX/bin/viralmsa"
chmod +x "$CONDA_PREFIX/bin/minimap2"
Option 2: PyPI (pip)
Install from PyPI:
pip install vicon
Note: When installing via pip, you must manually install these external dependencies:
- minimap2 (≥2.30)
- vsearch
- ViralMSA
Installing External Dependencies
Ubuntu / Debian:
sudo apt-get update
sudo apt-get install -y minimap2 vsearch
macOS (Homebrew):
brew install minimap2 vsearch
ViralMSA:
mkdir -p ~/bin ~/.local/bin && cd ~/bin
wget "https://raw.githubusercontent.com/niemasd/ViralMSA/master/ViralMSA.py"
chmod +x ViralMSA.py
ln -sf "$PWD/ViralMSA.py" ~/.local/bin/viralmsa
Usage
Run the VICON pipeline with:
vicon-run --config path/to/your/config.yaml
Input FASTA Preprocessing
Note: VICON automatically preprocesses your input FASTA files (both sample and reference) before analysis:
- Converts all sequences to uppercase
- Cleans and standardizes FASTA headers
- Replaces any non-ATCG characters in sequences with 'N'
You do not need to manually edit or check your FASTA files for these issues.
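To make the three preprocessing steps concrete, here is a minimal sketch of what such a cleanup could look like. It is not VICON's actual implementation: the function names are hypothetical, and the header rule (trim, then collapse whitespace into underscores) is an assumption, since the exact header standardization is not described here.

```python
import re

# Anything other than A, C, G, or T is replaced with 'N' (applied after uppercasing).
_NON_ACGT = re.compile(r"[^ACGT]")


def clean_header(header: str) -> str:
    """Hypothetical header cleanup: trim and collapse whitespace into underscores.

    Assumption: VICON's real header standardization may differ.
    """
    return re.sub(r"\s+", "_", header.strip())


def preprocess_fasta_text(text: str) -> str:
    """Return FASTA text with cleaned headers and uppercase A/C/G/T/N sequences."""
    out = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line.startswith(">"):
            out.append(">" + clean_header(line[1:]))
        else:
            out.append(_NON_ACGT.sub("N", line.strip().upper()))
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    print(preprocess_fasta_text(">sample 1|2021\nacgtRYn-\n"), end="")
```

Note that in this sketch ambiguity codes (such as `R` or `Y`) and gap characters are all mapped to `N`, which matches the "non-ATCG" rule as stated above.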
Example Configuration
Create a configuration file (config.yaml):
project_path: "project_path"
virus_name: "orov"
input_sample: "data/orov/samples/samples.fasta"
input_reference: "data/orov/reference/reference.fasta"
email: "email@address.com"
kmer_size: 150
threshold: 147 # tolerance of 150 - 147 = 3 degenerate positions
l_gene_start: 8000
l_gene_end: 16000
coverage_ratio: 0.5
min_year: 2020
threshold_ratio: 0.01
drop_old_samples: false
drop_mischar_samples: true
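The relationship between `kmer_size` and `threshold` in the example can be checked with a few lines of Python. This is only an illustrative sketch: the helper name `degeneracy_tolerance` is hypothetical, and the rule that `threshold` must not exceed `kmer_size` is an assumption inferred from the comment in the example configuration, not a documented VICON check.

```python
def degeneracy_tolerance(config: dict) -> int:
    """Return kmer_size - threshold for a parsed configuration dictionary.

    Assumption: threshold is expected to lie in the range 1..kmer_size.
    """
    kmer_size = int(config["kmer_size"])
    threshold = int(config["threshold"])
    if not 0 < threshold <= kmer_size:
        raise ValueError(
            f"threshold ({threshold}) must be between 1 and kmer_size ({kmer_size})"
        )
    return kmer_size - threshold


if __name__ == "__main__":
    # Values taken from the example configuration above.
    example = {"kmer_size": 150, "threshold": 147}
    print(degeneracy_tolerance(example))  # 3
```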
FASTA Header Year Extraction
The pipeline automatically extracts years from FASTA headers using a two-step approach:
- Priority extraction: Years following separators (`|`, `_`, `/`, `-`)
- Fallback extraction: Any standalone 4-digit number between 1850-2030
| Header Example | Year Extracted? | Extracted Year | Reason |
|---|---|---|---|
| `>sample\|2021` | ✅ Yes | 2021 | After pipe separator |
| `>sample_2020` | ✅ Yes | 2020 | After underscore separator |
| `>sample/2019/data` | ✅ Yes | 2019 | After slash separator |
| `>sample-2022-final` | ✅ Yes | 2022 | After dash separator |
| `>data 2021 sequence` | ✅ Yes | 2021 | Standalone 4-digit number |
| `>sample.2020.version` | ✅ Yes | 2020 | Standalone 4-digit number |
| `>test2021extra` | ✅ Yes | 2021 | Standalone 4-digit number |
| `>sample\|202` | ❌ No | - | Only 3 digits |
| `>sample_1800_old` | ❌ No | - | Outside valid range (1850-2030) |
| `>sample20213long` | ❌ No | - | 5 consecutive digits |
Best Practice: Use `|YYYY`, `_YYYY`, `/YYYY`, or `-YYYY` patterns for reliable year extraction.
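The two-step rule can be approximated with two regular expressions. The sketch below is an assumption-based reconstruction that reproduces the behavior shown in the table; the function name `extract_year` and the exact patterns are hypothetical and may not match the regexes VICON uses internally.

```python
import re
from typing import Optional

MIN_YEAR, MAX_YEAR = 1850, 2030

# Step 1 (priority): exactly four digits directly after |, _, /, or -.
_PRIORITY = re.compile(r"[|_/\-](\d{4})(?!\d)")
# Step 2 (fallback): four digits that are not part of a longer digit run.
_FALLBACK = re.compile(r"(?<!\d)(\d{4})(?!\d)")


def extract_year(header: str) -> Optional[int]:
    """Return the first in-range year found in a FASTA header, or None."""
    for pattern in (_PRIORITY, _FALLBACK):
        for match in pattern.finditer(header):
            year = int(match.group(1))
            if MIN_YEAR <= year <= MAX_YEAR:
                return year
    return None


if __name__ == "__main__":
    for header in (">sample|2021", ">test2021extra", ">sample_1800_old", ">sample20213long"):
        print(header, extract_year(header))
```

The negative lookarounds are what reject `>sample20213long`: neither `2021` nor `0213` is a 4-digit run on its own, so no year is returned.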
License
This project is licensed under the terms of the MIT license.