VICON - Viral Conserved Sequence Extraction Toolkit
VICON - Viral Sequence Analysis Toolkit
VICON is a Python package for processing and analyzing viral sequence data, with specialized tools for viral genome coverage analysis and sequence alignment.
Features
- Viral sequence alignment and coverage analysis
- K-mer analysis and sliding window coverage calculations
- Visualization tools for coverage plots
- Wrapper scripts for vsearch and viralmsa
Quick Install (pip)
vicon can be installed directly from PyPI.
1. Install external dependencies
Before installing vicon, make sure you have the following tools installed and available in your PATH:
- minimap2
- vsearch
- ViralMSA
Ubuntu / Debian
sudo apt-get update
sudo apt-get install -y minimap2 vsearch
macOS (Homebrew)
brew install minimap2 vsearch
ViralMSA
ViralMSA can be installed by downloading the script:
mkdir -p ~/bin ~/.local/bin && cd ~/bin
wget "https://raw.githubusercontent.com/niemasd/ViralMSA/master/ViralMSA.py"
chmod +x ViralMSA.py
ln -sf "$PWD/ViralMSA.py" ~/.local/bin/viralmsa
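Before moving on, it can help to confirm that all three tools are discoverable on your PATH. The following is a minimal sketch using only the Python standard library. The function name `missing_tools` is illustrative and not part of vicon, and the executable name `viralmsa` assumes the symlink created above.

```python
import shutil

# Executable names as used in this README; `viralmsa` assumes the symlink above.
REQUIRED_TOOLS = ("minimap2", "vsearch", "viralmsa")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the subset of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All external dependencies found.")
```

If `viralmsa` is reported as missing, check that `~/.local/bin` is included in your PATH.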
2. Install vicon from PyPI
python -m pip install --upgrade pip
pip install vicon
Standard Installation
- Create and activate a conda environment:
conda create -n vicon python=3.11
conda activate vicon
- Install VICON and its dependencies:
conda install -c conda-forge -c bioconda -c eka97 vicon
- Set required permissions:
chmod +x "$CONDA_PREFIX/bin/vicon-run"
chmod +x "$CONDA_PREFIX/bin/viralmsa"
chmod +x "$CONDA_PREFIX/bin/minimap2"
Usage
To run the VICON pipeline, use the following command:
vicon-run --config path/to/your/config.yaml
Input FASTA Preprocessing
Note:
When you run the pipeline, VICON will automatically preprocess your input FASTA files (both sample and reference) before any analysis.
This step:
- Converts all sequences to uppercase
- Cleans and standardizes FASTA headers
- Replaces any non-ATCG characters in sequences with 'N'
The cleaned files are used for all downstream analysis, so you do not need to manually edit or check your FASTA files for these issues.
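To make the three preprocessing rules concrete, here is a rough sketch of what such a cleaning pass could look like. It is not VICON's actual implementation: the function names are hypothetical, and the header rule (trim, then replace whitespace runs with underscores) is an assumption, since the exact header standardization is not specified here.

```python
import re


def clean_header(header):
    """Hypothetical header rule: trim and replace whitespace runs with '_'."""
    return re.sub(r"\s+", "_", header.strip())


def clean_sequence(seq):
    """Uppercase the sequence and replace every non-ATCG character with 'N'."""
    return re.sub(r"[^ATCG]", "N", seq.upper())


def clean_fasta(text):
    """Apply both rules to FASTA-formatted text, line by line."""
    cleaned = []
    for line in text.splitlines():
        if line.startswith(">"):
            cleaned.append(">" + clean_header(line[1:]))
        elif line.strip():  # skip blank lines
            cleaned.append(clean_sequence(line.strip()))
    return "\n".join(cleaned) + "\n"
```

For example, `clean_sequence("acgtRYn-")` returns `"ACGTNNNN"`: the four canonical bases are uppercased, and the ambiguity codes and gap character all become `N`.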
Example Configuration
Here's an example of what your configuration file (config.yaml) should look like:
project_path: "project_path"
virus_name: "orov"
input_sample: "data/orov/samples/samples.fasta"
input_reference: "data/orov/reference/reference.fasta"
email: "email@address.com"
kmer_size: 150
threshold: 147 # tolerance: 150 - 147 = 3 degenerate positions
l_gene_start: 8000
l_gene_end: 16000
coverage_ratio: 0.5
min_year: 2020
threshold_ratio: 0.01
drop_old_samples: false
drop_mischar_samples: true
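The relationship between `kmer_size` and `threshold` can be checked before launching a run. The sketch below works on a plain dictionary holding the values from the example above. The helper names and the sanity rules in `check_config` (gene start before gene end, ratios between 0 and 1) are assumptions for illustration, not checks that VICON is known to perform.

```python
def mismatch_tolerance(config):
    """Return kmer_size - threshold, the tolerance noted in the example comment."""
    kmer_size = config["kmer_size"]
    threshold = config["threshold"]
    if not 0 < threshold <= kmer_size:
        raise ValueError("threshold must be in the range 1..kmer_size")
    return kmer_size - threshold


def check_config(config):
    """Collect human-readable problems; these rules are assumptions, not VICON's."""
    problems = []
    if config["l_gene_start"] >= config["l_gene_end"]:
        problems.append("l_gene_start must be smaller than l_gene_end")
    for key in ("coverage_ratio", "threshold_ratio"):
        if not 0.0 <= config[key] <= 1.0:
            problems.append(f"{key} must be between 0 and 1")
    return problems


example = {
    "kmer_size": 150,
    "threshold": 147,
    "l_gene_start": 8000,
    "l_gene_end": 16000,
    "coverage_ratio": 0.5,
    "threshold_ratio": 0.01,
}
print(mismatch_tolerance(example))  # 3
print(check_config(example))        # []
```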
FASTA Header Year Extraction: Supported Formats
The pipeline automatically extracts years from FASTA headers using a two-step approach:
- Priority extraction: Years following separators (`|`, `_`, `/`, `-`)
- Fallback extraction: Any standalone 4-digit number between 1850-2030
| Header Example | Year Extracted? | Extracted Year | Reason |
|---|---|---|---|
| `>sample\|2021` | ✅ Yes | 2021 | After pipe separator |
| `>sample_2020` | ✅ Yes | 2020 | After underscore separator |
| `>sample/2019/data` | ✅ Yes | 2019 | After slash separator |
| `>sample-2022-final` | ✅ Yes | 2022 | After dash separator |
| `>data 2021 sequence` | ✅ Yes | 2021 | Standalone 4-digit number |
| `>sample.2020.version` | ✅ Yes | 2020 | Standalone 4-digit number |
| `>test2021extra` | ✅ Yes | 2021 | Standalone 4-digit number |
| `>sample\|202` | ❌ No | - | Only 3 digits |
| `>sample_1800_old` | ❌ No | - | Outside valid range (1850-2030) |
| `>sample20213long` | ❌ No | - | 5 consecutive digits |
Algorithm Details:
- Step 1: Searches for years immediately following separators (`|`, `_`, `/`, `-`)
- Step 2: If no separator-based year found, searches for any standalone 4-digit number
- Validation: All extracted years must be between 1850-2030
- Word boundaries: Ensures 4-digit numbers are standalone (letter→digit or digit→letter transitions count as word boundaries)
Best Practice: Use `|YYYY`, `_YYYY`, `/YYYY`, or `-YYYY` patterns for reliable year extraction.
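The two-step rule described above can be approximated with two regular expressions. This is a sketch that reproduces the documented examples, not VICON's own code: the function name `extract_year` and the pattern names are hypothetical, and the 1850-2030 bounds are assumed to be inclusive.

```python
import re

MIN_YEAR, MAX_YEAR = 1850, 2030  # bounds assumed inclusive

# Step 1: exactly four digits directly after one of the separators | _ / -
AFTER_SEPARATOR = re.compile(r"[|_/-](\d{4})(?!\d)")
# Step 2: exactly four digits with no digit on either side
STANDALONE = re.compile(r"(?<!\d)(\d{4})(?!\d)")


def extract_year(header):
    """Return the first valid year found in `header`, or None."""
    for pattern in (AFTER_SEPARATOR, STANDALONE):
        for match in pattern.finditer(header):
            year = int(match.group(1))
            if MIN_YEAR <= year <= MAX_YEAR:
                return year
    return None
```

With these patterns, `extract_year(">sample/2019/data")` returns `2019`, while `extract_year(">sample20213long")` returns `None` because the digit run is five characters long.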
License
This project is licensed under the terms of the MIT license.