VarAnnote - Comprehensive Variant Analysis & Annotation Suite
🧬 A powerful toolkit for genomic variant annotation and clinical interpretation.
Features
- Comprehensive Annotation: ClinVar, gnomAD, COSMIC, dbSNP integration
- Functional Prediction: Gene symbols, consequences, pathogenicity scores
- Multiple Output Formats: VCF, TSV, JSON
- Command Line Interface: Easy-to-use CLI with progress bars
- Modular Design: Each tool can be used independently
- Academic Ready: Designed for research and publication
Installation
Option 1: Install from PyPI (Recommended)
pip install varannote
Option 2: Install from Source
git clone https://github.com/AtaUmutOZSOY/VarAnnote.git
cd VarAnnote
pip install -e .
Windows PATH Configuration
VarAnnote automatically configures PATH on Windows during installation. If you encounter any issues:
- Restart your terminal after installation - this is usually enough
- Alternative: Use python -m (always works):
python -m varannote --help
python -m varannote annotate input.vcf --output output.vcf
- Manual setup (if needed):
python -m varannote setup-path
Verify Installation
# Test installation
varannote --version
# or
python -m varannote --version
# Test with help
varannote --help
Quick Start
Basic Variant Annotation
# Annotate variants with default databases
varannote annotate test_variants.vcf --output annotated.vcf
# Use specific databases
varannote annotate input.vcf -d clinvar -d gnomad --output result.vcf
# Output in different formats
varannote annotate input.vcf --format tsv --output result.tsv
varannote annotate input.vcf --format json --output result.json
Pathogenicity Prediction
# Predict pathogenicity using ensemble model
varannote pathogenicity variants.vcf --model ensemble
# Use specific model with custom threshold
varannote pathogenicity variants.vcf --model cadd --threshold 0.7
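The README does not describe how the ensemble model combines scores or how `--threshold` is applied. As a purely illustrative sketch, and not VarAnnote's actual model, the snippet below assumes each predictor returns a score normalized to the range 0 to 1, takes the unweighted mean as the ensemble score, and calls a variant pathogenic when that mean reaches the threshold. The function names and the predictor keys are hypothetical.

```python
from statistics import mean
from typing import Mapping


def ensemble_score(scores: Mapping[str, float]) -> float:
    """Unweighted mean of per-predictor scores (assumed to lie in [0, 1])."""
    if not scores:
        raise ValueError("at least one predictor score is required")
    for name, value in scores.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"score for {name!r} is outside [0, 1]: {value}")
    return mean(scores.values())


def is_pathogenic(scores: Mapping[str, float], threshold: float = 0.5) -> bool:
    """Return True when the ensemble score is at or above the threshold."""
    return ensemble_score(scores) >= threshold


if __name__ == "__main__":
    # Hypothetical predictor names and scores, for illustration only.
    example = {"predictor_a": 0.9, "predictor_b": 0.8, "predictor_c": 0.7}
    print(is_pathogenic(example, threshold=0.7))
```

Raising the threshold trades sensitivity for specificity: fewer variants are flagged, but those that are flagged have stronger agreement across predictors.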
Available Commands
varannote --help # Show all commands
varannote annotate --help # Annotation help
varannote pathogenicity --help # Pathogenicity prediction help
varannote pharmacogenomics --help # Pharmacogenomics analysis help
varannote population-freq --help # Population frequency help
varannote compound-het --help # Compound heterozygote detection help
varannote segregation --help # Family segregation analysis help
Command Reference
Main Commands
| Command | Description |
|---|---|
| annotate | Comprehensive variant annotation |
| pathogenicity | Pathogenicity prediction |
| pharmacogenomics | Drug-gene interaction analysis |
| population-freq | Population frequency calculation |
| compound-het | Compound heterozygote detection |
| segregation | Family segregation analysis |
Common Options
| Option | Description |
|---|---|
| --output, -o | Output file path |
| --format, -f | Output format (vcf, tsv, json) |
| --genome, -g | Reference genome (hg19, hg38) |
| --verbose, -v | Enable verbose output |
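When VarAnnote is driven from a pipeline script, the options above can be assembled into a command line programmatically. The sketch below is one way to do that with the standard library; `build_annotate_command` and `run_annotate` are hypothetical helpers, not part of VarAnnote, and they assume the `annotate` subcommand accepts the common options listed here plus the `-d` database flag shown in Quick Start.

```python
import subprocess
import sys
from typing import List, Optional, Sequence


def build_annotate_command(
    input_vcf: str,
    output: str,
    fmt: str = "vcf",
    genome: Optional[str] = None,
    databases: Sequence[str] = (),
    verbose: bool = False,
) -> List[str]:
    """Build an argv list for `python -m varannote annotate` (hypothetical helper)."""
    if fmt not in {"vcf", "tsv", "json"}:
        raise ValueError(f"unsupported output format: {fmt}")
    if genome is not None and genome not in {"hg19", "hg38"}:
        raise ValueError(f"unsupported reference genome: {genome}")

    # `python -m varannote` is used because the README notes it always works,
    # even when the `varannote` script is not on PATH.
    cmd = [sys.executable, "-m", "varannote", "annotate", input_vcf,
           "--output", output, "--format", fmt]
    if genome is not None:
        cmd += ["--genome", genome]
    for db in databases:
        cmd += ["-d", db]  # one -d flag per database, as in Quick Start
    if verbose:
        cmd.append("--verbose")
    return cmd


def run_annotate(**kwargs) -> None:
    """Run the annotation and raise CalledProcessError on a non-zero exit code."""
    subprocess.run(build_annotate_command(**kwargs), check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file paths that contain spaces.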
Input/Output Formats
Input
- VCF files (.vcf, .vcf.gz)
- Standard VCF format with CHROM, POS, REF, ALT fields
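To make the expected input concrete, here is a minimal sketch of reading the four required fields from a VCF file. It is not VarAnnote's own parser (`varannote/utils/vcf_parser.py`); the function names are hypothetical, and the code assumes tab-separated records in the standard column order CHROM, POS, ID, REF, ALT.

```python
import gzip
from typing import Dict, Iterable, Iterator, TextIO, Union


def open_vcf(path: str) -> TextIO:
    """Open a plain or gzip-compressed VCF file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def iter_variants(lines: Iterable[str]) -> Iterator[Dict[str, Union[str, int]]]:
    """Yield one record per alternate allele with CHROM, POS, REF and ALT."""
    for line in lines:
        # Skip meta-information (##...), the column header (#CHROM...) and blanks.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            raise ValueError(f"malformed VCF record: {line!r}")
        chrom, pos, _variant_id, ref, alt = fields[:5]
        # Multi-allelic sites list ALT alleles separated by commas.
        for allele in alt.split(","):
            yield {"CHROM": chrom, "POS": int(pos), "REF": ref, "ALT": allele}


if __name__ == "__main__":
    with open_vcf("test_variants.vcf") as handle:
        for variant in iter_variants(handle):
            print(variant)
```

Splitting multi-allelic sites into one record per allele is a design choice of this sketch; whether VarAnnote does the same is not stated in this README.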
Output
- VCF: Annotated VCF with INFO fields
- TSV: Tab-separated values for analysis
- JSON: Structured data for programmatic use
Annotation Databases
| Database | Description | Fields Added |
|---|---|---|
| ClinVar | Clinical significance | clinvar_significance, clinvar_id |
| gnomAD | Population frequencies | gnomad_af, gnomad_ac, gnomad_an |
| COSMIC | Cancer mutations | cosmic_id, cosmic_count |
| dbSNP | Variant identifiers | dbsnp_id |
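The added fields lend themselves to simple downstream filtering. The sketch below keeps variants whose `gnomad_af` is at or below a cutoff. It assumes the TSV output has a header row whose column names match the field names in the table above, and that a missing frequency appears as an empty cell or a placeholder such as `.`; neither detail is confirmed by this README, and both helper functions are hypothetical.

```python
import csv
from typing import Dict, Iterable, Iterator, List

# Placeholders assumed to mean "no gnomAD record" (an assumption, not documented).
MISSING = {"", ".", "NA", "None"}


def load_tsv(path: str) -> List[Dict[str, str]]:
    """Read a tab-separated file with a header row into a list of dicts."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def rare_variants(
    rows: Iterable[Dict[str, str]], max_af: float = 0.01
) -> Iterator[Dict[str, str]]:
    """Yield rows whose gnomad_af is missing or at most max_af."""
    for row in rows:
        raw = (row.get("gnomad_af") or "").strip()
        if raw in MISSING:
            # Absent from gnomAD: treated as rare in this sketch.
            yield row
            continue
        try:
            allele_frequency = float(raw)
        except ValueError:
            continue  # skip rows with an unparsable frequency
        if allele_frequency <= max_af:
            yield row


if __name__ == "__main__":
    for row in rare_variants(load_tsv("results.tsv"), max_af=0.001):
        print(row.get("clinvar_significance"), row.get("gnomad_af"))
```

Treating variants without a gnomAD record as rare is a common convention, but it should be revisited if the missing values instead reflect a failed lookup.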
Examples
Example 1: Basic Annotation
varannote annotate test_variants.vcf --output annotated.vcf --verbose
Output:
🧬 Annotating variants from test_variants.vcf
📊 Using genome: hg38
🗄️ Databases: clinvar, gnomad, dbsnp
🔧 Initialized VariantAnnotator with genome: hg38
📖 Reading variants from test_variants.vcf
🔍 Found 5 variants to annotate
Annotating variants [####################################] 100%
✅ Annotation complete: 5 variants processed
📁 Output saved to: annotated.vcf
Example 2: TSV Output for Analysis
varannote annotate test_variants.vcf --format tsv --output results.tsv
Example 3: Pathogenicity Analysis
varannote pathogenicity test_variants.vcf --model ensemble --threshold 0.6
Development
Project Structure
VarAnnote/
├── setup.py                  # Package configuration
├── requirements.txt          # Dependencies
├── README.md                 # This file
├── test_variants.vcf         # Test data
└── varannote/
    ├── __init__.py           # Main package
    ├── cli.py                # Command line interface
    ├── core/                 # Core functionality
    │   ├── annotator.py      # Variant annotation engine
    │   └── pathogenicity.py  # Pathogenicity prediction
    ├── tools/                # Individual tools
    │   ├── annotator.py      # Annotation tool
    │   └── ...               # Other tools
    └── utils/                # Utilities
        ├── vcf_parser.py     # VCF file parser
        └── annotation_db.py  # Database interface
Running Tests
# Install development dependencies
pip install -e ".[dev]"
# Run tests
pytest tests/
# Run with coverage
pytest --cov=varannote tests/
Contributing
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
Citation
If you use VarAnnote in your research, please cite:
APA Format:
Özsoy, A. U. (2025). VarAnnote: Comprehensive Variant Analysis & Annotation Suite (Version 1.0.0) [Computer software]. https://doi.org/10.5281/zenodo.15615370
BibTeX:
@software{ozsoy2025varannote,
author = {Özsoy, Ata Umut},
title = {VarAnnote: Comprehensive Variant Analysis \& Annotation Suite},
url = {https://github.com/AtaUmutOZSOY/VarAnnote},
doi = {10.5281/zenodo.15615370},
version = {1.0.0},
year = {2025}
}
IEEE Format:
A. U. Özsoy, "VarAnnote: Comprehensive Variant Analysis & Annotation Suite," Version 1.0.0, 2025, doi: 10.5281/zenodo.15615370. [Online]. Available: https://github.com/AtaUmutOZSOY/VarAnnote
License
This project is licensed under the MIT License - see the LICENSE file for details.
Contact
- Author: Ata Umut ÖZSOY
- Email: ataumut7@gmail.com
- GitHub: https://github.com/AtaUmutOZSOY/VarAnnote
Acknowledgments
- BioPython community for sequence analysis tools
- gnomAD consortium for population frequency data
- ClinVar team for clinical variant curation
- COSMIC database for cancer mutation data