VarAnnote - Comprehensive Variant Analysis & Annotation Suite

🧬 A powerful toolkit for genomic variant annotation and clinical interpretation.

Features

  • Comprehensive Annotation: ClinVar, gnomAD, COSMIC, dbSNP integration
  • Functional Prediction: Gene symbols, consequences, pathogenicity scores
  • Multiple Output Formats: VCF, TSV, JSON
  • Command Line Interface: Easy-to-use CLI with progress bars
  • Modular Design: Each tool can be used independently
  • Academic Ready: Designed for research and publication

Installation

From Source (Development)

git clone https://github.com/yourusername/varannote.git
cd varannote
pip install -e .

From PyPI

pip install varannote

Quick Start

Basic Variant Annotation

# Annotate variants with default databases
varannote annotate test_variants.vcf --output annotated.vcf

# Use specific databases
varannote annotate input.vcf -d clinvar -d gnomad --output result.vcf

# Output in different formats
varannote annotate input.vcf --format tsv --output result.tsv
varannote annotate input.vcf --format json --output result.json

Pathogenicity Prediction

# Predict pathogenicity using ensemble model
varannote pathogenicity variants.vcf --model ensemble

# Use specific model with custom threshold
varannote pathogenicity variants.vcf --model cadd --threshold 0.7
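
This README does not describe how the ensemble model combines scores or how --threshold is applied. Purely as an illustration of what a cutoff such as 0.7 does, the sketch below averages per-model scores and compares the result to a threshold. The function names, the unweighted mean, the [0, 1] scaling, and the "greater than or equal" comparison are all assumptions, not VarAnnote's documented behavior.

```python
from statistics import mean
from typing import Dict


def ensemble_score(scores: Dict[str, float]) -> float:
    """Unweighted mean of per-model scores (hypothetical combination rule).

    Each score is assumed to be already scaled to the range [0, 1].
    """
    if not scores:
        raise ValueError("at least one score is required")
    for name, value in scores.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"score for {name!r} is outside [0, 1]: {value}")
    return mean(scores.values())


def passes_threshold(score: float, threshold: float = 0.7) -> bool:
    """Return True when the score reaches the cutoff (assumed to be inclusive)."""
    return score >= threshold


if __name__ == "__main__":
    # Hypothetical model names and scores, for illustration only.
    combined = ensemble_score({"model_a": 0.9, "model_b": 0.6})
    print(combined, passes_threshold(combined, threshold=0.7))
```

Raising the threshold makes the call more conservative: fewer variants pass, at the cost of missing borderline ones.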

Available Commands

varannote --help                    # Show all commands
varannote annotate --help           # Annotation help
varannote pathogenicity --help      # Pathogenicity prediction help
varannote pharmacogenomics --help   # Pharmacogenomics analysis help
varannote population-freq --help    # Population frequency help
varannote compound-het --help       # Compound heterozygote detection help
varannote segregation --help        # Family segregation analysis help

Command Reference

Main Commands

Command            Description
annotate           Comprehensive variant annotation
pathogenicity      Pathogenicity prediction
pharmacogenomics   Drug-gene interaction analysis
population-freq    Population frequency calculation
compound-het       Compound heterozygote detection
segregation        Family segregation analysis

Common Options

Option          Description
--output, -o    Output file path
--format, -f    Output format (vcf, tsv, json)
--genome, -g    Reference genome (hg19, hg38)
--verbose, -v   Enable verbose output

Input/Output Formats

Input

  • VCF files (.vcf, .vcf.gz)
  • Standard VCF format with CHROM, POS, REF, ALT fields
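
To check that a file exposes the four required fields before annotating it, a small standalone reader can help. The sketch below uses only the Python standard library and is not part of VarAnnote; the names Variant, open_vcf, and parse_vcf_lines are hypothetical. It assumes tab-separated records with the standard column order (CHROM, POS, ID, REF, ALT) and splits multi-allelic ALT values into one record per allele.

```python
import gzip
from typing import Iterable, Iterator, NamedTuple, TextIO


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def open_vcf(path: str) -> TextIO:
    """Open a plain (.vcf) or gzip-compressed (.vcf.gz) file as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "r", encoding="utf-8")


def parse_vcf_lines(lines: Iterable[str]) -> Iterator[Variant]:
    """Yield one Variant per ALT allele, skipping header and blank lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # meta-information (##) and column header (#CHROM) lines
        fields = line.split("\t")
        if len(fields) < 5:
            continue  # not enough columns for CHROM, POS, ID, REF, ALT
        chrom, pos, _id, ref, alts = fields[:5]
        for alt in alts.split(","):
            yield Variant(chrom, int(pos), ref, alt)


if __name__ == "__main__":
    import sys

    # Usage: python check_vcf.py test_variants.vcf
    with open_vcf(sys.argv[1]) as handle:
        for variant in parse_vcf_lines(handle):
            print(variant)
```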

Output

  • VCF: Annotated VCF with INFO fields
  • TSV: Tab-separated values for analysis
  • JSON: Structured data for programmatic use

Annotation Databases

Database   Description              Fields Added
ClinVar    Clinical significance    clinvar_significance, clinvar_id
gnomAD     Population frequencies   gnomad_af, gnomad_ac, gnomad_an
COSMIC     Cancer mutations         cosmic_id, cosmic_count
dbSNP      Variant identifiers      dbsnp_id
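
The added fields lend themselves to simple downstream filtering. As a rough sketch, the code below reads TSV output and keeps variants that are rare in gnomAD and flagged pathogenic or likely pathogenic in ClinVar. It assumes the TSV has a header row whose column names match the field names in the table above, that missing values appear as an empty string, "." or "NA", and that significance labels follow ClinVar's usual wording. None of this is confirmed by the README, so check a real output file first. The helper names are hypothetical.

```python
import csv
from typing import Dict, Iterable, List, Optional

# Labels treated as a positive ClinVar call (assumed spelling, compared in lowercase).
PATHOGENIC_LABELS = {
    "pathogenic",
    "likely pathogenic",
    "pathogenic/likely pathogenic",
}


def to_float(value: Optional[str]) -> Optional[float]:
    """Convert a TSV cell to float, returning None for missing or invalid values."""
    if value is None:
        return None
    value = value.strip()
    if value in ("", ".", "NA"):
        return None
    try:
        return float(value)
    except ValueError:
        return None


def rare_clinvar_hits(
    rows: Iterable[Dict[str, str]], max_af: float = 0.01
) -> List[Dict[str, str]]:
    """Keep rows with gnomad_af <= max_af (or absent) and a pathogenic ClinVar label."""
    hits = []
    for row in rows:
        af = to_float(row.get("gnomad_af"))
        label = (row.get("clinvar_significance") or "").strip().lower().replace("_", " ")
        is_rare = af is None or af <= max_af  # absent from gnomAD counts as rare
        if is_rare and label in PATHOGENIC_LABELS:
            hits.append(row)
    return hits


if __name__ == "__main__":
    # result.tsv as produced by: varannote annotate input.vcf --format tsv --output result.tsv
    with open("result.tsv", newline="", encoding="utf-8") as handle:
        for hit in rare_clinvar_hits(csv.DictReader(handle, delimiter="\t")):
            print(hit)
```

Matching on an explicit set of labels, rather than searching for the substring "pathogenic", avoids accidentally keeping entries such as "Conflicting interpretations of pathogenicity".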

Examples

Example 1: Basic Annotation

varannote annotate test_variants.vcf --output annotated.vcf --verbose

Output:

🧬 Annotating variants from test_variants.vcf
📊 Using genome: hg38
🗄️  Databases: clinvar, gnomad, dbsnp
🔧 Initialized VariantAnnotator with genome: hg38
📖 Reading variants from test_variants.vcf
🔍 Found 5 variants to annotate
Annotating variants  [####################################]  100%
✅ Annotation complete: 5 variants processed
📁 Output saved to: annotated.vcf

Example 2: TSV Output for Analysis

varannote annotate test_variants.vcf --format tsv --output results.tsv

Example 3: Pathogenicity Analysis

varannote pathogenicity test_variants.vcf --model ensemble --threshold 0.6

Development

Project Structure

VarAnnote/
├── setup.py                    # Package configuration
├── requirements.txt            # Dependencies
├── README.md                   # This file
├── test_variants.vcf          # Test data
└── varannote/
    ├── __init__.py            # Main package
    ├── cli.py                 # Command line interface
    ├── core/                  # Core functionality
    │   ├── annotator.py       # Variant annotation engine
    │   └── pathogenicity.py   # Pathogenicity prediction
    ├── tools/                 # Individual tools
    │   ├── annotator.py       # Annotation tool
    │   └── ...                # Other tools
    └── utils/                 # Utilities
        ├── vcf_parser.py      # VCF file parser
        └── annotation_db.py   # Database interface

Running Tests

# Install development dependencies
pip install -e ".[dev]"

# Run tests
pytest tests/

# Run with coverage
pytest --cov=varannote tests/

Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Citation

If you use VarAnnote in your research, please cite:

Özsoy, A.U. (2024). VarAnnote: Comprehensive Variant Analysis & Annotation Suite. 
GitHub repository: https://github.com/yourusername/varannote

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

  • BioPython community for sequence analysis tools
  • gnomAD consortium for population frequency data
  • ClinVar team for clinical variant curation
  • COSMIC database for cancer mutation data

