Python bindings for the VCFX toolkit - comprehensive VCF manipulation and analysis

VCFX Python Package

Python bindings for the VCFX toolkit - a comprehensive VCF manipulation toolkit with 60+ specialized tools for genomic variant analysis.

Installation

pip install vcfx

For development or building from source:

git clone https://github.com/ieeta-pt/VCFX.git
cd VCFX
pip install -e ./python

Quick Start

import vcfx

# Basic helper functions
text = vcfx.trim("  hello  ")  # Returns "hello"
parts = vcfx.split("A,B,C", ",")  # Returns ["A", "B", "C"]

# Read compressed files
content = vcfx.read_file_maybe_compressed("variants.vcf.gz")

# Get version
version = vcfx.get_version()
print(f"VCFX version: {version}")
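
For intuition about what a helper such as read_file_maybe_compressed does, here is a minimal standard-library sketch. The name read_text_maybe_gzipped is hypothetical and not part of the vcfx package, and detecting compression from the gzip magic bytes is an assumption; VCFX may decide differently.

```python
import gzip
from pathlib import Path

# First two bytes of every gzip stream (bgzip-compressed VCFs share them).
GZIP_MAGIC = b"\x1f\x8b"


def read_text_maybe_gzipped(path: str) -> str:
    """Return the file's text, decompressing first if it looks like gzip.

    Hypothetical helper for illustration only; not the vcfx implementation.
    """
    raw = Path(path).read_bytes()
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    return raw.decode("utf-8")
```

Reading the whole file into memory keeps the sketch short; for large VCFs, streaming line by line with gzip.open would be the gentler choice.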

Tool Wrappers

VCFX provides Python wrappers for all command-line tools. These wrappers execute the tools and parse their output into structured Python objects:

# Count variants
count = vcfx.variant_counter("input.vcf")
print(f"Total variants: {count}")

# Calculate allele frequencies
freqs = vcfx.allele_freq_calc("input.vcf")
for freq in freqs:
    print(f"Position {freq.Pos}: AF={freq.Allele_Frequency}")

# Check sample concordance
concordance = vcfx.concordance_checker("input.vcf", "SAMPLE1", "SAMPLE2")
for row in concordance:
    print(f"{row.Position}: {row.Concordance}")
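
The "execute the tool, then parse its output" pattern described above can be sketched roughly as follows. This is not the package's actual implementation: run_tool and parse_tsv are hypothetical names, feeding the VCF on stdin is an assumption about how the command-line tools are invoked, and the sample text is invented purely to exercise the parser.

```python
import csv
import io
import subprocess


def parse_tsv(text: str) -> list[dict[str, str]]:
    """Parse tab-separated text with a header row into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader)


def run_tool(args: list[str], vcf_path: str) -> str:
    """Run a command with the VCF file on stdin and return its stdout.

    Assumption: the tool reads VCF from stdin and writes text to stdout.
    check=True raises subprocess.CalledProcessError on a non-zero exit code.
    """
    with open(vcf_path, "rb") as handle:
        completed = subprocess.run(
            args, stdin=handle, capture_output=True, check=True
        )
    return completed.stdout.decode("utf-8")


# Invented sample output, used only to show the parsing step.
sample_output = (
    "Chromosome\tPos\tAllele_Frequency\n"
    "1\t100\t0.2500\n"
    "1\t200\t0.5000\n"
)
rows = parse_tsv(sample_output)
print(rows[0]["Pos"], rows[0]["Allele_Frequency"])
```

Every value comes back as a string at this stage; converting columns to int or float is a separate step.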

Structured Data Types

Many tool wrappers return dataclass objects with typed fields for easy access:

import vcfx
from vcfx.results import AlleleFrequency, VariantClassification

# Allele frequency calculations return AlleleFrequency objects
freqs = vcfx.allele_freq_calc("variants.vcf")
# Access fields with type safety and IDE completion
print(freqs[0].Chromosome)  # str
print(freqs[0].Allele_Frequency)  # float

# Variant classification returns VariantClassification objects
classes = vcfx.variant_classifier("variants.vcf")
print(classes[0].Classification)  # 'SNP', 'INDEL', 'MNV', or 'STRUCTURAL'
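
To illustrate why typed fields help, the sketch below builds a small dataclass from string values such as those a tool would print. AlleleFrequencyRow is a hypothetical stand-in, not the real vcfx.results.AlleleFrequency class, which may define more or different fields; only the field names Chromosome, Pos, and Allele_Frequency are taken from the examples above, and the int type for Pos is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlleleFrequencyRow:
    """Hypothetical typed record; the real vcfx class may differ."""

    Chromosome: str
    Pos: int
    Allele_Frequency: float

    @classmethod
    def from_strings(cls, row: dict[str, str]) -> "AlleleFrequencyRow":
        # Convert each raw string to the type declared on the field.
        return cls(
            Chromosome=row["Chromosome"],
            Pos=int(row["Pos"]),
            Allele_Frequency=float(row["Allele_Frequency"]),
        )


record = AlleleFrequencyRow.from_strings(
    {"Chromosome": "1", "Pos": "100", "Allele_Frequency": "0.25"}
)
# Numeric comparison works because the field is a float, not a string.
print(record.Allele_Frequency < 0.5)
```

Doing the conversion once, at construction time, means downstream filters can compare numbers directly instead of re-parsing strings.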

Common Workflows

Quality Control Pipeline

import vcfx

# Validate VCF format
validation_report = vcfx.validator("input.vcf")
if "ERROR" in validation_report:
    print("VCF validation failed!")

# Detect missing data
missing_flagged = vcfx.missing_detector("input.vcf")

# Check concordance across samples
cross_concordance = vcfx.cross_sample_concordance("input.vcf")
discordant = [r for r in cross_concordance if r.Concordance_Status != 'CONCORDANT']

Filtering and Analysis

# Filter by allele frequency
af_filtered = vcfx.af_subsetter("input.vcf", "0.01-0.1")

# Extract specific samples
sample_vcf = vcfx.sample_extractor("input.vcf", ["SAMPLE1", "SAMPLE2"])

# Calculate Hardy-Weinberg equilibrium
hwe_results = vcfx.hwe_tester("input.vcf")
significant = [r for r in hwe_results if r.HWE_pvalue < 0.05]

Population Genetics

# Infer ancestry
ancestry = vcfx.ancestry_inferrer("samples.vcf", "population_freqs.txt")
for sample in ancestry:
    print(f"{sample.Sample}: {sample.Inferred_Population}")

# Calculate inbreeding coefficients
inbreeding = vcfx.inbreeding_calculator("input.vcf", freq_mode="excludeSample")

Error Handling

Tool wrappers raise subprocess.CalledProcessError if the underlying tool fails:

import subprocess

import vcfx

try:
    result = vcfx.variant_counter("nonexistent.vcf")
except subprocess.CalledProcessError as e:
    print(f"Tool failed with exit code {e.returncode}")
    print(f"Error output: {e.stderr}")
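
The same exception can be observed without any VCFX tool installed. The sketch below uses the Python interpreter itself as a stand-in for a failing tool; run_checked is a hypothetical helper, and the error message and exit code are invented for the demonstration.

```python
import subprocess
import sys


def run_checked(args: list[str]) -> str:
    """Run a command and return stdout; raise CalledProcessError on failure."""
    completed = subprocess.run(args, capture_output=True, text=True, check=True)
    return completed.stdout


# Stand-in for a failing tool: writes to stderr and exits with code 3.
failing_command = [
    sys.executable,
    "-c",
    "import sys; sys.stderr.write('no such file\\n'); sys.exit(3)",
]

try:
    run_checked(failing_command)
except subprocess.CalledProcessError as e:
    exit_code, error_output = e.returncode, e.stderr
    print(f"Tool failed with exit code {exit_code}")
    print(f"Error output: {error_output.strip()}")
```

Note that subprocess.run raises FileNotFoundError, not CalledProcessError, when the executable itself cannot be found. Whether the vcfx wrappers surface that case unchanged is not stated here, so catching both may be prudent.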

Requirements

  • Python 3.10+
  • For tool wrappers: VCFX command-line tools must be installed and available in PATH
    • Install via conda: conda install -c bioconda vcfx
    • Or build from source (see documentation)

Available Tools

VCFX includes 60+ specialized tools organized into categories:

  • Analysis: allele frequencies, variant classification, HWE testing, LD calculation
  • Filtering: quality filters, population filters, missing data filters
  • Transformation: sample extraction, multiallelic splitting, normalization
  • Quality Control: concordance checking, validation, outlier detection
  • File Management: indexing, compression, merging, splitting
  • Annotation: custom annotation, INFO field processing

Use vcfx.available_tools() to list tools accessible in your environment.

License

MIT License - see LICENSE file for details.


