Convert VCF files to BedGraph format

vcf2bedgraph

Convert DeepVariant VCF files to BedGraph format with Variant Allele Frequency (VAF) values. The output is automatically compressed with bgzip and indexed with tabix for efficient querying.

Features

  • 🚀 Efficient streaming: Processes large VCF files without loading everything into memory
  • 🧬 DeepVariant support: Extracts VAF directly from DeepVariant FORMAT fields
  • 📦 Automatic compression: Output is bgzipped and indexed by default
  • 🎯 Flexible filtering: Control quality thresholds with command-line parameters
  • 📊 BedGraph format: Standard genomics format compatible with UCSC and other tools

Installation

From PyPI

pip install vcf2bedgraph

From source with uv

git clone https://github.com/fcliquet/vcf2bedgraph.git
cd vcf2bedgraph
uv sync

Usage

Basic usage

vcf2bedgraph input.vcf.gz -o output.bedgraph

This will:

  1. Process the VCF file
  2. Extract VAF for the first sample
  3. Apply default filters (QUAL >= 20, GQ >= 0, DP >= 10)
  4. Compress output to output.bedgraph.gz
  5. Create tabix index output.bedgraph.gz.tbi
  6. Remove the uncompressed file
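
Steps 1 and 2 can be illustrated without any dependencies. The sketch below is not the package's code (the tool itself relies on cyvcf2); it only shows where the VAF value sits in a VCF data line, assuming the FORMAT column lists a VAF key as DeepVariant output does. The function name first_sample_vaf, the choice to take the first value at multi-allelic sites, and the example record are all made up for illustration.

```python
from typing import Optional


def first_sample_vaf(vcf_line: str) -> Optional[float]:
    """Return the VAF of the first sample in one VCF data line, or None."""
    fields = vcf_line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return None  # no FORMAT/sample columns present
    keys = fields[8].split(":")    # FORMAT column, e.g. GT:GQ:DP:AD:VAF:PL
    values = fields[9].split(":")  # first sample column
    sample = dict(zip(keys, values))
    raw = sample.get("VAF", ".")
    # Assumption: at multi-allelic sites (one VAF per ALT allele) keep the first.
    first = raw.split(",")[0]
    if first in ("", "."):
        return None
    return float(first)


# Hypothetical record, made up for illustration only.
line = (
    "chr1\t15274\t.\tA\tT\t35.2\tPASS\t.\t"
    "GT:GQ:DP:AD:VAF:PL\t0/1:30:16:7,9:0.5625:35,0,40"
)
print(first_sample_vaf(line))
```

A record without sample columns, or with a missing VAF value, yields None in this sketch; how the real tool treats such records is not stated here.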

Command-line options

vcf2bedgraph [-h] [-o OUTPUT] [--filter-gq FILTER_GQ]
             [--filter-dp FILTER_DP] [--filter-qual FILTER_QUAL]
             [--no-compress] VCF

Arguments:

  • VCF: Path to the input VCF file (required)
  • -o, --output OUTPUT: Path to the output BedGraph file (required)
  • --filter-gq FILTER_GQ: Minimum genotype quality threshold (default: 0)
  • --filter-dp FILTER_DP: Minimum depth threshold (default: 10)
  • --filter-qual FILTER_QUAL: Minimum variant quality threshold (default: 20)
  • --no-compress: Skip compression and indexing of the output BedGraph file
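
For readers who want to script around the tool, the option set above maps naturally onto argparse. The following is a minimal sketch of an equivalent parser, not the package's actual source: the function name build_parser and the argument types (float for the quality thresholds, int for depth) are assumptions. It marks -o as required, following the argument list, although the usage line shows it in brackets.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (types are assumed)."""
    parser = argparse.ArgumentParser(prog="vcf2bedgraph")
    parser.add_argument("vcf", metavar="VCF", help="Path to the input VCF file")
    parser.add_argument("-o", "--output", required=True,
                        help="Path to the output BedGraph file")
    parser.add_argument("--filter-gq", type=float, default=0,
                        help="Minimum genotype quality threshold")
    parser.add_argument("--filter-dp", type=int, default=10,
                        help="Minimum depth threshold")
    parser.add_argument("--filter-qual", type=float, default=20,
                        help="Minimum variant quality threshold")
    parser.add_argument("--no-compress", action="store_true",
                        help="Skip compression and indexing")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(
    ["sample.vcf.gz", "-o", "sample.bedgraph", "--filter-qual", "30"]
)
print(args.vcf, args.output, args.filter_qual, args.filter_dp, args.no_compress)
```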

Examples

Basic conversion with defaults

vcf2bedgraph sample.vcf.gz -o sample.bedgraph

Output: sample.bedgraph.gz (compressed) and sample.bedgraph.gz.tbi (index)

Custom filters

vcf2bedgraph sample.vcf.gz -o sample.bedgraph \
  --filter-qual 30 --filter-gq 20 --filter-dp 15

Skip compression

vcf2bedgraph sample.vcf.gz -o sample.bedgraph --no-compress

Output: sample.bedgraph (uncompressed only)

Query the indexed file

python3 << 'EOF'
import pysam
tbx = pysam.TabixFile('sample.bedgraph.gz')
for record in tbx.fetch('chr1', 1000, 2000):
    print(record)
EOF
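
If pysam is not available, the compressed output can still be read sequentially, because bgzip files are valid multi-member gzip streams. The sketch below uses only the standard library and scans every line, so it does not benefit from the tabix index. The helper name scan_region is hypothetical, and the demo writes its own small file (the three rows shown under Output Format) rather than reading real output.

```python
import gzip
import os
import tempfile
from typing import Iterator, Tuple

Record = Tuple[str, int, int, float]


def scan_region(path: str, chrom: str, start: int, end: int) -> Iterator[Record]:
    """Yield BedGraph records overlapping [start, end) via a linear scan."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            # Skip blank lines and optional header lines.
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            c, s, e, value = line.split()[:4]
            if c == chrom and int(s) < end and int(e) > start:
                yield c, int(s), int(e), float(value)


# Demo on a throwaway gzip file; a real run would point at sample.bedgraph.gz.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.bedgraph.gz")
    with gzip.open(demo_path, "wt") as out:
        out.write("chr1\t15273\t15274\t0.5625\n")
        out.write("chr1\t15819\t15820\t0.7317\n")
        out.write("chr1\t47959\t47960\t1.0000\n")
    hits = list(scan_region(demo_path, "chr1", 15000, 16000))

for hit in hits:
    print(hit)
```

For large files the indexed pysam query remains the better choice, since it reads only the blocks covering the requested region.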

Output Format

The output is a standard BedGraph file with the following columns:

chr1  15273  15274  0.5625
chr1  15819  15820  0.7317
chr1  47959  47960  1.0000

  • Column 1: Chromosome
  • Column 2: Start position (0-based, converted from the 1-based VCF position)
  • Column 3: End position (exclusive)
  • Column 4: VAF (Variant Allele Frequency, 0.0-1.0)
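
The coordinate conversion is a common source of off-by-one errors, so a small worked example may help. This sketch assumes a single-base variant and four-decimal formatting, both inferred from the example rows above; how the tool handles longer variants such as indels is not covered. The function name to_bedgraph_line is hypothetical.

```python
def to_bedgraph_line(chrom: str, pos: int, vaf: float) -> str:
    """Format one single-base variant as a BedGraph row.

    pos is the 1-based VCF POS; BedGraph intervals are 0-based and half-open.
    """
    start = pos - 1  # 1-based VCF position -> 0-based start
    end = pos        # exclusive end, so the interval covers exactly one base
    return f"{chrom}\t{start}\t{end}\t{vaf:.4f}"


# A variant at VCF position 15274 lands on the interval [15273, 15274).
print(to_bedgraph_line("chr1", 15274, 0.5625))
```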

Filtering

The tool applies the following filters to variants:

  1. FILTER column: Only PASS variants are included
  2. QUAL: Minimum variant quality (default: 20)
  3. GQ: Minimum genotype quality (default: 0)
  4. DP: Minimum depth (default: 10)

Adjust these with command-line options to match your quality requirements.
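
The four filters combine as a simple conjunction: a variant is kept only if it passes all of them. The predicate below is an illustrative sketch rather than the tool's code. The name passes_filters is hypothetical, and treating missing QUAL, GQ, or DP values as failures is an assumption.

```python
from typing import Optional


def passes_filters(
    filter_field: str,
    qual: Optional[float],
    gq: Optional[float],
    dp: Optional[int],
    min_qual: float = 20.0,
    min_gq: float = 0.0,
    min_dp: int = 10,
) -> bool:
    """Return True if a variant passes all four documented filters."""
    if filter_field != "PASS":
        return False
    # Assumption: a missing value (None) fails the corresponding filter.
    if qual is None or qual < min_qual:
        return False
    if gq is None or gq < min_gq:
        return False
    if dp is None or dp < min_dp:
        return False
    return True


print(passes_filters("PASS", 35.2, 30, 16))     # kept with default thresholds
print(passes_filters("PASS", 35.2, 30, 9))      # dropped: depth below 10
print(passes_filters("RefCall", 35.2, 30, 16))  # dropped: FILTER is not PASS
```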

Publishing to PyPI

Prerequisites

  1. PyPI Account: Create an account at pypi.org
  2. GitHub Trusted Publisher: Configure PyPI to trust GitHub Actions
    • Go to PyPI project settings
    • Add a trusted publisher with:
      • Owner: fcliquet
      • Repository: vcf2bedgraph
      • Workflow name: publish.yml
      • Environment name: pypi

Versioning

The version is managed directly in pyproject.toml and read dynamically by the package at runtime using importlib.metadata.
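
Reading the version through importlib.metadata typically looks like the sketch below. This shows the general pattern, not the package's actual module; the helper name package_version and the fallback string used when the distribution is not installed are assumptions.

```python
from importlib.metadata import PackageNotFoundError, version


def package_version(dist_name: str = "vcf2bedgraph") -> str:
    """Return the installed version of dist_name, or a placeholder."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # Assumed fallback, e.g. when running from an uninstalled checkout.
        return "0.0.0+unknown"


print(package_version())
```

Because the value comes from the installed distribution's metadata, it reflects the version in pyproject.toml at build or install time, with no separate version constant to keep in sync.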

To change the version, use the uv version command, which updates pyproject.toml and re-locks uv.lock.

Bumping the version

Use the uv version command to bump versions automatically:

# Show current version
uv version

# Bump patch version (0.1.0 -> 0.1.1)
uv version --bump patch

# Bump minor version (0.1.0 -> 0.2.0)
uv version --bump minor

# Bump major version (0.1.0 -> 1.0.0)
uv version --bump major

# Set a specific version
uv version 0.2.0

The uv version command automatically:

  • Updates pyproject.toml
  • Re-locks dependencies (uv.lock)

Creating a release

  1. Bump the version:
uv version --bump minor
  2. Commit the changes:
git add pyproject.toml uv.lock
git commit -m "chore: bump version to $(uv version --short)"
  3. Create a git tag and push it:
git tag v$(uv version --short)
git push origin main
git push origin v$(uv version --short)

Publishing workflow

When you push a tag matching v*, GitHub Actions automatically:

  1. Builds the package (wheel and sdist)
  2. Tests the build
  3. Publishes to PyPI using trusted publisher authentication

Monitor progress in the repository's GitHub Actions tab.

Semantic versioning guide

  • Patch (0.0.X): Bug fixes, minor improvements
  • Minor (0.X.0): New features, backward compatible
  • Major (X.0.0): Breaking changes
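
The bump rules can be stated precisely in a few lines: bumping a component increments it and resets every component to its right. This sketch is only a model of the arithmetic for plain MAJOR.MINOR.PATCH strings, not how uv implements the command; the function name bump is hypothetical.

```python
def bump(version: str, part: str) -> str:
    """Bump a plain MAJOR.MINOR.PATCH version string."""
    major, minor, patch = (int(x) for x in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


for part in ("patch", "minor", "major"):
    print(part, bump("0.1.0", part))
```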

Development

Setup development environment

git clone https://github.com/fcliquet/vcf2bedgraph.git
cd vcf2bedgraph
uv sync

Run the CLI in development mode

uv run vcf2bedgraph --help
uv run vcf2bedgraph tests/B00EYQO.vcf.gz -o output.bedgraph

Dependencies

  • cyvcf2: High-performance VCF parsing
  • pysam: Compression and indexing with htslib

License

MIT License - see LICENSE file for details

Author

Created by fcliquet
