
Convert VCF files to BedGraph format


vcf2bedgraph

Convert DeepVariant VCF files to BedGraph format with Variant Allele Frequency (VAF) values. The output is automatically compressed with bgzip and indexed with tabix for efficient querying.

Features

  • 🚀 Efficient streaming: Processes large VCF files without loading everything into memory
  • 🧬 DeepVariant support: Extracts VAF directly from DeepVariant FORMAT fields
  • 📦 Automatic compression: Output is bgzipped and indexed by default
  • 🎯 Flexible filtering: Control quality thresholds with command-line parameters
  • 📊 BedGraph format: Standard genomics format compatible with UCSC and other tools
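The streaming and VAF-extraction points can be illustrated with a minimal, stdlib-only sketch. It is not the tool's own code (vcf2bedgraph parses VCFs with cyvcf2, per the Dependencies list). `iter_first_sample_vaf` and `open_vcf` are hypothetical names, and the sketch assumes DeepVariant's usual FORMAT layout with a `VAF` key (e.g. `GT:GQ:DP:AD:VAF:PL`) and reads only the first sample column.

```python
import gzip
from typing import Iterable, Iterator, Optional, TextIO, Tuple


def iter_first_sample_vaf(lines: Iterable[str]) -> Iterator[Tuple[str, int, Optional[float]]]:
    """Yield (CHROM, POS, VAF) for the first sample, one record at a time.

    Only the current line is held in memory, so `lines` can be an open
    file handle over an arbitrarily large VCF.
    """
    for line in lines:
        if line.startswith("#"):
            continue  # skip ## meta-information lines and the #CHROM header
        fields = line.rstrip("\n").split("\t")
        chrom, pos = fields[0], int(fields[1])
        fmt_keys = fields[8].split(":")  # FORMAT column, e.g. GT:GQ:DP:AD:VAF:PL
        sample = fields[9].split(":")    # first sample column
        values = dict(zip(fmt_keys, sample))
        raw = values.get("VAF", ".")
        # Assumption: '.' means missing; for multi-allelic sites only the first ALT's VAF is kept
        vaf = None if raw in (".", "") else float(raw.split(",")[0])
        yield chrom, pos, vaf


def open_vcf(path: str) -> TextIO:
    """Open a plain or gzip/bgzip-compressed VCF as text (bgzip output is valid gzip)."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)
```

For example, `for chrom, pos, vaf in iter_first_sample_vaf(open_vcf("sample.vcf.gz")): ...` walks a bgzipped file record by record without decompressing it to disk first.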

Installation

From PyPI

pip install vcf2bedgraph

From source with uv

git clone https://github.com/fcliquet/vcf2bedgraph.git
cd vcf2bedgraph
uv sync

Usage

Basic usage

vcf2bedgraph input.vcf.gz -o output.bedgraph

This will:

  1. Process the VCF file
  2. Extract VAF for the first sample
  3. Apply default filters (FILTER = PASS, QUAL >= 20, GQ >= 0, DP >= 10)
  4. Compress output to output.bedgraph.gz
  5. Create tabix index output.bedgraph.gz.tbi
  6. Remove the uncompressed file

Command-line options

vcf2bedgraph [-h] [-o OUTPUT] [--filter-gq FILTER_GQ]
             [--filter-dp FILTER_DP] [--filter-qual FILTER_QUAL]
             [--no-compress] VCF

Arguments:

  • VCF: Path to the input VCF file (required)
  • -o, --output OUTPUT: Path to the output BedGraph file (required)
  • --filter-gq FILTER_GQ: Minimum genotype quality threshold (default: 0)
  • --filter-dp FILTER_DP: Minimum depth threshold (default: 10)
  • --filter-qual FILTER_QUAL: Minimum variant quality threshold (default: 20)
  • --no-compress: Skip compression and indexing of the output BedGraph file
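To make the mapping from options to parsed values concrete, here is a hypothetical argparse parser reconstructed only from the synopsis and argument list above. The tool's real parser may differ; in particular, the numeric types (int for GQ and DP, float for QUAL) are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical re-creation of the documented CLI, for illustration only."""
    parser = argparse.ArgumentParser(
        prog="vcf2bedgraph",
        description="Convert DeepVariant VCF files to BedGraph format",
    )
    parser.add_argument("vcf", metavar="VCF", help="Path to the input VCF file")
    # Marked required, following the argument list above
    parser.add_argument("-o", "--output", required=True,
                        help="Path to the output BedGraph file")
    # argparse turns --filter-gq into the attribute name filter_gq, and so on
    parser.add_argument("--filter-gq", type=int, default=0,
                        help="Minimum genotype quality threshold (default: 0)")
    parser.add_argument("--filter-dp", type=int, default=10,
                        help="Minimum depth threshold (default: 10)")
    parser.add_argument("--filter-qual", type=float, default=20.0,
                        help="Minimum variant quality threshold (default: 20)")
    parser.add_argument("--no-compress", action="store_true",
                        help="Skip compression and indexing of the output BedGraph file")
    return parser
```

`build_parser().parse_args(["sample.vcf.gz", "-o", "sample.bedgraph"])` yields `filter_qual=20.0`, `filter_gq=0`, `filter_dp=10` and `no_compress=False`, matching the documented defaults.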

Examples

Basic conversion with defaults

vcf2bedgraph sample.vcf.gz -o sample.bedgraph

Output: sample.bedgraph.gz (compressed) and sample.bedgraph.gz.tbi (index)

Custom filters

vcf2bedgraph sample.vcf.gz -o sample.bedgraph \
  --filter-qual 30 --filter-gq 20 --filter-dp 15

Skip compression

vcf2bedgraph sample.vcf.gz -o sample.bedgraph --no-compress

Output: sample.bedgraph (uncompressed only)
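Without compression there is no tabix index, so region queries fall back to a linear scan of the file. A minimal sketch, assuming whitespace-separated four-column lines as in the Output Format section and the same 0-based, half-open region convention that pysam's `TabixFile.fetch` uses; `parse_bedgraph_line` and `query_region` are hypothetical helpers.

```python
from typing import Iterable, Iterator, Tuple

BedGraphRecord = Tuple[str, int, int, float]


def parse_bedgraph_line(line: str) -> BedGraphRecord:
    """Split one data line into (chrom, start, end, value)."""
    chrom, start, end, value = line.split()
    return chrom, int(start), int(end), float(value)


def query_region(lines: Iterable[str], chrom: str, start: int, end: int) -> Iterator[BedGraphRecord]:
    """Linear scan: yield records overlapping the half-open interval [start, end)."""
    for line in lines:
        # Skip blank lines and optional track/browser/comment lines
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        rec = parse_bedgraph_line(line)
        # Two half-open intervals overlap when each starts before the other ends
        if rec[0] == chrom and rec[1] < end and rec[2] > start:
            yield rec
```

Usage: `with open("sample.bedgraph") as fh: print(list(query_region(fh, "chr1", 1000, 2000)))`. The scan reads the whole file for every query, which is the cost the default bgzip + tabix output avoids.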

Query the indexed file

python3 << 'EOF'
import pysam
tbx = pysam.TabixFile('sample.bedgraph.gz')
for record in tbx.fetch('chr1', 1000, 2000):
    print(record)
EOF

Output Format

The output is a standard BedGraph file with the following columns:

chr1  15273  15274  0.5625
chr1  15819  15820  0.7317
chr1  47959  47960  1.0000

  • Column 1: Chromosome
  • Column 2: Start position (0-based, converted from the 1-based VCF POS)
  • Column 3: End position (exclusive; start + 1 in the rows above)
  • Column 4: VAF (Variant Allele Frequency, 0.0-1.0)
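The coordinate shift in columns 2 and 3 amounts to one subtraction. A sketch, assuming single-position records and four-decimal VAF formatting as in the example rows above; `to_bedgraph_line` is a hypothetical helper, and the tool may handle multi-base variants differently.

```python
def to_bedgraph_line(chrom: str, pos: int, vaf: float) -> str:
    """Format one single-position record; `pos` is the 1-based VCF POS."""
    start = pos - 1  # BedGraph start is 0-based
    end = pos        # end is exclusive, so [start, end) covers exactly one base
    return f"{chrom}\t{start}\t{end}\t{vaf:.4f}"
```

For instance, VCF POS 15274 becomes the interval [15273, 15274), which is the first example row.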

Filtering

The tool applies the following filters to variants:

  1. FILTER column: Only PASS variants are included
  2. QUAL: Minimum variant quality (default: 20)
  3. GQ: Minimum genotype quality (default: 0)
  4. DP: Minimum depth (default: 10)

Adjust these with command-line options to match your quality requirements.

Publishing to PyPI

Prerequisites

  1. PyPI Account: Create an account at pypi.org
  2. GitHub Trusted Publisher: Configure PyPI to trust GitHub Actions
    • Go to PyPI project settings
    • Add a trusted publisher with:
      • Owner: fcliquet
      • Repository: vcf2bedgraph
      • Workflow name: publish.yml
      • Environment name: pypi

Versioning

The version is managed in src/vcf2bedgraph/__about__.py and automatically read by the build system through pyproject.toml.

Bumping the version

Use the uv version command to bump versions automatically:

# Show current version
uv version

# Bump patch version (0.1.0 -> 0.1.1)
uv version --bump patch

# Bump minor version (0.1.0 -> 0.2.0)
uv version --bump minor

# Bump major version (0.1.0 -> 1.0.0)
uv version --bump major

# Set a specific version
uv version 0.2.0

The uv version command automatically:

  • Updates src/vcf2bedgraph/__about__.py
  • Updates pyproject.toml
  • Re-locks dependencies (uv.lock)

Creating a release

  1. Bump the version:
       uv version --bump minor
  2. Commit the changes:
       git add src/vcf2bedgraph/__about__.py pyproject.toml uv.lock
       git commit -m "chore: bump version to $(uv version --short)"
  3. Create the git tag, then push the branch and the tag:
       git tag v$(uv version --short)
       git push origin main
       git push origin v$(uv version --short)

Publishing workflow

When you push a tag matching v*, GitHub Actions automatically:

  1. Builds the package (wheel and sdist)
  2. Tests the build
  3. Publishes to PyPI using trusted publisher authentication

Monitor progress in the repository's GitHub Actions tab.

Semantic versioning (SemVer) guide

  • Patch (0.0.X): Bug fixes, minor improvements
  • Minor (0.X.0): New features, backward compatible
  • Major (X.0.0): Breaking changes
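The bump rules behind the `uv version --bump` examples reduce to incrementing one component and zeroing the ones to its right. A hypothetical sketch for plain `MAJOR.MINOR.PATCH` strings (pre-release and build suffixes are not handled); in practice `uv version` performs this step for you.

```python
def bump(version: str, part: str) -> str:
    """Return the next MAJOR.MINOR.PATCH version for part in {'major', 'minor', 'patch'}."""
    major, minor, patch = (int(x) for x in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"              # breaking change: reset minor and patch
    if part == "minor":
        return f"{major}.{minor + 1}.0"        # new backward-compatible feature: reset patch
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"  # bug fix only
    raise ValueError(f"unknown part: {part!r}")
```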

Development

Setup development environment

git clone https://github.com/fcliquet/vcf2bedgraph.git
cd vcf2bedgraph
uv sync

Run the CLI in development mode

uv run vcf2bedgraph --help
uv run vcf2bedgraph tests/B00EYQO.vcf.gz -o output.bedgraph

Dependencies

  • cyvcf2: High-performance VCF parsing
  • pysam: Compression and indexing with htslib

License

MIT License - see LICENSE file for details

Author

Created by fcliquet


Download files


Source Distribution

vcf2bedgraph-0.1.2.tar.gz (36.6 kB)

Uploaded Source

Built Distribution


vcf2bedgraph-0.1.2-py3-none-any.whl (6.5 kB)

Uploaded Python 3

File details

Details for the file vcf2bedgraph-0.1.2.tar.gz.

File metadata

  • Download URL: vcf2bedgraph-0.1.2.tar.gz
  • Upload date:
  • Size: 36.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for vcf2bedgraph-0.1.2.tar.gz
Algorithm     Hash digest
SHA256        b5f41e19e7eee88d1a13d84010b0cf6dde389ff3524c84b76a5e7023e1e7dc1c
MD5           2bfab0d35630785713bf6e088163fedf
BLAKE2b-256   b15706abd0d57a1afa3e87e6d35f32252746604f01fac322b34868ffa936a9d7


Provenance

The following attestation bundles were made for vcf2bedgraph-0.1.2.tar.gz:

Publisher: publish.yml on fcliquet/vcf2bedgraph

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file vcf2bedgraph-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: vcf2bedgraph-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 6.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for vcf2bedgraph-0.1.2-py3-none-any.whl
Algorithm     Hash digest
SHA256        c7dab7576603da2208c8a0f12bf3ca57d7769b724730d8e007f981e7a2ed8939
MD5           516745c93e7dd0ec335a1eed5012aa90
BLAKE2b-256   779aded73cce4e54dd705607bf76e5fbe87d6116e2d42127ea4a286e3e25bddd


Provenance

The following attestation bundles were made for vcf2bedgraph-0.1.2-py3-none-any.whl:

Publisher: publish.yml on fcliquet/vcf2bedgraph

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
