Convert VCF files to BedGraph format
vcf2bedgraph
Convert DeepVariant VCF files to BedGraph format with Variant Allele Frequency (VAF) values. The output is automatically compressed with bgzip and indexed with tabix for efficient querying.
Features
- 🚀 Efficient streaming: Processes large VCF files without loading everything into memory
- 🧬 DeepVariant support: Extracts VAF directly from DeepVariant FORMAT fields
- 📦 Automatic compression: Output is bgzipped and indexed by default
- 🎯 Flexible filtering: Control quality thresholds with command-line parameters
- 📊 BedGraph format: Standard genomics format compatible with UCSC and other tools
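Streaming can be illustrated with a minimal standard-library sketch. A generator yields the fields of one VCF data line at a time, so memory use does not grow with file size. The package lists cyvcf2 for its actual VCF parsing. The helper names `open_vcf` and `iter_vcf_records` are hypothetical and are not part of vcf2bedgraph.

```python
import gzip
from typing import IO, Iterable, Iterator, List


def open_vcf(path: str) -> IO[str]:
    """Open a plain or bgzip-compressed VCF in text mode.

    bgzip output is gzip-compatible, so gzip.open can read .vcf.gz files.
    Hypothetical helper for illustration.
    """
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def iter_vcf_records(lines: Iterable[str]) -> Iterator[List[str]]:
    """Lazily yield the tab-separated fields of each VCF data line.

    Only one line is held in memory at a time. Hypothetical helper,
    not vcf2bedgraph's own code.
    """
    for line in lines:
        if line.startswith("#"):
            continue  # skip ## meta-information lines and the #CHROM header
        yield line.rstrip("\n").split("\t")
```

In use, you would open the file with `with open_vcf("input.vcf.gz") as handle:` and then loop over `iter_vcf_records(handle)`. Because the generator accepts any iterable of lines, it can also be fed a list of strings in tests.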
Installation
From PyPI
pip install vcf2bedgraph
From source with uv
git clone https://github.com/fcliquet/vcf2bedgraph.git
cd vcf2bedgraph
uv sync
Usage
Basic usage
vcf2bedgraph input.vcf.gz -o output.bedgraph
This will:
- Process the VCF file
- Extract VAF for the first sample
- Keep only PASS variants and apply the default thresholds (QUAL >= 20, GQ >= 0, DP >= 10)
- Compress the output to output.bedgraph.gz
- Create the tabix index output.bedgraph.gz.tbi
- Remove the uncompressed file
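The "extract VAF for the first sample" step can be sketched with plain string parsing of the FORMAT column and the first sample column (columns 9 and 10 of a VCF data line). This sketch assumes the DeepVariant convention that FORMAT contains a `VAF` key with one comma-separated value per ALT allele. Keeping only the first value at multi-allelic sites is also an assumption. The name `first_sample_vaf` is hypothetical.

```python
from typing import List, Optional


def first_sample_vaf(fields: List[str]) -> Optional[float]:
    """Return the first ALT allele's VAF for the first sample, or None.

    `fields` are the tab-separated columns of one VCF data line. Assumes a
    DeepVariant-style FORMAT/VAF entry; hypothetical helper for illustration.
    """
    if len(fields) < 10:
        return None  # no FORMAT/sample columns present
    keys = fields[8].split(":")    # e.g. GT:GQ:DP:AD:VAF:PL
    values = fields[9].split(":")  # first sample's values, in the same order
    sample = dict(zip(keys, values))  # zip tolerates dropped trailing values
    raw = sample.get("VAF")
    if raw is None or raw == ".":
        return None
    first = raw.split(",")[0]  # one value per ALT allele; keep the first
    return None if first == "." else float(first)
```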
Command-line options
vcf2bedgraph [-h] [-o OUTPUT] [--filter-gq FILTER_GQ]
[--filter-dp FILTER_DP] [--filter-qual FILTER_QUAL]
[--no-compress] VCF
Arguments:
- VCF: Path to the input VCF file (required)
- -o, --output OUTPUT: Path to the output BedGraph file (required)
- --filter-gq FILTER_GQ: Minimum genotype quality threshold (default: 0)
- --filter-dp FILTER_DP: Minimum depth threshold (default: 10)
- --filter-qual FILTER_QUAL: Minimum variant quality threshold (default: 20)
- --no-compress: Skip compression and indexing of the output BedGraph file
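The options above can be mirrored in a small argparse sketch, which helps check how a given command line will be interpreted. It is reconstructed from the argument list, not taken from the package source. `build_parser` is hypothetical, `-o` is marked required as the argument list states, and the value types (int for GQ and DP, float for QUAL) are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical reconstruction of the documented vcf2bedgraph CLI."""
    parser = argparse.ArgumentParser(prog="vcf2bedgraph")
    parser.add_argument("vcf", metavar="VCF", help="Path to the input VCF file")
    parser.add_argument("-o", "--output", required=True,
                        help="Path to the output BedGraph file")
    # Assumed types: integer thresholds for GQ and DP, float for QUAL
    parser.add_argument("--filter-gq", type=int, default=0,
                        help="Minimum genotype quality threshold")
    parser.add_argument("--filter-dp", type=int, default=10,
                        help="Minimum depth threshold")
    parser.add_argument("--filter-qual", type=float, default=20.0,
                        help="Minimum variant quality threshold")
    parser.add_argument("--no-compress", action="store_true",
                        help="Skip compression and indexing of the output")
    return parser
```

For example, `build_parser().parse_args(["sample.vcf.gz", "-o", "sample.bedgraph"])` yields `filter_gq=0`, `filter_dp=10`, `filter_qual=20.0` and `no_compress=False`, which match the documented defaults.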
Examples
Basic conversion with defaults
vcf2bedgraph sample.vcf.gz -o sample.bedgraph
Output: sample.bedgraph.gz (compressed) and sample.bedgraph.gz.tbi (index)
Custom filters
vcf2bedgraph sample.vcf.gz -o sample.bedgraph \
--filter-qual 30 --filter-gq 20 --filter-dp 15
Skip compression
vcf2bedgraph sample.vcf.gz -o sample.bedgraph --no-compress
Output: sample.bedgraph (uncompressed only)
Query the indexed file
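python3 << 'EOF'
import pysam

# Open the bgzipped BedGraph; the .tbi index is expected next to it
tbx = pysam.TabixFile('sample.bedgraph.gz')
# fetch() takes 0-based, half-open coordinates, like the BedGraph itself
for record in tbx.fetch('chr1', 1000, 2000):
    print(record)
EOF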
Output Format
The output is a standard four-column BedGraph file, for example:
chr1 15273 15274 0.5625
chr1 15819 15820 0.7317
chr1 47959 47960 1.0000
- Column 1: Chromosome
- Column 2: Start position (0-based, converted from VCF 1-based)
- Column 3: End position (exclusive, so a single-base record ends at start + 1)
- Column 4: VAF (Variant Allele Frequency, 0.0-1.0)
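The coordinate conversion is a one-position shift. A 1-based VCF POS p becomes the half-open interval [p - 1, p). The minimal sketch below assumes single-base intervals, as in the example rows, along with tab separators and four-decimal VAF formatting. `bedgraph_line` is a hypothetical name.

```python
def bedgraph_line(chrom: str, pos: int, vaf: float) -> str:
    """Format one BedGraph record from a 1-based VCF position and a VAF.

    Hypothetical helper: start = POS - 1 (0-based), end = POS (exclusive),
    giving a one-base interval; the VAF is rounded to 4 decimals.
    """
    if pos < 1:
        raise ValueError("VCF positions are 1-based and must be >= 1")
    start = pos - 1  # convert 1-based inclusive to 0-based start
    end = pos        # exclusive end covers exactly one base
    return f"{chrom}\t{start}\t{end}\t{vaf:.4f}"
```

For example, a variant at VCF POS 15274 with VAF 0.5625 produces the first example row: `chr1 15273 15274 0.5625`.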
Filtering
The tool applies the following filters to variants:
- FILTER column: Only PASS variants are included
- QUAL: Minimum variant quality (default: 20)
- GQ: Minimum genotype quality (default: 0)
- DP: Minimum depth (default: 10)
Adjust these with command-line options to match your quality requirements.
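All four filters must pass, so they combine with a logical AND. The minimal sketch below makes two assumptions. First, a missing QUAL, GQ or DP (`.` in the VCF, represented here as `None`) fails its filter. Second, a `.` FILTER value does not count as PASS. The name `passes_filters` is hypothetical.

```python
from typing import Optional


def passes_filters(
    filter_value: str,
    qual: Optional[float],
    gq: Optional[int],
    dp: Optional[int],
    min_qual: float = 20.0,
    min_gq: int = 0,
    min_dp: int = 10,
) -> bool:
    """Return True if a variant passes every documented filter.

    Defaults mirror the documented CLI defaults; thresholds are inclusive.
    Missing values (None) fail, which is an assumption of this sketch.
    """
    if filter_value != "PASS":
        return False
    if qual is None or qual < min_qual:
        return False
    if gq is None or gq < min_gq:
        return False
    if dp is None or dp < min_dp:
        return False
    return True
```

With the defaults, `passes_filters("PASS", 20.0, 0, 10)` returns True because each threshold is inclusive. This matches the `>=` comparisons listed under Basic usage.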
Publishing to PyPI
Prerequisites
- PyPI Account: Create an account at pypi.org
- GitHub Trusted Publisher: Configure PyPI to trust GitHub Actions
  - Go to PyPI project settings
  - Add a trusted publisher with:
    - Owner: fcliquet
    - Repository: vcf2bedgraph
    - Workflow name: publish.yml
    - Environment name: pypi
Versioning
The version is managed directly in pyproject.toml and read dynamically by the package at runtime using importlib.metadata.
To change the version, use the uv version command, which updates pyproject.toml and re-locks uv.lock.
Bumping the version
Use the uv version command to bump versions automatically:
# Show current version
uv version
# Bump patch version (0.1.0 -> 0.1.1)
uv version --bump patch
# Bump minor version (0.1.0 -> 0.2.0)
uv version --bump minor
# Bump major version (0.1.0 -> 1.0.0)
uv version --bump major
# Set a specific version
uv version 0.2.0
The uv version command automatically:
- Updates src/vcf2bedgraph/__about__.py
- Updates pyproject.toml
- Re-locks dependencies (uv.lock)
Creating a release
- Bump the version:
uv version --bump minor
- Commit the changes:
git add src/vcf2bedgraph/__about__.py pyproject.toml uv.lock
git commit -m "chore: bump version to $(uv version --short)"
- Create a git tag, then push the commit and the tag:
git tag v$(uv version --short)
git push origin main
git push origin v$(uv version --short)
Publishing workflow
When you push a tag matching v*, GitHub Actions automatically:
- Builds the package (wheel and sdist)
- Tests the build
- Publishes to PyPI using trusted publisher authentication
Monitor progress in the repository's GitHub Actions tab.
Semver versioning guide
- Patch (0.0.X): Bug fixes, minor improvements
- Minor (0.X.0): New features, backward compatible
- Major (X.0.0): Breaking changes
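The bump rules in the uv version comments and the semver guide can be expressed as a small function. This sketch handles plain MAJOR.MINOR.PATCH strings only and ignores pre-release and build suffixes. `bump` is hypothetical and is not how uv implements `uv version --bump`.

```python
def bump(current: str, part: str) -> str:
    """Bump a plain MAJOR.MINOR.PATCH version string following semver rules."""
    major, minor, patch = (int(x) for x in current.split("."))
    if part == "major":
        return f"{major + 1}.0.0"  # breaking change: reset minor and patch
    if part == "minor":
        return f"{major}.{minor + 1}.0"  # new feature: reset patch
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"  # bug fix
    raise ValueError(f"unknown part: {part!r}")
```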
Development
Setup development environment
git clone https://github.com/fcliquet/vcf2bedgraph.git
cd vcf2bedgraph
uv sync
Run the CLI in development mode
uv run vcf2bedgraph --help
uv run vcf2bedgraph tests/B00EYQO.vcf.gz -o output.bedgraph
Dependencies
- cyvcf2: High-performance VCF parsing
- pysam: Compression and indexing with htslib
License
MIT License - see LICENSE file for details
Author
Created by fcliquet
Download files
File details
Details for the file vcf2bedgraph-0.1.5.tar.gz.
File metadata
- Download URL: vcf2bedgraph-0.1.5.tar.gz
- Upload date:
- Size: 37.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f30f6f8f74c86fa0c642031a6409fd665e3f2cf05a862f7a9dfb7316ad72adb7 |
| MD5 | 04dc90250dfa4b8eebede606174e0ba8 |
| BLAKE2b-256 | c18bb4cccdb33e620f7a18488d34355191035f33a538dc1a98344d2436aaffb5 |
Provenance
The following attestation bundles were made for vcf2bedgraph-0.1.5.tar.gz:
Publisher: publish.yml on fcliquet/vcf2bedgraph
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: vcf2bedgraph-0.1.5.tar.gz
- Subject digest: f30f6f8f74c86fa0c642031a6409fd665e3f2cf05a862f7a9dfb7316ad72adb7
- Sigstore transparency entry: 667281789
- Sigstore integration time:
- Permalink: fcliquet/vcf2bedgraph@dbdc460b9a64a2095d2872ef021687b0d8a116e3
- Branch / Tag: refs/tags/v0.1.5
- Owner: https://github.com/fcliquet
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@dbdc460b9a64a2095d2872ef021687b0d8a116e3
- Trigger Event: push
File details
Details for the file vcf2bedgraph-0.1.5-py3-none-any.whl.
File metadata
- Download URL: vcf2bedgraph-0.1.5-py3-none-any.whl
- Upload date:
- Size: 6.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3b11f632edf77a2d39d8e3c7935f601f1d268a2dd103120d3bc336c0838cd294 |
| MD5 | d96368079ab612471189b2eeab74edc1 |
| BLAKE2b-256 | b3b3d59c4b76ce6b9006ebb6259acc0f2fee4baf51ad4333393d374327fb0b37 |
Provenance
The following attestation bundles were made for vcf2bedgraph-0.1.5-py3-none-any.whl:
Publisher: publish.yml on fcliquet/vcf2bedgraph
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: vcf2bedgraph-0.1.5-py3-none-any.whl
- Subject digest: 3b11f632edf77a2d39d8e3c7935f601f1d268a2dd103120d3bc336c0838cd294
- Sigstore transparency entry: 667281790
- Sigstore integration time:
- Permalink: fcliquet/vcf2bedgraph@dbdc460b9a64a2095d2872ef021687b0d8a116e3
- Branch / Tag: refs/tags/v0.1.5
- Owner: https://github.com/fcliquet
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@dbdc460b9a64a2095d2872ef021687b0d8a116e3
- Trigger Event: push