A CLI tool for processing and filtering bcftools tabulated TSV files with pedigree support
PyWombat
A CLI tool for processing bcftools tabulated TSV files.
Installation
This is a UV-managed Python package. To install:
uv sync
Usage
The wombat command processes bcftools tabulated TSV files:
# Format a bcftools TSV file and print to stdout
wombat input.tsv
# Format and save to output file (creates output.tsv by default)
wombat input.tsv -o output
# Format and save as Parquet (-f and --format are equivalent)
wombat input.tsv -o output -f parquet
wombat input.tsv -o output --format parquet
# Format with pedigree information to add parent genotypes
wombat input.tsv --pedigree pedigree.tsv -o output
What does wombat do?
The wombat command processes bcftools tabulated TSV files by:
- Expanding the (null) column: this column contains multiple fields in the format NAME=value, separated by semicolons (e.g., DP=30;AF=0.5;AC=2). Each field is extracted into its own column.
- Preserving the CSQ column: the CSQ (Consequence) column is preserved as-is and not melted, so VEP annotations remain intact.
- Melting and splitting sample columns: after the (null) column there are typically sample columns with values in GT:DP:GQ:AD format. The tool:
  - Extracts the sample name (the part before the first : character)
  - Transforms the wide format into long format
  - Creates a sample column with the sample names
  - Splits the sample values into separate columns:
    - sample_gt: Genotype (e.g., 0/1, 1/1)
    - sample_dp: Read depth
    - sample_gq: Genotype quality
    - sample_ad: Allele depth (the second value of the comma-separated list)
    - sample_vaf: Variant allele frequency (calculated as sample_ad / sample_dp)
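The (null) expansion step can be sketched in plain Python as below. This is an illustrative stdlib version, not pywombat's own code (the package uses Polars). Two behaviours are assumptions rather than documented facts: a value of "." is treated as "no fields", and a bare flag with no "=" (for example DB) is stored as "true".

```python
def expand_info(info: str) -> dict[str, str]:
    """Split a 'NAME=value;NAME=value' string into a dict of string values."""
    fields: dict[str, str] = {}
    if not info or info == ".":
        return fields  # assumption: empty or '.' means there are no fields
    for item in info.split(";"):
        if not item:
            continue  # tolerate a trailing or doubled ';'
        name, sep, value = item.partition("=")
        # Assumption: a bare flag (no '=') is recorded as "true".
        fields[name] = value if sep else "true"
    return fields


print(expand_info("DP=30;AF=0.5;AC=2"))
# {'DP': '30', 'AF': '0.5', 'AC': '2'}
```

Each key then becomes its own column; values stay strings here, so any numeric casting would be a separate step.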
Example
Input:
CHROM POS REF ALT (null) Sample1:GT:Sample1:DP:Sample1:GQ:Sample1:AD Sample2:GT:Sample2:DP:Sample2:GQ:Sample2:AD
chr1 100 A T DP=30;AF=0.5;AC=2 0/1:15:99:5,10 1/1:18:99:0,18
Output:
CHROM POS REF ALT AC AF DP sample sample_gt sample_dp sample_gq sample_ad sample_vaf
chr1 100 A T 2 0.5 30 Sample1 0/1 15 99 10 0.6667
chr1 100 A T 2 0.5 30 Sample2 1/1 18 99 18 1.0
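The wide-to-long transformation in this example can be reproduced with a small stdlib sketch. It is not pywombat's implementation; it assumes that every column after (null) is a sample column in GT:DP:GQ:AD order, that columns before (null) (including CSQ, if it appears there) are copied unchanged, and it does no handling of missing values such as "." in DP or AD. The VAF is rounded to four decimals only to match the example output.

```python
def melt_row(header, row):
    """Turn one wide bcftools row into one dict per sample (long format)."""
    info_idx = header.index("(null)")
    # Columns before (null) are copied unchanged onto every sample row.
    base = dict(zip(header[:info_idx], row[:info_idx]))
    # Expand NAME=value pairs from the (null) column into their own keys.
    for item in row[info_idx].split(";"):
        name, _, value = item.partition("=")
        base[name] = value
    records = []
    for col, value in zip(header[info_idx + 1:], row[info_idx + 1:]):
        sample = col.split(":", 1)[0]          # text before the first ':'
        gt, dp, gq, ad = value.split(":")      # assumes GT:DP:GQ:AD order
        depth = int(dp)
        alt_depth = int(ad.split(",")[1])      # second value of AD
        record = dict(base)
        record.update(
            sample=sample,
            sample_gt=gt,
            sample_dp=depth,
            sample_gq=int(gq),
            sample_ad=alt_depth,
            # VAF = sample_ad / sample_dp; None avoids dividing by zero
            sample_vaf=round(alt_depth / depth, 4) if depth else None,
        )
        records.append(record)
    return records


header = ("CHROM POS REF ALT (null) "
          "Sample1:GT:Sample1:DP:Sample1:GQ:Sample1:AD "
          "Sample2:GT:Sample2:DP:Sample2:GQ:Sample2:AD").split()
row = "chr1 100 A T DP=30;AF=0.5;AC=2 0/1:15:99:5,10 1/1:18:99:0,18".split()
for rec in melt_row(header, row):
    print(rec["sample"], rec["sample_gt"], rec["sample_dp"], rec["sample_gq"],
          rec["sample_ad"], rec["sample_vaf"])
# Sample1 0/1 15 99 10 0.6667
# Sample2 1/1 18 99 18 1.0
```

With a real file, the header and rows would be split on tabs (split("\t")) rather than on arbitrary whitespace.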
Notes:
- The sample_ad column contains the second value from the AD field (e.g., from 5,10 it extracts 10).
- The sample_vaf column is the variant allele frequency, calculated as sample_ad / sample_dp.
- By default, output is in TSV format. Use -f parquet to write Parquet files instead.
- The -o option specifies an output prefix (e.g., -o output creates output.tsv or output.parquet).
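The prefix convention for -o can be illustrated with a hypothetical write_tsv helper (not part of pywombat) that appends the .tsv extension and writes melted rows as tab-separated text using only the standard library. For Parquet, one option would be Polars, e.g. pl.DataFrame(records).write_parquet(f"{prefix}.parquet"); whether pywombat does exactly this is an assumption.

```python
import csv
import tempfile
from pathlib import Path


def write_tsv(records, prefix):
    """Write a list of dicts to '<prefix>.tsv' and return the path."""
    path = Path(f"{prefix}.tsv")               # -o output -> output.tsv
    fieldnames = list(records[0]) if records else []
    with path.open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames,
                                delimiter="\t", lineterminator="\n")
        writer.writeheader()
        writer.writerows(records)
    return path


records = [
    {"sample": "Sample1", "sample_ad": 10, "sample_vaf": 0.6667},
    {"sample": "Sample2", "sample_ad": 18, "sample_vaf": 1.0},
]
with tempfile.TemporaryDirectory() as tmp:
    path = write_tsv(records, Path(tmp) / "output")
    content = path.read_text()
print(path.name)  # output.tsv
print(content, end="")
```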
Pedigree Support
You can provide a pedigree file with the --pedigree option to add parent genotype information to the output. This enables trio analysis by including the father's and mother's genotypes for each sample.
Pedigree File Format:
The pedigree file should be a tab-separated file with the following columns:
- FID: Family ID
- sample_id: Sample identifier (matches the sample names in the VCF)
- FatherBarcode: Father's sample identifier (use 0 or -9 if unknown)
- MotherBarcode: Mother's sample identifier (use 0 or -9 if unknown)
- Sex: Sex of the sample (optional)
- Pheno: Phenotype information (optional)
Example pedigree file:
FID sample_id FatherBarcode MotherBarcode Sex Pheno
FAM1 Child1 Father1 Mother1 1 2
FAM1 Father1 0 0 1 1
FAM1 Mother1 0 0 2 1
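Reading this file amounts to a tab-separated parse in which 0 and -9 mean "unknown parent". The sketch below is a stdlib illustration, not pywombat's parser; it maps each sample_id to a (father, mother) pair, uses None for unknown parents, and ignores the optional Sex and Pheno columns. Treating an empty cell as unknown is an added assumption.

```python
import csv
import io

UNKNOWN = {"0", "-9", ""}  # "" (empty cell) treated as unknown is an assumption


def read_pedigree(handle):
    """Map sample_id -> (father, mother); unknown parents become None."""
    parents = {}
    for rec in csv.DictReader(handle, delimiter="\t"):
        father = rec["FatherBarcode"].strip()
        mother = rec["MotherBarcode"].strip()
        parents[rec["sample_id"].strip()] = (
            None if father in UNKNOWN else father,
            None if mother in UNKNOWN else mother,
        )
    return parents


example = (
    "FID\tsample_id\tFatherBarcode\tMotherBarcode\tSex\tPheno\n"
    "FAM1\tChild1\tFather1\tMother1\t1\t2\n"
    "FAM1\tFather1\t0\t0\t1\t1\n"
    "FAM1\tMother1\t0\t0\t2\t1\n"
)
ped = read_pedigree(io.StringIO(example))
print(ped["Child1"])   # ('Father1', 'Mother1')
print(ped["Father1"])  # (None, None)
```

For a file on disk, pass open("pedigree.tsv", newline="") instead of the StringIO object.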
Output with Pedigree:
When using --pedigree, the output will include additional columns for each parent:
- father_gt, father_dp, father_gq, father_ad, father_vaf: Father's genotype information
- mother_gt, mother_dp, mother_gq, mother_ad, mother_vaf: Mother's genotype information
These columns will contain the parent's genotype data for the same variant, allowing you to analyze inheritance patterns.
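Conceptually, adding parent columns is a lookup of each parent's row for the same variant, keyed here on (CHROM, POS, REF, ALT) plus sample. The stdlib sketch below illustrates that idea; it is not pywombat's Polars join. The variant key, the use of None when a parent is unknown or absent, and the toy pedigree (Sample2 as Sample1's father) are all hypothetical; the genotype values are taken from the example output above.

```python
FIELDS = ("gt", "dp", "gq", "ad", "vaf")


def variant_key(rec):
    """Assumed variant identity: chromosome, position, ref and alt alleles."""
    return (rec["CHROM"], rec["POS"], rec["REF"], rec["ALT"])


def add_parent_columns(records, parents):
    """Copy each parent's sample_* values onto the child's row for the same variant."""
    index = {(variant_key(r), r["sample"]): r for r in records}
    out = []
    for rec in records:
        father, mother = parents.get(rec["sample"], (None, None))
        row = dict(rec)
        for role, parent_id in (("father", father), ("mother", mother)):
            parent = index.get((variant_key(rec), parent_id)) if parent_id else None
            for field in FIELDS:
                # None when the parent is unknown or has no row for this variant
                row[f"{role}_{field}"] = parent[f"sample_{field}"] if parent else None
        out.append(row)
    return out


site = {"CHROM": "chr1", "POS": "100", "REF": "A", "ALT": "T"}
records = [
    {**site, "sample": "Sample1", "sample_gt": "0/1", "sample_dp": 15,
     "sample_gq": 99, "sample_ad": 10, "sample_vaf": 0.6667},
    {**site, "sample": "Sample2", "sample_gt": "1/1", "sample_dp": 18,
     "sample_gq": 99, "sample_ad": 18, "sample_vaf": 1.0},
]
# Hypothetical pedigree: Sample2 is Sample1's father, mother unknown.
parents = {"Sample1": ("Sample2", None), "Sample2": (None, None)}
result = add_parent_columns(records, parents)
print(result[0]["father_gt"], result[0]["father_vaf"], result[0]["mother_gt"])
# 1/1 1.0 None
```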
Development
This project uses:
- UV for package management
- Polars for fast data processing
- Click for the command-line interface
Testing
Test files are available in the tests/ directory:
- test.tabulated.tsv: Real bcftools output
- test_small.tsv: Small example for quick testing
Download files
File details
Details for the file pywombat-0.4.0.tar.gz.
File metadata
- Download URL: pywombat-0.4.0.tar.gz
- Upload date:
- Size: 19.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6c033750df73316f9e5b892e61eecdff6230a989cc50977414a3677e491a5c4f |
| MD5 | a6d3ff29ef9930d2a4d9b204cfe8df4e |
| BLAKE2b-256 | 6481d9c13f2178beb8905d3c5ea34958ef99b0d425381c06dee205449f6d2442 |
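A downloaded archive can be checked against the published SHA256 digest with Python's hashlib. This is a minimal sketch; the helper names sha256_of and verify are illustrative, and the expected value is the sdist's SHA256 from the hash table for pywombat-0.4.0.tar.gz.

```python
import hashlib
from pathlib import Path

# SHA256 of pywombat-0.4.0.tar.gz, copied from the hash table.
EXPECTED_SHA256 = "6c033750df73316f9e5b892e61eecdff6230a989cc50977414a3677e491a5c4f"


def sha256_of(path, chunk_size=1 << 16):
    """Hash a file in chunks so large downloads are not read into memory at once."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 matches the expected hex digest."""
    return sha256_of(path) == expected.strip().lower()


# Example (assumes the sdist was downloaded into the current directory):
# print(verify("pywombat-0.4.0.tar.gz"))
```

The same check applies to the wheel by passing its SHA256 as the expected argument.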
Provenance
The following attestation bundles were made for pywombat-0.4.0.tar.gz:
Publisher: publish.yml on bourgeron-lab/pywombat
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: pywombat-0.4.0.tar.gz
  - Subject digest: 6c033750df73316f9e5b892e61eecdff6230a989cc50977414a3677e491a5c4f
- Sigstore transparency entry: 769205408
- Sigstore integration time:
- Permalink: bourgeron-lab/pywombat@d52776162fee5ceb54e4dd89f128287b981a5f52
- Branch / Tag: refs/tags/v0.4.0
- Owner: https://github.com/bourgeron-lab
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@d52776162fee5ceb54e4dd89f128287b981a5f52
- Trigger Event: release
File details
Details for the file pywombat-0.4.0-py3-none-any.whl.
File metadata
- Download URL: pywombat-0.4.0-py3-none-any.whl
- Upload date:
- Size: 12.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1a6b395f13e277497a9f2aabe2925a98c4471c8766f232c33dc2e7eff2c8c5a3 |
| MD5 | 869fffbcd7129d3c9d5e646a18e0b496 |
| BLAKE2b-256 | 96831f3fc81e905e605e237c49b565cdcb9a6ae96b7935abe6fce8459ee80ea3 |
Provenance
The following attestation bundles were made for pywombat-0.4.0-py3-none-any.whl:
Publisher: publish.yml on bourgeron-lab/pywombat
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: pywombat-0.4.0-py3-none-any.whl
  - Subject digest: 1a6b395f13e277497a9f2aabe2925a98c4471c8766f232c33dc2e7eff2c8c5a3
- Sigstore transparency entry: 769205430
- Sigstore integration time:
- Permalink: bourgeron-lab/pywombat@d52776162fee5ceb54e4dd89f128287b981a5f52
- Branch / Tag: refs/tags/v0.4.0
- Owner: https://github.com/bourgeron-lab
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@d52776162fee5ceb54e4dd89f128287b981a5f52
- Trigger Event: release