Python implementation of GetBaseCountsMultiSample (gbcms) for calculating base counts in BAM files
py-gbcms
Complete orientation-aware counting system for genomic variants
Features
- 🚀 High Performance: Rust-powered core engine with multi-threading
- 🧬 Complete Variant Support: SNP, MNP, insertion, deletion, and complex variants (DelIns, SNP+Indel)
- 📊 Orientation-Aware: Forward and reverse strand analysis with fragment counting
- 🔬 Statistical Analysis: Fisher's exact test for strand bias
- 📁 Flexible I/O: VCF and MAF input/output formats
- 🎯 Quality Filters: 7 configurable read filtering options
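The orientation-aware counts and the strand-bias test listed above can be illustrated with a small, self-contained Python sketch. It is not py-gbcms's Rust implementation. The `ReadObservation` record, the rule that a fragment is identified by its read name (so overlapping mates count once), and the two-sided p-value convention (summing every table no more probable than the observed one, the convention SciPy's `fisher_exact` uses) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from math import comb


@dataclass(frozen=True)
class ReadObservation:
    """Hypothetical record: one read overlapping the variant position."""
    read_name: str      # shared by both mates of a fragment
    is_reverse: bool    # True if the read aligned to the reverse strand
    supports_alt: bool  # True if the read carries the alternate allele


def strand_counts(reads):
    """Count ref/alt reads per strand, plus ref/alt fragments deduplicated by read name.

    Assumption: a fragment whose mates disagree is counted in both fragment sets here;
    a real tool has to resolve such conflicts explicitly.
    """
    counts = {"ref_fwd": 0, "ref_rev": 0, "alt_fwd": 0, "alt_rev": 0}
    ref_fragments, alt_fragments = set(), set()
    for r in reads:
        allele = "alt" if r.supports_alt else "ref"
        strand = "rev" if r.is_reverse else "fwd"
        counts[f"{allele}_{strand}"] += 1
        (alt_fragments if r.supports_alt else ref_fragments).add(r.read_name)
    counts["ref_fragments"] = len(ref_fragments)
    counts["alt_fragments"] = len(alt_fragments)
    return counts


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def pmf(x):
        # Hypergeometric probability of the table whose top-left cell is x (margins fixed).
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = pmf(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables no more probable than the observed one (small tolerance for float ties).
    p = sum(pmf(x) for x in range(lo, hi + 1) if pmf(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def strand_bias_pvalue(counts):
    # Rows: ref/alt allele; columns: forward/reverse strand.
    return fisher_exact_two_sided(counts["ref_fwd"], counts["ref_rev"],
                                  counts["alt_fwd"], counts["alt_rev"])


if __name__ == "__main__":
    reads = [
        ReadObservation("frag1", is_reverse=False, supports_alt=True),
        ReadObservation("frag1", is_reverse=True, supports_alt=True),  # mate of frag1
        ReadObservation("frag2", is_reverse=False, supports_alt=True),
        ReadObservation("frag3", is_reverse=False, supports_alt=False),
        ReadObservation("frag4", is_reverse=True, supports_alt=False),
    ]
    counts = strand_counts(reads)
    print(counts)
    print("strand-bias p-value:", strand_bias_pvalue(counts))
```

Here the two mates of `frag1` add two alt reads (one per strand) but only one alt fragment, which is the distinction between read-level and fragment-level counts.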
Installation
Quick install:
```bash
pip install py-gbcms
```
From source (requires Rust):
```bash
git clone https://github.com/msk-access/py-gbcms.git
cd py-gbcms
pip install .
```
Docker:
```bash
docker pull ghcr.io/msk-access/py-gbcms:2.1.0
```
📖 Full documentation: https://msk-access.github.io/py-gbcms/
Usage
py-gbcms can be used in two ways:
🔧 Option 1: Standalone CLI (1-10 samples)
Best for: Quick analysis, local processing, direct control
```bash
gbcms run \
  --variants variants.vcf \
  --bam sample1.bam \
  --fasta reference.fa \
  --output-dir results/
```
Output: results/sample1.vcf
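For scripting the CLI from Python (for example, looping over a handful of BAMs), the command above can be assembled as an argument list. This sketch uses only the flags shown in this README; the helper names are hypothetical, and the output-path rule (BAM file stem plus `.vcf`) is inferred from the single `results/sample1.vcf` example rather than documented behavior.

```python
import subprocess
from pathlib import Path


def build_gbcms_command(variants, bam, fasta, output_dir, threads=None):
    """Assemble a `gbcms run` argument list using only the flags shown in this README."""
    cmd = [
        "gbcms", "run",
        "--variants", str(variants),
        "--bam", str(bam),
        "--fasta", str(fasta),
        "--output-dir", str(output_dir),
    ]
    if threads is not None:
        cmd += ["--threads", str(threads)]
    return cmd


def expected_output(bam, output_dir):
    """Guess the output file: assumes it is named after the BAM stem (sample1.bam -> sample1.vcf)."""
    return Path(output_dir) / (Path(bam).stem + ".vcf")


def run_gbcms(cmd):
    # Raises CalledProcessError if gbcms exits with a non-zero status.
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmd = build_gbcms_command("variants.vcf", "sample1.bam", "reference.fa", "results/")
    print(" ".join(cmd))
    print(expected_output("sample1.bam", "results/"))
    # Call run_gbcms(cmd) once gbcms is installed and the input files exist.
```

Passing a list (rather than a single shell string) to `subprocess.run` avoids shell quoting problems with paths that contain spaces.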
🔄 Option 2: Nextflow Workflow (10+ samples, HPC)
Best for: Many samples, HPC clusters (SLURM), reproducible pipelines
```bash
nextflow run nextflow/main.nf \
  --input samplesheet.csv \
  --variants variants.vcf \
  --fasta reference.fa \
  -profile slurm
```
Features:
- ✅ Automatic parallelization across samples
- ✅ SLURM/HPC integration
- ✅ Container support (Docker/Singularity)
- ✅ Resume failed runs
Which Should I Use?
| Scenario | Recommendation |
|---|---|
| 1-10 samples, local machine | CLI |
| 10+ samples, HPC cluster | Nextflow |
| Quick ad-hoc analysis | CLI |
| Production pipeline | Nextflow |
| Need auto-parallelization | Nextflow |
| Full manual control | CLI |
Quick Examples
CLI: Single Sample
```bash
gbcms run \
  --variants variants.vcf \
  --bam tumor.bam \
  --fasta hg19.fa \
  --output-dir results/ \
  --threads 4
```
CLI: Multiple Samples (Sequential)
```bash
gbcms run \
  --variants variants.vcf \
  --bam-list samples.txt \
  --fasta hg19.fa \
  --output-dir results/
```
Nextflow: Many Samples (Parallel)
```bash
# samplesheet.csv:
# sample,bam,bai
# tumor1,/path/to/tumor1.bam,
# tumor2,/path/to/tumor2.bam,

nextflow run nextflow/main.nf \
  --input samplesheet.csv \
  --variants variants.vcf \
  --fasta hg19.fa \
  --outdir results \
  -profile slurm
```
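With many samples, the `sample,bam,bai` samplesheet is easier to generate than to type. The sketch below is a hypothetical helper, not part of py-gbcms: it takes sample names from the BAM file stems (an assumption) and fills `bai` only when a sibling index (`<name>.bam.bai` or `<name>.bai`) exists, otherwise leaving it empty as in the example samplesheet. Whether the workflow locates a missing index on its own is not stated in this README.

```python
import csv
from pathlib import Path


def write_samplesheet(bam_dir, out_csv="samplesheet.csv"):
    """Write a sample,bam,bai samplesheet for every *.bam file in bam_dir."""
    rows = []
    for bam in sorted(Path(bam_dir).glob("*.bam")):
        bam = bam.resolve()  # absolute paths, so the sheet works from any working directory
        # Look for tumor1.bam.bai first, then tumor1.bai.
        candidates = [bam.with_name(bam.name + ".bai"), bam.with_suffix(".bai")]
        bai = next((str(p) for p in candidates if p.exists()), "")
        rows.append({"sample": bam.stem, "bam": str(bam), "bai": bai})
    with open(out_csv, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=["sample", "bam", "bai"])
        writer.writeheader()
        writer.writerows(rows)
    return rows


if __name__ == "__main__":
    # Hypothetical directory name; point this at wherever the BAMs live.
    if Path("bams").is_dir():
        write_samplesheet("bams", "samplesheet.csv")
```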
Documentation
📚 Full Documentation: https://cmo-ci.gitbook.io/py-gbcms/
Contributing
See CONTRIBUTING.md for development guidelines.
To contribute to documentation, see the gh-pages branch.
Citation
If you use py-gbcms in your research, please cite:
[Citation to be added]
License
AGPL-3.0 - see LICENSE for details.
Support
- 🐛 Issues: https://github.com/msk-access/py-gbcms/issues
- 💬 Discussions: https://github.com/msk-access/py-gbcms/discussions
Download files
File details
Details for the file py_gbcms-2.7.0.tar.gz.
File metadata
- Download URL: py_gbcms-2.7.0.tar.gz
- Size: 79.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 64052ea6e7e1242b7761b54c1495482e81aa2470b2ec9c689d3a8ec84f5f5263 |
| MD5 | d1742504aa7d8ce465a716512251571f |
| BLAKE2b-256 | 43a43fadbc0cd464fb29d3ff8114711682418f4c3cc8f0323a179520d34a1d28 |
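Before installing from a downloaded file, its SHA256 digest can be checked against the value listed above. A minimal sketch using only the standard library's `hashlib`; the local file name is an assumption about where the download was saved.

```python
import hashlib
from pathlib import Path

# SHA256 digest published above for py_gbcms-2.7.0.tar.gz.
EXPECTED_SHA256 = "64052ea6e7e1242b7761b54c1495482e81aa2470b2ec9c689d3a8ec84f5f5263"


def sha256_of(path, chunk_size=1 << 20):
    """Hash the file in 1 MiB chunks so large downloads need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    # hexdigest() is lowercase, so normalize the expected value before comparing.
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    target = Path("py_gbcms-2.7.0.tar.gz")  # assumed download location
    if target.exists():
        print("OK" if verify(target) else "MISMATCH")
```

pip can also enforce digests itself in hash-checking mode (`--hash=sha256:...` entries in a requirements file, installed with `--require-hashes`); list the digest of every file pip might select, since the wheel and the sdist have different hashes.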
Provenance
The following attestation bundles were made for py_gbcms-2.7.0.tar.gz:
- Publisher: release.yml on msk-access/py-gbcms
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: py_gbcms-2.7.0.tar.gz
- Subject digest: 64052ea6e7e1242b7761b54c1495482e81aa2470b2ec9c689d3a8ec84f5f5263
- Sigstore transparency entry: 971661612
- Permalink: msk-access/py-gbcms@37ec41ec97822c7eecdf05657ed0c1e128c23511
- Branch / Tag: refs/tags/2.7.0
- Owner: https://github.com/msk-access
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@37ec41ec97822c7eecdf05657ed0c1e128c23511
- Trigger Event: push
File details
Details for the file py_gbcms-2.7.0-cp39-cp39-manylinux_2_34_x86_64.whl.
File metadata
- Download URL: py_gbcms-2.7.0-cp39-cp39-manylinux_2_34_x86_64.whl
- Size: 4.3 MB
- Tags: CPython 3.9, manylinux: glibc 2.34+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1f5920a18fa97e2f976de49850ee8d0b0b149d4bc320799f2cc05a7a9c61055c |
| MD5 | f12cb4ce2578e225e663962b53d8ffc3 |
| BLAKE2b-256 | 96f7fd1b59e9913d26bde530f92f6e52f3e16387185e0e54976039bcf2c58423 |
Provenance
The following attestation bundles were made for py_gbcms-2.7.0-cp39-cp39-manylinux_2_34_x86_64.whl:
- Publisher: release.yml on msk-access/py-gbcms
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: py_gbcms-2.7.0-cp39-cp39-manylinux_2_34_x86_64.whl
- Subject digest: 1f5920a18fa97e2f976de49850ee8d0b0b149d4bc320799f2cc05a7a9c61055c
- Sigstore transparency entry: 971661613
- Permalink: msk-access/py-gbcms@37ec41ec97822c7eecdf05657ed0c1e128c23511
- Branch / Tag: refs/tags/2.7.0
- Owner: https://github.com/msk-access
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@37ec41ec97822c7eecdf05657ed0c1e128c23511
- Trigger Event: push