Python implementation of GetBaseCountsMultiSample (gbcms) for calculating base counts in BAM files

gbcms

Complete orientation-aware counting system for genomic variants

Features

  • 🚀 High Performance: Rust-powered core engine with multi-threading
  • 🧬 Complete Variant Support: SNP, MNP, insertion, deletion, and complex variants (DelIns, SNP+Indel)
  • 🧪 WFA + PairHMM Phase 3: Pangenomic fast-path WFA alignment with PairHMM fallback for complex multi-allelic classification
  • 📊 Orientation-Aware: Forward and reverse strand analysis with fragment counting
  • 📏 mFSD (Mutant Fragment Size Distribution): Per-allele cfDNA fragment size profiling with KS test and log-likelihood ratio
  • 🔬 Statistical Analysis: Fisher's exact test for strand bias (read-level and fragment-level)
  • 📁 Flexible I/O: VCF and MAF input/output formats
  • 🎯 Quality Filters: 8 configurable read and quality filtering options with heuristic BAQ
  • 🧬 RNA Mode: Transcriptome-aware counting with strandedness, splice detection, and A-to-I editing
  • 🔗 UMI Support: Molecule-level deduplication with UMI-aware fragment grouping
  • 🔧 Normalize Command: Standalone variant normalization (left-align + REF validation) without counting
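
To make the strand-bias feature above more concrete, here is a minimal sketch of a two-sided Fisher's exact test on a 2x2 table of forward/reverse counts for the reference and alternate alleles. It is a generic textbook formulation using only the Python standard library, not the gbcms implementation; the function name and argument order are assumptions made for illustration.

```python
from math import comb


def fisher_exact_two_sided(ref_fwd, ref_rev, alt_fwd, alt_rev):
    """Two-sided Fisher's exact p-value for a 2x2 strand table.

    Hypothetical helper for illustration only; gbcms computes its
    strand-bias statistics internally and may differ in detail.
    """
    row_ref = ref_fwd + ref_rev
    row_alt = alt_fwd + alt_rev
    col_fwd = ref_fwd + alt_fwd
    total = row_ref + row_alt
    denom = comb(total, col_fwd)

    def prob(x):
        # Hypergeometric probability of seeing x forward reference reads
        # given fixed row and column totals.
        return comb(row_ref, x) * comb(row_alt, col_fwd - x) / denom

    lowest = max(0, col_fwd - row_alt)
    highest = min(row_ref, col_fwd)
    p_observed = prob(ref_fwd)
    # Sum every table that is at least as unlikely as the observed one.
    # The small relative tolerance guards against floating-point ties.
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )
    return min(1.0, p_value)
```

A small p-value suggests that the alternate allele is distributed across strands differently from the reference allele. The same table can be built from read counts or from fragment counts.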

Installation

Quick install:

pip install gbcms

From source (requires Rust):

git clone https://github.com/msk-access/gbcms.git
cd gbcms
pip install .

Docker:

docker pull ghcr.io/msk-access/gbcms:X.Y.Z  # Replace X.Y.Z with latest from PyPI

💡 Find the latest version on PyPI or GHCR.

📖 Full documentation: https://msk-access.github.io/gbcms/


Usage

gbcms can be used in two ways:

🔧 Option 1: Standalone CLI (1-10 samples)

Best for: Quick analysis, local processing, direct control

gbcms dna \
    --variants variants.vcf \
    --bam sample1.bam \
    --fasta reference.fa \
    --output-dir results/

Output: results/sample1.vcf
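
When the CLI is driven from a Python script, the command above can be assembled as an argument list. The sketch below uses only flags that appear in this README (`--variants`, `--bam`, `--fasta`, `--output-dir`, and `--threads` from the Quick Examples); the helper name `build_dna_command` is hypothetical and not part of gbcms.

```python
def build_dna_command(variants, bam, fasta, output_dir, threads=None):
    """Return the argv list for a `gbcms dna` run.

    Hypothetical convenience wrapper; it only mirrors the flags shown
    in this README and does not validate the paths.
    """
    cmd = [
        "gbcms", "dna",
        "--variants", str(variants),
        "--bam", str(bam),
        "--fasta", str(fasta),
        "--output-dir", str(output_dir),
    ]
    if threads is not None:
        # --threads is optional; omit it to use the tool's default.
        cmd += ["--threads", str(threads)]
    return cmd
```

Passing the list to `subprocess.run(cmd, check=True)` runs the tool without shell quoting issues and raises an error if gbcms exits with a non-zero status.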


🔄 Option 2: Nextflow Workflow (10+ samples, HPC)

Best for: Many samples, HPC clusters (SLURM), reproducible pipelines

nextflow run nextflow/main.nf \
    --input samplesheet.csv \
    --variants variants.vcf \
    --fasta reference.fa \
    --mode dna \
    -profile slurm

Features:

  • ✅ Automatic parallelization across samples
  • ✅ SLURM/HPC integration
  • ✅ Container support (Docker/Singularity)
  • ✅ Resume failed runs


Which Should I Use?

Scenario                        | Recommendation
1-10 samples, local machine     | CLI
10+ samples, HPC cluster        | Nextflow
Quick ad-hoc analysis           | CLI
Production pipeline             | Nextflow
Need auto-parallelization       | Nextflow
Full manual control             | CLI

Quick Examples

CLI: DNA Single Sample

gbcms dna \
    --variants variants.vcf \
    --bam tumor.bam \
    --fasta hg19.fa \
    --output-dir results/ \
    --threads 4

CLI: RNA-seq

gbcms rna \
    --variants variants.vcf \
    --bam rna_sample:aligned.bam \
    --fasta hg38.fa \
    --rna-editing-db TABLE1_hg38.txt.gz \
    --output-dir results/

CLI: Normalize Variants

gbcms normalize \
    --variants variants.vcf \
    --fasta hg19.fa \
    --output-dir results/
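
For readers unfamiliar with what "left-align + REF validation" means, the sketch below shows the widely used left-alignment procedure for a single REF/ALT pair, together with a check that REF matches the reference sequence. It is a generic illustration under stated assumptions, not the gbcms code: the function names are hypothetical, the reference is passed as an in-memory string for one contig, and positions are 1-based as in VCF.

```python
def ref_matches(reference, pos, ref):
    """Return True if `ref` equals the reference bases starting at 1-based `pos`."""
    return reference[pos - 1:pos - 1 + len(ref)] == ref


def left_align(reference, pos, ref, alt):
    """Left-align and trim one REF/ALT pair (hypothetical helper).

    `reference` is the sequence of a single contig; `pos` is 1-based.
    """
    changed = True
    while changed:
        changed = False
        # Drop a shared trailing base so the variant can shift left.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # An empty allele needs an anchor base from the left.
        if not ref or not alt:
            if pos <= 1:
                raise ValueError("cannot extend past the start of the contig")
            pos -= 1
            base = reference[pos - 1]
            ref, alt = base + ref, base + alt
            changed = True
    # Trim shared leading bases, keeping at least one base per allele.
    while len(ref) >= 2 and len(alt) >= 2 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt
```

For example, in the sequence `GGGCACACACAGGG`, a `CA` deletion written as position 8, `CAC` to `C`, shifts to the leftmost equivalent form at position 3, `GCA` to `G`. SNPs pass through unchanged.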

CLI: Multiple Samples (Sequential)

gbcms dna \
    --variants variants.vcf \
    --bam-list samples.txt \
    --fasta hg19.fa \
    --output-dir results/

Nextflow: Many Samples (Parallel)

# samplesheet.csv:
# sample,bam,bai
# tumor1,/path/to/tumor1.bam,
# tumor2,/path/to/tumor2.bam,

nextflow run nextflow/main.nf \
    --input samplesheet.csv \
    --variants variants.vcf \
    --fasta hg19.fa \
    --mode dna \
    --outdir results \
    -profile slurm
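
For larger cohorts, the samplesheet can be generated rather than typed by hand. The sketch below renders the `sample,bam,bai` layout shown in the comment above. It assumes that the sample name can be taken from the BAM file name and that the `bai` column may be left empty, as in the example; both are assumptions, and the helper name is hypothetical.

```python
import csv
import io
from pathlib import Path


def render_samplesheet(bam_paths):
    """Return samplesheet text with the header sample,bam,bai.

    Hypothetical helper: the sample name is the BAM file name without
    its extension, and the bai column is left empty as in the example.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["sample", "bam", "bai"])
    for bam in bam_paths:
        writer.writerow([Path(bam).stem, str(bam), ""])
    return buffer.getvalue()
```

Writing the result is then a single call such as `Path("samplesheet.csv").write_text(render_samplesheet(bams))`, and the file can be passed to the workflow with `--input samplesheet.csv`.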

Documentation

📚 Full Documentation: https://msk-access.github.io/gbcms/


Contributing

See CONTRIBUTING.md for development guidelines.

To contribute to documentation, see the gh-pages branch.


Citation

If you use gbcms in your research, please cite:

Shah, R. et al. (2026). gbcms: A high-performance orientation-aware genotype counting system for genomic variants. Available at: https://github.com/msk-access/gbcms

BibTeX:

@software{pygbcms,
  author       = {Shah, Ronak and contributors},
  title        = {gbcms: A high-performance orientation-aware genotype counting system for genomic variants},
  year         = {2026},
  url          = {https://github.com/msk-access/gbcms},
  note         = {GitHub repository}
}

License

AGPL-3.0 - see LICENSE for details.

