Python implementation of GetBaseCountsMultiSample (gbcms) for calculating base counts in BAM files

py-gbcms

Complete orientation-aware counting system for genomic variants

Python 3.10+

Features

  • 🚀 High Performance: Rust-powered core engine with multi-threading
  • 🧬 Complete Variant Support: SNP, MNP, insertion, deletion, and complex variants (DelIns, SNP+Indel)
  • 📊 Orientation-Aware: Forward and reverse strand analysis with fragment counting
  • 🔬 Statistical Analysis: Fisher's exact test for strand bias
  • 📁 Flexible I/O: VCF and MAF input/output formats
  • 🎯 Quality Filters: 7 configurable read filtering options
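To make the strand-bias feature above concrete, here is a minimal pure-Python sketch of a two-sided Fisher's exact test on a 2x2 table of forward/reverse read counts. It is not the py-gbcms implementation (the core engine is in Rust and is not shown here); the function name and the count variable names are hypothetical and chosen only for illustration.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Hypothetical layout for strand bias (an assumption, not the tool's API):
        a = ref reads on forward strand, b = ref reads on reverse strand
        c = alt reads on forward strand, d = alt reads on reverse strand
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    if total == 0:
        return 1.0  # no reads, no evidence of bias

    denom = comb(total, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)

    # Sum the probabilities of every table that is no more likely than the
    # observed one; the small tolerance guards against float round-off.
    p_value = sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


# Example: ref reads split 8 forward / 2 reverse, alt reads 1 forward / 5 reverse.
print(fisher_exact_two_sided(8, 2, 1, 5))
```

A small p-value suggests that the alternate allele is distributed across strands differently from the reference allele, which is a common sign of a strand-specific artifact.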

Installation

Quick install:

pip install py-gbcms

From source (requires Rust):

git clone https://github.com/msk-access/py-gbcms.git
cd py-gbcms
pip install .

Docker:

docker pull ghcr.io/msk-access/py-gbcms:2.1.0

📖 Full documentation: https://msk-access.github.io/py-gbcms/


Usage

py-gbcms can be used in two ways:

🔧 Option 1: Standalone CLI (1-10 samples)

Best for: Quick analysis, local processing, direct control

gbcms run \
    --variants variants.vcf \
    --bam sample1.bam \
    --fasta reference.fa \
    --output-dir results/

Output: results/sample1.vcf


🔄 Option 2: Nextflow Workflow (10+ samples, HPC)

Best for: Many samples, HPC clusters (SLURM), reproducible pipelines

nextflow run nextflow/main.nf \
    --input samplesheet.csv \
    --variants variants.vcf \
    --fasta reference.fa \
    -profile slurm

Features:

  • ✅ Automatic parallelization across samples
  • ✅ SLURM/HPC integration
  • ✅ Container support (Docker/Singularity)
  • ✅ Resume failed runs


Which Should I Use?

Scenario                          Recommendation
1-10 samples, local machine       CLI
10+ samples, HPC cluster          Nextflow
Quick ad-hoc analysis             CLI
Production pipeline               Nextflow
Need auto-parallelization         Nextflow
Full manual control               CLI

Quick Examples

CLI: Single Sample

gbcms run \
    --variants variants.vcf \
    --bam tumor.bam \
    --fasta hg19.fa \
    --output-dir results/ \
    --threads 4
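If you drive the CLI from a Python script, the single-sample command above can be assembled as an argument list. This is only a sketch: the helper name build_gbcms_command is hypothetical, and it uses only the flags that appear in the examples in this README (--variants, --bam, --fasta, --output-dir, --threads).

```python
import shlex


def build_gbcms_command(variants, bam, fasta, output_dir, threads=None):
    """Return the argv list for a single-sample `gbcms run` call.

    Hypothetical helper: it only mirrors the flags shown in this README.
    """
    cmd = [
        "gbcms", "run",
        "--variants", str(variants),
        "--bam", str(bam),
        "--fasta", str(fasta),
        "--output-dir", str(output_dir),
    ]
    if threads is not None:
        # --threads is optional in the examples, so add it only on request.
        cmd += ["--threads", str(threads)]
    return cmd


cmd = build_gbcms_command(
    "variants.vcf", "tumor.bam", "hg19.fa", "results/", threads=4
)
# Print a shell-quoted version for logging or copy-paste.
print(shlex.join(cmd))
```

To execute it, pass the list to subprocess.run(cmd, check=True). Using a list rather than a single string avoids shell quoting problems with paths that contain spaces.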

CLI: Multiple Samples (Sequential)

gbcms run \
    --variants variants.vcf \
    --bam-list samples.txt \
    --fasta hg19.fa \
    --output-dir results/

Nextflow: Many Samples (Parallel)

# samplesheet.csv:
# sample,bam,bai
# tumor1,/path/to/tumor1.bam,
# tumor2,/path/to/tumor2.bam,

nextflow run nextflow/main.nf \
    --input samplesheet.csv \
    --variants variants.vcf \
    --fasta hg19.fa \
    --outdir results \
    -profile slurm
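For many samples, the samplesheet can be generated rather than typed by hand. The sketch below writes the three-column layout from the example above (sample,bam,bai). Two things are assumptions, not documented behavior: the sample name is taken from the BAM file name, and the bai column is filled only when a file named <bam>.bai exists next to the BAM, and left empty otherwise, as in the example. The function names are hypothetical.

```python
import csv
from pathlib import Path


def samplesheet_rows(bam_paths):
    """Build one [sample, bam, bai] row per BAM path.

    Assumptions (not taken from the py-gbcms docs): the sample name is the
    BAM file name without its extension, and the index is looked up as
    "<bam>.bai". The bai column stays empty when no such file exists.
    """
    rows = []
    for bam in sorted(str(p) for p in bam_paths):
        bai = Path(bam + ".bai")
        rows.append([Path(bam).stem, bam, str(bai) if bai.exists() else ""])
    return rows


def write_samplesheet(bam_paths, out_path="samplesheet.csv"):
    """Write the samplesheet with the header used in the example above."""
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["sample", "bam", "bai"])
        writer.writerows(samplesheet_rows(bam_paths))


# Example rows for two BAM paths (nothing is written to disk here).
for row in samplesheet_rows(["/path/to/tumor2.bam", "/path/to/tumor1.bam"]):
    print(",".join(row))
```

Calling write_samplesheet(Path("bams").glob("*.bam")) would then produce a file suitable for the --input option shown above, assuming the BAMs live in a directory named bams.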

Documentation

📚 Full Documentation: https://cmo-ci.gitbook.io/py-gbcms/


Contributing

See CONTRIBUTING.md for development guidelines.

To contribute to documentation, see the gh-pages branch.


Citation

If you use py-gbcms in your research, please cite:

[Citation to be added]

License

AGPL-3.0 - see LICENSE for details.

