Blazingly fast SBS matrix generator library

SBSGenerator

SBSGenerator is a Python package for bioinformaticians and researchers working in genomics. It provides tools for generating, analyzing, and interpreting single base substitution (SBS) mutations from Variant Call Format (VCF) files. With a focus on ease of use, efficiency, and scalability, SBSGenerator supports the detailed study of genomic mutations and their roles in biological processes and diseases. The package is written in a hybrid of Python and Rust and uses the PyO3 library to connect the two, combining a flexible Python interface with the performance of Rust so that large-scale genomic data can be processed efficiently.

Installation

$ pip install sbsgenerator

Usage

The SBSGenerator package generates and analyzes SBS mutation data from VCF files across different genomic contexts. Depending on the specified context size, it creates dataframes listing all possible SBS mutations, from simple 3-nucleotide contexts to more complex 7-nucleotide contexts. The number of possible mutation combinations grows exponentially with context size: a context of size k has 6 x 4^(k-1) combinations.

  • Context 3: The dataframe contains all of the following pyrimidine single nucleotide variants: N[{C > A, G, or T} or {T > A, G, or C}]N. 4 possible starting nucleotides x 6 pyrimidine variants x 4 possible ending nucleotides = 96 total combinations.

  • Context 5: The dataframe contains all of the following pyrimidine single nucleotide variants: NN[{C > A, G, or T} or {T > A, G, or C}]NN. 16 (4x4) possible starting dinucleotides x 6 pyrimidine variants x 16 (4x4) possible ending dinucleotides = 1536 total combinations.

  • Context 7: The dataframe contains all of the following pyrimidine single nucleotide variants: NNN[{C > A, G, or T} or {T > A, G, or C}]NNN. 64 (4x4x4) possible starting trinucleotides x 6 pyrimidine variants x 64 (4x4x4) possible ending trinucleotides = 24576 total combinations.
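
The counts above can be checked with a short, self-contained sketch. It does not use SBSGenerator itself; the function name `sbs_labels` and the label format `A[C>A]A` are assumptions made for illustration, and the package's own dataframe may label mutations differently.

```python
from itertools import product
from typing import List

NUCLEOTIDES = "ACGT"
# The six pyrimidine-referenced substitutions: C>A, C>G, C>T, T>A, T>C, T>G
SUBSTITUTIONS = [(ref, alt) for ref in "CT" for alt in NUCLEOTIDES if alt != ref]


def sbs_labels(context_size: int) -> List[str]:
    """Return every SBS label for an odd context size of at least 3.

    Hypothetical helper for illustration, not part of the SBSGenerator API.
    """
    if context_size < 3 or context_size % 2 == 0:
        raise ValueError("context_size must be an odd number of at least 3")
    flank = (context_size - 1) // 2  # bases on each side of the mutated base
    flanks = ["".join(p) for p in product(NUCLEOTIDES, repeat=flank)]
    return [
        f"{left}[{ref}>{alt}]{right}"
        for left in flanks
        for ref, alt in SUBSTITUTIONS
        for right in flanks
    ]


for size in (3, 5, 7):
    print(size, len(sbs_labels(size)))
# 3 96
# 5 1536
# 7 24576
```

Each flank contributes 4^((k-1)/2) sequences, so the total is 6 x 4^(k-1), which matches the three totals listed above.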

VCF INPUT FILE FORMAT

This tool currently supports only VCF input. Each input file must provide the following columns:

| Name | Description | Data type |
| --- | --- | --- |
| Type | Represents the type of mutation. | str |
| Gene | Indicates the specific gene associated with the mutation. | str |
| PMID | Refers to the PubMed ID of the associated research paper. | str |
| Genome | Specifies the genome version used for mapping. | str |
| Mutation Type | Describes the type of mutation. | str |
| Chromosome | Represents the chromosome number where the mutation occurs. | str |
| Start Position | Indicates the starting position of the mutation on the chromosome. | str |
| End Position | Represents the ending position of the mutation on the chromosome. | str |
| Reference Allele | Denotes the original allele at the mutation site. | str |
| Mutant Allele | Represents the altered allele resulting from the mutation. | str |
| Method | Describes the method used for mutation detection. | str |

The following example builds an SBS matrix from a single VCF file:

from sbsgenerator import generator

# Context size (must be an odd number, 3 or larger)
context_size = 7
# List of all VCF files to process
vcf_files = ["data/test.vcf"]
# Directory the reference genomes will be downloaded to
ref_genome = "temp/ref_genomes"

sbsgen = generator.SBSGenerator(
    context=context_size,
    vcf_files=vcf_files,
    ref_genome=ref_genome
)
sbsgen.count_mutations()

# The attribute count_samples holds the SBS matrix
sbs_matrix = sbsgen.count_samples
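
Because the matrix only lists pyrimidine variants (C or T as the reference base), a mutation whose reference base is a purine has to be expressed on the opposite strand before it can be counted. The sketch below illustrates that convention in plain Python. It is not the library's implementation, and the helper names `reverse_complement` and `to_pyrimidine_label`, as well as the label format, are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def to_pyrimidine_label(context: str, alt: str) -> str:
    """Label a substitution with a pyrimidine (C or T) reference base.

    `context` is an odd-length reference sequence centred on the mutated
    base, and `alt` is the mutant allele on the same strand.
    """
    mid = len(context) // 2
    ref = context[mid]
    if ref in "AG":
        # Purine reference: describe the same event on the opposite strand.
        context = reverse_complement(context)
        alt = alt.translate(COMPLEMENT)
        ref = context[mid]
    return f"{context[:mid]}[{ref}>{alt}]{context[mid + 1:]}"


print(to_pyrimidine_label("ACG", "T"))    # A[C>T]G
print(to_pyrimidine_label("TGA", "A"))    # T[C>T]A
print(to_pyrimidine_label("AATCC", "G"))  # AA[T>G]CC
```

In the second call the reference base G is a purine, so the context TGA becomes TCA on the opposite strand and the mutant allele A becomes T.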

Contributing

I welcome contributions to SBSGenerator! If you have suggestions for improvements or bug fixes, please open an issue or submit a pull request.

License

SBSGenerator is released under the MIT License. See the LICENSE file for more details.
