
Bam2Tensor

bam2tensor

bam2tensor is a Python package for converting .bam files to compact representations of methylation data (saved as .npz files). It evaluates all CpG sites and stores their methylation states for loading into downstream deep learning pipelines.

Features

  • Parses .bam files using pysam
  • Extracts methylation data from all CpG sites
  • Supports any genome (hg38, T2T-CHM13, mm10, etc.)
  • Stores data in sparse format (COO matrix) for efficient loading
  • Exports methylation data to .npz NumPy arrays
  • Easily parallelizable

Requirements

  • Python 3.9+
  • pysam, numpy, scipy, tqdm

Installation

You can install bam2tensor via pip from PyPI:

pip install bam2tensor

Usage

Please see the Reference Guide for full details.

Data Structure

One .npz file is generated for each separate .bam, which can be loaded using scipy.sparse.load_npz(). Each .npz file contains a single sparse SciPy COO matrix.

In the COO matrix, each row represents a read and each column represents a CpG site. The value at each row/column is the methylation state (0 = unmethylated, 1 = methylated, -1 = no data). Note that -1 can represent indels or point mutations.
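As a minimal sketch of how such a file might be consumed, the example below builds a small synthetic matrix using the encoding described above, round-trips it through scipy.sparse.save_npz() and scipy.sparse.load_npz(), and computes a per-CpG methylation fraction. The file name and the toy values are hypothetical. The sketch also assumes that 0 and -1 are stored as explicit entries and that unstored positions mean the read does not cover that CpG site; check this against real bam2tensor output before relying on it.

```python
import os
import tempfile

import numpy as np
import scipy.sparse

# Synthetic stand-in for a bam2tensor output: 3 reads x 4 CpG sites.
# Stored values use the encoding from the text above:
# 1 = methylated, 0 = unmethylated, -1 = no data.
rows = np.array([0, 0, 1, 1, 2, 2])
cols = np.array([0, 1, 1, 2, 2, 3])
vals = np.array([1, 0, 1, -1, 0, 1], dtype=np.int8)
toy = scipy.sparse.coo_matrix((vals, (rows, cols)), shape=(3, 4))

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file name; real outputs are named by bam2tensor.
    path = os.path.join(tmp, "example.methylation.npz")
    scipy.sparse.save_npz(path, toy)
    matrix = scipy.sparse.load_npz(path).tocoo()

n_reads, n_cpgs = matrix.shape

# Work on the stored entries only (assumption: unstored = not covered).
# Entries with value -1 carry no methylation call, so they are excluded.
called = matrix.data >= 0
called_cols = matrix.col[called]
is_methylated = (matrix.data[called] == 1).astype(float)

# Number of reads with a usable call at each CpG site.
coverage = np.bincount(called_cols, minlength=n_cpgs)
# Number of methylated calls at each CpG site.
methylated = np.bincount(called_cols, weights=is_methylated, minlength=n_cpgs)

# Fraction methylated per CpG site; NaN where no read has a call.
fraction = np.full(n_cpgs, np.nan)
np.divide(methylated, coverage, out=fraction, where=coverage > 0)

print(f"{n_reads} reads x {n_cpgs} CpG sites")
print("coverage per CpG:", coverage)
print("methylation fraction per CpG:", fraction)
```

Operating on matrix.row, matrix.col, and matrix.data directly avoids densifying the matrix, which matters when the column dimension spans every CpG site in a genome.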

Todo

  • Consider storing a Read ID: Row ID mapping?
  • Export / more stably store & import embedding mapping? (.npz or other instead of .json?)
  • Store metadata / object reference in .npz file?

Contributing

Contributions are welcome! Please see the Contributor Guide.

License

Distributed under the terms of the MIT license, bam2tensor is free and open source.

Issues

If you encounter any problems, please file an issue along with a detailed description.

Credits

This project is developed and maintained by Nick Semenkovich (@semenko), as part of the Medical College of Wisconsin's Data Science Institute.

This project was generated from Statistics Norway's SSB PyPI Template.
