bam2tensor

bam2tensor is a Python package for converting .bam files into compact representations of methylation data, saved as .npz files. It evaluates every CpG site and stores the methylation states so they can be loaded into downstream deep learning pipelines.

Features

  • Parses .bam files using pysam
  • Extracts methylation data from all CpG sites
  • Supports any genome (Hg38, T2T-CHM13, mm10, etc.)
  • Stores data in sparse format (COO matrix) for efficient loading
  • Exports methylation data to .npz files (SciPy sparse matrices)
  • Easily parallelizable

Requirements

  • Python 3.9+
  • pysam, numpy, scipy, tqdm

Installation

You can install bam2tensor via pip from PyPI:

pip install bam2tensor

Usage

Please see the Reference Guide for full details.

Data Structure

One .npz file is generated for each separate .bam, which can be loaded using scipy.sparse.load_npz(). Each .npz file contains a single sparse SciPy COO matrix.

In the COO matrix, each row represents a read and each column represents a CpG site. The value at each row/column is the methylation state (0 = unmethylated, 1 = methylated, -1 = no data). Note that -1 can represent indels or point mutations.
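
As a rough illustration of the layout described above, the sketch below builds a small toy matrix, saves it, and reloads it with scipy.sparse.load_npz(). The toy data, the file name, and the variable names are assumptions made up for this example; they are not produced by bam2tensor, and real output files will be far larger.

```python
import os
import tempfile

import numpy as np
import scipy.sparse

# Toy stand-in for a bam2tensor output (made-up data, not real output):
# 3 reads (rows) x 5 CpG sites (columns), using the encoding described above:
# 1 = methylated, 0 = unmethylated, -1 = no data.
rows = np.array([0, 0, 1, 2, 2])
cols = np.array([0, 1, 1, 3, 4])
vals = np.array([1, 0, 1, -1, 1], dtype=np.int8)
toy = scipy.sparse.coo_matrix((vals, (rows, cols)), shape=(3, 5))

with tempfile.TemporaryDirectory() as tmpdir:
    # Hypothetical file name, chosen only for this example.
    path = os.path.join(tmpdir, "example.methylation.npz")
    scipy.sparse.save_npz(path, toy)
    # This is the call you would use on a real bam2tensor .npz file.
    matrix = scipy.sparse.load_npz(path)

# A COO matrix keeps three parallel arrays: row index (read),
# column index (CpG site), and the stored value (methylation state).
for read_idx, cpg_idx, state in zip(matrix.row, matrix.col, matrix.data):
    print(f"read {read_idx}, CpG {cpg_idx}: state {state}")

# Count methylated calls per read using only the stored entries.
methylated_per_read = np.bincount(
    matrix.row[matrix.data == 1], minlength=matrix.shape[0]
)
print(methylated_per_read)
```

Working on the stored entries (matrix.row, matrix.col, matrix.data) avoids converting to a dense array, where positions with no stored entry would appear as 0.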

Todo

  • Consider storing a Read ID: Row ID mapping?
  • Export / more stably store & import embedding mapping? (.npz or other instead of .json?)
  • Store metadata / object reference in .npz file?

Contributing

Contributions are welcome! Please see the Contributor Guide.

License

Distributed under the terms of the MIT license, bam2tensor is free and open source.

Issues

If you encounter any problems, please file an issue along with a detailed description.

Credits

This project is developed and maintained by Nick Semenkovich (@semenko), as part of the Medical College of Wisconsin's Data Science Institute.

This project was generated from Statistics Norway's SSB PyPI Template.
