High performance Cython + Python tools to process BAM files with tags as they arise in single-cell sequencing

scbamtools

Status

This is alpha. The plan is mostly to move useful functionality developed within spacemake out of spacemake, so that it can be reused without pulling in all the dependencies of a heavyweight spatial transcriptomics package. Currently, the umbilical cord has not been cut, and the code is almost certainly not functional without spacemake present.

Useful things include:

  • converting FASTQ files to uBAM files with barcode information (single cell and spatial workflows)
  • trimming adapters (uses cutadapt functions under the hood)
  • making histograms and statistics about cell barcodes, UMIs and possibly other BAM tags
  • annotating aligned BAM records against a transcript annotation such as GENCODE
  • building digital gene expression counts from annotated BAM files, directly as scanpy AnnData (h5ad)
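
To make the tag statistics and counting steps above more concrete, here is a minimal pure-Python sketch that works on SAM text records. It is not the scbamtools API. The function names (`parse_tags`, `tag_histogram`, `count_umis`), the tag choices (`CB` for the cell barcode, `MI` for the UMI, `gn` for the gene) and the example records are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical example records in SAM text format (tab-separated).
# The tag names CB (cell barcode), MI (UMI) and gn (gene) are assumptions.
EXAMPLE_RECORDS = [
    "@HD\tVN:1.6\tSO:unsorted",
    "r1\t0\tchr1\t100\t255\t7M\t*\t0\t0\tGATTACA\tFFFFFFF\tCB:Z:AAAC\tMI:Z:TTGG\tgn:Z:Actb",
    # Same cell, gene and UMI as r1: counted once as a molecule.
    "r2\t0\tchr1\t100\t255\t7M\t*\t0\t0\tGATTACA\tFFFFFFF\tCB:Z:AAAC\tMI:Z:TTGG\tgn:Z:Actb",
    "r3\t0\tchr1\t120\t255\t7M\t*\t0\t0\tGATTACA\tFFFFFFF\tCB:Z:AAAC\tMI:Z:CCAA\tgn:Z:Actb",
    # Unaligned read without a gene tag: appears in histograms only.
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\tGATTACA\tFFFFFFF\tCB:Z:GGGT\tMI:Z:TTGG",
]


def parse_tags(sam_line):
    """Return the optional TAG:TYPE:VALUE fields of one SAM record as a dict."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    for field in fields[11:]:  # optional fields start at column 12
        tag, _type, value = field.split(":", 2)
        tags[tag] = value
    return tags


def tag_histogram(sam_lines, tag="CB"):
    """Count how many records carry each value of the given tag."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        value = parse_tags(line).get(tag)
        if value is not None:
            counts[value] += 1
    return counts


def count_umis(sam_lines, cell_tag="CB", umi_tag="MI", gene_tag="gn"):
    """Count distinct UMIs per (cell, gene) pair, skipping untagged records."""
    seen = defaultdict(set)
    for line in sam_lines:
        if line.startswith("@"):
            continue
        tags = parse_tags(line)
        if cell_tag in tags and umi_tag in tags and gene_tag in tags:
            seen[(tags[cell_tag], tags[gene_tag])].add(tags[umi_tag])
    return {key: len(umis) for key, umis in seen.items()}


barcode_histogram = tag_histogram(EXAMPLE_RECORDS, tag="CB")
expression_counts = count_umis(EXAMPLE_RECORDS)
```

A real tool would read binary BAM rather than text and would store the counts as a sparse matrix in an AnnData object. The dictionary returned here only illustrates the counting logic.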

Why is this better than ...

Depends on what you need. We are building these tools to be as fast as possible while keeping as much of the functionality as possible in Python (with the occasional Cython) for flexibility and maintainability. We care less about total CPU use than about throughput and scalability. So, some principles:

  • avoid temp files, streaming is better
  • use mrfifo for low-overhead parallelism
  • put some effort into efficient data structures where it pays off
  • make simple things simple, while hard things should be possible
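
As a rough illustration of the streaming principle, the sketch below chains Python generators so that paired FASTQ reads are turned into unaligned SAM text records one at a time, with no temp files and no buffering of the whole input. It is not the scbamtools implementation and does not use mrfifo. The names `read_fastq` and `to_unaligned_sam`, the barcode layout (12 nt cell barcode followed by 8 nt UMI at the start of read 1) and the `CB`/`MI` tag names are assumptions.

```python
import io
import itertools
from typing import Iterable, Iterator, NamedTuple


class FastqRecord(NamedTuple):
    name: str
    seq: str
    qual: str


def read_fastq(handle: Iterable[str]) -> Iterator[FastqRecord]:
    """Lazily yield one record per four FASTQ lines."""
    lines = (line.rstrip("\n") for line in handle)
    while True:
        chunk = list(itertools.islice(lines, 4))
        if len(chunk) < 4:  # end of input (or a truncated record)
            return
        header, seq, _plus, qual = chunk
        # Drop the leading '@' and anything after the first whitespace.
        yield FastqRecord(header[1:].split()[0], seq, qual)


def to_unaligned_sam(read1, read2, cb_len=12, umi_len=8):
    """Pair up reads and emit one unaligned SAM line per cDNA read.

    Assumed layout: read 1 starts with the cell barcode followed by the UMI,
    read 2 holds the cDNA sequence.
    """
    for r1, r2 in zip(read1, read2):
        cell_barcode = r1.seq[:cb_len]
        umi = r1.seq[cb_len:cb_len + umi_len]
        yield "\t".join([
            r2.name, "4", "*", "0", "0", "*", "*", "0", "0",  # flag 4 = unmapped
            r2.seq, r2.qual,
            f"CB:Z:{cell_barcode}", f"MI:Z:{umi}",
        ])


# In-memory stand-ins for two FASTQ streams (pipes or file handles in practice).
demo_read1 = io.StringIO("@read1 1:N:0\nACGTACGTACGTTTTTGGGG\n+\nIIIIIIIIIIIIIIIIIIII\n")
demo_read2 = io.StringIO("@read1 2:N:0\nGATTACA\n+\nFFFFFFF\n")
demo_records = list(to_unaligned_sam(read_fastq(demo_read1), read_fastq(demo_read2)))
```

Because every stage is a generator, memory use stays flat regardless of input size, and the output can be piped straight into the next tool.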

The code here is the same code we use to process open-st spatial transcriptomics data, which is very deep: typical runs have billions of reads and hundreds of millions of spatial barcodes. While we make sure that the tools here don't break and keep resource usage manageable, we do not aim to be the most CPU-efficient or to let you process open-st data on your laptop. YMMV.

Roadmap

  • port everything from spacemake [ongoing]
  • full suite of tools to replace dropseq-tools in spacemake [v1.0]
  • optimizations
  • tutorials and example uses outside of spacemake
