CNV From BAM
`cnv_from_bam` is a Rust library for efficiently calculating dynamic Copy Number Variation (CNV) profiles from sequence alignments contained in BAM files. It integrates with Python through PyO3, which makes it a good fit for bioinformatics workflows involving genomic data analysis.
Features
- Efficient Processing: Optimized for handling large genomic datasets in BAM format.
- Python Integration: Built with PyO3 for easy integration into Python-based genomic analysis workflows.
- Multithreading Support: Utilizes Rust's powerful concurrency model for improved performance.
- Dynamic Binning: Bins the genome dynamically based on total read counts and genome length.
- CNV Calculation: Accurately calculates CNV values for each bin across different contigs.
- Directory Support: Supports processing of multiple BAM files in a directory. (Requires alignment to the same reference in all BAM files)
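As a rough illustration of what "dynamic binning" can mean, here is a minimal sketch that picks a bin width from the total read count and the genome length. The rule itself, the function name `choose_bin_width`, and the `target_reads_per_bin` and `round_to` parameters are assumptions made for this example; they are not taken from the library, whose actual rule may differ.

```python
import math


def choose_bin_width(total_reads, genome_length, target_reads_per_bin=100, round_to=1000):
    """Hypothetical rule: size bins so an average bin holds ~target_reads_per_bin reads.

    This is NOT the library's documented algorithm, only an illustration.
    """
    if total_reads <= 0:
        # No reads at all: fall back to a single bin spanning the genome.
        return genome_length
    # Number of bins we can afford while keeping the target reads per bin.
    n_bins = max(1, total_reads // target_reads_per_bin)
    raw_width = genome_length / n_bins
    # Round up to a multiple of `round_to` so bin edges are tidy.
    return max(round_to, math.ceil(raw_width / round_to) * round_to)


if __name__ == "__main__":
    # More reads give narrower bins for the same genome length.
    print(choose_bin_width(1_000_000, 3_000_000_000))
    print(choose_bin_width(10_000_000, 3_000_000_000))
```

The point of a rule like this is that shallow sequencing runs get wide bins (less noise per bin) while deeper runs get finer resolution.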
Installation
To use `cnv_from_bam` in your Rust project, add the following to your `Cargo.toml` file:
[dependencies]
cnv_from_bam = "0.1.0" # Replace with the latest version
Usage
Here's a quick example of how to use the `iterate_bam_file` function:
use cnv_from_bam::iterate_bam_file;
use std::path::PathBuf;
let bam_path = PathBuf::from("path/to/bam/file.bam");
// Iterate over the BAM file and calculate CNV values for each bin. Number of threads is set to 4 and mapping quality filter is set to 60.
// If number of threads is not specified, it defaults to the number of logical cores on the machine.
let result = iterate_bam_file(bam_path, Some(4), Some(60), None, None);
// Process the result...
The results in this case are returned as a CnvResult, which has the following structure:
/// Results struct for python
#[pyclass]
#[derive(Debug)]
pub struct CnvResult {
    /// The CNV per contig
    #[pyo3(get)]
    pub cnv: PyObject,
    /// Bin width
    #[pyo3(get)]
    pub bin_width: usize,
    /// Genome length
    #[pyo3(get)]
    pub genome_length: usize,
    /// Variance of the whole genome
    #[pyo3(get)]
    pub variance: f64,
}
Where `result.cnv` is a Python dict `PyObject` containing the copy number for each bin of `bin_width` bases for each contig in the reference genome, `result.bin_width` is the width of the bins in bases, `result.genome_length` is the total length of the genome and `result.variance` is a measure of the variance across the whole genome.
Variance is calculated as the average of the squared differences from the mean.
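To make the variance definition concrete, here is a minimal sketch of a population variance computed over every bin of every contig. The helper names `population_variance` and `genome_variance` are made up for this example, and pooling all bins across contigs is an assumption about how "whole genome" is interpreted, not a statement about the library's internals.

```python
def population_variance(values):
    """Average of the squared differences from the mean."""
    values = list(values)
    if not values:
        raise ValueError("need at least one value")
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def genome_variance(cnv):
    """Pool the bins of every contig, then take the population variance.

    `cnv` is a dict mapping contig name -> list of per-bin copy numbers,
    i.e. the same shape as `result.cnv`.
    """
    pooled = [value for bins in cnv.values() for value in bins]
    return population_variance(pooled)


if __name__ == "__main__":
    example = {"chr1": [2, 2], "chr2": [4, 4]}
    print(genome_variance(example))
```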
[!NOTE] Only the start of the primary alignment is binned; supplementary and secondary alignments are ignored. Supplementary alignments can be included by setting `exclude_supplementary` to `False`.
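For readers unfamiliar with how these alignment types are distinguished, here is a small sketch of the kind of filter described in the note, based on the standard SAM flag bits (`0x4` unmapped, `0x100` secondary, `0x800` supplementary). The function `should_bin` is hypothetical, and treating the mapping quality filter as "keep reads with MAPQ greater than or equal to the threshold" is an assumption.

```python
# Standard SAM flag bits (from the SAM specification).
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def should_bin(flag, mapq, mapq_filter=60, exclude_supplementary=True):
    """Decide whether an alignment's start position would be counted.

    Hypothetical helper: mirrors the behaviour described in the note,
    not the library's actual code.
    """
    if flag & UNMAPPED:
        return False
    if flag & SECONDARY:
        # Secondary alignments are always ignored.
        return False
    if exclude_supplementary and flag & SUPPLEMENTARY:
        return False
    # Assumption: the filter keeps reads at or above the threshold.
    return mapq >= mapq_filter


if __name__ == "__main__":
    print(should_bin(flag=0, mapq=60))
    print(should_bin(flag=SUPPLEMENTARY, mapq=60))
    print(should_bin(flag=SUPPLEMENTARY, mapq=60, exclude_supplementary=False))
```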
Directory analysis
To analyse a directory of BAM files, use the `iterate_bam_dir` function:
use cnv_from_bam::iterate_bam_dir;
use std::path::PathBuf;
let bam_path = PathBuf::from("path/to/bam_directory/");
// Iterate over the BAM files in the directory and calculate CNV values for the whole set. Number of threads is set to 4 and mapping quality filter is set to 60.
// If number of threads is not specified, it defaults to the number of logical cores on the machine.
let result = iterate_bam_dir(bam_path, Some(4), Some(60));
This again returns a CnvResult, but this time the CNV values are summed across all BAM files in the directory. The bin width and genome length are calculated based on the first BAM file in the directory.
[!NOTE] All BAM files in the directory must be aligned to the same reference genome.
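The same-reference requirement follows from how values are combined: bins can only be added together if every file has the same contigs and the same number of bins per contig. Here is a minimal sketch of that summation, assuming per-file results shaped like `result.cnv` (contig name mapped to a list of per-bin values). The function `sum_bins` is invented for this example and says nothing about how the library does it internally.

```python
def sum_bins(per_file_bins):
    """Element-wise sum of per-contig bin values across several files.

    `per_file_bins` is an iterable of dicts: contig name -> list of values.
    Raises ValueError if a contig has a different number of bins in two
    files, which is what would happen with mismatched references.
    """
    total = {}
    for bins in per_file_bins:
        for contig, values in bins.items():
            if contig not in total:
                total[contig] = list(values)
                continue
            if len(values) != len(total[contig]):
                raise ValueError(f"bin count mismatch for contig {contig!r}")
            total[contig] = [a + b for a, b in zip(total[contig], values)]
    return total


if __name__ == "__main__":
    file_a = {"chr1": [1, 2]}
    file_b = {"chr1": [3, 4], "chr2": [5]}
    print(sum_bins([file_a, file_b]))
```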
Python Integration
`cnv_from_bam` can be used in Python using the PyO3 bindings. To install the Python bindings, run:
pip install cnv_from_bam
The same `iterate_bam_file` function is available in Python. It accepts a path to a BAM file or a directory of BAM files, the number of threads (set to `None` to use the optimal number of threads for the machine), and the mapping quality filter.
Example simple plot in python
```python
from pathlib import Path

import numpy as np
from matplotlib import pyplot as plt

from cnv_from_bam import iterate_bam_file

fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(8, 3))
total = 0
bam_path = Path("path/to/bam/file.bam")
# Iterate over the BAM file and calculate CNV values for each bin. Number of threads is set to 4 and mapping quality filter is set to 60.
# If number of threads is not specified, it defaults to the optimal number of threads for the machine.
result = iterate_bam_file(bam_path, _threads=4, mapq_filter=60)
for contig, cnv in result.cnv.items():
    # Offset each contig by the number of bins already plotted so contigs sit end to end
    ax.scatter(x=np.arange(len(cnv)) + total, y=cnv, s=0.1)
    total += len(cnv)
ax.set_ylim((0, 8))
ax.set_xlim((0, total))
```
The CNV data is just a dictionary of lists, so you can do whatever you want with it in matplotlib, seaborn, etc.
Output
This is new in version >= 0.3. If you just want raw stdout from rust and no faffing with loggers, use v0.2.
Progress Bar
By default, a progress bar is displayed, showing the progress of the iteration of each BAM file. To disable the progress bar, set the `CI` environment variable to `1` in your Python script:
import os
os.environ["CI"] = "1"
Logging
We use the `log` crate for logging. By default, the log level is set to `INFO`, which means that the program will output the progress of the iteration of each BAM file. To disable all but warning and error logging, set the log level to `WARN` on the `iterate_bam_file` function:
import logging
from cnv_from_bam import iterate_bam_file
iterate_bam_file(bam_path, _threads=4, mapq_filter=60, log_level=int(logging.WARN))
`logging.WARN` is an integer constant from the `logging` module, so `int(...)` leaves it unchanged. `logging.getLevelName` can also convert a level name such as `"WARN"` to the same integer value. These values are:
Level | Value |
---|---|
CRITICAL | 50 |
ERROR | 40 |
WARNING | 30 |
INFO | 20 |
DEBUG | 10 |
NOTSET | 0 |
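If you want to accept either a level name or an integer in your own wrapper code, the standard library already carries this mapping. The sketch below uses only the `logging` module; the helper `to_level_int` is hypothetical and not part of `cnv_from_bam`.

```python
import logging

# The standard level names and their integer values, read from the logging module.
LEVELS = {
    name: getattr(logging, name)
    for name in ("CRITICAL", "ERROR", "WARNING", "INFO", "DEBUG", "NOTSET")
}


def to_level_int(level):
    """Return the integer value for a level given as an int or a name."""
    if isinstance(level, int):
        return level
    # getLevelName returns an int for a known name, and a string otherwise.
    value = logging.getLevelName(level.upper())
    if not isinstance(value, int):
        raise ValueError(f"unknown log level: {level!r}")
    return value


if __name__ == "__main__":
    for name, value in LEVELS.items():
        print(name, value)
    print(to_level_int("warn"))
```

The resulting integer is what would be passed as `log_level` in the call shown earlier.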
[!NOTE] In v0.3 a regression was introduced, whereby keeping the GIL for logging meant that BAM reading was suddenly single threaded again. Whilst it was possible to fix this and keep `PyO3-log`, I decided to go for truly maximum speed instead. The only drawback to the removal of `PyO3-log` in v0.4+ is that log messages will not be handled by Python loggers, so they won't be written out by a file handler, for example.
Documentation
To generate the documentation, run:
cargo doc --open
Contributing
Contributions to `cnv_from_bam` are welcome!
We use pre-commit hooks (particularly `cargo-fmt` and `ruff`) to ensure that code is formatted correctly and passes all tests before being committed. To install the pre-commit hooks, run:
git clone https://github.com/Adoni5/cnv_from_bam.git
cd cnv_from_bam
pip install -e .[dev]
pre-commit install -t pre-commit -t post-checkout -t post-merge
pre-commit run --all-files
Changelog
v0.4.2
- Returns the contig names naturally sorted, rather than in random order! For example `chr1, chr2, chr3...chr22,chrM,chrX,chrY`! Huge, will prevent some people getting repeatedly confused about expected CNV vs. visualised and wasting an hour debugging a non-existent issue.
- Returns variance across the whole genome in the CNV result struct.
v0.4.1
- Add `exclude_supplementary` parameter to `iterate_bam_file`, to exclude supplementary alignments (default True)
v0.4.0
- Remove `PyO3-log` for maximum speed. This means that log messages will not be handled by Python loggers. Can set log level on call to `iterate_bam_file`
v0.3.0
- Introduce `PyO3-log` for logging. This means that log messages can be handled by Python loggers, so they can be written out by a file handler, for example.
- HAS A LARGE PERFORMANCE ISSUE
- Can disable progress bar display by setting `CI` environment variable to `1` in Python script.
v0.2.0
- Purely rust based BAM parsing, using noodles.
- Uses a much more sensible number for threading if not provided.
- Allows iteration of BAMS in a directory
v0.1.0
- Initial release
- Uses `rust-bio/rust-htslib` for BAM parsing. Has to bind C code, is a faff.
License
This project is licensed under the Mozilla Public License 2.0.