Genomic region based arrays backed by TileDB
Genomic Arrays based on TileDB
GenomicArrays is a Python package for converting genomic data from BigWig format to TileDB arrays.
Installation
Install the package from PyPI
pip install genomicarrays
Quick Start
Build a GenomicArray
Building a GenomicArray generates 3 TileDB files in the specified output directory:
- feature_annotation: A TileDB file containing input feature intervals.
- sample_metadata: A TileDB file containing sample metadata. Each BigWig file is considered a sample.
- A matrix TileDB file named by the layer_matrix_name parameter. This allows the package to store multiple different matrices, e.g. 'coverage', 'some_computed_statistic', for the same interval and sample metadata attributes.
The organization is inspired by the SummarizedExperiment data structure. The TileDB matrix file is stored in a features X samples orientation.
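To make the layout concrete, here is a minimal in-memory sketch of the features X samples orientation, with a NumPy array standing in for the TileDB matrix. The sample names and values are made up for illustration, and this is not how the package itself reads or writes its files.

```python
import numpy as np

# Hypothetical feature intervals (rows), mirroring feature_annotation.
features = [
    {"chrom": "chr1", "start": 1000, "end": 1500},
    {"chrom": "chr1", "start": 2000, "end": 2500},
]

# Hypothetical samples (columns), mirroring sample_metadata;
# each BigWig file is one sample.
samples = ["sample_a.bw", "sample_b.bw", "sample_c.bw"]

# Made-up values standing in for a 'coverage' matrix:
# one row per feature, one column per sample.
coverage = np.array([
    [0.5, 1.5, np.nan],
    [2.0, 0.0, 4.0],
])

# The matrix shape follows the features X samples orientation.
assert coverage.shape == (len(features), len(samples))

# Row i holds every sample's value for feature i.
first_feature_values = coverage[0, :]

# Column j holds every feature's value for sample j.
second_sample_values = coverage[:, 1]
```

Row positions line up with the feature intervals and column positions with the samples, so the two annotation tables can be used to label either axis of the matrix.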
To build a GenomicArray from a collection of BigWig files:
import os
import tempfile

import numpy as np
import pandas as pd

import genomicarrays as garr

# Create a temporary directory; this is where the
# output files are created. Pick your location here.
tempdir = tempfile.mkdtemp()

# List BigWig paths
bw_dir = "your/bigwig/dir"
files = os.listdir(bw_dir)
bw_files = [f"{bw_dir}/{f}" for f in files]

# Input feature intervals
features = pd.DataFrame({
    "chrom": ["chr1", "chr1"],
    "start": [1000, 2000],
    "end": [1500, 2500],
})

# Build GenomicArray
garr.build_genomicarray(
    files=bw_files,
    output_path=tempdir,
    features=features,
    # Aggregate function to summarize multiple values
    # from a BigWig file within an input feature interval.
    feature_annotation_options=garr.FeatureAnnotationOptions(
        aggregate_function=np.nanmean
    ),
    # For processing multiple BigWig files in parallel
    num_threads=4,
)
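For more than a handful of intervals, the feature table can be generated rather than typed by hand. The sketch below tiles a region into fixed-width windows; `make_windows` is a hypothetical helper and not part of genomicarrays, and the half-open `[start, end)` convention it uses is an assumption, so check it against the coordinate convention you need.

```python
def make_windows(chrom, start, end, width):
    """Split the region [start, end) on one chromosome into consecutive windows.

    Hypothetical helper for illustration; not part of genomicarrays.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    starts = list(range(start, end, width))
    # Clip the last window so it never extends past `end`.
    ends = [min(s + width, end) for s in starts]
    return {"chrom": [chrom] * len(starts), "start": starts, "end": ends}


# Three windows: two full 500-wide windows and a clipped final one.
windows = make_windows("chr1", 1000, 2200, 500)
```

Passing the result to `pd.DataFrame(windows)` gives a frame with the same chrom, start and end columns as the features table in the example above.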
The build process stores intervals missing from a BigWig file as np.nan. The default is to choose an aggregate function that works with np.nan.
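The difference a NaN-aware aggregate makes can be seen with plain NumPy. This small sketch uses made-up values for a single interval and does not call genomicarrays at all.

```python
import numpy as np

# Made-up values for one feature interval; np.nan marks a
# position with no data in the BigWig file.
values = np.array([1.0, np.nan, 3.0])

# A regular mean propagates the missing value: the result is NaN.
plain_mean = np.mean(values)

# A NaN-aware mean skips the missing value: (1.0 + 3.0) / 2 = 2.0.
nan_aware_mean = np.nanmean(values)
```

Note that if every value in an interval is missing, np.nanmean still returns np.nan and emits a RuntimeWarning, so fully uncovered intervals remain NaN in the result.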
Note
This project has been set up using PyScaffold 4.6. For details and usage information on PyScaffold see https://pyscaffold.org/.