Delayed array operations from Bioconductor
DelayedArrays, in Python
This package implements classes for delayed array operations, mirroring the Bioconductor package of the same name. It allows BiocPy-based packages to easily interoperate with delayed arrays from the Bioconductor ecosystem, with a focus on serialization to and from file with chihaya/rds2py and on entry into tatami-compatible C++ libraries via mattress.
Installation
This package is published to PyPI and can be installed via the usual methods:
pip install delayedarray
Quick start
We can create a DelayedArray from any object that respects the seed contract, i.e., has the shape/dtype properties and supports NumPy slicing.
For example, a typical NumPy array qualifies:
import numpy
x = numpy.random.rand(100, 20)
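The seed contract described above is a duck-typing requirement, so it can be illustrated without NumPy at all. The following is a minimal sketch, not the package's own validation: `looks_like_seed` and `ConstantSeed` are hypothetical names introduced here, and the check only confirms that the attributes exist, not that slicing behaves like NumPy's.

```python
def looks_like_seed(obj):
    """Heuristic duck-type check (hypothetical helper, not part of delayedarray)."""
    has_shape = isinstance(getattr(obj, "shape", None), tuple)
    has_dtype = hasattr(obj, "dtype")
    sliceable = hasattr(type(obj), "__getitem__")
    return has_shape and has_dtype and sliceable


class ConstantSeed:
    """Hypothetical toy seed: an array of a given shape filled with one value."""

    def __init__(self, shape, value=0.0):
        self.shape = tuple(shape)
        self.dtype = type(value)  # assumption: a real seed would expose a numpy.dtype
        self._value = value

    def __getitem__(self, index):
        # Only handles a tuple of slices, one per dimension.
        new_shape = tuple(
            len(range(*s.indices(n))) for s, n in zip(index, self.shape)
        )
        return ConstantSeed(new_shape, self._value)


seed = ConstantSeed((100, 20), 1.5)
sub = seed[1:5, :]  # slicing returns another object with the same properties
```

Here `looks_like_seed(seed)` is true while `looks_like_seed([1, 2, 3])` is false, because a plain list has neither `shape` nor `dtype`.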
We can wrap x in a DelayedArray:
import delayedarray
d = delayedarray.DelayedArray(x)
## <100 x 20> DelayedArray object of type 'float64'
## [[0.87165637, 0.37536154, 0.49505459, ..., 0.90147358, 0.13091768,
## 0.7288351 ],
## [0.06014594, 0.04758512, 0.1932337 , ..., 0.83628993, 0.63886397,
## 0.37175146],
## [0.86038138, 0.1844154 , 0.45318283, ..., 0.411131 , 0.61720257,
## 0.44831668],
## ...,
## [0.2960631 , 0.85775072, 0.83518558, ..., 0.32533032, 0.59257349,
## 0.36232564],
## [0.7026017 , 0.86221974, 0.42704164, ..., 0.7612019 , 0.58842594,
## 0.51895466],
## [0.4321901 , 0.29703596, 0.34399029, ..., 0.04685882, 0.20102342,
## 0.05495118]]
And then we can use it in a variety of operations.
Each operation just returns a DelayedArray with an increasing stack of delayed operations, without evaluating anything or making any copies.
s = d.sum(axis=0)
n = (numpy.log1p(d / s) + 5)[1:5,:]
## <4 x 20> DelayedArray object of type 'float64'
## array([[5.01864954, 5.01248763, 5.00465425, 5.01366904, 5.01444268,
## 5.01740277, 5.00211704, 5.00456718, 5.01170253, 5.00268081,
## 5.00069047, 5.01792154, 5.01174818, 5.007219 , 5.01613611,
## 5.01998141, 5.00359273, 5.00891747, 5.00167042, 5.00480139],
## [5.01319369, 5.01366843, 5.00259837, 5.01438949, 5.0168967 ,
## 5.0118356 , 5.01468261, 5.00266368, 5.00820377, 5.01519285,
## 5.00880128, 5.01867732, 5.00597971, 5.0132913 , 5.0169869 ,
## 5.02033736, 5.0054349 , 5.01064519, 5.01484268, 5.00933761],
## [5.01056552, 5.00430873, 5.01554934, 5.01523742, 5.00447682,
## 5.00896808, 5.01702989, 5.00417863, 5.0106902 , 5.01643898,
## 5.00436048, 5.01041755, 5.01358732, 5.01173475, 5.00581787,
## 5.01454487, 5.0097424 , 5.01313867, 5.01227209, 5.01212552],
## [5.00265869, 5.01460805, 5.00834077, 5.01877699, 5.00009671,
## 5.01027705, 5.00650493, 5.01116854, 5.00582936, 5.00997989,
## 5.00213256, 5.00145715, 5.00797343, 5.01588012, 5.01435549,
## 5.00294226, 5.01381951, 5.01344824, 5.020751 , 5.01294937]])
Users can then call numpy.array() to realize the delayed operations into a typical NumPy array for consumption.
Alternatively, users can use the .as_dask_array() method to obtain a dask array.
simple = numpy.array(n)
type(simple)
## <class 'numpy.ndarray'>
dasky = n.as_dask_array()
type(dasky)
## <class 'dask.array.core.Array'>
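To make the idea of a growing stack of delayed operations more concrete, here is a toy sketch in pure Python. It is not how the package is implemented: the `Lazy` class, its `apply` and `realize` methods, and the `depth` helper are all hypothetical, and the "array" is just a flat list of floats. It only shows that each operation adds a layer and that nothing is computed until realization.

```python
import math


class Lazy:
    """Toy delayed wrapper over a list of floats (not the delayedarray implementation)."""

    def __init__(self, seed, op=None):
        self.seed = seed  # either a plain list or another Lazy layer
        self.op = op      # element-wise function applied on realization

    def apply(self, op):
        # Adds a layer; nothing is computed and no data is copied here.
        return Lazy(self, op)

    def __truediv__(self, other):
        return self.apply(lambda v: v / other)

    def __add__(self, other):
        return self.apply(lambda v: v + other)

    def realize(self):
        # Evaluate from the innermost seed outwards.
        if isinstance(self.seed, Lazy):
            values = self.seed.realize()
        else:
            values = list(self.seed)
        if self.op is None:
            return values
        return [self.op(v) for v in values]


def depth(lazy):
    """Count the number of Lazy layers stacked on top of the raw data."""
    count = 0
    while isinstance(lazy, Lazy):
        count += 1
        lazy = lazy.seed
    return count


d = Lazy([1.0, 2.0, 4.0])
n = (d / 2.0).apply(math.log1p) + 5  # three more layers, still unevaluated
result = n.realize()                 # the only place where work happens
```

In this sketch `n` sits four layers above the raw list, and `d.seed` still holds the original, unmodified values after `n.realize()` has run.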
Check out the documentation for more information.
For developers
Ideally, we would use dask directly and avoid creating a set of DelayedArray wrapper classes.
We could parse the HighLevelGraph objects and retrieve the delayed operations for serialization/reconstruction in other frameworks like R and C++.
Unfortunately, it was tricky to parse the call graph reliably (see the developer notes).
So, the real purpose of the DelayedArray package is to make it easier for Bioconductor developers to inspect the delayed operations.
For example, we can pull out the "seed" object underlying our DelayedArray instance:
n.seed
## <delayedarray.Subset.Subset object at 0x11cfbe690>
Each layer has its own specific attributes that define the operation, e.g.,
n.seed.subset
## (range(1, 5), range(0, 20))
Recursively drilling through the object will eventually reach the underlying array(s):
n.seed.seed.seed.seed.seed
## array([[0.78811524, 0.87684408, 0.56980128, ..., 0.92659988, 0.8716243 ,
## 0.8855508 ],
## [0.96611119, 0.36928726, 0.30364589, ..., 0.14349135, 0.92921468,
## 0.85097595],
## [0.98374144, 0.98197003, 0.18126507, ..., 0.5854122 , 0.48733974,
## 0.90127042],
## ...,
## [0.05566008, 0.24581195, 0.4092705 , ..., 0.79169303, 0.36982844,
## 0.59997214],
## [0.81744194, 0.78499666, 0.80940409, ..., 0.65706498, 0.16220355,
## 0.46912681],
## [0.41896894, 0.58066043, 0.57069833, ..., 0.61640286, 0.47174326,
## 0.7149704 ]])
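The repeated `.seed` access above can be written as a small loop. This is a sketch under one loud assumption: every layer exposes exactly one nested object through an attribute called `seed`. Operations that take several arrays may store their operands differently, so this helper would stop early on them. `seed_chain` and `Layer` are hypothetical names used only for illustration.

```python
def seed_chain(obj):
    """Return [obj, obj.seed, obj.seed.seed, ...] until no `seed` attribute is left."""
    chain = [obj]
    while hasattr(chain[-1], "seed"):
        chain.append(chain[-1].seed)
    return chain


class Layer:
    """Hypothetical stand-in for a delayed operation with a single seed."""

    def __init__(self, name, seed):
        self.name = name
        self.seed = seed


raw = [[1, 2], [3, 4]]  # stands in for the underlying array
stacked = Layer("subset", Layer("add", Layer("log1p", raw)))

chain = seed_chain(stacked)
names = [type(layer).__name__ for layer in chain]
```

The last element of `chain` is the underlying data itself, so `chain[-1] is raw` holds in this toy example.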
All attributes required to reconstruct a delayed operation are public and considered part of the stable DelayedArray interface.
Note
This project has been set up using PyScaffold 4.5. For details and usage information on PyScaffold see https://pyscaffold.org/.