TileDB-backed objects for array- and matrix-like data

tiledbarray

This is the Python equivalent of Bioconductor's TileDBArray package, providing a representation of TileDB-backed arrays within the delayedarray framework. The idea is to allow users to store, manipulate and operate on large datasets without loading them into memory, in a manner that is trivially compatible with other data structures in the BiocPy ecosystem.

Installation

This package can be installed from PyPI with the usual commands:

pip install tiledbarray
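
To confirm that the install succeeded without importing the package itself, a small standard-library check along these lines may help. The helper name `is_installed` is hypothetical and not part of tiledbarray:

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package can be found on the import path."""
    return importlib.util.find_spec(package_name) is not None


# Reports whether the current environment can see tiledbarray.
print("tiledbarray installed:", is_installed("tiledbarray"))
```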

Quick start

Let's mock up a dense array:

import numpy
import tiledb

# Write a 40 x 50 matrix of random values to a dense TileDB array on disk.
data = numpy.random.rand(40, 50)
tiledb.from_numpy("dense.tiledb", data)

We can now represent it as a TileDbArray:

import tiledbarray
arr = tiledbarray.TileDbArray("dense.tiledb", attribute_name="")
# <40 x 50> TileDbArray object of type 'float64'
# [[0.96316214, 0.90187013, 0.55767551, ..., 0.81663263, 0.57660051,
#   0.3986336 ],
#  [0.72578394, 0.06328588, 0.9473141 , ..., 0.89977069, 0.34617884,
#   0.09208036],
#  [0.87291607, 0.01714908, 0.96570953, ..., 0.28404601, 0.20394673,
#   0.6454273 ],
#  ...,
#  [0.21565857, 0.11721607, 0.45146332, ..., 0.18565937, 0.348599  ,
#   0.16050929],
#  [0.95061188, 0.71917657, 0.33039149, ..., 0.60267692, 0.28035863,
#   0.56416845],
#  [0.40462116, 0.61058508, 0.5067807 , ..., 0.64234988, 0.5881812 ,
#   0.17138409]]

TileDbArray is a subclass of DelayedArray and can be used anywhere in the BiocPy framework. Parts of the NumPy API are also supported; for example, we could apply a variety of delayed operations:

# One scaling factor per column, so the length must match the 50 columns of arr.
scaling = numpy.random.rand(50)
transformed = numpy.log1p(arr / scaling)
# <40 x 50> DelayedArray object of type 'float64'
# [[1.29646391, 2.05014167, 0.48661736, ..., 0.90574803, 2.38890685,
#   1.1277655 ],
#  [1.09916863, 0.38865342, 0.72500505, ..., 0.96463182, 1.93797807,
#   0.39371608],
#  [1.22596458, 0.12107778, 0.73496894, ..., 0.41384292, 1.50457489,
#   1.47747976],
#  ...,
#  [0.46673182, 0.63114795, 0.41040352, ..., 0.28897665, 1.94394461,
#   0.61032586],
#  [1.28695229, 1.85595293, 0.31579293, ..., 0.73604123, 1.76033915,
#   1.37526146],
#  [0.74949037, 1.71968269, 0.45082104, ..., 0.76976215, 2.40698455,
#   0.64080734]]
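
To make the idea of a delayed operation concrete, here is a toy sketch in pure Python. It is an illustration only and does not reflect how delayedarray is implemented: the class `ToyDelayed`, the loader `load_rows` and the nested list standing in for on-disk storage are all hypothetical. Operations are queued when they are applied and run only on the rows that are eventually extracted.

```python
import math


class ToyDelayed:
    """Hypothetical stand-in for a delayed array (not the delayedarray API)."""

    def __init__(self, load_rows, shape, ops=()):
        self._load_rows = load_rows  # callable(start, end) -> list of rows
        self.shape = shape
        self._ops = tuple(ops)       # queued element-wise functions f(value, column)

    def apply(self, func):
        # Queue the operation and return a new wrapper; no data is read here.
        return ToyDelayed(self._load_rows, self.shape, self._ops + (func,))

    def extract(self, start, end):
        # Read only the requested rows, then run the queued operations on them.
        block = []
        for row in self._load_rows(start, end):
            new_row = []
            for col, value in enumerate(row):
                for func in self._ops:
                    value = func(value, col)
                new_row.append(value)
            block.append(new_row)
        return block


# A 4 x 3 nested list plays the role of the on-disk array; `reads` logs each access.
backing = [[float(r * 3 + c) for c in range(3)] for r in range(4)]
reads = []


def load_rows(start, end):
    reads.append((start, end))
    return [row[:] for row in backing[start:end]]


scaling = [1.0, 2.0, 4.0]  # one factor per column
arr = ToyDelayed(load_rows, shape=(4, 3))
transformed = arr.apply(lambda v, c: v / scaling[c]).apply(lambda v, c: math.log1p(v))

reads_before_extract = list(reads)  # still empty: nothing has been loaded yet
block = transformed.extract(0, 2)   # loads rows 0-1 and applies both operations
print(reads_before_extract, reads)
```

Building `transformed` touches no data; the single read happens inside `extract`, and only for the requested rows.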

Check out the documentation for more details.

Sparse Matrices

We can perform similar operations on a sparse matrix as well. Let's mock up a sparse matrix and store it as a TileDB array.

import numpy as np
import tiledb

dir_path = "sparse_array.tiledb"
dom = tiledb.Domain(
     tiledb.Dim(name="rows", domain=(0, 4), tile=5, dtype=np.int32),
     tiledb.Dim(name="cols", domain=(0, 4), tile=5, dtype=np.int32),
)
schema = tiledb.ArraySchema(
     domain=dom, sparse=True, attrs=[tiledb.Attr(name="", dtype=np.int32)]
)
tiledb.SparseArray.create(dir_path, schema)

# Write three non-zero values at (row, col) = (1, 1), (2, 4) and (2, 3).
# The context manager closes the array so the write is flushed to disk.
with tiledb.SparseArray(dir_path, mode="w") as tdb:
    i, j = [1, 2, 2], [1, 4, 3]
    data = np.array([1, 2, 3], dtype=np.int32)
    tdb[i, j] = data
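
To see which cells the coordinate lists above fill in, the following pure-Python sketch expands the same (row, column, value) triplets into a dense 5 x 5 grid. The helper `coo_to_dense` is hypothetical and only meant as a sanity check; it is not part of tiledb or tiledbarray.

```python
def coo_to_dense(rows, cols, values, shape, fill=0):
    """Expand coordinate-format triplets into a dense nested list."""
    n_rows, n_cols = shape
    dense = [[fill] * n_cols for _ in range(n_rows)]
    for r, c, v in zip(rows, cols, values):
        dense[r][c] = v
    return dense


# Same coordinates and values as the sparse write above.
dense = coo_to_dense([1, 2, 2], [1, 4, 3], [1, 2, 3], shape=(5, 5))
for row in dense:
    print(row)
```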

We can now represent this as a TileDbArray:

import tiledbarray
arr = tiledbarray.TileDbArray(dir_path, attribute_name="")

slices = (slice(0,3), [2, 4])

import delayedarray
subset = delayedarray.extract_sparse_array(arr, (*slices,))
print(subset)
# <3 x 2> SparseNdarray object of type 'int32'
# [[0, 0],
#  [0, 0],
#  [0, 2]]

Check out the delayedarray documentation for more details.

Note

This project has been set up using PyScaffold 4.5. For details and usage information on PyScaffold see https://pyscaffold.org/.
