
TileDB-backed objects for array- and matrix-like data


tiledbarray

This is the Python equivalent of Bioconductor's TileDBArray package, providing a representation of TileDB-backed arrays within the delayedarray framework. The idea is to allow users to store, manipulate and operate on large datasets without loading them into memory, in a manner that is trivially compatible with other data structures in the BiocPy ecosystem.

Installation

This package can be installed from PyPI with the usual commands:

pip install tiledbarray

Quick start

Let's mock up a dense array:

import numpy
import tiledb

data = numpy.random.rand(40, 50)

# Write the array to disk as a dense TileDB array.
tiledb.from_numpy("dense.tiledb", data)

We can now represent it as a TileDbArray:

import tiledbarray
arr = tiledbarray.TileDbArray("dense.tiledb", attribute_name="")
# <40 x 50> TileDbArray object of type 'float64'
# [[0.96316214, 0.90187013, 0.55767551, ..., 0.81663263, 0.57660051,
#   0.3986336 ],
#  [0.72578394, 0.06328588, 0.9473141 , ..., 0.89977069, 0.34617884,
#   0.09208036],
#  [0.87291607, 0.01714908, 0.96570953, ..., 0.28404601, 0.20394673,
#   0.6454273 ],
#  ...,
#  [0.21565857, 0.11721607, 0.45146332, ..., 0.18565937, 0.348599  ,
#   0.16050929],
#  [0.95061188, 0.71917657, 0.33039149, ..., 0.60267692, 0.28035863,
#   0.56416845],
#  [0.40462116, 0.61058508, 0.5067807 , ..., 0.64234988, 0.5881812 ,
#   0.17138409]]

This is just a subclass of DelayedArray and can be used anywhere in the BiocPy framework. Parts of the NumPy API are also supported; for example, we could apply a variety of delayed operations:

scaling = numpy.random.rand(50)  # one value per column, so it broadcasts against the 40 x 50 array
transformed = numpy.log1p(arr / scaling)
# <40 x 50> DelayedArray object of type 'float64'
# [[1.29646391, 2.05014167, 0.48661736, ..., 0.90574803, 2.38890685,
#   1.1277655 ],
#  [1.09916863, 0.38865342, 0.72500505, ..., 0.96463182, 1.93797807,
#   0.39371608],
#  [1.22596458, 0.12107778, 0.73496894, ..., 0.41384292, 1.50457489,
#   1.47747976],
#  ...,
#  [0.46673182, 0.63114795, 0.41040352, ..., 0.28897665, 1.94394461,
#   0.61032586],
#  [1.28695229, 1.85595293, 0.31579293, ..., 0.73604123, 1.76033915,
#   1.37526146],
#  [0.74949037, 1.71968269, 0.45082104, ..., 0.76976215, 2.40698455,
#   0.64080734]]
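
To give a rough idea of what "delayed" means here, the following is a minimal conceptual sketch, not the actual delayedarray or tiledbarray implementation. The `LazyArray` class and its `apply` and `realize` methods are hypothetical names invented for this illustration, and an in-memory NumPy array stands in for the TileDB file. Operations are only queued when requested and are applied to a block of data when that block is extracted:

```python
import numpy


class LazyArray:
    """Hypothetical stand-in for a delayed array: queues operations, applies them on demand."""

    def __init__(self, seed, ops=()):
        self.seed = seed       # underlying data (a NumPy array here; a file on disk in practice)
        self.ops = tuple(ops)  # queued element-wise functions, not yet applied

    @property
    def shape(self):
        # Element-wise operations never change the shape, so the seed's shape is enough.
        return self.seed.shape

    def apply(self, func):
        # Queue the operation and return a new lazy object; nothing is computed here.
        return LazyArray(self.seed, self.ops + (func,))

    def realize(self, rows=slice(None)):
        # Pull out only the requested rows, then run the queued operations on that block.
        block = self.seed[rows]
        for func in self.ops:
            block = func(block)
        return block


data = numpy.random.rand(40, 50)
scaling = numpy.random.rand(50)  # one value per column, so it broadcasts against (40, 50)

lazy = LazyArray(data).apply(lambda x: x / scaling).apply(numpy.log1p)
first_rows = lazy.realize(slice(0, 5))  # only these five rows are processed
print(first_rows.shape)
# (5, 50)
```

The point of the design is that the cost of each operation is paid per extracted block rather than for the whole dataset up front, which is what makes it practical to work with arrays that do not fit in memory.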

Check out the documentation for more details.

Sparse Matrices

We can perform similar operations on a sparse matrix as well. Let's mock up a sparse matrix and store it as a TileDB array.

import numpy as np
import tiledb

dir_path = "sparse_array.tiledb"
dom = tiledb.Domain(
    tiledb.Dim(name="rows", domain=(0, 4), tile=5, dtype=np.int32),
    tiledb.Dim(name="cols", domain=(0, 4), tile=5, dtype=np.int32),
)
schema = tiledb.ArraySchema(
    domain=dom, sparse=True, attrs=[tiledb.Attr(name="", dtype=np.int32)]
)
tiledb.SparseArray.create(dir_path, schema)

# Write three non-zero values; leaving the "with" block closes the array and flushes the write.
with tiledb.SparseArray(dir_path, mode="w") as tdb:
    i, j = [1, 2, 2], [1, 4, 3]
    data = np.array([1, 2, 3], dtype=np.int32)
    tdb[i, j] = data

We can now represent this as a TileDbArray:

import tiledbarray
arr = tiledbarray.TileDbArray(dir_path, attribute_name="")

slices = (slice(0,3), [2, 4])

import delayedarray
subset = delayedarray.extract_sparse_array(arr, (*slices,))
print(subset)
# <3 x 2> SparseNdarray object of type 'int32'
# [[2, 0],
#  [0, 0],
#  [0, 0]]

Check out the delayedarray documentation for more details.

Note

This project has been set up using PyScaffold 4.5. For details and usage information on PyScaffold see https://pyscaffold.org/.


