TileDB-backed objects for array- and matrix-like data

tiledbarray

This is the Python equivalent of Bioconductor's TileDBArray package, providing a representation of TileDB-backed arrays within the delayedarray framework. The idea is to allow users to store, manipulate and operate on large datasets without loading them into memory, in a manner that is trivially compatible with other data structures in the BiocPy ecosystem.

Installation

This package can be installed from PyPI with the usual commands:

pip install tiledbarray

Quick start

Let's mock up a dense array:

import numpy
import tiledb

data = numpy.random.rand(40, 50)
tiledb.from_numpy("dense.tiledb", data)

We can now represent it as a TileDbArray:

import tiledbarray
arr = tiledbarray.TileDbArray("dense.tiledb", attribute_name="")
# <40 x 50> TileDbArray object of type 'float64'
# [[0.96316214, 0.90187013, 0.55767551, ..., 0.81663263, 0.57660051,
#   0.3986336 ],
#  [0.72578394, 0.06328588, 0.9473141 , ..., 0.89977069, 0.34617884,
#   0.09208036],
#  [0.87291607, 0.01714908, 0.96570953, ..., 0.28404601, 0.20394673,
#   0.6454273 ],
#  ...,
#  [0.21565857, 0.11721607, 0.45146332, ..., 0.18565937, 0.348599  ,
#   0.16050929],
#  [0.95061188, 0.71917657, 0.33039149, ..., 0.60267692, 0.28035863,
#   0.56416845],
#  [0.40462116, 0.61058508, 0.5067807 , ..., 0.64234988, 0.5881812 ,
#   0.17138409]]

This is just a subclass of a DelayedArray and can be used anywhere in the BiocPy framework. Parts of the NumPy API are also supported; for example, we can apply a variety of delayed operations:

scaling = numpy.random.rand(50)  # one scaling factor per column
transformed = numpy.log1p(arr / scaling)
# <40 x 50> DelayedArray object of type 'float64'
# [[1.29646391, 2.05014167, 0.48661736, ..., 0.90574803, 2.38890685,
#   1.1277655 ],
#  [1.09916863, 0.38865342, 0.72500505, ..., 0.96463182, 1.93797807,
#   0.39371608],
#  [1.22596458, 0.12107778, 0.73496894, ..., 0.41384292, 1.50457489,
#   1.47747976],
#  ...,
#  [0.46673182, 0.63114795, 0.41040352, ..., 0.28897665, 1.94394461,
#   0.61032586],
#  [1.28695229, 1.85595293, 0.31579293, ..., 0.73604123, 1.76033915,
#   1.37526146],
#  [0.74949037, 1.71968269, 0.45082104, ..., 0.76976215, 2.40698455,
#   0.64080734]]
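
The division above only works when the scaling vector lines up with the last dimension of the array (50 columns here). The following is a minimal sketch using plain NumPy arrays rather than a TileDbArray; it assumes that the delayed division follows the same broadcasting rules as NumPy, which is not verified here.

```python
import numpy

data = numpy.random.rand(40, 50)

# One factor per column: broadcasts across all 40 rows.
scaling = numpy.random.rand(50)
transformed = numpy.log1p(data / scaling)

# A vector whose length does not match the 50 columns cannot be broadcast.
try:
    data / numpy.random.rand(100)
    broadcast_ok = True
except ValueError:
    broadcast_ok = False
```

Here `transformed` keeps the 40 x 50 shape of the input, while the mismatched vector raises a `ValueError` in NumPy.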

Check out the documentation for more details.

Sparse Matrices

We can perform similar operations on a sparse matrix as well. Let's mock up a sparse matrix and store it as a TileDB array.

import numpy as np
import tiledb

dir_path = "sparse_array.tiledb"
dom = tiledb.Domain(
    tiledb.Dim(name="rows", domain=(0, 4), tile=5, dtype=np.int32),
    tiledb.Dim(name="cols", domain=(0, 4), tile=5, dtype=np.int32),
)
schema = tiledb.ArraySchema(
    domain=dom, sparse=True, attrs=[tiledb.Attr(name="", dtype=np.int32)]
)
tiledb.SparseArray.create(dir_path, schema)

# Write three non-zero values at (row, col) = (1, 1), (2, 4) and (2, 3).
with tiledb.open(dir_path, mode="w") as tdb:
    i, j = [1, 2, 2], [1, 4, 3]
    tdb[i, j] = np.array([1, 2, 3], dtype=np.int32)

We can now represent this as a TileDbArray:

import tiledbarray
arr = tiledbarray.TileDbArray(dir_path, attribute_name="")

slices = (slice(0,3), [2, 4])

import delayedarray
subset = delayedarray.extract_sparse_array(arr, (*slices,))
print(subset)
# <3 x 2> SparseNdarray object of type 'int32'
# [[2, 0],
#  [0, 0],
#  [0, 0]]

Check out the delayedarray documentation for more details.

Note

This project has been set up using PyScaffold 4.5. For details and usage information on PyScaffold see https://pyscaffold.org/.

