
Multidimensional arrays storage engine

Project description

DEKER™



DEKER™ is a pure Python implementation of a petabyte-scale, highly parallel data storage engine for multidimensional arrays.

The name DEKER™ comes from the term dekeract, the 10-cube.

DEKER™ was made with the following major goals in mind:

  • provide an intuitive interface for storing and accessing huge data arrays
  • support an arbitrary number of data dimensions
  • be thread- and process-safe and as lean on RAM as possible
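
Thread safety of this kind is commonly built on a readers-writer lock. Many readers may proceed at once, but a writer needs exclusive access. DEKER™'s own locking mechanism is not described here, and it must also work across processes. What follows is only a minimal, thread-only sketch of the general technique, built on Python's `threading.Condition`. The `ReadWriteLock` class is hypothetical and not part of DEKER™.

```python
import threading
from contextlib import contextmanager


class ReadWriteLock:
    """Hypothetical sketch: many concurrent readers OR one exclusive writer.

    Threads only (not processes). This simple version favors readers,
    so a steady stream of readers can starve a waiting writer.
    """

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._readers = 0     # number of threads currently reading
        self._writer = False  # True while a writer holds the lock

    @contextmanager
    def read(self):
        with self._cond:
            # Wait until no writer is active, then register as a reader.
            while self._writer:
                self._cond.wait()
            self._readers += 1
        try:
            yield
        finally:
            with self._cond:
                self._readers -= 1
                if self._readers == 0:
                    # Wake writers waiting for all readers to leave.
                    self._cond.notify_all()

    @contextmanager
    def write(self):
        with self._cond:
            # Wait until there are no readers and no other writer.
            while self._writer or self._readers > 0:
                self._cond.wait()
            self._writer = True
        try:
            yield
        finally:
            with self._cond:
                self._writer = False
                # Wake both waiting readers and waiting writers.
                self._cond.notify_all()
```

Readers only block each other when a writer is active. This matches the "parallel read, exclusive write" access pattern a storage engine needs.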

DEKER™ lets users store and access a wide range of data types: virtually anything that can be represented as arrays, such as geospatial data, satellite images, machine learning models, sensor data, graphs, key-value pairs, tabular data, and more.

DEKER™ does not limit the complexity or size of your data. It supports a virtually unlimited number of data dimensions and provides under-the-hood mechanisms to partition huge amounts of data for scalability.

Features

  • Open source under GPL 3.0
  • Scalable storage of huge virtual arrays via tiling
  • Parallel processing of virtual array tiles
  • Own locking mechanism enabling parallel reads and writes of virtual arrays
  • Array-level metadata attributes
  • Fancy data slicing using timestamps and named labels
  • Support for industry-standard NumPy and Xarray
  • Storage level data compression and chunking (via HDF5)
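
Tiling means a huge virtual array is split into smaller regular tiles that can be stored and processed independently, for example by parallel workers. DEKER™'s exact tiling rules are not described here. The sketch below only illustrates the idea. It assumes each dimension divides evenly into a given number of tiles, and the `tile_slices` helper is hypothetical.

```python
from itertools import product
from typing import Iterator, Tuple


def tile_slices(
    shape: Tuple[int, ...], grid: Tuple[int, ...]
) -> Iterator[Tuple[Tuple[int, ...], Tuple[slice, ...]]]:
    """Yield (tile_index, slices) for every tile of a virtual array.

    Hypothetical helper: `grid` is the number of tiles along each dimension,
    and every dimension size must be divisible by its grid value.
    """
    if len(shape) != len(grid):
        raise ValueError("shape and grid must have the same number of dimensions")
    for size, parts in zip(shape, grid):
        if parts <= 0 or size % parts:
            raise ValueError(f"size {size} cannot be split into {parts} equal tiles")
    # Shape of a single tile along each dimension.
    tile_shape = tuple(size // parts for size, parts in zip(shape, grid))
    # Walk every tile position and compute the region of the array it covers.
    for index in product(*(range(parts) for parts in grid)):
        slices = tuple(
            slice(i * step, (i + 1) * step) for i, step in zip(index, tile_shape)
        )
        yield index, slices
```

For a 128×128×128 array split 2×2×1, this yields four tiles of 64×64×128. Each `(index, slices)` pair could then be handed to a separate worker.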

Code and Documentation

The open source implementation of the DEKER™ storage engine is published at

API documentation and tutorials for the current release can be found at

Quick Start

Dependencies

The minimum supported Python version for DEKER™ is 3.9.

DEKER™ depends on the following third-party packages:

  • numpy >= 1.18
  • attrs >= 23.1.0
  • tqdm >= 4.64.1
  • psutil >= 5.9.5
  • h5py >= 3.8.0
  • hdf5plugin >= 4.0.1
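
Before installing or when debugging, you can check an existing environment against the minimums above. The sketch below uses the standard library's `importlib.metadata` to look up installed distribution versions. The helper names are hypothetical, and the version parsing is deliberately simplified: only leading digits are compared and pre-release suffixes are ignored. A robust check would normally use the `packaging` library instead.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, List, Tuple

# Minimum versions from the dependency list above.
MINIMUM_VERSIONS = {
    "numpy": "1.18",
    "attrs": "23.1.0",
    "tqdm": "4.64.1",
    "psutil": "5.9.5",
    "h5py": "3.8.0",
    "hdf5plugin": "4.0.1",
}


def parse_version(text: str) -> Tuple[int, ...]:
    """Simplified parse: '2.0.0rc1' -> (2, 0, 0); suffixes are ignored."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def at_least(installed: str, minimum: str) -> bool:
    """Compare versions after zero-padding, so '3.8' equals '3.8.0'."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


def check_environment(
    minimums: Dict[str, str] = MINIMUM_VERSIONS,
    get_version: Callable[[str], str] = version,
    python: Tuple[int, int] = sys.version_info[:2],
) -> List[str]:
    """Return a list of problems; an empty list means all minimums are met."""
    problems = []
    if python < (3, 9):
        problems.append(f"Python {python[0]}.{python[1]} is older than 3.9")
    for name, minimum in minimums.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            problems.append(f"{name} is not installed (need >= {minimum})")
            continue
        if not at_least(installed, minimum):
            problems.append(f"{name} {installed} is older than {minimum}")
    return problems
```

Calling `check_environment()` with no arguments inspects the current interpreter and its installed packages.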

Also, please note that for flexibility, a few internal DEKER™ components are published as separate packages:

Install

To install DEKER™ run:

pip install deker

Please refer to the documentation for advanced topics such as running on Apple silicon or using Xarray with the DEKER™ API.

First Steps

Now you can write a simple script to jump into DEKER™ development:

from deker import Client, ArraySchema, DimensionSchema, TimeDimensionSchema
from datetime import datetime, timedelta, timezone
import numpy as np

# Where all data will be kept
DEKER_URI = "file:///tmp/deker"

# Define a 3-dimensional schema with two numeric dimensions and one time dimension
dimensions = [
    DimensionSchema(name="y", size=128),
    DimensionSchema(name="x", size=128),
    TimeDimensionSchema(
        name="forecast_dt",
        size=128,
        start_value=datetime.now(timezone.utc),
        step=timedelta(days=3),  # 3 days between consecutive time points
    ),
]

# Define an array schema with float dtype and the dimensions above
array_schema = ArraySchema(dtype=float, dimensions=dimensions)

# Instantiate the client using a context manager
with Client(DEKER_URI) as client:
    # Create a collection
    collection = client.create_collection("my_collection", array_schema)

    # Create an array in the collection
    array = collection.create()

    # Write some data
    array[:].update(np.ones(shape=array.shape))

    # And read the data back
    data = array[:].read()
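
The time dimension in the script is regular: a `start_value` plus a fixed `step`. So every aligned timestamp corresponds to one integer position along it. The helper below is a hypothetical illustration of that arithmetic and not DEKER™'s API; DEKER™'s own timestamp slicing is covered in its documentation. It assumes timezone-aware datetimes, as in the schema above.

```python
from datetime import datetime, timedelta, timezone


def time_index(start: datetime, step: timedelta, size: int, moment: datetime) -> int:
    """Map a timestamp to its position along a regular time dimension.

    Hypothetical helper for illustration only; not part of the DEKER API.
    """
    offset = moment - start
    # A zero remainder (timedelta(0)) is falsy, so this catches misaligned moments.
    if offset % step:
        raise ValueError(f"{moment} is not aligned to the {step} step")
    index = offset // step  # timedelta // timedelta gives an int
    if not 0 <= index < size:
        raise IndexError(f"{moment} is outside the dimension (index {index})")
    return index
```

With `step=timedelta(days=3)`, a timestamp nine days after `start_value` maps to index 3. A timestamp one day after it is rejected as misaligned.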



Download files


Source Distribution

deker-1.1.5.tar.gz (78.1 kB)

File details

Details for the file deker-1.1.5.tar.gz.

File metadata

  • Download URL: deker-1.1.5.tar.gz
  • Upload date:
  • Size: 78.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.0.0 CPython/3.12.2

File hashes

Hashes for deker-1.1.5.tar.gz
Algorithm Hash digest
SHA256 ae5503bd9a1124d432b08320149ad3bee1ffc77d079749f4139bdab56d44c159
MD5 e1ae5a9ac36bf02b5743621784ddac5a
BLAKE2b-256 105bdcd97ce900f0888738a34031b8d4adaf1729841bb429ec1d52fdbcee8bb3

