Memory-mapped numeric arrays, based on a format that is self-explanatory and tool-independent

Project description

Darr is a Python science library for working with potentially large NumPy arrays and their metadata, persisted on disk in a format that is simple, self-documented and tool-independent. The goal is to keep your data easily accessible in both the short and the long term, from a wide range of computing environments. Keeping data universally readable and documented is in line with good scientific practice. It makes it easy not only to share data with others, but also to look at your own data with different tools. More rationale for this approach is provided here.

Flat binary files and (JSON) text files are accompanied by a README text file that explains how the specific data and metadata are stored and how they can be read. This includes code for reading the array in a variety of current scientific data tools such as Python, R, Julia, IDL, Matlab, Maple, and Mathematica. It is trivially easy to share your data with others or with yourself when working in different computing environments, because it always contains a clear and specific description of how to read it. No need to export anything or to provide elaborate explanation. No dependence on complicated formats or specialized tools.
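To illustrate why a flat binary file plus a JSON description is tool-independent, here is a minimal sketch that writes and reads such a pair using only the Python standard library, without Darr or NumPy. The file names (`values.bin`, `description.json`) and the JSON keys are assumptions made for this example, not necessarily the names Darr uses; the README stored with a real array gives the authoritative reading instructions.

```python
import json
import sys
import tempfile
from array import array
from pathlib import Path


def write_example(directory: Path) -> None:
    """Write a small 2x3 float64 array as raw bytes plus a JSON description."""
    values = array("d", [0.0, 1.5, 3.0, 4.5, 6.0, 7.5])  # "d" is a C double
    with open(directory / "values.bin", "wb") as f:
        values.tofile(f)
    description = {  # hypothetical keys, chosen for this sketch
        "numtype": "float64",
        "shape": [2, 3],
        "byteorder": sys.byteorder,
        "arrayorder": "C",  # row-major
    }
    (directory / "description.json").write_text(json.dumps(description))


def read_example(directory: Path) -> list:
    """Read the array back as nested lists, guided only by the description."""
    description = json.loads((directory / "description.json").read_text())
    nrows, ncols = description["shape"]
    values = array("d")
    with open(directory / "values.bin", "rb") as f:
        values.fromfile(f, nrows * ncols)
    if description["byteorder"] != sys.byteorder:
        values.byteswap()  # data was written on a machine with other endianness
    return [list(values[r * ncols:(r + 1) * ncols]) for r in range(nrows)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        write_example(Path(tmp))
        print(read_example(Path(tmp)))  # [[0.0, 1.5, 3.0], [4.5, 6.0, 7.5]]
```

Because the reader needs nothing beyond the numeric type, shape, byte order and memory layout, the same few lines translate readily to other environments.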

Darr uses NumPy memory-mapped arrays under the hood, which you can access directly for full NumPy compatibility and efficient out-of-core read/write access to potentially very large arrays. In addition, Darr supports appending to and truncating arrays, and the use of ragged arrays (still experimental).
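The underlying mechanism can be sketched with plain `numpy.memmap`. This is not Darr's API or implementation; the helper names (`create`, `append_rows`, `open_map`, `truncate_rows`) and the file name are hypothetical. The sketch only shows that a flat binary file can be indexed like an array, grown by appending bytes, and shrunk by truncating the file.

```python
import os
import tempfile

import numpy as np


def create(path, shape, dtype="float64"):
    """Create a zero-filled flat binary file and return a writable memory map."""
    return np.memmap(path, dtype=dtype, mode="w+", shape=shape)


def append_rows(path, rows, dtype="float64"):
    """Append rows by writing their raw bytes to the end of the file."""
    with open(path, "ab") as f:
        f.write(np.ascontiguousarray(rows, dtype=dtype).tobytes())


def open_map(path, ncols, dtype="float64", mode="r+"):
    """Map an existing file, inferring the number of rows from its size."""
    rowbytes = np.dtype(dtype).itemsize * ncols
    nrows = os.path.getsize(path) // rowbytes
    return np.memmap(path, dtype=dtype, mode=mode, shape=(nrows, ncols))


def truncate_rows(path, nrows, ncols, dtype="float64"):
    """Keep only the first nrows rows by shortening the file."""
    os.truncate(path, nrows * ncols * np.dtype(dtype).itemsize)


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "values.bin")
        a = create(path, (2, 4))
        a[:, -2:] = 1.0          # ordinary NumPy indexing writes through to disk
        a.flush()
        del a                    # release the map before changing the file size
        append_rows(path, np.full((3, 4), 2.0))
        b = open_map(path, ncols=4)
        grown_shape, total = b.shape, float(b.sum())
        del b
        truncate_rows(path, nrows=2, ncols=4)
        c = open_map(path, ncols=4, mode="r")
        truncated_shape = c.shape
        del c
        return grown_shape, total, truncated_shape


if __name__ == "__main__":
    print(demo())  # ((5, 4), 28.0, (2, 4))
```

Only the parts of the file that are actually indexed are read into memory, which is what makes out-of-core access to very large arrays practical.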

See this tutorial for a brief introduction, or the documentation for more info.

Darr is currently pre-1.0, still undergoing significant development. It is open source and freely available under the New BSD License terms.

Features

Pros:

  • Data persists on disk, purely based on flat binary and text files, which keeps it tool-independent.

  • README text file with human-readable explanation of how the binary data is stored.

  • README includes examples of how to read the array in a number of popular data analysis environments, such as Python (without Darr), R, Julia, Octave/Matlab, GDL/IDL, and Mathematica (see example array).

  • Works with data arrays larger than RAM.

  • Data read/write access is simple and powerful through NumPy indexing (see here).

  • Data is easily appendable.

  • Many numeric types are supported: (u)int8-(u)int64, float16-float64, complex64, complex128.

  • Easy use of metadata, stored in a separate JSON text file.

  • Minimal dependencies, only NumPy.

  • Integrates easily with the Dask library for out-of-core computation on very large arrays.

  • Supports ragged arrays (still experimental).
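As an illustration of the ragged-array idea, one common way to keep variable-length subarrays in flat files is to store all values back to back in one array and the start and end positions of each subarray in a second array. The sketch below shows that general technique with NumPy; it is an assumption for illustration, not a description of Darr's on-disk layout, and `pack_ragged` and `get_subarray` are hypothetical helper names.

```python
import numpy as np


def pack_ragged(subarrays):
    """Concatenate variable-length 1-D sequences into (values, indices).

    indices[i] holds the (start, end) positions of subarray i within values.
    """
    lengths = np.array([len(s) for s in subarrays], dtype=np.int64)
    ends = np.cumsum(lengths)
    starts = ends - lengths
    values = np.concatenate([np.asarray(s, dtype=np.float64) for s in subarrays])
    indices = np.stack([starts, ends], axis=1)
    return values, indices


def get_subarray(values, indices, i):
    """Return subarray i as a view on the flat values array."""
    start, end = indices[i]
    return values[start:end]


if __name__ == "__main__":
    values, indices = pack_ragged([[1, 2, 3], [4], [5, 6]])
    print(indices.tolist())                          # [[0, 3], [3, 4], [4, 6]]
    print(get_subarray(values, indices, 2).tolist())  # [5.0, 6.0]
```

Both arrays are regular numeric arrays, so each could itself be stored as a flat binary file and memory-mapped.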

See the documentation for more information.
