
Memory-mapped numeric arrays, based on a format that is self-explanatory and tool-independent

Project description

Darr is a Python science library for disk-based NumPy arrays that persist in a format that is simple, self-documented and tool-independent. It enables you to work efficiently with potentially very large arrays while keeping your data easily accessible from a wide range of computing environments. Every array is stored with documentation that includes code to read it in languages such as R, Julia, IDL, Matlab, Maple, and Mathematica, or in Python/NumPy without Darr. Keeping data universally readable and documented is a pillar of good scientific practice. More rationale for this approach is provided here.

Under the hood, Darr uses NumPy’s memory-mapped arrays, a widely used and well-tested way of working with disk-based numerical arrays. It therefore offers full NumPy compatibility and efficient out-of-core read/write access to potentially very large arrays. What Darr adds is that it does all the bookkeeping needed to keep your arrays fully documented, open, and widely readable. Darr also adds functionality that makes life as a data scientist easier in other ways, such as support for ragged arrays, the ability to create arrays from iterators, append and truncate functionality, and easy use of metadata.
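Darr's own API is not shown here. As a rough sketch of the underlying mechanism only (plain NumPy, not Darr's implementation), the snippet below keeps a 1-D float64 array in a headerless flat binary file, edits it in place through `np.memmap`, and then appends to and truncates the file. The file name `values.bin` and the helper `open_array` are hypothetical.

```python
import os
import tempfile

import numpy as np

# Hypothetical file name; this is plain NumPy, not Darr's API or file layout.
path = os.path.join(tempfile.mkdtemp(), "values.bin")
dtype = np.dtype("<f8")  # explicit little-endian float64 so the bytes are unambiguous

# Write an initial 1-D array as raw bytes (no header) to a flat binary file.
np.arange(5, dtype=dtype).tofile(path)


def open_array(path, dtype, mode="r+"):
    """Hypothetical helper: memory-map the whole file, deriving the length from its size."""
    n = os.path.getsize(path) // dtype.itemsize
    return np.memmap(path, dtype=dtype, mode=mode, shape=(n,))


a = open_array(path, dtype)
a[1:3] *= 10  # in-place edit through NumPy indexing; changes go to the file
a.flush()
del a  # release the mapping before changing the file size

# Append: write more raw values to the end of the file.
with open(path, "ab") as f:
    np.array([5.0, 6.0], dtype=dtype).tofile(f)

# Truncate: keep only the first 4 values.
with open(path, "r+b") as f:
    f.truncate(4 * dtype.itemsize)

b = open_array(path, dtype, mode="r")  # re-map to pick up the new length
print(b.tolist())  # [0.0, 10.0, 20.0, 3.0]
```

Because the file has no header, its length alone determines the number of elements, which is why re-mapping after an append or truncate is enough to see the new size.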

Flat binary files and (JSON) text files are accompanied by a README text file that explains how the array and metadata are stored. This makes it trivially easy to share your data with others, or with yourself when working in different computing environments, because the data always comes with clear documentation, including code to read it. Does your colleague want to try out an interesting algorithm in R or Matlab on your array data? There is no need to export anything or to provide an elaborate explanation: copy-pasting a few lines of code from the documentation stored with the data is sufficient. There is no dependence on complicated formats or specialized libraries. The self-documentation and code examples are updated automatically as you change your arrays.
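The point that such data can be read without Darr can be illustrated with a sketch that uses only the Python standard library: a small C-ordered float64 matrix is written as raw little-endian bytes next to a JSON description, then read back with `json` and `array`. The file names `arrayvalues.bin` and `arraydescription.json`, and the JSON keys, are assumptions chosen for illustration; the README stored with an actual Darr array describes its real layout.

```python
import array
import json
import os
import sys
import tempfile

# Hypothetical directory layout and JSON keys, for illustration only.
folder = tempfile.mkdtemp()
desc = {"shape": [2, 3], "numtype": "float64", "byteorder": "little"}
with open(os.path.join(folder, "arraydescription.json"), "w") as f:
    json.dump(desc, f)

values = array.array("d", [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
if sys.byteorder != "little":
    values.byteswap()  # store little-endian regardless of the machine
with open(os.path.join(folder, "arrayvalues.bin"), "wb") as f:
    values.tofile(f)

# --- Reading side: nothing but the standard library ---
with open(os.path.join(folder, "arraydescription.json")) as f:
    d = json.load(f)

typecodes = {"float64": "d", "float32": "f"}  # small subset, for this sketch
flat = array.array(typecodes[d["numtype"]])
with open(os.path.join(folder, "arrayvalues.bin"), "rb") as f:
    flat.frombytes(f.read())
if d["byteorder"] != sys.byteorder:
    flat.byteswap()  # convert to the machine's native byte order

# Rebuild the 2-D structure assuming C (row-major) order.
rows, cols = d["shape"]
matrix = [flat[r * cols:(r + 1) * cols].tolist() for r in range(rows)]
print(matrix)  # [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
```

The same three facts (numeric type, byte order, shape and memory order) are all another environment such as R or Julia needs to read the file.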

See this tutorial for a brief introduction, or the documentation for more info.

Darr is currently pre-1.0, still undergoing significant development. It is open source and freely available under the New BSD License terms.

Features

  • Disk-persistent array data is directly accessible through NumPy indexing.

  • Works with data arrays larger than RAM.

  • Data is stored purely based on flat binary and text files, maximizing tool independence.

  • Data is automatically documented and includes a README text file with human-readable explanation of how the data is stored.

  • README includes examples of how to read the array in a number of popular data analysis environments, such as Python (without Darr), R, Julia, Octave/Matlab, GDL/IDL, and Mathematica (see example array).

  • Data is easily appendable.

  • Many numeric types are supported: (u)int8-(u)int64, float16-float64, complex64, complex128.

  • Easy use of metadata, stored in a separate JSON text file.

  • Minimal dependencies: only NumPy.

  • Integrates easily with the Dask library for out-of-core computation on very large arrays.

  • Supports ragged arrays.
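One common way to store ragged arrays in flat files, shown here as an assumption rather than Darr's documented layout, is to concatenate all subarrays into a single values array and keep a separate (n, 2) index of [start, end) offsets. The in-memory class `FlatRagged` below is hypothetical; a disk-based version would keep both arrays in memory-mapped flat binary files.

```python
import numpy as np


class FlatRagged:
    """Hypothetical ragged-array layout (not Darr's API): one flat values
    array plus an (n, 2) index of [start, end) offsets into it."""

    def __init__(self, dtype="int64"):
        self.values = np.empty(0, dtype=dtype)
        self.index = np.empty((0, 2), dtype="int64")

    def append(self, subarray):
        # New data goes at the end of values; its offsets go at the end of index.
        start = len(self.values)
        sub = np.asarray(subarray, dtype=self.values.dtype)
        self.values = np.concatenate([self.values, sub])
        self.index = np.vstack([self.index, [start, len(self.values)]])

    def __len__(self):
        return len(self.index)

    def __getitem__(self, i):
        start, end = self.index[i]
        return self.values[start:end]


r = FlatRagged()
for sub in ([1, 2, 3], [4], [5, 6]):
    r.append(sub)

print(len(r), r[0].tolist(), r[2].tolist())  # 3 [1, 2, 3] [5, 6]
print(r.index.tolist())  # [[0, 3], [3, 4], [4, 6]]
```

On disk, appending a subarray with this layout only writes to the end of both files, which fits naturally with the append functionality described above.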

See the documentation for more information.


Download files

Source Distribution

darr-0.5.0.tar.gz (63.4 kB)

Uploaded Source

Built Distribution

darr-0.5.0-py3-none-any.whl (51.3 kB)

Uploaded Python 3

File details

Details for the file darr-0.5.0.tar.gz.

File metadata

  • Download URL: darr-0.5.0.tar.gz
  • Upload date:
  • Size: 63.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.4.2 requests/2.22.0 setuptools/45.2.0 requests-toolbelt/0.8.0 tqdm/4.30.0 CPython/3.8.10

File hashes

Hashes for darr-0.5.0.tar.gz
Algorithm Hash digest
SHA256 4230a1051992e4e2f3b1043e482413df4afd11c32dc12fe67ac25937669f5ccf
MD5 536e6c9ad692e74689e4b68c535540d5
BLAKE2b-256 d1c7858dbe4bd25236a14263504143ce773813139ad05a9bf07e894b694c912f

File details

Details for the file darr-0.5.0-py3-none-any.whl.

File metadata

  • Download URL: darr-0.5.0-py3-none-any.whl
  • Upload date:
  • Size: 51.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.4.2 requests/2.22.0 setuptools/45.2.0 requests-toolbelt/0.8.0 tqdm/4.30.0 CPython/3.8.10

File hashes

Hashes for darr-0.5.0-py3-none-any.whl
Algorithm Hash digest
SHA256 6d0cc3647956cc41829037a28d4e437ab241d1d1db77748a8ebd9ba6fff16604
MD5 e079a155904512adc25036127715570f
BLAKE2b-256 e1a6df74554429608389b5400ea43684f63a95f072ccf78662448d545029956a
