A data organization and compilation system.

Datamate

Datamate is a lightweight data and configuration management framework for structuring data in machine learning projects on a hierarchical filesystem.

Datamate provides a simple framework to work with heterogeneous data by automating input and output of arrays and configurations to disk. It exposes the system's filesystem through pointers to files and representations of the hierarchical structure.

Typical use cases are:

  • automating pathing and orchestrating data
  • seamless input and output operations to a hierarchical filesystem
  • keeping track of configurations, e.g. for preprocessing, experiments, analyses
  • structured preprocessing with minimal overhead code: because it is configuration-based, preprocessed data can be computed automatically only once and then referenced, for instance to skip slow computations when restarting the kernel in your everything_in_here.ipynb notebook
  • interactive prototyping in data-heterogeneous applications: hierarchical file views in notebooks, pandas integration, configuration diffs, simultaneous write and read
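The "compute only once, then reference" idea from the list above can be illustrated with a minimal standard-library sketch. This is not Datamate's API or implementation: `cached_compute`, `slow_preprocessing`, and the `result.json`/`config.json` layout are hypothetical names chosen for illustration, and the sketch assumes results are JSON-serializable.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def cached_compute(root, name, config, compute):
    """Run compute(config) once per distinct config; later calls read the stored result."""
    # Hash the key-sorted JSON form so equal configs map to the same folder.
    key = hashlib.sha256(json.dumps(config, sort_keys=True).encode()).hexdigest()[:12]
    target = Path(root) / name / key
    result_file = target / "result.json"
    if result_file.exists():
        # Cache hit: reference the stored result instead of recomputing.
        return json.loads(result_file.read_text())
    result = compute(config)
    target.mkdir(parents=True, exist_ok=True)
    # Store the config next to the result so the folder is self-describing.
    (target / "config.json").write_text(json.dumps(config, sort_keys=True))
    result_file.write_text(json.dumps(result))
    return result


calls = []


def slow_preprocessing(config):
    # Hypothetical stand-in for an expensive preprocessing step.
    calls.append(config)  # record every real computation
    return [x * config["scale"] for x in range(config["n"])]


root = tempfile.mkdtemp()
first = cached_compute(root, "preprocessed", {"n": 5, "scale": 2}, slow_preprocessing)
# Same configuration with a different key order: served from disk, not recomputed.
second = cached_compute(root, "preprocessed", {"scale": 2, "n": 5}, slow_preprocessing)
```

Because the folder name is derived from the configuration, restarting a notebook kernel and calling the function again with the same configuration finds the stored result and skips the slow step.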

Examples

Datamate's Directory instances can point to (processed) data on the disk (relative to a root directory), allowing seamless I/O.

For example, to store a NumPy array:

>>> import numpy as np
>>> import datamate
>>> datamate.set_root_dir("./data")
>>> directory = datamate.Directory("experiment_01")  # pointer to ./data/experiment_01
>>> directory.array = np.arange(5)  # creates parent directory and writes array to h5 file
>>> directory
experiment_01/ - Last modified: April 04, 2022 08:24:56
└── array.h5

displaying: 1 directory, 1 files

To retrieve the array:

>>> import datamate
>>> datamate.set_root_dir("./data")
>>> directory = datamate.Directory("experiment_01")
>>> directory.array[:]
array([0, 1, 2, 3, 4])

More detailed examples are in examples/01. Introduction to Datamate.ipynb.
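To make the "pointer to a directory" idea concrete, here is a rough sketch of how attribute access can be mapped to files under a root folder. It is only an illustration under stated assumptions, not how Datamate is implemented: `TinyDirectory` is a hypothetical class, and it stores values as JSON to stay within the standard library, whereas the example above shows Datamate writing arrays to an h5 file.

```python
import json
import tempfile
from pathlib import Path


class TinyDirectory:
    """Hypothetical stand-in: attribute access mapped to JSON files in a folder."""

    def __init__(self, root, name):
        # Bypass our own __setattr__ so the path lives on the object itself.
        object.__setattr__(self, "path", Path(root) / name)

    def __setattr__(self, key, value):
        # Create the folder lazily on first write, then store the value as <key>.json.
        self.path.mkdir(parents=True, exist_ok=True)
        (self.path / f"{key}.json").write_text(json.dumps(value))

    def __getattr__(self, key):
        # Only called when normal lookup fails, i.e. for names stored on disk.
        if key == "path":
            raise AttributeError(key)  # guard against recursion if path is unset
        file = self.path / f"{key}.json"
        if not file.exists():
            raise AttributeError(key)
        return json.loads(file.read_text())


root = tempfile.mkdtemp()
writer = TinyDirectory(root, "experiment_01")
writer.array = [0, 1, 2, 3, 4]  # writes <root>/experiment_01/array.json

# A second pointer to the same folder reads the value back from disk.
reader = TinyDirectory(root, "experiment_01")
loaded = reader.array
```

The object holds only a path, so creating a second pointer to the same folder is cheap and always reflects what is currently on disk.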

Installation

Using pip:

pip install datamate

Related frameworks

Datamate is adapted from artisan to focus on flexibility in interactive Jupyter notebooks, with only optional configuration and type enforcement.

Because cloud-based and relational database solutions for ML workflows can be hard for beginners to pick up or inflexible, Datamate is based simply on I/O of arrays and configurations on disk with Pythonic syntax, and it targets interactive, notebook-based workflows.

Contribution

Contributions welcome!
