Package for machine learning metadata management and analysis

Project description

ArangoML Pipeline

ArangoML Pipeline is a common and extensible Metadata Layer for Machine Learning Pipelines which allows Data Scientists and DataOps to manage all information related to their ML pipeline in one place.

News: ArangoML Pipeline Cloud offers a no-setup, free-to-try managed service for ArangoML Pipeline. An ArangoML Pipeline Cloud tutorial is also available and requires no installation or signup.

Quick Start

To get started without installing anything (using ArangoML Pipeline Cloud), click Open In Colab.

The examples folder contains notebooks that illustrate the features of Arangopipe.

Overview

When machine learning pipelines are created, for example using TensorFlow Extended or Kubeflow, capturing (and accessing) metadata across the pipeline is vital. Typically, each component of such an ML pipeline produces or requires metadata, for example:

  • Data storage: size, location, creation date, checksum, ...
  • Feature Store (processed dataset): transformation, version, base datasets, ...
  • Model Training: training/validation performance, training duration, ...
  • Model Serving: model lineage, serving performance, ...

Instead of each component storing its own metadata separately, a common metadata layer simplifies data management and permits querying the entire pipeline. ArangoDB is a multi-model database that supports both efficient document and graph data models within a single database engine, which makes it a great fit for such a metadata layer, for the following reasons:

  • The metadata produced by each component is typically unstructured (e.g., TensorFlow's training metadata is different from PyTorch's metadata) and hence a great fit for document databases
  • The relationship between the different entities (i.e., metadata) can be neatly expressed as graphs (e.g., this model has been trained by run_34 on dataset_y)
  • Metadata queries are easily expressed as graph traversals (e.g., all models which have been derived from dataset_y)
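
The document-plus-graph idea in the list above can be illustrated without a database. The following is a minimal sketch in plain Python; it is not the Arangopipe or ArangoDB API. Node documents are dictionaries with differing fields, edges are (source, label, target) tuples, and the lineage query is a breadth-first traversal. All names and values (`dataset_y`, `run_34`, `model_a`, the edge labels, the field names) are hypothetical and chosen only for illustration.

```python
from collections import deque

# Hypothetical metadata documents. Each node type carries different fields,
# which is why a schema-free document model is convenient here.
nodes = {
    "dataset_y": {"type": "dataset", "size_bytes": 1_048_576, "location": "/data/y.csv"},
    "dataset_z": {"type": "dataset", "size_bytes": 2_097_152, "location": "/data/z.csv"},
    "features_y1": {"type": "featureset", "version": 1, "transformation": "standardize"},
    "run_34": {"type": "run", "framework": "tensorflow", "duration_s": 420},
    "run_35": {"type": "run", "framework": "pytorch", "epochs": 10},
    "run_36": {"type": "run", "framework": "pytorch", "epochs": 5},
    "model_a": {"type": "model", "val_rmse": 0.42},
    "model_b": {"type": "model", "val_rmse": 0.40},
    "model_c": {"type": "model", "val_rmse": 0.41},
}

# Hypothetical relationships between the entities, stored as directed edges.
edges = [
    ("dataset_y", "transformed_to", "features_y1"),
    ("features_y1", "used_by", "run_34"),
    ("features_y1", "used_by", "run_35"),
    ("run_34", "produced", "model_a"),
    ("run_35", "produced", "model_b"),
    ("dataset_z", "used_by", "run_36"),
    ("run_36", "produced", "model_c"),
]


def derived_models(start):
    """Return all model nodes reachable from `start` via outgoing edges."""
    adjacency = {}
    for src, _label, dst in edges:
        adjacency.setdefault(src, []).append(dst)

    seen = {start}
    queue = deque([start])
    models = []
    while queue:
        current = queue.popleft()
        for neighbor in adjacency.get(current, []):
            if neighbor in seen:
                continue
            seen.add(neighbor)
            queue.append(neighbor)
            if nodes[neighbor]["type"] == "model":
                models.append(neighbor)
    return sorted(models)


print(derived_models("dataset_y"))
```

In a graph database the same question would be asked as a traversal query rather than hand-written search code; the sketch only shows why the data is naturally shaped as documents connected by edges.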

Use Cases

ArangoML Pipeline can benefit many scenarios, such as:

  • Capture of lineage information (e.g., Which dataset influences which model?)
  • Capture of audit information (e.g., a given model was trained two months ago with the following training/validation performance)
  • Reproducible model training
  • Model serving policy (e.g., Which model should be deployed in production based on training statistics?)
  • Extension of existing ML pipelines through a simple Python/HTTP API
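
Two of the use cases above, audit queries and a model serving policy, can be sketched over plain metadata records. This is an illustrative example in standard Python, not the Arangopipe API. The records, field names (`trained_at`, `val_rmse`, `train_rmse`), and the policy itself (lowest validation error among models whose train/validation gap stays under a threshold) are assumptions made for the example.

```python
from datetime import datetime

# Hypothetical training records, as a metadata layer might return them.
records = [
    {"model": "model_a", "trained_at": datetime(2020, 1, 10), "val_rmse": 0.42, "train_rmse": 0.35},
    {"model": "model_b", "trained_at": datetime(2020, 3, 5), "val_rmse": 0.40, "train_rmse": 0.20},
    {"model": "model_c", "trained_at": datetime(2020, 3, 20), "val_rmse": 0.41, "train_rmse": 0.39},
]


def trained_before(records, cutoff):
    """Audit query: names of models trained on or before `cutoff`."""
    return [r["model"] for r in records if r["trained_at"] <= cutoff]


def pick_for_serving(records, max_gap=0.1):
    """Serving policy (assumed): among models whose validation error exceeds
    the training error by at most `max_gap`, pick the lowest validation error."""
    eligible = [r for r in records if r["val_rmse"] - r["train_rmse"] <= max_gap]
    if not eligible:
        return None
    return min(eligible, key=lambda r: r["val_rmse"])["model"]


print(trained_before(records, datetime(2020, 2, 1)))
print(pick_for_serving(records))
```

With the default threshold, `model_b` is excluded despite its lower validation error because its train/validation gap suggests overfitting; relaxing `max_gap` changes the outcome, which is the kind of decision that becomes auditable once the statistics are stored centrally.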

Documentation

Please refer to the Arangopipe documentation for further information.

Download files

Source Distribution

arangopipe-0.0.6.9.5.tar.gz (18.5 kB)

Uploaded Source

Built Distribution

arangopipe-0.0.6.9.5-py3-none-any.whl (20.9 kB)

Uploaded Python 3

File details

Details for the file arangopipe-0.0.6.9.5.tar.gz.

File metadata

  • Download URL: arangopipe-0.0.6.9.5.tar.gz
  • Upload date:
  • Size: 18.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/3.6.9

File hashes

Hashes for arangopipe-0.0.6.9.5.tar.gz
Algorithm    Hash digest
SHA256       aec1230587461fccc8e78fdf6aaac30ff3daec3463b0a9ebafbb355310aa33db
MD5          2a231a894a8571d74e9033b2f3f1bb30
BLAKE2b-256  8780a07c7516dfa6a0110c762613ed118b2b4d991fb89a58423dff11c18fa797
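
A downloaded archive can be checked against the SHA256 digest listed above. The following is a minimal sketch using only Python's standard `hashlib` module; the helper names (`sha256_of`, `matches_published_hash`) and the local file path in the usage note are hypothetical, and the expected digest is the one published above for arangopipe-0.0.6.9.5.tar.gz.

```python
import hashlib

# SHA256 digest published above for arangopipe-0.0.6.9.5.tar.gz.
EXPECTED_SHA256 = "aec1230587461fccc8e78fdf6aaac30ff3daec3463b0a9ebafbb355310aa33db"


def sha256_of(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """Return True if the file at `path` has the expected SHA256 digest."""
    return sha256_of(path) == expected.lower()
```

For example, `matches_published_hash("arangopipe-0.0.6.9.5.tar.gz")` would be called on the file after downloading it to the current directory; a `False` result means the file differs from the published one and should not be installed.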

File details

Details for the file arangopipe-0.0.6.9.5-py3-none-any.whl.

File metadata

  • Download URL: arangopipe-0.0.6.9.5-py3-none-any.whl
  • Upload date:
  • Size: 20.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/3.6.9

File hashes

Hashes for arangopipe-0.0.6.9.5-py3-none-any.whl
Algorithm    Hash digest
SHA256       d6a06b730a0533a26cf37f78bad5e5667ad3709188291b7c9b72fa15b803dfb7
MD5          8161b134801212d85669038454de38a1
BLAKE2b-256  cc8ba69892713cf02b5459f753a78eadfa1ba575abf39b167251d1f289d27bd8
