Package for machine learning metadata management and analysis
Project description
ArangoML Pipeline
ArangoML Pipeline is a common, extensible metadata layer for machine learning pipelines that allows data scientists and DataOps to manage all information related to their ML pipeline in one place.
News: ArangoML Pipeline Cloud offers a no-setup, free-to-try managed service for ArangoML Pipeline. An ArangoML Pipeline Cloud tutorial is also available and requires no installation or signup.
Quick Start
To get started without any installation (using ArangoML Pipeline Cloud), click:
The examples folder contains notebooks that illustrate the features of Arangopipe.
Overview
When machine learning pipelines are created, for example using TensorFlow Extended or Kubeflow, capturing and accessing metadata across the pipeline is vital. Typically, each component of such an ML pipeline produces or requires metadata, for example:
- Data storage: size, location, creation date, checksum, ...
- Feature Store (processed dataset): transformation, version, base datasets, ...
- Model Training: training/validation performance, training duration, ...
- Model Serving: model lineage, serving performance, ...
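As a rough illustration of the "Data storage" entry in the list above, the following sketch gathers size, location, a timestamp, and a checksum for a single file. The function name `describe_dataset` and the field names are hypothetical and are not part of the Arangopipe API; the file's modification time is used as a stand-in for a creation date, since creation time is not portably available.

```python
import hashlib
import os
from datetime import datetime, timezone


def describe_dataset(path):
    """Collect storage-level metadata for one file (hypothetical helper)."""
    with open(path, "rb") as handle:
        # Fine for small files; large datasets would be hashed in chunks.
        checksum = hashlib.sha256(handle.read()).hexdigest()
    stat = os.stat(path)
    modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
    return {
        "location": os.path.abspath(path),
        "size_bytes": stat.st_size,
        # Modification time stands in for the creation date here.
        "modified_at": modified.isoformat(),
        "sha256": checksum,
    }
```

A record like this is the kind of free-form document that a metadata layer would store for the data storage component.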
Instead of each component storing its own metadata, a common metadata layer simplifies data management and permits querying the entire pipeline. ArangoDB, a multi-model database that supports efficient document and graph data models within a single database engine, is a great fit for such a metadata layer for the following reasons:
- The metadata produced by each component is typically unstructured (e.g., TensorFlow's training metadata is different from PyTorch's metadata) and hence a great fit for document databases
- The relationship between the different entities (i.e., metadata) can be neatly expressed as graphs (e.g., this model has been trained by run_34 on dataset_y)
- Metadata queries are easily expressed as graph traversals (e.g., all models which have been derived from dataset_y)
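The three points above can be sketched with plain Python data structures. This is an in-memory stand-in, not ArangoDB or the Arangopipe API: the documents, the edge list, and the function `models_derived_from` are all made up for illustration, reusing the `run_34` and `dataset_y` names from the examples above. In ArangoDB the same question would be posed as a graph traversal query rather than a hand-written search.

```python
from collections import deque

# Vertices: free-form documents keyed by id. Fields differ per component,
# which is why a document model suits them.
documents = {
    "dataset_y": {"type": "dataset", "size_bytes": 1024},
    "features_y1": {"type": "featureset", "transformation": "standardize"},
    "run_34": {"type": "run", "framework": "tensorflow", "duration_s": 120},
    "model_a": {"type": "model", "validation_accuracy": 0.91},
    "dataset_z": {"type": "dataset", "size_bytes": 2048},
    "run_35": {"type": "run", "framework": "pytorch"},
    "model_b": {"type": "model", "validation_accuracy": 0.88},
}

# Directed edges pointing downstream: (source, derived entity).
edges = [
    ("dataset_y", "features_y1"),
    ("features_y1", "run_34"),
    ("run_34", "model_a"),
    ("dataset_z", "run_35"),
    ("run_35", "model_b"),
]


def models_derived_from(start):
    """Breadth-first traversal returning all models downstream of `start`."""
    downstream = {}
    for source, target in edges:
        downstream.setdefault(source, []).append(target)

    seen = {start}
    queue = deque([start])
    found = []
    while queue:
        node = queue.popleft()
        for neighbor in downstream.get(node, []):
            if neighbor in seen:
                continue
            seen.add(neighbor)
            queue.append(neighbor)
            if documents[neighbor]["type"] == "model":
                found.append(neighbor)
    return sorted(found)


print(models_derived_from("dataset_y"))
```

Calling `models_derived_from("dataset_y")` walks dataset, feature set, and run before reaching the model, which mirrors the "all models which have been derived from dataset_y" query.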
Use Cases
ArangoML Pipeline can benefit many scenarios, such as:
- Capture of lineage information (e.g., Which dataset influences which model?)
- Capture of audit information (e.g., a given model was trained two months ago with the following training/validation performance)
- Reproducible model training
- Model serving policy (e.g., Which model should be deployed in production based on training statistics?)
- Extension of existing ML pipelines through a simple Python/HTTP API
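To make the audit and serving-policy use cases above more concrete, here is a minimal sketch that operates on a list of training records. The record layout, the sample values, the accuracy threshold, and the functions `trained_since` and `pick_model_for_serving` are assumptions for illustration only; they do not reflect how Arangopipe stores or queries this information.

```python
from datetime import datetime, timezone

# Hypothetical training records, as they might be read from a metadata store.
training_records = [
    {"model": "model_a", "validation_accuracy": 0.91,
     "trained_at": datetime(2024, 1, 10, tzinfo=timezone.utc)},
    {"model": "model_b", "validation_accuracy": 0.88,
     "trained_at": datetime(2024, 3, 5, tzinfo=timezone.utc)},
    {"model": "model_c", "validation_accuracy": 0.79,
     "trained_at": datetime(2024, 3, 20, tzinfo=timezone.utc)},
]


def trained_since(records, cutoff):
    """Audit query: names of models trained at or after `cutoff`."""
    return [r["model"] for r in records if r["trained_at"] >= cutoff]


def pick_model_for_serving(records, min_validation=0.85):
    """Serving policy: best model above a threshold, or None if none qualify."""
    eligible = [r for r in records if r["validation_accuracy"] >= min_validation]
    if not eligible:
        return None
    return max(eligible, key=lambda r: r["validation_accuracy"])["model"]
```

The threshold-then-maximum rule is only one possible policy; a real deployment decision would likely weigh more statistics than a single validation score.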
Documentation
Please refer to the Arangopipe documentation for further information.
Project details
File details
Details for the file arangopipe-0.0.6.9.5.tar.gz.
File metadata
- Download URL: arangopipe-0.0.6.9.5.tar.gz
- Upload date:
- Size: 18.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/3.6.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | aec1230587461fccc8e78fdf6aaac30ff3daec3463b0a9ebafbb355310aa33db
MD5 | 2a231a894a8571d74e9033b2f3f1bb30
BLAKE2b-256 | 8780a07c7516dfa6a0110c762613ed118b2b4d991fb89a58423dff11c18fa797
File details
Details for the file arangopipe-0.0.6.9.5-py3-none-any.whl.
File metadata
- Download URL: arangopipe-0.0.6.9.5-py3-none-any.whl
- Upload date:
- Size: 20.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/3.6.9
File hashes
Algorithm | Hash digest
---|---
SHA256 | d6a06b730a0533a26cf37f78bad5e5667ad3709188291b7c9b72fa15b803dfb7
MD5 | 8161b134801212d85669038454de38a1
BLAKE2b-256 | cc8ba69892713cf02b5459f753a78eadfa1ba575abf39b167251d1f289d27bd8