tango-mlflow

MLflow integration for ai2-tango

Introduction

tango-mlflow is a Python library that connects Tango to MLflow. Tango, developed by AllenAI, is a flexible pipeline library for managing experiments and caching the output of each step. MLflow is a platform for managing the machine learning lifecycle, including experimentation, reproducibility, and deployment. With this integration, the complete experimental settings and artifacts of each Tango run are stored and managed in MLflow, which makes your experiments easier to track.

Here's a screenshot of the MLflow interface when executing with tango-mlflow:

[Screenshot: the MLflow interface during a tango-mlflow run]

Tango run: The top-level run in MLflow corresponds to a single execution in Tango, and its name matches the execution name in Tango.
Tango steps: Results of each step in Tango are recorded as nested runs in MLflow. The names of these nested runs correspond to the names of the steps in Tango.
Parameters and Metrics: The entire settings for the execution, as well as the parameters for each step, are automatically logged in MLflow.
Artifacts and Caching: The cached output of each step is saved as an artifact of the corresponding MLflow run. Tango can reuse these cached outputs in later executions, which improves efficiency and reproducibility.
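MLflow stores parameters as flat key/value pairs, so nested Tango settings have to be flattened before they can be logged as run parameters. The sketch below shows one way to do this with dot-separated keys. The helper name flatten_params and the key format are assumptions for illustration only, not necessarily how tango-mlflow names the parameters it logs.

```python
from typing import Any


def flatten_params(config: dict[str, Any], prefix: str = "") -> dict[str, str]:
    """Flatten a nested config into MLflow-style string params.

    Hypothetical helper: nested keys are joined with "." and list items
    are indexed by position (e.g. "steps.train.layers.0").
    """
    flat: dict[str, str] = {}
    for key, value in config.items():
        name = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            flat.update(flatten_params(value, name))
        elif isinstance(value, (list, tuple)):
            # Index list items so every leaf gets a unique key.
            flat.update(flatten_params({str(i): v for i, v in enumerate(value)}, name))
        else:
            # MLflow stores parameter values as strings.
            flat[name] = str(value)
    return flat


settings = {
    "steps": {
        "train": {"type": "train_model", "max_epochs": 10, "layers": [64, 32]},
    }
}
print(flatten_params(settings))
# {'steps.train.type': 'train_model', 'steps.train.max_epochs': '10', 'steps.train.layers.0': '64', 'steps.train.layers.1': '32'}
```

Dotted keys keep each value traceable to its position in the original configuration, which makes runs easy to filter by a single setting in the MLflow UI.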

Installation

Install tango-mlflow by running the following command in your terminal:

pip install tango-mlflow[all]

Usage

You can use the MLFlowWorkspace from the command line as follows, replacing your_config.jsonnet with the path to your Tango experiment configuration:

tango run your_config.jsonnet --workspace mlflow://your_experiment_name --include-package tango_mlflow

Alternatively, you can define your configuration in a tango.yml file:

workspace:
  type: mlflow
  experiment_name: your_experiment_name

include_package:
  - tango_mlflow

In the tango.yml above, type: mlflow selects the MLflow workspace, and experiment_name sets the name of the experiment in MLflow. The include_package entry must list tango_mlflow so that Tango can load the tango-mlflow integration.

Replace your_experiment_name with the name of your own experiment.
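Both forms above select the same MLflow experiment. As a rough illustration of how they relate, the sketch below extracts the experiment name from either the mlflow:// URL or a workspace mapping like the one in tango.yml. The function experiment_name_from is hypothetical; Tango and tango-mlflow do their own parsing internally.

```python
from typing import Any, Union
from urllib.parse import urlparse


def experiment_name_from(workspace: Union[str, dict[str, Any]]) -> str:
    """Return the MLflow experiment name for a workspace spec (hypothetical helper)."""
    if isinstance(workspace, str):
        # "mlflow://name" parses to scheme "mlflow" and netloc "name".
        parsed = urlparse(workspace)
        if parsed.scheme != "mlflow" or not parsed.netloc:
            raise ValueError(f"not an mlflow workspace URL: {workspace!r}")
        return parsed.netloc
    # Mapping form, as loaded from the workspace section of tango.yml.
    if workspace.get("type") != "mlflow":
        raise ValueError("workspace type must be 'mlflow'")
    return str(workspace["experiment_name"])
```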

To log runs to a remote tracking server, set the MLFLOW_TRACKING_URI environment variable to the server's URI, for example:

export MLFLOW_TRACKING_URI=https://mlflow.example.com
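If you launch Tango from a Python script rather than a shell, you can pass the tracking URI through the child process's environment instead of exporting it globally. A minimal sketch, assuming the tango executable is on your PATH; the helper build_tango_command and the file name your_config.jsonnet are placeholders, while the tango run options are the ones from the command-line example above.

```python
import os


def build_tango_command(
    config_path: str,
    experiment_name: str,
    tracking_uri: str,
) -> tuple[list[str], dict[str, str]]:
    """Build argv and environment for `tango run` against a remote MLflow server."""
    argv = [
        "tango", "run", config_path,
        "--workspace", f"mlflow://{experiment_name}",
        "--include-package", "tango_mlflow",
    ]
    # Copy the current environment and point MLflow at the remote server.
    env = dict(os.environ)
    env["MLFLOW_TRACKING_URI"] = tracking_uri
    return argv, env
```

To start the run, pass the result to subprocess.run(argv, env=env, check=True). Copying os.environ keeps PATH and other variables intact while overriding only the tracking URI for that one process.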

Functionalities

Logging metrics into MLflow

The tango-mlflow package provides the MlflowStep class, which lets you log metrics from inside a step directly to that step's MLflow run.

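from tango_mlflow.step import MlflowStep

class TrainModel(MlflowStep):
    def run(self, max_epochs: int, **params):

        # pre-process...

        for epoch in range(max_epochs):
            # train_epoch and evaluate_model stand for your own training code
            loss = train_epoch(...)
            metrics = evaluate_model(...)
            # log the training loss and evaluation metrics with mlflow_logger
            self.mlflow_logger.log_metric("train_loss", loss, step=epoch)
            for name, value in metrics.items():
                self.mlflow_logger.log_metric(name, value, step=epoch)

        # post-process...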

In the example above, the TrainModel step inherits from MlflowStep. Inside the step, you can directly record metrics to the corresponding MLflow run by invoking self.mlflow_logger.log_metric(...).

Note that MlflowStep must be used together with the MLflow workspace (mlflow://your_experiment_name on the command line, or type: mlflow in tango.yml).
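Because self.mlflow_logger is only available when the step runs in the MLflow workspace, it can help to keep the per-epoch logging logic in a plain function that accepts any object with the log_metric(name, value, step=...) call used in the example above. The sketch below does that and exercises it with an in-memory stand-in logger. The names MetricLogger, train_loop, and RecordingLogger, and the toy loss and metric values, are hypothetical and exist only for illustration.

```python
from typing import Callable, Optional, Protocol


class MetricLogger(Protocol):
    """Anything with the log_metric call shape used in the TrainModel example."""

    def log_metric(self, name: str, value: float, step: Optional[int] = None) -> None: ...


def train_loop(
    logger: MetricLogger,
    max_epochs: int,
    train_epoch: Callable[[int], float],
    evaluate_model: Callable[[int], dict[str, float]],
) -> None:
    """Run training and log the loss plus evaluation metrics once per epoch."""
    for epoch in range(max_epochs):
        loss = train_epoch(epoch)
        logger.log_metric("train_loss", loss, step=epoch)
        for name, value in evaluate_model(epoch).items():
            logger.log_metric(name, value, step=epoch)


class RecordingLogger:
    """In-memory stand-in for self.mlflow_logger, useful in unit tests."""

    def __init__(self) -> None:
        self.records: list[tuple[str, float, Optional[int]]] = []

    def log_metric(self, name: str, value: float, step: Optional[int] = None) -> None:
        self.records.append((name, value, step))


logger = RecordingLogger()
train_loop(
    logger,
    max_epochs=2,
    train_epoch=lambda epoch: 1.0 / (epoch + 1),  # toy loss
    evaluate_model=lambda epoch: {"accuracy": 0.5 + 0.1 * epoch},  # toy metric
)
print(logger.records)
```

Inside TrainModel.run you would then call train_loop(self.mlflow_logger, max_epochs, ...), assuming mlflow_logger accepts the same log_metric call as shown above, while unit tests can pass a RecordingLogger instead.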

Summarizing Tango run metrics

You can have a step's returned metrics recorded as the representative values of the Tango run by setting the class variable MLFLOW_SUMMARY = True on that step. This lets you view the metrics of each Tango run directly in the MLflow interface.

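from tango import Step

class EvaluateModel(Step):
    MLFLOW_SUMMARY = True  # Enables MLflow summary!

    def run(self, **inputs) -> dict[str, float]:
        # replace **inputs with the step's actual parameters,
        # then compute metrics from them ...
        metrics: dict[str, float] = {}
        return metrics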

In the example above, the EvaluateModel step returns metrics that are logged as the representative values for that Tango run. These metrics are then recorded in the corresponding (top-level) MLflow run.

Please note the following requirements:

A step with MLFLOW_SUMMARY = True must always return a dict[str, float].
The step does not need to inherit from MlflowStep to use MLFLOW_SUMMARY.
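Since a summary step must return a dict[str, float], it can be worth checking the return value inside the step itself. The sketch below computes accuracy for toy binary predictions and validates the result, converting every value with float() so the step returns plain Python floats. The function names summary_metrics and check_summary and the accuracy metric are hypothetical examples, not part of tango-mlflow.

```python
import numbers
from collections.abc import Mapping, Sequence


def summary_metrics(predictions: Sequence[int], labels: Sequence[int]) -> dict[str, float]:
    """Toy summary metric: accuracy of binary predictions, as dict[str, float]."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and of equal length")
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return {"accuracy": correct / len(labels)}


def check_summary(metrics: Mapping[object, object]) -> dict[str, float]:
    """Ensure a MLFLOW_SUMMARY return value has str keys and real-number values."""
    checked: dict[str, float] = {}
    for name, value in metrics.items():
        if not isinstance(name, str):
            raise TypeError(f"metric name must be str, got {type(name).__name__}")
        if not isinstance(value, numbers.Real):
            raise TypeError(f"metric {name!r} must be a real number, got {value!r}")
        checked[name] = float(value)  # e.g. int or NumPy scalar -> plain float
    return checked


print(check_summary(summary_metrics([1, 0, 1, 1], [1, 0, 0, 1])))  # {'accuracy': 0.75}
```

Ending EvaluateModel.run with return check_summary(metrics) makes a wrong key or value type fail early inside the step.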

Tuning hyperparameters with Optuna

tango-mlflow also provides the tango-mlflow tune command for tuning hyperparameters with Optuna. For details, see the examples/breast_cancer directory.

Examples

Basic example: examples/euler
Hyperparameter tuning with Optuna: examples/breast_cancer

Release history

2.0.0 (this version): Jun 20, 2024
1.1.1: May 24, 2023
1.1.0: May 18, 2023
1.0.1: Apr 10, 2023
1.0.0: Apr 9, 2023

Download files

Source distribution: tango_mlflow-2.0.0.tar.gz (22.9 kB, uploaded Jun 20, 2024)
Built distribution: tango_mlflow-2.0.0-py3-none-any.whl (26.3 kB, Python 3, uploaded Jun 20, 2024)

Both files were uploaded via poetry/1.8.3 CPython/3.10.9 Linux/6.5.0-1022-azure, without Trusted Publishing.

File hashes for tango_mlflow-2.0.0.tar.gz
Algorithm	Hash digest
SHA256	`103c5114cead57dcb39c90b0b97eeb81d998ee128a1503b71cc03ba6f7b2fe39`
MD5	`61f1c0102f61e0b041f9f802f8b6f851`
BLAKE2b-256	`2453ad5ae3c63de413d8bbded25f027b56b03135bc9c032da1e696cf09732c30`

File hashes for tango_mlflow-2.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`48e785c88a062198653fc858fb2698ec303e887123e839279a1d6b60569d1ff4`
MD5	`65875bb16a2bb979d4669c9b729d8932`
BLAKE2b-256	`a9b9d51027133457c6b9e4cefdaed52d7d80c0ac79f8213cb1579eb883155853`
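To check a downloaded file against the published SHA256 digest, you can hash it locally. A minimal sketch using only the standard library; the helper names sha256_of and verify are illustrative, and the wheel path in the comment assumes the file was downloaded to the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_sha256: str) -> bool:
    """Compare a downloaded file against a published SHA256 digest."""
    return sha256_of(path) == expected_sha256.strip().lower()


WHEEL_SHA256 = "48e785c88a062198653fc858fb2698ec303e887123e839279a1d6b60569d1ff4"
# Example (assumes the wheel is in the current directory):
# verify(Path("tango_mlflow-2.0.0-py3-none-any.whl"), WHEEL_SHA256)
```

pip can also enforce this itself in hash-checking mode, using --require-hashes with --hash=sha256:... entries in a requirements file.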
