MLflow is an open source platform for the complete machine learning lifecycle

MLflow: A Machine Learning Lifecycle Platform

MLflow is an open-source platform, purpose-built to assist machine learning practitioners and teams in handling the complexities of the machine learning process. MLflow focuses on the full lifecycle for machine learning projects, ensuring that each phase is manageable, traceable, and reproducible.


The core components of MLflow are:

  • Experiment Tracking 📝: A set of APIs to log models, params, and results in ML experiments and compare them using an interactive UI.
  • Model Packaging 📦: A standard format for packaging a model and its metadata, such as dependency versions, ensuring reliable deployment and strong reproducibility.
  • Model Registry 💾: A centralized model store, set of APIs, and UI, to collaboratively manage the full lifecycle of MLflow Models.
  • Serving 🚀: Tools for seamless model deployment to batch and real-time scoring on platforms like Docker, Kubernetes, Azure ML, and AWS SageMaker.
  • Evaluation 📊: A suite of automated model evaluation tools, seamlessly integrated with experiment tracking to record model performance and visually compare results across multiple models.
  • Observability 🔍: Tracing integrations with various GenAI libraries and a Python SDK for manual instrumentation, offering a smoother debugging experience and supporting online monitoring.

Installation

To install the MLflow Python package, run the following command:

pip install mlflow

Alternatively, you can install MLflow from other package hosting platforms:

  • PyPI: mlflow, mlflow-skinny
  • conda-forge: mlflow, mlflow-skinny
  • CRAN: mlflow
  • Maven Central: mlflow-client, mlflow-parent, mlflow-scoring, mlflow-spark

Documentation 📘

Official documentation for MLflow can be found here.

Running Anywhere 🌐

You can run MLflow in many different environments, including local development, Amazon SageMaker, AzureML, and Databricks. Please refer to this guidance for how to set up MLflow in your environment.

Usage

Experiment Tracking (Doc)

The following example trains a simple regression model with scikit-learn while enabling MLflow's autologging feature for experiment tracking.

import mlflow

from sklearn.model_selection import train_test_split
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor

# Enable MLflow's automatic experiment tracking for scikit-learn
mlflow.sklearn.autolog()

# Load the training dataset
db = load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(db.data, db.target)

rf = RandomForestRegressor(n_estimators=100, max_depth=6, max_features=3)
# MLflow triggers logging automatically upon model fitting
rf.fit(X_train, y_train)

Once the above code finishes, run the following command in a separate terminal and open the MLflow UI at the printed URL. An MLflow Run should have been created automatically; it tracks the training dataset, hyperparameters, performance metrics, the trained model, its dependencies, and more.

mlflow ui

Serving Models (Doc)

You can deploy the logged model to a local inference server with a one-line command using the MLflow CLI. Visit the documentation for how to deploy models to other hosting platforms.

mlflow models serve --model-uri runs:/<run-id>/model
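Once the server is up, it can be queried over HTTP. The sketch below is a minimal illustration rather than an official client: it assumes the server listens on the default local address (http://127.0.0.1:5000) and that the model accepts tensor-style input under the "inputs" key, which is the likely case for the scikit-learn example above because it was trained on NumPy arrays. Models trained on pandas DataFrames would typically use the "dataframe_split" key instead. The helper names build_invocation_payload and score are hypothetical.

```python
import json
import urllib.request


def build_invocation_payload(rows):
    """Serialize feature rows into a tensor-style JSON body ("inputs" key)."""
    return json.dumps({"inputs": [list(row) for row in rows]})


def score(payload, url="http://127.0.0.1:5000/invocations"):
    """POST the payload to the scoring endpoint and return the parsed JSON reply.

    The URL is an assumption: adjust host and port to match your server.
    """
    request = urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


# One all-zero row with the 10 features of the diabetes dataset
payload = build_invocation_payload([[0.0] * 10])
print(payload)

# Uncomment once `mlflow models serve` is running:
# print(score(payload))
```

The network call is left commented out so the snippet can be run without a live server; only the payload construction executes as written.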

Evaluating Models (Doc)

The following example runs automatic evaluation for question-answering tasks with several built-in metrics.

import mlflow
import pandas as pd

# Evaluation set contains (1) input question (2) model outputs (3) ground truth
df = pd.DataFrame(
    {
        "inputs": ["What is MLflow?", "What is Spark?"],
        "outputs": [
            "MLflow is an innovative fully self-driving airship powered by AI.",
            "Sparks is an American pop and rock duo formed in Los Angeles.",
        ],
        "ground_truth": [
            "MLflow is an open-source platform for managing the end-to-end machine learning (ML) "
            "lifecycle.",
            "Apache Spark is an open-source, distributed computing system designed for big data "
            "processing and analytics.",
        ],
    }
)
eval_dataset = mlflow.data.from_pandas(
    df, predictions="outputs", targets="ground_truth"
)

# Start an MLflow Run to record the evaluation results to
with mlflow.start_run(run_name="evaluate_qa"):
    # Run automatic evaluation with a set of built-in metrics for question-answering models
    results = mlflow.evaluate(
        data=eval_dataset,
        model_type="question-answering",
    )

print(results.tables["eval_results_table"])

Observability (Doc)

MLflow Tracing provides LLM observability for various GenAI libraries such as OpenAI, LangChain, LlamaIndex, DSPy, AutoGen, and more. To enable auto-tracing, call mlflow.xyz.autolog(), where xyz stands for the library's integration name (for example, mlflow.openai.autolog()), before running your models. Refer to the documentation for customization and manual instrumentation.

import mlflow
from openai import OpenAI

# Enable tracing for OpenAI
mlflow.openai.autolog()

# Query OpenAI LLM normally
response = OpenAI().chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hi!"}],
    temperature=0.1,
)

Then navigate to the "Traces" tab in the MLflow UI to find the trace recorded for the OpenAI query.

Community

  • For help or questions about MLflow usage (e.g. "how do I do X?") visit the docs or Stack Overflow.
  • Alternatively, you can ask our AI-powered chat bot. Visit the doc website and click on the "Ask AI" button at the bottom right to start chatting with the bot.
  • To report a bug, file a documentation issue, or submit a feature request, please open a GitHub issue.
  • For release announcements and other discussions, please subscribe to our mailing list (mlflow-users@googlegroups.com) or join us on Slack.

Contributing

We happily welcome contributions to MLflow! We are also seeking contributions to items on the MLflow Roadmap. Please see our contribution guide to learn more about contributing to MLflow.

Core Members

MLflow is currently maintained by the following core members with significant contributions from hundreds of exceptionally talented community members.


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

mlflow-2.20.3.tar.gz (27.8 MB)

Uploaded Source

Built Distribution

mlflow-2.20.3-py3-none-any.whl (28.4 MB)

Uploaded Python 3

File details

Details for the file mlflow-2.20.3.tar.gz.

File metadata

  • Download URL: mlflow-2.20.3.tar.gz
  • Upload date:
  • Size: 27.8 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.15

File hashes

Hashes for mlflow-2.20.3.tar.gz
Algorithm Hash digest
SHA256 a7b1baf53d4f10160864961320df0c4cb74fb4f21c7522ef80a35290d03573bb
MD5 478886f8128079016f80853a0c7546cd
BLAKE2b-256 66d513b37b5fa1f08bb4c7df06bfd9117d363bc7c3f1dfc3a1f7261dc4536c0e

See more details on using hashes here.
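As an illustration of how a published digest can be used, the following sketch computes the SHA256 of a downloaded file and compares it with the value listed above for mlflow-2.20.3.tar.gz. The function names and the local file path are hypothetical, and the snippet relies only on the Python standard library.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare the file's digest with a published one, ignoring letter case."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.lower())


# SHA256 listed above for the source distribution
EXPECTED_SHA256 = "a7b1baf53d4f10160864961320df0c4cb74fb4f21c7522ef80a35290d03573bb"

# Hypothetical local path; uncomment after downloading the archive:
# print(matches_published_hash("mlflow-2.20.3.tar.gz", EXPECTED_SHA256))
```

Reading in chunks keeps memory use flat for the roughly 28 MB archive. pip can also enforce digests at install time through its hash-checking mode (`--require-hashes` with a requirements file).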

File details

Details for the file mlflow-2.20.3-py3-none-any.whl.

File metadata

  • Download URL: mlflow-2.20.3-py3-none-any.whl
  • Upload date:
  • Size: 28.4 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.15

File hashes

Hashes for mlflow-2.20.3-py3-none-any.whl
Algorithm Hash digest
SHA256 efafe5d4d17b53be1ae02c7d8708a5e4bbde4bd3aecd2bd68b64a3c4175e9dc6
MD5 5af44bb11261c39bb1f6da576fb24d72
BLAKE2b-256 b8f389dc9daf896ec5caea037dfc8cd24bcef69995aed2d30164afd69af5ec8d

See more details on using hashes here.
