
MLflow integration for Inspect AI: experiment tracking, execution tracing, and Scout analysis


inspect-mlflow


Requires Python 3.11+. License: MIT.

MLflow integration for Inspect AI. Provides experiment tracking, execution tracing, and artifact logging for Inspect AI evaluations.

Install

pip install inspect-mlflow

Quick Start

No code changes needed. Hooks auto-register via entry points when the package is installed. Set env vars and run evals as usual.

# Start MLflow server
mlflow server --port 5000

# Set env vars
export MLFLOW_TRACKING_URI="http://localhost:5000"
export MLFLOW_INSPECT_TRACING="true"

# Run evals. Hooks auto-activate.
inspect eval my_task.py --model openai/gpt-4o

Then open http://localhost:5000 to see runs and traces.

What it does

Tracking Hook

Activated when MLFLOW_TRACKING_URI is set. Creates hierarchical MLflow runs mirroring the eval structure.

  • Parent run per eval invocation, nested child runs per task
  • Task config logged as parameters (model, dataset, solver, temperature)
  • Per-sample scores as step metrics
  • Model token usage (input/output/total per model)
  • Real-time event counting (model calls, tool calls)
  • Eval artifacts: per-sample results JSON + full eval log JSON
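The run hierarchy described above can be sketched with standard MLflow calls (`start_run` with `nested=True`, `log_params`, `log_metric` with `step`). This is a minimal illustration, not the package's actual hook code: the function name `log_eval_runs`, the task dictionary shape, and the metric key `score` are all hypothetical. The MLflow module is passed in as an argument so the sketch can be exercised without a running server.

```python
def log_eval_runs(ml, eval_name, tasks):
    """Log one parent run per eval and one nested child run per task.

    `ml` is expected to be the imported `mlflow` module, or any object
    exposing start_run / log_params / log_metric with the same signatures.
    `tasks` is a hypothetical list of dicts with "name", "params", "scores".
    """
    # Parent run: one per eval invocation.
    with ml.start_run(run_name=eval_name):
        for task in tasks:
            # Child run: nested under the parent, one per task.
            with ml.start_run(run_name=task["name"], nested=True):
                # Task config (model, dataset, solver, ...) as parameters.
                ml.log_params(task["params"])
                # Per-sample scores as step metrics: the step is the sample index.
                for step, score in enumerate(task["scores"]):
                    ml.log_metric("score", score, step=step)
```

With MLflow installed and `MLFLOW_TRACKING_URI` set, this could be called as `log_eval_runs(mlflow, "eval-demo", tasks)` after `import mlflow`. The child runs then appear nested under the parent in the MLflow UI.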

Tracing Hook

Activated when MLFLOW_INSPECT_TRACING=true is set in addition to MLFLOW_TRACKING_URI. Maps eval execution to MLflow trace spans.

eval_run:6fvmKSZv (CHAIN)
  task:task (CHAIN)
    sample:gM9UtEAM (CHAIN)
      solvers -> generate -> model:openai/gpt-4o-mini (LLM)
      scorers -> match -> score (EVALUATOR)
    sample:628Qbuhr (CHAIN)
      ...

Each span captures relevant data:

Span Type | Data
----------|-----
LLM       | model name, token counts, temperature, cache, response
TOOL      | function name, arguments, result, errors
EVALUATOR | score value, explanation, target
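To make the span hierarchy concrete, the sketch below models it as a plain tree and prints it in the same indented form as the trace shown earlier. The `Span` class and `render` helper are hypothetical illustrations, not part of inspect-mlflow or MLflow. Only the span type names (CHAIN, LLM, EVALUATOR) come from this README, and the intermediate solver and scorer levels are collapsed for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    """A hypothetical stand-in for one trace span."""
    name: str
    span_type: str  # e.g. "CHAIN", "LLM", "TOOL", "EVALUATOR"
    children: list["Span"] = field(default_factory=list)


def render(span, depth=0):
    """Return the tree as indented lines, two spaces per nesting level."""
    lines = [f"{'  ' * depth}{span.name} ({span.span_type})"]
    for child in span.children:
        lines.extend(render(child, depth + 1))
    return lines


# eval_run -> task -> sample -> model call and score.
tree = Span("eval_run", "CHAIN", [
    Span("task:task", "CHAIN", [
        Span("sample:1", "CHAIN", [
            Span("model:openai/gpt-4o-mini", "LLM"),
            Span("score", "EVALUATOR"),
        ]),
    ]),
])

print("\n".join(render(tree)))
```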

Screenshots

  • Traces list: an eval run with execution time and status
  • Span tree: the full eval hierarchy (eval_run -> task -> samples -> solvers/scorers)
  • LLM span detail: model name, token counts, and response text

Configuration

Env var                      | Required | Default    | Description
-----------------------------|----------|------------|-------------------------
MLFLOW_TRACKING_URI          | Yes      | -          | MLflow server URL
MLFLOW_EXPERIMENT_NAME       | No       | inspect_ai | Experiment name
MLFLOW_INSPECT_TRACING       | No       | false      | Enable execution tracing
MLFLOW_INSPECT_LOG_ARTIFACTS | No       | true       | Log eval artifacts
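As a rough illustration of how these variables and defaults fit together, the sketch below resolves them into a settings object. It is an assumption-laden example, not the package's own loader: `MlflowInspectSettings`, `load_settings`, and the set of accepted truthy strings are hypothetical. Only the variable names and defaults are taken from the table.

```python
import os
from dataclasses import dataclass

# Assumed truthy spellings; the package may accept a different set.
_TRUTHY = {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class MlflowInspectSettings:
    tracking_uri: str
    experiment_name: str = "inspect_ai"
    tracing: bool = False
    log_artifacts: bool = True


def _flag(env, name, default):
    """Parse a boolean env var, falling back to the documented default."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in _TRUTHY


def load_settings(env=None):
    """Resolve settings from a mapping (os.environ when none is given)."""
    env = os.environ if env is None else env
    uri = env.get("MLFLOW_TRACKING_URI")
    if not uri:
        # The only required variable in the table.
        raise ValueError("MLFLOW_TRACKING_URI is required")
    return MlflowInspectSettings(
        tracking_uri=uri,
        experiment_name=env.get("MLFLOW_EXPERIMENT_NAME", "inspect_ai"),
        tracing=_flag(env, "MLFLOW_INSPECT_TRACING", False),
        log_artifacts=_flag(env, "MLFLOW_INSPECT_LOG_ARTIFACTS", True),
    )
```

Passing an explicit mapping, for example `load_settings({"MLFLOW_TRACKING_URI": "http://localhost:5000"})`, lets you check the resolved defaults without touching the process environment.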

Example

from inspect_ai import Task, eval
from inspect_ai.dataset import Sample
from inspect_ai.scorer import match
from inspect_ai.solver import generate

# No special imports needed. Hooks auto-register on install.

task = Task(
    dataset=[
        Sample(input="What is 2 + 2?", target="4"),
        Sample(input="What is 3 * 5?", target="15"),
        Sample(input="What is 10 - 7?", target="3"),
    ],
    solver=generate(),
    scorer=match(),
)

logs = eval(task, model="openai/gpt-4o-mini")
# Results are now in MLflow: runs with metrics + traces with spans
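The example above relies on the environment variables from Quick Start already being set. When running from a script or notebook, one option is to set them in-process first. The helper below is a hypothetical convenience, not part of the package. It assumes the hooks read the environment no earlier than import time, which this README does not confirm, so calling it before importing inspect_ai is the safest order.

```python
import os


def apply_mlflow_defaults(env=None):
    """Fill in MLflow-related variables without overriding existing values.

    setdefault keeps anything already exported in the shell, so an explicit
    MLFLOW_TRACKING_URI always wins over the local-server fallback used here.
    """
    env = os.environ if env is None else env
    env.setdefault("MLFLOW_TRACKING_URI", "http://localhost:5000")
    env.setdefault("MLFLOW_INSPECT_TRACING", "true")
    return env
```

Calling `apply_mlflow_defaults()` at the top of the script, before `from inspect_ai import Task, eval`, points runs at the local server from Quick Start unless another URI is already configured.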

Development

git clone https://github.com/debu-sinha/inspect-mlflow.git
cd inspect-mlflow
uv sync --group dev
uv run pre-commit install
uv run pytest tests/ -v

See CONTRIBUTING.md for details.
