AgentOps CLI for standardized evaluation workflows

AgentOps Toolkit

AgentOps CLI for evaluation, observability, and operational workflows for Microsoft Foundry Agents and Models.

License: MIT | Status: Preview | Python 3.11+ | CLI | Built on Microsoft Foundry

Overview

AgentOps Toolkit is a CLI built on Microsoft Foundry that standardizes evaluation and operational workflows for AI agents and models, helping teams run, monitor, and automate AgentOps processes.

The project enables:

  • Consistent local and CI execution of agent evaluations
  • Reusable evaluation policies through bundles
  • Operational observability through tracing, monitoring, and run inspection
  • Stable machine-readable outputs for automation
  • Human-readable reports for PR reviews and quality gates

Operational capabilities include:

  • Standardized evaluation workflows
  • Run history and result inspection
  • Tracing and observability
  • Monitoring (dashboards and alerts)
  • CI/CD automation
  • Operational reporting and analysis

Core outputs:

  • results.json (machine-readable)
  • report.md (human-readable)

Exit code contract:

  • 0: execution succeeded and all thresholds passed
  • 2: execution succeeded but one or more thresholds failed
  • 1: runtime or configuration error
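
For automation, the contract above can be mapped to readable statuses. The following is a minimal sketch, not part of the toolkit: the helper names describe_exit_code and run_eval are hypothetical, and only the three exit codes and the agentops eval run command come from this README.

```python
import subprocess

# Meanings copied from the exit code contract in this README.
EXIT_MEANINGS = {
    0: "passed: execution succeeded and all thresholds passed",
    2: "thresholds failed: execution succeeded but one or more thresholds failed",
    1: "error: runtime or configuration error",
}


def describe_exit_code(code: int) -> str:
    """Translate an agentops exit code into a readable status line."""
    return EXIT_MEANINGS.get(code, f"unknown exit code {code}")


def run_eval(*extra_args: str) -> int:
    """Run `agentops eval run` and return its exit code (hypothetical helper)."""
    completed = subprocess.run(["agentops", "eval", "run", *extra_args])
    print(describe_exit_code(completed.returncode))
    return completed.returncode
```

A CI script could call run_eval() and treat exit code 2 as a failed quality gate while reporting exit code 1 as an infrastructure or configuration problem.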

Quickstart

This section is structured for demos and onboarding, so you can present the project flow end-to-end in a few minutes.

1) Install

python -m venv .venv
# activate your venv in the current shell
python -m pip install -U pip
python -m pip install agentops-toolkit

2) Initialize Workspace

agentops init

Generated structure:

.agentops/
├── config.yaml
├── run.yaml
├── run-rag.yaml
├── run-agent.yaml
├── .gitignore
├── bundles/
│   ├── model_direct_baseline.yaml
│   ├── rag_retrieval_baseline.yaml
│   └── agent_tools_baseline.yaml
├── datasets/
│   ├── smoke-model-direct.yaml
│   ├── smoke-rag.yaml
│   └── smoke-agent-tools.yaml
├── data/
│   ├── smoke-model-direct.jsonl
│   ├── smoke-rag.jsonl
│   └── smoke-agent-tools.jsonl
└── results/

3) Configure Foundry Endpoint

PowerShell:

$env:AZURE_AI_FOUNDRY_PROJECT_ENDPOINT = "https://<resource>.services.ai.azure.com/api/projects/<project>"

Bash/zsh:

export AZURE_AI_FOUNDRY_PROJECT_ENDPOINT="https://<resource>.services.ai.azure.com/api/projects/<project>"

Authentication uses DefaultAzureCredential:

  • local: az login
  • CI/CD: service principal env vars
  • Azure-hosted: managed identity
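
Before running an evaluation, it can help to confirm that the endpoint variable is set and that the placeholders were replaced. The sketch below is an illustration only: check_endpoint is a hypothetical helper, and the URL shape it checks is an assumption based on the example value above, not the CLI's documented validation.

```python
import os
from typing import List, Optional
from urllib.parse import urlparse

ENV_VAR = "AZURE_AI_FOUNDRY_PROJECT_ENDPOINT"


def check_endpoint(value: Optional[str]) -> List[str]:
    """Return a list of problems; an empty list means the value looks plausible."""
    if not value:
        return [f"{ENV_VAR} is not set"]
    problems = []
    parsed = urlparse(value)
    if parsed.scheme != "https":
        problems.append("endpoint should use https")
    if "<" in value or ">" in value:
        problems.append("endpoint still contains <placeholder> text")
    # Assumption: project endpoints follow the /api/projects/<project> shape shown above.
    if "/api/projects/" not in parsed.path:
        problems.append("endpoint path does not contain /api/projects/<project>")
    return problems


if __name__ == "__main__":
    for problem in check_endpoint(os.environ.get(ENV_VAR)):
        print("warning:", problem)
```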

4) Choose Scenario Run Config

Starter run files created by agentops init:

  • .agentops/run.yaml (default model-direct)
  • .agentops/run-rag.yaml (agent + RAG baseline)
  • .agentops/run-agent.yaml (agent + tools baseline)

Important:

  • Replace placeholders (backend.model, backend.agent_id) with values that exist in your Foundry project.
  • There is no universal deployment name guaranteed across all Foundry projects/regions.

5) Run Evaluation

agentops eval run

Or run a specific scenario file:

agentops eval run --config .agentops/run-rag.yaml
agentops eval run --config .agentops/run-agent.yaml

Default behavior:

  • input config: .agentops/run.yaml
  • output location: timestamped folder under .agentops/results/
  • latest pointer: .agentops/results/latest/
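
Because the latest pointer has a fixed location, downstream scripts can read the most recent results without knowing the timestamped folder name. This is a minimal sketch: load_latest_results is a hypothetical helper, and since the schema of results.json is not described in this README, the code only parses the file and makes no assumption about its fields.

```python
import json
from pathlib import Path
from typing import Any


def load_latest_results(workspace: Path = Path(".agentops")) -> Any:
    """Parse results.json from the latest-run pointer folder."""
    results_path = workspace / "results" / "latest" / "results.json"
    with results_path.open(encoding="utf-8") as handle:
        return json.load(handle)


if __name__ == "__main__":
    results = load_latest_results()
    # Print only the top-level type; field names are not documented here.
    print(type(results).__name__)
```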

6) Regenerate Report (Optional)

agentops report

Default input:

  • .agentops/results/latest/results.json

Evaluation Scenarios

Starter bundles created by agentops init:

Bundle                           Evaluators                                                                 Typical use
model_direct_baseline (default)  SimilarityEvaluator + avg_latency_seconds                                  Model-direct QA checks
rag_retrieval_baseline           GroundednessEvaluator + avg_latency_seconds                                RAG groundedness checks
agent_tools_baseline             TaskCompletionEvaluator + ToolCallAccuracyEvaluator + avg_latency_seconds  Agent-with-tools baseline

datasets/ stores YAML dataset definitions. data/ stores JSONL rows referenced by dataset definitions.
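
When editing the JSONL files under data/ by hand, a quick syntax check can catch malformed rows before a run. The sketch below is an illustration: find_bad_jsonl_lines is a hypothetical helper, and treating each row as a JSON object is an assumption (the usual JSONL convention), since the expected row fields are not listed in this README.

```python
import json
from pathlib import Path
from typing import List


def find_bad_jsonl_lines(path: Path) -> List[int]:
    """Return 1-based line numbers that are not valid JSON objects."""
    bad = []
    with path.open(encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            if not line.strip():
                continue  # ignore blank lines
            try:
                row = json.loads(line)
            except json.JSONDecodeError:
                bad.append(number)
                continue
            # Assumption: every row should be a JSON object, not a list or scalar.
            if not isinstance(row, dict):
                bad.append(number)
    return bad


if __name__ == "__main__":
    print(find_bad_jsonl_lines(Path(".agentops/data/smoke-model-direct.jsonl")))
```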

Commands

Command Line Reference

Command                                     Description                                          Status
agentops --version                          Show installed version                               Implemented
agentops init [--dir DIR]                   Scaffold project workspace and starter files         Implemented
agentops eval run                           Evaluate a dataset against a bundle                  Implemented
agentops eval compare --runs ID1,ID2        Compare two past runs                                Implemented
agentops run list|show                      List or inspect past runs                            🚧 Planned
agentops run view <id> [--entry N]          Deep run inspection                                  🚧 Planned
agentops report                             Regenerate report.md from results.json               Implemented
agentops report show|export                 View/export reports                                  🚧 Planned
agentops bundle list|show                   Browse bundle catalog                                🚧 Planned
agentops dataset validate|describe|import   Dataset utilities                                    🚧 Planned
agentops config cicd                        Generate GitHub Actions workflow for CI evaluation   Implemented
agentops config validate|show               Config validation and inspection                     🚧 Planned
agentops trace init                         Tracing setup                                        🚧 Planned
agentops monitor setup|dashboard|alert      Monitoring operations                                🚧 Planned
agentops model list                         List Foundry model deployments                       🚧 Planned
agentops agent list                         List Foundry agents                                  🚧 Planned

Implemented command usage:

agentops --version
agentops init [--dir <dir>]
agentops eval run [--config <path>] [--output <dir>] [--format md|html|all]
agentops eval compare --runs ID1,ID2 [--output <dir>] [--format md|html|all]
agentops report [--in <results.json>] [--out <report.md>] [--format md|html|all]
agentops config cicd [--force] [--dir <path>]

For planned commands, the CLI returns a friendly message indicating the command is planned but not implemented in this release.

Project Structure

High-level code layout:

  • src/agentops/cli/: command entrypoints (Typer)
  • src/agentops/services/: orchestration workflows
  • src/agentops/backends/: execution engines (foundry, subprocess)
  • src/agentops/core/: schemas, thresholds, and report generation
  • src/agentops/templates/: starter workspace assets
  • tests/unit/ and tests/integration/: automated tests

Documentation

GitHub Copilot Skills

AgentOps publishes Copilot skills that teach GitHub Copilot how to use the evaluation CLI correctly. Install them from this repository to get AI-assisted guidance for running evaluations, investigating regressions, and triaging observability issues.

Available Skills

Skill                            Description
agentops-run-evals               Guides evaluation workflow: init, run, report, compare
agentops-investigate-regression  Regression investigation: metric deltas, threshold flips, actionable checks
agentops-observability-triage    Observability and triage: current capabilities vs planned features

Installation

Skills are distributed from this GitHub repository. Install them in VS Code:

  1. Open VS Code with GitHub Copilot Chat enabled.
  2. Use the Copilot skill install command and point to this repository:
    • Source: Azure/agentops
      • Skills are located under plugins/agentops/skills/
  3. Once installed, Copilot will automatically use the skills when you ask about AgentOps evaluation, regressions, or observability.

Alternatively, you can copy the skill files manually:

# Copy skills to your user-level skills directory
cp -r plugins/agentops/skills/* ~/.agents/skills/

For Repository Contributors

If you're working inside this repo, the skills under .github/skills/ are automatically available to Copilot when the repository is your active workspace.

Contributing

See CONTRIBUTING.md for architecture rules, testing expectations, and contribution workflow.

