AgentOps CLI for standardized evaluation workflows

AgentOps Toolkit

AgentOps CLI for evaluation, observability, and operational workflows for Microsoft Foundry Agents and Models.

License: MIT | Status: Preview | Python 3.11+ | CLI | Built on Microsoft Foundry

Overview

AgentOps Toolkit is a CLI built on Microsoft Foundry that standardizes evaluation and operational workflows for AI agents and models, helping teams run, monitor, and automate AgentOps processes.

The project enables:

  • Consistent local and CI execution of agent evaluations
  • Reusable evaluation policies through bundles
  • Operational observability through tracing, monitoring, and run inspection
  • Stable machine-readable outputs for automation
  • Human-readable reports for PR reviews and quality gates

Operational capabilities include:

  • Standardized evaluation workflows
  • Run history and result inspection
  • Tracing and observability
  • Monitoring (dashboards and alerts)
  • CI/CD automation
  • Operational reporting and analysis

Core outputs:

  • results.json (machine-readable)
  • report.md (human-readable)

Exit code contract:

  • 0: execution succeeded and all thresholds passed
  • 2: execution succeeded but one or more thresholds failed
  • 1: runtime or configuration error
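
The exit code contract above can be wrapped for scripts and CI jobs. The following is a minimal sketch, not part of the toolkit: describe_exit_code and run_eval are hypothetical helper names, and only the three documented codes are interpreted.

```python
import subprocess

# Meanings copied from the exit code contract above.
EXIT_MEANINGS = {
    0: "passed: execution succeeded and all thresholds passed",
    2: "thresholds failed: execution succeeded but one or more thresholds failed",
    1: "error: runtime or configuration error",
}


def describe_exit_code(code: int) -> str:
    """Map an agentops exit code to a short description (hypothetical helper)."""
    return EXIT_MEANINGS.get(code, f"unknown exit code {code}")


def run_eval() -> int:
    """Run `agentops eval run` and return its exit code without raising."""
    # check=False keeps codes 1 and 2 from being turned into exceptions.
    completed = subprocess.run(["agentops", "eval", "run"], check=False)
    print(describe_exit_code(completed.returncode))
    return completed.returncode
```

Keeping code 2 distinct from code 1 lets a pipeline treat a quality-gate failure differently from a broken configuration, for example by posting the report on a pull request in the first case and failing fast in the second.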

Quickstart

This section is structured for demos and onboarding, so you can present the project flow end-to-end in a few minutes.

1) Install

python -m venv .venv
# activate the venv in the current shell, for example:
#   Bash/zsh:   source .venv/bin/activate
#   PowerShell: .venv\Scripts\Activate.ps1
python -m pip install -U pip
python -m pip install agentops-toolkit

2) Initialize Workspace

agentops init

Generated structure:

.agentops/
├── config.yaml
├── run.yaml
├── run-rag.yaml
├── run-agent.yaml
├── .gitignore
├── bundles/
│   ├── model_direct_baseline.yaml
│   ├── rag_retrieval_baseline.yaml
│   └── agent_tools_baseline.yaml
├── datasets/
│   ├── smoke-model-direct.yaml
│   ├── smoke-rag.yaml
│   └── smoke-agent-tools.yaml
├── data/
│   ├── smoke-model-direct.jsonl
│   ├── smoke-rag.jsonl
│   └── smoke-agent-tools.jsonl
└── results/
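
To confirm that initialization produced the layout listed above, a short check along these lines may help. It is a sketch under assumptions: missing_workspace_files is a hypothetical helper, and the file list is copied from the tree above, so it may differ in other releases.

```python
from pathlib import Path

# Relative paths copied from the generated structure listed above.
EXPECTED_FILES = [
    "config.yaml",
    "run.yaml",
    "run-rag.yaml",
    "run-agent.yaml",
    ".gitignore",
    "bundles/model_direct_baseline.yaml",
    "bundles/rag_retrieval_baseline.yaml",
    "bundles/agent_tools_baseline.yaml",
    "datasets/smoke-model-direct.yaml",
    "datasets/smoke-rag.yaml",
    "datasets/smoke-agent-tools.yaml",
    "data/smoke-model-direct.jsonl",
    "data/smoke-rag.jsonl",
    "data/smoke-agent-tools.jsonl",
]


def missing_workspace_files(project_root: str = ".") -> list[str]:
    """Return the expected starter files that are absent under .agentops/."""
    workspace = Path(project_root) / ".agentops"
    return [rel for rel in EXPECTED_FILES if not (workspace / rel).is_file()]
```

An empty list means every starter file from the listing is present; the results/ directory is not checked because it starts out empty.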

3) Configure Foundry Endpoint

PowerShell:

$env:AZURE_AI_FOUNDRY_PROJECT_ENDPOINT = "https://<resource>.services.ai.azure.com/api/projects/<project>"

Bash/zsh:

export AZURE_AI_FOUNDRY_PROJECT_ENDPOINT="https://<resource>.services.ai.azure.com/api/projects/<project>"

Authentication uses DefaultAzureCredential:

  • local: az login
  • CI/CD: service principal environment variables (AZURE_TENANT_ID, AZURE_CLIENT_ID, AZURE_CLIENT_SECRET)
  • Azure-hosted: managed identity

4) Choose Scenario Run Config

Starter run files created by agentops init:

  • .agentops/run.yaml (default model-direct)
  • .agentops/run-rag.yaml (agent + RAG baseline)
  • .agentops/run-agent.yaml (agent + tools baseline)

Important:

  • Replace placeholders (backend.model, backend.agent_id) with values that exist in your Foundry project.
  • There is no universal deployment name guaranteed across all Foundry projects/regions.

5) Run Evaluation

agentops eval run

Or run a specific scenario file:

agentops eval run --config .agentops/run-rag.yaml
agentops eval run --config .agentops/run-agent.yaml

Default behavior:

  • input config: .agentops/run.yaml
  • output location: timestamped folder under .agentops/results/
  • latest pointer: .agentops/results/latest/
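
Because the latest run is always reachable through the latest pointer, automation can read its results from a fixed path. The following is a minimal sketch: latest_results_path and load_latest_results are hypothetical helpers, and nothing is assumed about the fields inside results.json.

```python
import json
from pathlib import Path


def latest_results_path(project_root: str = ".") -> Path:
    """Build the path to results.json under the latest pointer."""
    return Path(project_root) / ".agentops" / "results" / "latest" / "results.json"


def load_latest_results(project_root: str = "."):
    """Parse the latest results.json and return the decoded JSON value."""
    path = latest_results_path(project_root)
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)
```

If no run has completed yet, opening the file raises FileNotFoundError, which a caller can catch to report that there are no results to read.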

6) Regenerate Report (Optional)

agentops report

Default input:

  • .agentops/results/latest/results.json

Evaluation Scenarios

Starter bundles created by agentops init:

Bundle                           Evaluators                                   Typical use
model_direct_baseline (default)  SimilarityEvaluator + avg_latency_seconds    Model-direct QA checks
rag_retrieval_baseline           GroundednessEvaluator + avg_latency_seconds  RAG groundedness checks
agent_tools_baseline             SimilarityEvaluator + avg_latency_seconds    Agent-with-tools baseline (placeholder)

datasets/ stores YAML dataset definitions. data/ stores JSONL rows referenced by dataset definitions.
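
Since the rows live in JSONL files, a quick syntax check before a run can catch malformed lines early. This is a sketch only: read_jsonl is a hypothetical helper that verifies each line is valid JSON and makes no assumption about which fields a row must contain.

```python
import json
from pathlib import Path


def read_jsonl(path) -> list:
    """Parse a JSONL file into a list of rows, skipping blank lines."""
    rows = []
    with Path(path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.strip():
                continue
            try:
                rows.append(json.loads(line))
            except json.JSONDecodeError as exc:
                # Report the offending line so it is easy to locate.
                raise ValueError(
                    f"{path}: line {line_number} is not valid JSON: {exc.msg}"
                ) from exc
    return rows
```

For example, read_jsonl(".agentops/data/smoke-rag.jsonl") returns the parsed rows of the RAG smoke dataset, or raises ValueError naming the first bad line.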

Commands

Command Line Reference

Command                                    Description                                         Status
agentops --version                         Show installed version
agentops init [--path DIR]                 Scaffold project workspace and starter files
agentops eval run                          Evaluate a dataset against a bundle
agentops eval compare --runs ID1,ID2       Compare two past runs
agentops run list|show                     List or inspect past runs                           🚧
agentops run view <id> [--entry N]         Deep run inspection                                 🚧
agentops report                            Regenerate report.md from results.json
agentops report show|export                View/export reports                                 🚧
agentops bundle list|show                  Browse bundle catalog                               🚧
agentops dataset validate|describe|import  Dataset utilities                                   🚧
agentops config cicd                       Generate GitHub Actions workflow for CI evaluation
agentops config validate|show              Config validation and inspection                    🚧
agentops trace init                        Tracing setup                                       🚧
agentops monitor setup|dashboard|alert     Monitoring operations                               🚧
agentops model list                        List Foundry model deployments                      🚧
agentops agent list                        List Foundry agents                                 🚧

Implemented command usage:

agentops --version
agentops init [--path <dir>]
agentops eval run [--config <path>] [--output <dir>]
agentops report [--in <results.json>] [--out <report.md>]
agentops config cicd [--force] [--dir <path>]

For planned commands, the CLI returns a friendly message indicating the command is planned but not implemented in this release.

Project Structure

High-level code layout:

  • src/agentops/cli/: command entrypoints (Typer)
  • src/agentops/services/: orchestration workflows
  • src/agentops/backends/: execution engines (foundry, subprocess)
  • src/agentops/core/: schemas, thresholds, and report generation
  • src/agentops/templates/: starter workspace assets
  • tests/unit/ and tests/integration/: automated tests

Documentation

GitHub Copilot Skills

AgentOps publishes Copilot skills that teach GitHub Copilot how to use the evaluation CLI correctly. Install them from this repository to get AI-assisted guidance for running evaluations, investigating regressions, and working through triage workflows.

Available Skills

Skill                            Description
agentops-run-evals               Guides evaluation workflow: init, run, report, compare
agentops-investigate-regression  Regression investigation: metric deltas, threshold flips, actionable checks
agentops-observability-triage    Observability and triage: current capabilities vs planned features

Installation

Skills are distributed from this GitHub repository. Install them in VS Code:

  1. Open VS Code with GitHub Copilot Chat enabled.
  2. Use the Copilot skill install command and point to this repository:
    • Source: Azure/agentops
    • Skills are located under .github/plugins/agentops/skills/
  3. Once installed, Copilot will automatically use the skills when you ask about AgentOps evaluation, regressions, or observability.

Alternatively, you can copy the skill files manually:

# Copy skills to your user-level skills directory
cp -r .github/plugins/agentops/skills/* ~/.agents/skills/

For Repository Contributors

If you're working inside this repo, the skills under .github/skills/ are automatically available to Copilot when the repository is your active workspace.

Contributing

See CONTRIBUTING.md for architecture rules, testing expectations, and contribution workflow.
