AgentOps CLI for standardized evaluation workflows

AgentOps Toolkit

AgentOps CLI for evaluation, observability, and operational workflows for Microsoft Foundry Agents and Models.

Status: Preview | Python 3.11+ | CLI | Built on Microsoft Foundry

Overview

AgentOps Toolkit is a CLI built on Microsoft Foundry that standardizes evaluation and operational workflows for AI agents and models, helping teams run, monitor, and automate AgentOps processes.

The project enables:

Consistent local and CI execution of agent evaluations
Reusable evaluation policies through bundles
Operational observability through tracing, monitoring, and run inspection
Stable machine-readable outputs for automation
Human-readable reports for PR reviews and quality gates

Operational capabilities include:

Standardized evaluation workflows
Run history and result inspection
Tracing and observability
Monitoring (dashboards and alerts)
CI/CD automation
Operational reporting and analysis

Core outputs:

results.json (machine-readable)
report.md (human-readable)

Exit code contract:

0: execution succeeded and all thresholds passed
2: execution succeeded but one or more thresholds failed
1: runtime or configuration error
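
The exit code contract can be consumed from a script or CI step. The following is a minimal sketch, assuming the `agentops` executable is on PATH; `describe_exit_code` and `run_eval` are hypothetical helper names for illustration and are not part of the toolkit.

```python
import subprocess

# Meanings copied from the exit code contract above.
EXIT_MEANINGS = {
    0: "passed: execution succeeded and all thresholds passed",
    2: "thresholds failed: execution succeeded but one or more thresholds failed",
    1: "error: runtime or configuration error",
}


def describe_exit_code(code: int) -> str:
    """Translate an agentops exit code into a readable message."""
    return EXIT_MEANINGS.get(code, f"unknown exit code {code}")


def run_eval() -> int:
    """Run `agentops eval run` and return its exit code (hypothetical helper)."""
    # check=False: a non-zero exit is an expected outcome, not an exception.
    completed = subprocess.run(["agentops", "eval", "run"], check=False)
    return completed.returncode
```

Keeping code 2 distinct from code 1 lets a pipeline treat a failed quality gate differently from a broken run, for example by posting the report on 2 but alerting on 1.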

Quickstart

This section is structured for demos and onboarding, so you can present the project flow end-to-end in a few minutes.

1) Install

python -m venv .venv
# activate your venv in the current shell
python -m pip install -U pip
python -m pip install agentops-toolkit

2) Initialize Workspace

agentops init

Generated structure:

.agentops/
├── config.yaml
├── run.yaml
├── run-rag.yaml
├── run-agent.yaml
├── .gitignore
├── bundles/
│   ├── model_direct_baseline.yaml
│   ├── rag_retrieval_baseline.yaml
│   └── agent_tools_baseline.yaml
├── datasets/
│   ├── smoke-model-direct.yaml
│   ├── smoke-rag.yaml
│   └── smoke-agent-tools.yaml
├── data/
│   ├── smoke-model-direct.jsonl
│   ├── smoke-rag.jsonl
│   └── smoke-agent-tools.jsonl
└── results/
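
To confirm that initialization produced the files listed above, a small check like the following could be used. This is a sketch only: `EXPECTED_FILES` is transcribed from the tree above, and `missing_workspace_files` is a hypothetical helper, not something the toolkit provides.

```python
from pathlib import Path

# Relative paths transcribed from the generated structure shown above.
EXPECTED_FILES = [
    "config.yaml",
    "run.yaml",
    "run-rag.yaml",
    "run-agent.yaml",
    ".gitignore",
    "bundles/model_direct_baseline.yaml",
    "bundles/rag_retrieval_baseline.yaml",
    "bundles/agent_tools_baseline.yaml",
    "datasets/smoke-model-direct.yaml",
    "datasets/smoke-rag.yaml",
    "datasets/smoke-agent-tools.yaml",
    "data/smoke-model-direct.jsonl",
    "data/smoke-rag.jsonl",
    "data/smoke-agent-tools.jsonl",
]


def missing_workspace_files(workspace: Path) -> list[str]:
    """Return the expected files that are absent under the .agentops directory."""
    return [rel for rel in EXPECTED_FILES if not (workspace / rel).is_file()]
```

Calling `missing_workspace_files(Path(".agentops"))` returns an empty list when every starter file is present. The `results/` directory is not checked here.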

3) Configure Foundry Endpoint

PowerShell:

$env:AZURE_AI_FOUNDRY_PROJECT_ENDPOINT = "https://<resource>.services.ai.azure.com/api/projects/<project>"

Bash/zsh:

export AZURE_AI_FOUNDRY_PROJECT_ENDPOINT="https://<resource>.services.ai.azure.com/api/projects/<project>"

Authentication uses DefaultAzureCredential:

local: az login
CI/CD: service principal env vars
Azure-hosted: managed identity

4) Choose Scenario Run Config

Starter run files created by agentops init:

.agentops/run.yaml (default model-direct)
.agentops/run-rag.yaml (agent + RAG baseline)
.agentops/run-agent.yaml (agent + tools baseline)

Important:

Replace placeholders (backend.model, backend.agent_id) with values that exist in your Foundry project.
There is no universal deployment name guaranteed across all Foundry projects/regions.

5) Run Evaluation

agentops eval run

Or run a specific scenario file:

agentops eval run --config .agentops/run-rag.yaml
agentops eval run --config .agentops/run-agent.yaml

Default behavior:

input config: .agentops/run.yaml
output location: timestamped folder under .agentops/results/
latest pointer: .agentops/results/latest/
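
Automation that runs after an evaluation can rely on the latest pointer instead of discovering the timestamped folder. The following sketch assumes the default locations listed above and makes no assumption about the schema of results.json; `latest_results_path` and `load_latest_results` are hypothetical helper names.

```python
import json
from pathlib import Path


def latest_results_path(project_root: Path = Path(".")) -> Path:
    """Build the default path to the most recent results.json."""
    return project_root / ".agentops" / "results" / "latest" / "results.json"


def load_latest_results(project_root: Path = Path(".")):
    """Load the latest results as parsed JSON, without interpreting its fields."""
    path = latest_results_path(project_root)
    if not path.is_file():
        raise FileNotFoundError(
            f"no results found at {path}; run `agentops eval run` first"
        )
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)
```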

6) Regenerate Report (Optional)

agentops report

Default input:

.agentops/results/latest/results.json

Evaluation Scenarios

Starter bundles created by agentops init:

| Bundle | Evaluators | Typical use |
| --- | --- | --- |
| `model_direct_baseline` (default) | `SimilarityEvaluator` + `avg_latency_seconds` | Model-direct QA checks |
| `rag_retrieval_baseline` | `GroundednessEvaluator` + `avg_latency_seconds` | RAG groundedness checks |
| `agent_tools_baseline` | `TaskCompletionEvaluator` + `ToolCallAccuracyEvaluator` + `avg_latency_seconds` | Agent-with-tools baseline |

datasets/ stores YAML dataset definitions. data/ stores JSONL rows referenced by dataset definitions.
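
Because the rows live in JSONL files (one JSON value per line), they are easy to inspect or sanity-check before a run. The sketch below is a generic JSONL reader, not the toolkit's own loader; it makes no assumption about which fields a row contains, and `read_jsonl` is a hypothetical helper name.

```python
import json
from pathlib import Path


def read_jsonl(path: Path) -> list:
    """Parse a JSONL file into a list of rows, skipping blank lines."""
    rows = []
    with path.open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue
            try:
                rows.append(json.loads(line))
            except json.JSONDecodeError as exc:
                # Report the offending line so a bad row is easy to locate.
                raise ValueError(
                    f"{path}:{line_number}: invalid JSON ({exc.msg})"
                ) from exc
    return rows
```

For example, `len(read_jsonl(Path(".agentops/data/smoke-rag.jsonl")))` gives the number of rows in the RAG smoke dataset.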

Commands

Command Line Reference

| Command | Description | Status |
| --- | --- | --- |
| `agentops --version` | Show installed version | ✅ |
| `agentops init [--dir DIR]` | Scaffold project workspace and starter files | ✅ |
| `agentops eval run` | Evaluate a dataset against a bundle | ✅ |
| `agentops eval compare --runs ID1,ID2` | Compare two past runs | ✅ |
| `agentops run list\|show` | List or inspect past runs | 🚧 |
| `agentops run view <id> [--entry N]` | Deep run inspection | 🚧 |
| `agentops report` | Regenerate `report.md` from `results.json` | ✅ |
| `agentops report show\|export` | View/export reports | 🚧 |
| `agentops bundle list\|show` | Browse bundle catalog | 🚧 |
| `agentops dataset validate\|describe\|import` | Dataset utilities | 🚧 |
| `agentops config cicd` | Generate GitHub Actions workflow for CI evaluation | ✅ |
| `agentops config validate\|show` | Config validation and inspection | 🚧 |
| `agentops trace init` | Tracing setup | 🚧 |
| `agentops monitor setup\|dashboard\|alert` | Monitoring operations | 🚧 |
| `agentops model list` | List Foundry model deployments | 🚧 |
| `agentops agent list` | List Foundry agents | 🚧 |

Implemented command usage:

agentops --version
agentops init [--dir <dir>]
agentops eval run [--config <path>] [--output <dir>] [--format md|html|all]
agentops eval compare --runs ID1,ID2 [--output <dir>] [--format md|html|all]
agentops report [--in <results.json>] [--out <report.md>] [--format md|html|all]
agentops config cicd [--force] [--dir <path>]

For planned commands, the CLI returns a friendly message indicating the command is planned but not implemented in this release.

Project Structure

High-level code layout:

src/agentops/cli/ command entrypoints (Typer)
src/agentops/services/ orchestration workflows
src/agentops/backends/ execution engines (foundry, subprocess)
src/agentops/core/ schemas, thresholds, and report generation
src/agentops/templates/ starter workspace assets
tests/unit/ and tests/integration/ automated tests

Documentation

Architecture and request flow: docs/how-it-works.md
Foundry agent tutorial: docs/tutorial-basic-foundry-agent.md
Model-direct tutorial: docs/tutorial-model-direct.md
RAG tutorial: docs/tutorial-rag.md
Baseline comparison tutorial: docs/tutorial-baseline-comparison.md
Copilot skills installation: docs/tutorial-copilot-skills.md
Built-in evaluator notes: docs/foundry-evaluation-sdk-built-in-evaluators.md
CI/CD setup guide: docs/ci-github-actions.md

GitHub Copilot Skills

AgentOps publishes Copilot skills that teach GitHub Copilot how to use the evaluation CLI correctly. Install them from this repository to get AI-assisted guidance for running evaluations, investigating regressions, and triaging observability issues.

Available Skills

| Skill | Description |
| --- | --- |
| `agentops-run-evals` | Guides the evaluation workflow: init, run, report, compare |
| `agentops-investigate-regression` | Regression investigation: metric deltas, threshold flips, actionable checks |
| `agentops-observability-triage` | Observability and triage: current capabilities vs planned features |

Installation

Skills are distributed from this GitHub repository. Install them in VS Code:

1. Open VS Code with GitHub Copilot Chat enabled.
2. Use the Copilot skill install command and point to this repository:
   - Source: Azure/agentops
   - Skills are located under plugins/agentops/skills/
3. Once installed, Copilot will automatically use the skills when you ask about AgentOps evaluation, regressions, or observability.

Alternatively, you can copy the skill files manually:

# Copy skills to your user-level skills directory
cp -r plugins/agentops/skills/* ~/.agents/skills/

For Repository Contributors

If you're working inside this repo, the skills under .github/skills/ are automatically available to Copilot when the repository is your active workspace.

Contributing

See CONTRIBUTING.md for architecture rules, testing expectations, and contribution workflow.

Project details

Release history

0.1.7 (Apr 15, 2026)
0.1.6 (Apr 15, 2026)
0.1.5 (Apr 14, 2026)
0.1.4 (Apr 14, 2026), this version
0.1.3 (Mar 25, 2026)
0.1.2 (Mar 9, 2026)
0.1.1 (Mar 9, 2026)

Download files

Source Distribution

agentops_toolkit-0.1.4.tar.gz (6.8 MB)

Uploaded Apr 14, 2026 Source

Built Distribution

agentops_toolkit-0.1.4-py3-none-any.whl (75.8 kB)

Uploaded Apr 14, 2026 Python 3

File details

Details for the file agentops_toolkit-0.1.4.tar.gz.

File metadata

Download URL: agentops_toolkit-0.1.4.tar.gz
Upload date: Apr 14, 2026
Size: 6.8 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for agentops_toolkit-0.1.4.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `68d7cc7250d62c05c3cc9eb89a47c4ca078670dfab804664a4f45bbed9ef112d` |
| MD5 | `07cc72ad81e64e54ce9cd0385ed0095c` |
| BLAKE2b-256 | `dcfe8e92bebef28b4c50f26ad608fe8b78d4d3702df7a9b18c520b7fac60849d` |

Provenance

The following attestation bundles were made for agentops_toolkit-0.1.4.tar.gz:

Publisher: release.yml on Azure/agentops

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: agentops_toolkit-0.1.4.tar.gz
- Subject digest: 68d7cc7250d62c05c3cc9eb89a47c4ca078670dfab804664a4f45bbed9ef112d
- Sigstore transparency entry: 1291216527
- Sigstore integration time: Apr 14, 2026
Source repository:
- Permalink: Azure/agentops@48cbab0952eee17f285062678bb3444e1a54868f
- Branch / Tag: refs/tags/v0.1.4
- Owner: https://github.com/Azure
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@48cbab0952eee17f285062678bb3444e1a54868f
- Trigger Event: push

File details

Details for the file agentops_toolkit-0.1.4-py3-none-any.whl.

File metadata

Download URL: agentops_toolkit-0.1.4-py3-none-any.whl
Upload date: Apr 14, 2026
Size: 75.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for agentops_toolkit-0.1.4-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `b3a4d1e6c0224aebcdbdb508f51e82fd984f4ec97a980f7ef8cc0afbdb6b3019` |
| MD5 | `f3a55a14529199995c576c244b7e5da0` |
| BLAKE2b-256 | `3580042df2375cf85b0762154a2c1da5b5cc2ad95c9fa6c61de8c17b67a718a8` |

Provenance

The following attestation bundles were made for agentops_toolkit-0.1.4-py3-none-any.whl:

Publisher: release.yml on Azure/agentops

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: agentops_toolkit-0.1.4-py3-none-any.whl
- Subject digest: b3a4d1e6c0224aebcdbdb508f51e82fd984f4ec97a980f7ef8cc0afbdb6b3019
- Sigstore transparency entry: 1291216613
- Sigstore integration time: Apr 14, 2026
Source repository:
- Permalink: Azure/agentops@48cbab0952eee17f285062678bb3444e1a54868f
- Branch / Tag: refs/tags/v0.1.4
- Owner: https://github.com/Azure
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@48cbab0952eee17f285062678bb3444e1a54868f
- Trigger Event: push
