AgentOps CLI for standardized evaluation workflows
AgentOps Toolkit
AgentOps CLI for evaluation, observability, and operational workflows for Microsoft Foundry Agents and Models.
Overview
AgentOps Toolkit is a CLI built on Microsoft Foundry that standardizes evaluation and operational workflows for AI agents and models, helping teams run, monitor, and automate AgentOps processes.
The project enables:
- Consistent local and CI execution of agent evaluations
- Reusable evaluation policies through bundles
- Operational observability through tracing, monitoring, and run inspection
- Stable machine-readable outputs for automation
- Human-readable reports for PR reviews and quality gates
Operational capabilities include:
- Standardized evaluation workflows
- Run history and result inspection
- Tracing and observability
- Monitoring (dashboards and alerts)
- CI/CD automation
- Operational reporting and analysis
Core outputs:
- results.json (machine-readable)
- report.md (human-readable)
Exit code contract:
- 0: execution succeeded and all thresholds passed
- 2: execution succeeded but one or more thresholds failed
- 1: runtime or configuration error
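A wrapper script can branch on the exit code contract above. The following is a minimal sketch and not part of the toolkit: the function name `describe_exit_code` and the label strings are hypothetical, chosen only for illustration.

```python
def describe_exit_code(code: int) -> str:
    """Map an AgentOps exit code to a short label (labels are illustrative)."""
    labels = {
        0: "passed",             # execution succeeded, all thresholds passed
        2: "thresholds_failed",  # execution succeeded, one or more thresholds failed
        1: "error",              # runtime or configuration error
    }
    # Codes outside the documented contract are reported as unknown.
    return labels.get(code, "unknown")


if __name__ == "__main__":
    for code in (0, 1, 2):
        print(code, describe_exit_code(code))
```

Keeping 2 distinct from 1 lets a pipeline treat a failed quality gate differently from a broken configuration.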
Quickstart
This section is structured for demos and onboarding, so you can present the project flow end-to-end in a few minutes.
1) Install
python -m venv .venv
# activate your venv in the current shell
python -m pip install -U pip
python -m pip install agentops-toolkit
2) Initialize Workspace
agentops init
Generated structure:
.agentops/
├── config.yaml
├── run.yaml
├── run-rag.yaml
├── run-agent.yaml
├── .gitignore
├── bundles/
│ ├── model_direct_baseline.yaml
│ ├── rag_retrieval_baseline.yaml
│ └── agent_tools_baseline.yaml
├── datasets/
│ ├── smoke-model-direct.yaml
│ ├── smoke-rag.yaml
│ └── smoke-agent-tools.yaml
├── data/
│ ├── smoke-model-direct.jsonl
│ ├── smoke-rag.jsonl
│ └── smoke-agent-tools.jsonl
└── results/
3) Configure Foundry Endpoint
PowerShell:
$env:AZURE_AI_FOUNDRY_PROJECT_ENDPOINT = "https://<resource>.services.ai.azure.com/api/projects/<project>"
Bash/zsh:
export AZURE_AI_FOUNDRY_PROJECT_ENDPOINT="https://<resource>.services.ai.azure.com/api/projects/<project>"
Authentication uses DefaultAzureCredential:
- local: az login
- CI/CD: service principal env vars
- Azure-hosted: managed identity
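Before running an evaluation, it can help to confirm that the endpoint variable is set and no longer contains the placeholder text. The sketch below uses only the standard library. The helper name `check_endpoint` is hypothetical, and the checks are assumptions inferred from the URL format shown above, not validation rules taken from the toolkit.

```python
import os
from urllib.parse import urlparse


def check_endpoint(value):
    """Return a list of problems found in the endpoint value (empty if none)."""
    if not value:
        return ["AZURE_AI_FOUNDRY_PROJECT_ENDPOINT is not set"]
    problems = []
    parsed = urlparse(value)
    if parsed.scheme != "https":
        problems.append("endpoint should use https")
    if "<" in value or ">" in value:
        problems.append("endpoint still contains <placeholder> text")
    # Assumption: project endpoints follow the /api/projects/<project> shape shown above.
    if "/api/projects/" not in parsed.path:
        problems.append("endpoint path does not contain /api/projects/")
    return problems


if __name__ == "__main__":
    for problem in check_endpoint(os.environ.get("AZURE_AI_FOUNDRY_PROJECT_ENDPOINT")):
        print("warning:", problem)
```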
4) Choose Scenario Run Config
Starter run files created by agentops init:
- .agentops/run.yaml (default model-direct)
- .agentops/run-rag.yaml (agent + RAG baseline)
- .agentops/run-agent.yaml (agent + tools baseline)
Important:
- Replace placeholders (backend.model, backend.agent_id) with values that exist in your Foundry project.
- There is no universal deployment name guaranteed across all Foundry projects/regions.
5) Run Evaluation
agentops eval run
Or run a specific scenario file:
agentops eval run --config .agentops/run-rag.yaml
agentops eval run --config .agentops/run-agent.yaml
Default behavior:
- input config: .agentops/run.yaml
- output location: timestamped folder under .agentops/results/
- latest pointer: .agentops/results/latest/
6) Regenerate Report (Optional)
agentops report
Default input:
.agentops/results/latest/results.json
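Automation that consumes the machine-readable output can read it from the default location. This is a minimal sketch under stated assumptions: the helper name `load_latest_results` is hypothetical, and because the results.json schema is not described here, the example only lists top-level keys rather than assuming any field names.

```python
import json
from pathlib import Path


def load_latest_results(workspace=".agentops"):
    """Load results.json from the latest-run folder, or return None if absent."""
    path = Path(workspace) / "results" / "latest" / "results.json"
    if not path.is_file():
        return None
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


if __name__ == "__main__":
    results = load_latest_results()
    if results is None:
        print("no results found; run `agentops eval run` first")
    elif isinstance(results, dict):
        # The schema is not documented here, so only top-level keys are listed.
        print(sorted(results))
    else:
        print(type(results).__name__)
```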
Evaluation Scenarios
Starter bundles created by agentops init:
| Bundle | Evaluators | Typical use |
|---|---|---|
| model_direct_baseline (default) | SimilarityEvaluator + avg_latency_seconds | Model-direct QA checks |
| rag_retrieval_baseline | GroundednessEvaluator + avg_latency_seconds | RAG groundedness checks |
| agent_tools_baseline | SimilarityEvaluator + avg_latency_seconds | Agent-with-tools baseline (placeholder) |
datasets/ stores YAML dataset definitions.
data/ stores JSONL rows referenced by dataset definitions.
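Since the rows live in JSONL files (one JSON object per line), a quick syntax check before a run can catch malformed lines early. The sketch below is generic JSONL parsing, not the toolkit's own loader. The function name `read_jsonl` is hypothetical, and no row fields are assumed because the row schema is not described here.

```python
import json


def read_jsonl(path):
    """Return the non-empty lines of a JSONL file as parsed objects."""
    rows = []
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                rows.append(json.loads(line))
            except json.JSONDecodeError as exc:
                # Report the file and line so the bad row is easy to find.
                raise ValueError(f"{path}:{line_number}: invalid JSON ({exc.msg})") from exc
    return rows
```

For example, `read_jsonl(".agentops/data/smoke-model-direct.jsonl")` would return the starter rows as a list once `agentops init` has created the file.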
Commands
Command Line Reference
| Command | Description | Status |
|---|---|---|
| agentops --version | Show installed version | ✅ |
| agentops init [--path DIR] | Scaffold project workspace and starter files | ✅ |
| agentops eval run | Evaluate a dataset against a bundle | ✅ |
| agentops eval compare --runs ID1,ID2 | Compare two past runs | ✅ |
| agentops run list\|show | List or inspect past runs | 🚧 |
| agentops run view <id> [--entry N] | Deep run inspection | 🚧 |
| agentops report | Regenerate report.md from results.json | ✅ |
| agentops report show\|export | View/export reports | 🚧 |
| agentops bundle list\|show | Browse bundle catalog | 🚧 |
| agentops dataset validate\|describe\|import | Dataset utilities | 🚧 |
| agentops config cicd | Generate GitHub Actions workflow for CI evaluation | ✅ |
| agentops config validate\|show | Config validation and inspection | 🚧 |
| agentops trace init | Tracing setup | 🚧 |
| agentops monitor setup\|dashboard\|alert | Monitoring operations | 🚧 |
| agentops model list | List Foundry model deployments | 🚧 |
| agentops agent list | List Foundry agents | 🚧 |
Implemented command usage:
agentops --version
agentops init [--path <dir>]
agentops eval run [--config <path>] [--output <dir>]
agentops report [--in <results.json>] [--out <report.md>]
agentops config cicd [--force] [--dir <path>]
For planned commands, the CLI returns a friendly message indicating the command is planned but not implemented in this release.
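The `agentops eval run` usage listed above can also be driven from a Python script, for example in a custom CI step. This is a minimal sketch that only uses the `--config` and `--output` options shown in the usage lines. The helper names `build_eval_command` and `run_eval` are hypothetical, and the sketch assumes the `agentops` executable is on PATH.

```python
import subprocess


def build_eval_command(config=None, output=None):
    """Build the argument list for `agentops eval run` from optional paths."""
    command = ["agentops", "eval", "run"]
    if config:
        command += ["--config", str(config)]
    if output:
        command += ["--output", str(output)]
    return command


def run_eval(config=None, output=None):
    """Run the evaluation and return the process exit code."""
    # check=False: a non-zero exit code (such as 2 for failed thresholds)
    # is returned to the caller instead of raising an exception.
    completed = subprocess.run(build_eval_command(config, output), check=False)
    return completed.returncode
```

Passing the arguments as a list avoids shell quoting issues when paths contain spaces.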
Project Structure
High-level code layout:
- src/agentops/cli/: command entrypoints (Typer)
- src/agentops/services/: orchestration workflows
- src/agentops/backends/: execution engines (foundry, subprocess)
- src/agentops/core/: schemas, thresholds, and report generation
- src/agentops/templates/: starter workspace assets
- tests/unit/ and tests/integration/: automated tests
Documentation
- Architecture and request flow: docs/how-it-works.md
- Foundry agent tutorial: docs/tutorial-basic-foundry-agent.md
- Model-direct tutorial: docs/tutorial-model-direct.md
- RAG tutorial: docs/tutorial-rag.md
- Baseline comparison tutorial: docs/tutorial-baseline-comparison.md
- Copilot skills installation: docs/tutorial-copilot-skills.md
- Built-in evaluator notes: docs/foundry-evaluation-sdk-built-in-evaluators.md
- CI/CD setup guide: docs/ci-github-actions.md
GitHub Copilot Skills
AgentOps publishes Copilot skills that teach GitHub Copilot how to use the evaluation CLI correctly. Install them from this repository to get AI-assisted guidance for running evaluations, investigating regressions, and observability triage.
Available Skills
| Skill | Description |
|---|---|
| agentops-run-evals | Guides evaluation workflow: init, run, report, compare |
| agentops-investigate-regression | Regression investigation: metric deltas, threshold flips, actionable checks |
| agentops-observability-triage | Observability and triage: current capabilities vs planned features |
Installation
Skills are distributed from this GitHub repository. Install them in VS Code:
- Open VS Code with GitHub Copilot Chat enabled.
- Use the Copilot skill install command and point to this repository:
  - Source: Azure/agentops
  - Skills are located under .github/plugins/agentops/skills/
- Once installed, Copilot will automatically use the skills when you ask about AgentOps evaluation, regressions, or observability.
Alternatively, you can copy the skill files manually:
# Copy skills to your user-level skills directory
cp -r .github/plugins/agentops/skills/* ~/.agents/skills/
For Repository Contributors
If you're working inside this repo, the skills under .github/skills/ are automatically available to Copilot when the repository is your active workspace.
Contributing
See CONTRIBUTING.md for architecture rules, testing expectations, and contribution workflow.