# CortexOps

Reliability infrastructure for AI agents: evaluation, observability, and regression testing.

Evaluate · Observe · Operate — for LangGraph, CrewAI, and AutoGen.
## The problem
You deployed an agent. You have no idea if it regressed overnight.
No standard eval format. No failure traces. No CI gate before the next prompt change ships.
CortexOps fixes that.
## Install

```shell
pip install cortexops

# With HTTP client (for pushing traces to the hosted API):
pip install "cortexops[http]"

# With LLM judge support:
pip install "cortexops[llm]"
```
## Quickstart

```python
from cortexops import CortexTracer, EvalSuite

# Wrap your LangGraph app — zero refactor required
tracer = CortexTracer(project="payments-agent")
graph = tracer.wrap(your_langgraph_app)

# Run evaluations against a golden dataset
results = EvalSuite.run(
    dataset="golden_v1.yaml",
    agent=graph,
)
print(results.summary())
```
## Golden dataset (YAML)

```yaml
version: 1
project: payments-agent
cases:
  - id: refund_lookup_01
    input: "What is the status of refund REF-8821?"
    expected_tool_calls: [lookup_refund]
    expected_output_contains: ["approved", "REF-8821"]
    max_latency_ms: 3000

  - id: open_ended_explanation_01
    input: "Why was my refund rejected?"
    judge: llm
    judge_criteria: >
      The response must explain the rejection reason clearly,
      be empathetic, and offer a concrete next step. No jargon.
```
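The deterministic fields above map to simple pass/fail checks. As an illustrative sketch (not CortexOps internals; the `check_case` helper is hypothetical), a case like `refund_lookup_01` could be validated roughly like this:

```python
# Illustrative check for a deterministic golden case (hypothetical helper,
# not the library's actual implementation).
def check_case(case: dict, output: str, tool_calls: list[str], latency_ms: float) -> dict:
    """Return a pass/fail result for each deterministic assertion in a case."""
    results = {}
    if "expected_tool_calls" in case:
        # Every expected tool must have been called at least once.
        results["tool_accuracy"] = all(t in tool_calls for t in case["expected_tool_calls"])
    if "expected_output_contains" in case:
        # Substring match, case-insensitive to reduce brittleness.
        results["task_completion"] = all(
            s.lower() in output.lower() for s in case["expected_output_contains"]
        )
    if "max_latency_ms" in case:
        results["latency"] = latency_ms <= case["max_latency_ms"]
    return results

case = {
    "expected_tool_calls": ["lookup_refund"],
    "expected_output_contains": ["approved", "REF-8821"],
    "max_latency_ms": 3000,
}
print(check_case(case, "Refund REF-8821 was approved.", ["lookup_refund"], 840.0))
```

LLM-judged cases like `open_ended_explanation_01` skip these deterministic checks and are scored against `judge_criteria` instead.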
## CI gate

```shell
cortexops eval run \
  --dataset golden_v1.yaml \
  --fail-on "task_completion < 0.90"
```

The command exits non-zero if the threshold is not met, blocking the PR.
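In GitHub Actions, for example, the gate could be wired into a pull-request workflow like the following (the workflow name, Python version, and file paths are illustrative, not prescribed by CortexOps):

```yaml
# Illustrative GitHub Actions job: adapt names, paths, and versions to your repo.
name: agent-evals
on: [pull_request]

jobs:
  eval:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install cortexops
      - run: |
          cortexops eval run \
            --dataset golden_v1.yaml \
            --fail-on "task_completion < 0.90"
```

Because the CLI exits non-zero on a failed threshold, the job fails and the PR is blocked without any extra scripting.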
## Built-in metrics

| Metric | What it checks |
|---|---|
| `task_completion` | Non-empty, non-error output with expected content |
| `tool_accuracy` | Expected tool calls were actually made |
| `latency` | Response within the `max_latency_ms` budget |
| `hallucination` | Fabrication signals in the output |
| `llm_judge` | GPT-4o scores the response against natural-language criteria |
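A `--fail-on` expression states the failure condition: the gate fails the build when the aggregate score satisfies it. A minimal sketch of how such an expression could be parsed and applied (purely illustrative; the CLI's actual grammar may differ):

```python
import operator
import re

# Illustrative parser for gate expressions like "task_completion < 0.90".
OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge}

def gate_fails(expression: str, scores: dict[str, float]) -> bool:
    """Return True if the gate should fail the build."""
    m = re.fullmatch(r"\s*(\w+)\s*(<=|>=|<|>)\s*([\d.]+)\s*", expression)
    if not m:
        raise ValueError(f"unparseable gate expression: {expression!r}")
    metric, op, threshold = m.group(1), m.group(2), float(m.group(3))
    # The expression encodes the FAILURE condition: fail when it holds.
    return OPS[op](scores[metric], threshold)

print(gate_fails("task_completion < 0.90", {"task_completion": 0.86}))  # True: block the PR
print(gate_fails("task_completion < 0.90", {"task_completion": 0.93}))  # False: gate passes
```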
## Links
- Docs: docs.cortexops.ai
- Repo: github.com/ashishodu2023/cortexops
- Issues: GitHub Issues
## File details

Details for the file cortexops-0.2.0.tar.gz.

- Download URL: cortexops-0.2.0.tar.gz
- Upload date:
- Size: 23.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes:

| Algorithm | Hash digest |
|---|---|
| SHA256 | `ee463d93e4bb432d3e6e805acf1ddb0bd12aaba422815d76e57ce5ff817d4402` |
| MD5 | `d4f00959bd5fc1c30942eee828643d70` |
| BLAKE2b-256 | `e720769d457ffef6227c357e6b8d6397f3015dc91e36d413dd61085e88f0fa6e` |
## File details

Details for the file cortexops-0.2.0-py3-none-any.whl.

- Download URL: cortexops-0.2.0-py3-none-any.whl
- Upload date:
- Size: 25.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes:

| Algorithm | Hash digest |
|---|---|
| SHA256 | `84649a6b4d358988fee59fb7ce5132763eb2fc354a49fc02ddec04c997908cb0` |
| MD5 | `d524dc9ded8585e1fe3e7af6fedfe358` |
| BLAKE2b-256 | `ed87ec69a5024e4fdd43668b3be2e21341de692269039b117f957e1428cdbe21` |