CortexOps

Reliability infrastructure for AI agents.
Evaluate · Observe · Operate — for LangGraph, CrewAI, and AutoGen.

PyPI version · Python 3.10+ · CI · License: MIT


The problem

You deployed an agent. You have no idea if it regressed overnight.

No standard eval format. No failure traces. No CI gate before the next prompt change ships.
CortexOps fixes that.


Install

pip install cortexops

# With HTTP client (for pushing traces to hosted API):
pip install cortexops[http]

# With LLM judge support:
pip install cortexops[llm]

Quickstart

from cortexops import CortexTracer, EvalSuite

# Wrap your LangGraph app — zero refactor required
tracer = CortexTracer(project="payments-agent")
graph  = tracer.wrap(your_langgraph_app)

# Run evaluations against a golden dataset
results = EvalSuite.run(
    dataset="golden_v1.yaml",
    agent=graph,
)
print(results.summary())
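The zero-refactor wrap pattern above can be illustrated with a minimal stand-alone sketch. This is not the actual CortexTracer implementation (its internals are an assumption here); it only shows the general idea of recording inputs, outputs, and latency around an unmodified callable:

```python
import functools
import time

class MiniTracer:
    """Toy tracer illustrating the wrap pattern: record each call's
    input, output, and latency without modifying the wrapped callable."""

    def __init__(self, project: str):
        self.project = project
        self.spans = []

    def wrap(self, fn):
        @functools.wraps(fn)
        def traced(*args, **kwargs):
            start = time.perf_counter()
            result = fn(*args, **kwargs)
            # One span per call: enough to replay failures or check budgets.
            self.spans.append({
                "project": self.project,
                "input": args,
                "output": result,
                "latency_ms": (time.perf_counter() - start) * 1000,
            })
            return result
        return traced

tracer = MiniTracer(project="payments-agent")
agent = tracer.wrap(lambda q: f"echo: {q}")  # stand-in for a real agent
agent("What is the status of refund REF-8821?")
print(len(tracer.spans))  # 1
```

The same shape works for any callable agent entry point, which is why wrapping needs no changes to the graph itself.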

Golden dataset (YAML)

version: 1
project: payments-agent

cases:
  - id: refund_lookup_01
    input: "What is the status of refund REF-8821?"
    expected_tool_calls: [lookup_refund]
    expected_output_contains: ["approved", "REF-8821"]
    max_latency_ms: 3000

  - id: open_ended_explanation_01
    input: "Why was my refund rejected?"
    judge: llm
    judge_criteria: >
      The response must explain the rejection reason clearly,
      be empathetic, and offer a concrete next step. No jargon.
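Once parsed, a case like the ones above is just a dict of fields. The following sketch validates such dicts using only the field names shown in this example; the real cortexops schema and loader may differ, so treat the rules below as assumptions:

```python
# Field names taken from the golden-dataset example above; the actual
# cortexops schema is not guaranteed to match this sketch.
REQUIRED = {"id", "input"}
CHECKS = {"expected_tool_calls", "expected_output_contains",
          "max_latency_ms", "judge"}

def validate_case(case: dict) -> list[str]:
    """Return a list of problems; an empty list means the case looks valid."""
    problems = [f"missing field: {f}" for f in REQUIRED - case.keys()]
    if not CHECKS & case.keys():
        problems.append("case has no check: add an expectation or a judge")
    if case.get("judge") == "llm" and "judge_criteria" not in case:
        problems.append("judge: llm requires judge_criteria")
    return problems

case = {
    "id": "refund_lookup_01",
    "input": "What is the status of refund REF-8821?",
    "expected_tool_calls": ["lookup_refund"],
    "max_latency_ms": 3000,
}
print(validate_case(case))  # []
```

Running a validator like this before an eval run catches malformed cases early, instead of mid-suite.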

CI gate

cortexops eval run \
  --dataset golden_v1.yaml \
  --fail-on "task_completion < 0.90"

Exits non-zero if the threshold is not met, so the PR is blocked.
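The gate logic behind a --fail-on expression can be sketched in a few lines. The parser below is an assumption about the expression grammar (metric, comparison operator, numeric threshold), not the CLI's actual implementation:

```python
import operator
import re

# Comparison operators a "--fail-on" style expression might support.
OPS = {"<": operator.lt, "<=": operator.le,
       ">": operator.gt, ">=": operator.ge, "==": operator.eq}

def should_fail(expr: str, scores: dict) -> bool:
    """Evaluate an expression like 'task_completion < 0.90' against scores.

    Returns True when the fail condition holds (i.e. CI should exit non-zero).
    """
    m = re.fullmatch(r"\s*(\w+)\s*(<=|>=|==|<|>)\s*([\d.]+)\s*", expr)
    if not m:
        raise ValueError(f"bad fail-on expression: {expr!r}")
    metric, op, threshold = m.group(1), m.group(2), float(m.group(3))
    return OPS[op](scores[metric], threshold)

scores = {"task_completion": 0.87}
if should_fail("task_completion < 0.90", scores):
    print("eval failed: task_completion below threshold")
    # in a real CI gate: sys.exit(1)
```

Because CI systems key off the process exit code, exiting non-zero here is all that is needed to block the merge.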


Built-in metrics

Metric            What it checks
----------------  -------------------------------------------------
task_completion   Non-empty, non-error output with expected content
tool_accuracy     Expected tool calls were actually made
latency           Response within the max_latency_ms budget
hallucination     Fabrication signals in the output
llm_judge         GPT-4o scores against natural-language criteria
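As a rough illustration of what the deterministic checks compute (not the library's actual metric code, which is an assumption here), the first three rows reduce to simple functions:

```python
def task_completion(output: str, must_contain: list[str]) -> float:
    """1.0 if the output is non-empty and contains every expected substring."""
    if not output:
        return 0.0
    return float(all(s in output for s in must_contain))

def tool_accuracy(expected: list[str], actual: list[str]) -> float:
    """Fraction of expected tool calls that were actually made."""
    if not expected:
        return 1.0
    return sum(t in actual for t in expected) / len(expected)

def latency_ok(latency_ms: float, budget_ms: float) -> bool:
    """True when the response landed within the per-case latency budget."""
    return latency_ms <= budget_ms

print(task_completion("Refund REF-8821 was approved.", ["approved", "REF-8821"]))  # 1.0
print(tool_accuracy(["lookup_refund"], ["lookup_refund", "send_email"]))           # 1.0
```

Per-case scores like these are then averaged across the dataset, which is the number a threshold such as `task_completion < 0.90` compares against.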
