Continuous Integration for AI Agents. Catch cost spikes and logic regressions before production.
AgentCI
Pytest-native regression testing for AI agents. Catch routing changes, tool call drift, and cost spikes before production.
You changed a prompt. Your agent broke in production. Three days later, a user complained. You had no tests, no diff, no idea what went wrong.
Works with OpenAI, Anthropic, and LangGraph. Runs inside pytest.
Add to Your Project
pip install ciagent
Write your golden queries — what should your agent handle, and what should it refuse?
# agentci_spec.yaml
# runner: any function that takes a query string and returns a response
runner: my_app.agent:run_for_agentci
queries:
  - query: "How do I install AgentCI?"
    correctness:
      any_expected_in_answer: ["pip install", "ciagent"]
    path:
      expected_tools: [retrieve_docs]
    cost:
      max_llm_calls: 8
  - query: "What's the CEO's favorite restaurant?"
    correctness:
      not_in_answer: ["restaurant", "favorite"]
    path:
      expected_tools: []  # expect no tools called for out-of-scope queries
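The `runner` entry in the spec points at a function that takes a query string and returns a response. A minimal sketch of such a function follows, assuming a plain string is an acceptable return value (the spec comment only says "a response"). The `_fake_agent` helper and its canned replies are hypothetical stand-ins for a real OpenAI, Anthropic, or LangGraph call.

```python
# my_app/agent.py (hypothetical module matching the `runner:` path in the spec)


def _fake_agent(query: str) -> str:
    """Stand-in for a real agent call; replace with your own agent."""
    if "install" in query.lower():
        return "Run `pip install ciagent` to install AgentCI."
    # Out-of-scope queries get a refusal that avoids echoing the question.
    return "Sorry, that is outside what I can help with."


def run_for_agentci(query: str) -> str:
    """Entry point named in agentci_spec.yaml: query string in, response out."""
    return _fake_agent(query)


if __name__ == "__main__":
    print(run_for_agentci("How do I install AgentCI?"))
```

Keeping the runner as a thin wrapper means the agent itself needs no test-specific code.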
Run:
agentci test --mock # start here: zero-cost with synthetic traces
agentci test # run live against your real agent
agentci test evaluates each query through 3 layers — correctness, path, and cost:
============================================================
Query: How do I install AgentCI?
Answer: To install AgentCI, you can use pip with the following command:
pip install ciagent. Make sure you have Python 3.10 or later.
✅ CORRECTNESS: PASS
✓ Found keywords: "pip install ciagent"
✓ LLM judge passed (score: 5 ≥ 0.6)
📈 PATH: PASS
✓ Tool recall: 1.000 (expected: [retrieve_docs])
✓ Tool precision: 0.500
✓ No loops detected
💰 COST: PASS
✓ LLM calls: 8 ≤ max 8
============================================================
Query: What Python version does AgentCI require and what frameworks does it support?
Answer: AgentCI currently does not specify a required Python version
in the provided context, so I don't have that information...
❌ CORRECTNESS: FAIL
• Expected '3.10' not found in answer
📈 PATH: PASS
✓ Tool recall: 1.000 (expected: [retrieve_docs])
✓ Loops: 1 ≤ max 3
💰 COST: PASS
✓ LLM calls: 4 ≤ max 5
============================================================
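To make the three layers concrete, here is a rough sketch of how checks like the ones in the report could be computed. It is not AgentCI's implementation: the `Trace` class, the function names, the case-insensitive matching, and the handling of empty tool lists are all assumptions, and the `web_search` tool is made up to produce a precision of 0.5.

```python
from dataclasses import dataclass, field


@dataclass
class Trace:
    """Hypothetical record of one agent run; not AgentCI's own trace type."""
    answer: str
    tools_called: list[str] = field(default_factory=list)
    llm_calls: int = 0


def check_correctness(answer, any_expected=(), not_in=()):
    """Pass if at least one expected keyword appears and no forbidden one does."""
    text = answer.lower()  # assumption: matching is case-insensitive
    found_ok = not any_expected or any(k.lower() in text for k in any_expected)
    forbidden_ok = all(k.lower() not in text for k in not_in)
    return found_ok and forbidden_ok


def tool_recall_precision(called, expected):
    """Recall: share of expected tools that were called.
    Precision: share of called tools that were expected."""
    called_set, expected_set = set(called), set(expected)
    hits = called_set & expected_set
    # Assumption: an empty denominator counts as a perfect score.
    recall = len(hits) / len(expected_set) if expected_set else 1.0
    precision = len(hits) / len(called_set) if called_set else 1.0
    return recall, precision


def check_cost(llm_calls, max_llm_calls):
    """Pass if the run stayed within its LLM call budget."""
    return llm_calls <= max_llm_calls


trace = Trace(
    answer="To install AgentCI, run pip install ciagent.",
    tools_called=["retrieve_docs", "web_search"],  # web_search is hypothetical
    llm_calls=8,
)
recall, precision = tool_recall_precision(trace.tools_called, ["retrieve_docs"])
print(check_correctness(trace.answer, any_expected=["pip install", "ciagent"]))  # True
print(recall, precision)  # 1.0 0.5
print(check_cost(trace.llm_calls, max_llm_calls=8))  # True
```

In this sketch, one expected tool plus one extra tool gives recall 1.0 and precision 0.5, the same values as the first report above.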
Don't have golden queries yet? agentci init --generate scans your code and generates a starter spec.
Demo
In this RAG agent demo, someone "optimizes for latency" by reducing the number of retrieved documents from 8 to 1, and AgentCI catches the resulting correctness regression.
CLI
agentci init --generate # Scan project, generate test spec
agentci init # Generate GitHub Actions workflow + pre-push hook
agentci test --mock --yes # Zero-cost synthetic traces, CI-friendly (no keys, no prompts)
agentci test # Run 3-layer evaluation (correctness → path → cost)
agentci test --format html -o report.html # HTML report with per-query details
agentci calibrate # Measure real agent metrics, auto-tune spec budgets
agentci doctor # Health check: spec, deps, API keys
agentci record <test> # Record golden baseline
agentci diff # Diff against baseline
agentci report -i results.json # Generate HTML report from JSON results
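The `record` and `diff` commands compare a run against a stored golden baseline. The idea can be illustrated with a small sketch that diffs two tool-call sequences using the standard library. This is only a conceptual illustration: the `diff_tool_paths` function and the tool names are hypothetical, and AgentCI's actual baseline format and diff output are not shown in this document.

```python
import difflib


def diff_tool_paths(baseline: list[str], current: list[str]) -> list[str]:
    """Return unified-diff lines between two tool-call sequences.

    An empty list means the current run took the same path as the baseline.
    """
    return list(
        difflib.unified_diff(
            baseline, current, fromfile="baseline", tofile="current", lineterm=""
        )
    )


# Hypothetical tool sequences: the current run silently dropped a step.
baseline_path = ["retrieve_docs", "rerank", "answer"]
current_path = ["retrieve_docs", "answer"]

for line in diff_tool_paths(baseline_path, current_path):
    print(line)
```

A dropped or reordered tool call shows up as a `-` or `+` line, which is the kind of drift a baseline comparison is meant to surface.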
Contributing
GitHub Issues
DemoAgents: working examples for all three frameworks
Apache 2.0. If you build an agent and test it with AgentCI, I'd love to hear about it.