Continuous Integration for AI Agents. Catch cost spikes and logic regressions before production.

AgentCI

Pytest-native regression testing for AI agents. Catch routing changes, tool call drift, and cost spikes before production.

You changed a prompt. Your agent broke in production. Three days later, a user complained. You had no tests, no diff, no idea what went wrong.

Works with OpenAI, Anthropic, and LangGraph. Runs inside pytest.

Add to Your Project

pip install ciagent

Write your golden queries — what should your agent handle, and what should it refuse?

# agentci_spec.yaml
# runner: any function that takes a query string and returns a response
runner: my_app.agent:run_for_agentci
queries:
  - query: "How do I install AgentCI?"
    correctness:
      any_expected_in_answer: ["pip install", "ciagent"]
    path:
      expected_tools: [retrieve_docs]
    cost:
      max_llm_calls: 8

  - query: "What's the CEO's favorite restaurant?"
    correctness:
      not_in_answer: ["restaurant", "favorite"]
    path:
      expected_tools: []  # expect no tools called for out-of-scope queries
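
The `runner` entry names a callable as `module:function`. Here is a minimal sketch of such a function, assuming a plain string return value is acceptable (the spec comment only says it "returns a response"). The `_DOCS` lookup is a hypothetical stand-in for a real agent, and the sketch does not cover how tool calls are captured for the path layer:

```python
# my_app/agent.py (hypothetical module matching the runner path in the spec)

# Stand-in knowledge base; a real agent would call an LLM and tools here.
_DOCS = {
    "install": "Install AgentCI with: pip install ciagent",
}


def run_for_agentci(query: str) -> str:
    """Take a query string and return the agent's answer as text."""
    lowered = query.lower()
    for topic, answer in _DOCS.items():
        if topic in lowered:
            return answer
    # Out-of-scope queries get a refusal that does not echo the question,
    # so a not_in_answer check on words from the query would still pass.
    return "I can only answer questions about AgentCI."
```

Keeping the entry point this thin means the same function can be exercised directly in an ordinary unit test, independent of the spec file.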

Run:

agentci test --mock       # start here: zero-cost with synthetic traces
agentci test              # run live against your real agent

agentci test evaluates each query through 3 layers — correctness, path, and cost:

============================================================

Query: How do I install AgentCI?
Answer: To install AgentCI, you can use pip with the following command:
        pip install ciagent. Make sure you have Python 3.10 or later.

  ✅  CORRECTNESS: PASS
       ✓ Found keywords: "pip install ciagent"
       ✓ LLM judge passed (score: 5 ≥ 0.6)
  📈  PATH: PASS
       ✓ Tool recall: 1.000 (expected: [retrieve_docs])
       ✓ Tool precision: 0.500
       ✓ No loops detected
  💰  COST: PASS
       ✓ LLM calls: 8 ≤ max 8

============================================================

Query: What Python version does AgentCI require and what frameworks does it support?
Answer: AgentCI currently does not specify a required Python version
        in the provided context, so I don't have that information...

  ❌  CORRECTNESS: FAIL
       • Expected '3.10' not found in answer
  📈  PATH: PASS
       ✓ Tool recall: 1.000 (expected: [retrieve_docs])
       ✓ Loops: 1 ≤ max 3
  💰  COST: PASS
       ✓ LLM calls: 4 ≤ max 5

============================================================
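
To make the report lines above easier to read, here is a rough sketch of what each layer could be computing. These are the textbook definitions of keyword matching and set-based recall and precision, not AgentCI's actual implementation. The function names, the case-insensitive matching, and the `web_search` tool are assumptions made for illustration:

```python
def keyword_check(answer, any_expected=(), forbidden=()):
    """Correctness: at least one expected keyword present, no forbidden ones."""
    text = answer.lower()  # case-insensitive matching is an assumption
    found = not any_expected or any(k.lower() in text for k in any_expected)
    clean = not any(k.lower() in text for k in forbidden)
    return found and clean


def tool_recall_precision(expected, called):
    """Path: compare the set of expected tools with the set actually called."""
    expected_set, called_set = set(expected), set(called)
    hits = expected_set & called_set
    # An empty denominator is treated as a perfect score (nothing to miss).
    recall = len(hits) / len(expected_set) if expected_set else 1.0
    precision = len(hits) / len(called_set) if called_set else 1.0
    return recall, precision


def within_budget(llm_calls, max_llm_calls):
    """Cost: the observed number of LLM calls must not exceed the budget."""
    return llm_calls <= max_llm_calls


# One expected tool plus one extra (hypothetical) tool call.
recall, precision = tool_recall_precision(
    ["retrieve_docs"], ["retrieve_docs", "web_search"]
)
print(recall, precision)  # 1.0 0.5
```

Under these definitions, a recall of 1.000 with a precision of 0.500 means every expected tool was called, but half of the distinct tools called were not in `expected_tools`.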

Don't have golden queries yet? agentci init --generate scans your code and generates a starter spec.

Demo

Here's a RAG agent demo where someone "optimizes for latency" by reducing retriever docs from 8 to 1. AgentCI catches the correctness regression:

[Image: AgentCI Demo]

CLI

agentci init --generate        # Scan project, generate test spec
agentci init                   # Generate GitHub Actions workflow + pre-push hook
agentci test --mock --yes      # Zero-cost synthetic traces, CI-friendly (no keys, no prompts)
agentci test                   # Run 3-layer evaluation (correctness → path → cost)
agentci test --format html -o report.html  # HTML report with per-query details
agentci calibrate              # Measure real agent metrics, auto-tune spec budgets
agentci doctor                 # Health check: spec, deps, API keys
agentci record <test>          # Record golden baseline
agentci diff                   # Diff against baseline
agentci report -i results.json # Generate HTML report from JSON results
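
If you drive the CLI from a Python script (for example, a custom CI step), one way to do it is to assemble the argument list from the flags shown above and pass it to `subprocess`. This is a sketch only: the helper names are hypothetical, and it assumes that `--mock`/`--yes` can be combined with `--format html -o` and that a failing run exits with a non-zero code, neither of which is confirmed here:

```python
import subprocess


def agentci_test_command(mock=True, html_report=None):
    """Build the argument list for an `agentci test` run."""
    cmd = ["agentci", "test"]
    if mock:
        # Zero-cost synthetic traces, no interactive prompts.
        cmd += ["--mock", "--yes"]
    if html_report:
        # Write an HTML report with per-query details.
        cmd += ["--format", "html", "-o", html_report]
    return cmd


def run_agentci(cmd):
    """Run the command and return its exit code (requires ciagent installed)."""
    return subprocess.run(cmd, check=False).returncode


print(agentci_test_command(html_report="report.html"))
```

Building the command as a list, not a shell string, avoids quoting problems when the report path contains spaces.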

Contributing

  • GitHub Issues
  • DemoAgents: working examples for all three frameworks

Apache 2.0. If you build an agent and test it with AgentCI, I'd love to hear about it.

