Anek
Persona-based testing framework for AI agents.
Agents are not APIs. A structured input/output test can pass for every user, yet your agent may still fail Maria Garcia when she writes "hola can you help me i need change my password" while succeeding for the neutral English baseline. Anek finds those failures before your users do.
Tests are written as Gherkin .feature files: Anek simulates realistic user personas talking to your agent and uses Claude as a judge to evaluate the outcome. No code is required to write tests.
Feature: Password Reset Flow

  @all_personas
  Scenario: User initiates password reset
    When the user says "I can't log into my account"
    Then the response should contain "password"
    And the response should not contain "SSO"
    When the user says "I forgot my password and need to reset it"
    Then the response should contain "email"
    And the response time should be under 5000ms
    When the user says "I got the reset email, what do I do now?"
    Then the goal should be achieved with "Agent guided the user through password reset without jargon"
Run it:
$ anek run features/password_reset.feature --agent http://localhost:8080/chat
Feature: Password Reset Flow
──────────────────────────────────────────────────────────────
Scenario: User initiates password reset | persona: maria_garcia | PASS
Background
Given the agent is available at "http://localhost:8080/health" ✓
When the user says "I can't log into my account"
→ persona: "hola no puedo entrar a mi cuenta ayuda"
← agent: "Hi! I can help you get back into your account..."
Then the response should contain "password" ✓
And the response should not contain "SSO" ✓
...
Then the goal should be achieved with "..."
✓ goal_achieved (confidence=92%): Agent clearly guided...
Scenario: User initiates password reset | persona: elderly_user | FAIL
...
Then the response should not contain "SSO" ✗
✗ "SSO" unexpectedly present in response
─────────────────────────────────────────────────────────────
Persona Status Turns Failed Steps
─────────────────────────────────────────────────
maria_garcia ✓ PASS 3 —
elderly_user ✗ FAIL 3 "SSO" unexpectedly present
gen_z_user ✓ PASS 3 —
indian_english ✓ PASS 3 —
neutral_baseline ✓ PASS 3 —
─────────────────────────────────────────────────
Passed: 4/5 Failed: 1/5
Installation
pip install anek
Requires Python 3.11+ and an Anthropic API key (for persona simulation and LLM-as-judge evaluation).
Quick Start
1. Set your API key
export ANTHROPIC_API_KEY=sk-ant-...
2. Start the demo testbed agent
The repo ships a free demo agent (TechCorp Support Bot) that uses Groq's free API — no credit card needed.
pip install "anek[testbed]"
export GROQ_API_KEY=gsk_...
python -m anek.testbed.agent
# → TechCorp Support Bot running at http://localhost:8080
3. Run a feature file
anek run features/password_reset.feature --agent http://localhost:8080/chat
Results are saved to ./anek-results/ as JSON.
Writing Tests
Tests are standard Gherkin .feature files. No step definitions are needed: Anek ships built-in handlers for every supported step pattern.
Feature file structure
Feature: Name of the feature being tested

  Background:
    Given the agent is available at "http://localhost:8080/health"

  @all_personas
  Scenario: Descriptive scenario name
    When the user says "base message in plain English"
    Then the response should contain "expected keyword"
    And the response should not contain "forbidden word"
    And the response time should be under 3000ms
    When the user says "follow-up message"
    Then the goal should be achieved with "description of success"
    And the sentiment should be positive
Persona tags
Control which personas run a scenario with @ tags placed on the line directly above the Scenario:
| Tag | Behaviour |
|---|---|
| @all_personas | Run against every persona in /personas |
| @persona:maria_garcia | Run against one specific persona |
| @personas:elderly_user,gen_z_user | Run against a named subset |

If a scenario carries several @persona:X tags, it runs once for each tagged persona.
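The tag rules can be sketched as a small resolver. This is an illustration only, not Anek's actual parser: the function name `resolve_personas`, its signature, and the choice to raise on an unknown persona are all assumptions.

```python
def resolve_personas(tags: list[str], available: list[str]) -> list[str]:
    """Return the personas a scenario should run against, in first-seen order.

    Hypothetical helper: mirrors the three tag forms in the table above.
    """
    selected: list[str] = []

    def add(name: str) -> None:
        if name not in available:
            raise ValueError(f"unknown persona: {name}")
        if name not in selected:  # each persona runs at most once
            selected.append(name)

    for tag in tags:
        if tag == "@all_personas":
            for name in available:
                add(name)
        elif tag.startswith("@persona:"):
            add(tag[len("@persona:"):])
        elif tag.startswith("@personas:"):
            for name in tag[len("@personas:"):].split(","):
                add(name.strip())
    return selected


STARTERS = ["neutral_baseline", "maria_garcia", "elderly_user",
            "gen_z_user", "indian_english"]
subset = resolve_personas(["@personas:elderly_user,gen_z_user"], STARTERS)
```

Each name in the returned list corresponds to one separate run of the scenario.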
The When step — how personas work
The message in a When step is a base intent in plain English. Before sending it to your agent, Anek calls Claude to rewrite it as the persona would naturally type it. The feature file stays readable; the persona transformation happens at runtime.
When the user says "I can't log in"
# maria_garcia sends: "hola no puedo entrar ayuda!!"
# elderly_user sends: "Good afternoon. I am unable to access my account, I'm afraid."
# gen_z_user sends: "cant log in?? pls 😭"
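To make the runtime transformation concrete, here is a minimal sketch of how a rewrite instruction could be assembled from a persona definition. The prompt wording and the helper name `build_rewrite_prompt` are assumptions; the source does not show the prompt Anek actually sends to Claude, and the sketch stops short of making any API call.

```python
def build_rewrite_prompt(persona: dict, base_message: str) -> str:
    """Assemble an instruction asking an LLM to restate a message in a persona's voice.

    Hypothetical prompt text; only the persona field names come from the README.
    """
    traits = "\n".join(f"- {t}" for t in persona.get("traits", []))
    samples = "\n".join(f"- {s}" for s in persona.get("sample_phrases", []))
    return (
        f"You are role-playing this user: {persona['description']}\n"
        f"English proficiency: {persona['english_proficiency']}\n"
        f"Writing traits:\n{traits}\n"
        f"Example messages:\n{samples}\n\n"
        "Rewrite the message below exactly as this user would type it. "
        "Keep the intent unchanged and reply with the rewritten message only.\n\n"
        f"Message: {base_message}"
    )


maria = {
    "name": "maria_garcia",
    "description": "Native Spanish speaker, mid-30s, uses Spanglish occasionally",
    "english_proficiency": "intermediate",
    "traits": ["mixes Spanish words naturally mid-sentence"],
    "sample_phrases": ["the app no work for me since yesterday"],
}
prompt = build_rewrite_prompt(maria, "I can't log in")
```

The resulting string would be sent to the model as the user turn, and the model's reply would become the message delivered to the agent under test.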
Built-in verifiers (Then steps)
| Step pattern | What it checks |
|---|---|
| the response should contain "text" | Case-insensitive substring match |
| the response should not contain "text" | Absence check |
| the response time should be under Nms | Latency threshold |
| the sentiment should be positive\|neutral\|negative | Claude judges sentiment |
| the goal should be achieved with "description" | Claude judges full transcript against goal |
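The three deterministic verifiers are simple enough to sketch in a few lines. The following is an assumption-laden illustration, not Anek's verifiers.py: the function name `check_step` is hypothetical, the absence check is assumed to be case-insensitive like the substring match, and "under" is read as a strict less-than. The sentiment and goal steps need an LLM judge and are left out.

```python
import re


def check_step(step: str, response: str, elapsed_ms: float) -> bool:
    """Evaluate one deterministic Then step against an agent response.

    Hypothetical dispatcher covering only the non-LLM step patterns.
    """
    if m := re.fullmatch(r'the response should contain "(.*)"', step):
        return m.group(1).lower() in response.lower()
    if m := re.fullmatch(r'the response should not contain "(.*)"', step):
        return m.group(1).lower() not in response.lower()
    if m := re.fullmatch(r"the response time should be under (\d+)ms", step):
        return elapsed_ms < int(m.group(1))
    raise ValueError(f"no deterministic handler for step: {step!r}")


reply = "I can help you reset your Password by email."
ok = check_step('the response should contain "password"', reply, 840.0)
```

Matching on the full step text keeps "should contain" and "should not contain" from being confused with each other.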
Personas
Personas live as YAML files in the /personas directory. Five starter personas are included:
| Name | Description |
|---|---|
| neutral_baseline | Standard American English (control group) |
| maria_garcia | Native Spanish speaker, intermediate English, Spanglish |
| elderly_user | Formal, verbose, confused by technical terms |
| gen_z_user | Lowercase, terse, slang, emojis |
| indian_english | Formal Indian English, "kindly revert", "do the needful" |
Persona YAML format
name: maria_garcia
description: Native Spanish speaker, mid-30s, uses Spanglish occasionally
language_background: spanish_native
english_proficiency: intermediate # native | fluent | advanced | intermediate | basic
traits:
- omits articles occasionally ("I need help with account")
- mixes Spanish words naturally mid-sentence
- minimal punctuation, mostly lowercase
sample_phrases:
- "hola can you help me i need change my password"
- "the app no work for me since yesterday"
Validate a persona file:
anek personas validate personas/maria_garcia.yaml
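As a rough idea of what validation might check, the sketch below inspects an already-parsed persona (for example, the dict returned by a YAML loader such as PyYAML's `yaml.safe_load`). The set of required fields and the error messages are assumptions; only the allowed `english_proficiency` values are taken from the format shown above.

```python
ALLOWED_PROFICIENCY = {"native", "fluent", "advanced", "intermediate", "basic"}
# Assumption: these fields are mandatory; the README does not say which are.
REQUIRED_FIELDS = ("name", "description", "english_proficiency", "traits")


def validate_persona(data: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = [f"missing field: {f}" for f in REQUIRED_FIELDS if f not in data]

    proficiency = data.get("english_proficiency")
    if proficiency is not None and proficiency not in ALLOWED_PROFICIENCY:
        errors.append(
            "english_proficiency must be one of: "
            + ", ".join(sorted(ALLOWED_PROFICIENCY))
        )

    for field in ("traits", "sample_phrases"):
        value = data.get(field, [])
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            errors.append(f"{field} must be a list of strings")
    return errors


persona = {
    "name": "maria_garcia",
    "description": "Native Spanish speaker, mid-30s, uses Spanglish occasionally",
    "language_background": "spanish_native",
    "english_proficiency": "intermediate",
    "traits": ["minimal punctuation, mostly lowercase"],
    "sample_phrases": ["the app no work for me since yesterday"],
}
problems = validate_persona(persona)
```

Returning all problems at once, rather than stopping at the first, makes it easier to fix a new persona file in one pass.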
CLI Reference
# Run a feature file against all tagged personas
anek run features/password_reset.feature --agent http://localhost:8080/chat
# Run with specific personas only
anek run features/password_reset.feature \
--agent http://localhost:8080/chat \
--personas elderly_user,maria_garcia
# Custom response field extraction (if your agent returns {"data": {"text": "..."}})
anek run features/test.feature \
--agent http://myagent.com/chat \
--response-path data.text \
--message-field query
# List available personas
anek personas list
# Validate a persona file
anek personas validate personas/my_persona.yaml
# Skip saving JSON results
anek run features/test.feature --agent http://... --no-save
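The `--message-field` and `--response-path` options describe how the request body is shaped and where the reply sits in the response. A minimal sketch of that mapping follows, assuming the path is a dot-separated list of object keys as in the `data.text` example; the helper names are hypothetical, and the defaults `message` and `reply` are taken from the configuration example in this README.

```python
from typing import Any


def build_payload(message: str, message_field: str = "message") -> dict[str, str]:
    """Wrap the outgoing user message under the configured JSON field."""
    return {message_field: message}


def extract_reply(body: Any, response_path: str = "reply") -> str:
    """Walk a dot-separated path through nested dicts and return the reply text."""
    current = body
    for key in response_path.split("."):
        if not isinstance(current, dict) or key not in current:
            raise KeyError(f"response path {response_path!r} not found at {key!r}")
        current = current[key]
    return str(current)


payload = build_payload("I can't log in", message_field="query")
reply = extract_reply({"data": {"text": "Let's reset your password."}}, "data.text")
```

Failing loudly on a missing key is a design choice made for this sketch: a silent empty reply would make every "should contain" check fail for a confusing reason.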
Exit codes
- 0: all scenarios passed
- 1: one or more scenarios failed
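Because the exit code distinguishes success from failure, a CI script can gate on it directly. The wrapper below is a sketch under the assumption that the `anek` executable is on the PATH; the function names are hypothetical, and only the `run` subcommand and the `--agent` and `--personas` flags shown in the CLI reference are used.

```python
import subprocess
from typing import Optional


def anek_command(feature: str, agent: str,
                 personas: Optional[list[str]] = None) -> list[str]:
    """Build the argv for one `anek run` invocation."""
    cmd = ["anek", "run", feature, "--agent", agent]
    if personas:
        cmd += ["--personas", ",".join(personas)]
    return cmd


def scenarios_passed(feature: str, agent: str,
                     personas: Optional[list[str]] = None) -> bool:
    """Run Anek and report whether every scenario passed (exit code 0)."""
    result = subprocess.run(anek_command(feature, agent, personas), check=False)
    return result.returncode == 0


cmd = anek_command("features/password_reset.feature",
                   "http://localhost:8080/chat",
                   ["elderly_user", "maria_garcia"])
```

Passing the command as a list avoids shell quoting issues with URLs and comma-separated persona names.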
Configuration
Copy anek.config.yaml.example to anek.config.yaml:
anthropic_api_key: ${ANTHROPIC_API_KEY}
default_agent_endpoint: http://localhost:8080/chat
response_path: reply # JSONPath to extract agent reply
message_field: message # JSON field for outgoing user message
personas_dir: ./personas
results_dir: ./anek-results
Environment variables in ${VAR} syntax are expanded automatically. The CLI --agent flag overrides default_agent_endpoint.
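The two rules in this paragraph can be sketched as follows. This is not Anek's loader: the helper names are hypothetical, and raising on an unset variable is an assumption, since the README does not say what happens in that case.

```python
import os
import re
from collections.abc import Mapping
from typing import Optional

_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Replace every ${VAR} in a config value with the variable's content."""
    source = os.environ if env is None else env

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name not in source:
            # Assumption: an unset variable is treated as an error.
            raise KeyError(f"environment variable {name} is not set")
        return source[name]

    return _VAR.sub(replace, value)


def resolve_endpoint(config: dict, cli_agent: Optional[str]) -> str:
    """The --agent flag, when given, wins over default_agent_endpoint."""
    return cli_agent or config["default_agent_endpoint"]


key = expand_env("${ANTHROPIC_API_KEY}", {"ANTHROPIC_API_KEY": "sk-ant-example"})
endpoint = resolve_endpoint(
    {"default_agent_endpoint": "http://localhost:8080/chat"}, None
)
```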
Testbed Agent
testbed/agent.py is a self-contained FastAPI agent for testing Anek itself. It plays the role of TechCorp Support Bot — a deliberately scoped agent that handles password reset, account inquiries, and billing, and refuses anything outside that scope.
pip install "anek[testbed]"
export GROQ_API_KEY=gsk_... # free at console.groq.com
python testbed/agent.py
It uses Groq's free tier (llama-3.1-8b-instant) by default. Any OpenAI-compatible API works via LLM_BASE_URL:
LLM_BASE_URL=http://localhost:11434/v1 LLM_MODEL=llama3.1 python testbed/agent.py
Project Structure
anek/
├── anek/
│ ├── cli.py # Click CLI entry point
│ ├── feature_parser.py # Gherkin .feature file parser
│ ├── persona.py # Persona model + loader
│ ├── llm.py # Claude API wrapper (simulation + judge)
│ ├── simulator.py # Test orchestration engine
│ ├── verifiers.py # Built-in step verifiers
│ ├── reporter.py # Rich CLI + JSON output
│ └── drivers/
│ ├── base.py # AgentDriver protocol
│ └── http.py # HTTP REST driver
├── personas/ # Starter persona YAML files
├── features/ # Example .feature files
├── testbed/ # Demo agent (TechCorp Support Bot)
├── tests/ # Anek's own test suite
└── pyproject.toml
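The tree lists an AgentDriver protocol in drivers/base.py with an HTTP implementation next to it. The README does not show the protocol's methods, so the sketch below is purely illustrative: the `send` method, the `EchoDriver` class, and `run_turns` are all assumptions about what such a driver abstraction might look like.

```python
from typing import Protocol


class AgentDriver(Protocol):
    """Hypothetical shape of a driver: deliver one message, return the reply."""

    def send(self, message: str) -> str: ...


class EchoDriver:
    """In-memory stand-in for an agent, useful for exercising the test loop."""

    def send(self, message: str) -> str:
        return f"You said: {message}"


def run_turns(driver: AgentDriver, messages: list[str]) -> list[tuple[str, str]]:
    """Send each message in order and collect (message, reply) pairs."""
    return [(message, driver.send(message)) for message in messages]


transcript = run_turns(EchoDriver(), ["I can't log in", "I forgot my password"])
```

With a structural protocol like this, a new transport only needs a matching `send` method and does not have to inherit from a base class.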
Contributing
git clone https://github.com/souratendu/anek
cd anek
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
pytest tests/
Pull requests welcome. If you build a new persona, verifier type, or driver, open a PR.
License
MIT — see LICENSE.