Anek

Persona-based testing framework for AI agents.

Agents are not APIs. A structured input/output test can pass for every user, yet your agent may still fail Maria Garcia when she writes "hola can you help me i need change my password" while succeeding for the neutral English baseline. Anek finds those failures before your users do.

Anek simulates realistic user personas interacting with your agent via Gherkin .feature files and uses Claude as a judge to evaluate outcomes — no code required to write tests.

Feature: Password Reset Flow

  Background:
    Given the agent is available at "http://localhost:8080/health"

  @all_personas
  Scenario: User initiates password reset
    When the user says "I can't log into my account"
    Then the response should contain "password"
    And  the response should not contain "SSO"

    When the user says "I forgot my password and need to reset it"
    Then the response should contain "email"
    And  the response time should be under 5000ms

    When the user says "I got the reset email, what do I do now?"
    Then the goal should be achieved with "Agent guided the user through password reset without jargon"

Run it:

$ anek run features/password_reset.feature --agent http://localhost:8080/chat

Feature: Password Reset Flow
  ──────────────────────────────────────────────────────────────

  Scenario: User initiates password reset  |  persona: maria_garcia  |  PASS
    Background
    Given  the agent is available at "http://localhost:8080/health"  ✓
    When   the user says "I can't log into my account"
           → persona: "hola no puedo entrar a mi cuenta ayuda"
           ← agent:   "Hi! I can help you get back into your account..."
    Then   the response should contain "password"              ✓
    And    the response should not contain "SSO"               ✓
    ...
    Then   the goal should be achieved with "..."
           ✓ goal_achieved (confidence=92%): Agent clearly guided...

  Scenario: User initiates password reset  |  persona: elderly_user  |  FAIL
    ...
    Then   the response should not contain "SSO"               ✗
           ✗ "SSO" unexpectedly present in response

  ─────────────────────────────────────────────────────────────
  Persona              Status    Turns  Failed Steps
  ─────────────────────────────────────────────────
  maria_garcia         ✓ PASS    3      —
  elderly_user         ✗ FAIL    3      "SSO" unexpectedly present
  gen_z_user           ✓ PASS    3      —
  indian_english       ✓ PASS    3      —
  neutral_baseline     ✓ PASS    3      —
  ─────────────────────────────────────────────────
  Passed: 4/5  Failed: 1/5

Installation

pip install anek

Requires Python 3.11+ and an Anthropic API key (for persona simulation and LLM-as-judge evaluation).

Quick Start

1. Set your API key

export ANTHROPIC_API_KEY=sk-ant-...

2. Start the demo testbed agent

The repo ships a free demo agent (TechCorp Support Bot) that uses Groq's free API — no credit card needed.

pip install "anek[testbed]"
export GROQ_API_KEY=gsk_...
python testbed/agent.py   # run from the repository root
# → TechCorp Support Bot running at http://localhost:8080

3. Run a feature file

anek run features/password_reset.feature --agent http://localhost:8080/chat

Results are saved to ./anek-results/ as JSON.
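
The schema of the saved result files is not described here, so the following is only a minimal sketch for collecting them after a run. The helper name `load_results` is hypothetical and not part of Anek; it parses each file and returns the content as-is without assuming any field names.

```python
import json
from pathlib import Path


def load_results(results_dir: str = "./anek-results") -> dict[str, object]:
    """Load every *.json file in results_dir, keyed by file name.

    No fields are assumed: the parsed JSON is returned unchanged.
    """
    results: dict[str, object] = {}
    for path in sorted(Path(results_dir).glob("*.json")):
        with path.open(encoding="utf-8") as fh:
            results[path.name] = json.load(fh)
    return results
```

Calling `load_results()` after a run gives one parsed document per result file, which can then be inspected interactively to learn the actual structure.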

Writing Tests

Tests are standard Gherkin .feature files. No step definitions needed — Anek has built-in handlers for all step patterns.

Feature file structure

Feature: Name of the feature being tested

  Background:
    Given the agent is available at "http://localhost:8080/health"

  @all_personas
  Scenario: Descriptive scenario name
    When the user says "base message in plain English"
    Then the response should contain "expected keyword"
    And  the response should not contain "forbidden word"
    And  the response time should be under 3000ms

    When the user says "follow-up message"
    Then the goal should be achieved with "description of success"
    And  the sentiment should be positive

Persona tags

Control which personas run a scenario by placing @ tags on the line above the Scenario:

| Tag | Behaviour |
| --- | --- |
| `@all_personas` | Run against every persona in the `personas/` directory |
| `@persona:maria_garcia` | Run against one specific persona |
| `@personas:elderly_user,gen_z_user` | Run against a named subset |

If a scenario carries several @persona:X tags, it runs once for each tagged persona.
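
The tag rules can be illustrated with a small sketch. This is not Anek's parser; `resolve_personas` is a hypothetical helper, and treating unknown persona names as errors and de-duplicating repeated names are assumptions made for the example.

```python
def resolve_personas(tags: list[str], available: list[str]) -> list[str]:
    """Return the personas a scenario should run against, in first-seen order."""
    selected: list[str] = []
    for tag in tags:
        if tag == "@all_personas":
            names = list(available)
        elif tag.startswith("@personas:"):
            # Comma-separated subset, e.g. @personas:elderly_user,gen_z_user
            names = [n.strip() for n in tag[len("@personas:"):].split(",") if n.strip()]
        elif tag.startswith("@persona:"):
            names = [tag[len("@persona:"):]]
        else:
            continue  # unrelated tag, ignored in this sketch
        for name in names:
            if name not in available:
                raise ValueError(f"unknown persona: {name}")  # assumed behaviour
            if name not in selected:
                selected.append(name)
    return selected
```

With the five starter personas as `available`, the tags `["@persona:maria_garcia", "@persona:gen_z_user"]` resolve to two separate runs, one per persona.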

The `When` step — how personas work

The message in a When step is a base intent in plain English. Anek calls Claude to rewrite it as the persona would naturally type it, then sends the rewritten message to your agent. The feature file stays readable; the persona transformation happens at runtime.

When the user says "I can't log in"
# maria_garcia sends: "hola no puedo entrar ayuda!!"
# elderly_user sends: "Good afternoon. I am unable to access my account, I'm afraid."
# gen_z_user   sends: "cant log in?? pls 😭"
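
To make the rewrite step concrete, here is a hedged sketch of how a persona and a base message could be combined into a prompt for the model. The prompt wording and the function `build_rewrite_prompt` are invented for illustration; Anek's actual prompt is not shown in this README. The persona dictionary uses the field names from the persona YAML format.

```python
def build_rewrite_prompt(persona: dict, base_message: str) -> str:
    """Build a prompt asking a model to restate base_message in the persona's voice."""
    traits = "\n".join(f"- {t}" for t in persona.get("traits", []))
    samples = "\n".join(f"- {s}" for s in persona.get("sample_phrases", []))
    return (
        f"You are role-playing the user persona '{persona['name']}'.\n"
        f"Description: {persona.get('description', '')}\n"
        f"Writing traits:\n{traits}\n"
        f"Sample phrases:\n{samples}\n\n"
        "Rewrite the message below as this persona would naturally type it. "
        "Keep the intent unchanged and reply with the rewritten message only.\n\n"
        f"Message: {base_message}"
    )
```

The returned string would be sent to the model, and the model's reply becomes the text the agent receives for that turn.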

Built-in verifiers (Then steps)

| Step pattern | What it checks |
| --- | --- |
| `the response should contain "text"` | Case-insensitive substring match |
| `the response should not contain "text"` | Absence check |
| `the response time should be under Nms` | Latency threshold |
| `the sentiment should be positive\|neutral\|negative` | Claude judges sentiment |
| `the goal should be achieved with "description"` | Claude judges full transcript against goal |
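
The three deterministic checks are simple enough to sketch in a few lines. These functions are illustrative stand-ins, not the contents of `verifiers.py`; in particular, making the absence check case-insensitive is an assumption, since the table only states that for the `contain` step.

```python
import re


def contains(response: str, text: str) -> bool:
    """Case-insensitive substring match."""
    return text.lower() in response.lower()


def not_contains(response: str, text: str) -> bool:
    """Absence check (case-insensitivity is assumed here)."""
    return not contains(response, text)


def within_latency(elapsed_ms: float, limit_ms: float) -> bool:
    """True when the measured latency is strictly under the limit."""
    return elapsed_ms < limit_ms


def parse_latency_limit(step: str) -> int:
    """Extract N from 'the response time should be under Nms'."""
    match = re.fullmatch(r"the response time should be under (\d+)ms", step)
    if match is None:
        raise ValueError(f"not a latency step: {step!r}")
    return int(match.group(1))
```

The sentiment and goal steps are different in kind: they depend on a model's judgement of the transcript and cannot be reduced to a pure function like these.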

Personas

Personas live as YAML files in the personas/ directory. Five starter personas are included:

| Name | Description |
| --- | --- |
| `neutral_baseline` | Standard American English (control group) |
| `maria_garcia` | Native Spanish speaker, intermediate English, Spanglish |
| `elderly_user` | Formal, verbose, confused by technical terms |
| `gen_z_user` | Lowercase, terse, slang, emojis |
| `indian_english` | Formal Indian English, "kindly revert", "do the needful" |

Persona YAML format

name: maria_garcia
description: Native Spanish speaker, mid-30s, uses Spanglish occasionally
language_background: spanish_native
english_proficiency: intermediate  # native | fluent | advanced | intermediate | basic
traits:
  - omits articles occasionally ("I need help with account")
  - mixes Spanish words naturally mid-sentence
  - minimal punctuation, mostly lowercase
sample_phrases:
  - "hola can you help me i need change my password"
  - "the app no work for me since yesterday"

Validate a persona file:

anek personas validate personas/maria_garcia.yaml
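
As a rough idea of what validation might involve, the sketch below models the YAML fields as a dataclass and checks the proficiency level against the values listed in the example's comment. The class is hypothetical, not Anek's `persona.py`, and which fields are required is an assumption. The input is a plain dictionary, as a YAML parser would produce.

```python
from dataclasses import dataclass, field

# Levels taken from the comment in the YAML example above.
PROFICIENCY_LEVELS = {"native", "fluent", "advanced", "intermediate", "basic"}


@dataclass
class Persona:
    name: str
    description: str
    language_background: str
    english_proficiency: str
    traits: list[str] = field(default_factory=list)
    sample_phrases: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not self.name:
            raise ValueError("persona name must not be empty")
        if self.english_proficiency not in PROFICIENCY_LEVELS:
            raise ValueError(
                f"english_proficiency must be one of {sorted(PROFICIENCY_LEVELS)}"
            )


maria = Persona(
    name="maria_garcia",
    description="Native Spanish speaker, mid-30s, uses Spanglish occasionally",
    language_background="spanish_native",
    english_proficiency="intermediate",
    traits=["minimal punctuation, mostly lowercase"],
    sample_phrases=["the app no work for me since yesterday"],
)
```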

CLI Reference

# Run a feature file against all tagged personas
anek run features/password_reset.feature --agent http://localhost:8080/chat

# Run with specific personas only
anek run features/password_reset.feature \
  --agent http://localhost:8080/chat \
  --personas elderly_user,maria_garcia

# Custom request and response fields (if your agent expects {"query": "..."} and returns {"data": {"text": "..."}})
anek run features/test.feature \
  --agent http://myagent.com/chat \
  --response-path data.text \
  --message-field query

# List available personas
anek personas list

# Validate a persona file
anek personas validate personas/my_persona.yaml

# Skip saving JSON results
anek run features/test.feature --agent http://... --no-save
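
The `--message-field` and `--response-path` options describe how a message is wrapped for the agent and how the reply is pulled out of the JSON response. A minimal sketch of that mapping follows, assuming the response path is a dot-separated list of keys as in `data.text`. Both function names are hypothetical; the defaults mirror the values in the example configuration file.

```python
def build_payload(message: str, message_field: str = "message") -> dict:
    """Wrap the outgoing user message in the JSON field the agent expects."""
    return {message_field: message}


def extract_reply(body: dict, response_path: str = "reply"):
    """Follow a dot-separated path such as 'data.text' through nested JSON."""
    value = body
    for key in response_path.split("."):
        if not isinstance(value, dict) or key not in value:
            raise KeyError(f"response path {response_path!r} not found at {key!r}")
        value = value[key]
    return value
```

For an agent that returns `{"data": {"text": "..."}}`, `extract_reply(body, "data.text")` yields the inner text, and `build_payload("hi", "query")` produces `{"query": "hi"}`.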

Exit codes

0 — all scenarios passed
1 — one or more scenarios failed
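
Because the result is reported through the exit code, a CI script can branch on it directly. The wrapper below is a sketch using only the standard library; `scenarios_passed` is a hypothetical helper, and raising on any code other than 0 or 1 is a choice made for this example, since no other codes are documented.

```python
import subprocess


def scenarios_passed(command: list[str]) -> bool:
    """Run a command and map its exit code to pass/fail.

    0 means all scenarios passed, 1 means at least one failed.
    """
    completed = subprocess.run(command, check=False)
    if completed.returncode == 0:
        return True
    if completed.returncode == 1:
        return False
    raise RuntimeError(f"unexpected exit code {completed.returncode}")


# Example call (requires anek to be installed and an agent to be running):
# scenarios_passed(["anek", "run", "features/password_reset.feature",
#                   "--agent", "http://localhost:8080/chat"])
```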

Configuration

Copy anek.config.yaml.example to anek.config.yaml:

anthropic_api_key: ${ANTHROPIC_API_KEY}
default_agent_endpoint: http://localhost:8080/chat
response_path: reply          # JSONPath to extract agent reply
message_field: message        # JSON field for outgoing user message
personas_dir: ./personas
results_dir: ./anek-results

Environment variables in ${VAR} syntax are expanded automatically. The CLI --agent flag overrides default_agent_endpoint.
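
Both rules can be sketched briefly. The code below is an illustration, not Anek's configuration loader: `expand_env` and `resolve_endpoint` are hypothetical names, and failing on an unset variable is an assumption, since the behaviour for missing variables is not stated.

```python
import os
import re

_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(value: str, env=None) -> str:
    """Replace each ${VAR} in value with the matching environment variable."""
    env = os.environ if env is None else env

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")  # assumed
        return env[name]

    return _VAR.sub(replace, value)


def resolve_endpoint(cli_agent, config: dict) -> str:
    """The --agent flag, when given, wins over default_agent_endpoint."""
    return cli_agent if cli_agent else config["default_agent_endpoint"]
```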

Testbed Agent

testbed/agent.py is a self-contained FastAPI agent for testing Anek itself. It plays the role of TechCorp Support Bot — a deliberately scoped agent that handles password reset, account inquiries, and billing, and refuses anything outside that scope.

pip install "anek[testbed]"
export GROQ_API_KEY=gsk_...   # free at console.groq.com
python testbed/agent.py

It uses Groq's free tier (llama-3.1-8b-instant) by default. Any OpenAI-compatible API works via LLM_BASE_URL:

LLM_BASE_URL=http://localhost:11434/v1 LLM_MODEL=llama3.1 python testbed/agent.py
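
One way an agent like this could pick up those settings is shown below. This is a guess at the pattern, not the code in `testbed/agent.py`: the function name is hypothetical, and the default base URL is Groq's OpenAI-compatible endpoint, assumed here to be what the testbed falls back to.

```python
import os


def llm_settings(env=None) -> dict[str, str]:
    """Read LLM connection settings from the environment, with Groq defaults."""
    env = os.environ if env is None else env
    return {
        # Assumed default: Groq's OpenAI-compatible endpoint.
        "base_url": env.get("LLM_BASE_URL", "https://api.groq.com/openai/v1"),
        # Default model named in the text above.
        "model": env.get("LLM_MODEL", "llama-3.1-8b-instant"),
        "api_key": env.get("GROQ_API_KEY", ""),
    }
```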

Project Structure

anek/
├── anek/
│   ├── cli.py              # Click CLI entry point
│   ├── feature_parser.py   # Gherkin .feature file parser
│   ├── persona.py          # Persona model + loader
│   ├── llm.py              # Claude API wrapper (simulation + judge)
│   ├── simulator.py        # Test orchestration engine
│   ├── verifiers.py        # Built-in step verifiers
│   ├── reporter.py         # Rich CLI + JSON output
│   └── drivers/
│       ├── base.py         # AgentDriver protocol
│       └── http.py         # HTTP REST driver
├── personas/               # Starter persona YAML files
├── features/               # Example .feature files
├── testbed/                # Demo agent (TechCorp Support Bot)
├── tests/                  # Anek's own test suite
└── pyproject.toml
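
The tree lists an AgentDriver protocol in `drivers/base.py`, but its signature is not shown in this README. The sketch below illustrates the general idea of a driver protocol using `typing.Protocol`; the `send` method and the `EchoDriver` class are hypothetical and may not match the real interface.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class AgentDriver(Protocol):
    """Anything that can deliver a user message and return the agent's reply."""

    def send(self, message: str) -> str:
        ...


class EchoDriver:
    """Toy driver for local experiments: returns the message it was given."""

    def send(self, message: str) -> str:
        return f"echo: {message}"


def run_turn(driver: AgentDriver, message: str) -> str:
    """Send one message through any object that satisfies the protocol."""
    return driver.send(message)
```

A structural protocol like this lets a new transport be added without inheriting from a base class, as long as it provides the same method.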

Contributing

git clone https://github.com/souratendu/anek
cd anek
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
pytest tests/

Pull requests welcome. If you build a new persona, verifier type, or driver, open a PR.

License

MIT — see LICENSE.

Project details

Version 0.1.0, released May 8, 2026.

