⛵️ Know how your agent performs before it goes live.

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

arklex-ai

These details have not been verified by PyPI

Project links

Documentation

Project description

⛵️ ArkSim

Simulate multi-turn conversations with your AI agent. Find failures before production.

Documentation · Examples · Report a Bug

https://github.com/user-attachments/assets/78706f27-cf49-41c1-8019-9dcbb8abc625

What is ArkSim?

Agents fail in ways that only show up mid-conversation. They misinterpret intent three turns in, call the wrong tool, or hallucinate a policy that does not exist. Single-turn testing misses all of this.

ArkSim generates LLM-powered synthetic users that hold realistic multi-turn conversations with your agent. Each user has a distinct profile, goal, and knowledge level. They push back, ask follow-ups, and behave like real users would.

You define scenarios, ArkSim simulates conversations, then evaluates every turn across metrics like helpfulness, faithfulness, and goal completion. The output is an interactive report showing exactly where your agent broke and why.

ArkSim flow: Scenarios → Simulation → Evaluation → Reports

Quickstart

Have an agent? Test it in 3 commands:

pip install arksim
export OPENAI_API_KEY="your-key"
arksim init
# Edit my_agent.py with your agent logic, then run:
arksim simulate-evaluate config.yaml

This generates config.yaml, scenarios.json, and a starter my_agent.py.

For HTTP or A2A agents: arksim init --agent-type chat_completions or arksim init --agent-type a2a. For Anthropic or Google as the evaluation LLM: pip install "arksim[anthropic]" or pip install "arksim[google]".

Just exploring? Try an example:

pip install arksim
export OPENAI_API_KEY="your-key"
arksim examples
cd examples/e-commerce
arksim simulate-evaluate config.yaml

What you'll see

ArkSim evaluation report showing scores, failure categories, and conversation viewer

The report tells you where your agent is strong and where it breaks. You get per-metric scores, categorized failures, and full conversation transcripts so you can read the exact turns where things went wrong.

Test Your Own Agent

Python class (default)

arksim init generates a my_agent.py with a BaseAgent subclass. Replace the execute() body with your agent logic:

from arksim.simulation_engine.agent.base import BaseAgent
from arksim.simulation_engine.tool_types import AgentResponse

class MyAgent(BaseAgent):
    async def get_chat_id(self) -> str:
        return "unique-id"

    async def execute(self, user_query: str, **kwargs: object) -> str | AgentResponse:
        # Replace with your agent logic
        return "agent response"

Chat Completions endpoint

agent_config:
  agent_type: chat_completions
  agent_name: my-agent
  api_config:
    endpoint: http://localhost:8000/v1/chat/completions

A2A protocol

agent_config:
  agent_type: a2a
  agent_name: my-agent
  api_config:
    endpoint: http://localhost:9999/agent

A2A agents can also surface tool calls for evaluation via the arksim tool call capture extension. See examples/customer-service/a2a_server/ for a runnable reference server.

Write scenarios that match your agent's domain. See the Scenarios documentation for how to define goals, user profiles, and knowledge.

Why ArkSim?

Simulation, not just evaluation. Most tools score conversations you already have. ArkSim generates them with synthetic users who push back, ask follow-ups, and behave unpredictably.
Multi-turn by default. Every test is a full conversation, not a single prompt. Context loss, tool misuse, and contradictions only show up across turns.
Any agent, any framework. Works with 14+ frameworks through Chat Completions, A2A, or direct Python import.
Runs in CI. Add it as a quality gate on every PR. Exits non-zero when your agent drops below threshold.
Fully open source. Runs on your infrastructure. Your data never leaves.

Test in Claude Code

ArkSim ships with a native Claude Code skill pack and MCP server. Generate scenarios from your agent code, run simulations, and debug failures inline.

pip install "arksim[claude]"
arksim setup-claude          # writes .mcp.json and .claude/skills/arksim-*/

Restart Claude Code (or run /mcp to reload) so the new skills and MCP server load. Then ask Claude to run any of the skills, by name or by what you want to do:

Skill (auto-invoked by Claude when relevant)	What it does
`arksim-simulate`	Run multi-turn simulated conversations against your agent. Same flow as `arksim-test`, simulation-first name.
`arksim-test`	First time: guided setup. After: run simulation + evaluation. Same flow as `arksim-simulate`.
`arksim-evaluate`	Re-evaluate a previous run with different metrics or thresholds.
`arksim-scenarios`	Generate or edit scenarios from your agent's code.
`arksim-results`	Drill into failures turn by turn, compare runs.
`arksim-ui`	Open the web dashboard for browsing results.

arksim setup-claude --dry-run previews changes without touching the filesystem. --uninstall removes only what setup-claude installed (it reads from a manifest, so third-party arksim-* skills are not affected).

See the integration README for the install guide and the trust model.

Integrations

Framework	Provider
Claude Agent SDK	Anthropic
OpenAI Agents SDK	OpenAI
Google ADK	Google
LangChain	LangChain
LangGraph	LangChain
CrewAI	CrewAI
Dify	Dify
AutoGen	Microsoft
LlamaIndex	LlamaIndex
Pydantic AI	Pydantic
Rasa	Rasa
Smolagents	Hugging Face
Mastra	TypeScript
Vercel AI SDK	TypeScript

See examples for end-to-end projects with custom metrics and scenarios.

Learn More

Topic
Evaluation metrics (built-in and custom)	Metrics guide
CI integration (pytest and GitHub Actions)	CI setup guide
Configuration reference (all YAML settings)	Schema reference
Simulation and CLI usage	Simulation guide
Web UI for browsing results	Overview

Development

git clone https://github.com/arklexai/arksim.git
cd arksim
pip install -e ".[dev]"
pytest tests/

Linting and formatting:

ruff check .
ruff format .

See CONTRIBUTING.md for guidelines.

License

Apache-2.0. See LICENSE.

Citation

@misc{shea2026sage,
      title={SAGE: A Top-Down Bottom-Up Knowledge-Grounded User Simulator for Multi-turn AGent Evaluation},
      author={Ryan Shea and Yunan Lu and Liang Qiu and Zhou Yu},
      year={2026},
      eprint={2510.11997},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2510.11997},
}

Star History

Project details

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

arklex-ai

These details have not been verified by PyPI

Project links

Documentation

Release history Release notifications | RSS feed

This version

0.3.7

May 18, 2026

0.3.6

May 4, 2026

0.3.5

Apr 29, 2026

0.3.4

Apr 18, 2026

0.3.3

Mar 27, 2026

0.3.2

Mar 23, 2026

0.3.1

Mar 18, 2026

0.3.0

Mar 17, 2026

0.2.0

Mar 10, 2026

0.1.0

Mar 5, 2026

0.0.6

Mar 4, 2026

0.0.4

Mar 3, 2026

0.0.3

Mar 3, 2026

0.0.2

Mar 3, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

arksim-0.3.7.tar.gz (2.5 MB view details)

Uploaded May 18, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

arksim-0.3.7-py3-none-any.whl (214.9 kB view details)

Uploaded May 18, 2026 Python 3

File details

Details for the file arksim-0.3.7.tar.gz.

File metadata

Download URL: arksim-0.3.7.tar.gz
Upload date: May 18, 2026
Size: 2.5 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for arksim-0.3.7.tar.gz
Algorithm	Hash digest
SHA256	`8652c16c76b363bb821991cf1735eb2bc199b912dad7e026eeaefb18a4c34f6a`
MD5	`3de9068dfff684febf1d5a45d914bde6`
BLAKE2b-256	`9509e46700a25602ef6131bf720a12185adaa3ba25e00d384b2c40585fb102c3`

See more details on using hashes here.

Provenance

The following attestation bundles were made for arksim-0.3.7.tar.gz:

Publisher: publish-pypi.yml on arklexai/arksim

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: arksim-0.3.7.tar.gz
- Subject digest: 8652c16c76b363bb821991cf1735eb2bc199b912dad7e026eeaefb18a4c34f6a
- Sigstore transparency entry: 1569070923
- Sigstore integration time: May 18, 2026
Source repository:
- Permalink: arklexai/arksim@321d8003879fa2d27b3377845efdc93848a8b6cc
- Branch / Tag: refs/tags/v0.3.7
- Owner: https://github.com/arklexai
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-pypi.yml@321d8003879fa2d27b3377845efdc93848a8b6cc
- Trigger Event: release

File details

Details for the file arksim-0.3.7-py3-none-any.whl.

File metadata

Download URL: arksim-0.3.7-py3-none-any.whl
Upload date: May 18, 2026
Size: 214.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for arksim-0.3.7-py3-none-any.whl
Algorithm	Hash digest
SHA256	`1f2ee54b16eaded8b75e90de934f4bd590cf5a8de77f8cc6f20fbcb4e4a2b055`
MD5	`01cfb9c8d587cf7e69edd110334660b0`
BLAKE2b-256	`a1d6d4a77a63ff6a5091cd0e8b3827c2adf73b40a6da32e232a6ac976e8be2c8`

See more details on using hashes here.

Provenance

The following attestation bundles were made for arksim-0.3.7-py3-none-any.whl:

Publisher: publish-pypi.yml on arklexai/arksim

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: arksim-0.3.7-py3-none-any.whl
- Subject digest: 1f2ee54b16eaded8b75e90de934f4bd590cf5a8de77f8cc6f20fbcb4e4a2b055
- Sigstore transparency entry: 1569070984
- Sigstore integration time: May 18, 2026
Source repository:
- Permalink: arklexai/arksim@321d8003879fa2d27b3377845efdc93848a8b6cc
- Branch / Tag: refs/tags/v0.3.7
- Owner: https://github.com/arklexai
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-pypi.yml@321d8003879fa2d27b3377845efdc93848a8b6cc
- Trigger Event: release

arksim 0.3.7

Navigation

Verified details

Project links

GitHub Statistics

Maintainers

Meta

Unverified details

Project links

Meta

Classifiers

Project description

⛵️ ArkSim

What is ArkSim?

Quickstart

Have an agent? Test it in 3 commands:

Just exploring? Try an example:

What you'll see

Test Your Own Agent

Python class (default)

Chat Completions endpoint

A2A protocol

Why ArkSim?

Test in Claude Code

Integrations

Learn More

Development

License

Citation

Star History

Project details

Verified details

Project links

GitHub Statistics

Maintainers

Meta

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance