Elasticdash AI test framework for Python
Project description
elasticdash_test
An AI-native test runner for ElasticDash workflow testing. Built for async AI pipelines — not a general-purpose test runner.
- Trace-first: every test receives a
ctx.traceto record and assert on LLM calls and tool invocations - Automatic interception for OpenAI, Gemini, and Grok via
httpx/requests— no manual instrumentation required - AI-specific matchers:
to_have_llm_step,to_call_tool,to_match_semantic_output,to_have_custom_step,to_have_prompt_where,to_evaluate_output_metric - Sequential execution, no parallelism overhead
- No pytest dependency
Installation
pip install elasticdash_test
Requires Python 3.10+.
Quick Start
1. Write a test file (my_flow.ai_test.py):
from elasticdash_test import ai_test, expect
@ai_test("checkout flow")
async def test_checkout(ctx):
await run_checkout(ctx)
expect(ctx.trace).to_have_llm_step(model="gpt-4o", contains="order confirmed")
expect(ctx.trace).to_call_tool("chargeCard")
2. Run it:
elasticdash test # discover all *.ai_test.py files
elasticdash test ./ai_tests # discover in a specific directory
elasticdash run my_flow.ai_test.py # run a single file
elasticdash dashboard # open workflows dashboard
3. Read the output:
✓ checkout flow (1.2s)
✗ refund flow (0.8s)
→ Expected tool "chargeCard" to be called, but no tool calls were recorded
2 passed
1 failed
Total: 3
Duration: 3.4s
Writing Tests
See the full guide in docs/test-writing-guidelines.md.
Decorators
Import from elasticdash_test and apply to functions — no global injection needed:
| Decorator | Description |
|---|---|
@ai_test(name) |
Register a test |
@before_all |
Run once before all tests in the file |
@before_each |
Run before every test in the file |
@after_each |
Run after every test in the file (runs even if the test fails) |
@after_all |
Run once after all tests in the file |
Test context
Each test function receives a ctx: AITestContext argument:
@ai_test("my test")
async def test_my_flow(ctx):
# ctx.trace — record and inspect LLM steps and tool calls
Recording trace data
Automatic interception (recommended): Call install_ai_interceptor() once in @before_all and the runner patches httpx/requests to record LLM steps for OpenAI, Gemini, and Grok calls automatically. See Automatic AI Interception below.
Manual recording: Use this for providers not covered by the interceptor, when testing against stubs/mocks, or to capture custom workflow steps:
ctx.trace.record_llm_step(
model="gpt-4o",
prompt="What is the order status?",
completion="The order has been confirmed.",
)
ctx.trace.record_tool_call(
name="chargeCard",
args={"amount": 99.99},
)
# Record custom workflow steps (RAG fetches, code/fixed steps, etc.)
ctx.trace.record_custom_step(
kind="rag", # 'rag' | 'code' | 'fixed' | 'custom'
name="pokemon-search",
tags=["sort:asc", "source:db"],
payload={"query": "pikachu attack"},
result={"ids": [25]},
metadata={"latency_ms": 120},
)
Matchers
to_have_llm_step(config?)
Assert the trace contains at least one LLM step matching the given config. All fields are optional and combined with AND logic.
expect(ctx.trace).to_have_llm_step(model="gpt-4o")
expect(ctx.trace).to_have_llm_step(contains="order confirmed") # searches prompt + completion
expect(ctx.trace).to_have_llm_step(prompt_contains="order status") # searches prompt only
expect(ctx.trace).to_have_llm_step(output_contains="order confirmed") # searches completion only
expect(ctx.trace).to_have_llm_step(provider="openai")
expect(ctx.trace).to_have_llm_step(provider="openai", prompt_contains="order status")
expect(ctx.trace).to_have_llm_step(prompt_contains="retry", times=3) # exactly 3 matching steps
expect(ctx.trace).to_have_llm_step(provider="openai", min_times=2) # at least 2 matching steps
expect(ctx.trace).to_have_llm_step(output_contains="error", max_times=1) # at most 1 matching step
| Field | Description |
|---|---|
model |
Exact model name match (e.g. 'gpt-4o') |
contains |
Substring match across prompt + completion (case-insensitive) |
prompt_contains |
Substring match in prompt only (case-insensitive) |
output_contains |
Substring match in completion only (case-insensitive) |
provider |
Provider name: 'openai', 'gemini', or 'grok' |
times |
Exact match count (fails unless exactly this many steps match) |
min_times |
Minimum match count (steps matching must be ≥ this value) |
max_times |
Maximum match count (steps matching must be ≤ this value) |
to_call_tool(tool_name)
Assert the trace contains a tool call with the given name.
expect(ctx.trace).to_call_tool("chargeCard")
to_match_semantic_output(expected, **options)
LLM-judged semantic match of combined LLM output vs. the expected string. Defaults to OpenAI GPT-4.1 with OPENAI_API_KEY.
# Minimal, using default OpenAI model
await expect(ctx.trace).to_match_semantic_output("order confirmed")
# Use a different provider
await expect(ctx.trace).to_match_semantic_output(
"attack stat",
provider="claude",
model="claude-3-opus-20240229",
)
# OpenAI-compatible endpoint (e.g., Moonshot/Kimi) via base_url + api_key
await expect(ctx.trace).to_match_semantic_output(
"order confirmed",
provider="openai",
model="kimi-k2-turbo-preview",
api_key=os.environ["KIMI_API_KEY"],
base_url="https://api.moonshot.ai/v1",
)
Environment keys by provider: OPENAI_API_KEY, ANTHROPIC_API_KEY, GEMINI_API_KEY (or GOOGLE_API_KEY), GROK_API_KEY.
to_evaluate_output_metric(config)
Evaluate one LLM step's prompt or result using an LLM and assert a numeric metric condition in the range 0.0–1.0. Defaults: target=result, condition=at_least 0.7, provider=openai, model=gpt-4.1.
# Evaluate the last LLM result; default condition at_least 0.7
await expect(ctx.trace).to_evaluate_output_metric(
evaluation_prompt="Rate how well this answers the user question.",
)
# Check a specific step (3rd LLM prompt), target the prompt text, require >= 0.8
await expect(ctx.trace).to_evaluate_output_metric(
evaluation_prompt="Score coherence of this prompt between 0 and 1.",
target="prompt",
nth=3,
condition={"at_least": 0.8},
provider="claude",
model="claude-3-opus-20240229",
)
# Custom comparator: score must be < 0.3
await expect(ctx.trace).to_evaluate_output_metric(
evaluation_prompt="Rate hallucination risk (0=none, 1=high).",
condition={"less_than": 0.3},
)
Options:
evaluation_prompt(required): your scoring instructions; model is asked to return only a number between 0 and 1.target:'result'(default) or'prompt'. Evaluates that text only.nth: pick which LLM step to score (1-based). Defaults to the last LLM step.condition: one ofgreater_than,less_than,at_least,at_most,equals; default is{"at_least": 0.7}. Fails if the score is outside 0.0–1.0 or cannot be parsed.provider/model/api_key/base_url: supports OpenAI, Claude, Gemini, Grok, and OpenAI-compatible endpoints viabase_url.
to_have_custom_step(config?)
Assert a recorded custom step (RAG/code/fixed/custom) matches filters.
expect(ctx.trace).to_have_custom_step(kind="rag", name="pokemon-search")
expect(ctx.trace).to_have_custom_step(tag="sort:asc")
expect(ctx.trace).to_have_custom_step(contains="pikachu")
expect(ctx.trace).to_have_custom_step(result_contains="25")
expect(ctx.trace).to_have_custom_step(kind="rag", min_times=1, max_times=2)
to_have_prompt_where(config)
Filter prompts, then assert additional constraints. Example: "all prompts containing A must also contain B".
# Prompts that contain "order" must also contain "confirmed"
expect(ctx.trace).to_have_prompt_where(filter_contains="order", require_contains="confirmed")
# Prompts containing "retry" must NOT contain "cancel"
expect(ctx.trace).to_have_prompt_where(filter_contains="retry", require_not_contains="cancel")
# Control counts on the filtered subset
expect(ctx.trace).to_have_prompt_where(
filter_contains="order",
require_contains="confirmed",
min_times=1,
max_times=3,
)
# Check a specific prompt position (1-based nth)
expect(ctx.trace).to_have_prompt_where(
filter_contains="order",
require_contains="confirmed",
nth=3, # the 3rd prompt among those containing "order"
)
Automatic AI Interception
Call install_ai_interceptor() in a @before_all hook and the runner patches httpx and requests before tests run, automatically recording LLM steps for:
| Provider | Endpoints intercepted |
|---|---|
| OpenAI | api.openai.com/v1/chat/completions, /v1/completions |
| Gemini | generativelanguage.googleapis.com/.../models/...:generateContent |
| Grok (xAI) | api.x.ai/v1/chat/completions |
Each intercepted call records model, provider, prompt, and completion into ctx.trace automatically. Your workflow code needs no changes.
from elasticdash_test import ai_test, before_all, after_all, install_ai_interceptor, uninstall_ai_interceptor, expect
@before_all
def setup():
install_ai_interceptor()
@after_all
def teardown():
uninstall_ai_interceptor()
@ai_test("user lookup flow")
async def test_user_lookup(ctx):
# This makes a real OpenAI call — intercepted automatically
await my_workflow.run("Find all active users")
# Works without any ctx.trace.record_llm_step() in your workflow
expect(ctx.trace).to_have_llm_step(prompt_contains="Find all active users")
expect(ctx.trace).to_have_llm_step(provider="openai")
Streaming: When stream=True is set on a request, the completion is recorded as "(streamed)" — the prompt and model are still captured.
Recording trace steps without passing ctx.trace (contextvars)
The runner sets a per-test current_trace using Python's contextvars, so your app code can record steps without threading ctx.trace through every function:
# In your test
from elasticdash_test import ai_test, set_current_trace, expect
@ai_test("flow test")
async def test_flow(ctx):
set_current_trace(ctx.trace) # bind the trace to the current async context
await run_flow_without_trace_arg() # your existing code
expect(ctx.trace).to_have_custom_step(kind="rag", name="pokemon-search")
# In your app/flow code (called during the test)
from elasticdash_test import get_current_trace
async def run_flow_without_trace_arg():
trace = get_current_trace()
if trace:
trace.record_custom_step(
kind="rag",
name="pokemon-search",
payload={"query": "pikachu attack"},
result={"ids": [25]},
tags=["source:db", "sort:asc"],
)
Configuration
Create an optional elasticdash.config.py at the project root:
config = {
"test_match": ["**/*.ai_test.py"],
"trace_mode": "local",
}
| Option | Default | Description |
|---|---|---|
test_match |
['**/*.ai_test.py'] |
Glob patterns for test discovery |
trace_mode |
'local' |
'local' (stub) or 'remote' (future ElasticDash backend) |
ed_agents.py, ed_workflows.py, ed_tools.py
These optional files are thin wrappers that bundle and re-export existing functions from your codebase. Load them automatically during test runs to provide agents, workflows, and tools to your test environment.
ed_agents.py
Re-export agent functions or create a config dict for easy reference:
# ed_agents.py — import from your app
from my_app.agents import checkout_agent, payment_agent
config = {
"checkout": checkout_agent,
"payment": payment_agent,
}
Access in tests:
@ai_test("checkout flow")
async def test_checkout(ctx, config):
agents = config.get("agents", {})
result = await agents["checkout"]("order-123")
ed_workflows.py
Re-export workflow functions from your application:
# ed_workflows.py
from my_app.workflows import order_workflow, refund_workflow
# Re-export directly — the runner will import this module
Access in tests:
@ai_test("full order workflow")
async def test_workflow(ctx):
from ed_workflows import order_workflow
result = await order_workflow("order-123", "cust-456")
expect(ctx.trace).to_call_tool("chargeCard")
ed_tools.py
Re-export tool functions that agents or workflows can invoke:
# ed_tools.py
from my_app.tools import charge_card, fetch_order_status, send_notification
Access in tests or workflows:
@ai_test("tool integration")
async def test_tools(ctx):
from ed_tools import fetch_order_status
status = await fetch_order_status("order-123")
expect(ctx.trace).to_have_custom_step(kind="external", name="fetch_order_status")
These files are loaded automatically if present in the project root.
Workflows Dashboard
Browse and search all available workflow functions in your project:
elasticdash dashboard # open dashboard at http://localhost:4573
elasticdash dashboard --port 4572 # use custom port
elasticdash dashboard --no-open # skip auto-opening browser
The dashboard scans ed_workflows.py and displays:
- Function names — all callable functions in the module
- Signatures — function parameters and return types
- Async indicator — marks async vs sync functions
- Source module — where the function is imported from (if not locally defined)
- File path — location of
ed_workflows.py
Use the search field to filter workflows by:
- Name — find workflow by function name (e.g.,
checkout_flow) - Source module — find all workflows from a specific module (e.g.,
app_workflows) - File path — filter by location in your codebase
This is useful for discovering available workflows, understanding their signatures, and identifying where functions are defined before calling them in tests.
Project Structure
elasticdash_test/
cli.py CLI entry point (click + glob)
runner.py Sequential test runner engine
reporter.py Color-coded terminal output
registry.py ai_test / before_all / after_all registry
trace.py TraceHandle, AITestContext, contextvars support
matchers.py Custom expect matchers
interceptors/
ai_interceptor.py Automatic httpx/requests interceptor for OpenAI / Gemini / Grok
Programmatic API
from elasticdash_test import install_ai_interceptor, uninstall_ai_interceptor
from elasticdash_test.runner import run_files
from elasticdash_test.reporter import print_results
install_ai_interceptor() # patch httpx/requests for automatic LLM tracing
results = await run_files(["./tests/flow.ai_test.py"])
print_results(results)
uninstall_ai_interceptor() # restore original transports when done
Non-Goals
This runner intentionally does not support:
- Parallel execution
- Watch mode
- Snapshot testing
- Coverage reporting
- pytest compatibility
License
MIT
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file elasticdash_sdk-0.1.2a2.tar.gz.
File metadata
- Download URL: elasticdash_sdk-0.1.2a2.tar.gz
- Upload date:
- Size: 89.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
4ec4a6c0862260cae87c169536a8c85786e8f3c585ec9bc693b288145a5d4d9d
|
|
| MD5 |
7cbe9bea6a36b4f60e65da022e02acd5
|
|
| BLAKE2b-256 |
31a6f6eace6df84719faae7fbaecf9d676c2f1db2617dc9c4c418238e00aa568
|
File details
Details for the file elasticdash_sdk-0.1.2a2-py3-none-any.whl.
File metadata
- Download URL: elasticdash_sdk-0.1.2a2-py3-none-any.whl
- Upload date:
- Size: 101.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
d49506061015baca594b5cf4179698ad93bb824124ef3e1aff7bf489c80f0462
|
|
| MD5 |
45ddb10373cda168e457fe8ad5671bc9
|
|
| BLAKE2b-256 |
ae74c6a24aede52dc69977ec0d02ef16db23ce09e0e7de48f2e3451e77f673a4
|