Automated quality assurance for AI applications
pixie-qa
Eval-driven development for Python LLM applications.
pixie-qa ships two complementary tools:
- eval-driven-dev agent skill: guides a coding agent through the full eval-driven development loop (instrument → capture → build dataset → test → investigate → iterate).
- pixie-qa Python package: the runtime, providing wrap() for data-boundary instrumentation, Runnable for dataset-driven test execution, built-in and custom evaluators, and the pixie CLI.
Agent Skill
Install
npx skills add yiouli/pixie-qa
Usage
Open a conversation with your coding agent and say something like:
"set up QA for my app"
The agent follows a six-step workflow:
- Understand the app: entry point, execution flow, expected behaviors
- Instrument with wrap(): mark data boundaries in the production code path
- Define evaluators: map quality criteria to built-in or custom evaluators
- Build a dataset: diverse representative scenarios in JSON
- Run pixie test: real pass/fail scores for every scenario
- Investigate & iterate: root-cause failures and fix
Python Package
Install
pip install pixie-qa
# with an LLM provider auto-instrumentor:
pip install "pixie-qa[openai]" # openai | anthropic | langchain | google | dspy | all
wrap() — instrument data boundaries
Call wrap() at data boundaries in your application code. At test time, wrap(purpose="input") values are injected from the dataset; wrap(purpose="output") values are captured and scored by evaluators.
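from pixie import wrap

# fetch_from_db and generate_response stand for your own application code.
# At test time, the "db_result" value is injected from the dataset entry.
db_result = wrap(fetch_from_db(user_id), purpose="input", name="db_result")
# The "response" value is captured and scored by the dataset's evaluators.
response = wrap(generate_response(db_result), purpose="output", name="response")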
| Purpose | Meaning |
|---|---|
| `"input"` | External data fed into the LLM (injected at test time) |
| `"output"` | Final or intermediate output to evaluate |
| `"state"` | Intermediate state captured for debugging |
Runnable — run the app against each dataset entry
Implement the Runnable protocol so pixie test and pixie trace know how to run your app:
from pydantic import BaseModel

import pixie


class MyArgs(BaseModel):
    # Fields match the keys of each dataset entry's "entry_kwargs".
    user_id: str
    message: str


class MyAppRunnable(pixie.Runnable[MyArgs]):
    @classmethod
    def create(cls) -> "MyAppRunnable":
        return cls()

    async def setup(self) -> None:
        pass  # one-time initialization before entries run

    async def run(self, args: MyArgs) -> None:
        # my_app stands for your application module; call its real entry point.
        await my_app.handle(args.user_id, args.message)

    async def teardown(self) -> None:
        pass  # one-time cleanup after all entries finish
run() is called concurrently for all dataset entries — protect shared mutable state with asyncio.Semaphore or asyncio.Lock if needed.
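The concurrency note above can be illustrated with a standalone sketch. ToyRunnable and drive are hypothetical names; ToyRunnable only copies the create/setup/run/teardown shape and does not subclass pixie.Runnable, and drive assumes (it is not pixie's runner) that entries are scheduled with asyncio.gather after a single setup call.

```python
import asyncio


class ToyRunnable:
    """Hypothetical stand-in with the create/setup/run/teardown shape shown above.

    It does not subclass pixie.Runnable, so it runs without pixie installed.
    """

    @classmethod
    def create(cls) -> "ToyRunnable":
        return cls()

    async def setup(self) -> None:
        self.lock = asyncio.Lock()         # serializes read-modify-write of self.total
        self.limit = asyncio.Semaphore(2)  # at most 2 simulated backend calls at once
        self.total = 0
        self.active = 0
        self.peak = 0

    async def run(self, args: dict) -> None:
        async with self.limit:
            self.active += 1
            self.peak = max(self.peak, self.active)
            await asyncio.sleep(0.01)      # simulated LLM/API latency
            self.active -= 1
        async with self.lock:
            current = self.total
            await asyncio.sleep(0)         # yield point: without the lock, updates could be lost
            self.total = current + 1

    async def teardown(self) -> None:
        pass


async def drive(entries: list[dict]) -> tuple[int, int]:
    # Hypothetical driver: set up once, run every entry concurrently, tear down once.
    runnable = ToyRunnable.create()
    await runnable.setup()
    try:
        await asyncio.gather(*(runnable.run(entry) for entry in entries))
    finally:
        await runnable.teardown()
    return runnable.total, runnable.peak
```

For five entries, asyncio.run(drive([{"user_id": f"u{i}"} for i in range(5)])) returns (5, 2): every increment survives, and the semaphore keeps at most two simulated calls in flight. The lock matters only because the critical section contains an await; a plain list append with no await in between would not need one.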
Dataset JSON format
{
"runnable": "pixie_qa/scripts/run_app.py:MyAppRunnable",
"evaluators": ["Factuality"],
"entries": [
{
"entry_kwargs": { "user_id": "u1", "message": "What is my balance?" },
"test_case": {
"eval_input": [
{ "purpose": "input", "name": "db_result", "data": { "balance": 120.5 } }
],
"expectation": "Your current balance is $120.50.",
"description": "basic balance query"
}
}
]
}
Use pixie trace + pixie format to capture real traces and turn them into dataset entries with the correct data shapes.
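How pixie itself loads this file is not described here, so the following is only a sketch of reading the format shown above with the standard library. It assumes the "runnable" value is a "file.py:ClassName" reference, that entry_kwargs map one-to-one onto the Runnable's argument fields, and that each eval_input item's name matches a wrap(..., name=...) call. MyArgs is re-declared as a dataclass to avoid the pydantic dependency; split_runnable_ref and load_entries are hypothetical helpers.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from typing import Any


@dataclass
class MyArgs:
    # Mirrors the pydantic MyArgs model above, as a dataclass to stay stdlib-only.
    user_id: str
    message: str


def split_runnable_ref(ref: str) -> tuple[str, str]:
    """Split "path/to/file.py:ClassName" into (file path, class name)."""
    path, sep, cls_name = ref.rpartition(":")  # last ":" so Windows drive letters survive
    if not sep or not path or not cls_name:
        raise ValueError(f"expected 'file.py:ClassName', got {ref!r}")
    return path, cls_name


def load_entries(text: str) -> list[tuple[MyArgs, dict[str, Any]]]:
    """Return (Runnable args, injected inputs by name) for every dataset entry."""
    dataset = json.loads(text)
    pairs: list[tuple[MyArgs, dict[str, Any]]] = []
    for entry in dataset["entries"]:
        args = MyArgs(**entry["entry_kwargs"])  # entry_kwargs become the run() args
        injected = {
            item["name"]: item["data"]
            for item in entry["test_case"]["eval_input"]
            if item["purpose"] == "input"
        }
        pairs.append((args, injected))
    return pairs
```

Applied to the example dataset, load_entries yields one pair: MyArgs(user_id="u1", message="What is my balance?") and {"db_result": {"balance": 120.5}}.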
Evaluators
| Evaluator | Task |
|---|---|
| `Factuality` | LLM-as-judge factual accuracy |
| `ClosedQA` | LLM-as-judge Q&A with reference answer |
| `AnswerCorrectness` | RAGAS combined factual + semantic similarity |
| `EmbeddingSimilarity` | Cosine similarity between output and expectation |
| `ExactMatch` | Deterministic exact string match |
| `create_llm_evaluator` | Custom prompt-based LLM-as-judge |
Full evaluator list: docs/pixie/index.md
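The two non-judge scores in the table reduce to simple formulas. The functions below are illustrative only; they are not pixie's evaluator classes or signatures, and the embedding step that EmbeddingSimilarity would perform is skipped by passing vectors in directly.

```python
import math


def exact_match(output: str, expectation: str) -> float:
    """Deterministic score: 1.0 if the strings are identical, else 0.0."""
    return 1.0 if output == expectation else 0.0


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors, e.g. two text embeddings."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # define similarity with a zero vector as 0
    return dot / (norm_a * norm_b)
```

Parallel vectors score 1.0 regardless of length, orthogonal ones score 0.0, which is why cosine similarity tolerates rephrasing that exact matching does not.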
CLI reference
| Command | Description |
|---|---|
| `pixie test [path]` | Run eval tests; open scorecard in browser |
| `pixie trace --runnable R --input I --output O` | Run a Runnable, capture trace to JSONL |
| `pixie format --input I --output O` | Convert a trace JSONL to a dataset entry JSON |
| `pixie analyze <test_run_id>` | LLM analysis of a completed test run |
| `pixie init [root]` | Scaffold the pixie_qa/ working directory |
| `pixie start [root]` | Launch the web UI at http://localhost:7118 |
Web UI
View all eval artifacts (results, datasets, markdown docs) in a live-updating local web UI:
pixie start # initializes pixie_qa/ (if needed) and opens http://localhost:7118
pixie start my_dir # use a custom artifact root
pixie init # scaffolds pixie_qa/ without starting the server
Changes to artifacts are pushed to the browser in real time via Server-Sent Events (SSE).
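SSE is a plain-text protocol: each message is a set of "field: value" lines ended by a blank line. The sketch below shows that framing in general; the event name "artifact_changed" and its payload are hypothetical, since the names and shapes of pixie's own events are not documented here.

```python
import json


def format_sse_event(event: str, payload: dict) -> str:
    """Encode one Server-Sent Events message in text/event-stream format."""
    # json.dumps without indent yields a single line (newlines inside strings are
    # escaped), so one "data:" field suffices; a blank line ends the event.
    return f"event: {event}\ndata: {json.dumps(payload)}\n\n"


def parse_sse_event(raw: str) -> tuple[str, dict]:
    """Decode one message produced by format_sse_event (minimal, single event)."""
    event, data_lines = "message", []  # "message" is the spec's default event type
    for line in raw.splitlines():
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # the spec strips exactly one leading space
        if field == "event":
            event = value
        elif field == "data":
            data_lines.append(value)
    return event, json.loads("\n".join(data_lines))
```

In a browser, the receiving side would normally be an EventSource listener rather than a hand-written parser; the parser here only demonstrates that the framing round-trips.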
Configuration
Pixie reads configuration from environment variables and a local .env file. Existing process env vars take priority over .env values.
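The precedence rule above (process environment first, .env only for keys that are still unset) can be sketched with a minimal parser. apply_dotenv is a hypothetical helper, not pixie's loader, and it handles only simple KEY=VALUE lines; real .env parsers such as python-dotenv support more syntax.

```python
from __future__ import annotations

from collections.abc import MutableMapping


def apply_dotenv(text: str, environ: MutableMapping[str, str]) -> None:
    """Merge KEY=VALUE lines from .env text into environ without overriding."""
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments, and lines without "="
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]  # drop one pair of matching surrounding quotes
        # Existing process variables take priority: only fill in missing keys.
        if key not in environ:
            environ[key] = value
```

Against the real environment this would be called as apply_dotenv(Path(".env").read_text(), os.environ). If PIXIE_ROOT is already exported in the shell, a PIXIE_ROOT line in .env is ignored.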
| Variable | Description |
|---|---|
| `PIXIE_ROOT` | Root directory for all generated artifacts |
| `PIXIE_RATE_LIMIT_ENABLED` | Set to `true` to enable evaluator throttling |
| `PIXIE_RATE_LIMIT_RPS` | Max requests per second for LLM-as-judge calls |
| `PIXIE_RATE_LIMIT_RPM` | Max requests per minute |
| `PIXIE_RATE_LIMIT_TPS` | Max tokens per second |
| `PIXIE_RATE_LIMIT_TPM` | Max tokens per minute |
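The four limits pair a budget with a window (per second or per minute) and count either requests or tokens. How pixie enforces them is not described here; the class below is a generic sliding-window sketch, with the hypothetical name SlidingWindowLimiter, showing how one mechanism covers both kinds: a cost of 1 per call for RPS/RPM, and the call's token count for TPS/TPM.

```python
from __future__ import annotations

import time
from collections import deque
from typing import Callable


class SlidingWindowLimiter:
    """Admit at most `limit` units of cost in any rolling `window` seconds.

    cost=1 per call models a request limit (RPS/RPM); cost=token count models
    a token limit (TPS/TPM). Generic sketch, not pixie's throttling code.
    """

    def __init__(self, limit: int, window: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self.events: deque[tuple[float, int]] = deque()  # (timestamp, cost)
        self.used = 0

    def try_acquire(self, cost: int = 1) -> bool:
        now = self.clock()
        # Expire entries that are at least `window` seconds old.
        while self.events and now - self.events[0][0] >= self.window:
            _, old_cost = self.events.popleft()
            self.used -= old_cost
        if self.used + cost <= self.limit:
            self.events.append((now, cost))
            self.used += cost
            return True
        return False  # over budget: the caller should wait and retry
```

An RPS limit of 2 would be SlidingWindowLimiter(2, 1.0) and a TPM limit of 1000 would be SlidingWindowLimiter(1000, 60.0); a judge call would need to pass every configured limiter before proceeding. The injectable clock exists so the behavior can be tested without sleeping.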
Download files
File details
Details for the file pixie_qa-0.6.0.tar.gz.
File metadata
- Download URL: pixie_qa-0.6.0.tar.gz
- Size: 326.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6e042c68a87e4afbeb6b73298fac5b28faa58c5e24cdb17e48f0585161c9f001 |
| MD5 | 9a902fcf45383c06316325692ffd2f29 |
| BLAKE2b-256 | c1657ee320c5df8f7ceef771139ce4c6a1e856ef5e7ca65bc3fb7fa13d23f701 |
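A downloaded file can be checked against these digests locally. The sketch below uses only hashlib; file_digests and verify are hypothetical helper names, and BLAKE2b-256 corresponds to hashlib.blake2b with a 32-byte digest.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def file_digests(path: Path, chunk_size: int = 1 << 20) -> dict[str, str]:
    """Hex digests for the three algorithms in the hash table above."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),  # 32 bytes = 256 bits
    }
    with path.open("rb") as fh:
        # Read in chunks so large archives need not fit in memory.
        while chunk := fh.read(chunk_size):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def verify(path: Path, expected_sha256: str) -> bool:
    """True if the file's SHA256 matches the published hex digest."""
    return file_digests(path)["SHA256"] == expected_sha256.lower()
```

For example, verify(Path("pixie_qa-0.6.0.tar.gz"), "6e042c68a87e4afbeb6b73298fac5b28faa58c5e24cdb17e48f0585161c9f001") should return True for an untampered download. pip can perform the same check at install time in hash-checking mode (--require-hashes with --hash=sha256:... entries in a requirements file).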
Provenance
The following attestation bundles were made for pixie_qa-0.6.0.tar.gz:
- Publisher: publish.yml on yiouli/pixie-qa
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pixie_qa-0.6.0.tar.gz
- Subject digest: 6e042c68a87e4afbeb6b73298fac5b28faa58c5e24cdb17e48f0585161c9f001
- Sigstore transparency entry: 1258191363
- Permalink: yiouli/pixie-qa@7e9e5a79bc941b40237a19083c15bf63941e8c8c
- Branch / Tag: refs/heads/main
- Owner: https://github.com/yiouli
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@7e9e5a79bc941b40237a19083c15bf63941e8c8c
- Trigger Event: push
File details
Details for the file pixie_qa-0.6.0-py3-none-any.whl.
File metadata
- Download URL: pixie_qa-0.6.0-py3-none-any.whl
- Size: 339.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a03c13e8e23e416a3a641714db47a5fc6b429f3e85cd43d283f94ce4d8a2d317 |
| MD5 | 01701d0d45eeaca08508ccd938d29d8a |
| BLAKE2b-256 | 1cc55abd88f616fdf4c96b0a424871e6f5d4e7c0f7a10aabbb50836859b3a5cc |
Provenance
The following attestation bundles were made for pixie_qa-0.6.0-py3-none-any.whl:
- Publisher: publish.yml on yiouli/pixie-qa
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pixie_qa-0.6.0-py3-none-any.whl
- Subject digest: a03c13e8e23e416a3a641714db47a5fc6b429f3e85cd43d283f94ce4d8a2d317
- Sigstore transparency entry: 1258191372
- Permalink: yiouli/pixie-qa@7e9e5a79bc941b40237a19083c15bf63941e8c8c
- Branch / Tag: refs/heads/main
- Owner: https://github.com/yiouli
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@7e9e5a79bc941b40237a19083c15bf63941e8c8c
- Trigger Event: push