Automated quality assurance for AI applications

Maintainers

yiouli

Project description

pixie-qa

Eval-driven development for Python LLM applications.

pixie-qa ships two complementary tools:

- eval-driven-dev agent skill: guides a coding agent through the full eval-driven development loop (instrument → capture → build dataset → test → investigate → iterate).
- pixie-qa Python package: the runtime, comprising wrap() for data-boundary instrumentation, Runnable for dataset-driven test execution, built-in and custom evaluators, and the pixie CLI.

Agent Skill

Install

npx skills add yiouli/pixie-qa

Usage

Open a conversation with your coding agent and say something like:

"set up QA for my app"

The agent follows a six-step workflow:

1. Understand the app: entry point, execution flow, expected behaviors
2. Instrument with wrap(): mark data boundaries in the production code path
3. Define evaluators: map quality criteria to built-in or custom evaluators
4. Build a dataset: diverse, representative scenarios in JSON
5. Run pixie test: real pass/fail scores for every scenario
6. Investigate & iterate: root-cause failures and fix them

Python Package

Install

pip install pixie-qa
# with an LLM provider auto-instrumentor:
pip install "pixie-qa[openai]"   # openai | anthropic | langchain | google | dspy | all

`wrap()` — instrument data boundaries

Call wrap() at data boundaries in your application code. At test time, wrap(purpose="input") values are injected from the dataset; wrap(purpose="output") values are captured and scored by evaluators.

from pixie import wrap

db_result = wrap(fetch_from_db(user_id), purpose="input", name="db_result")
response   = wrap(generate_response(db_result), purpose="output", name="response")

Purpose	Meaning
`"input"`	External data fed into the LLM (injected at test time)
`"output"`	Final or intermediate output to evaluate
`"state"`	Intermediate state captured for debugging
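To make the inject/capture semantics concrete, here is a minimal toy sketch of the behavior described above. It is not pixie's implementation: `ToyHarness` and `handle` are hypothetical names invented for this illustration, and the real `wrap()` is a module-level function whose internals are not shown here.

```python
from typing import Any, Dict, Optional


class ToyHarness:
    """Hypothetical stand-in that mimics the documented wrap() semantics."""

    def __init__(self, injected: Optional[Dict[str, Any]] = None) -> None:
        self.injected = injected or {}       # values a dataset entry would supply
        self.captured: Dict[str, Any] = {}   # outputs/state recorded for later use

    def wrap(self, value: Any, *, purpose: str, name: str) -> Any:
        if purpose == "input" and name in self.injected:
            return self.injected[name]       # test time: replace the live value
        if purpose in ("output", "state"):
            self.captured[name] = value      # record for evaluators or debugging
        return value                         # otherwise pass through unchanged


def handle(harness: ToyHarness) -> str:
    # The live fetch would return 0.0 here; the dataset value replaces it.
    db_result = harness.wrap({"balance": 0.0}, purpose="input", name="db_result")
    reply = f"Your current balance is ${db_result['balance']:.2f}."
    return harness.wrap(reply, purpose="output", name="response")


harness = ToyHarness(injected={"db_result": {"balance": 120.5}})
result = handle(harness)
print(result)
print(harness.captured)
```

The point of the sketch is that the application code path stays the same in production and under test; only the values flowing through the marked boundaries change.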

`Runnable` — run the app against each dataset entry

Implement the Runnable protocol so pixie test and pixie trace know how to run your app:

from pydantic import BaseModel
import pixie

class MyArgs(BaseModel):
    user_id: str
    message: str

class MyAppRunnable(pixie.Runnable[MyArgs]):
    @classmethod
    def create(cls) -> "MyAppRunnable":
        return cls()

    async def setup(self) -> None:
        pass  # one-time initialization before entries run

    async def run(self, args: MyArgs) -> None:
        # my_app is a placeholder for your own application module (import it above)
        await my_app.handle(args.user_id, args.message)

    async def teardown(self) -> None:
        pass  # one-time cleanup after all entries finish

run() is called concurrently for all dataset entries — protect shared mutable state with asyncio.Semaphore or asyncio.Lock if needed.
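As an illustration of that advice, the following sketch caps concurrency with asyncio.Semaphore. It uses only the standard library and does not subclass pixie.Runnable; `SharedClient` and `ThrottledRunner` are hypothetical names, and the limit of 2 is an arbitrary choice for the demo.

```python
import asyncio
from typing import Optional


class SharedClient:
    """Hypothetical shared resource; tracks how many callers are active at once."""

    def __init__(self) -> None:
        self.active = 0
        self.peak = 0

    async def call(self, message: str) -> str:
        self.active += 1
        self.peak = max(self.peak, self.active)
        await asyncio.sleep(0.01)  # simulate I/O while holding the resource
        self.active -= 1
        return message.upper()


class ThrottledRunner:
    """Mirrors the setup()/run() shape, with a semaphore guarding the client."""

    def __init__(self, limit: int = 2) -> None:
        self.client = SharedClient()
        self._limit = limit
        self._sem: Optional[asyncio.Semaphore] = None

    async def setup(self) -> None:
        # Create the semaphore once, inside the running event loop.
        self._sem = asyncio.Semaphore(self._limit)

    async def run(self, message: str) -> str:
        assert self._sem is not None, "setup() must run first"
        async with self._sem:  # at most `limit` coroutines pass this point
            return await self.client.call(message)


async def main() -> int:
    runner = ThrottledRunner(limit=2)
    await runner.setup()
    # Launch ten concurrent runs, as a test harness might for ten entries.
    await asyncio.gather(*(runner.run(f"msg {i}") for i in range(10)))
    return runner.client.peak


peak = asyncio.run(main())
print("peak concurrent calls:", peak)
```

Creating the semaphore in setup() rather than at import time keeps it tied to the event loop that actually runs the entries.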

Dataset JSON format

{
  "runnable": "pixie_qa/scripts/run_app.py:MyAppRunnable",
  "evaluators": ["Factuality"],
  "entries": [
    {
      "input_data": { "user_id": "u1", "message": "What is my balance?" },
      "test_case": {
        "eval_input": [
          {
            "purpose": "input",
            "name": "db_result",
            "data": { "balance": 120.5 }
          }
        ],
        "expectation": "Your current balance is $120.50.",
        "description": "basic balance query"
      }
    }
  ]
}

Run pixie trace to capture a real trace, then pixie format to convert it into a dataset entry with the correct data shapes.
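When entries are written by hand or generated by a script, the JSON shape shown above can be assembled with the standard library. This is a sketch only: the field names are copied from the example, while `make_entry` is a hypothetical helper and not part of the pixie package.

```python
import json
from typing import Any, Dict


def make_entry(
    input_data: Dict[str, Any],
    injected: Dict[str, Any],
    expectation: str,
    description: str,
) -> Dict[str, Any]:
    """Build one dataset entry; `injected` maps wrap() names to their data."""
    return {
        "input_data": input_data,
        "test_case": {
            "eval_input": [
                {"purpose": "input", "name": name, "data": data}
                for name, data in injected.items()
            ],
            "expectation": expectation,
            "description": description,
        },
    }


dataset = {
    "runnable": "pixie_qa/scripts/run_app.py:MyAppRunnable",
    "evaluators": ["Factuality"],
    "entries": [
        make_entry(
            {"user_id": "u1", "message": "What is my balance?"},
            {"db_result": {"balance": 120.5}},
            "Your current balance is $120.50.",
            "basic balance query",
        )
    ],
}

text = json.dumps(dataset, indent=2)
print(text)
```

Each name in `eval_input` should match the `name` passed to the corresponding wrap(purpose="input") call in the application.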

Evaluators

Evaluator	Task
`Factuality`	LLM-as-judge factual accuracy
`ClosedQA`	LLM-as-judge Q&A with reference answer
`AnswerCorrectness`	RAGAS combined factual + semantic similarity
`EmbeddingSimilarity`	Cosine similarity between output and expectation
`ExactMatch`	Deterministic exact string match
`create_llm_evaluator`	Custom prompt-based LLM-as-judge

Full evaluator list: docs/pixie/index.md
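For intuition about the two deterministic scoring ideas in the table, here is a plain-Python sketch of cosine similarity and exact string matching. It is not the code behind `EmbeddingSimilarity` or `ExactMatch`: the function names, the 1.0/0.0 scoring scale, and the zero-vector convention are all assumptions made for illustration, and a real embedding evaluator would first turn text into vectors with an embedding model.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here; similarity is undefined for zero vectors
    return dot / (norm_a * norm_b)


def exact_match(output: str, expectation: str) -> float:
    """Assumed scoring scale: 1.0 for identical strings, 0.0 otherwise."""
    return 1.0 if output == expectation else 0.0


same_direction = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
matched = exact_match("Your current balance is $120.50.", "Your current balance is $120.50.")
print(same_direction, orthogonal, matched)
```

Vectors pointing in the same direction score close to 1 regardless of magnitude, which is why cosine similarity suits comparing embeddings of differently worded but semantically similar texts.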

CLI reference

Command	Description
`pixie test [path]`	Run eval tests; open scorecard in browser
`pixie trace --runnable R --input I --output O`	Run a Runnable, capture trace to JSONL
`pixie format --input I --output O`	Convert a trace JSONL to a dataset entry JSON
`pixie init [root]`	Scaffold the `pixie_qa/` working directory
`pixie start [root]`	Launch the web UI at `http://localhost:7118`

Web UI

View all eval artifacts (results, datasets, markdown docs) in a live-updating local web UI:

pixie start              # initializes pixie_qa/ (if needed) and opens http://localhost:7118
pixie start my_dir       # use a custom artifact root
pixie init               # scaffolds pixie_qa/ without starting the server

Changes to artifacts are pushed to the browser in real time via SSE.

Configuration

Pixie reads configuration from environment variables and a local .env file. Existing process env vars take priority over .env values.

Variable	Description
`PIXIE_ROOT`	Root directory for all generated artifacts
`PIXIE_RATE_LIMIT_ENABLED`	`true` to enable evaluator throttling
`PIXIE_RATE_LIMIT_RPS`	Max requests per second for LLM-as-judge calls
`PIXIE_RATE_LIMIT_RPM`	Max requests per minute
`PIXIE_RATE_LIMIT_TPS`	Max tokens per second
`PIXIE_RATE_LIMIT_TPM`	Max tokens per minute
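The precedence rule (process environment over .env) can be sketched as follows. This is an illustration of the rule only, not pixie's loader: `parse_dotenv` and `resolve` are hypothetical helpers, the parser handles only simple KEY=VALUE lines, and the values shown are made up.

```python
from typing import Dict, Mapping


def parse_dotenv(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines; skips blanks, comments, and malformed lines."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve(dotenv_text: str, environ: Mapping[str, str]) -> Dict[str, str]:
    """Start from .env values, then let process env vars override them."""
    merged = parse_dotenv(dotenv_text)
    merged.update({k: v for k, v in environ.items() if k.startswith("PIXIE_")})
    return merged


# Illustrative values only.
dotenv_text = """
# local defaults
PIXIE_ROOT=pixie_qa
PIXIE_RATE_LIMIT_ENABLED=true
PIXIE_RATE_LIMIT_RPS=2
"""
process_env = {"PIXIE_RATE_LIMIT_RPS": "5", "HOME": "/tmp"}

config = resolve(dotenv_text, process_env)
print(config)
```

In a real process, `os.environ` would be passed in place of the `process_env` dictionary.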

Release history

Version	Release date
0.8.6	Apr 22, 2026
0.8.5	Apr 22, 2026
0.8.4	Apr 21, 2026
0.8.3	Apr 20, 2026
0.8.2	Apr 20, 2026
0.8.1	Apr 17, 2026
0.8.0	Apr 14, 2026
0.7.3	Apr 14, 2026
0.7.2	Apr 13, 2026
0.7.1 (this version)	Apr 13, 2026
0.7.0	Apr 13, 2026
0.6.1	Apr 9, 2026
0.6.0	Apr 8, 2026
0.5.1	Apr 8, 2026
0.5.0	Apr 8, 2026
0.4.0	Apr 5, 2026
0.2.2	Mar 28, 2026
0.2.1	Mar 28, 2026
0.2.0	Mar 27, 2026
0.1.12	Mar 24, 2026
0.1.11	Mar 23, 2026
0.1.10	Mar 20, 2026
0.1.8	Mar 19, 2026
0.1.3	Mar 12, 2026
0.1.2	Mar 12, 2026
0.1.1	Mar 12, 2026
0.1.0	Mar 12, 2026

Download files

Source Distribution

pixie_qa-0.7.1.tar.gz (603.3 kB)

Uploaded Apr 13, 2026 Source

Built Distribution

pixie_qa-0.7.1-py3-none-any.whl (618.8 kB)

Uploaded Apr 13, 2026 Python 3

File details

Details for the file pixie_qa-0.7.1.tar.gz.

File metadata

Download URL: pixie_qa-0.7.1.tar.gz
Upload date: Apr 13, 2026
Size: 603.3 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pixie_qa-0.7.1.tar.gz
Algorithm	Hash digest
SHA256	`a5bb824c1a6a88cfd25f29a0d8e5987ad78b17c9a67cc2dff6519e6b84334a46`
MD5	`4e5f773e7db65cf9e283ac7c8f05bb1c`
BLAKE2b-256	`3045ca66540690f13eb188578fa477ceb46343517250161852e2047c6f4fdb69`
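A downloaded file can be checked against a published digest with Python's hashlib. The sketch below is a generic verification routine, not something shipped with pixie-qa; `sha256_of` and `verify` are hypothetical helper names, and the demo runs on a throwaway temporary file so that it works without the real archive.

```python
import hashlib
import os
import tempfile


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Stream the file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected_hex: str) -> bool:
    """Compare the computed digest with the published one, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()


# Demo on a temporary file whose digest is computed independently.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
demo_path = tmp.name

demo_ok = verify(demo_path, hashlib.sha256(b"hello").hexdigest())
demo_bad = verify(demo_path, "0" * 64)
os.remove(demo_path)
print(demo_ok, demo_bad)
```

For the source distribution, the call would be `verify("pixie_qa-0.7.1.tar.gz", "a5bb824c1a6a88cfd25f29a0d8e5987ad78b17c9a67cc2dff6519e6b84334a46")`, using the SHA256 value from the table above.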

Provenance

The following attestation bundles were made for pixie_qa-0.7.1.tar.gz:

Publisher: publish.yml on yiouli/pixie-qa

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pixie_qa-0.7.1.tar.gz
- Subject digest: a5bb824c1a6a88cfd25f29a0d8e5987ad78b17c9a67cc2dff6519e6b84334a46
- Sigstore transparency entry: 1289683674
- Sigstore integration time: Apr 13, 2026
Source repository:
- Permalink: yiouli/pixie-qa@1836ad0b157ab26bfcbd2968f7828a5f73ee4a2b
- Branch / Tag: refs/heads/main
- Owner: https://github.com/yiouli
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@1836ad0b157ab26bfcbd2968f7828a5f73ee4a2b
- Trigger Event: push

File details

Details for the file pixie_qa-0.7.1-py3-none-any.whl.

File metadata

Download URL: pixie_qa-0.7.1-py3-none-any.whl
Upload date: Apr 13, 2026
Size: 618.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pixie_qa-0.7.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`678e21f7f60f6e87ff67a63cb9e705ce1c90e1828cdfcb37522c503d46ecff9d`
MD5	`6cef5dec1abef584d6db24ae3085ef12`
BLAKE2b-256	`dbf3bc0d9e64ab76445b0d53f11f5b0d84e73278fa2251b0f785b38c4ccce828`

Provenance

The following attestation bundles were made for pixie_qa-0.7.1-py3-none-any.whl:

Publisher: publish.yml on yiouli/pixie-qa

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pixie_qa-0.7.1-py3-none-any.whl
- Subject digest: 678e21f7f60f6e87ff67a63cb9e705ce1c90e1828cdfcb37522c503d46ecff9d
- Sigstore transparency entry: 1289683795
- Sigstore integration time: Apr 13, 2026
Source repository:
- Permalink: yiouli/pixie-qa@1836ad0b157ab26bfcbd2968f7828a5f73ee4a2b
- Branch / Tag: refs/heads/main
- Owner: https://github.com/yiouli
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@1836ad0b157ab26bfcbd2968f7828a5f73ee4a2b
- Trigger Event: push
