Generate balanced AI eval fixtures from source examples, bugs, docs, and policies. Python port of @mukundakatta/eval-dataset-smith.
eval-dataset-smith-py
Generate balanced AI eval fixtures from your bugs, docs, examples, and policies. Zero runtime dependencies.
Python port of @mukundakatta/eval-dataset-smith. The JS sibling has the full design notes; this README sticks to the Python API.
Install
pip install eval-dataset-smith-py
Usage
from eval_dataset_smith import forge_dataset, stratified_split
sources = [
{"type": "bug", "id": "B-1", "input": "repro: click X", "expected": "no crash", "difficulty": "easy"},
{"type": "bug", "id": "B-2", "input": "repro: open file Y", "expected": "no crash", "difficulty": "med"},
{"type": "doc", "question": "how does foo work?", "answer": "see chapter 3", "difficulty": "easy"},
{"type": "policy", "input": "is PII allowed?", "expected": "redact", "difficulty": "hard"},
]
ds = forge_dataset(sources, balance_keys=["type", "difficulty"])
ds.cases # list[EvalCase] -- the eval fixtures
ds.balance # {"type": {...}, "difficulty": {...}} -- audit input skew
len(ds) # 4
# Per-tag stratified split (preserves type balance across train/test)
parts = stratified_split([c.__dict__ for c in ds.cases], ratio=0.8)
parts["train"], parts["test"]
API
forge_dataset(sources, balance_keys=("type","difficulty"), max_per_type=20) -> Dataset
Top-level Pythonic entry point. Returns a typed Dataset of EvalCase records plus a balance histogram you can use to audit input skew.
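To make the balance histogram concrete, here is a minimal sketch of how per-key counts of that shape could be computed. `sketch_balance` is a hypothetical helper written for illustration, not the package's implementation, and the `"unknown"` bucket for records missing a key is an assumption.

```python
from collections import Counter


def sketch_balance(records, balance_keys=("type", "difficulty")):
    """Count how often each value appears under each balance key.

    Hypothetical illustration only; the real forge_dataset may differ.
    """
    histogram = {}
    for key in balance_keys:
        # Assumption: records missing a key are counted under "unknown".
        counts = Counter(record.get(key, "unknown") for record in records)
        histogram[key] = dict(counts)
    return histogram


records = [
    {"type": "bug", "difficulty": "easy"},
    {"type": "bug", "difficulty": "med"},
    {"type": "doc", "difficulty": "easy"},
    {"type": "policy", "difficulty": "hard"},
]
balance = sketch_balance(records)
print(balance["type"])
print(balance["difficulty"])
```

A lopsided count under any key (for example, many more `bug` cases than `policy` cases) is the kind of input skew the histogram is meant to surface.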
build_eval_dataset(items, max_per_type=20) -> list[dict]
Direct port of the JS buildEvalDataset. Accepts the JS field-name aliases:
| Field | Aliases |
|---|---|
| input | input / question / prompt |
| expected | expected / answer / acceptance |
| type | type (defaults to "general") |
| tags | tags: list[str] |
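The alias handling can be pictured as a small normalization step. The sketch below is hypothetical: `sketch_normalize` and `ALIASES` are names invented for illustration, the left-to-right precedence among aliases is an assumption, and so is the empty-list default for missing tags. Only the alias names and the `"general"` default come from the table above.

```python
# Alias names come from the table; their precedence order is an assumption.
ALIASES = {
    "input": ("input", "question", "prompt"),
    "expected": ("expected", "answer", "acceptance"),
}


def sketch_normalize(item):
    """Map aliased field names onto canonical ones (hypothetical helper)."""
    normalized = {}
    for canonical, names in ALIASES.items():
        # Take the first alias present in the item, or None if none is.
        normalized[canonical] = next(
            (item[name] for name in names if name in item), None
        )
    normalized["type"] = item.get("type", "general")  # default from the table
    normalized["tags"] = list(item.get("tags", []))  # assumption: missing -> []
    return normalized


doc_item = {"type": "doc", "question": "how does foo work?", "answer": "see chapter 3"}
normalized = sketch_normalize(doc_item)
print(normalized)
```

This is why the `doc` source in the usage example can use `question` / `answer` while the `bug` sources use `input` / `expected`.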
stratified_split(items, ratio=0.8) -> {"train": [...], "test": [...]}
Direct port of the JS stratifiedSplit. Splits by the first tag of each item, slicing each group at ceil(len(group) * ratio).
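The slicing rule is easy to check by hand. The following is a minimal sketch of the described behavior, not the package source: `sketch_stratified_split` is a hypothetical name, and placing untagged items in a shared `"untagged"` group is an assumption.

```python
import math
from collections import defaultdict


def sketch_stratified_split(items, ratio=0.8):
    """Group items by first tag, then cut each group at ceil(len * ratio).

    Hypothetical re-implementation for illustration only.
    """
    groups = defaultdict(list)
    for item in items:
        tags = item.get("tags") or []
        # Assumption: items without tags share one "untagged" group.
        groups[tags[0] if tags else "untagged"].append(item)

    train, test = [], []
    for group in groups.values():
        cut = math.ceil(len(group) * ratio)
        train.extend(group[:cut])
        test.extend(group[cut:])
    return {"train": train, "test": test}


# Five "bug" items and three "doc" items.
items = [{"id": i, "tags": ["bug"]} for i in range(5)] + [
    {"id": i, "tags": ["doc"]} for i in range(5, 8)
]
parts = sketch_stratified_split(items, ratio=0.8)
print(len(parts["train"]), len(parts["test"]))
```

Because the cut uses `ceil`, small groups lean toward train: with `ratio=0.8`, a group of three is cut at `ceil(2.4) = 3`, so all three items land in train and none in test.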
API differences from the JS sibling
forge_dataset is a Python addition that returns typed dataclasses (Dataset, EvalCase). build_eval_dataset and stratified_split mirror the JS function names in snake_case.
See the JS sibling's README for the full design notes.