VOWEL
YAML-based evaluation framework for testing Python functions, with AI-powered test generation, function healing, and a TDD approach.
vowel makes it easy to define test cases in YAML and run them against your Python functions. It also provides AI-powered generators that can automatically create test specs, generate implementations, and fix buggy functions.
Installation
```shell
pip install vowel

# Or with uv
uv add vowel
```
Optional Dependencies
Vowel supports several optional dependency groups for enhanced functionality:
| Group | Install Command | Purpose / Extras |
|---|---|---|
| all | `pip install vowel[all]` | All optional features |
| dev | `pip install vowel[dev]` | Development & testing tools |
| mcp | `pip install vowel[mcp]` | MCP server |
| optimization | `pip install vowel[optimize]` | Performance optimizations |
| monty | `pip install vowel[monty]` | Monty runtime support |
| logfire | `pip install vowel[logfire]` | Logfire integration |
Tip: You can install multiple extras at once, e.g. `pip install vowel[dev,mcp]`. Recommended: `pip install vowel[all]`.
Development
```shell
git clone https://github.com/fswair/vowel.git
cd vowel
pip install -e ".[all]"
```
Quick Start
Note: For a deeper understanding of how vowel handles fixtures, see the examples in `examples/db_fixtures`. These examples demonstrate the underlying mechanics of fixture setup and usage.
Tip: To enable YAML schema validation in your editor, place `vowel-schema.json` in your project directory. Then add the following directive at the top of your YAML file to activate schema support and instructions:

```yaml
# yaml-language-server: $schema=<path/to/vowel-schema.json>
```

Replace `<path/to/vowel-schema.json>` with the actual path to your schema file.
1. Create a YAML spec
```yaml
# evals.yml
add:
  dataset:
    - case:
        inputs: { x: 2, y: 2 }
        expected: 4
    - case:
        inputs: { x: -5, y: 5 }
        expected: 0

divide:
  evals:
    Type:
      type: "float"
  dataset:
    - case:
        inputs: { a: 10, b: 2 }
        expected: 5.0
    - case:
        inputs: { a: 1, b: 0 }
        raises: ZeroDivisionError
```
2. Run from CLI
vowel evals.yml
3. Or programmatically
```python
from vowel import run_evals

def add(x: int, y: int) -> int:
    return x + y

def divide(a: float, b: float) -> float:
    return a / b

summary = run_evals("evals.yml", functions={"add": add, "divide": divide})
print(f"All passed: {summary.all_passed}")
print(f"Coverage: {summary.coverage * 100:.1f}%")
```
4. Or use the fluent API
```python
from vowel import RunEvals

summary = (
    RunEvals.from_file("evals.yml")
    .with_functions({"add": add, "divide": divide})
    .filter(["add"])
    .debug()
    .run()
)
summary.print()
```
Name matching note: If your YAML uses `module.function`, programmatic mappings can use either the exact key (`module.function`) or the short function name (`function`) in `.with_functions(...)`.
Features
Evaluators
8 built-in evaluators for flexible testing:
| Evaluator | Purpose |
|---|---|
| Expected | Exact value matching |
| Type | Return type checking (strict/lenient) |
| Assertion | Custom Python expressions (output > 0, output == input * 2) |
| Duration | Performance constraints (function-level & case-level) |
| Pattern | Regex validation on output |
| ContainsInput | Verify output contains the input |
| Raises | Exception class + optional message matching |
| LLMJudge | AI-powered rubric evaluation |
```yaml
factorial:
  evals:
    Assertion:
      assertion: "output > 0"
    Type:
      type: "int"
    Duration:
      duration: 1.0
  dataset:
    - case: { input: 0, expected: 1 }
    - case: { input: 5, expected: 120 }
```
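A function passing this spec must return a positive `int` within one second for each case. A straightforward implementation satisfying it might look like this (a sketch; the parameter name mirrors the `input` key in the YAML cases):

```python
def factorial(input: int) -> int:
    # Iterative factorial: returns an int that is positive for
    # non-negative inputs, matching the Assertion and Type evaluators.
    result = 1
    for i in range(2, input + 1):
        result *= i
    return result
```

`factorial(0)` returns 1 and `factorial(5)` returns 120, matching both dataset cases.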
Full reference: docs/EVALUATORS.md
Fixtures (Dependency Injection)
Inject databases, temp files, caches into functions under test. Three patterns: generator (yield), tuple (setup/teardown), simple (setup only).
```yaml
fixtures:
  db:
    setup: myapp.setup_db
    teardown: myapp.close_db
    scope: module

query_user:
  fixture: [db]
  dataset:
    - case:
        inputs: { user_id: 1 }
        expected: { name: "Alice" }
```

```python
def query_user(user_id: int, *, db: dict) -> dict | None:
    return db["users"].get(user_id)
```
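Of the three patterns, the generator style combines setup and teardown in a single function. A minimal sketch of what a `setup_db` fixture in that style could look like (the data and names here are illustrative, not vowel internals):

```python
def setup_db():
    # Setup: everything before the yield runs when the fixture is created.
    db = {"users": {1: {"name": "Alice"}}}
    yield db  # the yielded value is what gets injected as `db`
    # Teardown: everything after the yield runs when the scope ends.
    db.clear()
```

The tuple pattern splits the same lifecycle into separate `setup`/`teardown` callables, and the simple pattern provides setup only.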
Fixture scope aliases:
- Preferred scope names: `case`, `eval`, `file`
- Backward-compatible aliases: `function`, `module`, `session`
- Normalization mapping: `case` -> `function`, `eval` -> `module`, `file` -> `session`
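The alias normalization above amounts to a simple lookup; a sketch of the idea (not vowel's actual internals):

```python
# Preferred scope names and their backward-compatible equivalents.
SCOPE_ALIASES = {"case": "function", "eval": "module", "file": "session"}

def normalize_scope(scope: str) -> str:
    # Preferred names are translated; canonical names pass through unchanged.
    return SCOPE_ALIASES.get(scope, scope)
```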
Example:

```yaml
fixtures:
  temp_data:
    setup: myapp.make_temp_data
    scope: case
  db:
    setup: myapp.setup_db
    teardown: myapp.close_db
    scope: eval
  cache:
    setup: myapp.setup_cache
    scope: file
```
Full reference: docs/FIXTURES.md
Input Serializers
Transform YAML inputs into Pydantic models, dates, or custom types:
```python
summary = (
    RunEvals.from_file("evals.yml")
    .with_functions({"get_user": get_user})
    .with_serializer({"get_user": User})  # Schema mode
    .run()
)
```
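The `User` schema referenced above is not defined in this snippet; in schema mode it would typically be a Pydantic model, for example (field names here are illustrative assumptions):

```python
from pydantic import BaseModel

class User(BaseModel):
    # Illustrative fields: YAML inputs are validated against this model
    # before being passed to the function under test.
    id: int
    name: str
```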
Serializer key matching: Serializer mappings follow the same rule as `.with_functions(...)`: both `module.function` and short `function` keys are accepted.

Assertion context and serializers: When a serializer is configured, assertion evaluators use the serialized value for `input` (not raw YAML). This applies to schema mode, `serial_fn`, and nested/dict schemas.
Runnable example (YAML-native serializers + fixtures):

```shell
vowel examples/serializers/db_query_evals.yml
```

This example demonstrates:
- a top-level `serializers:` registry with both `schema` and `serializer` entries,
- per-eval `serializer:` references,
- fixture class lifecycle wiring with `cls` + `teardown`,
- assertion checks that read serialized `input` values.

See: `examples/serializers/db_query_evals.yml` and `examples/serializers/util.py`
Full reference: docs/SERIALIZERS.md
AI-Powered Generation
EvalGenerator — test existing functions
```python
from vowel import EvalGenerator, Function

generator = EvalGenerator(model="openai:gpt-4o", load_env=True)
func = Function.from_callable(my_function)
result = generator.generate_and_run(func, auto_retry=True, heal_function=True)
print(f"Coverage: {result.summary.coverage * 100:.1f}%")
```
TDDGenerator — generate everything from a description
```python
from vowel.tdd import TDDGenerator

generator = TDDGenerator(model="gemini-3-flash-preview", load_env=True)
result = generator.generate_all(
    description="Binary search for target in sorted list. Returns index or -1.",
    name="binary_search"
)
result.print()  # Shows: signature → tests → code → results
```
Step-by-step control:
```python
description = "Calculate factorial of a non-negative integer"

signature = generator.generate_signature(description=description, name="factorial")
runner, yaml_spec = generator.generate_evals_from_signature(signature, description=description)
func = generator.generate_implementation(signature, yaml_spec, description=description)
summary = runner.with_functions({"factorial": func.impl}).run()
```
Full reference: docs/AI_GENERATION.md
MCP Server
Expose vowel's capabilities to AI assistants like Claude Desktop via Model Context Protocol.
Setup guide: docs/MCP.md
CLI
```shell
vowel evals.yml                         # Run single file
vowel -d ./tests                        # Run directory
vowel evals.yml -f add,divide           # Filter functions
vowel evals.yml --ci --cov 90           # CI mode
vowel evals.yml --watch                 # Watch mode
vowel evals.yml --dry-run               # Show plan without running
vowel evals.yml --export-json out.json  # Export results
vowel evals.yml -v                      # Verbose summary
vowel evals.yml -v --hide-report        # Verbose, hide pydantic_evals report

vowel schema examples/serializers/db_query_evals.yml  # Validate + update schema header
vowel schema --create                                 # Generate vowel-schema.json
vowel costs --list                                    # List tracked generation/run costs
```
Full reference: docs/CLI.md
EvalSummary
```python
summary = run_evals("evals.yml", functions={...})

summary.all_passed         # bool
summary.success_count      # int
summary.failed_count       # int
summary.total_count        # int
summary.coverage           # float (0.0-1.0)
summary.failed_results     # list[EvalResult]
summary.meets_coverage(0.9)  # Check threshold
summary.print()            # Rich formatted output
summary.to_json()          # Export as dict
summary.xml()              # Export as XML
```
Documentation
| Document | Description |
|---|---|
| YAML Spec | Complete YAML format reference |
| Evaluators | All 8 evaluator types |
| Fixtures | Dependency injection guide |
| Serializers | Input serializer patterns |
| AI Generation | EvalGenerator & TDDGenerator |
| CLI | Command-line reference |
| MCP Server | AI assistant integration |
| Troubleshooting | Common errors & solutions |
License
Apache License 2.0