A modular evaluation framework for testing functions with YAML-based specifications

VOWEL

YAML-based evaluation framework for testing Python functions with AI-powered test generation and function healing.

vowel makes it easy to define test cases in YAML and run them against your Python functions. It also provides AI-powered generators that can automatically create test specs, generate implementations, and fix buggy functions.

Installation

pip install vowel

# Or with uv
uv add vowel

Development

git clone https://github.com/fswair/vowel.git
cd vowel
pip install -e .

Quick Start

1. Create a YAML spec

# evals.yml
add:
  dataset:
    - case:
        inputs: { x: 1, y: 2 }
        expected: 3
    - case:
        inputs: { x: -5, y: 5 }
        expected: 0

divide:
  evals:
    Type:
      type: "float"
  dataset:
    - case:
        inputs: { a: 10, b: 2 }
        expected: 5.0
    - case:
        inputs: { a: 1, b: 0 }
        raises: ZeroDivisionError
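
To make the meaning of `expected` and `raises` concrete, the sketch below writes the dataset from evals.yml as a plain Python dict and checks each case by hand. It only illustrates the semantics and is not how vowel is implemented: `SPEC`, `FUNCTIONS`, and `check_case` are hypothetical names, the `Type` evaluator is left out, and looking up the exception name in `builtins` is an assumption.

```python
import builtins


def add(x: int, y: int) -> int:
    return x + y


def divide(a: float, b: float) -> float:
    return a / b


# The dataset from evals.yml written as Python data (evaluators omitted).
SPEC = {
    "add": [
        {"inputs": {"x": 1, "y": 2}, "expected": 3},
        {"inputs": {"x": -5, "y": 5}, "expected": 0},
    ],
    "divide": [
        {"inputs": {"a": 10, "b": 2}, "expected": 5.0},
        {"inputs": {"a": 1, "b": 0}, "raises": "ZeroDivisionError"},
    ],
}


def check_case(func, case: dict) -> bool:
    """Return True if the call matches the case's `expected` or `raises`."""
    try:
        output = func(**case["inputs"])
    except Exception as exc:
        if "raises" not in case:
            return False
        # Assumption: the name refers to a built-in exception class.
        expected_exc = getattr(builtins, case["raises"])
        return isinstance(exc, expected_exc)
    # No exception was raised, so a `raises` case fails here.
    return "raises" not in case and output == case["expected"]


FUNCTIONS = {"add": add, "divide": divide}
results = {
    name: [check_case(FUNCTIONS[name], case) for case in cases]
    for name, cases in SPEC.items()
}
print(results)
```

Each function maps to one pass/fail flag per case, which is roughly the information a summary needs to aggregate.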

2. Run from CLI

vowel evals.yml

3. Or programmatically

from vowel import run_evals

def add(x: int, y: int) -> int:
    return x + y

def divide(a: float, b: float) -> float:
    return a / b

summary = run_evals("evals.yml", functions={"add": add, "divide": divide})
print(f"All passed: {summary.all_passed}")
print(f"Coverage: {summary.coverage * 100:.1f}%")

4. Or use the fluent API

from vowel import RunEvals

# `add` and `divide` are the functions defined in step 3.
summary = (
    RunEvals.from_file("evals.yml")
    .with_functions({"add": add, "divide": divide})
    .filter(["add"])  # run only the evals for `add`
    .debug()
    .run()
)

summary.print()

Features

Evaluators

vowel ships with eight built-in evaluators:

| Evaluator | Purpose |
|---|---|
| Expected | Exact value matching |
| Type | Return type checking (strict/lenient) |
| Assertion | Custom Python expressions (output > 0, output == input * 2) |
| Duration | Performance constraints (function-level & case-level) |
| Pattern | Regex validation on output |
| ContainsInput | Verify output contains the input |
| Raises | Exception class + optional message matching |
| LLMJudge | AI-powered rubric evaluation |

Example:

factorial:
  evals:
    Assertion:
      assertion: "output > 0"
    Type:
      type: "int"
    Duration:
      duration: 1.0
  dataset:
    - case: { input: 0, expected: 1 }
    - case: { input: 5, expected: 120 }
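
As a rough illustration of what the three evaluators in this spec check, the sketch below applies an assertion expression, a strict type check, and a time limit to `math.factorial`. The helpers `check_assertion`, `check_type`, and `timed_call` are hypothetical and not part of vowel; treating `duration` as seconds and the type check as strict are assumptions.

```python
import math
import time


def check_assertion(expression: str, output, input_value) -> bool:
    """Evaluate an expression with only `output` and `input` in scope."""
    # Builtins are disabled to keep the expression simple; this is not a sandbox.
    scope = {"output": output, "input": input_value}
    return bool(eval(expression, {"__builtins__": {}}, scope))


def check_type(type_name: str, output) -> bool:
    """Strict check: the output's type must be exactly the named type."""
    allowed = {"int": int, "float": float, "str": str, "bool": bool}
    return type(output) is allowed[type_name]


def timed_call(func, *args):
    """Call func and return (output, elapsed seconds)."""
    start = time.perf_counter()
    output = func(*args)
    return output, time.perf_counter() - start


# The two cases from the factorial spec.
for n, expected in [(0, 1), (5, 120)]:
    output, elapsed = timed_call(math.factorial, n)
    assert output == expected                        # Expected
    assert check_assertion("output > 0", output, n)  # Assertion
    assert check_type("int", output)                 # Type
    assert elapsed <= 1.0                            # Duration (assumed seconds)
```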

Full reference: docs/EVALUATORS.md

Fixtures (Dependency Injection)

Inject resources such as databases, temp files, and caches into the functions under test. Fixtures can be written in three patterns: generator (yield), tuple (setup/teardown), and simple (setup only).

fixtures:
  db:
    setup: myapp.setup_db
    teardown: myapp.close_db
    scope: module

query_user:
  fixture: [db]
  dataset:
    - case:
        inputs: { user_id: 1 }
        expected: { name: "Alice" }

The function under test receives the fixture as a keyword-only argument:

def query_user(user_id: int, *, db: dict) -> dict | None:
    return db["users"].get(user_id)
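
The generator (yield) pattern can be pictured with the standard library alone. In this sketch, `db_fixture` and `run_with_fixtures` are hypothetical names, and the way the fixture is injected by keyword is an assumption about the general idea rather than a description of vowel's internals.

```python
from contextlib import ExitStack, contextmanager
from typing import Optional

events = []  # records the setup/teardown order for inspection


@contextmanager
def db_fixture():
    """Generator-style fixture: code before yield is setup, after is teardown."""
    db = {"users": {1: {"name": "Alice"}}}
    events.append("setup")
    try:
        yield db
    finally:
        events.append("teardown")


def query_user(user_id: int, *, db: dict) -> Optional[dict]:
    return db["users"].get(user_id)


def run_with_fixtures(func, inputs: dict, fixtures: dict):
    """Set up each fixture, pass it by keyword, and tear down afterwards."""
    with ExitStack() as stack:
        injected = {
            name: stack.enter_context(factory())
            for name, factory in fixtures.items()
        }
        return func(**inputs, **injected)


result = run_with_fixtures(query_user, {"user_id": 1}, {"db": db_fixture})
print(result, events)
```

Teardown runs even if the function under test raises, which is the main reason to prefer the yield form over a setup-only fixture when a resource must be released.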

Full reference: docs/FIXTURES.md

Input Serializers

Transform YAML inputs into Pydantic models, dates, or custom types:

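from vowel import RunEvals

# `get_user` is the function under test; `User` is the model its input is parsed into.
summary = (
    RunEvals.from_file("evals.yml")
    .with_functions({"get_user": get_user})
    .with_serializer({"get_user": User})      # Schema mode
    .run()
)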
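
The idea behind a schema-style serializer can be sketched without vowel: raw mapping input is turned into a typed object before the function is called. The example uses a dataclass in place of a Pydantic model to stay dependency-free. `SERIALIZERS` and `call_with_serializer` are hypothetical, and passing the model as a single positional argument is an assumption.

```python
from dataclasses import dataclass


@dataclass
class User:
    id: int
    name: str


def get_user(user: User) -> str:
    return f"{user.id}:{user.name}"


# Maps a function name to the type its raw input is converted into.
SERIALIZERS = {"get_user": User}


def call_with_serializer(name: str, func, raw_input: dict):
    """Build the typed object from the raw mapping, then call the function."""
    model = SERIALIZERS.get(name)
    arg = model(**raw_input) if model else raw_input
    return func(arg)


label = call_with_serializer("get_user", get_user, {"id": 1, "name": "Alice"})
print(label)
```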

Full reference: docs/SERIALIZERS.md

AI-Powered Generation

EvalGenerator — test existing functions

from vowel import EvalGenerator, Function

generator = EvalGenerator(model="openai:gpt-4o", load_env=True)
func = Function.from_callable(my_function)

result = generator.generate_and_run(func, auto_retry=True, heal_function=True)
print(f"Coverage: {result.summary.coverage * 100:.1f}%")

TDDGenerator — generate everything from a description

from vowel.tdd import TDDGenerator

generator = TDDGenerator(model="gemini-3-flash-preview", load_env=True)

result = generator.generate_all(
    description="Binary search for target in sorted list. Returns index or -1.",
    name="binary_search"
)

result.print()  # Shows: signature → tests → code → results
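
For orientation, a hand-written function matching that description might look like the following. This is not output from TDDGenerator, only a reference for what the description asks for; the integer element type is an assumption.

```python
def binary_search(items: list[int], target: int) -> int:
    """Return the index of target in the sorted list items, or -1 if absent."""
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid + 1  # target can only be in the right half
        else:
            hi = mid - 1  # target can only be in the left half
    return -1
```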

Step-by-step control:

signature = generator.generate_signature(description="...", name="factorial")
runner, yaml_spec = generator.generate_evals_from_signature(signature)
func = generator.generate_implementation(signature, yaml_spec)
summary = runner.with_functions({"factorial": func.impl}).run()

Full reference: docs/AI_GENERATION.md

MCP Server

Expose vowel's capabilities to AI assistants such as Claude Desktop via the Model Context Protocol (MCP).

Setup guide: docs/MCP.md


CLI

vowel evals.yml                          # Run single file
vowel -d ./tests                         # Run directory
vowel evals.yml -f add,divide            # Filter functions
vowel evals.yml --ci --coverage 90       # CI mode
vowel evals.yml --watch                  # Watch mode
vowel evals.yml --dry-run                # Show plan without running
vowel evals.yml --export-json out.json   # Export results

Full reference: docs/CLI.md


EvalSummary

summary = run_evals("evals.yml", functions={...})

summary.all_passed       # bool
summary.success_count    # int
summary.failed_count     # int
summary.total_count      # int
summary.coverage         # float (0.0-1.0)
summary.failed_results   # list[EvalResult]

summary.meets_coverage(0.9)    # Check threshold
summary.print()                # Rich formatted output
summary.to_json()              # Export as dict
summary.xml()                  # Export as XML
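
One way to use these members is as a gate in a script that decides a process exit code. The sketch below is a hypothetical helper, `ci_exit_code`, that relies only on `all_passed` and `meets_coverage`. `StubSummary` is a stand-in so the example runs without vowel, and its `coverage >= threshold` rule is an assumption about how the threshold is compared.

```python
from dataclasses import dataclass


@dataclass
class StubSummary:
    """Stand-in exposing the members the gate uses; not vowel's EvalSummary."""
    all_passed: bool
    coverage: float

    def meets_coverage(self, threshold: float) -> bool:
        # Assumption: the threshold is inclusive.
        return self.coverage >= threshold


def ci_exit_code(summary, threshold: float = 0.9) -> int:
    """Return 0 when every case passed and coverage meets the threshold, else 1."""
    ok = summary.all_passed and summary.meets_coverage(threshold)
    return 0 if ok else 1


print(ci_exit_code(StubSummary(all_passed=True, coverage=1.0)))
print(ci_exit_code(StubSummary(all_passed=False, coverage=0.5)))
```

In a real script the stub would be replaced by the object returned from run_evals, and the result passed to sys.exit.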

Documentation

| Document | Description |
|---|---|
| YAML Spec | Complete YAML format reference |
| Evaluators | All 8 evaluator types |
| Fixtures | Dependency injection guide |
| Serializers | Input serializer patterns |
| AI Generation | EvalGenerator & TDDGenerator |
| CLI | Command-line reference |
| MCP Server | AI assistant integration |
| Troubleshooting | Common errors & solutions |

License

Apache License 2.0

Download files

Source Distribution

vowel-0.3.1.tar.gz (148.1 kB)

Uploaded Source

Built Distribution

vowel-0.3.1-py3-none-any.whl (91.6 kB)

Uploaded Python 3

File details

Details for the file vowel-0.3.1.tar.gz.

File metadata

  • Download URL: vowel-0.3.1.tar.gz
  • Upload date:
  • Size: 148.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for vowel-0.3.1.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 198bd8aef8768b2e58163f0943492497fd003badf242a2190ece95f4a43c0bdd |
| MD5 | 3f560254216b0d312c2fc9d5b53ae1f3 |
| BLAKE2b-256 | 10882e5b39f7596a5e3ed7c0bd8e742ee6300b10677a2935fdc06815883ead29 |

File details

Details for the file vowel-0.3.1-py3-none-any.whl.

File metadata

  • Download URL: vowel-0.3.1-py3-none-any.whl
  • Upload date:
  • Size: 91.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for vowel-0.3.1-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 895121a388280d5864ec443e6ab08e1025b8380afcb2086a6b91ba51a35a99d0 |
| MD5 | 359eb0a58ca9fcafacc5aa78c9fd4949 |
| BLAKE2b-256 | 4bbf17429fadb4a842a7f2a7771093e8530025f23bb74e4f3a8370910b8f1b5f |
