synthetic evals for agents

Evaluateur

Synthetic evaluation helper for LLM applications, built around the dimensions → tuples → queries flow described in Hamel Husain's FAQ.

Installation

The project is packaged as a standard Python library. To install it with uv:

uv add evaluateur

Basic usage

Define a Pydantic model that represents the dimensions of your evaluation space, then use the Evaluator to generate options and queries:

from pydantic import BaseModel, Field

from evaluateur import Evaluator, QueryMode, TupleStrategy


class Query(BaseModel):
    payer: str = Field(..., description="insurance payer, like Cigna")
    age: str = Field(..., description="patient age category, like 'adult' or 'pediatric'")
    complexity: str = Field(
        ...,
        description="complexity of the query to account for the edge cases, like 'off-label', 'comorbidities', etc",
    )
    geography: str = Field(..., description="geography indicator, like a zip code, specific state or county")


evaluator = Evaluator(Query, context="Healthcare prior authorization")

# Step 1: generate options for each dimension using Instructor
options = evaluator.generate_options(
    instructions="Focus on common US payers and edge-case clinical scenarios.",
)

# Step 2: turn options into tuples and natural language queries
output = evaluator.generate_queries(
    options=options,
    mode=QueryMode.HYBRID,
    tuple_strategy=TupleStrategy.CROSS_PRODUCT,
    tuple_count=50,
)

for q in output.queries:
    print(q.source_tuple.values, "->", q.query)
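To make the "options to tuples" step more concrete, here is a minimal, dependency-free sketch of a cross-product strategy. It is not the library's implementation: the function name cross_product_tuples is hypothetical, and the assumption that a tuple count caps the result by random sampling (rather than, say, truncation) is ours.

```python
import itertools
import random


def cross_product_tuples(options, count, seed=0):
    """Build every combination of dimension values, then keep at most `count`.

    `options` maps a dimension name to its list of candidate values.
    Sampling with a fixed seed is an assumption made for reproducibility.
    """
    names = list(options)
    combos = [
        dict(zip(names, values))
        for values in itertools.product(*(options[name] for name in names))
    ]
    if count >= len(combos):
        return combos
    return random.Random(seed).sample(combos, count)


# Dimension values taken from the field descriptions above.
options = {
    "payer": ["Cigna", "Aetna"],
    "age": ["adult", "pediatric"],
    "complexity": ["off-label", "comorbidities"],
}

tuples = cross_product_tuples(options, count=5)
for t in tuples:
    print(t)
```

Each resulting dictionary plays the role of one tuple that would then be turned into a natural language query. The full cross product grows multiplicatively with the number of options per dimension, which is why capping it with a count is useful.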

The evaluator reads provider credentials from environment variables (for example OPENAI_API_KEY) and supports any provider that Instructor supports. If needed, you can customise the provider and model via the LLMClient helper.
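Because configuration comes from the environment, it can help to check for the expected variables before making any LLM calls. The helper below is a hypothetical sketch and not part of evaluateur; it only uses the standard library, and the variable name is the one mentioned above.

```python
import os


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


missing = missing_env_vars(["OPENAI_API_KEY"])
if missing:
    print("Set these variables before running the evaluator:", ", ".join(missing))
```

If you use a different provider, replace the variable name with whatever that provider's client expects.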

If your input model already has list-typed fields with default values (for example payer: list[str] = ["Cigna", "Aetna"]), those lists are treated as fixed options and are not modified by generate_options(). Scalar fields of any basic type (str, int, float, and so on) are turned into lists of options automatically.
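The distinction between fixed list fields and scalar fields can be illustrated with type introspection. The sketch below uses a plain dataclass instead of a Pydantic model to stay dependency-free, and split_fields is a hypothetical helper; how evaluateur actually inspects the model is not documented here.

```python
from dataclasses import dataclass, field
from typing import get_origin, get_type_hints


@dataclass
class Query:
    # Scalar field: options would need to be generated for it.
    age: str
    # List field: the values are already fixed by the author.
    payer: list[str] = field(default_factory=lambda: ["Cigna", "Aetna"])


def split_fields(model):
    """Separate list-typed fields (fixed options) from scalar fields."""
    hints = get_type_hints(model)
    fixed = [name for name, tp in hints.items() if get_origin(tp) is list]
    generated = [name for name in hints if name not in fixed]
    return fixed, generated


fixed, generated = split_fields(Query)
print("fixed:", fixed)
print("to generate:", generated)
```

Under this reading, only the fields in the second list would be sent to the LLM for option generation.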
