
EvalGuard

Simple, zero-dependency validation for AI agent outputs.

The Problem

AI agents produce unpredictable outputs. You need to validate them before using them, but existing tools are heavy:

  • LangSmith: Requires cloud, vendor lock-in
  • DeepEval: 50+ dependencies, expensive LLM-as-judge
  • RAGAS: RAG-only, not general purpose

The Solution

from evalguard import check, expect

@check(contains=["SELECT"], not_contains=["DROP", "DELETE"])
def sql_agent(query: str) -> str:
    return llm.complete(f"Write SQL for: {query}")

# Or inline validation
result = agent.run("Get all users")
expect(result).contains("SELECT").not_contains("DROP").not_empty()

Installation

pip install evalguard

Features

  • Zero dependencies - Only Python stdlib
  • Simple decorators - @check() for validation rules
  • Fluent API - expect(value).contains().matches().valid_json()
  • Deterministic checks - No LLM needed for basic validation
  • pytest compatible - Works with existing test infrastructure
  • Type hints - Full typing support

Usage

Fluent Validation with expect()

from evalguard import expect, ValidationError

result = agent.run("Generate SQL query")

# Chain multiple validations
expect(result).contains("SELECT").not_contains("DROP").max_length(1000)

# JSON validation
expect(response).valid_json()

# Regex matching
expect(date_str).matches(r"^\d{4}-\d{2}-\d{2}$")

# Custom predicates
expect(value).satisfies(lambda x: x > 0, "must be positive")
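The fluent style works because each chained method validates and then returns the expectation object itself. A minimal illustrative sketch of that pattern (not EvalGuard's actual implementation) looks like this:

```python
import re


class Expect:
    """Minimal sketch of a fluent expectation: each method either
    raises on failure or returns self so calls can be chained."""

    def __init__(self, value):
        self.value = value

    def contains(self, s):
        if s not in self.value:
            raise ValueError(f"Expected value to contain {s!r}")
        return self

    def not_contains(self, s):
        if s in self.value:
            raise ValueError(f"Expected value to not contain {s!r}")
        return self

    def matches(self, pattern):
        if not re.search(pattern, self.value):
            raise ValueError(f"Expected value to match {pattern!r}")
        return self


# Chaining works because every check hands back the same object.
Expect("SELECT id FROM users").contains("SELECT").not_contains("DROP")
```

Returning `self` from every check is what lets a single expression express several rules at once; the first failing rule short-circuits the chain by raising.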

Decorator Validation with @check()

from evalguard import check

@check(
    contains=["SELECT", "FROM"],
    not_contains=["DROP", "DELETE", "TRUNCATE"],
    max_length=1000,
    not_empty=True,
)
def sql_agent(query: str) -> str:
    return llm.complete(query)

# Raises ValidationError if any check fails
result = sql_agent("Get all active users")

Custom Failure Handler

@check(contains=["required"], on_fail=lambda e: "fallback value")
def risky_agent(query: str) -> str:
    return llm.complete(query)

# Returns "fallback value" instead of raising on failure
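The on_fail hook follows a standard decorator pattern: wrap the function, validate its return value, and either raise or hand the error to the handler. An illustrative stdlib-only sketch of that pattern (hypothetical names, not EvalGuard's internals):

```python
import functools


def check_contains(required, on_fail=None):
    """Sketch of a @check-style decorator: validates the wrapped
    function's return value and routes failures to on_fail if given."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            result = fn(*args, **kwargs)
            missing = [s for s in required if s not in result]
            if missing:
                err = ValueError(f"missing required substrings: {missing}")
                if on_fail is not None:
                    return on_fail(err)  # fallback instead of raising
                raise err
            return result
        return wrapper
    return decorator


@check_contains(["required"], on_fail=lambda e: "fallback value")
def risky(query):
    return "nothing useful"


print(risky("q"))  # "fallback value" because validation failed
```

Because the handler receives the exception object, it can log the failure, return a safe default, or re-raise a different error, depending on what the caller needs.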

Available Validations

Method                 Description
.contains(s)           Value must contain substring s
.not_contains(s)       Value must not contain substring s
.matches(pattern)      Value must match regex pattern
.not_matches(pattern)  Value must not match regex pattern
.valid_json()          Value must be valid JSON
.max_length(n)         Value length must be <= n
.min_length(n)         Value length must be >= n
.not_empty()           Value must not be empty or whitespace-only
.equals(v)             Value must equal v
.is_type(t)            Value must be an instance of type t
.satisfies(fn)         Custom predicate fn(value) must return True
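All of these checks are deterministic and reduce to Python stdlib operations, which is why no LLM call is needed. Roughly (an illustrative sketch, not the library's source), the JSON and regex checks come down to:

```python
import json
import re


def is_valid_json(value):
    """A valid_json-style check: True if the string parses as JSON."""
    try:
        json.loads(value)
        return True
    except (TypeError, ValueError):
        return False


def matches(value, pattern):
    """A matches-style check: True if the regex finds a match."""
    return re.search(pattern, value) is not None


print(is_valid_json('{"ok": true}'))                   # True
print(matches("2024-01-31", r"^\d{4}-\d{2}-\d{2}$"))   # True
```

Deterministic checks like these run in microseconds and give the same verdict every time, in contrast to LLM-as-judge approaches.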

API

expect(value)

Create a fluent expectation for chaining validations.

exp = expect(value)
exp.contains("x").not_contains("y")
exp.value  # Access the original value

@check(**rules)

Decorator to validate function return values.

@check(
    contains=["a", "b"],           # List of required substrings
    not_contains=["x", "y"],       # List of forbidden substrings
    matches=r"pattern",            # Regex that must match (or list)
    not_matches=r"pattern",        # Regex that must not match (or list)
    valid_json=True,               # Must be valid JSON
    max_length=100,                # Max string length
    min_length=10,                 # Min string length
    not_empty=True,                # Must not be empty
    satisfies=lambda x: x > 0,     # Custom predicate
    on_fail=handler,               # Optional failure handler
)
def my_function():
    ...

ValidationError

Raised when validation fails.

try:
    expect(value).contains("required")
except ValidationError as e:
    print(e.message)  # "Expected value to contain 'required'"
    print(e.value)    # The actual value
    print(e.rule)     # "contains"

Part of the Guard Suite

EvalGuard is part of a reliability suite for AI agents:

  • LoopGuard - Prevent infinite loops
  • EvalGuard - Validate outputs (this package)
  • FailGuard - Detect drift and silent failures (coming soon)

License

MIT
