fasteval


A decorator-first LLM evaluation library for testing AI agents and LLMs. Stack decorators to define evaluation criteria, then run them with pytest. Read the docs.

The Evaluation Journey -- from non-deterministic LLM outputs to reliable engineering metrics

Features

  • 50+ built-in metrics -- stack @fe.correctness, @fe.relevance, @fe.hallucination, and more
  • pytest native -- run evaluations with pytest, get familiar pass/fail output
  • LLM-as-judge + deterministic -- semantic LLM metrics alongside ROUGE, exact match, JSON schema, regex
  • Custom criteria -- @fe.criteria("Is the response empathetic?") for any evaluation you can describe in plain English
  • Multi-modal -- evaluate vision, audio, and image generation models
  • Conversation metrics -- context retention, topic drift, consistency for multi-turn agents
  • RAG metrics -- faithfulness, contextual precision, contextual recall, answer correctness
  • Tool trajectory -- verify agent tool calls, argument matching, call sequences
  • Reusable metric stacks -- @fe.stack() to compose and reuse metric sets across tests
  • Human-in-the-loop -- @fe.human_review() for manual review alongside automated metrics
  • Data-driven testing -- @fe.csv("test_data.csv") to load test cases from CSV files
  • Pluggable providers -- OpenAI (default), Anthropic, or bring your own LLMClient
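The decorator-first idea above can be sketched in plain Python: each decorator attaches a metric spec to the test function, and a runner can later read the accumulated stack. The snippet below is an illustration only, not fasteval's actual implementation, and the `metric` helper is hypothetical:

```python
# Minimal sketch of the decorator-first pattern (illustration only --
# not fasteval's actual implementation).
def metric(name, threshold):
    """Attach a (name, threshold) metric spec to the decorated test."""
    def decorator(fn):
        # Decorators apply bottom-up, so prepend to keep top-down order.
        fn._metrics = [(name, threshold)] + getattr(fn, "_metrics", [])
        return fn
    return decorator

@metric("correctness", 0.8)
@metric("relevance", 0.7)
def test_example():
    pass

print(test_example._metrics)
# [('correctness', 0.8), ('relevance', 0.7)]
```

Because Python applies stacked decorators bottom-up, prepending each spec preserves the top-down reading order of the stack.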

How It Works

How fasteval works -- Decorate, Test, Score, Evaluate, Result

Quick Start

pip install fasteval-core

Set your LLM provider key:

export OPENAI_API_KEY=sk-your-key-here

Write your first evaluation test:

import fasteval as fe

@fe.correctness(threshold=0.8)
@fe.relevance(threshold=0.7)
def test_qa_agent():
    response = my_agent("What is the capital of France?")
    fe.score(response, expected_output="Paris", input="What is the capital of France?")

Run it:

pytest test_qa_agent.py -v

Installation

# pip
pip install fasteval-core

# uv
uv add fasteval-core

Optional Extras

# Anthropic provider
pip install fasteval-core[anthropic]

# Vision-language evaluation (GPT-4V, Claude Vision)
pip install fasteval-core[vision]

# Audio/speech evaluation (Whisper, ASR)
pip install fasteval-core[audio]

# Image generation evaluation (DALL-E, Stable Diffusion)
pip install fasteval-core[image-gen]

# All multi-modal features
pip install fasteval-core[multimodal]

Usage Examples

Deterministic Metrics

import fasteval as fe

@fe.contains()
def test_keyword_present():
    fe.score("The answer is 42", expected_output="42")

@fe.rouge(threshold=0.6, rouge_type="rougeL")
def test_summary_quality():
    fe.score(actual_output=summary, expected_output=reference)

Custom Criteria

@fe.criteria("Is the response empathetic and professional?")
def test_tone():
    response = agent("I'm frustrated with this product!")
    fe.score(response)

@fe.criteria(
    "Does the response include a legal disclaimer?",
    threshold=0.9,
)
def test_compliance():
    response = agent("Can I break my lease?")
    fe.score(response)

RAG Evaluation

@fe.faithfulness(threshold=0.8)
@fe.contextual_precision(threshold=0.7)
def test_rag_pipeline():
    result = rag_pipeline("How does photosynthesis work?")
    fe.score(
        actual_output=result.answer,
        context=result.retrieved_docs,
        input="How does photosynthesis work?",
    )

Tool Trajectory

@fe.tool_call_accuracy(threshold=0.9)
def test_agent_tools():
    result = agent.run("Book a flight to Paris")
    fe.score(
        result.response,
        tool_calls=result.tool_calls,
        expected_tools=[
            {"name": "search_flights", "args": {"destination": "Paris"}},
            {"name": "book_flight"},
        ],
    )

Multi-Turn Conversations

@fe.context_retention(threshold=0.8)
@fe.conversation([
    {"query": "My name is Alice and I'm a vegetarian"},
    {"query": "Suggest a restaurant for me"},
    {"query": "What dietary restriction should they accommodate?"},
])
async def test_memory(query, expected, history):
    response = await agent(query, history=history)
    fe.score(response, input=query, history=history)

Metric Stacks

# Define a reusable metric stack
@fe.stack()
@fe.correctness(threshold=0.8, weight=2.0)
@fe.relevance(threshold=0.7, weight=1.0)
@fe.coherence(threshold=0.6, weight=1.0)
def quality_metrics():
    pass

# Apply to multiple tests
@quality_metrics
def test_chatbot():
    response = agent("Explain quantum computing")
    fe.score(response, expected_output=reference_answer, input="Explain quantum computing")

@quality_metrics
def test_summarizer():
    summary = summarize(long_article)
    fe.score(summary, expected_output=reference_summary)
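The `weight` parameters suggest per-metric weighting when a stack's scores are combined. As an illustration only (fasteval's actual aggregation may differ), a weighted average of the stack above works like this:

```python
# Hypothetical weighted aggregation for a metric stack (illustration only).
def weighted_score(scores, weights):
    """Combine per-metric scores in [0, 1] into one weighted average."""
    total = sum(weights.values())
    return sum(scores[m] * w for m, w in weights.items()) / total

scores = {"correctness": 0.9, "relevance": 0.8, "coherence": 0.6}
weights = {"correctness": 2.0, "relevance": 1.0, "coherence": 1.0}
print(round(weighted_score(scores, weights), 3))  # 0.8
```

With correctness weighted 2.0, the composite is (0.9·2 + 0.8 + 0.6) / 4 = 0.8, so a strong correctness score pulls the aggregate up more than the other metrics.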

Plugins

  • fasteval-langfuse -- Evaluate Langfuse production traces with fasteval metrics (pip install fasteval-langfuse)
  • fasteval-langgraph -- Test harness for LangGraph agents (pip install fasteval-langgraph)
  • fasteval-observe -- Runtime monitoring with async sampling (pip install fasteval-observe)

Testing Pyramid for Agents -- layered testing strategy with fasteval-langgraph

Local Development

# Install uv (via Homebrew; see the uv docs for other platforms)
brew install uv

# Create virtual environment and install all dependencies
uv sync --all-extras --group dev --group test

# Run the test suite
uv run tox

# Run tests with coverage
uv run pytest tests/ --cov=fasteval --cov-report=term -v

# Format code
uv run black .
uv run isort .

# Type checking
uv run mypy .

Documentation

Full documentation is available at fasteval.io.

Contributing

See CONTRIBUTING.md for development setup, coding standards, and how to submit pull requests.

License

Apache License 2.0 -- see LICENSE for details.

