fasteval
A decorator-first LLM evaluation library for testing AI agents and LLMs. Stack decorators to define evaluation criteria, run with pytest. Read the docs.
Features
- 50+ built-in metrics -- stack `@fe.correctness`, `@fe.relevance`, `@fe.hallucination`, and more
- pytest native -- run evaluations with `pytest`, get familiar pass/fail output
- LLM-as-judge + deterministic -- semantic LLM metrics alongside ROUGE, exact match, JSON schema, regex
- Custom criteria -- `@fe.criteria("Is the response empathetic?")` for any evaluation you can describe in plain English
- Multi-modal -- evaluate vision, audio, and image generation models
- Conversation metrics -- context retention, topic drift, consistency for multi-turn agents
- RAG metrics -- faithfulness, contextual precision, contextual recall, answer correctness
- Tool trajectory -- verify agent tool calls, argument matching, call sequences
- Reusable metric stacks -- `@fe.stack()` to compose and reuse metric sets across tests
- Human-in-the-loop -- `@fe.human_review()` for manual review alongside automated metrics
- Data-driven testing -- `@fe.csv("test_data.csv")` to load test cases from CSV files
- Pluggable providers -- OpenAI (default), Anthropic, or bring your own `LLMClient`
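The decorator-first pattern above can be illustrated with a toy sketch: each stacked decorator records a (metric, threshold) requirement on the test function, and a runner later reads the list back. The names here (`metric`, `_metrics`) are illustrative only, not fasteval internals:

```python
# Toy illustration of decorator stacking: each decorator prepends a
# (metric, threshold) requirement to a list stored on the function.
def metric(name, threshold):
    def decorate(fn):
        fn._metrics = [(name, threshold)] + getattr(fn, "_metrics", [])
        return fn
    return decorate

@metric("correctness", 0.8)
@metric("relevance", 0.7)
def test_example():
    pass

# Decorators apply bottom-up, so the prepend preserves the written order:
print(test_example._metrics)  # [('correctness', 0.8), ('relevance', 0.7)]
```

A test runner can then iterate over that list and score the function's output against each threshold.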
Quick Start
```bash
pip install fasteval-core
```
Set your LLM provider key:
```bash
export OPENAI_API_KEY=sk-your-key-here
```
Write your first evaluation test:
```python
import fasteval as fe

@fe.correctness(threshold=0.8)
@fe.relevance(threshold=0.7)
def test_qa_agent():
    response = my_agent("What is the capital of France?")
    fe.score(response, expected_output="Paris", input="What is the capital of France?")
```
Run it:
```bash
pytest test_qa_agent.py -v
```
Installation
```bash
# pip
pip install fasteval-core

# uv
uv add fasteval-core
```
Optional Extras
```bash
# Anthropic provider
pip install fasteval-core[anthropic]

# Vision-language evaluation (GPT-4V, Claude Vision)
pip install fasteval-core[vision]

# Audio/speech evaluation (Whisper, ASR)
pip install fasteval-core[audio]

# Image generation evaluation (DALL-E, Stable Diffusion)
pip install fasteval-core[image-gen]

# All multi-modal features
pip install fasteval-core[multimodal]
```
Usage Examples
Deterministic Metrics
```python
import fasteval as fe

@fe.contains()
def test_keyword_present():
    fe.score("The answer is 42", expected_output="42")

@fe.rouge(threshold=0.6, rouge_type="rougeL")
def test_summary_quality():
    fe.score(actual_output=summary, expected_output=reference)
```
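To make the ROUGE-L threshold concrete, here is a minimal plain-Python sketch of what a ROUGE-L score measures: an F-measure over the longest common subsequence of tokens. This is an illustration of the metric itself, not fasteval's implementation:

```python
def lcs_len(a, b):
    # Longest common subsequence length via dynamic programming.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l(candidate, reference):
    # F1 over the LCS of whitespace tokens.
    c, r = candidate.split(), reference.split()
    lcs = lcs_len(c, r)
    if lcs == 0:
        return 0.0
    p, rec = lcs / len(c), lcs / len(r)
    return 2 * p * rec / (p + rec)

score = rouge_l("the cat sat on the mat", "the cat is on the mat")
# ≈ 0.83: the LCS ("the cat on the mat") covers 5 of 6 tokens on each side
```

A `threshold=0.6` check would then simply be `score >= 0.6`.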
Custom Criteria
```python
@fe.criteria("Is the response empathetic and professional?")
def test_tone():
    response = agent("I'm frustrated with this product!")
    fe.score(response)

@fe.criteria(
    "Does the response include a legal disclaimer?",
    threshold=0.9,
)
def test_compliance():
    response = agent("Can I break my lease?")
    fe.score(response)
```
RAG Evaluation
```python
@fe.faithfulness(threshold=0.8)
@fe.contextual_precision(threshold=0.7)
def test_rag_pipeline():
    result = rag_pipeline("How does photosynthesis work?")
    fe.score(
        actual_output=result.answer,
        context=result.retrieved_docs,
        input="How does photosynthesis work?",
    )
```
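Faithfulness asks how much of the answer is actually supported by the retrieved context. fasteval judges this with an LLM; purely to illustrate the idea, here is a crude deterministic proxy (all names hypothetical, not library code) that counts the share of answer sentences whose content words all appear in the context:

```python
def faithfulness_proxy(answer, context_docs):
    # Crude proxy: fraction of answer sentences whose words all
    # occur somewhere in the retrieved context.
    context_words = set(" ".join(context_docs).lower().split())
    sentences = [s.strip() for s in answer.split(".") if s.strip()]
    if not sentences:
        return 0.0
    supported = sum(
        1 for s in sentences
        if set(s.lower().split()) <= context_words
    )
    return supported / len(sentences)

docs = [
    "plants convert sunlight into chemical energy",
    "chlorophyll absorbs light in the leaves",
]
faithfulness_proxy("plants convert sunlight into chemical energy", docs)  # 1.0
```

A real LLM-as-judge faithfulness metric works at the level of claims rather than word overlap, but the score it produces is compared against the threshold the same way.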
Tool Trajectory
```python
@fe.tool_call_accuracy(threshold=0.9)
def test_agent_tools():
    result = agent.run("Book a flight to Paris")
    fe.score(
        result.response,
        tool_calls=result.tool_calls,
        expected_tools=[
            {"name": "search_flights", "args": {"destination": "Paris"}},
            {"name": "book_flight"},
        ],
    )
```
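The `expected_tools` shape above suggests ordered matching with optional argument constraints. A plain-Python sketch of one plausible matching rule (illustrative only, not fasteval's scorer): each expected call must appear in order, and when `"args"` is given, every listed argument must match the actual call:

```python
def tools_match(actual_calls, expected_tools):
    # Expected tool calls must appear in order; when "args" is given,
    # each listed argument must equal the actual call's argument.
    it = iter(actual_calls)
    for exp in expected_tools:
        for call in it:
            if call["name"] != exp["name"]:
                continue
            args = exp.get("args", {})
            if all(call.get("args", {}).get(k) == v for k, v in args.items()):
                break
        else:
            return False  # expected call never found
    return True

calls = [
    {"name": "search_flights", "args": {"destination": "Paris", "date": "2025-06-01"}},
    {"name": "book_flight", "args": {"flight_id": "AF123"}},
]
expected = [
    {"name": "search_flights", "args": {"destination": "Paris"}},
    {"name": "book_flight"},
]
tools_match(calls, expected)  # True
```

Note that extra arguments on the actual call (like `date` above) do not cause a mismatch; only the expected keys are checked.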
Multi-Turn Conversations
```python
@fe.context_retention(threshold=0.8)
@fe.conversation([
    {"query": "My name is Alice and I'm a vegetarian"},
    {"query": "Suggest a restaurant for me"},
    {"query": "What dietary restriction should they accommodate?"},
])
async def test_memory(query, expected, history):
    response = await agent(query, history=history)
    fe.score(response, input=query, history=history)
```
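The conversation decorator invokes the test once per turn, threading the accumulated history into each call. A synchronous plain-Python sketch of that driver loop, with a stub agent standing in for a real one (all shapes here are assumptions for illustration, not library internals):

```python
def run_conversation(agent, turns):
    # Call the agent once per turn, accumulating history as we go.
    history = []
    responses = []
    for turn in turns:
        query = turn["query"]
        reply = agent(query, history=history)
        responses.append(reply)
        history.append({"query": query, "response": reply})
    return responses

def stub_agent(query, history):
    # Toy agent: recalls a name only if it appeared earlier in history.
    if query.startswith("My name is"):
        return "Nice to meet you"
    for past in history:
        if "Alice" in past["query"]:
            return "Alice"
    return "unknown"

run_conversation(stub_agent, [
    {"query": "My name is Alice"},
    {"query": "What is my name?"},
])  # ["Nice to meet you", "Alice"]
```

A context-retention metric then checks whether facts stated in early turns (the name, the dietary restriction) survive into later responses.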
Metric Stacks
```python
# Define a reusable metric stack
@fe.stack()
@fe.correctness(threshold=0.8, weight=2.0)
@fe.relevance(threshold=0.7, weight=1.0)
@fe.coherence(threshold=0.6, weight=1.0)
def quality_metrics():
    pass

# Apply it to multiple tests
@quality_metrics
def test_chatbot():
    response = agent("Explain quantum computing")
    fe.score(response, expected_output=reference_answer, input="Explain quantum computing")

@quality_metrics
def test_summarizer():
    summary = summarize(long_article)
    fe.score(summary, expected_output=reference_summary)
```
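The `weight` arguments imply some aggregation across the stack. One plausible scheme, shown purely as a sketch (not necessarily how fasteval aggregates): a weighted average for the overall score, plus a per-metric pass/fail against each threshold:

```python
def aggregate(scores, stack):
    # stack: list of (metric, threshold, weight); scores: metric -> score.
    total = sum(w for _, _, w in stack)
    weighted = sum(scores[m] * w for m, _, w in stack) / total
    passed = all(scores[m] >= t for m, t, _ in stack)
    return weighted, passed

stack = [("correctness", 0.8, 2.0), ("relevance", 0.7, 1.0), ("coherence", 0.6, 1.0)]
scores = {"correctness": 0.9, "relevance": 0.8, "coherence": 0.5}
aggregate(scores, stack)  # (0.775, False) -- coherence 0.5 misses its 0.6 threshold
```

Under this scheme a high weight amplifies a metric's effect on the combined score, while the thresholds remain independent gates.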
Plugins
| Plugin | Description | Install |
|---|---|---|
| `fasteval-langfuse` | Evaluate Langfuse production traces with fasteval metrics | `pip install fasteval-langfuse` |
| `fasteval-langgraph` | Test harness for LangGraph agents | `pip install fasteval-langgraph` |
| `fasteval-observe` | Runtime monitoring with async sampling | `pip install fasteval-observe` |
Local Development
```bash
# Install uv
brew install uv

# Create virtual environment and install all dependencies
uv sync --all-extras --group dev --group test

# Run the test suite
uv run tox

# Run tests with coverage
uv run pytest tests/ --cov=fasteval --cov-report=term -v

# Format code
uv run black .
uv run isort .

# Type checking
uv run mypy .
```
Documentation
Full documentation is available at fasteval.io.
- Getting Started -- installation and quickstart guide
- Why FastEval -- motivation and design philosophy
- Core Concepts -- decorators, metrics, scoring, data sources
- Concepts -- LLM-as-judge, scoring thresholds, evaluation strategies
- LLM Metrics -- correctness, relevance, hallucination, and more
- Deterministic Metrics -- ROUGE, exact match, regex, JSON schema
- RAG Metrics -- faithfulness, contextual precision/recall
- Tool Trajectory -- tool call accuracy, sequence, argument matching
- Conversation Metrics -- context retention, consistency, topic drift
- Multi-Modal -- vision, audio, image generation evaluation
- Human Review -- human-in-the-loop evaluation
- Cookbooks -- RAG pipelines, CI/CD setup, prompt regression, production monitoring
- Plugins -- Langfuse, LangGraph, Observe
- Advanced -- custom metrics, providers, output collectors, traces
- API Reference -- decorators, evaluator, models, score
Contributing
See CONTRIBUTING.md for development setup, coding standards, and how to submit pull requests.
License
Apache License 2.0 -- see LICENSE for details.