
fasteval


A decorator-first LLM evaluation library for testing AI agents and LLMs. Stack decorators to define evaluation criteria, run with pytest.

Features

  • Decorator-based metrics -- stack @fe.correctness, @fe.relevance, @fe.hallucination, and 30+ more
  • pytest native -- run evaluations with pytest, get familiar pass/fail output
  • LLM-as-judge + deterministic -- semantic LLM metrics alongside ROUGE, exact match, JSON schema, regex
  • Multi-modal -- evaluate vision, audio, and image generation models
  • Conversation metrics -- context retention, topic drift, consistency for multi-turn agents
  • RAG metrics -- faithfulness, contextual precision, contextual recall, answer correctness
  • Tool trajectory -- verify agent tool calls, argument matching, call sequences
  • Pluggable providers -- OpenAI (default), Anthropic, Azure OpenAI, Ollama

Quick Start

pip install fasteval-core

Set your LLM provider key:

export OPENAI_API_KEY=sk-your-key-here

Write your first evaluation test:

import fasteval as fe

@fe.correctness(threshold=0.8)
@fe.relevance(threshold=0.7)
def test_qa_agent():
    response = my_agent("What is the capital of France?")  # your agent under test
    fe.score(response, expected_output="Paris", input="What is the capital of France?")

Run it:

pytest test_qa_agent.py -v

Installation

# pip
pip install fasteval-core

# uv
uv add fasteval-core

Optional Extras

# Anthropic provider
pip install fasteval-core[anthropic]

# Vision-language evaluation (GPT-4V, Claude Vision)
pip install fasteval-core[vision]

# Audio/speech evaluation (Whisper, ASR)
pip install fasteval-core[audio]

# Image generation evaluation (DALL-E, Stable Diffusion)
pip install fasteval-core[image-gen]

# All multi-modal features
pip install fasteval-core[multimodal]

Usage Examples

Deterministic Metrics

import fasteval as fe

@fe.contains()
def test_keyword_present():
    fe.score("The answer is 42", expected_output="42")

@fe.rouge(threshold=0.6, rouge_type="rougeL")
def test_summary_quality():
    fe.score(actual_output=summary, expected_output=reference)

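Under the hood, deterministic metrics are plain string computation with no LLM call. As a rough sketch (not fasteval's actual implementation), a ROUGE-L style F1 can be computed from the longest common subsequence of tokens:

```python
def lcs_len(a: list[str], b: list[str]) -> int:
    # Dynamic-programming longest common subsequence length.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def rouge_l_f1(candidate: str, reference: str) -> float:
    # F1 over the LCS of whitespace tokens: precision against the
    # candidate length, recall against the reference length.
    cand, ref = candidate.split(), reference.split()
    lcs = lcs_len(cand, ref)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)
```

A threshold of 0.6, as in the example above, would then simply be a cutoff on this F1 value.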
RAG Evaluation

@fe.faithfulness(threshold=0.8)
@fe.contextual_precision(threshold=0.7)
def test_rag_pipeline():
    result = rag_pipeline("How does photosynthesis work?")
    fe.score(
        actual_output=result.answer,
        context=result.retrieved_docs,
        input="How does photosynthesis work?",
    )

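To build intuition for contextual precision: it rewards retrievals where relevant documents are ranked above irrelevant ones. A minimal sketch of one plausible formula (fasteval's judge-based version may differ), assuming per-document relevance labels are already available:

```python
def contextual_precision(relevance: list[bool]) -> float:
    """Average precision@k over the ranks k where a relevant doc appears.

    relevance[i] is True if the retrieved doc at rank i (0-based) is
    relevant to the input question. The score is 1.0 when every relevant
    doc precedes every irrelevant one, and drops as relevant docs sink.
    """
    hits, total = 0, 0.0
    for k, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / k
    return total / hits if hits else 0.0
```

For example, `[True, True, False]` scores 1.0, while pushing the relevant docs down the ranking lowers the score.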
Tool Trajectory

@fe.tool_call_accuracy(threshold=0.9)
def test_agent_tools():
    result = agent.run("Book a flight to Paris")
    fe.score(
        actual_tools=result.tool_calls,
        expected_tools=[
            {"name": "search_flights", "args": {"destination": "Paris"}},
            {"name": "book_flight"},
        ],
    )

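One way trajectory matching like this can work (a sketch under assumed semantics, not fasteval's exact algorithm): expected calls are matched in order against the actual trajectory, and an expected call's args are treated as a subset of the actual args, so `{"name": "book_flight"}` matches any `book_flight` call:

```python
def tool_call_accuracy(actual: list[dict], expected: list[dict]) -> float:
    """Fraction of expected tool calls found, in order, in the actual
    trajectory. An expected call matches when the names agree and every
    expected arg is present with the same value (extra actual args OK).
    """
    matched, i = 0, 0
    for exp in expected:
        while i < len(actual):
            call = actual[i]
            i += 1
            if call["name"] == exp["name"] and all(
                call.get("args", {}).get(k) == v
                for k, v in exp.get("args", {}).items()
            ):
                matched += 1
                break
    return matched / len(expected) if expected else 1.0
```

With a 0.9 threshold as above, an agent that skips or reorders a required call would fail the test.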
Metric Stacks

@fe.correctness(threshold=0.8, weight=2.0)
@fe.relevance(threshold=0.7, weight=1.0)
@fe.coherence(threshold=0.6, weight=1.0)
def test_comprehensive():
    response = agent("Explain quantum computing")
    fe.score(response, expected_output=reference_answer, input="Explain quantum computing")

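A plausible semantics for combining threshold and weight (an illustrative sketch; fasteval may aggregate differently): each metric must clear its own threshold for the test to pass, while the weights produce a single weight-normalized overall score for reporting:

```python
def aggregate(results: list[tuple[float, float, float]]) -> tuple[float, bool]:
    """Combine a metric stack. Each entry is (score, threshold, weight).

    Returns (overall, passed): the weight-normalized mean score, and
    whether every metric cleared its own threshold.
    """
    total_weight = sum(w for _, _, w in results)
    overall = sum(s * w for s, _, w in results) / total_weight
    passed = all(s >= t for s, t, _ in results)
    return overall, passed
```

For the stack above, scores of 0.9, 0.8, and 0.7 with weights 2.0, 1.0, and 1.0 would yield an overall of 0.825 and a pass, since each metric also beats its own threshold.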
Plugins

  • fasteval-langfuse -- Evaluate Langfuse production traces with fasteval metrics (pip install fasteval-langfuse)
  • fasteval-langgraph -- Test harness for LangGraph agents (pip install fasteval-langgraph)
  • fasteval-observe -- Runtime monitoring with async sampling (pip install fasteval-observe)

Local Development

# Install uv (Homebrew shown; see uv's docs for other installers)
brew install uv

# Create virtual environment and install dependencies
uv sync --all-extras

# Run the test suite
uv run tox

# Format code
uv run black .
uv run isort .

# Type checking
uv run mypy .

Documentation

Full documentation is available in the docs/ directory.

Contributing

See CONTRIBUTING.md for development setup, coding standards, and how to submit pull requests.

License

Apache License 2.0 -- see LICENSE for details.
