Floeval

Evaluation framework for LLM and RAG systems.

Overview

Floeval evaluates LLM and RAG outputs through a single interface that supports multiple metric providers (RAGAS, DeepEval, and built-in metrics) and execution backends.

Features

  • Multi-backend metrics: RAGAS, DeepEval, and built-in metrics
  • LLM and RAG evaluation: Evaluate responses, faithfulness, answer relevancy, and more
  • Agent evaluation: Optional Flotorch integration for agent-based evaluation
  • CLI and Python API: Run evaluations from config files or programmatically

Installation

Stable (production)

pip install floeval

Beta / Pre-release (for testing)

pip install --pre floeval
# Or pin a specific pre-release version:
pip install floeval==0.1.0b1

Note: By default, pip skips beta versions whenever a stable release is available. Pass --pre to allow them, or pin an exact pre-release version as shown above.
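
To confirm which release pip actually installed, the version can be read back from the package metadata. The following is a minimal sketch using only the standard library; the helper names are hypothetical, and the pre-release check is a rough heuristic for common normalized version strings such as 0.1.0b1, not a full PEP 440 parser.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def looks_like_prerelease(version_string: str) -> bool:
    """Rough heuristic: match trailing alpha/beta/rc/dev markers such as 0.1.0b1."""
    return re.search(r"\d(a|b|rc)\d+$|\.dev\d+$", version_string) is not None


if __name__ == "__main__":
    found = installed_version("floeval")
    if found is None:
        print("floeval is not installed in this environment")
    else:
        kind = "pre-release" if looks_like_prerelease(found) else "stable release"
        print(f"floeval {found} ({kind})")
```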

With optional Flotorch support (agent evaluation)

pip install "floeval[flotorch]"

Development

# Editable install
pip install -e .
# Editable install with the dev extra
pip install -e ".[dev]"

Structure

  • api/ - Public API (Evaluation, Dataset, Sample, Metrics)
  • core/execution/ - Execution engine (LLM calls, response synthesis)
  • metric_providers/ - Metrics organized by provider (builtin, ragas, deepeval)
  • config/schemas/ - Configuration schemas and data models
  • cli/ - Command-line interface
  • utils/ - Utility functions (loaders, gateways, etc.)

Quick Start

Python API

from floeval import Evaluation, DatasetLoader
from floeval.config.schemas.io.llm import OpenAIProviderConfig

# LLM provider settings: endpoint, credentials, chat and embedding models
llm_config = OpenAIProviderConfig(
    base_url="https://api.openai.com/v1",
    api_key="your-api-key",
    chat_model="gpt-4o-mini",
    embedding_model="text-embedding-3-small"
)

# Build a dataset from in-memory samples
dataset = DatasetLoader.from_samples([
    {"user_input": "What is RAG?", "llm_response": "RAG is Retrieval-Augmented Generation."}
])

# Select the metrics to compute over the dataset
evaluation = Evaluation(
    dataset=dataset,
    llm_config=llm_config,
    metrics=["answer_relevancy", "faithfulness"]
)

# Run the evaluation and print the aggregate scores
results = evaluation.run()
print(results.aggregate_scores)
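
The example above hard-codes the API key. A minimal sketch of reading it from the environment instead is shown below. The helper name and the OPENAI_API_KEY and OPENAI_BASE_URL variable names are assumptions chosen for illustration; nothing here indicates that Floeval reads them on its own. The keyword names mirror the ones passed to OpenAIProviderConfig above.

```python
import os
from typing import Mapping


def llm_settings_from_env(env: Mapping[str, str] = os.environ) -> dict:
    """Build the Quick Start keyword arguments from environment variables."""
    # Variable names are an assumed convention, not something Floeval defines.
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("Set OPENAI_API_KEY before running the evaluation")
    return {
        "base_url": env.get("OPENAI_BASE_URL", "https://api.openai.com/v1"),
        "api_key": api_key,
        "chat_model": "gpt-4o-mini",
        "embedding_model": "text-embedding-3-small",
    }
```

The resulting dictionary can then be unpacked into the config, for example OpenAIProviderConfig(**llm_settings_from_env()), which keeps the key out of source control.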

CLI

# Evaluate a complete dataset (responses already present)
floeval evaluate -c config.yaml -d dataset.json -o results.json

# Or pass a partial dataset to generate responses and evaluate in one step
floeval evaluate -c config.yaml -d partial_dataset.json -o results.json

# Or generate responses first, then evaluate the completed dataset
floeval generate -c config.yaml -d partial_dataset.json -o complete.json
floeval evaluate -c config.yaml -d complete.json -o results.json
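
The two-step flow above can also be driven from a script. The following is a minimal sketch that only wraps the commands and flags shown in this section with subprocess; the helper names are hypothetical, and it assumes the floeval executable is on PATH.

```python
import subprocess
from typing import List


def floeval_command(subcommand: str, config: str, dataset: str, output: str) -> List[str]:
    """Build the argument list for `floeval <subcommand> -c ... -d ... -o ...`."""
    if subcommand not in ("generate", "evaluate"):
        raise ValueError(f"unsupported subcommand: {subcommand}")
    return ["floeval", subcommand, "-c", config, "-d", dataset, "-o", output]


def generate_then_evaluate(config: str, partial: str, complete: str, results: str) -> None:
    """Run generate, then evaluate; raises CalledProcessError on a non-zero exit."""
    subprocess.run(floeval_command("generate", config, partial, complete), check=True)
    subprocess.run(floeval_command("evaluate", config, complete, results), check=True)


if __name__ == "__main__":
    generate_then_evaluate(
        "config.yaml", "partial_dataset.json", "complete.json", "results.json"
    )
```

Passing the arguments as a list avoids shell quoting issues with file names that contain spaces.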

Documentation

Full documentation is available in docs/.

License

MIT
