Multi-backend evaluation framework for LLM and RAG systems
Floeval
Evaluation framework for LLM and RAG systems.
Overview
Floeval is a flexible evaluation framework designed to support multiple metric providers and execution backends.
Features
- Multi-backend metrics: RAGAS, DeepEval, and built-in metrics
- LLM and RAG evaluation: Score responses on faithfulness, answer relevancy, and more
- Agent evaluation: Optional Flotorch integration for agent-based evaluation
- CLI and Python API: Run evaluations from config files or programmatically
Installation
Stable (production)
pip install floeval
Beta / Pre-release (for testing)
pip install --pre floeval
# Or specific version: pip install --pre floeval==0.1.0b1
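Note: Without --pre, pip installs only stable releases, so the flag is needed to get a beta. The exception is a requirement that names a pre-release explicitly (for example floeval==0.1.0b1), which pip accepts even without --pre.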
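pip decides what counts as a pre-release from the version string itself (PEP 440). As a rough illustration, the hypothetical helper `is_prerelease` below flags versions that carry an alpha, beta, release-candidate, or dev segment, such as `0.1.0b1`. It is a simplified regex sketch, not pip's own parser; the installed version is read with the standard-library `importlib.metadata.version`.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Simplified PEP 440 pattern: a sketch, not a complete parser.
_VERSION_RE = re.compile(
    r"""^v?
    (?:\d+!)?                                                      # optional epoch
    \d+(?:\.\d+)*                                                  # release, e.g. 0.1.0
    (?P<pre>[-_.]?(?:alpha|a|beta|b|rc|c|preview|pre)[-_.]?\d*)?   # e.g. b1, rc2
    (?P<post>[-_.]?(?:post|rev|r)[-_.]?\d*)?                       # e.g. .post1
    (?P<dev>[-_.]?dev[-_.]?\d*)?                                   # e.g. .dev3
    (?:\+[a-z0-9.]+)?                                              # local label
    $""",
    re.VERBOSE | re.IGNORECASE,
)


def is_prerelease(version_string: str) -> bool:
    """Return True if the version has a pre-release or dev segment."""
    match = _VERSION_RE.match(version_string.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version_string!r}")
    # pip treats both pre-releases (a/b/rc) and dev releases as pre-releases
    return match.group("pre") is not None or match.group("dev") is not None


if __name__ == "__main__":
    try:
        installed = version("floeval")
    except PackageNotFoundError:
        print("floeval is not installed")
    else:
        kind = "pre-release" if is_prerelease(installed) else "stable release"
        print(f"floeval {installed} ({kind})")
```

For an authoritative answer, `packaging.version.Version("0.1.0b1").is_prerelease` from the `packaging` library applies the full PEP 440 rules.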
With optional Flotorch support (agent evaluation)
pip install "floeval[flotorch]"
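Because Flotorch support is optional, code that enables agent evaluation may want to check first whether the extra's dependency is importable. A minimal sketch using the standard `importlib.util.find_spec`; the helper name is hypothetical, and the top-level module name `flotorch` is an assumption, since the import name installed by the extra is not stated here.

```python
import importlib.util


def optional_dependency_available(module_name: str) -> bool:
    """Return True if `module_name` can be imported, without importing it."""
    # find_spec returns None for a top-level module that is not installed
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Assumption: the [flotorch] extra installs a top-level module named "flotorch"
    if optional_dependency_available("flotorch"):
        print("Flotorch support is installed; agent evaluation can be enabled")
    else:
        print('Flotorch support is missing; install it with: pip install "floeval[flotorch]"')
```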
Development
pip install -e .          # editable install
pip install -e ".[dev]"   # editable install including the dev extras
Structure
- api/ - Public API (Evaluation, Dataset, Sample, Metrics)
- core/execution/ - Execution engine (LLM calls, response synthesis)
- metric_providers/ - Metrics organized by provider (builtin, ragas, deepeval)
- config/schemas/ - Configuration schemas and data models
- cli/ - Command-line interface
- utils/ - Utility functions (loaders, gateways, etc.)
Quick Start
Python API
from floeval import Evaluation, DatasetLoader
from floeval.config.schemas.io.llm import OpenAIProviderConfig
# LLM provider settings (chat and embedding models)
llm_config = OpenAIProviderConfig(
    base_url="https://api.openai.com/v1",
    api_key="your-api-key",  # replace with your own key
    chat_model="gpt-4o-mini",
    embedding_model="text-embedding-3-small",
)
# Each sample pairs a user input with the response to be evaluated
dataset = DatasetLoader.from_samples([
    {"user_input": "What is RAG?", "llm_response": "RAG is Retrieval-Augmented Generation."}
])
evaluation = Evaluation(
    dataset=dataset,
    llm_config=llm_config,
    metrics=["answer_relevancy", "faithfulness"],
)
results = evaluation.run()
print(results.aggregate_scores)  # aggregate score per metric
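How `aggregate_scores` is computed is not described here. As a hedged illustration of the idea only, the sketch below averages each metric over all samples and skips samples where a metric could not be scored. The per-sample input format and the function name `aggregate` are assumptions, and Floeval may aggregate differently.

```python
from statistics import mean
from typing import Dict, List, Optional


def aggregate(per_sample: List[Dict[str, Optional[float]]]) -> Dict[str, float]:
    """Average each metric across samples, ignoring missing (None) scores."""
    collected: Dict[str, List[float]] = {}
    for sample_scores in per_sample:
        for metric, score in sample_scores.items():
            if score is not None:
                collected.setdefault(metric, []).append(score)
    # Metrics with no valid score at all are left out of the result
    return {metric: mean(scores) for metric, scores in collected.items()}


if __name__ == "__main__":
    per_sample = [
        {"answer_relevancy": 0.9, "faithfulness": 1.0},
        {"answer_relevancy": 0.7, "faithfulness": None},  # scoring failed
    ]
    print(aggregate(per_sample))
```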
CLI
# Evaluate a complete dataset (every sample already has an llm_response)
floeval evaluate -c config.yaml -d dataset.json -o results.json
# Or pass a partial dataset: responses are generated, then evaluated, in one step
floeval evaluate -c config.yaml -d partial_dataset.json -o results.json
# Or generate the responses first, then evaluate the completed dataset
floeval generate -c config.yaml -d partial_dataset.json -o complete.json
floeval evaluate -c config.yaml -d complete.json -o results.json
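The CLI workflows differ only in whether samples already carry a response. A minimal sketch for preparing the two kinds of input file, assuming a dataset file is a JSON list of sample objects with the same `user_input` / `llm_response` keys as the Python API, and that a partial sample is one without `llm_response`; check docs/ for the actual schema. The helpers `split_samples` and `write_dataset` are hypothetical.

```python
import json
from pathlib import Path
from typing import Dict, List, Tuple

Sample = Dict[str, str]


def split_samples(samples: List[Sample]) -> Tuple[List[Sample], List[Sample]]:
    """Separate samples that already have a response from those that need one."""
    complete = [s for s in samples if s.get("llm_response")]
    partial = [s for s in samples if not s.get("llm_response")]
    return complete, partial


def write_dataset(path: Path, samples: List[Sample]) -> None:
    """Write samples as a JSON list (assumed dataset layout)."""
    path.write_text(json.dumps(samples, indent=2), encoding="utf-8")


if __name__ == "__main__":
    samples = [
        {"user_input": "What is RAG?", "llm_response": "RAG is Retrieval-Augmented Generation."},
        {"user_input": "What does faithfulness measure?"},  # no response yet
    ]
    complete, partial = split_samples(samples)
    write_dataset(Path("dataset.json"), complete)          # for `floeval evaluate`
    write_dataset(Path("partial_dataset.json"), partial)   # for `floeval generate`
```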
Documentation
Full documentation is available in the docs/ directory.
License
MIT
Download files
File details
Details for the file floeval-0.1.0b1.tar.gz.
File metadata
- Download URL: floeval-0.1.0b1.tar.gz
- Upload date:
- Size: 74.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ee632402dec6e25edda19f809dc468a518fdc821fe4dff41f572c2b0a66bce68 |
| MD5 | 18e6127fc5f7a0b05c7a1cbbb4330f1e |
| BLAKE2b-256 | d7c80b08ef69b5f5e64ccfeb3596da7d4ab47df9eb17a22070d590c26ad52d71 |
File details
Details for the file floeval-0.1.0b1-py3-none-any.whl.
File metadata
- Download URL: floeval-0.1.0b1-py3-none-any.whl
- Upload date:
- Size: 100.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 025b39a16184b81499a5b202d33c143da1ed65a4e648162464dbc8373939a47a |
| MD5 | 0d1e8a13c129b87c756fd652d9596343 |
| BLAKE2b-256 | 2581e89d4011922cef62307d7fb1c34c419b2c1f36e46a3ab2b509ed0cd9b4b1 |
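To check a downloaded file against the SHA256 digests listed in the file details, here is a minimal sketch using Python's standard `hashlib` module. The helper names are illustrative, and the usage block assumes the wheel was saved to the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_sha256: str) -> bool:
    """Compare against a published digest (case-insensitive hex)."""
    return sha256_of(path) == expected_sha256.strip().lower()


if __name__ == "__main__":
    wheel = Path("floeval-0.1.0b1-py3-none-any.whl")
    expected = "025b39a16184b81499a5b202d33c143da1ed65a4e648162464dbc8373939a47a"
    if wheel.exists():
        print("OK" if verify(wheel, expected) else "MISMATCH")
```

pip can enforce the same check at install time in hash-checking mode: pin `floeval==0.1.0b1 --hash=sha256:<digest>` in a requirements file and install with `pip install --require-hashes -r requirements.txt`.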