A library containing LLM benchmarking tools
Flow Benchmark Tools
Create and run LLM benchmarks.
Installation
Just the library:
pip install flow-benchmark-tools==1.1.0
The library plus the example benchmarks (see below):
pip install "flow-benchmark-tools[examples]==1.1.0"
Usage
- Create an agent by inheriting BenchmarkAgent and implementing the run_benchmark_case method.
- Create a Benchmark by compiling a list of BenchmarkCases. These can be read from a JSONL file.
- Associate agent and benchmark in a BenchmarkRun.
- Use a BenchmarkRunner to run your BenchmarkRun.
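The four steps above can be sketched end to end. The classes below are minimal local stand-ins that reuse the names from this description (BenchmarkAgent, BenchmarkCase, Benchmark, BenchmarkRun, BenchmarkRunner); they are hypothetical and do not import flow_benchmark_tools, so the real constructors, fields (here the hypothetical `prompt` and `expected`) and method signatures may differ. EchoAgent is an illustrative placeholder for an agent that would normally call an LLM.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass

# NOTE: hypothetical stand-ins that mirror the usage steps; they are NOT the
# flow_benchmark_tools classes, whose actual signatures may differ.


@dataclass
class BenchmarkCase:
    prompt: str    # hypothetical field: input sent to the agent
    expected: str  # hypothetical field: reference answer


@dataclass
class Benchmark:
    cases: list[BenchmarkCase]


class BenchmarkAgent(ABC):
    @abstractmethod
    def run_benchmark_case(self, case: BenchmarkCase) -> str:
        """Return the agent's answer for a single case."""


@dataclass
class BenchmarkRun:
    agent: BenchmarkAgent
    benchmark: Benchmark


class BenchmarkRunner:
    def run(self, benchmark_run: BenchmarkRun) -> list[tuple[BenchmarkCase, str]]:
        # Feed every case to the agent and collect (case, answer) pairs.
        agent = benchmark_run.agent
        return [(case, agent.run_benchmark_case(case)) for case in benchmark_run.benchmark.cases]


# Step 1: an agent that echoes its input (placeholder for a real LLM call).
class EchoAgent(BenchmarkAgent):
    def run_benchmark_case(self, case: BenchmarkCase) -> str:
        return case.prompt


# Steps 2-4: build the benchmark, pair it with the agent, run it.
benchmark = Benchmark(cases=[BenchmarkCase("ping", "ping"), BenchmarkCase("2+2?", "4")])
results = BenchmarkRunner().run(BenchmarkRun(agent=EchoAgent(), benchmark=benchmark))
for case, answer in results:
    print(case.prompt, "->", answer, "| match:", answer == case.expected)
```

Keeping the runner separate from the agent means the same agent can be paired with different benchmarks (and vice versa) without changing either class.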
Running example benchmarks
Two end-to-end benchmark examples are provided in the examples folder: a LangChain RAG application and an OpenAI Assistant agent.
To run the LangChain RAG benchmark:
python src/examples/langchain_rag_agent.py
To run the OpenAI Assistant benchmark:
python src/examples/openai_assistant_agent.py
The benchmark cases are defined in data/rag_benchmark.jsonl.
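JSONL (JSON Lines) stores one JSON object per line, which makes benchmark cases easy to append to and stream. A minimal loader is sketched below; the `Case` class and the `question`/`answer` field names are hypothetical, since the actual schema of data/rag_benchmark.jsonl is not described here.

```python
import io
import json
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Case:
    question: str  # hypothetical field name
    answer: str    # hypothetical field name


def load_cases(lines: Iterable[str]) -> List[Case]:
    """Parse JSON Lines: one JSON object per non-blank line."""
    cases = []
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing newline
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {lineno}: invalid JSON ({exc})") from exc
        cases.append(Case(question=record["question"], answer=record["answer"]))
    return cases


# In-memory sample so the sketch runs without the real data file.
sample = io.StringIO(
    '{"question": "What is RAG?", "answer": "Retrieval-augmented generation"}\n'
    "\n"
    '{"question": "Capital of France?", "answer": "Paris"}\n'
)
cases = load_cases(sample)
print(len(cases), cases[1].answer)
```

To read from disk instead, pass an open file object: `with open("data/rag_benchmark.jsonl", encoding="utf-8") as f: cases = load_cases(f)`.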
The two examples follow the typical usage pattern of the library:
- define an agent by implementing the BenchmarkAgent interface and overriding the run_benchmark_case method (you can also override the before and after methods, if needed),
- create a set of benchmark cases, typically as a JSONL file such as data/rag_benchmark.jsonl,
- use a BenchmarkRunner to run the benchmark.
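The optional before and after methods fit a setup/teardown pattern: acquire resources (for example, an OpenAI Assistant or a vector store) before the cases run, and release them afterwards. The sketch below assumes the hooks run once around the whole set of cases and that teardown should happen even if a case fails; HookedAgent, run_all and AssistantLikeAgent are hypothetical names, and the library may invoke its hooks at a different granularity.

```python
from abc import ABC, abstractmethod
from typing import List


class HookedAgent(ABC):
    """Hypothetical agent base with optional before/after hooks (not the library's class)."""

    def before(self) -> None:
        """Optional setup hook; the default does nothing."""

    def after(self) -> None:
        """Optional teardown hook; the default does nothing."""

    @abstractmethod
    def run_benchmark_case(self, prompt: str) -> str:
        ...


def run_all(agent: HookedAgent, prompts: List[str]) -> List[str]:
    agent.before()
    try:
        return [agent.run_benchmark_case(p) for p in prompts]
    finally:
        agent.after()  # teardown runs even if a case raises


class AssistantLikeAgent(HookedAgent):
    def __init__(self) -> None:
        self.events: List[str] = []
        self.session = None

    def before(self) -> None:
        self.session = "session-1"  # e.g. create a remote assistant or thread here
        self.events.append("before")

    def after(self) -> None:
        self.events.append("after")  # e.g. delete remote resources here
        self.session = None

    def run_benchmark_case(self, prompt: str) -> str:
        self.events.append(f"case:{prompt}")
        return f"[{self.session}] {prompt.upper()}"


agent = AssistantLikeAgent()
answers = run_all(agent, ["hi", "bye"])
print(answers)
print(agent.events)
```

Putting after in a finally clause is the design choice that matters here: a crashing case should not leave paid remote resources behind.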
File details
Details for the file flow_benchmark_tools-1.1.0.tar.gz.
File metadata
- Download URL: flow_benchmark_tools-1.1.0.tar.gz
- Upload date:
- Size: 852.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.10
File hashes
Algorithm | Hash digest
---|---
SHA256 | e987d4f9de930632200cf274c8b0ee62d83496d40d2f0dad71ebffa4053986d5
MD5 | e75a565012f7e913a80d4c16a0f1d8d7
BLAKE2b-256 | 42677a21ee4a4a46373ba1973dfd7ae22cf52370327ac4ccff7a3c576239710c
File details
Details for the file flow_benchmark_tools-1.1.0-py3-none-any.whl.
File metadata
- Download URL: flow_benchmark_tools-1.1.0-py3-none-any.whl
- Upload date:
- Size: 23.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.10
File hashes
Algorithm | Hash digest
---|---
SHA256 | 26a13d1480ea2a35cf36d72fb2486faa2ceb31f37ba2ccdd97c32781be3675c3
MD5 | a30471dcae3cb69126db410cdc07e892
BLAKE2b-256 | b9791583303d9bbfde0f67e07ab62035eb48c5e27b7c6c8a7c13512ed11f97ab
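The digests listed above let you check a downloaded file before installing it. The sketch below uses only Python's standard hashlib to compute all three; BLAKE2b-256 corresponds to BLAKE2b with a 32-byte digest. The function names are illustrative, and the script assumes the wheel sits in the current directory.

```python
import hashlib
from pathlib import Path
from typing import Dict


def file_digests(path: str, chunk_size: int = 1 << 16) -> Dict[str, str]:
    """Compute the three digests listed above, reading the file in chunks."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),  # BLAKE2b with a 256-bit digest
    }
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}


def verify_sha256(path: str, expected: str) -> bool:
    """Compare the file's SHA256 against a published hex digest (case-insensitive)."""
    return file_digests(path)["SHA256"] == expected.lower()


if __name__ == "__main__":
    wheel = "flow_benchmark_tools-1.1.0-py3-none-any.whl"
    expected = "26a13d1480ea2a35cf36d72fb2486faa2ceb31f37ba2ccdd97c32781be3675c3"
    if Path(wheel).exists():
        print("SHA256 OK" if verify_sha256(wheel, expected) else "SHA256 MISMATCH")
    else:
        print(f"{wheel} not found in the current directory")
```

For SHA256 alone, pip offers the same check through `pip hash <file>` and the `--require-hashes` install mode.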