A library containing LLM benchmarking tools
Flow Benchmark Tools
Create and run LLM benchmarks.
Installation
Just the library:
pip install flow-benchmark-tools
Library + example benchmarks (see below; the quotes keep shells such as zsh from interpreting the brackets):
pip install "flow-benchmark-tools[examples]"
Usage
- Create an agent by inheriting from BenchmarkAgent and implementing the run_benchmark_case method.
- Create a Benchmark by compiling a list of BenchmarkCases. These can be read from a JSONL file.
- Associate the agent and the benchmark in a BenchmarkRun.
- Use a BenchmarkRunner to run your BenchmarkRun.
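The four steps above can be sketched with a small, self-contained toy example. The classes below are simplified stand-ins written only to illustrate the pattern; they are not imported from flow_benchmark_tools, and their constructors, the field names (`query`, `expected_answer`), and the exact-match scoring in the runner are assumptions. Check the library source for the real signatures.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


# Hypothetical stand-ins that mirror the names used in this README.
# They are NOT imported from flow_benchmark_tools; the real classes
# may have different constructors, fields and return types.
@dataclass
class BenchmarkCase:
    query: str            # assumed field name
    expected_answer: str  # assumed field name


@dataclass
class Benchmark:
    cases: list[BenchmarkCase]


class BenchmarkAgent(ABC):
    @abstractmethod
    def run_benchmark_case(self, case: BenchmarkCase) -> str:
        """Return the agent's answer for a single case."""


@dataclass
class BenchmarkRun:
    agent: BenchmarkAgent
    benchmark: Benchmark


class BenchmarkRunner:
    def run(self, benchmark_run: BenchmarkRun) -> list[bool]:
        # Assumed scoring rule: exact string match with the expected answer.
        return [
            benchmark_run.agent.run_benchmark_case(case) == case.expected_answer
            for case in benchmark_run.benchmark.cases
        ]


# Step 1: a toy agent that upper-cases the query.
class UpperCaseAgent(BenchmarkAgent):
    def run_benchmark_case(self, case: BenchmarkCase) -> str:
        return case.query.upper()


# Step 2: a benchmark built from a list of cases.
benchmark = Benchmark(cases=[
    BenchmarkCase(query="hello", expected_answer="HELLO"),
    BenchmarkCase(query="world", expected_answer="World"),
])

# Steps 3 and 4: pair agent and benchmark in a run, then execute it.
results = BenchmarkRunner().run(BenchmarkRun(agent=UpperCaseAgent(), benchmark=benchmark))
print(results)  # [True, False]
```

Exact matching only keeps this toy deterministic; an LLM benchmark would normally score answers with a more forgiving comparison.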
Running example benchmarks
Two end-to-end benchmark examples are provided in the src/examples folder: a LangChain RAG application and an OpenAI Assistant agent.
To run the LangChain RAG benchmark:
python src/examples/langchain_rag_agent.py
To run the OpenAI Assistant benchmark:
python src/examples/openai_assistant_agent.py
The benchmark cases are defined in data/rag_benchmark.jsonl.
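Because JSONL stores one JSON object per line, reading benchmark cases needs only the standard library. The sketch below assumes each line carries `query` and `expected_answer` keys; the actual schema of data/rag_benchmark.jsonl is not documented here, so inspect the file and adjust the keys. `load_cases` is a hypothetical helper, not part of the library.

```python
from __future__ import annotations

import json
import os
import tempfile
from pathlib import Path


def load_cases(path: str | Path) -> list[dict]:
    """Hypothetical helper: read one JSON object per line into a list.

    The required keys below are assumptions; adjust them to the real
    schema of your benchmark file.
    """
    required = {"query", "expected_answer"}  # assumed schema
    cases = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:  # tolerate blank lines
                continue
            record = json.loads(line)
            if not isinstance(record, dict):
                raise ValueError(f"line {line_no}: expected a JSON object")
            missing = required - set(record)
            if missing:
                raise ValueError(f"line {line_no}: missing keys {sorted(missing)}")
            cases.append(record)
    return cases


# Usage: write a small JSONL file, load it back, then clean up.
with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False, encoding="utf-8") as tmp:
    tmp.write('{"query": "What does RAG stand for?", "expected_answer": "Retrieval-augmented generation"}\n')
    tmp.write("\n")  # blank lines are skipped
    tmp.write('{"query": "What is 2 + 2?", "expected_answer": "4"}\n')

cases = load_cases(tmp.name)
os.unlink(tmp.name)
print(len(cases))  # 2
```

Validating keys at load time surfaces a malformed line with its line number, instead of failing later in the middle of a benchmark run.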
The two examples follow the typical usage pattern of the library:
- define an agent by implementing the BenchmarkAgent interface and overriding the run_benchmark_case method (you can also override the before and after methods, if needed),
- create a set of benchmark cases, typically as a JSONL file such as data/rag_benchmark.jsonl,
- use a BenchmarkRunner to run the benchmark.
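The optional before and after hooks suit setup and teardown work, such as opening and closing a conversation thread for an assistant. The sketch below shows one plausible lifecycle, assuming a runner that calls `before`, then `run_benchmark_case`, then `after` for every case. The hook signatures, the per-case call order, and the `EchoAgent` and `run_cases` names are assumptions for illustration; the library's own runner may call the hooks differently (for example, once per run).

```python
from abc import ABC, abstractmethod


class BenchmarkAgent(ABC):
    """Hypothetical stand-in for the library's agent interface."""

    def before(self) -> None:
        """Optional setup hook (assumed: called before each case)."""

    def after(self) -> None:
        """Optional teardown hook (assumed: called after each case)."""

    @abstractmethod
    def run_benchmark_case(self, case: dict) -> str:
        ...


class EchoAgent(BenchmarkAgent):
    """Toy agent that records hook calls so the lifecycle is visible."""

    def __init__(self) -> None:
        self.events: list[str] = []
        self.session = None

    def before(self) -> None:
        self.session = {"history": []}  # e.g. open a thread or client
        self.events.append("before")

    def run_benchmark_case(self, case: dict) -> str:
        self.events.append("run")
        self.session["history"].append(case["query"])
        return case["query"]

    def after(self) -> None:
        self.session = None  # e.g. delete the thread, close the client
        self.events.append("after")


def run_cases(agent: BenchmarkAgent, cases: list[dict]) -> list[str]:
    """Hypothetical runner loop: before -> run -> after for each case."""
    answers = []
    for case in cases:
        agent.before()
        try:
            answers.append(agent.run_benchmark_case(case))
        finally:
            agent.after()  # teardown runs even if the case raises
    return answers


agent = EchoAgent()
answers = run_cases(agent, [{"query": "a"}, {"query": "b"}])
print(answers)       # ['a', 'b']
print(agent.events)  # ['before', 'run', 'after', 'before', 'run', 'after']
```

The try/finally is a design choice of this sketch: it guarantees teardown even when an agent call fails, which matters for agents that hold remote resources.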
Hashes for flow_benchmark_tools-1.0.0.tar.gz (source distribution)
Algorithm | Hash digest
---|---
SHA256 | 290ce4b355ae2cf7b76f7f0449b06672fe84a1d1bcdb4e0c7c2f2fea9f58cbe0
MD5 | ca42f045aa68b6d162b6d4d3ceffa201
BLAKE2b-256 | f266cbae0de4838f58e2a0e21a241fd288548c0f6be90827800036af77d4178a
Hashes for flow_benchmark_tools-1.0.0-py3-none-any.whl (built distribution)
Algorithm | Hash digest
---|---
SHA256 | 86c3c248bdaca624523f96a690c087658e3fcd3776d32872ebef222df67600c7
MD5 | e258b406800d479f0d8a69b5381b220e
BLAKE2b-256 | 1e18482a4688a7627145ad3c1be08ad30e4950ba2f90749c350a00e3e0148098
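To check a downloaded file against these digests, the standard-library `hashlib` module is enough; BLAKE2b-256 here means BLAKE2b with a 32-byte digest, which `hashlib.blake2b(digest_size=32)` produces. `file_digests` is a helper written for this sketch, and the usage comment assumes the source distribution was saved to the current directory.

```python
import hashlib
from pathlib import Path


def file_digests(path, chunk_size: int = 65536) -> dict:
    """Return SHA256, MD5 and BLAKE2b-256 hex digests of a file.

    The file is read in chunks so large archives need not fit in memory.
    """
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),
    }
    with Path(path).open("rb") as f:
        while chunk := f.read(chunk_size):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}


# Published SHA256 of the source distribution (from the table above).
EXPECTED_SHA256 = "290ce4b355ae2cf7b76f7f0449b06672fe84a1d1bcdb4e0c7c2f2fea9f58cbe0"

# Usage, assuming the archive was downloaded to the current directory:
# digests = file_digests("flow_benchmark_tools-1.0.0.tar.gz")
# assert digests["SHA256"] == EXPECTED_SHA256
```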