A library containing LLM benchmarking tools

Flow Benchmark Tools

Create and run LLM benchmarks.

Installation

Just the library:

pip install flow-benchmark-tools==1.1.0

Library + Example benchmarks (see below):

pip install "flow-benchmark-tools[examples]:1.1.0"

Usage

Running example benchmarks

Two end-to-end benchmark examples are provided in the examples folder: a LangChain RAG application and an OpenAI Assistant agent.

To run the LangChain RAG benchmark:

python src/examples/langchain_rag_agent.py

To run the OpenAI Assistant benchmark:

python src/examples/openai_assistant_agent.py

The benchmark cases are defined in data/rag_benchmark.jsonl.
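
The JSONL file is the source of truth for the case schema. Purely as an illustration, one case line might look like the following; the field names here are hypothetical, not the library's confirmed schema:

{"input": "What is retrieval-augmented generation?", "expected_output": "An approach that grounds LLM answers in retrieved documents."}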

The two examples follow the typical usage pattern of the library (see the sketch after this list):

  • define an agent by implementing the BenchmarkAgent interface and overriding the run_benchmark_case method (you can also override the before and after methods, if needed),
  • create a set of benchmark cases, typically as a JSONL file such as data/rag_benchmark.jsonl,
  • use a BenchmarkRunner to run the benchmark.
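
For orientation, here is a minimal sketch of that pattern. The import path, method signatures, and case attributes are assumptions inferred from the names in this README, not the library's confirmed API; the scripts in src/examples/ show the authoritative usage.

# Minimal sketch of the usage pattern above (assumed API, see src/examples/
# for the authoritative versions).
from flow_benchmark_tools import BenchmarkAgent, BenchmarkRunner  # assumed import path

class EchoAgent(BenchmarkAgent):
    """Toy agent: replace run_benchmark_case's body with calls to your LLM app."""

    def run_benchmark_case(self, case):
        # `case` is one benchmark case loaded from the JSONL file; the
        # attribute name below is hypothetical.
        return f"answer to: {case.input}"

# Hypothetical runner invocation: load cases from the JSONL file and run the agent.
runner = BenchmarkRunner()
runner.run(EchoAgent(), "data/rag_benchmark.jsonl")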

Download files

Source distribution: flow_benchmark_tools-1.1.0.tar.gz (852.7 kB)

Built distribution: flow_benchmark_tools-1.1.0-py3-none-any.whl (23.3 kB, Python 3)