AutoArena
AutoArena helps you stack rank LLM outputs against one another using automated judge evaluation.
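This README doesn't say how AutoArena turns pairwise judge decisions into a ranking. As an illustration only, one common way to stack-rank models from pairwise comparisons is the Elo rating system, sketched below. It is not necessarily how AutoArena computes its leaderboard. The vote format, a hypothetical (model_a, model_b, winner) tuple with winner in "A", "B", or "-" for a tie, and the defaults k=32 and initial=1000 are assumptions chosen for the example.

```python
from collections import defaultdict


def elo_ratings(votes, k=32.0, initial=1000.0):
    """Compute Elo ratings from pairwise judge votes (hypothetical format).

    Each vote is a (model_a, model_b, winner) tuple, where winner is
    "A", "B", or "-" for a tie.
    """
    ratings = defaultdict(lambda: initial)
    for model_a, model_b, winner in votes:
        ra, rb = ratings[model_a], ratings[model_b]
        # Expected score of A against B on the standard Elo logistic curve.
        expected_a = 1.0 / (1.0 + 10 ** ((rb - ra) / 400.0))
        score_a = {"A": 1.0, "B": 0.0, "-": 0.5}[winner]
        # Zero-sum update: whatever A gains, B loses.
        ratings[model_a] = ra + k * (score_a - expected_a)
        ratings[model_b] = rb + k * ((1.0 - score_a) - (1.0 - expected_a))
    return dict(ratings)


def leaderboard(votes):
    """Return (model, rating) pairs sorted from best to worst."""
    ratings = elo_ratings(votes)
    return sorted(ratings.items(), key=lambda item: item[1], reverse=True)
```

For example, leaderboard([("model-1", "model-2", "A"), ("model-1", "model-2", "A"), ("model-2", "model-1", "-")]) ranks model-1 above model-2. Note that sequential Elo updates depend on the order in which votes are processed, so a real system may shuffle or refit votes to reduce that effect.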
Install from PyPI and run with:
pip install autoarena
python -m autoarena
Usage
Getting started with AutoArena is simple:
- Run AutoArena via python -m autoarena and visit localhost:8899 in your browser.
- Create a project via the UI.
- Add responses from a model by selecting a CSV file with prompt and response columns.
- Configure an automated judge via the UI. Note that most judges require credentials, e.g. X_API_KEY set in the environment where you're running AutoArena.
- Add responses from a second model to kick off an automated judging task. The judges you configured in the previous step decide which of the models you've uploaded provided a better response to a given prompt.
That's it! After these steps you're fully set up for automated evaluation on AutoArena.
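The CSV files used in the upload steps can be prepared with a short script. This is a minimal sketch using Python's standard csv module; only the prompt and response column names come from the steps above, while the helper name write_responses_csv and the file names are hypothetical. Using the same prompts in both models' files is presumably what lets a judge compare the two responses to a given prompt.

```python
import csv
from pathlib import Path


def write_responses_csv(path, rows):
    """Write (prompt, response) pairs to a CSV with 'prompt' and 'response' columns.

    Hypothetical helper for illustration; csv.DictWriter handles quoting of
    commas, quotes, and newlines inside the text.
    """
    path = Path(path)
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["prompt", "response"])
        writer.writeheader()
        for prompt, response in rows:
            writer.writerow({"prompt": prompt, "response": response})
    return path


# One file per model, sharing the same prompts (file names are hypothetical).
prompts = ["What is 2 + 2?", "List two primary colors."]
write_responses_csv("model-a.csv", zip(prompts, ["4", "Red, blue"]))
write_responses_csv("model-b.csv", zip(prompts, ["Four", "Red and yellow"]))
```

Each file can then be selected in the UI as a separate model's responses.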
Data Storage
Data is stored in ./data/<project>.duckdb files in the directory where you invoked AutoArena. See data/README.md for more details on data storage in AutoArena.
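Because each project lives in its own ./data/<project>.duckdb file, the projects in a working directory can be found by listing that folder. A minimal sketch with pathlib, assuming only the file layout described above; project_databases is a hypothetical helper, not part of AutoArena.

```python
from pathlib import Path


def project_databases(root="."):
    """Map project name -> DuckDB file, assuming the ./data/<project>.duckdb layout.

    Returns an empty dict if the data directory does not exist.
    """
    data_dir = Path(root) / "data"
    # The file stem is the project name, e.g. data/my-project.duckdb -> "my-project".
    return {p.stem: p for p in sorted(data_dir.glob("*.duckdb"))}


for name, path in project_databases().items():
    print(f"{name}: {path}")
```

To look inside one of these files, the separately installed duckdb Python package can open it, e.g. duckdb.connect(str(path), read_only=True). DuckDB generally allows only one process to hold a database file open for writing, so this may fail while AutoArena is running. The table layout is described in data/README.md rather than assumed here.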
Development
AutoArena uses uv to manage dependencies. To set up this repository for development, run:
uv venv && source .venv/bin/activate
uv pip install --all-extras -r pyproject.toml
uv tool run pre-commit install
uv run python3 -m autoarena --dev
To run AutoArena for development, you will need to run both the backend and frontend services:
- Backend: uv run python3 -m autoarena --dev (the --dev / -d flag enables automatic service reloading when source files change)
- Frontend: see ui/README.md
To build a release tarball in the ./dist directory:
./scripts/build.sh