AutoArena
AutoArena helps you stack rank LLM outputs against one another using automated judge evaluation.
Install from PyPI and run with:
pip install autoarena
python -m autoarena
Usage
Getting started with AutoArena is simple:
- Run AutoArena via `python -m autoarena` and visit localhost:8899 in your browser.
- Create a project via the UI.
- Add responses from a model by selecting a CSV file with `prompt` and `response` columns.
- Configure an automated judge via the UI. Note that most judges require credentials, e.g. `X_API_KEY` in the environment where you're running AutoArena.
- Add responses from a second model to kick off an automated judging task using the judges you configured in the previous step to decide which of the models you've uploaded provided a better `response` to a given `prompt`.
That's it! After these steps you're fully set up for automated evaluation on AutoArena.
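If your model outputs are not already in a CSV file, the sketch below shows one way to produce a file with the `prompt` and `response` columns the upload step expects. The helper name `write_responses_csv`, the output filename, and the sample rows are illustrative assumptions, not part of AutoArena.

```python
import csv
from pathlib import Path


def write_responses_csv(path, rows):
    """Write (prompt, response) pairs to a CSV file with a header row.

    Hypothetical helper for illustration; AutoArena only needs a CSV
    containing `prompt` and `response` columns.
    """
    path = Path(path)
    # newline="" lets the csv module handle line endings and quoting itself
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["prompt", "response"])
        writer.writeheader()
        for prompt, response in rows:
            writer.writerow({"prompt": prompt, "response": response})
    return path


if __name__ == "__main__":
    # Placeholder data; replace with your model's real outputs.
    rows = [
        ("What is the capital of France?", "Paris."),
        ("Name a prime number greater than 10.", "11"),
    ]
    print(write_responses_csv("model_a.csv", rows))
```

Using the `csv` module rather than joining strings by hand keeps prompts and responses that contain commas, quotes, or newlines intact.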
Data Storage
Data is stored in `./data/<project>.duckdb` files in the directory where you invoked AutoArena. See `data/README.md` for more details on data storage in AutoArena.
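As a rough illustration of that layout, the following sketch lists the projects found under a working directory by looking for `data/*.duckdb` files. It assumes each `.duckdb` file corresponds to one project, as the path pattern above suggests; `list_projects` is a hypothetical helper, not an AutoArena API.

```python
from pathlib import Path


def list_projects(root="."):
    """Return project names inferred from <root>/data/<project>.duckdb files.

    Hypothetical helper: it only inspects file names and never opens
    the database files themselves.
    """
    data_dir = Path(root) / "data"
    if not data_dir.is_dir():
        # AutoArena has not been run from this directory yet
        return []
    return sorted(p.stem for p in data_dir.glob("*.duckdb"))


if __name__ == "__main__":
    for name in list_projects():
        print(name)
```

Run it from the same directory where you invoked AutoArena, since the `data` directory is created relative to that location.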
Development
AutoArena uses uv to manage dependencies. To set up this repository for development, run:
uv venv && source .venv/bin/activate
uv pip install --all-extras -r pyproject.toml
uv tool run pre-commit install
uv run python3 -m autoarena --dev
To run AutoArena for development, you will need to run both the backend and frontend services:
- Backend: `uv run python3 -m autoarena --dev` (the `--dev`/`-d` flag enables automatic service reloading when source files change)
- Frontend: see `ui/README.md`
To build a release tarball in the `./dist` directory:
./scripts/build.sh
Source Distribution
File details
Details for the file `autoarena-0.1.0b5.tar.gz`.
File metadata
- Download URL: autoarena-0.1.0b5.tar.gz
- Upload date:
- Size: 1.2 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | fa2a7a58a1a73a5cf9b1826bc9dd6d7e7a6da63d7edcdd6d14acf97a787caad8
MD5 | 0d92b8cf004d938a78957279d0e89b7f
BLAKE2b-256 | 7c4862e1b2b6bb530582f28946d814bf53111a1913edd1cb99a7b3102967e6a8
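To check a downloaded copy of the source distribution against the SHA256 digest listed above, a minimal sketch using only the standard library could look like this. The function names and the local file path are assumptions for illustration.

```python
import hashlib

# SHA256 digest copied from the file hashes listed for autoarena-0.1.0b5.tar.gz
EXPECTED_SHA256 = "fa2a7a58a1a73a5cf9b1826bc9dd6d7e7a6da63d7edcdd6d14acf97a787caad8"


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel stops once read() returns empty bytes
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals the expected one."""
    return sha256_of(path) == expected.lower()
```

For example, `matches_expected("autoarena-0.1.0b5.tar.gz")` should return `True` for an intact download saved under that (assumed) local filename.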