A framework for building scenario simulation projects that human and LLM-based agents can participate in, with a user-friendly web UI to visualize simulations and support for automatic evaluation at the agent-action level.
Introduction
leaf-playground is a "definition driven development" framework for building scenario simulation projects in which human and LLM-based agents can participate together, competing with or cooperating with each other. It is primarily designed to efficiently evaluate the performance of LLM-based agents at the action level in specific scenarios or tasks, but it also holds enormous potential for LLM-native applications, such as developing a language-based game.
Apart from the framework itself, a set of CLI commands is provided to help developers speed up the process of building a scenario simulation project, and to easily deploy a server with a web UI where users can create simulation tasks, manually and/or automatically evaluate agents' performance, and visualize the simulation process and evaluation results.
Below are sister projects of leaf-playground:
- leaf-playground-webui: the implementation of leaf-playground's web UI.
- leaf-playground-hub: hosts our officially implemented scenario simulation projects.
Features
- "Definition Driven Development": advanced syntax for structured scenario definitions and programming conventions.
- Human + Multiple Agents: facilitates human and AI Agents interaction in designated scenarios.
- Auto Evaluation: automated action-level evaluation and report visualization for AI Agents.
- Local server support: one-click local service deployment for scenario simulation tasks management and execution.
- Containerization: containerization support for running scenario simulation tasks.
- Auto generate projects: auto-generate and auto-complete code for scenario simulation projects.
- Debug Friendly: support remote debugger across processes in Pycharm professional IDE.
Installation
Environment Setup
Make sure you have Python and Node.js installed on your computer. If not, you can set up the environment by following these instructions:
- Install Python: we recommend using miniconda to configure a Python virtual environment.
- Install Node.js: you can download and install Node.js from the Node.js official site.
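For example, an environment setup with miniconda might look like the following sketch; the environment name and Python version shown here are illustrative assumptions, not requirements stated by the project.
# create and activate an isolated Python environment with miniconda
conda create -n leaf-playground python=3.10
conda activate leaf-playground
# confirm Node.js is available for the web UI
node --version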
Quick Install
leaf-playground has already been uploaded to PyPI, so you can install it quickly with pip:
pip install leaf-playground
If you want to save data in PostgreSQL instead of SQLite, you need to include the postgresql extra dependency:
pip install leaf-playground[postgresql]
If you are a framework or scenario simulation project developer who wants to debug the code, you need to include the debug extra dependency:
pip install leaf-playground[debug]
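If you need both extras, pip's standard bracket syntax accepts a comma-separated list; this is a general pip convention rather than anything specific to leaf-playground:
# install both extras in one command (quotes keep the shell from expanding the brackets)
pip install "leaf-playground[postgresql,debug]"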
Install from source
To install leaf-playground from source, clone the project with git clone, then run the following in your local leaf-playground directory:
pip install .
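A complete from-source installation might look like the sketch below; the repository URL placeholder is an assumption and should be replaced with the actual clone URL:
# clone the repository (substitute the real repository URL)
git clone <leaf-playground-repo-url>
cd leaf-playground
# install the checked-out code into the current Python environment
pip install .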
Usage
Start Server and Create a Task
To start a server containing the projects hosted in leaf-playground-hub, first clone that repository, then, in the directory of your local leaf-playground-hub, use the CLI command below to start the server with the web UI:
leaf-out start-server [--port PORT] [--ui_port UI_PORT]
By default, the backend service runs on port 8000 and the UI service runs on port 3000; you can use the --port and --ui_port options to choose different ports respectively.
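For example, to run the backend on port 8080 and the UI on port 3001 (both port numbers are arbitrary choices for illustration):
leaf-out start-server --port 8080 --ui_port 3001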
Below is a video that demonstrates how to create and run a task that uses the MMLU dataset to evaluate LLM-based agents.
Maintainers
Roadmap
The Framework
- support humans participating in the scenario simulation as dynamic agents
- running each scenario simulation task in a docker container
- support managing task status (pause, restart, interrupt, etc.)
- support full task data persistence
  - save task info, logs and messages in the database
  - save task results in the database or a remote file system
- support resuming runtime state and information from a checkpoint and continuing execution
- support completing projects automatically
  - complete scene definition automatically
  - complete agents automatically
    - complete agent base classes automatically
    - complete specific agent classes automatically
  - complete evaluator automatically
  - complete scene automatically
- refactor ai_backend to llm_backend_tools to remove some heavy dependencies
- support streaming agents' responses
The Hub
- optimize the scene flow of the who_is_the_spy project and add metrics and evaluators
- create a new project to support using OpenAI evals
- create a new project to support using Microsoft promptbench