# simboba

A lightweight tool for generating annotated eval datasets and running LLM-as-judge evaluations.

```
   ( )
 .-~~~-.
/       \
|  ===  |
| ::::: |
|_:::::_|
  '---'
```
Lightweight eval tracking with LLM-as-judge. Run evals as Python scripts, track results in a web UI.
## Installation

```bash
pip install simboba
```
## Quick Start

```bash
boba init   # Create boba-evals/ folder with templates
boba magic  # Print an AI prompt to help configure your evals
boba run    # Run your evals (handles Docker automatically)
boba serve  # View results at http://localhost:8787
```
## Commands

| Command | Description |
|---|---|
| `boba init` | Create `boba-evals/` folder with starter templates |
| `boba magic` | Print a detailed AI prompt to configure your eval scripts |
| `boba setup` | Print basic setup instructions |
| `boba run [script]` | Run an eval script (default: `test_chat.py`); handles Docker automatically |
| `boba serve` | Start the web UI to view results |
| `boba datasets` | List all datasets |
| `boba generate "description"` | Generate a dataset from a description |
| `boba reset` | Delete the database |
## Writing Evals

Evals are plain Python scripts. Edit `boba-evals/test_chat.py`:

```python
import requests

from simboba import Boba
from setup import get_context, cleanup

boba = Boba()

def agent(message: str) -> str:
    """Call your agent and return its response."""
    ctx = get_context()
    response = requests.post(
        "http://localhost:8000/api/chat",
        json={"user_id": ctx["user_id"], "message": message},
    )
    return response.json()["response"]

if __name__ == "__main__":
    try:
        # Option 1: single eval
        boba.eval(
            input="Hello",
            output=agent("Hello"),
            expected="Should greet the user",
        )

        # Option 2: run against a dataset
        # boba.run(agent, dataset="my-dataset")

        print("Done! Run 'boba serve' to view results.")
    finally:
        cleanup()
```
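When you have several hand-written cases, the single-eval form can simply be looped. A minimal sketch: only the `boba.eval(input=..., output=..., expected=...)` call is simboba's API as shown above; `cases` and `run_cases` are hypothetical names for illustration.

```python
# Hypothetical helper: score several hand-written cases via boba.eval.
# Only the boba.eval(...) keyword signature is taken from the example
# above; the list/loop structure is illustrative.
cases = [
    {"input": "Hello", "expected": "Should greet the user"},
    {"input": "Where is my order?", "expected": "Should ask for an order number"},
]

def run_cases(boba, agent, cases):
    """Evaluate each case and return how many were scored."""
    for case in cases:
        boba.eval(
            input=case["input"],
            output=agent(case["input"]),
            expected=case["expected"],
        )
    return len(cases)
```

For anything larger than a handful of cases, `boba.run(agent, dataset=...)` against a generated dataset (below) is the intended path.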
## Creating Datasets

### Via CLI

```bash
boba generate "A customer support chatbot for an e-commerce site"
```

### Via Web UI

1. Run `boba serve`
2. Click "New Dataset" → "Generate with AI"
3. Enter a description of your agent

### Via API

```python
from simboba import Boba

boba = Boba()
boba.run(agent, dataset="my-dataset")  # Uses a dataset created above
```
## Test Fixtures (setup.py)

Edit `boba-evals/setup.py` to create the test data your agent needs:

```python
def get_context():
    """Create test fixtures and return a context dict."""
    user = create_test_user(email="eval@test.com")
    return {
        "user_id": user.id,
        "api_token": user.generate_token(),
    }

def cleanup():
    """Clean up test data after evals."""
    delete_test_users()
```
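If you prefer, the `get_context`/`cleanup` pair can be wrapped in a context manager so teardown always runs, even when an eval raises. The wrapper itself is not part of simboba; it is a sketch of the same try/finally pattern used in `test_chat.py`:

```python
# Hypothetical wrapper around the setup.py fixtures: cleanup() runs
# even if the body raises, mirroring the try/finally in test_chat.py.
from contextlib import contextmanager

@contextmanager
def eval_context(get_context, cleanup):
    try:
        yield get_context()   # hand the fixture dict to the caller
    finally:
        cleanup()             # always tear down test data
```

Then `with eval_context(get_context, cleanup) as ctx: ...` replaces the explicit try/finally in the eval script.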
## Environment Variables

Boba loads `.env` automatically. Set an LLM API key for judging (Claude Haiku 4.5 is the default model):

```bash
ANTHROPIC_API_KEY=sk-ant-...  # Required for the default model (Claude)
OPENAI_API_KEY=sk-...         # For OpenAI models
GEMINI_API_KEY=...            # For Gemini models
```

Note: without an API key, boba falls back to a simple keyword-matching judge, which is less accurate.
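Because that fallback is easy to miss, a small pre-flight check in your eval script can warn you before a run. This helper is hypothetical, not part of simboba:

```python
# Hypothetical pre-flight check: warn early if no judge API key is set,
# since boba would otherwise fall back to the keyword-matching judge.
import os

JUDGE_KEYS = ("ANTHROPIC_API_KEY", "OPENAI_API_KEY", "GEMINI_API_KEY")

def judge_key_present(env=os.environ):
    """Return True if any supported judge API key is set and non-empty."""
    return any(env.get(k) for k in JUDGE_KEYS)
```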
## Project Structure

```
your-project/
├── boba-evals/
│   ├── setup.py      # Test fixtures
│   ├── test_chat.py  # Your eval script
│   ├── .boba.yaml    # Config (docker vs local)
│   └── simboba.db    # Results database
└── ...
```
## License

MIT