# simboba

A lightweight tool for generating annotated eval datasets and running LLM-as-judge evaluations.

```
   ( )
 .-~~~-.
/       \
|  ===  |
| ::::: |
|_:::::_|
  '---'
```
Lightweight eval tracking with LLM-as-judge. Run evals as Python scripts, track results as git-friendly JSON files, view in a web UI. Designed for 1-click setup with your favourite AI coding tool.
## Installation

```bash
pip install simboba
```
## Quick Start

```bash
boba init      # Create boba-evals/ folder with templates
boba magic     # Prompt for your AI tool to set up and run your first eval
boba run       # Run your evals (handles Docker automatically)
boba baseline  # Save run as baseline for regression detection
boba serve     # View results at http://localhost:8787
```
## Commands

| Command | Description |
|---|---|
| `boba init` | Create `boba-evals/` folder with starter templates |
| `boba magic` | Print detailed prompt for AI coding assistant |
| `boba run [script]` | Run eval script (default: `test_chat.py`). Handles Docker automatically |
| `boba baseline` | Save a run as baseline for regression detection |
| `boba serve` | Start web UI to view results |
| `boba datasets` | List all datasets |
| `boba generate "description"` | Generate a dataset from a description |
| `boba reset` | Clear run history (keeps datasets and baselines) |
## Writing Evals

Evals are Python scripts. Edit `boba-evals/test_chat.py`:

```python
import requests

from simboba import Boba
from setup import get_context, cleanup

boba = Boba()


def agent(message: str) -> str:
    """Call your agent and return its response."""
    ctx = get_context()
    response = requests.post(
        "http://localhost:8000/api/chat",
        json={"user_id": ctx["user_id"], "message": message},
    )
    return response.json()["response"]


if __name__ == "__main__":
    try:
        # Option 1: Single eval
        boba.eval(
            input="Hello",
            output=agent("Hello"),
            expected="Should greet the user",
        )

        # Option 2: Run against a dataset
        # boba.run(agent, dataset="my-dataset")

        print("Done! Run 'boba serve' to view results.")
    finally:
        cleanup()
```
## Regression Detection

Track regressions across code changes:

```bash
# Run evals and compare to baseline
boba run
# Output shows regressions: "REGRESSIONS: 2 cases now failing"

# Save current results as new baseline
boba baseline

# Commit to git for tracking
git add boba-evals/baselines/
git commit -m "Update eval baseline"
```
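Conceptually, regression detection boils down to comparing per-case pass/fail results against the saved baseline. Here is a minimal sketch of that idea in plain Python (illustrative only — the case IDs are hypothetical and this is not simboba's actual implementation):

```python
def find_regressions(baseline: dict[str, bool], current: dict[str, bool]) -> list[str]:
    """Return IDs of cases that passed in the baseline but fail in the current run."""
    return [
        case for case, passed in baseline.items()
        if passed and not current.get(case, False)
    ]


# Pass/fail results keyed by (hypothetical) case ID
baseline = {"greeting": True, "refund-policy": True, "order-status": False}
current = {"greeting": True, "refund-policy": False, "order-status": False}

regressions = find_regressions(baseline, current)
print(f"REGRESSIONS: {len(regressions)} cases now failing")
```

Note that a case failing in both runs (like `order-status` above) is not a regression — only a pass-to-fail transition counts.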
## Creating Datasets

### Via CLI

```bash
boba generate "A customer support chatbot for an e-commerce site"
```

### Via Web UI

Run `boba serve`, then:

- Click "New Dataset" -> "Generate with AI"
- Enter a description of your agent and we'll create test cases for you.

### Via API

```python
from simboba import Boba

boba = Boba()
boba.run(agent, dataset="my-dataset")  # Uses dataset created above
```
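Datasets live as JSON files under `boba-evals/datasets/`. The exact on-disk schema isn't documented here, so as a rough illustration only, a generated dataset can be pictured as a named collection of test cases, each pairing an input with an expected behaviour (all field names below are assumptions, not simboba's documented format):

```python
import json

# Hypothetical dataset shape -- field names are illustrative,
# mirroring the input/expected pair used by boba.eval() above.
dataset = {
    "name": "my-dataset",
    "cases": [
        {"input": "Where is my order?", "expected": "Should ask for the order number"},
        {"input": "I want a refund", "expected": "Should explain the refund policy"},
    ],
}

print(json.dumps(dataset, indent=2))
```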
## Test Fixtures (setup.py)

Edit `boba-evals/setup.py` to create test data your agent needs:

```python
def get_context():
    """Create test fixtures, return context dict."""
    user = create_test_user(email="eval@test.com")
    return {
        "user_id": user.id,
        "api_token": user.generate_token(),
    }


def cleanup():
    """Clean up test data after evals."""
    delete_test_users()
```
## Environment Variables

Boba loads `.env` automatically. Set your LLM API key for judging (Claude Haiku 4.5 is the default):

```bash
ANTHROPIC_API_KEY=sk-ant-...  # Required for default model (Claude)
OPENAI_API_KEY=sk-...         # For OpenAI models
GEMINI_API_KEY=...            # For Gemini models
```

Note: Without an API key, boba falls back to a simple keyword-matching judge, which is less accurate.
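To see why keyword matching is the weaker fallback, consider what that style of judge amounts to: checking how many words from the `expected` description show up in the output. A rough sketch of the idea (not simboba's actual fallback — the threshold and keyword filter are assumptions):

```python
def keyword_judge(output: str, expected: str, threshold: float = 0.5) -> bool:
    """Pass if enough of the expected keywords appear in the output."""
    # Keep only longer words as "keywords"; short words are mostly noise.
    keywords = {w.lower() for w in expected.split() if len(w) > 3}
    if not keywords:
        return True
    hits = sum(1 for w in keywords if w in output.lower())
    return hits / len(keywords) >= threshold


print(keyword_judge("Greetings, user! How can I help?", "Should greet the user"))  # True
print(keyword_judge("Hello! How can I help you today?", "Should greet the user"))  # False
```

The second call shows the weakness: a perfectly good greeting fails because it shares no surface tokens with the expectation — exactly the kind of case where an LLM judge does better.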
## Project Structure

```
your-project/
├── boba-evals/
│   ├── datasets/      # Dataset JSON files (git tracked)
│   ├── baselines/     # Baseline results (git tracked)
│   ├── runs/          # Run history (gitignored)
│   ├── files/         # Uploaded attachments
│   ├── setup.py       # Test fixtures
│   ├── test_chat.py   # Your eval script
│   ├── settings.json  # Configuration
│   └── .boba.yaml     # Runtime config (docker vs local)
└── ...
```
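Matching the annotations above (datasets and baselines git-tracked, run history local-only), the corresponding `.gitignore` entry would be:

```
# boba run history is local-only; datasets/ and baselines/ stay in git
boba-evals/runs/
```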
## Future Updates
- File Uploads - Allow uploads via UI to help create datasets
- Eval methods - Built-in evaluation strategies beyond LLM-as-judge
- Cloud storage - Sync datasets and runs to the cloud for team collaboration
## Development

To work on the web UI:

```bash
cd frontend
npm install
npm run dev    # Dev server with hot reload (proxies to localhost:8787)
npm run build  # Build to simboba/static/
```

Run `boba serve` in another terminal to start the backend.
## License

MIT