
A lightweight tool for generating annotated eval datasets and running LLM-as-judge evaluations

Project description

simboba

     ( )
   .-~~~-.
  /       \
  |  ===  |
  | ::::: |
  |_:::::_|
    '---'

Lightweight eval tracking with LLM-as-judge. Run evals as Python scripts, track results in a web UI.

Installation

pip install simboba

Quick Start

boba init          # Create boba-evals/ folder with templates
boba magic         # Print AI prompt to help configure your evals
boba run           # Run your evals (handles Docker automatically)
boba serve         # View results at http://localhost:8787

Commands

Command                       Description
boba init                     Create boba-evals/ folder with starter templates
boba magic                    Print a detailed AI prompt to configure your eval scripts
boba setup                    Print basic setup instructions
boba run [script]             Run an eval script (default: test_chat.py); handles Docker automatically
boba serve                    Start the web UI to view results
boba datasets                 List all datasets
boba generate "description"   Generate a dataset from a description
boba reset                    Delete the results database

Writing Evals

Evals are Python scripts. Edit boba-evals/test_chat.py:

import requests

from simboba import Boba
from setup import get_context, cleanup

boba = Boba()

def agent(message: str) -> str:
    """Call your agent and return its response."""
    ctx = get_context()
    response = requests.post(
        "http://localhost:8000/api/chat",
        json={"user_id": ctx["user_id"], "message": message},
    )
    return response.json()["response"]

if __name__ == "__main__":
    try:
        # Option 1: Single eval
        boba.eval(
            input="Hello",
            output=agent("Hello"),
            expected="Should greet the user",
        )

        # Option 2: Run against a dataset
        # boba.run(agent, dataset="my-dataset")

        print("Done! Run 'boba serve' to view results.")
    finally:
        cleanup()

Creating Datasets

Via CLI

boba generate "A customer support chatbot for an e-commerce site"

Via Web UI

  1. boba serve
  2. Click "New Dataset" → "Generate with AI"
  3. Enter a description of your agent

Via API

from simboba import Boba
boba = Boba()
boba.run(agent, dataset="my-dataset")  # Uses dataset created above
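Conceptually, `boba.run` calls your `agent` on every input in the dataset and records each output alongside its expectation for judging. A stdlib-only sketch of that loop (illustrative only, not simboba's internals; `run_dataset` and the dict keys are assumptions for the example):

```python
def run_dataset(agent, dataset):
    """Call `agent` on each case and pair outputs with expectations."""
    results = []
    for case in dataset:
        results.append({
            "input": case["input"],
            "output": agent(case["input"]),
            "expected": case["expected"],
        })
    return results

# Tiny demo with a stand-in agent
demo = [{"input": "Hello", "expected": "Should greet the user"}]
print(run_dataset(lambda msg: f"Hi! You said: {msg}", demo))
```

Each result row is then what a judge (LLM or keyword-based) scores against `expected`.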

Test Fixtures (setup.py)

Edit boba-evals/setup.py to create test data your agent needs:

def get_context():
    """Create test fixtures, return context dict."""
    user = create_test_user(email="eval@test.com")
    return {
        "user_id": user.id,
        "api_token": user.generate_token(),
    }

def cleanup():
    """Clean up test data after evals."""
    delete_test_users()

Environment Variables

Boba loads .env automatically. Set your LLM API key for judging (Claude 3.5 Haiku is the default):

ANTHROPIC_API_KEY=sk-ant-...   # Required for default model (Claude)
OPENAI_API_KEY=sk-...          # For OpenAI models
GEMINI_API_KEY=...             # For Gemini models

Note: Without an API key, boba falls back to a simple keyword-matching judge, which is less accurate.
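The fallback judge's exact heuristic isn't documented here; as a rough illustration of what keyword matching generally looks like (not simboba's actual code), a judge might pass an output when enough salient words from the expected answer appear in it:

```python
import re

def keyword_judge(output: str, expected: str, threshold: float = 0.5) -> bool:
    """Illustrative only: pass if at least `threshold` of the expected
    keywords (words of 4+ letters) appear in the output."""
    keywords = set(re.findall(r"[a-z]{4,}", expected.lower()))
    if not keywords:
        return True
    hits = sum(1 for w in keywords if w in output.lower())
    return hits / len(keywords) >= threshold

print(keyword_judge("Hi there, happy to greet a new user.", "Should greet the user"))
```

This kind of judge misses paraphrases entirely ("Welcome!" matches no keywords of "Should greet the user"), which is why an LLM judge is the recommended default.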

Project Structure

your-project/
├── boba-evals/
│   ├── setup.py        # Test fixtures
│   ├── test_chat.py    # Your eval script
│   ├── .boba.yaml      # Config (docker vs local)
│   └── simboba.db      # Results database
└── ...

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

simboba-0.1.1.tar.gz (46.6 kB)

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

simboba-0.1.1-py3-none-any.whl (47.7 kB)

File details

Details for the file simboba-0.1.1.tar.gz.

File metadata

  • Download URL: simboba-0.1.1.tar.gz
  • Upload date:
  • Size: 46.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for simboba-0.1.1.tar.gz
Algorithm Hash digest
SHA256 a69f4b68840e0164e40e8280e8f6a891c69a447e23505c8bff27e6114076205a
MD5 201b0f7942c8dcd3eb962780a7ae147f
BLAKE2b-256 f5c67f42ec4a743ae90de57f610b44356e3ffd22d3bbbf63df01bcd5e6fd4771

See more details on using hashes here.

Provenance

The following attestation bundles were made for simboba-0.1.1.tar.gz:

Publisher: publish.yml on ntkris/simboba

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file simboba-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: simboba-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 47.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for simboba-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 66cfa5e5ed074b90e32fce1e2e5bbe4e130e15177d013f9e4c557a3101279bc3
MD5 d157c4fc52443e4fdb23847bd50f3c0b
BLAKE2b-256 a0301d0c28a677429a6443218061ecf09d3a6b438af291a38cb864836d5b53b1

See more details on using hashes here.

Provenance

The following attestation bundles were made for simboba-0.1.1-py3-none-any.whl:

Publisher: publish.yml on ntkris/simboba

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
