A lightweight tool for generating annotated eval datasets and running LLM-as-judge evaluations

simboba

     ( )
   .-~~~-.
  /       \
  |  ===  |
  | ::::: |
  |_:::::_|
    '---'

Lightweight eval tracking with LLM-as-judge. Run evals as Python scripts, track results in a web UI.

Installation

pip install simboba

Quick Start

boba init          # Create boba-evals/ folder with templates
boba magic         # Print AI prompt to help configure your evals
boba run           # Run your evals (handles Docker automatically)
boba serve         # View results at http://localhost:8787

Commands

Command	Description
`boba init`	Create `boba-evals/` folder with starter templates
`boba magic`	Print detailed AI prompt to configure your eval scripts
`boba setup`	Print basic setup instructions
`boba run [script]`	Run eval script (default: `test_chat.py`). Handles Docker automatically
`boba serve`	Start web UI to view results
`boba datasets`	List all datasets
`boba generate "description"`	Generate a dataset from a description
`boba reset`	Delete database

Writing Evals

Evals are Python scripts. Edit boba-evals/test_chat.py:

import requests

from simboba import Boba
from setup import get_context, cleanup

boba = Boba()

def agent(message: str) -> str:
    """Call your agent and return its response."""
    ctx = get_context()
    response = requests.post(
        "http://localhost:8000/api/chat",
        json={"user_id": ctx["user_id"], "message": message},
    )
    response.raise_for_status()  # Surface HTTP errors instead of judging an error body
    return response.json()["response"]

if __name__ == "__main__":
    try:
        # Option 1: Single eval
        boba.eval(
            input="Hello",
            output=agent("Hello"),
            expected="Should greet the user",
        )

        # Option 2: Run against a dataset
        # boba.run(agent, dataset="my-dataset")

        print("Done! Run 'boba serve' to view results.")
    finally:
        cleanup()
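The `agent` function above needs a live server on localhost:8000. As a minimal sketch, the HTTP call can be injected so the same function shape can be exercised without one. The names `make_agent`, `PostFn`, and `fake_post` are hypothetical and not part of simboba; only the URL and payload keys come from the example above.

```python
from typing import Any, Callable, Dict

# Hypothetical signature: takes a URL and a JSON payload, returns the decoded JSON body.
PostFn = Callable[[str, Dict[str, Any]], Dict[str, Any]]


def make_agent(
    post: PostFn,
    user_id: str,
    url: str = "http://localhost:8000/api/chat",
) -> Callable[[str], str]:
    """Build an agent(message) -> str function around an injected HTTP call."""

    def agent(message: str) -> str:
        body = post(url, {"user_id": user_id, "message": message})
        return body["response"]

    return agent


def fake_post(url: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in for the real server: echoes the message back."""
    return {"response": f"Hello, {payload['user_id']}! You said: {payload['message']}"}


agent = make_agent(fake_post, user_id="u-1")
print(agent("Hi"))
```

In a real eval script the injected function would wrap `requests.post`; the fake is only useful for checking the script's wiring before the server is up.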

Creating Datasets

Via CLI

boba generate "A customer support chatbot for an e-commerce site"

Via Web UI

1. Run `boba serve`
2. Click "New Dataset" → "Generate with AI"
3. Enter a description of your agent

Via API

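from simboba import Boba
boba = Boba()
# `agent` is your own callable (message -> response), such as the one in test_chat.py
boba.run(agent, dataset="my-dataset")  # Uses a dataset created via the CLI or web UI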

Test Fixtures (setup.py)

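Edit boba-evals/setup.py to create the test data your agent needs. In the example below, create_test_user and delete_test_users stand in for your own application's helpers: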

def get_context():
    """Create test fixtures, return context dict."""
    user = create_test_user(email="eval@test.com")
    return {
        "user_id": user.id,
        "api_token": user.generate_token(),
    }

def cleanup():
    """Clean up test data after evals."""
    delete_test_users()

Environment Variables

Boba loads .env automatically. Set your LLM API key for judging (Claude Haiku 4.5 is the default):

ANTHROPIC_API_KEY=sk-ant-...   # Required for default model (Claude)
OPENAI_API_KEY=sk-...          # For OpenAI models
GEMINI_API_KEY=...             # For Gemini models
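Before running evals it can help to confirm which of these keys is actually set. The sketch below only inspects environment variables; `configured_providers` is a hypothetical helper, and how simboba itself selects a model is not shown in this README.

```python
import os
from typing import List, Mapping

# Variable names are taken from the list above; the provider labels are descriptive only.
KEY_TO_PROVIDER = {
    "ANTHROPIC_API_KEY": "Claude",
    "OPENAI_API_KEY": "OpenAI",
    "GEMINI_API_KEY": "Gemini",
}


def configured_providers(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the providers whose API key is present and non-empty."""
    return [
        provider
        for key, provider in KEY_TO_PROVIDER.items()
        if env.get(key, "").strip()
    ]


print(configured_providers())
```

Note that this reads the process environment only; it does not parse a `.env` file on its own.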

Note: Without an API key, boba falls back to a simple keyword-matching judge, which is less accurate.
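To see why keyword matching is less accurate than an LLM judge, here is an illustrative sketch of one possible keyword judge. It is not simboba's implementation; the stopword list, the 0.5 threshold, and the function names are all assumptions made for the example.

```python
import re
from typing import Set

# Hypothetical stopword list; words here are ignored when comparing texts.
STOPWORDS = {"the", "a", "an", "should", "to", "of", "and", "be"}


def keywords(text: str) -> Set[str]:
    """Lowercase the text and keep the distinct non-stopword tokens."""
    return {w for w in re.findall(r"[a-z0-9']+", text.lower()) if w not in STOPWORDS}


def keyword_judge(output: str, expected: str, threshold: float = 0.5) -> bool:
    """Pass when enough of the expected keywords appear in the output."""
    wanted = keywords(expected)
    if not wanted:
        return True  # Nothing to look for, so nothing can be missing
    overlap = len(wanted & keywords(output)) / len(wanted)
    return overlap >= threshold


# A perfectly good greeting fails because it shares no words with the expectation.
print(keyword_judge("Hello! How can I help?", "Should greet the user"))
# An awkward reply passes because it repeats the expected words.
print(keyword_judge("I greet you, user", "Should greet the user"))
```

The first call prints `False` and the second prints `True`, which is the kind of mismatch an LLM judge reading the expectation as a description would be expected to avoid.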

Project Structure

your-project/
├── boba-evals/
│   ├── setup.py        # Test fixtures
│   ├── test_chat.py    # Your eval script
│   ├── .boba.yaml      # Config (docker vs local)
│   └── simboba.db      # Results database
└── ...
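The results database sits next to the eval scripts. Assuming `simboba.db` is a SQLite file (the name suggests it, but this README does not say so, and no schema is documented), a small sketch for peeking at its tables could be:

```python
import sqlite3
from pathlib import Path
from typing import List


def list_tables(db_path: str) -> List[str]:
    """Return the table names in a SQLite file, sorted alphabetically."""
    if not Path(db_path).exists():
        # sqlite3.connect would otherwise create an empty file at this path
        raise FileNotFoundError(db_path)
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

For example, `list_tables("boba-evals/simboba.db")` would list whatever tables the tool created, if the assumption about the file format holds.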

License

MIT

Project details

Maintainers

ntkris

Release history

Version	Release date
0.2.0	Feb 17, 2026
0.1.8	Jan 26, 2026
0.1.7	Dec 29, 2025
0.1.6	Dec 27, 2025
0.1.5	Dec 23, 2025
0.1.4	Dec 22, 2025
0.1.3	Dec 22, 2025
0.1.2 (this version)	Dec 21, 2025
0.1.1	Dec 21, 2025
0.1.0	Dec 20, 2025

Download files

Download the file for your platform.

Source Distribution

simboba-0.1.2.tar.gz (47.0 kB)

Uploaded Dec 21, 2025 Source

Built Distribution

simboba-0.1.2-py3-none-any.whl (48.2 kB)

Uploaded Dec 21, 2025 Python 3

File details

Details for the file simboba-0.1.2.tar.gz.

File metadata

Download URL: simboba-0.1.2.tar.gz
Upload date: Dec 21, 2025
Size: 47.0 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for simboba-0.1.2.tar.gz
Algorithm	Hash digest
SHA256	`0339476968a896660cfa092f5ef77d7a300caa7a58387975d3bcfd4d3df305e1`
MD5	`98faa9609be316834705b235dea6fb0d`
BLAKE2b-256	`a10214eb41623a8c561ad4a18ab13dfcb0ffadd74e70a727d20e966b0e580c0d`
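A downloaded file can be checked against the SHA256 digest listed above using only the standard library. This is a minimal sketch: the helper names are hypothetical, and the local file path is whatever location you downloaded the archive to.

```python
import hashlib
import hmac

# SHA256 digest of simboba-0.1.2.tar.gz as listed in the table above.
EXPECTED_SDIST_SHA256 = "0339476968a896660cfa092f5ef77d7a300caa7a58387975d3bcfd4d3df305e1"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: str, expected_hex: str) -> bool:
    """Compare the file's digest with the published one, ignoring hex case."""
    return hmac.compare_digest(sha256_of(path), expected_hex.lower())
```

Usage would be along the lines of `matches("simboba-0.1.2.tar.gz", EXPECTED_SDIST_SHA256)`. pip can enforce the same check at install time through hash-pinned requirements files.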

Provenance

The following attestation bundles were made for simboba-0.1.2.tar.gz:

Publisher: publish.yml on ntkris/simboba

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: simboba-0.1.2.tar.gz
- Subject digest: 0339476968a896660cfa092f5ef77d7a300caa7a58387975d3bcfd4d3df305e1
- Sigstore transparency entry: 774627685
- Sigstore integration time: Dec 21, 2025
Source repository:
- Permalink: ntkris/simboba@5f69d4a3b3044d414d454f5084328a3a0cf66c6f
- Branch / Tag: refs/tags/v0.1.2
- Owner: https://github.com/ntkris
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@5f69d4a3b3044d414d454f5084328a3a0cf66c6f
- Trigger Event: push

File details

Details for the file simboba-0.1.2-py3-none-any.whl.

File metadata

Download URL: simboba-0.1.2-py3-none-any.whl
Upload date: Dec 21, 2025
Size: 48.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for simboba-0.1.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`fde879db884cc15ae195f90e3b012376d19d40db21a3ecf7f4d86f2d1f69e48e`
MD5	`0b06111d2ed33256d9c9cd0010cff602`
BLAKE2b-256	`670f0be8a0fb8dc5b12f7534569e68fa1ef99b3f68245e1d7b13a50dbe32dcbb`

Provenance

The following attestation bundles were made for simboba-0.1.2-py3-none-any.whl:

Publisher: publish.yml on ntkris/simboba

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: simboba-0.1.2-py3-none-any.whl
- Subject digest: fde879db884cc15ae195f90e3b012376d19d40db21a3ecf7f4d86f2d1f69e48e
- Sigstore transparency entry: 774627686
- Sigstore integration time: Dec 21, 2025
Source repository:
- Permalink: ntkris/simboba@5f69d4a3b3044d414d454f5084328a3a0cf66c6f
- Branch / Tag: refs/tags/v0.1.2
- Owner: https://github.com/ntkris
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@5f69d4a3b3044d414d454f5084328a3a0cf66c6f
- Trigger Event: push
