Open-source LLM A/B testing SDK — run experiments across models, track cost and conversion in production

Coii

Open-source LLM A/B testing with real business outcomes. Run experiments across models, track cost and conversion in production, and get a plain-English recommendation.

┌───────────────────────┬─────────────┬────────────────┬─────────────┐
│                       │ GPT-4o      │ Claude Sonnet  │ Gemini Flash│
│                       │ (current)   │ (challenger)   │ (challenger)│
├───────────────────────┼─────────────┼────────────────┼─────────────┤
│ Users                 │ 1,441       │ 481            │ 480         │
│ Ticket resolution     │ 72%         │ 78% ✓ +8.3%    │ 65%         │
│ Avg cost / request    │ $0.0034     │ $0.0041        │ $0.0008     │
├───────────────────────┴─────────────┴────────────────┴─────────────┤
│ Switch to Claude Sonnet: +8.3% resolution rate (p=0.02)            │
│ Net impact: save $2,079/month                                       │
└─────────────────────────────────────────────────────────────────────┘
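The "+8.3%" in the table is a relative lift. Going from 72% to 78% is a 6 percentage-point gain, and 6 / 72 ≈ 8.3%. A quick check of that arithmetic (the `lift` helper is illustrative only, not part of coii-sdk):

```python
def lift(baseline: float, challenger: float) -> tuple[float, float]:
    """Return (absolute lift in percentage points, relative lift in percent)."""
    absolute_pp = (challenger - baseline) * 100
    relative_pct = (challenger - baseline) / baseline * 100
    return absolute_pp, relative_pct


abs_pp, rel_pct = lift(0.72, 0.78)  # current vs. challenger resolution rate
print(f"{abs_pp:.1f} pp, {rel_pct:+.1f}% relative")
```

Reporting both numbers avoids the common confusion between "6 points better" and "8.3% better".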

Getting Started

1. Install

# Server
cd server
uv pip install -e .
uv run coii serve
# → Dashboard  http://localhost:8080
# → API docs   http://localhost:8080/docs

# SDK (separate terminal, in your app's virtualenv)
pip install coii-sdk
# or for local development: cd sdk && uv pip install -e .
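Before wiring up the SDK, you can confirm the server is answering. This is a minimal stdlib-only sketch that assumes, per the comments above, that http://localhost:8080/docs returns HTTP 200 once `coii serve` is running; `server_is_up` is a hypothetical helper, not part of coii-sdk:

```python
import urllib.request


def server_is_up(base_url: str = "http://localhost:8080", timeout: float = 2.0) -> bool:
    """Return True if GET {base_url}/docs answers with HTTP 200 (hypothetical check)."""
    try:
        with urllib.request.urlopen(f"{base_url}/docs", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError, refused connections and timeouts are all OSError subclasses.
        return False


if __name__ == "__main__":
    print("Coii server reachable:", server_is_up())
```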

2. Set up an experiment in the dashboard

  1. Open http://localhost:8080 → New Experiment
  2. Fill in:
    • Current model — your production model (e.g. openai / gpt-4o, traffic 60%)
    • Challengers — models to test (e.g. anthropic / claude-sonnet-4-6, 20%; google / gemini-2.5-flash, 20%)
    • Outcome events — the business signals you care about (e.g. ticket_resolved, purchase)
  3. Click Start — the experiment is now live and assigning users to variants
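The page does not say how Coii assigns users to variants. A common approach, shown here only as a sketch, is to hash the user id into 100 buckets and map bucket ranges to variants in proportion to their traffic share, so the same user always gets the same model. The `VARIANTS` table, `assign` function and `experiment_id` value are hypothetical and simply mirror the 60/20/20 example above:

```python
import hashlib

# Hypothetical variant table mirroring the example split in step 2 (60/20/20).
VARIANTS: list[tuple[str, int]] = [
    ("gpt-4o", 60),
    ("claude-sonnet-4-6", 20),
    ("gemini-2.5-flash", 20),
]


def assign(user_id: str, experiment_id: str = "exp-1") -> str:
    """Deterministically map a user to a variant in proportion to traffic weights."""
    total = sum(weight for _, weight in VARIANTS)
    if total != 100:
        raise ValueError(f"traffic weights must sum to 100, got {total}")
    # Hashing experiment id + user id gives a stable bucket in 0..99, so a user
    # sees the same model on every request, and each experiment splits users
    # independently of other experiments.
    digest = hashlib.sha256(f"{experiment_id}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    cumulative = 0
    for model, weight in VARIANTS:
        cumulative += weight
        if bucket < cumulative:
            return model
    raise AssertionError("unreachable when weights sum to 100")


if __name__ == "__main__":
    counts: dict[str, int] = {}
    for i in range(10_000):
        model = assign(f"user-{i}")
        counts[model] = counts.get(model, 0) + 1
    print(counts)  # roughly 6,000 / 2,000 / 2,000
```

Sticky, hash-based assignment keeps a user's experience consistent across requests, which matters when outcomes such as `ticket_resolved` arrive later.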

3. Instrument your code

from coii import Coii
import openai

coii = Coii(host="http://localhost:8080")
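client = openai.OpenAI()
# By default openai.OpenAI() calls OpenAI's API, which only serves OpenAI models.
# Challengers from other providers (e.g. claude-sonnet-4-6) need a client or an
# OpenAI-compatible endpoint that accepts those model names.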
coii.instrument(client)          # auto-tracks latency, tokens, cost

def handle(user_id: str, message: str) -> str:
    ctx = coii.start(user_id)    # assigns user to a variant
    resp = client.chat.completions.create(
        model=ctx.model,         # the assigned model — "gpt-4o", "claude-sonnet-4-6", etc.
        messages=[{"role": "user", "content": message}],
    )
    return resp.choices[0].message.content

def on_ticket_resolved(user_id: str):
    coii.outcome(user_id, "ticket_resolved")   # ties the outcome back to the variant
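The key idea in `coii.start()` and `coii.outcome()` is attribution: an outcome reported later is credited to whichever variant the user was assigned earlier. The toy tracker below illustrates that bookkeeping in memory. It is a hypothetical sketch, not the SDK's implementation; `ToyTracker`, `record_request` and the other names are made up for illustration:

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class VariantStats:
    users: set[str] = field(default_factory=set)
    # event name -> ids of users who emitted that event
    converted: dict[str, set[str]] = field(default_factory=lambda: defaultdict(set))
    requests: int = 0
    total_cost_usd: float = 0.0


class ToyTracker:
    """Hypothetical in-memory stand-in for the bookkeeping behind start()/outcome()."""

    def __init__(self) -> None:
        self.assignment: dict[str, str] = {}  # user_id -> variant
        self.stats: dict[str, VariantStats] = defaultdict(VariantStats)

    def start(self, user_id: str, variant: str) -> str:
        # The first assignment sticks, so a returning user keeps the same variant.
        variant = self.assignment.setdefault(user_id, variant)
        self.stats[variant].users.add(user_id)
        return variant

    def record_request(self, user_id: str, cost_usd: float) -> None:
        # Roughly what an instrumented client would log per LLM call (cost only here).
        stats = self.stats[self.assignment[user_id]]
        stats.requests += 1
        stats.total_cost_usd += cost_usd

    def outcome(self, user_id: str, event: str) -> None:
        # Credit the business event to the variant the user was assigned to.
        variant = self.assignment.get(user_id)
        if variant is not None:
            self.stats[variant].converted[event].add(user_id)

    def conversion_rate(self, variant: str, event: str) -> float:
        s = self.stats[variant]
        return len(s.converted[event]) / len(s.users) if s.users else 0.0

    def avg_cost(self, variant: str) -> float:
        s = self.stats[variant]
        return s.total_cost_usd / s.requests if s.requests else 0.0


tracker = ToyTracker()
tracker.start("u1", "gpt-4o")
tracker.start("u2", "gpt-4o")
tracker.start("u1", "claude-sonnet-4-6")  # ignored: u1 already has a variant
tracker.record_request("u1", 0.0034)
tracker.outcome("u1", "ticket_resolved")
print(tracker.conversion_rate("gpt-4o", "ticket_resolved"))  # 0.5
```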

4. Read the results

Once each variant has collected enough traffic, open the experiment's detail page in the dashboard. It shows per-variant conversion rates, cost, latency, and statistical significance, plus a plain-English recommendation with the net ROI in dollars.
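The page does not say which significance test produces the dashboard's p-values. A two-proportion z-test is one standard way to compare conversion rates between a current model and a challenger; the sketch below is that textbook test, not necessarily Coii's method, so its p-values need not match the dashboard's figures:

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for H0: both variants share one conversion rate."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Every user (or no user) converted in both groups: no evidence of a difference.
        return 0.0, 1.0
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

Pass the number of converting users and assigned users for the current model (`a`) and the challenger (`b`); a positive `z` means the challenger converts better.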


Frontend dev server (optional)

The dashboard is embedded in the server binary. If you want to iterate on the frontend:

cd frontend
npm install && npm run dev   # http://localhost:5173 — proxies API to :8080

License

MIT

Download files

Source Distribution

coii_sdk-0.1.1.tar.gz (5.8 kB view details)

Uploaded Source

Built Distribution

coii_sdk-0.1.1-py3-none-any.whl (7.2 kB view details)

Uploaded Python 3

File details

Details for the file coii_sdk-0.1.1.tar.gz.

File metadata

  • Download URL: coii_sdk-0.1.1.tar.gz
  • Upload date:
  • Size: 5.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for coii_sdk-0.1.1.tar.gz
Algorithm Hash digest
SHA256 4f157078a5ed03f4a4d116dad250920e0086e5b4501d6fa714e81cead326f754
MD5 82410b6f32bda2808a14c1783021fee8
BLAKE2b-256 beaf0fe8bdc56b9ed4dd754546bb5b2069b408de26bea9832f4637598d7daf0b

File details

Details for the file coii_sdk-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: coii_sdk-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 7.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for coii_sdk-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 2c1a65c28e13f81f59bc661bed3ace5cd9cbb8421fbcbcbf1c770b8ab73b8b3e
MD5 c9606f4130e27dd15712a1ea8a20f794
BLAKE2b-256 1855be32c7d24dbc7d7477747aed8fe4390f77169cf82feb369f067a1d5660eb
