Open-source LLM A/B testing SDK — run experiments across models, track cost and conversion in production
Coii
Open-source LLM A/B testing with real business outcomes. Run experiments across models, track cost and conversion in production, and get a plain-English recommendation.
┌───────────────────────┬─────────────┬────────────────┬─────────────┐
│                       │ GPT-4o      │ Claude Sonnet  │ Gemini Flash│
│                       │ (current)   │ (challenger)   │ (challenger)│
├───────────────────────┼─────────────┼────────────────┼─────────────┤
│ Users                 │ 1,441       │ 481            │ 480         │
│ Ticket resolution     │ 72%         │ 78% ✓ +8.3%    │ 65%         │
│ Avg cost / request    │ $0.0034     │ $0.0041        │ $0.0008     │
├───────────────────────┴─────────────┴────────────────┴─────────────┤
│ Switch to Claude Sonnet: +8.3% resolution rate (p=0.02)            │
│ Net impact: save $2,079/month                                      │
└────────────────────────────────────────────────────────────────────┘
Getting Started
1. Install
# Server
cd server
uv pip install -e .
uv run coii serve
# → Dashboard http://localhost:8080
# → API docs http://localhost:8080/docs
# SDK (separate terminal, in your app's virtualenv)
pip install coii-sdk
# or for local development: cd sdk && uv pip install -e .
2. Set up an experiment in the dashboard
- Open http://localhost:8080 → New Experiment
- Fill in:
  - Current model: your production model (e.g. openai / gpt-4o, traffic 60%)
  - Challengers: models to test (e.g. anthropic / claude-sonnet-4-6, 20%; google / gemini-2.5-flash, 20%)
  - Outcome events: the business signals you care about (e.g. ticket_resolved, purchase)
- Click Start. The experiment is now live and assigning users to variants.
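How Coii assigns users to variants internally is not described here. As a rough illustration of what a 60/20/20 traffic split means in practice, the sketch below uses a common technique: hash the user ID into a point in [0, 1) and pick the variant whose cumulative weight range contains it. The function name `assign_variant`, the experiment name `support-bot`, and the `VARIANTS` table are all hypothetical.

```python
import hashlib

# Hypothetical traffic split mirroring the example above (weights sum to 1.0).
VARIANTS = [
    ("gpt-4o", 0.60),
    ("claude-sonnet-4-6", 0.20),
    ("gemini-2.5-flash", 0.20),
]


def assign_variant(user_id: str, experiment: str = "support-bot") -> str:
    """Map a user to a variant deterministically, in proportion to the weights."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and scale it into [0, 1).
    point = int.from_bytes(digest[:8], "big") / 2**64
    cumulative = 0.0
    for model, weight in VARIANTS:
        cumulative += weight
        if point < cumulative:
            return model
    return VARIANTS[-1][0]  # guard against floating-point rounding
```

Because the assignment depends only on the experiment name and the user ID, the same user lands on the same model on every request, which keeps per-user outcomes attributable to a single variant.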
3. Instrument your code
from coii import Coii
import openai

coii = Coii(host="http://localhost:8080")
client = openai.OpenAI()
coii.instrument(client)  # auto-tracks latency, tokens, cost

def handle(user_id: str, message: str) -> str:
    ctx = coii.start(user_id)  # assigns user to a variant
    resp = client.chat.completions.create(
        model=ctx.model,  # the assigned model: "gpt-4o", "claude-sonnet-4-6", etc.
        messages=[{"role": "user", "content": message}],
    )
    return resp.choices[0].message.content

def on_ticket_resolved(user_id: str):
    coii.outcome(user_id, "ticket_resolved")  # ties the outcome back to the variant
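To make the start/outcome pattern concrete, here is a toy in-memory tracker that shows one way an outcome can be tied back to the variant a user was assigned. It is a sketch for intuition only, not Coii's implementation: the class `InMemoryTracker` and its methods are hypothetical and do not talk to the Coii server.

```python
from collections import defaultdict


class InMemoryTracker:
    """Toy stand-in for an experiment tracker (hypothetical, not Coii's code)."""

    def __init__(self) -> None:
        self.variant_by_user: dict[str, str] = {}
        self.converted: dict[str, set[str]] = defaultdict(set)

    def start(self, user_id: str, model: str) -> None:
        # Keep the first assignment so a user never switches variants.
        self.variant_by_user.setdefault(user_id, model)

    def outcome(self, user_id: str, event: str) -> None:
        # Outcomes from users who never started are ignored.
        if user_id in self.variant_by_user:
            self.converted[event].add(user_id)

    def conversion_rate(self, model: str, event: str) -> float:
        users = {u for u, m in self.variant_by_user.items() if m == model}
        if not users:
            return 0.0
        return len(users & self.converted[event]) / len(users)


tracker = InMemoryTracker()
tracker.start("u1", "gpt-4o")
tracker.start("u2", "gpt-4o")
tracker.start("u3", "claude-sonnet-4-6")
tracker.outcome("u1", "ticket_resolved")
print(tracker.conversion_rate("gpt-4o", "ticket_resolved"))  # → 0.5
```

The outcome call carries only the user ID and the event name; the variant is looked up from the stored assignment, which is why the application code never has to pass the model name when reporting a result.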
4. Read the results
Once you have enough traffic, open the experiment detail page in the dashboard. It shows per-variant conversion rates, cost, latency, statistical significance, and a plain-English recommendation with net ROI in dollars.
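The statistical method behind the dashboard's significance figure is not stated here. For readers who want to sanity-check a result by hand, the sketch below computes the relative lift and a two-sided p-value with a standard pooled two-proportion z-test. The conversion counts are assumptions, derived by applying the rounded percentages from the example table at the top to its user counts, so the result need not match the p-value shown there.

```python
import math


def relative_lift(control_rate: float, challenger_rate: float) -> float:
    """Relative improvement of the challenger over the control."""
    return (challenger_rate - control_rate) / control_rate


def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for a pooled two-proportion z-test."""
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for a standard normal.
    return math.erfc(abs(z) / math.sqrt(2))


lift = relative_lift(0.72, 0.78)
# Assumed counts: about 72% of 1,441 users and about 78% of 481 users.
p_value = two_proportion_p_value(1038, 1441, 375, 481)
print(f"lift={lift:.1%}, p={p_value:.3f}")
```

The lift works out to (0.78 - 0.72) / 0.72, which is the +8.3% shown in the table: it is a relative change, not a difference in percentage points.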
Frontend dev server (optional)
The dashboard is embedded in the server binary. If you want to iterate on the frontend:
cd frontend
npm install && npm run dev # http://localhost:5173 — proxies API to :8080
License