Auto-distillation layer for your LLM calls — drop-in OpenAI-compatible SDK that turns every request into training data for cheaper custom models.

OpenTracy

The auto-distillation layer for your LLM calls.

License: MIT Python 3.10+ PyPI

Drop-in OpenAI-compatible SDK. Every request becomes a trace; traces become datasets; datasets become distilled custom models; the routing layer swaps those models in under your app via aliases — so your cost curve goes down over time without code changes.

Sponsors

Sharpi

Try it in Colab (no install)

Each notebook runs end-to-end on a free Colab runtime — bring your own OpenAI key, optionally Anthropic / Groq.

| #  | Notebook | One-line pitch |
|----|----------|----------------|
| 01 | Quickstart | First completion() call, see _cost + _latency_ms, swap providers |
| 02 | Drop in over the OpenAI SDK | Keep from openai import OpenAI, change only base_url |
| 03 | Semantic auto-routing | One prompt, the right model of 13 — learned, not rule-based |
| 04 | Ticket classifier (real app) | End-to-end support-ticket classifier with cost breakdown |
| 05 | Distillation — train your student | Turn trace history into a distilled tiny model |
| 06 | Serve your distilled model | Four serving paths from load-the-adapter to alias swap |

Colab heads-up — traces only show up in the dashboard if you set OPENTRACY_ENGINE_URL before import opentracy. Every notebook has a commented-out cell at the top with the two lines you need.

Install

pip install opentracy

Quick start

import opentracy as lr

resp = lr.completion(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)
print(f"cost: ${resp._cost:.6f}  latency: {resp._latency_ms:.0f}ms")

Works with 13 providers out of the box: OpenAI, Anthropic, Gemini, Groq, Mistral, DeepSeek, Together, Fireworks, Cerebras, SambaNova, Perplexity, Cohere, Bedrock.

Connecting to the OpenTracy platform (traces, dashboards, distillation)

By default lr.completion() goes direct to the provider, so calls do not appear in the OpenTracy dashboard. To route every call through a running engine — the only way traces, metrics, and the distillation loop get data — set OPENTRACY_ENGINE_URL before importing the SDK:

import os
os.environ["OPENTRACY_ENGINE_URL"] = "http://<your-opentracy-host>:8080"  # engine port
import opentracy as lr

resp = lr.completion(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}],
)
# trace now visible in the dashboard on the UI host at :3000
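Because the variable has to be set before the import, it can help to guard the ordering explicitly. The following is a minimal sketch using only the standard library. `configure_engine` is a hypothetical helper, not part of the SDK, and it assumes the SDK reads `OPENTRACY_ENGINE_URL` once at import time, as described above.

```python
import os
import sys


def configure_engine(url: str) -> None:
    """Set OPENTRACY_ENGINE_URL, refusing to do so after the SDK is imported.

    Hypothetical helper (not part of opentracy). Assumes the SDK reads the
    variable at import time, so setting it later would have no effect.
    """
    if "opentracy" in sys.modules:
        raise RuntimeError(
            "opentracy is already imported; set OPENTRACY_ENGINE_URL first"
        )
    os.environ["OPENTRACY_ENGINE_URL"] = url.rstrip("/")


configure_engine("http://localhost:8080")
# import opentracy as lr   # safe to import from here on
```

Calling the helper at the very top of a notebook or entry-point module turns a silent "no traces in the dashboard" situation into an explicit error.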

Alternatives:

  • Per-call: pass force_engine=True, api_base="http://<host>:8080/v1" to lr.completion(...).
  • Drop-in OpenAI SDK (no code change beyond base_url — see below).

API keys for your providers should be saved once via the UI (Settings → API Keys) or the API (POST /v1/secrets/<provider>); the engine picks them up immediately from ~/.opentracy/secrets.json.

Routing with fallbacks

router = lr.Router(
    model_list=[
        {"model_name": "smart", "model": "openai/gpt-4o"},
        {"model_name": "smart", "model": "anthropic/claude-sonnet-4-6"},
    ],
    fallbacks=[{"smart": ["deepseek/deepseek-chat"]}],
)
resp = router.completion(model="smart", messages=[{"role": "user", "content": "Hi"}])

Drop-in replacement for the OpenAI SDK

Point any existing OpenAI app at the OpenTracy engine — zero code changes beyond base_url:

from openai import OpenAI
client = OpenAI(base_url="http://localhost:8080/v1", api_key="any")
# All 13 providers routed through the OpenTracy engine; every request is a trace.

Distillation — what makes OpenTracy different from a plain gateway

from opentracy import Distiller

d = Distiller()
# Submit a dataset built from your own traces, pick a teacher + student model,
# and OpenTracy trains the distilled model and serves it behind a routing alias
# you can point traffic at. Your app code never changes.

Install the training extras for the distillation pipeline:

pip install "opentracy[distill]"   # quotes keep zsh from globbing the brackets

Self-host the full platform (traces + UI + REST API)

git clone https://github.com/OpenTracy/opentracy.git
cd opentracy
make start-full   # Gateway + ClickHouse analytics + Python API + UI

Engine at http://localhost:8080, Python API at http://localhost:8000, UI at http://localhost:3000.

GPU is optional

The Python API container runs on CPU by default — no nvidia-container-toolkit required, so it works on plain cloud VMs and laptops without a GPU. Local training and inference paths fall back to CPU automatically.

To enable GPU acceleration (faster distillation training, local inference):

DOCKER_RUNTIME=nvidia docker compose up -d

This requires the NVIDIA Container Toolkit on the host. Persist the variable in .env next to docker-compose.yml if you always run with GPU.

What OpenTracy Does

Requests ──► Gateway (13 providers) ──► Traces (ClickHouse)
                                            │
                                    ┌───────┴───────┐
                                    ▼               ▼
                              Clustering        Analytics
                              (domains)        (cost/latency)
                                    │
                              ┌─────┴─────┐
                              ▼           ▼
                         Evaluations   Distillation
                        (AI metrics)  (training data)

  1. Route — proxy to 13 LLM providers with fallbacks, retries, and cost tracking
  2. Observe — every request/response stored in ClickHouse with full content
  3. Cluster — auto-group prompts by domain using embeddings + LLM labeling
  4. Evaluate — run models against domain datasets with built-in and AI-suggested metrics
  5. Distill — export input/output pairs per domain for fine-tuning smaller models
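As a rough illustration of step 5, the sketch below groups trace records by domain and serializes each one as an input/output pair in JSONL form. The field names (`domain`, `input`, `output`) and the sample records are hypothetical; the actual trace schema stored in ClickHouse is not shown in this document.

```python
import json
from collections import defaultdict

# Hypothetical trace records; the real trace schema is not shown here.
traces = [
    {"domain": "javascript", "input": "Explain closures", "output": "A closure is..."},
    {"domain": "javascript", "input": "What is hoisting?", "output": "Hoisting moves..."},
    {"domain": "business", "input": "Define churn", "output": "Churn is..."},
]


def pairs_by_domain(records):
    """Group traces into {domain: [JSONL line, ...]} of input/output pairs."""
    grouped = defaultdict(list)
    for rec in records:
        pair = {"input": rec["input"], "output": rec["output"]}
        grouped[rec["domain"]].append(json.dumps(pair))
    return dict(grouped)


datasets = pairs_by_domain(traces)
for domain, lines in datasets.items():
    print(domain, len(lines))
```

Each list of lines can be written to its own `.jsonl` file and used as a fine-tuning dataset for that domain.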

Features

Gateway

  • 13 LLM Providers through one OpenAI-compatible API
  • Python SDK — lr.completion() one-liner
  • Router Class — load balancing, fallbacks, retries, 4 strategies
  • Streaming — all providers including Anthropic & Bedrock SSE translation
  • Cost Tracking — 70+ models with per-token pricing on every response
  • Vision / Multimodal — images via base64 or URL
  • Tool Calling — function calls with cross-provider translation
  • Semantic Routing — auto-select the best model per prompt (with weights)
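Per-token cost tracking reduces to simple arithmetic over token counts. The sketch below shows the calculation under the assumption that prices are quoted in USD per million tokens; the `Price` class and the numbers are placeholders, not values from the OpenTracy price table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """USD per one million tokens. Values used below are placeholders."""

    input_per_m: float
    output_per_m: float


def request_cost(price: Price, prompt_tokens: int, completion_tokens: int) -> float:
    """Cost in USD: tokens times per-token price, summed over both directions."""
    return (
        prompt_tokens * price.input_per_m
        + completion_tokens * price.output_per_m
    ) / 1_000_000


example_price = Price(input_per_m=1.0, output_per_m=2.0)  # placeholder numbers
cost = request_cost(example_price, prompt_tokens=1_000, completion_tokens=500)
print(f"cost: ${cost:.6f}")
```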

Observability

  • ClickHouse Analytics — traces, cost, latency, model-level stats
  • Full Content Capture — input/output text stored for every request
  • Trace Scanning — AI agent detects hallucinations, refusals, quality regressions
  • Real-time Dashboard — UI with filters, search, trace detail drawer

Domain Clustering

  • Auto-clustering — groups prompts by semantic similarity (KMeans + MiniLM embeddings)
  • LLM Labeling — AI agent names each cluster (e.g., "JavaScript Concepts", "Business Strategy")
  • Quality Gates — coherence scoring, outlier detection, merge suggestions
  • Input + Output Storage — full pairs stored per cluster for distillation
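To make the clustering step concrete, here is a toy k-means in pure Python. It is only a sketch: the points are 2-D stand-ins for real MiniLM embeddings, the initialization is a simple deterministic choice, and the project's own pipeline is not reproduced here.

```python
import math


def kmeans(points, k, iters=10):
    """Tiny k-means: returns a cluster index for each point."""
    step = len(points) // k
    centroids = [points[i * step] for i in range(k)]  # deterministic init
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


# Two well-separated groups of toy "embeddings".
embeddings = [(0.0, 0.1), (0.1, 0.0), (0.0, 0.0), (1.0, 0.9), (0.9, 1.0), (1.0, 1.0)]
labels = kmeans(embeddings, k=2)
print(labels)
```

In the real pipeline each cluster would then be handed to the labeling agent, which assigns a human-readable domain name.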

Evaluations

  • Run Evaluations — send dataset samples through models, score and compare
  • 6 Built-in Metrics — exact match, contains, similarity, LLM-as-judge, latency, cost
  • AI Metric Suggestion — harness agent analyzes dataset domain and creates tailored metrics
  • Background Execution — evaluations run async with progress tracking
  • Model Comparison — side-by-side results with winner determination
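The simpler built-in metrics can be pictured as small scoring functions over a prediction and a reference. The sketch below is illustrative only: the function names mirror the metric names above, but the normalization rules and the use of `difflib` for similarity are assumptions, not the project's implementation.

```python
from difflib import SequenceMatcher


def exact_match(pred: str, ref: str) -> float:
    """1.0 if the strings are identical after trimming whitespace."""
    return float(pred.strip() == ref.strip())


def contains(pred: str, ref: str) -> float:
    """1.0 if the reference appears anywhere in the prediction (case-insensitive)."""
    return float(ref.strip().lower() in pred.lower())


def similarity(pred: str, ref: str) -> float:
    """Character-level ratio in [0, 1]; a stand-in for the real similarity metric."""
    return SequenceMatcher(None, pred, ref).ratio()


pred, ref = "The capital of France is Paris.", "Paris"
scores = {
    "exact_match": exact_match(pred, ref),
    "contains": contains(pred, ref),
    "similarity": round(similarity(pred, ref), 3),
}
print(scores)
```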

Distillation

  • BOND Pipeline — teacher → LLM-as-Judge curation → LoRA training (Unsloth) → GGUF export
  • Dataset Support — use domain clusters or custom datasets as training source
  • UI + API — create and monitor jobs via dashboard or REST endpoints

Harness (AI Agent System)

  • Agent Runner — loads .md agent configs, calls LLM, parses structured output
  • 7 Agents — cluster labeler, coherence scorer, outlier detector, merge checker, trace scanner, eval generator, metrics suggester
  • Memory Layer — persistent agent memory with query/summary
  • Tool Access — agents can call tools (list traces, query datasets, etc.)

Supported Providers

| Provider | Syntax | Env Var |
|----------|--------|---------|
| OpenAI | openai/gpt-4o-mini | OPENAI_API_KEY |
| Anthropic | anthropic/claude-haiku-4-5-20251001 | ANTHROPIC_API_KEY |
| Gemini | gemini/gemini-2.0-flash | GEMINI_API_KEY |
| Mistral | mistral/mistral-small-latest | MISTRAL_API_KEY |
| Groq | groq/llama-3.3-70b-versatile | GROQ_API_KEY |
| DeepSeek | deepseek/deepseek-chat | DEEPSEEK_API_KEY |
| Perplexity | perplexity/sonar | PERPLEXITY_API_KEY |
| Cerebras | cerebras/llama3.1-70b | CEREBRAS_API_KEY |
| SambaNova | sambanova/Meta-Llama-3.1-70B-Instruct | SAMBANOVA_API_KEY |
| Together | together/meta-llama/Llama-3.3-70B-Instruct-Turbo | TOGETHER_API_KEY |
| Fireworks | fireworks/accounts/fireworks/models/llama-v3p1-70b-instruct | FIREWORKS_API_KEY |
| Cohere | cohere/command-r-plus | COHERE_API_KEY |
| AWS Bedrock | bedrock/amazon.titan-text-express-v1 | AWS_ACCESS_KEY_ID + AWS_SECRET_ACCESS_KEY |

Installation

pip install -e ".[openai,anthropic,api]"   # SDK + common providers
pip install -e ".[all]"                     # everything
pip install -e ".[train]"                   # training/distillation deps (CUDA)

Python SDK

Completion

import opentracy as lr

response = lr.completion(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}],
)

# Streaming
for chunk in lr.completion(model="openai/gpt-4o-mini", messages=[...], stream=True):
    print(chunk.choices[0].delta.content or "", end="")

# Fallbacks
response = lr.completion(
    model="openai/gpt-4o-mini",
    messages=[...],
    fallbacks=["anthropic/claude-haiku-4-5-20251001", "groq/llama-3.3-70b-versatile"],
    num_retries=2,
)
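The control flow behind `fallbacks` and `num_retries` can be sketched without any provider at all. The code below is an illustration, not the SDK's implementation: `complete_with_fallbacks` and `fake_call` are hypothetical names, and it assumes retries apply to every candidate model before moving on to the next one.

```python
def complete_with_fallbacks(call, model, fallbacks=(), num_retries=0):
    """Try `model`, retrying up to `num_retries` times, then each fallback in order.

    `call` is any function taking a model name and returning a response;
    it stands in for a provider request. Illustrative only: the SDK's actual
    retry and fallback behaviour may differ.
    """
    last_error = None
    for candidate in (model, *fallbacks):
        for _attempt in range(num_retries + 1):
            try:
                return candidate, call(candidate)
            except Exception as exc:  # a real client would catch narrower errors
                last_error = exc
    raise RuntimeError("all models failed") from last_error


attempts = []


def fake_call(name):
    """Fake provider: the primary model always fails, fallbacks succeed."""
    attempts.append(name)
    if name == "primary/model":
        raise TimeoutError("simulated outage")
    return f"ok from {name}"


used, text = complete_with_fallbacks(
    fake_call, "primary/model", fallbacks=["backup/model"], num_retries=2
)
print(used, attempts)
```

With `num_retries=2` the primary model is attempted three times before the first fallback is tried.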

Router (Load Balancing)

router = lr.Router(
    model_list=[
        {"model_name": "smart", "model": "openai/gpt-4o"},
        {"model_name": "smart", "model": "anthropic/claude-sonnet-4-20250514"},
        {"model_name": "fast",  "model": "groq/llama-3.3-70b-versatile"},
    ],
    fallbacks=[{"smart": ["deepseek/deepseek-chat"]}],
    strategy="round-robin",  # or: least-cost, lowest-latency, weighted-random
)

response = router.completion(model="smart", messages=[...])
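The round-robin strategy can be pictured as one rotating iterator per alias. This is a minimal standalone sketch of that idea using the same `model_list` shape as above; the `RoundRobin` class is hypothetical and is not how `lr.Router` is necessarily implemented.

```python
from itertools import cycle

model_list = [
    {"model_name": "smart", "model": "openai/gpt-4o"},
    {"model_name": "smart", "model": "anthropic/claude-sonnet-4-20250514"},
    {"model_name": "fast", "model": "groq/llama-3.3-70b-versatile"},
]


class RoundRobin:
    """Rotate through the deployments registered under each alias."""

    def __init__(self, entries):
        groups = {}
        for entry in entries:
            groups.setdefault(entry["model_name"], []).append(entry["model"])
        # One independent, endlessly repeating iterator per alias.
        self._cycles = {alias: cycle(models) for alias, models in groups.items()}

    def pick(self, alias: str) -> str:
        return next(self._cycles[alias])


rr = RoundRobin(model_list)
picks = [rr.pick("smart") for _ in range(3)]
print(picks)
```

Because each alias has its own iterator, traffic to "fast" does not disturb the rotation for "smart".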

Drop-in OpenAI SDK

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="any")
client.chat.completions.create(model="openai/gpt-4o-mini", messages=[...])
client.chat.completions.create(model="anthropic/claude-haiku-4-5-20251001", messages=[...])
client.chat.completions.create(model="mistral/mistral-small-latest", messages=[...])

Running

| Command | What | Requires |
|---------|------|----------|
| make start | Gateway proxy (no weights needed) | Go |
| make start-full | Gateway + ClickHouse + Python API | Go + Docker |
| make start-router | Full semantic routing (model="auto") | Go + weights |
| make dev-python | Python API only (uvicorn --reload) | Python |

API Keys

Configure via the UI, environment variables, or ~/.opentracy/secrets.json:

export OPENAI_API_KEY=sk-...
export ANTHROPIC_API_KEY=sk-ant-...
make start-full

API Endpoints

Gateway (Go Engine — port 8080)

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | /v1/chat/completions | Chat completion (any provider) |
| POST | /v1/route | Route a prompt without generating |
| GET | /v1/models | List registered models |
| GET | /health | Health check |

Python API (port 8000)

| Group | Method | Endpoint | Description |
|-------|--------|----------|-------------|
| Analytics | GET | /v1/stats/{tenant}/analytics | Full analytics (traces, cost, latency, distributions) |
| Clustering | POST | /v1/clustering/run | Run clustering pipeline (embed, cluster, label) |
| Clustering | GET | /v1/clustering/datasets | List domain datasets from latest run |
| Clustering | GET | /v1/clustering/datasets/{run}/{cluster} | Get traces for a cluster |
| Datasets | GET | /v1/datasets | List all datasets (eval + domain clusters) |
| Datasets | POST | /v1/datasets | Create evaluation dataset |
| Datasets | POST | /v1/datasets/{id}/samples | Add samples to dataset |
| Evaluations | POST | /v1/evaluations | Create and run evaluation (async) |
| Evaluations | GET | /v1/evaluations | List evaluations |
| Evaluations | GET | /v1/evaluations/{id}/status | Evaluation progress |
| Evaluations | GET | /v1/evaluations/{id}/results | Evaluation results with scores |
| Distillation | POST | /v1/distillation/{tenant}/jobs | Create distillation job |
| Distillation | GET | /v1/distillation/{tenant}/jobs | List distillation jobs |
| Distillation | GET | /v1/distillation/{tenant}/jobs/{id} | Get job status and results |
| Metrics | GET | /v1/metrics | List built-in + custom metrics |
| Metrics | POST | /v1/metrics | Create custom metric |
| Metrics | POST | /v1/auto-eval/suggest-metrics | AI-powered metric suggestion |
| Models | GET | /v1/models/available | Models available from configured providers |
| Harness | GET | /v1/harness/agents | List AI agents |
| Harness | POST | /v1/harness/run/{name} | Run an agent with input |
| Harness | GET | /v1/harness/memory | Query agent memory |
| Secrets | GET | /v1/secrets | List configured providers |
| Secrets | POST | /v1/secrets/{provider} | Save API key |

Architecture

go/                              # Go engine (high-performance gateway)
├── cmd/opentracy-engine/            # Entry point
├── internal/
│   ├── provider/                # 13 providers
│   ├── server/                  # HTTP handlers + session management
│   ├── clickhouse/              # Trace writer + 8 migrations
│   ├── router/                  # UniRoute algorithm + LRU cache
│   └── embeddings/              # ONNX MiniLM embedder

opentracy/                    # Python layer (analytics, clustering, evals)
├── api/server.py                # FastAPI — analytics, clustering, evaluations, metrics
├── sdk.py                       # completion(), acompletion(), Router class
├── clustering/
│   ├── pipeline.py              # Extract → embed → cluster → label → store
│   ├── labeler.py               # LLM-powered cluster labeling via harness
│   └── quality.py               # Coherence, diversity, noise quality gates
├── harness/
│   ├── runner.py                # Agent executor (JSON parsing, retry, tools)
│   ├── tools.py                 # Agent tools (query traces, datasets, etc.)
│   ├── memory_store.py          # Persistent agent memory
│   └── agents/                  # 7 agent configs (.md files)
│       ├── cluster_labeler.md
│       ├── coherence_scorer.md
│       ├── outlier_detector.md
│       ├── merge_checker.md
│       ├── trace_scanner.md
│       ├── eval_generator.md
│       └── metrics_suggester.md
├── distillation/
│   ├── pipeline.py              # 4-phase orchestrator (data gen → curation → train → export)
│   ├── data_gen.py              # Teacher model candidate generation
│   ├── curation.py              # LLM-as-Judge scoring & selection
│   ├── trainer.py               # SFT/BOND fine-tuning (Unsloth + LoRA)
│   ├── export.py                # LoRA merge + GGUF conversion
│   ├── repository.py            # ClickHouse persistence
│   ├── router.py                # API endpoints
│   └── schemas.py               # Pydantic models & model catalog
├── evaluations/                 # Evaluation runs & results
├── datasets/                    # Dataset CRUD, from-traces, auto-collect
├── metrics/                     # Metric definitions & validation
├── experiments/                 # A/B experiments & comparison
├── annotations/                 # Human annotation queues
├── auto_eval/                   # Automated evaluation configs & triggers
├── eval_agent/                  # AI-powered eval setup assistant
├── proposals/                   # Decision engine proposals
├── trace_issues/                # Issue scanning & detection
├── training/                    # Custom router training (UniRoute)
├── storage/
│   ├── clickhouse_client.py     # Analytics queries
│   ├── secrets.py               # API key management
│   └── state_manager.py         # File-based state persistence
├── model_prices.py              # 70+ models with pricing
└── mcp/                         # Claude Code MCP server

ui/                              # React dashboard
├── src/features/
│   ├── traces/                  # Trace explorer with drawer, filters, timeline
│   ├── evaluations/             # Run evaluations, metrics, experiments
│   └── distill-dataset/         # Dataset management, clustering, export

Evaluation Workflow

# 1. Send traffic through the gateway (traces auto-captured)
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "openai/gpt-4o-mini", "messages": [{"role": "user", "content": "Explain closures in JS"}]}'

# 2. Run clustering to group prompts by domain
curl -X POST "http://localhost:8000/v1/clustering/run?days=30&min_traces=5"   # quoted so the shell does not treat & as "run in background"

# 3. AI suggests metrics for a domain dataset
curl -X POST http://localhost:8000/v1/auto-eval/suggest-metrics \
  -H "Content-Type: application/json" \
  -d '{"dataset_id": "cluster:run-id:2"}'

# 4. Run evaluation comparing models on that dataset
curl -X POST http://localhost:8000/v1/evaluations \
  -H "Content-Type: application/json" \
  -d '{"name": "JS eval", "dataset_id": "cluster:run-id:2",
       "models": ["openai/gpt-4o-mini", "mistral/mistral-small-latest"],
       "metrics": ["similarity", "latency", "cost", "llm_judge"]}'

# 5. Check results
curl http://localhost:8000/v1/evaluations/{id}/results

Distillation

BOND-style distillation pipeline: generate candidates with a teacher model, score them with LLM-as-Judge, fine-tune a student model with LoRA, and export to GGUF.
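The curation phase is essentially best-of-N selection: score every teacher candidate and keep the best one per prompt. The sketch below shows that selection step in isolation. `curate` and `toy_judge` are hypothetical names, the word-overlap judge is only a placeholder for an LLM-as-Judge call, and the `min_score` threshold is an assumption rather than a documented parameter.

```python
def curate(candidates_by_prompt, judge, min_score=0.0):
    """Keep the highest-scoring candidate per prompt, if it clears `min_score`."""
    selected = []
    for prompt, candidates in candidates_by_prompt.items():
        # Score each candidate once; a real judge call would be expensive.
        scored = [(judge(prompt, c), c) for c in candidates]
        best_score, best = max(scored, key=lambda pair: pair[0])
        if best_score >= min_score:
            selected.append({"input": prompt, "output": best})
    return selected


def toy_judge(prompt, answer):
    """Placeholder judge: fraction of prompt words that appear in the answer."""
    words = prompt.lower().split()
    return sum(w in answer.lower() for w in words) / len(words)


candidates = {
    "define closure": [
        "A closure captures variables.",
        "To define closure: a function plus its scope.",
        "I don't know.",
    ],
    "explain hoisting": ["No idea.", "Sorry."],
}
training_set = curate(candidates, toy_judge, min_score=0.6)
print(training_set)
```

Prompts whose best candidate still scores below the threshold are dropped, so weak teacher outputs never reach the student's training set.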

make install-train   # install training deps (requires CUDA)

Via UI at http://localhost:3000 → Distillation, or via API:

curl -X POST http://localhost:8000/v1/distillation/default/jobs \
  -H "Content-Type: application/json" \
  -d '{
    "teacher_model": "openai/gpt-4o-mini",
    "student_model": "unsloth/Llama-3.2-1B-Instruct-bnb-4bit",
    "num_candidates": 5,
    "dataset_id": "my-dataset"
  }'
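The same request can be prepared from Python with the standard library. This sketch only builds the request object and leaves the network call commented out; `build_distillation_request` is a hypothetical helper, and the shape of the response is not documented here.

```python
import json
import urllib.request


def build_distillation_request(base_url, tenant, payload):
    """Build (but do not send) the POST request for a distillation job."""
    return urllib.request.Request(
        url=f"{base_url}/v1/distillation/{tenant}/jobs",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_distillation_request(
    "http://localhost:8000",
    "default",
    {
        "teacher_model": "openai/gpt-4o-mini",
        "student_model": "unsloth/Llama-3.2-1B-Instruct-bnb-4bit",
        "num_candidates": 5,
        "dataset_id": "my-dataset",
    },
)
print(req.get_method(), req.full_url)

# To submit against a running Python API:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```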

Semantic Routing

With pre-trained weights, the router picks the best model per prompt:

make download-weights   # download from Hugging Face
make start-router       # start with semantic routing enabled

from opentracy import load_router  # assumption: load_router is exported at package level

router = load_router()
decision = router.route("Explain quantum computing")
print(f"Best model: {decision.selected_model}")
print(f"Expected error: {decision.expected_error:.4f}")
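One way to picture where a selected model and its expected error could come from is cluster-based routing: find the prompt's nearest cluster, then choose the model with the lowest estimated error there, optionally penalized by cost. The sketch below is a generic illustration of that idea, not the project's router; the centroids, error rates, costs, and model names are all made up.

```python
import math

# Toy data: two prompt clusters (2-D stand-ins for embeddings) and made-up
# per-cluster error rates and relative costs for two hypothetical models.
centroids = [(0.0, 0.0), (1.0, 1.0)]
error_rates = {
    "small-model": [0.10, 0.40],
    "large-model": [0.08, 0.12],
}
costs = {"small-model": 0.1, "large-model": 1.0}


def route(embedding, cost_weight=0.0):
    """Return (model, expected_error) for the cluster nearest to `embedding`."""
    cluster = min(
        range(len(centroids)), key=lambda i: math.dist(embedding, centroids[i])
    )

    def score(model):
        # Lower is better: estimated error plus a weighted cost penalty.
        return error_rates[model][cluster] + cost_weight * costs[model]

    best = min(error_rates, key=score)
    return best, error_rates[best][cluster]


easy = route((0.1, 0.0), cost_weight=0.1)
hard = route((0.9, 1.0), cost_weight=0.1)
print(easy, hard)
```

Raising `cost_weight` pushes easy prompts toward the cheaper model while hard prompts still go to the stronger one.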

Training Custom Routers

from opentracy import full_training_pipeline, TrainingConfig, PromptDataset, create_client

train_data = PromptDataset.load("train.json")
val_data = PromptDataset.load("val.json")

clients = [
    create_client("openai", "gpt-4o"),
    create_client("openai", "gpt-4o-mini"),
    create_client("groq", "llama-3.1-8b-instant"),
]

result = full_training_pipeline(
    train_data, val_data, clients,
    TrainingConfig(num_clusters=100, output_dir="./weights"),
)

MCP Integration (Claude Code)

pip install "opentracy[mcp]"   # quotes keep zsh from globbing the brackets

Add to ~/.claude/settings.json:

{
  "mcpServers": {
    "opentracy": {
      "command": "python",
      "args": ["-m", "opentracy.mcp"]
    }
  }
}
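If the settings file already has content, the entry needs to be merged rather than pasted over it. The sketch below shows one way to do that with the standard library. `add_mcp_server` is a hypothetical helper, the `theme` key is an invented example of a pre-existing setting, and the demo writes to a temporary file rather than the real ~/.claude/settings.json.

```python
import json
import tempfile
from pathlib import Path


def add_mcp_server(settings_path: Path) -> dict:
    """Merge the opentracy MCP entry into a settings file, keeping other keys."""
    settings = json.loads(settings_path.read_text()) if settings_path.exists() else {}
    servers = settings.setdefault("mcpServers", {})
    servers["opentracy"] = {"command": "python", "args": ["-m", "opentracy.mcp"]}
    settings_path.write_text(json.dumps(settings, indent=2) + "\n")
    return settings


# Demonstrated on a temporary file rather than the real settings file.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "settings.json"
    path.write_text(json.dumps({"theme": "dark"}))  # invented existing setting
    merged = add_mcp_server(path)
print(sorted(merged))
```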

Tools: opentracy_route, opentracy_generate, opentracy_smart_generate, opentracy_list_models, opentracy_compare.

Development

make help               # show all commands
make install            # install Python SDK + Go deps
make install-all        # install everything (Python + Go + UI)
make install-train      # install training/distillation deps (CUDA)
make dev-all            # start full local stack (ClickHouse + Go + API + UI)
make stop-all           # stop all local services
make test               # run all tests
make lint               # lint all code

License

MIT License - see LICENSE file for details.
