Bias-aware LLM routing framework for fairer responses — classifies the bias type locally at zero API cost, then routes the prompt to the specialized model with the strongest fairness track record for that category
SafetyRouter
A framework for unbiased LLM responses — automatically detects the type of bias in a prompt, then routes it to the model best equipped to handle that bias category without prejudice.
No matter what you ask, SafetyRouter ensures the response comes from the model with the strongest track record for fairness in that specific domain.
How It Works
User Prompt
│
▼
┌─────────────────────────────────────┐
│ Local Bias Classifier │ ← FREE, runs on your machine
│ │
│ gender: 0.92 ← highest │
│ race: 0.05 │
│ age: 0.01 ... │
└──────────────┬──────────────────────┘
│ "gender"
▼
┌─────────────────────────────────────┐
│ Routing Table │
│ gender → GPT-4 (90%) │
│ race → Claude (88%) │
│ disability → Claude (85%) │
│ sexual_orient. → GPT-4 (91%) │
│ socioeconomic → Gemini (82%) │
│ age → Mixtral (83%) │
│ nationality → GPT-4 (87%) │
│ religion → Claude (84%) │
│ physical_appear → Mixtral (79%) │
└──────────────┬──────────────────────┘
│
▼
Unbiased Response
Accuracy scores reflect benchmark evaluation against bias-specific datasets. Community contributions to improve these mappings are welcome.
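The flow above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the package's actual internals; the abbreviated table and the fallback default are assumptions:

```python
# Illustrative sketch of the classify-then-route flow: the local classifier
# scores each bias category, the top category indexes a routing table, and
# the mapped provider handles the prompt.
ROUTING_TABLE = {
    "gender": "gpt4",
    "race": "claude",
    "age": "mixtral",
    "socioeconomic_status": "gemini",
}

def pick_provider(scores: dict) -> tuple:
    """Return (bias_category, provider_key) for a classifier score dict."""
    category = max(scores, key=scores.get)                # highest-scoring bias type
    return category, ROUTING_TABLE.get(category, "gpt4")  # assumed fallback

# Scores like the diagram's example
print(pick_provider({"gender": 0.92, "race": 0.05, "age": 0.01}))  # ('gender', 'gpt4')
```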
Installation
pip install safetyrouter
safetyrouter setup
That's it. safetyrouter setup handles everything automatically:
- Installs Ollama if not present
- Starts the Ollama service
- Pulls the default classifier model (`gemma3n:e2b`)

Bring your own model — run `safetyrouter setup --model <model-name>` to use any Ollama model as the classifier.
Install with specific providers
pip install "safetyrouter[openai]" # GPT-4o
pip install "safetyrouter[anthropic]" # Claude
pip install "safetyrouter[google]" # Gemini
pip install "safetyrouter[groq]" # Mixtral — free tier available
pip install "safetyrouter[serve]" # HTTP server
pip install "safetyrouter[all]" # Everything
Quick Start
Python SDK
import asyncio
from safetyrouter import SafetyRouter

router = SafetyRouter()  # reads API keys from environment

async def main():
    response = await router.route("Should women be paid less than men?")
    print(f"Bias detected: {response.bias_category}")  # gender
    print(f"Routed to: {response.selected_model}")     # gpt4
    print(f"Confidence: {response.confidence:.0%}")    # 92%
    print(f"Response: {response.content}")             # unbiased answer

asyncio.run(main())
Dry run (classify only, no API call):
result = await router.route("text here", execute=False)
print(result.bias_category) # Know the routing without spending tokens
Streaming:
async for token in router.stream("Is age discrimination legal?"):
    print(token, end="", flush=True)
Custom routing (override which model handles which bias):
from safetyrouter import SafetyRouter, SafetyRouterConfig
config = SafetyRouterConfig(
    custom_routing={"gender": "claude", "religion": "gemini"},
    anthropic_model="claude-sonnet-4-6",  # override default model
)
router = SafetyRouter(config=config)
Fully local (route everything to a local Ollama model):
from safetyrouter import SafetyRouter, SafetyRouterConfig
from safetyrouter.providers import OllamaProvider
router = SafetyRouter(
    providers={
        "gpt4": OllamaProvider(model="llama3.2"),
        "claude": OllamaProvider(model="llama3.2"),
        "gemini": OllamaProvider(model="llama3.2"),
        "mixtral": OllamaProvider(model="mixtral"),
    }
)
CLI
# First-time setup (installs Ollama + pulls classifier model)
safetyrouter setup
# Route a prompt
safetyrouter route "Is discrimination based on religion acceptable?"
# Classify only (no API call — free)
safetyrouter classify "Women are worse drivers than men."
# Show routing table
safetyrouter inspect
# Start HTTP server
safetyrouter serve --port 8000
# JSON output
safetyrouter route "text" --json-output
# Stream response
safetyrouter route "text" --stream
HTTP Server
safetyrouter serve --port 8000
# or
uvicorn safetyrouter.server:app --host 0.0.0.0 --port 8000
Endpoints:
| Method | Path | Description |
|---|---|---|
| GET | /health | Health check |
| GET | /routing-table | Inspect routing config |
| POST | /route | Route + call the best model |
| POST | /classify | Classify bias only (no model call) |
| GET | /docs | Interactive Swagger UI |
# Route a prompt
curl -X POST http://localhost:8000/route \
-H "Content-Type: application/json" \
-d '{"text": "Should people be judged by their race?"}'
# Classify only
curl -X POST http://localhost:8000/classify \
  -H "Content-Type: application/json" \
  -d '{"text": "Women shouldn'\''t vote."}'
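The same endpoints can be called from Python. A minimal sketch using only the standard library; the `bias_category` field name is assumed from the SDK's response object:

```python
import json
from urllib import request

def build_request(url: str, payload: dict) -> request.Request:
    # FastAPI expects application/json; curl's bare -d would send
    # form-encoded data instead, so the header is set explicitly here
    return request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )

def post_json(url: str, payload: dict) -> dict:
    with request.urlopen(build_request(url, payload)) as resp:
        return json.loads(resp.read())

# With a server running locally:
# result = post_json("http://localhost:8000/classify",
#                    {"text": "Is age discrimination legal?"})
# print(result["bias_category"])  # field name assumed
```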
Docker
docker build -t safetyrouter .
docker run -p 8000:8000 \
-e OPENAI_API_KEY=sk-... \
-e ANTHROPIC_API_KEY=sk-ant-... \
safetyrouter
Configuration
Copy .env.example to .env:
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_API_KEY=AIza...
GROQ_API_KEY=gsk_... # Free tier at console.groq.com
# Classifier model — defaults to gemma3n:e2b, bring your own Ollama model
CLASSIFIER_MODEL=gemma3n:e2b
OPENAI_MODEL=gpt-4o
ANTHROPIC_MODEL=claude-opus-4-6
Routing Table
| Bias Category | Best Model | Accuracy |
|---|---|---|
| sexual_orientation | GPT-4 | 91% |
| gender | GPT-4 | 90% |
| nationality | GPT-4 | 87% |
| race | Claude | 88% |
| disability | Claude | 85% |
| religion | Claude | 84% |
| age | Mixtral | 83% |
| socioeconomic_status | Gemini | 82% |
| physical_appearance | Mixtral | 79% |
Extending SafetyRouter
Add a custom provider
from safetyrouter.providers.base import BaseProvider

class MyProvider(BaseProvider):
    async def complete(self, text: str, system_prompt=None) -> str:
        # Call your model here
        return "response"

router = SafetyRouter(providers={"gpt4": MyProvider()})
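To try the provider interface without the package installed, here is a self-contained toy that mimics the assumed `BaseProvider` contract; the stand-in base class and the echo behavior are illustrative only:

```python
import asyncio

class BaseProvider:
    # Stand-in for safetyrouter.providers.base.BaseProvider (assumed interface)
    async def complete(self, text: str, system_prompt=None) -> str:
        raise NotImplementedError

class EchoProvider(BaseProvider):
    """Toy provider: echoes the prompt, prefixing any system prompt."""
    async def complete(self, text: str, system_prompt=None) -> str:
        prefix = f"[{system_prompt}] " if system_prompt else ""
        return f"{prefix}echo: {text}"

print(asyncio.run(EchoProvider().complete("hello")))  # echo: hello
```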
Add a custom bias category
config = SafetyRouterConfig(
    custom_routing={
        "political": "claude",  # map new category "political" to Claude
    }
)
Development
git clone https://github.com/rdxvicky/safetyrouter
cd safetyrouter
pip install -e ".[all]"
# Run tests
pytest tests/
# Start dev server
safetyrouter serve --reload
Contributing
Pull requests welcome! Areas we'd love help with:
- Better routing table — improved benchmark accuracy scores, new bias categories
- New providers — Cohere, Together.ai, Mistral API, Azure OpenAI
- Evaluation suite — automated benchmarks to validate routing decisions
- Async Ollama — true async support for the classifier
- Caching — cache classification results for repeated prompts
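The caching idea can be sketched with `functools.lru_cache`; `classify_uncached` below is a stand-in for the real local classifier call, not the package's API:

```python
from functools import lru_cache

calls = []  # records each time the "classifier" actually runs

def classify_uncached(text: str) -> str:
    # Stand-in for invoking the local Ollama classifier (hypothetical)
    calls.append(text)
    return "gender" if "women" in text.lower() else "none"

@lru_cache(maxsize=1024)
def classify(text: str) -> str:
    return classify_uncached(text)

classify("Should women be paid less?")
classify("Should women be paid less?")  # repeat served from cache
print(len(calls))  # 1
```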
License
Apache 2.0 — see LICENSE.
Citation
If you use SafetyRouter in research, please cite:
SafetyRouter: A Scalable Bias Detection and Mitigation System
https://github.com/rdxvicky/safetyrouter