
Bias-aware LLM routing framework for fairer responses: classifies the bias type locally at zero API cost, then routes the prompt to the specialized model with the strongest fairness record for that category

Project description

SafetyRouter

A framework for unbiased LLM responses — automatically detects the type of bias in a prompt, then routes it to the model best equipped to handle that bias category without prejudice.

No matter what you ask, SafetyRouter ensures the response comes from the model with the strongest track record for fairness in that specific domain.


How It Works

User Prompt
    │
    ▼
┌─────────────────────────────────────┐
│  Local Bias Classifier              │  ← FREE, runs on your machine
│                                     │
│  gender: 0.92 ← highest             │
│  race:   0.05                       │
│  age:    0.01  ...                  │
└──────────────┬──────────────────────┘
               │ "gender"
               ▼
┌─────────────────────────────────────┐
│  Routing Table                      │
│  gender          → GPT-4   (90%)   │
│  race            → Claude  (88%)   │
│  disability      → Claude  (85%)   │
│  sexual_orient.  → GPT-4   (91%)   │
│  socioeconomic   → Gemini  (82%)   │
│  age             → Mixtral (83%)   │
│  nationality     → GPT-4   (87%)   │
│  religion        → Claude  (84%)   │
│  physical_appear → Mixtral (79%)   │
└──────────────┬──────────────────────┘
               │
               ▼
        Unbiased Response

Accuracy scores reflect benchmark evaluation against bias-specific datasets. Community contributions to improve these mappings are welcome.
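
In code, the flow above is two steps: a free local classification, then a table lookup that picks the provider. The sketch below is purely illustrative and is not SafetyRouter's actual internals; classify_bias and call_provider are stand-ins for the local Ollama classifier and the real provider clients.

import asyncio

# Illustrative sketch of the classify-then-route flow (not the library's internals).
ROUTING_TABLE = {
    "gender": "gpt4", "sexual_orientation": "gpt4", "nationality": "gpt4",
    "race": "claude", "disability": "claude", "religion": "claude",
    "age": "mixtral", "physical_appearance": "mixtral",
    "socioeconomic_status": "gemini",
}

def classify_bias(prompt: str) -> dict[str, float]:
    # Stand-in for the free local Ollama classifier: one score per bias category.
    return {"gender": 0.92, "race": 0.05, "age": 0.01}

async def call_provider(model: str, prompt: str) -> str:
    # Stand-in for the selected provider client (GPT-4, Claude, Gemini, Mixtral).
    return f"[{model}] response to: {prompt}"

async def route(prompt: str) -> str:
    scores = classify_bias(prompt)             # 1. classify locally, zero API cost
    category = max(scores, key=scores.get)     # 2. take the highest-scoring category
    model = ROUTING_TABLE[category]            # 3. look up the best model for it
    return await call_provider(model, prompt)  # 4. call that model

print(asyncio.run(route("Should women be paid less than men?")))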


Requirements

  • Ollama running locally (used for the local bias classifier)
  • API keys only for the providers you use (all are optional)
# Install Ollama: https://ollama.com

# Default classifier model (recommended)
ollama pull gemma3n:e2b

# Or bring your own — any Ollama model works
ollama pull <your-preferred-model>

Installation

# Core only (classifier + routing logic)
pip install safetyrouter

# With specific providers
pip install "safetyrouter[openai]"
pip install "safetyrouter[anthropic]"
pip install "safetyrouter[google]"
pip install "safetyrouter[groq]"       # Mixtral — free tier available

# With HTTP server
pip install "safetyrouter[serve]"

# Everything
pip install "safetyrouter[all]"

Quick Start

Python SDK

import asyncio
from safetyrouter import SafetyRouter

router = SafetyRouter()  # reads API keys from environment

async def main():
    response = await router.route("Should women be paid less than men?")
    print(f"Bias detected: {response.bias_category}")       # gender
    print(f"Routed to:     {response.selected_model}")      # gpt4
    print(f"Confidence:    {response.confidence:.0%}")       # 92%
    print(f"Response:      {response.content}")              # unbiased answer

asyncio.run(main())

Dry run (classify only, no API call):

result = await router.route("text here", execute=False)
print(result.bias_category)   # Know the routing without spending tokens

Streaming:

async for token in router.stream("Is age discrimination legal?"):
    print(token, end="", flush=True)

Custom routing (override which model handles which bias):

from safetyrouter import SafetyRouter, SafetyRouterConfig

config = SafetyRouterConfig(
    custom_routing={"gender": "claude", "religion": "gemini"},
    anthropic_model="claude-sonnet-4-6",   # override default model
)
router = SafetyRouter(config=config)

Fully local (route everything to a local Ollama model):

from safetyrouter import SafetyRouter, SafetyRouterConfig
from safetyrouter.providers import OllamaProvider

router = SafetyRouter(
    providers={
        "gpt4": OllamaProvider(model="llama3.2"),
        "claude": OllamaProvider(model="llama3.2"),
        "gemini": OllamaProvider(model="llama3.2"),
        "mixtral": OllamaProvider(model="mixtral"),
    }
)

CLI

# Route a prompt
safetyrouter route "Is discrimination based on religion acceptable?"

# Classify only (no API call — free)
safetyrouter classify "Women are worse drivers than men."

# Show routing table
safetyrouter inspect

# Start HTTP server
safetyrouter serve --port 8000

# JSON output
safetyrouter route "text" --json-output

# Stream response
safetyrouter route "text" --stream

HTTP Server

safetyrouter serve --port 8000
# or
uvicorn safetyrouter.server:app --host 0.0.0.0 --port 8000

Endpoints:

Method  Path            Description
GET     /health         Health check
GET     /routing-table  Inspect routing config
POST    /route          Route + call the best model
POST    /classify       Classify bias only (no model call)
GET     /docs           Interactive Swagger UI

# Route a prompt
curl -X POST http://localhost:8000/route \
  -H "Content-Type: application/json" \
  -d '{"text": "Should people be judged by their race?"}'

# Classify only
curl -X POST http://localhost:8000/classify \
  -d '{"text": "Women shouldn't vote."}'

Docker

docker build -t safetyrouter .
docker run -p 8000:8000 \
  -e OPENAI_API_KEY=sk-... \
  -e ANTHROPIC_API_KEY=sk-ant-... \
  safetyrouter

Configuration

Copy .env.example to .env:

OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GOOGLE_API_KEY=AIza...
GROQ_API_KEY=gsk_...          # Free tier at console.groq.com

# Classifier model — defaults to gemma3n:e2b, bring your own Ollama model
CLASSIFIER_MODEL=gemma3n:e2b
OPENAI_MODEL=gpt-4o
ANTHROPIC_MODEL=claude-opus-4-6
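
SafetyRouter reads these variables from the process environment. If your setup does not load the .env file automatically, one option is python-dotenv (a separate install, not a SafetyRouter dependency):

# pip install python-dotenv
from dotenv import load_dotenv
from safetyrouter import SafetyRouter

load_dotenv()            # loads .env from the current working directory
router = SafetyRouter()  # picks up the API keys from the environment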

Routing Table

Bias Category         Best Model  Accuracy
sexual_orientation    GPT-4       91%
gender                GPT-4       90%
nationality           GPT-4       87%
race                  Claude      88%
disability            Claude      85%
religion              Claude      84%
age                   Mixtral     83%
socioeconomic_status  Gemini      82%
physical_appearance   Mixtral     79%

Community contributions to improve these mappings are welcome.


Extending SafetyRouter

Add a custom provider

from safetyrouter.providers.base import BaseProvider

class MyProvider(BaseProvider):
    async def complete(self, text: str, system_prompt=None) -> str:
        # Call your model here
        return "response"

router = SafetyRouter(providers={"gpt4": MyProvider()})

Add a custom bias category

config = SafetyRouterConfig(
    custom_routing={
        "political": "claude",   # map new category "political" to Claude
    }
)

Development

git clone https://github.com/rdxvicky/safetyrouter
cd safetyrouter
pip install -e ".[all]"

# Run tests
pytest tests/

# Start dev server
safetyrouter serve --reload

Contributing

Pull requests welcome! Areas we'd love help with:

  • Better routing table — improved benchmark accuracy scores, new bias categories
  • New providers — Cohere, Together.ai, Mistral API, Azure OpenAI
  • Evaluation suite — automated benchmarks to validate routing decisions
  • Async Ollama — true async support for the classifier
  • Caching — cache classification results for repeated prompts (a rough sketch follows this list)
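
For the caching idea, a rough sketch of what it could look like, built only on the documented route(..., execute=False) call; nothing here is part of the current API.

from safetyrouter import SafetyRouter

router = SafetyRouter()
_bias_cache: dict[str, str] = {}

async def classify_cached(text: str) -> str:
    # Memoize the free local classification so repeated prompts skip the classifier.
    if text not in _bias_cache:
        result = await router.route(text, execute=False)  # classify only, no API call
        _bias_cache[text] = result.bias_category
    return _bias_cache[text]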

License

Apache 2.0 — see LICENSE.


Citation

If you use SafetyRouter in research, please cite:

SafetyRouter: A Scalable Bias Detection and Mitigation System
https://github.com/rdxvicky/safetyrouter

Project details


Download files

Download the file for your platform.

Source Distribution

safetyrouter-0.1.4.tar.gz (26.5 kB)

Uploaded Source

Built Distribution


safetyrouter-0.1.4-py3-none-any.whl (24.0 kB)

Uploaded Python 3

File details

Details for the file safetyrouter-0.1.4.tar.gz.

File metadata

  • Download URL: safetyrouter-0.1.4.tar.gz
  • Upload date:
  • Size: 26.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.14

File hashes

Hashes for safetyrouter-0.1.4.tar.gz
Algorithm Hash digest
SHA256 670faa6d81d3ad2a7beed4b1c407d17796ce2cf5e722aaced34fb27c069ffa1f
MD5 f4db0384a724501bd1608e4bfba316bc
BLAKE2b-256 8f1409fc1ee4813d6815c740d0105e6b04087054bd270e0e87e00513bf75cb61


File details

Details for the file safetyrouter-0.1.4-py3-none-any.whl.

File metadata

  • Download URL: safetyrouter-0.1.4-py3-none-any.whl
  • Upload date:
  • Size: 24.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.14

File hashes

Hashes for safetyrouter-0.1.4-py3-none-any.whl
Algorithm Hash digest
SHA256 fc40a475f0fd696e8be936d123aa920e68553115c42e7d261128cd96a07a0d2d
MD5 c8c333602d37eeebaaf1d4cb00e09222
BLAKE2b-256 a15d2fe1904e0642aa5da19d221e9ce5985cc19e27a0895c3a06f365c801090b

