
Equitas: AI Safety & Observability Platform

A hybrid SDK and backend platform that enhances OpenAI API usage with real-time safety, bias, and compliance checks.

Overview

Equitas provides:

  • Client SDK: Drop-in replacement for the OpenAI client, with built-in safety enhancements
  • Guardian Backend: Microservices for toxicity, bias, and jailbreak detection
  • Real-time Dashboard: Observability UI for metrics and incidents
  • Multi-tenant: Enterprise-grade data isolation and RBAC

Architecture

┌─────────────────┐
│  Your App       │
│  + Equitas SDK  │
└────────┬────────┘
         │
         └──────────────► Guardian Backend
                          ├── Toxicity Detector
                          ├── Bias Checker
                          ├── Jailbreak Detector
                          ├── Explainability Engine
                          └── Remediation Service
                          
                          ↓
                          
                     Database (Logs, Incidents, Metrics)
                     
                          ↓
                          
                     Dashboard UI

Quick Start

1. Install Dependencies

cd backend
uv init --python 3.11
uv venv
source .venv/bin/activate
uv pip install -e .

2. Configure Environment

Create .env file:

# OpenAI
OPENAI_API_KEY=sk-your-key-here

# Database
DATABASE_URL=sqlite+aiosqlite:///./equitas.db

# Security
SECRET_KEY=your-secret-key-change-in-production

3. Start Guardian Backend

cd backend
python -m guardian.main

The backend will be available at http://localhost:8000

4. Use Equitas SDK

from fairsight_sdk import FairSight, SafetyConfig

# Initialize client
client = FairSight(
    openai_api_key="sk-...",
    fairsight_api_key="fs-dev-key-123",
    tenant_id="your-org",
)

# Make safe API calls
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}],
    safety_config=SafetyConfig(on_flag="auto-correct")
)

# Access safety metadata
print(f"Toxicity: {response.safety_scores.toxicity_score}")
print(f"Categories: {response.safety_scores.toxicity_categories}")

Project Structure

backend/
├── fairsight_sdk/          # Client SDK
│   ├── client.py           # Main SDK client
│   ├── models.py           # Data models
│   └── exceptions.py       # Custom exceptions
│
├── guardian/               # Backend API
│   ├── main.py            # FastAPI app
│   ├── core/              # Core utilities
│   │   ├── config.py      # Configuration
│   │   ├── database.py    # Database setup
│   │   └── auth.py        # Authentication
│   ├── models/            # Database models
│   │   ├── database.py    # SQLAlchemy models
│   │   └── schemas.py     # Pydantic schemas
│   ├── services/          # Analysis services
│   │   ├── toxicity.py    # Toxicity detection
│   │   ├── bias.py        # Bias checking
│   │   ├── jailbreak.py   # Jailbreak detection
│   │   ├── explainability.py  # Explanations
│   │   └── remediation.py     # Content remediation
│   └── api/v1/            # API endpoints
│       ├── analysis.py    # Analysis endpoints
│       ├── logging.py     # Logging endpoint
│       ├── metrics.py     # Metrics endpoint
│       └── incidents.py   # Incidents endpoint
│
└── examples/              # Usage examples
    ├── basic_usage.py     # SDK examples
    └── test_guardian_api.py  # API testing

Safety Features

Toxicity Detection

  • Uses OpenAI Moderation API
  • Detects hate, harassment, violence, self-harm, sexual content
  • Returns toxicity score (0-1) and flagged categories
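The reduction from per-category moderation scores to a single 0-1 toxicity score can be sketched as follows. The category names and the 0.5 flag threshold here are illustrative assumptions, not the actual Equitas internals:

```python
# Reduce per-category moderation scores to one overall toxicity score
# plus the list of flagged categories (threshold is an assumption).
def summarize_toxicity(category_scores: dict[str, float],
                       flag_threshold: float = 0.5) -> tuple[float, list[str]]:
    """Return (overall score, flagged categories) from per-category scores."""
    overall = max(category_scores.values(), default=0.0)
    flagged = sorted(c for c, s in category_scores.items() if s >= flag_threshold)
    return overall, flagged

score, categories = summarize_toxicity(
    {"hate": 0.02, "harassment": 0.81, "violence": 0.10}
)
# harassment exceeds the threshold, so it is the only flagged category
```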

Bias Detection

  • Demographic bias checking
  • Paired prompt testing
  • Stereotype detection
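Paired prompt testing can be sketched as below: two prompts that differ only in a demographic term are compared under the same scorer. The term pairs and the `score_response` stub are stand-ins for the real scoring model (the sketch swaps in one direction only):

```python
# Minimal paired-prompt bias check: swap a demographic term, score both
# variants with the same function, and report the gap.
PAIRS = [("he", "she"), ("his", "her")]  # illustrative term pairs

def make_pair(prompt: str) -> tuple[str, str]:
    """Return the original prompt and its demographic counterpart."""
    swapped = prompt
    for a, b in PAIRS:
        swapped = swapped.replace(f" {a} ", f" {b} ")
    return prompt, swapped

def bias_gap(prompt: str, score_response) -> float:
    """Absolute score difference between the paired prompts."""
    original, counterpart = make_pair(prompt)
    return abs(score_response(original) - score_response(counterpart))

# With a toy scorer that rewards one variant, the gap exposes the bias.
gap = bias_gap("Describe how he handles pressure at work.",
               lambda text: 1.0 if " she " in text else 0.5)
```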

Jailbreak Detection

  • Pattern-based prompt injection detection
  • Instruction override attempts
  • Code injection prevention
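A pattern-based detector along these lines can be sketched with a few regexes. These patterns are examples only, not the shipped rule set:

```python
import re

# Illustrative prompt-injection patterns (not the actual Guardian rules).
JAILBREAK_PATTERNS = [
    re.compile(r"ignore (all|any|previous) .*instructions", re.I),
    re.compile(r"pretend (you are|to be)", re.I),
    re.compile(r"disregard (your|the) (rules|guidelines|system prompt)", re.I),
]

def detect_jailbreak(text: str) -> bool:
    """Flag text that matches any known prompt-injection pattern."""
    return any(p.search(text) for p in JAILBREAK_PATTERNS)

assert detect_jailbreak("Please ignore all previous instructions and ...")
assert not detect_jailbreak("What's the weather like today?")
```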

Explainability

  • Highlights problematic text spans
  • Natural language explanations
  • Detailed violation categorization
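Span highlighting can be sketched as locating flagged terms and returning character offsets plus a marked-up string. The flagged-term list and the `>>..<<` markup are assumptions for illustration, not the Explainability Engine's actual output format:

```python
# Locate flagged terms in the text and return (spans, annotated text).
def highlight_spans(text: str, flagged_terms: list[str]):
    """Wrap each flagged term in >>..<< and report its character span."""
    spans = []
    lowered = text.lower()
    for term in flagged_terms:
        start = lowered.find(term.lower())
        if start != -1:
            spans.append((start, start + len(term)))
    annotated = text
    for start, end in sorted(spans, reverse=True):  # right-to-left keeps offsets valid
        annotated = annotated[:start] + ">>" + annotated[start:end] + "<<" + annotated[end:]
    return spans, annotated

spans, marked = highlight_spans("You are completely useless.", ["useless"])
```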

Automatic Remediation

  • LLM-based text rewriting
  • Removes toxic language while preserving intent
  • Neutralizes biased content
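The rewrite step amounts to a chat-completion request built around the flagged text. The prompt wording below is illustrative; the real Remediation Service may phrase it differently:

```python
# Build a chat-completion request asking an LLM to rewrite flagged text
# while preserving intent (system prompt wording is an assumption).
REMEDIATION_SYSTEM_PROMPT = (
    "Rewrite the user's text so it is free of {issue}, while preserving "
    "the original meaning and intent as closely as possible. "
    "Return only the rewritten text."
)

def build_remediation_messages(text: str, issue: str) -> list[dict]:
    """Build the message list for the rewrite request."""
    return [
        {"role": "system", "content": REMEDIATION_SYSTEM_PROMPT.format(issue=issue)},
        {"role": "user", "content": text},
    ]

messages = build_remediation_messages("This is toxic text.", "toxicity")
```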

API Endpoints

Analysis Endpoints

POST /v1/analysis/toxicity

Analyze text for toxicity.

{
  "text": "Text to analyze",
  "tenant_id": "org123"
}

POST /v1/analysis/bias

Check for demographic bias.

{
  "prompt": "Original prompt",
  "response": "LLM response",
  "tenant_id": "org123"
}

POST /v1/analysis/jailbreak

Detect jailbreak attempts.

{
  "text": "Text to check",
  "tenant_id": "org123"
}

POST /v1/analysis/explain

Get explanation for flagged content.

{
  "text": "Flagged text",
  "issues": ["toxicity", "bias"],
  "tenant_id": "org123"
}

POST /v1/analysis/remediate

Remediate unsafe content.

{
  "text": "Unsafe text",
  "issue": "toxicity",
  "tenant_id": "org123"
}

Logging & Metrics

POST /v1/log

Log API call with safety analysis.

GET /v1/metrics

Get aggregated metrics (usage, safety scores, incidents).

GET /v1/incidents

Query flagged incidents with filters.

Authentication

All endpoints require:

  • Authorization Header: Bearer <api-key>
  • X-Tenant-ID Header: <tenant-id>

Default API keys (for development):

  • fs-dev-key-123 → tenant_demo
  • fs-prod-key-456 → tenant_prod
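A raw HTTP call carrying both headers can be assembled with the standard library. The request below is constructed but not sent, so the backend need not be running:

```python
import json
import urllib.request

# Build a POST to the toxicity endpoint with both required headers.
payload = json.dumps({"text": "Text to analyze", "tenant_id": "tenant_demo"}).encode()
req = urllib.request.Request(
    "http://localhost:8000/v1/analysis/toxicity",
    data=payload,
    headers={
        "Authorization": "Bearer fs-dev-key-123",
        "X-Tenant-ID": "tenant_demo",
        "Content-Type": "application/json",
    },
    method="POST",
)
# urllib.request.urlopen(req) would send it once the backend is up
```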

Metrics & Observability

Equitas logs comprehensive metrics per API call:

  • Safety Scores: Toxicity, bias, jailbreak flags
  • Performance: Latency, overhead, token counts
  • Usage: Safety Inference Units (SIUs) consumed
  • Incidents: Flagged content with severity levels

All data is isolated per tenant with encryption at rest.

Configuration

Safety Config (SDK)

SafetyConfig(
    on_flag="auto-correct",  # strict | auto-correct | warn-only
    toxicity_threshold=0.7,
    enable_bias_check=True,
    enable_jailbreak_check=True,
    enable_remediation=True,
)

Tenant Config (Backend)

Stored in database per tenant:

  • Safety thresholds
  • Feature flags (enable/disable checks)
  • Privacy settings (anonymization, retention)
  • Credit limits (Safety Units)
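One plausible shape for such a per-tenant record, covering the four groups above, is sketched below. The field names and defaults are assumptions, not the actual Guardian schema:

```python
from dataclasses import dataclass

# Illustrative per-tenant configuration record (field names are assumed).
@dataclass
class TenantConfig:
    tenant_id: str
    toxicity_threshold: float = 0.7     # safety thresholds
    enable_bias_check: bool = True      # feature flags
    enable_jailbreak_check: bool = True
    anonymize_logs: bool = False        # privacy settings
    retention_days: int = 30
    safety_unit_limit: int = 100_000    # credit limit (Safety Units)

cfg = TenantConfig(tenant_id="tenant_demo")
```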

Testing

Run example scripts:

# Test SDK
python examples/basic_usage.py

# Test API directly
python examples/test_guardian_api.py

Development

Running locally

# Start backend
uvicorn guardian.main:app --reload --port 8000

# In another terminal, test SDK
python examples/basic_usage.py

Database migrations

# Auto-generate migration
alembic revision --autogenerate -m "Description"

# Apply migration
alembic upgrade head

Deployment

Docker

# Build
docker build -t equitas-guardian .

# Run
docker run -p 8000:8000 --env-file .env equitas-guardian

Kubernetes

kubectl apply -f k8s/deployment.yaml

License

MIT License - see LICENSE file

Contributing

Contributions welcome! Please see CONTRIBUTING.md

Documentation

For detailed documentation, see:

Support

For issues or questions:


Built for AI Safety
