Equitas: AI Safety & Observability Platform - SDK and Guardian Backend

A hybrid SDK and backend platform that enhances OpenAI API usage with real-time safety, bias, and compliance checks.

Overview

Equitas provides:

  • Client SDK: Drop-in replacement for OpenAI API with safety enhancements
  • Guardian Backend: Microservices for toxicity, bias, and jailbreak detection
  • Real-time Dashboard: Observability UI for metrics and incidents
  • Multi-tenant: Enterprise-grade data isolation and RBAC

Architecture

┌─────────────────┐
│  Your App       │
│  + Equitas SDK  │
└────────┬────────┘
         │
         └──────────────► Guardian Backend
                          ├── Toxicity Detector
                          ├── Bias Checker
                          ├── Jailbreak Detector
                          ├── Explainability Engine
                          └── Remediation Service
                          
                          ↓
                          
                     Database (Logs, Incidents, Metrics)
                     
                          ↓
                          
                     Dashboard UI

Quick Start

1. Install Dependencies

cd backend
uv venv --python 3.11
source .venv/bin/activate
uv pip install -e .

2. Configure Environment

Create .env file:

# OpenAI
OPENAI_API_KEY=sk-your-key-here

# Database
DATABASE_URL=sqlite+aiosqlite:///.equitas.db

# Security
SECRET_KEY=your-secret-key-change-in-production

3. Start Guardian Backend

cd backend
python -m guardian.main

The backend will be available at http://localhost:8000.

4. Use Equitas SDK

from fairsight_sdk import FairSight, SafetyConfig

# Initialize client
client = FairSight(
    openai_api_key="sk-...",
    fairsight_api_key="fs-dev-key-123",
    tenant_id="your-org",
)

# Make safe API calls
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}],
    safety_config=SafetyConfig(on_flag="auto-correct")
)

# Access safety metadata
print(f"Toxicity: {response.safety_scores.toxicity_score}")
print(f"Categories: {response.safety_scores.toxicity_categories}")

Project Structure

backend/
├── fairsight_sdk/          # Client SDK
│   ├── client.py           # Main SDK client
│   ├── models.py           # Data models
│   └── exceptions.py       # Custom exceptions
│
├── guardian/               # Backend API
│   ├── main.py            # FastAPI app
│   ├── core/              # Core utilities
│   │   ├── config.py      # Configuration
│   │   ├── database.py    # Database setup
│   │   └── auth.py        # Authentication
│   ├── models/            # Database models
│   │   ├── database.py    # SQLAlchemy models
│   │   └── schemas.py     # Pydantic schemas
│   ├── services/          # Analysis services
│   │   ├── toxicity.py    # Toxicity detection
│   │   ├── bias.py        # Bias checking
│   │   ├── jailbreak.py   # Jailbreak detection
│   │   ├── explainability.py  # Explanations
│   │   └── remediation.py     # Content remediation
│   └── api/v1/            # API endpoints
│       ├── analysis.py    # Analysis endpoints
│       ├── logging.py     # Logging endpoint
│       ├── metrics.py     # Metrics endpoint
│       └── incidents.py   # Incidents endpoint
│
└── examples/              # Usage examples
    ├── basic_usage.py     # SDK examples
    └── test_guardian_api.py  # API testing

Safety Features

Toxicity Detection

  • Uses OpenAI Moderation API
  • Detects hate, harassment, violence, self-harm, sexual content
  • Returns toxicity score (0-1) and flagged categories
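
The score/categories pair described above can be sketched as a reduction over per-category moderation scores. A minimal sketch, assuming a max-over-categories aggregation rule (the actual service's logic may differ):

```python
# Illustrative only: collapse per-category moderation scores into the
# toxicity_score / toxicity_categories shape Equitas reports. Category
# names mirror the OpenAI Moderation API; the max-over-categories
# aggregation is an assumption, not Equitas's actual logic.

def summarize_moderation(category_scores: dict[str, float],
                         threshold: float = 0.5) -> dict:
    # Overall score: the worst single category.
    toxicity_score = max(category_scores.values(), default=0.0)
    # Flag every category at or above the threshold.
    flagged = sorted(c for c, s in category_scores.items() if s >= threshold)
    return {"toxicity_score": toxicity_score, "toxicity_categories": flagged}

scores = {"hate": 0.02, "harassment": 0.81, "violence": 0.10}
print(summarize_moderation(scores))
# {'toxicity_score': 0.81, 'toxicity_categories': ['harassment']}
```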

Bias Detection

  • Demographic bias checking
  • Paired prompt testing
  • Stereotype detection

Jailbreak Detection

  • Pattern-based prompt injection detection
  • Instruction override attempts
  • Code injection prevention
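
Pattern-based detection of this kind can be as simple as matching a curated regex list against the prompt. The patterns below are illustrative examples, not the detector's actual rule set:

```python
import re

# Example jailbreak patterns in the spirit of a pattern-based detector;
# these three are illustrative, not Equitas's real list.
JAILBREAK_PATTERNS = [
    r"ignore (all |your )?previous instructions",
    r"you are now (dan|an? unrestricted)",
    r"pretend (you have|there are) no (rules|restrictions)",
]

def looks_like_jailbreak(text: str) -> bool:
    # Case-insensitive search of every pattern against the input.
    return any(re.search(p, text, re.IGNORECASE) for p in JAILBREAK_PATTERNS)

print(looks_like_jailbreak("Please ignore previous instructions and ..."))  # True
print(looks_like_jailbreak("What's the weather today?"))                    # False
```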

Explainability

  • Highlights problematic text spans
  • Natural language explanations
  • Detailed violation categorization

Automatic Remediation

  • LLM-based text rewriting
  • Removes toxic language while preserving intent
  • Neutralizes biased content
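
LLM-based rewriting of this kind is driven by an instruction prompt wrapped around the flagged text. A hedged sketch of what such a prompt might look like (the wording is hypothetical; the actual remediation prompt is internal to the service):

```python
# Hypothetical prompt builder for an LLM-based remediation step.
# The instruction wording is an assumption, not Equitas's actual prompt.

def build_remediation_prompt(text: str, issue: str) -> str:
    return (
        f"Rewrite the following text to remove any {issue} while "
        f"preserving the author's intent and factual content. "
        f"Return only the rewritten text.\n\nText: {text}"
    )

prompt = build_remediation_prompt("You people are all idiots.", "toxicity")
print(prompt)
```

The returned string would then be sent as the user message of a chat completion, and the model's reply used as the remediated text.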

API Endpoints

Analysis Endpoints

POST /v1/analysis/toxicity

Analyze text for toxicity.

{
  "text": "Text to analyze",
  "tenant_id": "org123"
}

POST /v1/analysis/bias

Check for demographic bias.

{
  "prompt": "Original prompt",
  "response": "LLM response",
  "tenant_id": "org123"
}

POST /v1/analysis/jailbreak

Detect jailbreak attempts.

{
  "text": "Text to check",
  "tenant_id": "org123"
}

POST /v1/analysis/explain

Get explanation for flagged content.

{
  "text": "Flagged text",
  "issues": ["toxicity", "bias"],
  "tenant_id": "org123"
}

POST /v1/analysis/remediate

Remediate unsafe content.

{
  "text": "Unsafe text",
  "issue": "toxicity",
  "tenant_id": "org123"
}
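
Putting an endpoint, the required headers (see Authentication below), and a payload together — a stdlib-only sketch that builds the toxicity request without sending it, using the base URL and dev key from the Quick Start:

```python
import json
import urllib.request

# Assemble a call to POST /v1/analysis/toxicity. The request is built
# but not sent here, so this runs without the backend up.
payload = {"text": "Text to analyze", "tenant_id": "org123"}
req = urllib.request.Request(
    "http://localhost:8000/v1/analysis/toxicity",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": "Bearer fs-dev-key-123",
        "X-Tenant-ID": "org123",
        "Content-Type": "application/json",
    },
    method="POST",
)

# With the Guardian backend running, send it with:
#   with urllib.request.urlopen(req) as resp:
#       print(json.load(resp))
print(req.get_method(), req.full_url)
# POST http://localhost:8000/v1/analysis/toxicity
```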

Logging & Metrics

POST /v1/log

Log API call with safety analysis.

GET /v1/metrics

Get aggregated metrics (usage, safety scores, incidents).

GET /v1/incidents

Query flagged incidents with filters.

Authentication

All endpoints require:

  • Authorization Header: Bearer <api-key>
  • X-Tenant-ID Header: <tenant-id>

Default API keys (for development):

  • fs-dev-key-123 → tenant_demo
  • fs-prod-key-456 → tenant_prod

Metrics & Observability

Equitas logs comprehensive metrics per API call:

  • Safety Scores: Toxicity, bias, jailbreak flags
  • Performance: Latency, overhead, token counts
  • Usage: Safety Inference Units (SIUs) consumed
  • Incidents: Flagged content with severity levels

All data is isolated per tenant with encryption at rest.

Configuration

Safety Config (SDK)

SafetyConfig(
    on_flag="auto-correct",  # strict | auto-correct | warn-only
    toxicity_threshold=0.7,
    enable_bias_check=True,
    enable_jailbreak_check=True,
    enable_remediation=True,
)

Tenant Config (Backend)

Stored in database per tenant:

  • Safety thresholds
  • Feature flags (enable/disable checks)
  • Privacy settings (anonymization, retention)
  • Credit limits (Safety Units)
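
As a concrete illustration, one plausible shape for such a per-tenant record (field names here are assumptions based on the bullets above, not the actual SQLAlchemy model):

```python
# Hypothetical per-tenant configuration record; every field name is an
# assumption for illustration, not Equitas's real schema.
tenant_config = {
    "tenant_id": "org123",
    "toxicity_threshold": 0.7,        # safety threshold
    "enable_bias_check": True,        # feature flag
    "enable_jailbreak_check": True,   # feature flag
    "anonymize_logs": True,           # privacy setting
    "retention_days": 30,             # privacy setting
    "safety_unit_limit": 100_000,     # credit limit (Safety Units)
}
print(tenant_config["toxicity_threshold"])  # 0.7
```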

Testing

Run example scripts:

# Test SDK
python examples/basic_usage.py

# Test API directly
python examples/test_guardian_api.py

Development

Running locally

# Start backend
uvicorn guardian.main:app --reload --port 8000

# In another terminal, test SDK
python examples/basic_usage.py

Database migrations

# Auto-generate migration
alembic revision --autogenerate -m "Description"

# Apply migration
alembic upgrade head

Deployment

Docker

# Build
docker build -t equitas-guardian .

# Run
docker run -p 8000:8000 --env-file .env equitas-guardian

Kubernetes

kubectl apply -f k8s/deployment.yaml

License

MIT License - see LICENSE file

Contributing

Contributions welcome! Please see CONTRIBUTING.md


Built for AI Safety

