
Equitas: AI Safety & Observability Platform


A hybrid SDK and backend platform that enhances OpenAI API usage with real-time safety, bias, and compliance checks.

Overview

Equitas provides:

  • Client SDK: Drop-in replacement for OpenAI API with safety enhancements
  • Guardian Backend: Microservices for toxicity, bias, and jailbreak detection
  • Real-time Dashboard: Observability UI for metrics and incidents
  • Multi-tenant: Enterprise-grade data isolation and RBAC

Architecture

┌─────────────────┐
│  Your App       │
│  + Equitas SDK  │
└────────┬────────┘
         │
         └──────────────► Guardian Backend
                          ├── Toxicity Detector
                          ├── Bias Checker
                          ├── Jailbreak Detector
                          ├── Explainability Engine
                          └── Remediation Service
                          
                          ↓
                          
                     Database (Logs, Incidents, Metrics)
                     
                          ↓
                          
                     Dashboard UI

Quick Start

1. Install Dependencies

cd backend
uv venv --python 3.11
source .venv/bin/activate
uv pip install -e .

2. Configure Environment

Create a .env file:

# OpenAI
OPENAI_API_KEY=sk-your-key-here

# Database
DATABASE_URL=sqlite+aiosqlite:///.equitas.db

# Security
SECRET_KEY=your-secret-key-change-in-production

3. Start Guardian Backend

cd backend
python -m guardian.main

The backend will be available at http://localhost:8000

4. Use Equitas SDK

from fairsight_sdk import FairSight, SafetyConfig

# Initialize client
client = FairSight(
    openai_api_key="sk-...",
    fairsight_api_key="fs-dev-key-123",
    tenant_id="your-org",
)

# Make safe API calls
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Hello!"}],
    safety_config=SafetyConfig(on_flag="auto-correct")
)

# Access safety metadata
print(f"Toxicity: {response.safety_scores.toxicity_score}")
print(f"Categories: {response.safety_scores.toxicity_categories}")

Project Structure

backend/
├── fairsight_sdk/          # Client SDK
│   ├── client.py           # Main SDK client
│   ├── models.py           # Data models
│   └── exceptions.py       # Custom exceptions
│
├── guardian/               # Backend API
│   ├── main.py            # FastAPI app
│   ├── core/              # Core utilities
│   │   ├── config.py      # Configuration
│   │   ├── database.py    # Database setup
│   │   └── auth.py        # Authentication
│   ├── models/            # Database models
│   │   ├── database.py    # SQLAlchemy models
│   │   └── schemas.py     # Pydantic schemas
│   ├── services/          # Analysis services
│   │   ├── toxicity.py    # Toxicity detection
│   │   ├── bias.py        # Bias checking
│   │   ├── jailbreak.py   # Jailbreak detection
│   │   ├── explainability.py  # Explanations
│   │   └── remediation.py     # Content remediation
│   └── api/v1/            # API endpoints
│       ├── analysis.py    # Analysis endpoints
│       ├── logging.py     # Logging endpoint
│       ├── metrics.py     # Metrics endpoint
│       └── incidents.py   # Incidents endpoint
│
└── examples/              # Usage examples
    ├── basic_usage.py     # SDK examples
    └── test_guardian_api.py  # API testing

Safety Features

Toxicity Detection

  • Uses OpenAI Moderation API
  • Detects hate, harassment, violence, self-harm, sexual content
  • Returns toxicity score (0-1) and flagged categories
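The reduction from per-category moderation scores to a single 0-1 toxicity score plus flagged categories can be sketched as below. `summarize_moderation` and the 0.5 threshold are illustrative assumptions, not the actual Guardian internals; `category_scores` mirrors the shape of the category scores returned by OpenAI's Moderation API.

```python
# Hypothetical sketch: reduce Moderation API category scores to a single
# 0-1 toxicity score and a list of categories over a threshold.

def summarize_moderation(category_scores: dict[str, float],
                         threshold: float = 0.5) -> tuple[float, list[str]]:
    """Return (overall toxicity score, flagged categories)."""
    if not category_scores:
        return 0.0, []
    score = max(category_scores.values())  # worst category dominates
    flagged = sorted(c for c, s in category_scores.items() if s >= threshold)
    return score, flagged

scores = {"hate": 0.02, "harassment": 0.81, "violence": 0.12}
toxicity, categories = summarize_moderation(scores)
```

Taking the maximum category score is one simple aggregation choice; a weighted combination would work equally well here.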

Bias Detection

  • Demographic bias checking
  • Paired prompt testing
  • Stereotype detection
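Paired prompt testing means issuing the same prompt with demographic terms swapped and comparing the two responses for materially different treatment. A minimal sketch, assuming an illustrative term list (`TERM_PAIRS` and `make_counterfactual` are not the actual SDK names):

```python
# Hypothetical sketch of paired prompt generation for bias checking.
# Term pairs are illustrative only; a real checker would use a much
# larger vocabulary and handle capitalization and inflection.

TERM_PAIRS = [("he", "she"), ("his", "her"), ("man", "woman")]

def make_counterfactual(prompt: str) -> str:
    """Swap each demographic term with its counterpart (both directions)."""
    swap = {}
    for a, b in TERM_PAIRS:
        swap[a], swap[b] = b, a
    return " ".join(swap.get(w.lower(), w) for w in prompt.split())

original = "describe a man who is good at his job"
paired = make_counterfactual(original)
# The backend would send both prompts to the model and flag large
# divergences between the two responses as potential bias.
```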

Jailbreak Detection

  • Pattern-based prompt injection detection
  • Instruction override attempts
  • Code injection prevention
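Pattern-based detection of this kind can be sketched with a small regex list. The patterns and function name below are assumptions for illustration; Guardian's actual pattern set is not published in this README.

```python
import re

# Illustrative jailbreak/prompt-injection patterns (not the real list).
JAILBREAK_PATTERNS = [
    r"ignore (all )?(previous|prior) instructions",      # instruction override
    r"you are now (dan|in developer mode)",              # persona hijack
    r"pretend (that )?you have no (rules|restrictions)", # restriction bypass
]

def detect_jailbreak(text: str) -> tuple[bool, list[str]]:
    """Return (flagged, list of matching patterns)."""
    hits = [p for p in JAILBREAK_PATTERNS if re.search(p, text, re.IGNORECASE)]
    return bool(hits), hits

flagged, hits = detect_jailbreak("Please ignore previous instructions and ...")
```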

Explainability

  • Highlights problematic text spans
  • Natural language explanations
  • Detailed violation categorization
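Span highlighting boils down to locating each flagged term in the text and returning character offsets the dashboard can render. A minimal sketch, with a hypothetical helper name (the real explainability engine likely works on model attributions, not plain substring search):

```python
# Hypothetical sketch: map flagged terms to character-offset spans.

def highlight_spans(text: str, flagged_terms: list[str]) -> list[dict]:
    """Return [{"term", "start", "end"}, ...] for terms found in text."""
    spans = []
    lower = text.lower()
    for term in flagged_terms:
        start = lower.find(term.lower())  # first occurrence only
        if start != -1:
            spans.append({"term": term, "start": start, "end": start + len(term)})
    return sorted(spans, key=lambda s: s["start"])

spans = highlight_spans("You are a terrible person", ["terrible"])
```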

Automatic Remediation

  • LLM-based text rewriting
  • Removes toxic language while preserving intent
  • Neutralizes biased content
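LLM-based rewriting typically works by wrapping the unsafe text in a rewrite instruction. The exact prompt Equitas uses is not published, so the wording and helper name below are assumptions:

```python
# Hypothetical sketch of the rewrite prompt a remediation service might
# send to a chat model.

def build_remediation_prompt(text: str, issue: str) -> list[dict]:
    """Build chat messages asking the model to rewrite unsafe text."""
    system = (
        f"Rewrite the user's text to remove any {issue} while preserving "
        "the original intent and meaning. Return only the rewritten text."
    )
    return [{"role": "system", "content": system},
            {"role": "user", "content": text}]

messages = build_remediation_prompt("some unsafe text", "toxicity")
# messages would then be passed to chat.completions.create(...)
```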

API Endpoints

Analysis Endpoints

POST /v1/analysis/toxicity

Analyze text for toxicity.

{
  "text": "Text to analyze",
  "tenant_id": "org123"
}

POST /v1/analysis/bias

Check for demographic bias.

{
  "prompt": "Original prompt",
  "response": "LLM response",
  "tenant_id": "org123"
}

POST /v1/analysis/jailbreak

Detect jailbreak attempts.

{
  "text": "Text to check",
  "tenant_id": "org123"
}

POST /v1/analysis/explain

Get explanation for flagged content.

{
  "text": "Flagged text",
  "issues": ["toxicity", "bias"],
  "tenant_id": "org123"
}

POST /v1/analysis/remediate

Remediate unsafe content.

{
  "text": "Unsafe text",
  "issue": "toxicity",
  "tenant_id": "org123"
}
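Calling these endpoints directly requires the authentication headers described below under Authentication. A stdlib-only sketch for the toxicity endpoint, using the development defaults from this README (the helper name is illustrative):

```python
import json
import urllib.request

# Hypothetical helper: build an authenticated POST to /v1/analysis/toxicity.

def build_toxicity_request(base_url: str, api_key: str,
                           tenant_id: str, text: str) -> urllib.request.Request:
    body = json.dumps({"text": text, "tenant_id": tenant_id}).encode()
    return urllib.request.Request(
        f"{base_url}/v1/analysis/toxicity",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "X-Tenant-ID": tenant_id,
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_toxicity_request("http://localhost:8000", "fs-dev-key-123",
                             "tenant_demo", "Text to analyze")
# urllib.request.urlopen(req) would send it once the backend is running.
```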

Logging & Metrics

POST /v1/log

Log API call with safety analysis.

GET /v1/metrics

Get aggregated metrics (usage, safety scores, incidents).

GET /v1/incidents

Query flagged incidents with filters.

Authentication

All endpoints require:

  • Authorization Header: Bearer <api-key>
  • X-Tenant-ID Header: <tenant-id>

Default API keys (for development):

  • fs-dev-key-123 (tenant_demo)
  • fs-prod-key-456 (tenant_prod)

Metrics & Observability

Equitas logs comprehensive metrics per API call:

  • Safety Scores: Toxicity, bias, jailbreak flags
  • Performance: Latency, overhead, token counts
  • Usage: Safety Inference Units (SIUs) consumed
  • Incidents: Flagged content with severity levels

All data is isolated per tenant with encryption at rest.

Configuration

Safety Config (SDK)

SafetyConfig(
    on_flag="auto-correct",  # strict | auto-correct | warn-only
    toxicity_threshold=0.7,
    enable_bias_check=True,
    enable_jailbreak_check=True,
    enable_remediation=True,
)
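The three on_flag modes can be read as a simple dispatch: strict blocks the call, auto-correct substitutes the remediated text, warn-only passes the original through. A sketch under assumed names (the SDK's internal handling and exception type are not shown in this README):

```python
# Hypothetical dispatch of the three on_flag modes.

class ContentFlaggedError(Exception):
    """Raised in strict mode when content is flagged."""

def handle_flag(on_flag: str, text: str, remediated: str) -> str:
    if on_flag == "strict":
        raise ContentFlaggedError("unsafe content blocked")  # refuse outright
    if on_flag == "auto-correct":
        return remediated                                    # swap in the rewrite
    if on_flag == "warn-only":
        return text                                          # pass through, log a warning
    raise ValueError(f"unknown on_flag mode: {on_flag}")

result = handle_flag("auto-correct", "unsafe text", "safe rewrite")
```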

Tenant Config (Backend)

Stored in database per tenant:

  • Safety thresholds
  • Feature flags (enable/disable checks)
  • Privacy settings (anonymization, retention)
  • Credit limits (Safety Inference Units)

Testing

Run example scripts:

# Test SDK
python examples/basic_usage.py

# Test API directly
python examples/test_guardian_api.py

Development

Running locally

# Start backend
uvicorn guardian.main:app --reload --port 8000

# In another terminal, test SDK
python examples/basic_usage.py

Database migrations

# Auto-generate migration
alembic revision --autogenerate -m "Description"

# Apply migration
alembic upgrade head

Deployment

Docker

# Build
docker build -t equitas-guardian .

# Run
docker run -p 8000:8000 --env-file .env equitas-guardian

Kubernetes

kubectl apply -f k8s/deployment.yaml

License

MIT License - see LICENSE file

Contributing

Contributions welcome! Please see CONTRIBUTING.md

Documentation

For detailed documentation, see:

Support

For issues or questions:


Built for AI Safety


Download files


Source Distribution

equitas-0.1.3.tar.gz (177.1 kB)

Built Distribution

equitas-0.1.3-py3-none-any.whl (39.2 kB)

File details

Details for the file equitas-0.1.3.tar.gz.

File metadata

  • Download URL: equitas-0.1.3.tar.gz
  • Upload date:
  • Size: 177.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.13

File hashes

Hashes for equitas-0.1.3.tar.gz:

  • SHA256: 13fb1f6a7ceb705d27c256469262e331d3c2bb3e31d633a04a16ddac23898f9e
  • MD5: 395faf3f019bd32ea1481177e74903f2
  • BLAKE2b-256: f16e387a4e4364128738d341e93d79144fd776dbd8d08cc45c0d712e60c6a3ca


File details

Details for the file equitas-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: equitas-0.1.3-py3-none-any.whl
  • Upload date:
  • Size: 39.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.13

File hashes

Hashes for equitas-0.1.3-py3-none-any.whl:

  • SHA256: 27bf92c762426cd6d2d1dc674ef3a1ef3e48a2949963ec2244a9d7851e6777a6
  • MD5: 3e7f5c63e7088c4e8689f772064a7b4e
  • BLAKE2b-256: 2c2f719e0d44d780dc69e0bbedfa6bacfcfdc0a62fe561df28eb0b37767bf509

