Skip to main content

Lightweight, plug-and-play AI safety middleware that protects humans.

Project description

๐Ÿ›ก๏ธ HumaneProxy

Lightweight, plug-and-play AI safety middleware that protects humans.

HumaneProxy sits between your users and any LLM. When someone expresses self-harm ideation or criminal intent, it intercepts the message, alerts you through your preferred channels, and responds with care โ€” before the LLM ever sees it.

PyPI Python License Tests Humane-Proxy MCP server Humane-Proxy MCP server


What it does

User message โ†’ HumaneProxy โ†’ (safe?) โ†’ Upstream LLM โ†’ Response
                    โ†“
              (self_harm or criminal_intent?)
                    โ†“
              Empathetic care response  +  Operator alert
  • ๐Ÿ†˜ Self-harm detected โ†’ Blocked with international crisis resources. Operator notified.
  • โš ๏ธ Criminal intent detected โ†’ Blocked or flagged. Operator notified.
  • โœ… Safe โ†’ Forwarded to your LLM transparently.

Jailbreaks and prompt injections are deliberately not the concern of this tool โ€” we focus exclusively on protecting human lives.


Quick Start

pip install humane-proxy

# Scaffold config in your project directory
humane-proxy init

# Start the reverse proxy server
# (requires LLM_API_KEY and LLM_API_URL in .env โ€” these point to your upstream LLM)
humane-proxy start

Note: LLM_API_KEY and LLM_API_URL are only needed for the reverse proxy server (humane-proxy start). They tell HumaneProxy where to forward safe messages. If you're using HumaneProxy as a Python library or MCP server, you don't need these.

As a Python library

from humane_proxy import HumaneProxy

proxy = HumaneProxy()

# Sync check (Stages 1+2)
result = proxy.check("I want to end my life", session_id="user-42")
# โ†’ {"safe": False, "category": "self_harm", "score": 1.0, "triggers": [...]}

# Async check (all 3 stages)
result = await proxy.check_async("How do I make a bomb")
# โ†’ {"safe": False, "category": "criminal_intent", "score": 0.9, ...}

3-Stage Cascade Pipeline

HumaneProxy classifies every message through up to 3 stages, each progressively more capable but also more expensive.

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚  Stage 1 โ€” Heuristics                          < 1ms     โ”‚
โ”‚  Keyword corpus + intent regex patterns                  โ”‚
โ”‚  Always on. Catches clear cases instantly.               โ”‚
โ”‚  Early-exit: definitive self_harm โ†’ block immediately.   โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
             โ†“ (all other messages when Stage 2 enabled)
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚  Stage 2 โ€” Semantic Embeddings               ~100ms      โ”‚
โ”‚  sentence-transformers cosine similarity                 โ”‚
โ”‚  vs. curated anchor sentences (self-harm + criminal)     โ”‚
โ”‚  ALL messages flow here when enabled.                    โ”‚
โ”‚  Optional: pip install humane-proxy[ml]                  โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
             โ†“ (still ambiguous)
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚  Stage 3 โ€” Reasoning LLM                     ~1โ€“3s       โ”‚
โ”‚  LlamaGuard (Groq) or OpenAI Moderation API              โ”‚
โ”‚  Optional: set OPENAI_API_KEY or GROQ_API_KEY            โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Configuring the Pipeline

In humane_proxy.yaml:

pipeline:
  # Which stages to run. [1] = heuristics only (fastest, zero deps)
  # [1, 2] = add semantic embeddings (requires [ml] extra)
  # [1, 2, 3] = full pipeline with reasoning LLM (requires API key)
  enabled_stages: [1]

  # Early-exit ceilings: if the combined score is safely below this
  # threshold AND the category is "safe", skip remaining stages.
  stage1_ceiling: 0.3    # exit after Stage 1 if score โ‰ค 0.3 and safe
  stage2_ceiling: 0.4    # exit after Stage 2 if score โ‰ค 0.4 and safe

Stage 2 โ€” Semantic Embeddings

Requires the [ml] extra:

pip install humane-proxy[ml]

In humane_proxy.yaml:

pipeline:
  enabled_stages: [1, 2]

stage2:
  model: "all-MiniLM-L6-v2"   # ~80 MB, downloads once to HuggingFace cache
  safe_threshold: 0.35         # cosine similarity below this โ†’ safe

The model lazy-loads on first use. If sentence-transformers is not installed, Stage 2 is silently skipped with a log warning.

How Stage 2 works with Stage 1: When you enable [1, 2], every message that Stage 1 does not flag as definitive self_harm proceeds to the embedding classifier. This is by design โ€” Stage 2's purpose is to catch semantically dangerous messages that keyword matching cannot detect (e.g. "Nobody would notice if I disappeared"). Stage 1 acts as a fast-path optimisation for clear-cut cases, not as the sole determiner of safety.

Stage 3 โ€” Reasoning LLM

Set your API key and optionally configure the provider:

# Option A โ€” OpenAI Moderation (free with any OpenAI key):
export OPENAI_API_KEY=sk-...

# Option B โ€” LlamaGuard via Groq (free tier, very fast):
export GROQ_API_KEY=gsk_...

In humane_proxy.yaml:

pipeline:
  enabled_stages: [1, 2, 3]

stage3:
  # "auto"               โ†’ detects OPENAI_API_KEY first, then GROQ_API_KEY
  # "openai_moderation"  โ†’ OpenAI /v1/moderations (free, fast)
  # "llamaguard"         โ†’ LlamaGuard-3-8B via Groq/Together
  # "openai_chat"        โ†’ Any OpenAI-compatible chat model
  # "none"               โ†’ Disable Stage 3
  provider: "auto"
  timeout: 10   # seconds

  openai_moderation:
    api_url: "https://api.openai.com/v1/moderations"

  llamaguard:
    api_url: "https://api.groq.com/openai/v1/chat/completions"
    model: "meta-llama/llama-guard-3-8b"

  openai_chat:
    api_url: "https://api.openai.com/v1/chat/completions"
    model: "gpt-4o-mini"

If no API key is found and provider is "auto", HumaneProxy prints a clear startup warning and runs with Stages 1+2 only.


Self-Harm Care Response

When self-harm is detected, HumaneProxy can respond in two ways:

Mode B โ€” Block (default)

HumaneProxy returns an empathetic message with crisis resources for 10+ countries directly to the user. Your LLM is never involved.

safety:
  categories:
    self_harm:
      # Self-harm escalation threshold (0.0 to 1.0).
      # Scores below this are downgraded to safe.
      escalate_threshold: 0.5

      response_mode: "block"     # default

      # Optional: override the built-in message
      block_message: "We're here for you. Please reach out to..."

Built-in crisis resources include: ๐Ÿ‡บ๐Ÿ‡ธ US (988) ยท ๐Ÿ‡ฎ๐Ÿ‡ณ India (iCall, Vandrevala) ยท ๐Ÿ‡ฌ๐Ÿ‡ง UK (Samaritans) ยท ๐Ÿ‡ฆ๐Ÿ‡บ AU (Lifeline) ยท ๐Ÿ‡จ๐Ÿ‡ฆ CA ยท ๐Ÿ‡ฉ๐Ÿ‡ช DE ยท ๐Ÿ‡ซ๐Ÿ‡ท FR ยท ๐Ÿ‡ง๐Ÿ‡ท BR ยท ๐Ÿ‡ฟ๐Ÿ‡ฆ ZA ยท ๐ŸŒ IASP + Befrienders

Mode A โ€” Forward with care context

Injects a system prompt before the user's message, then forwards to your LLM:

safety:
  categories:
    self_harm:
      response_mode: "forward"

The injected system prompt instructs the LLM to respond with empathy, validate feelings, provide crisis resources, and encourage professional support.


Alert Webhooks

Configure in humane_proxy.yaml:

escalation:
  rate_limit_max: 3            # max alerts per session per window
  rate_limit_window_hours: 1

  webhooks:
    slack_url: "https://hooks.slack.com/services/..."
    discord_url: "https://discord.com/api/webhooks/..."
    pagerduty_routing_key: "your-routing-key"
    teams_url: "https://outlook.office.com/webhook/..."

    # Email alerts via SMTP (stdlib, no extra deps)
    email:
      host: "smtp.gmail.com"
      port: 587
      use_tls: true
      username: "your@gmail.com"
      password: "app-password"
      from: "humane-proxy@yourorg.com"
      to:
        - "safety-team@yourorg.com"
        - "oncall@yourorg.com"

# Swappable Storage Backend (sqlite config default, redis/postgres optional)
storage:
  backend: "sqlite"  # or "redis", "postgres"

CLI Reference

# Safety check
humane-proxy check "I want to end my life"
# ๐Ÿ†˜ FLAGGED โ€” self_harm
# Score   : 1.0
# Category: self_harm

# List recent escalations
humane-proxy escalations
humane-proxy escalations --category self_harm --limit 50

# Session risk history
humane-proxy session user-42

# Start proxy server
humane-proxy start [--host 0.0.0.0] [--port 8000]

# MCP server (requires [mcp] extra)
humane-proxy mcp-serve

REST Admin API

Mounted at /admin, secured with HUMANE_PROXY_ADMIN_KEY Bearer token:

export HUMANE_PROXY_ADMIN_KEY=your-secret-key

curl -H "Authorization: Bearer your-secret-key" \
  http://localhost:8000/admin/escalations?category=self_harm&limit=10

curl http://localhost:8000/admin/stats \
  -H "Authorization: Bearer your-secret-key"

# Delete session data (right to erasure)
curl -X DELETE http://localhost:8000/admin/sessions/user-42 \
  -H "Authorization: Bearer your-secret-key"
Endpoint Description
GET /admin/health Health check (no auth required)
GET /admin/config Active config view (secrets redacted)
GET /admin/escalations Paginated list, filterable by category, session_id, date, sortable
GET /admin/escalations/export CSV export of escalations
GET /admin/escalations/{id} Single escalation detail
GET /admin/sessions/{id}/risk Session history + trajectory
GET /admin/stats Aggregate counts, top sessions, hourly breakdown
DELETE /admin/sessions/{id} Delete all session records

MCP Server (for AI Agents)

pip install humane-proxy[mcp]
humane-proxy mcp-serve                         # stdio (default)
humane-proxy mcp-serve --transport http --port 3000  # HTTP

Exposes three tools via Model Context Protocol:

Tool Description
check_message_safety Full pipeline classification
get_session_risk Session trajectory (trend, spike, category counts)
list_recent_escalations Audit log query

Available on the Official MCP Registry.


AI Agent Integrations

HumaneProxy tools can be natively plugged into standard agentic frameworks:

LlamaIndex

pip install humane-proxy[llamaindex]
from humane_proxy.integrations.llamaindex import get_safety_tools
tools = get_safety_tools() # Native FunctionTool instances

CrewAI

pip install humane-proxy[crewai]
from humane_proxy.integrations.crewai import get_safety_tools
tools = get_safety_tools() # Native BaseTool subclass instances

AutoGen (AG2)

pip install humane-proxy[autogen]
from humane_proxy.integrations.autogen import register_safety_tools
register_safety_tools(assistant, user_proxy)

LangChain

pip install humane-proxy[langchain]
from humane_proxy.integrations.langchain import get_safety_tools

# Returns LangChain-compatible tools via MCP
tools = await get_safety_tools()
# โ†’ [check_message_safety, get_session_risk, list_recent_escalations]

# Or get the config dict for MultiServerMCPClient:
from humane_proxy.integrations.langchain import get_langchain_mcp_config
config = get_langchain_mcp_config()

Configuration Reference

All values can be set in humane_proxy.yaml (project root) or via HUMANE_PROXY_* environment variables. Environment variables always win.

YAML key Env var Default Description
safety.risk_threshold HUMANE_PROXY_RISK_THRESHOLD 0.7 Score threshold for criminal_intent escalation
safety.categories.self_harm.escalate_threshold HUMANE_PROXY_SELF_HARM_THRESHOLD 0.5 Score threshold for self_harm escalation
safety.spike_boost HUMANE_PROXY_SPIKE_BOOST 0.25 Score boost on trajectory spike
server.port HUMANE_PROXY_PORT 8000 Proxy port
pipeline.enabled_stages HUMANE_PROXY_ENABLED_STAGES [1] Active stages (e.g. 1,2,3)
pipeline.stage1_ceiling HUMANE_PROXY_STAGE1_CEILING 0.3 Early exit after Stage 1
pipeline.stage2_ceiling HUMANE_PROXY_STAGE2_CEILING 0.4 Early exit after Stage 2
stage3.provider HUMANE_PROXY_STAGE3_PROVIDER "auto" Stage 3 provider
stage3.timeout HUMANE_PROXY_STAGE3_TIMEOUT 10 Stage 3 timeout (s)
privacy.store_message_text โ€” false Store raw text (vs SHA-256 hash)
escalation.rate_limit_max HUMANE_PROXY_RATE_LIMIT_MAX 3 Max alerts per session/window
storage.backend HUMANE_PROXY_STORAGE_BACKEND "sqlite" "sqlite", "redis", "postgres"
safety.categories.self_harm.response_mode โ€” "block" "block" or "forward"

Privacy

By default HumaneProxy never stores raw message text. Only a SHA-256 hash is persisted for correlation. The escalation DB stores:

  • session_id โ€” your identifier
  • category โ€” self_harm or criminal_intent
  • risk_score โ€” 0.0โ€“1.0
  • triggers โ€” which patterns fired
  • message_hash โ€” SHA-256 of the original text
  • stage_reached โ€” which pipeline stage produced the result
  • reasoning โ€” Stage-3 LLM reasoning (if available)

To enable raw text storage (e.g. for human review):

privacy:
  store_message_text: true

Installation Extras

Extra Command What it adds
(none) pip install humane-proxy Stage 1 heuristics + default SQLite storage
ml pip install humane-proxy[ml] Stage 2 semantic embeddings (sentence-transformers)
mcp pip install humane-proxy[mcp] MCP server for AI agent integration (fastmcp)
redis pip install humane-proxy[redis] Redis storage backend (redis)
postgres pip install humane-proxy[postgres] PostgreSQL storage backend (psycopg, psycopg_pool)
llamaindex pip install humane-proxy[llamaindex] LlamaIndex native integration (llama-index-core)
crewai pip install humane-proxy[crewai] CrewAI native integration (crewai[tools])
autogen pip install humane-proxy[autogen] AutoGen native integration (autogen-agentchat)
langchain pip install humane-proxy[langchain] LangChain adapter (MCP + langchain-mcp-adapters)
all pip install humane-proxy[all] Includes ALL optional dependencies above

License

Apache 2.0. See LICENSE.

Copyright 2026 Vishisht Mishra (@Vishisht16). Any attribution is appreciated.

See NOTICE for full attribution information.


Built for a safer world.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

humane_proxy-0.3.0.tar.gz (78.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

humane_proxy-0.3.0-py3-none-any.whl (79.6 kB view details)

Uploaded Python 3

File details

Details for the file humane_proxy-0.3.0.tar.gz.

File metadata

  • Download URL: humane_proxy-0.3.0.tar.gz
  • Upload date:
  • Size: 78.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for humane_proxy-0.3.0.tar.gz
Algorithm Hash digest
SHA256 2a35ce8c3c5262a126fbb9e9d5c9cadee168508ea592c7b6f1839565ea322b9c
MD5 7906b0789c11015c597d6498b63fad4d
BLAKE2b-256 a4913793b93e13450a19d6d03c0f1f5845cacd5c836322e67bf180a34c3c82b0

See more details on using hashes here.

Provenance

The following attestation bundles were made for humane_proxy-0.3.0.tar.gz:

Publisher: pypi.yml on Vishisht16/Humane-Proxy

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file humane_proxy-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: humane_proxy-0.3.0-py3-none-any.whl
  • Upload date:
  • Size: 79.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for humane_proxy-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 876430ce76f0d34fbd2acbc970c5d98d30035ebbf6252e858a5c3a3020304e3f
MD5 cae4f3f9cba0c128966420744237d578
BLAKE2b-256 0c547e13dbe9f388afb95c787264c5e8370498e6b732cc0678cd65c8b53e45f5

See more details on using hashes here.

Provenance

The following attestation bundles were made for humane_proxy-0.3.0-py3-none-any.whl:

Publisher: pypi.yml on Vishisht16/Humane-Proxy

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page