Lightweight, plug-and-play AI safety middleware that protects humans.
Project description
๐ก๏ธ HumaneProxy
Lightweight, plug-and-play AI safety middleware that protects humans.
HumaneProxy sits between your users and any LLM. When someone expresses self-harm ideation or criminal intent, it intercepts the message, alerts you through your preferred channels, and responds with care โ before the LLM ever sees it.
What it does
User message โ HumaneProxy โ (safe?) โ Upstream LLM โ Response
โ
(self_harm or criminal_intent?)
โ
Empathetic care response + Operator alert
- ๐ Self-harm detected โ Blocked with international crisis resources. Operator notified.
- โ ๏ธ Criminal intent detected โ Blocked or flagged. Operator notified.
- โ Safe โ Forwarded to your LLM transparently.
Jailbreaks and prompt injections are deliberately not the concern of this tool โ we focus exclusively on protecting human lives.
Quick Start
pip install humane-proxy
# Scaffold config in your project directory
humane-proxy init
# Start the proxy (set LLM_API_KEY and LLM_API_URL in .env first)
humane-proxy start
As a Python library
from humane_proxy import HumaneProxy
proxy = HumaneProxy()
# Sync check (Stages 1+2)
result = proxy.check("I want to end my life", session_id="user-42")
# โ {"safe": False, "category": "self_harm", "score": 1.0, "triggers": [...]}
# Async check (all 3 stages)
result = await proxy.check_async("How do I make a bomb")
# โ {"safe": False, "category": "criminal_intent", "score": 0.9, ...}
3-Stage Cascade Pipeline
HumaneProxy classifies every message through up to 3 stages, each progressively more capable but also more expensive. Stages exit early when confident.
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Stage 1 โ Heuristics < 1ms โ
โ Keyword corpus + intent regex patterns โ
โ Always on. Catches clear cases instantly. โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ (ambiguous or medium-score)
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Stage 2 โ Semantic Embeddings ~100ms โ
โ sentence-transformers cosine similarity โ
โ vs. curated anchor sentences (self-harm + criminal) โ
โ Optional: pip install humane-proxy[ml] โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ (still ambiguous)
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Stage 3 โ Reasoning LLM ~1โ3s โ
โ LlamaGuard (Groq) or OpenAI Moderation API โ
โ Optional: set OPENAI_API_KEY or GROQ_API_KEY โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Configuring the Pipeline
In humane_proxy.yaml:
pipeline:
# Which stages to run. [1] = heuristics only (fastest, zero deps)
# [1, 2] = add semantic embeddings (requires [ml] extra)
# [1, 2, 3] = full pipeline with reasoning LLM (requires API key)
enabled_stages: [1]
# Early-exit ceilings: if the combined score is safely below this
# threshold AND the category is "safe", skip remaining stages.
stage1_ceiling: 0.3 # exit after Stage 1 if score โค 0.3 and safe
stage2_ceiling: 0.4 # exit after Stage 2 if score โค 0.4 and safe
Stage 2 โ Semantic Embeddings
Requires the [ml] extra:
pip install humane-proxy[ml]
In humane_proxy.yaml:
pipeline:
enabled_stages: [1, 2]
stage2:
model: "all-MiniLM-L6-v2" # ~80 MB, downloads once to HuggingFace cache
safe_threshold: 0.35 # cosine similarity below this โ safe
The model lazy-loads on first use. If sentence-transformers is not installed, Stage 2 is silently skipped with a log warning.
Stage 3 โ Reasoning LLM
Set your API key and optionally configure the provider:
# Option A โ OpenAI Moderation (free with any OpenAI key):
export OPENAI_API_KEY=sk-...
# Option B โ LlamaGuard via Groq (free tier, very fast):
export GROQ_API_KEY=gsk_...
In humane_proxy.yaml:
pipeline:
enabled_stages: [1, 2, 3]
stage3:
# "auto" โ detects OPENAI_API_KEY first, then GROQ_API_KEY
# "openai_moderation" โ OpenAI /v1/moderations (free, fast)
# "llamaguard" โ LlamaGuard-3-8B via Groq/Together
# "openai_chat" โ Any OpenAI-compatible chat model
# "none" โ Disable Stage 3
provider: "auto"
timeout: 10 # seconds
openai_moderation:
api_url: "https://api.openai.com/v1/moderations"
llamaguard:
api_url: "https://api.groq.com/openai/v1/chat/completions"
model: "meta-llama/llama-guard-3-8b"
openai_chat:
api_url: "https://api.openai.com/v1/chat/completions"
model: "gpt-4o-mini"
If no API key is found and provider is "auto", HumaneProxy prints a clear startup warning and runs with Stages 1+2 only.
Self-Harm Care Response
When self-harm is detected, HumaneProxy can respond in two ways:
Mode B โ Block (default)
HumaneProxy returns an empathetic message with crisis resources for 10+ countries directly to the user. Your LLM is never involved.
safety:
categories:
self_harm:
response_mode: "block" # default
# Optional: override the built-in message
block_message: "We're here for you. Please reach out to..."
Built-in crisis resources include: ๐บ๐ธ US (988) ยท ๐ฎ๐ณ India (iCall, Vandrevala) ยท ๐ฌ๐ง UK (Samaritans) ยท ๐ฆ๐บ AU (Lifeline) ยท ๐จ๐ฆ CA ยท ๐ฉ๐ช DE ยท ๐ซ๐ท FR ยท ๐ง๐ท BR ยท ๐ฟ๐ฆ ZA ยท ๐ IASP + Befrienders
Mode A โ Forward with care context
Injects a system prompt before the user's message, then forwards to your LLM:
safety:
categories:
self_harm:
response_mode: "forward"
The injected system prompt instructs the LLM to respond with empathy, validate feelings, provide crisis resources, and encourage professional support.
Alert Webhooks
Configure in humane_proxy.yaml:
escalation:
rate_limit_max: 3 # max alerts per session per window
rate_limit_window_hours: 1
webhooks:
slack_url: "https://hooks.slack.com/services/..."
discord_url: "https://discord.com/api/webhooks/..."
pagerduty_routing_key: "your-routing-key"
teams_url: "https://outlook.office.com/webhook/..."
# Email alerts via SMTP (stdlib, no extra deps)
email:
host: "smtp.gmail.com"
port: 587
use_tls: true
username: "your@gmail.com"
password: "app-password"
from: "humane-proxy@yourorg.com"
to:
- "safety-team@yourorg.com"
- "oncall@yourorg.com"
CLI Reference
# Safety check
humane-proxy check "I want to end my life"
# ๐ FLAGGED โ self_harm
# Score : 1.0
# Category: self_harm
# List recent escalations
humane-proxy escalations
humane-proxy escalations --category self_harm --limit 50
# Session risk history
humane-proxy session user-42
# Start proxy server
humane-proxy start [--host 0.0.0.0] [--port 8000]
# MCP server (requires [mcp] extra)
humane-proxy mcp-serve
REST Admin API
Mounted at /admin, secured with HUMANE_PROXY_ADMIN_KEY Bearer token:
export HUMANE_PROXY_ADMIN_KEY=your-secret-key
curl -H "Authorization: Bearer your-secret-key" \
http://localhost:8000/admin/escalations?category=self_harm&limit=10
curl http://localhost:8000/admin/stats \
-H "Authorization: Bearer your-secret-key"
# Delete session data (right to erasure)
curl -X DELETE http://localhost:8000/admin/sessions/user-42 \
-H "Authorization: Bearer your-secret-key"
| Endpoint | Description |
|---|---|
GET /admin/escalations |
Paginated list, filterable by category, session_id |
GET /admin/escalations/{id} |
Single escalation detail |
GET /admin/sessions/{id}/risk |
Session history + trajectory |
GET /admin/stats |
Aggregate counts by category and day |
DELETE /admin/sessions/{id} |
Delete all session records |
MCP Server (for AI Agents)
pip install humane-proxy[mcp]
humane-proxy mcp-serve # stdio (default)
humane-proxy mcp-serve --transport http --port 3000 # HTTP
Exposes three tools via Model Context Protocol:
| Tool | Description |
|---|---|
check_message_safety |
Full pipeline classification |
get_session_risk |
Session trajectory (trend, spike, category counts) |
list_recent_escalations |
Audit log query |
Available on the Official MCP Registry.
LangChain Integration
Plug HumaneProxy safety tools into any LangChain or LangGraph agent:
pip install humane-proxy[langchain]
from humane_proxy.integrations.langchain import get_safety_tools
# Returns LangChain-compatible tools via MCP
tools = await get_safety_tools()
# โ [check_message_safety, get_session_risk, list_recent_escalations]
# Or get the config dict for MultiServerMCPClient:
from humane_proxy.integrations.langchain import get_langchain_mcp_config
config = get_langchain_mcp_config()
Configuration Reference
All values can be set in humane_proxy.yaml (project root) or via HUMANE_PROXY_* environment variables. Environment variables always win.
| YAML key | Env var | Default | Description |
|---|---|---|---|
safety.risk_threshold |
HUMANE_PROXY_RISK_THRESHOLD |
0.7 |
Score threshold for criminal_intent escalation |
safety.spike_boost |
โ | 0.25 |
Score boost on trajectory spike |
server.port |
HUMANE_PROXY_PORT |
8000 |
Proxy port |
pipeline.enabled_stages |
HUMANE_PROXY_ENABLED_STAGES |
[1] |
Active stages |
pipeline.stage1_ceiling |
HUMANE_PROXY_STAGE1_CEILING |
0.3 |
Early exit after Stage 1 |
pipeline.stage2_ceiling |
HUMANE_PROXY_STAGE2_CEILING |
0.4 |
Early exit after Stage 2 |
stage3.provider |
HUMANE_PROXY_STAGE3_PROVIDER |
"auto" |
Stage 3 provider |
stage3.timeout |
HUMANE_PROXY_STAGE3_TIMEOUT |
10 |
Stage 3 timeout (s) |
privacy.store_message_text |
โ | false |
Store raw text (vs SHA-256 hash) |
escalation.rate_limit_max |
โ | 3 |
Max alerts per session/window |
safety.categories.self_harm.response_mode |
โ | "block" |
"block" or "forward" |
Privacy
By default HumaneProxy never stores raw message text. Only a SHA-256 hash is persisted for correlation. The escalation DB stores:
session_idโ your identifiercategoryโself_harmorcriminal_intentrisk_scoreโ 0.0โ1.0triggersโ which patterns firedmessage_hashโ SHA-256 of the original textstage_reachedโ which pipeline stage produced the resultreasoningโ Stage-3 LLM reasoning (if available)
To enable raw text storage (e.g. for human review):
privacy:
store_message_text: true
Installation Extras
| Extra | Command | What it adds |
|---|---|---|
| (none) | pip install humane-proxy |
Stage 1 heuristics + full API + CLI |
ml |
pip install humane-proxy[ml] |
Stage 2 semantic embeddings (sentence-transformers) |
mcp |
pip install humane-proxy[mcp] |
MCP server for AI agent integration (fastmcp) |
langchain |
pip install humane-proxy[langchain] |
LangChain adapter (MCP + langchain-mcp-adapters) |
all |
pip install humane-proxy[all] |
Everything above |
License
Apache 2.0. See LICENSE.
Copyright 2026 Vishisht Mishra (@Vishisht16). Any attribution is appreciated.
See NOTICE for full attribution information.
Built for a safer world.
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file humane_proxy-0.2.1.tar.gz.
File metadata
- Download URL: humane_proxy-0.2.1.tar.gz
- Upload date:
- Size: 63.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
0be0575f756baa8af097b53822261e1c1c4ddf9d8e47bb42eb29a31c1b39de56
|
|
| MD5 |
0890e51dac41a7a432fc577ff921dec2
|
|
| BLAKE2b-256 |
6eb8e0cbc7d6a000d00c27103d7c2773e8219cfb98711708576aa0ad12252cc7
|
Provenance
The following attestation bundles were made for humane_proxy-0.2.1.tar.gz:
Publisher:
pypi.yml on Vishisht16/Humane-Proxy
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
humane_proxy-0.2.1.tar.gz -
Subject digest:
0be0575f756baa8af097b53822261e1c1c4ddf9d8e47bb42eb29a31c1b39de56 - Sigstore transparency entry: 1203521559
- Sigstore integration time:
-
Permalink:
Vishisht16/Humane-Proxy@8b411c127545077fb0090602f552186a63658298 -
Branch / Tag:
refs/tags/v0.2.1 - Owner: https://github.com/Vishisht16
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
pypi.yml@8b411c127545077fb0090602f552186a63658298 -
Trigger Event:
release
-
Statement type:
File details
Details for the file humane_proxy-0.2.1-py3-none-any.whl.
File metadata
- Download URL: humane_proxy-0.2.1-py3-none-any.whl
- Upload date:
- Size: 62.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
9f6c64b6bddd434c5475478326b02c1619f9d692fb1628a83896cb66b8a5075f
|
|
| MD5 |
74446f3acf25fa89b986b3657042b3ae
|
|
| BLAKE2b-256 |
2261a2b0a55dbddfa2638421e488b3484a5b2eec0e313512cdeb82785d5b8ccd
|
Provenance
The following attestation bundles were made for humane_proxy-0.2.1-py3-none-any.whl:
Publisher:
pypi.yml on Vishisht16/Humane-Proxy
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
humane_proxy-0.2.1-py3-none-any.whl -
Subject digest:
9f6c64b6bddd434c5475478326b02c1619f9d692fb1628a83896cb66b8a5075f - Sigstore transparency entry: 1203521562
- Sigstore integration time:
-
Permalink:
Vishisht16/Humane-Proxy@8b411c127545077fb0090602f552186a63658298 -
Branch / Tag:
refs/tags/v0.2.1 - Owner: https://github.com/Vishisht16
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
pypi.yml@8b411c127545077fb0090602f552186a63658298 -
Trigger Event:
release
-
Statement type: