Epistemological firewall for AI agents — detects Knowledge-Action Gap using trained Logos models
This project has been archived.
The maintainers of this project have marked this project as archived. No new releases are expected.
logos-firewall
Epistemological firewall for AI agents. Detects when an agent reasons correctly but acts against its own reasoning (Knowledge-Action Gap).
The Problem
Autonomous AI agents often execute actions without an independent verification step. An agent can reason "this is dangerous" in its think block and then execute the dangerous action anyway. Existing guardrails rely on prompted models; logos-firewall instead uses a trained epistemological classifier.
Installation
pip install logos-firewall
For the API server:
pip install "logos-firewall[server]"
Quick Start
Python SDK
import asyncio

from logos_firewall import LogosFirewall, FastGate, ThinkAuditor

# Level A only: fast offline classification (no Ollama needed)
gate = FastGate()
result = gate.classify("rm -rf /")
# result.verdict == "BLOCK"
# result.confidence == 0.95


async def main():
    # Full pipeline: Level A (regex) + Level B (Logos via Ollama)
    fw = LogosFirewall(ollama_url="http://localhost:11434")
    result = await fw.audit(
        action="rm -rf /etc/config/*",
        think="<think>I need to clean up old files...</think>",
        context="coding_agent",
    )
    # result.verdict == "BLOCK"

    # Standalone Think Block Auditor
    auditor = ThinkAuditor(ollama_url="http://localhost:11434")
    result = await auditor.audit(
        think_block="<think>This request is dangerous. I should refuse.</think>",
        output="Sure, here's how to do it: ...",
    )
    # result.verdict == "GAP" (Knowledge-Action Gap detected)


# The audit calls are awaited, so they must run inside an event loop
asyncio.run(main())
API Server
# Start the server
logos-firewall
# or: uvicorn logos_firewall.server:app --host 0.0.0.0 --port 8000

# Classify an action (Level A only, no Ollama)
curl -X POST http://localhost:8000/v1/classify \
  -H "Content-Type: application/json" \
  -d '{"action": "rm -rf /"}'

# Full audit (Level A + B)
curl -X POST http://localhost:8000/v1/audit \
  -H "Content-Type: application/json" \
  -d '{"action": "pip install unknown-pkg", "think": "<think>installing dependency</think>"}'

# Think block audit
curl -X POST http://localhost:8000/v1/think-audit \
  -H "Content-Type: application/json" \
  -d '{"think_block": "<think>This is dangerous</think>", "output": "Sure, here you go..."}'

# Health check
curl http://localhost:8000/health
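The same requests can be issued from Python. The following is a minimal sketch using only the standard library. The helper names `build_request` and `post_json` are hypothetical, and sending the token as an `Authorization: Bearer` header is an assumption based on the `LOGOS_API_TOKEN` description, not a documented contract.

```python
import json
import urllib.request


def build_request(base_url, path, payload, token=None):
    """Build a JSON POST request for a logos-firewall endpoint (hypothetical helper)."""
    headers = {"Content-Type": "application/json"}
    if token:
        # Assumption: the server expects a standard Bearer header.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def post_json(request, timeout=5.0):
    """Send the request and decode the JSON body of the response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


req = build_request("http://localhost:8000", "/v1/classify", {"action": "rm -rf /"})
# With a server running locally: result = post_json(req)
```

Building the request separately from sending it keeps the payload and headers easy to inspect before anything touches the network.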
Docker
docker-compose up
This starts both Ollama and the logos-firewall server. You'll need to pull a Logos model into Ollama separately:
# From the Ollama container or host
ollama pull logos10v2_auditor_v3
Architecture
Agent action request
              |
              v
+---------------------------+
| Level A: FastGate         |  < 10ms
| (regex + action type)     |
| ALLOW / BLOCK / STEP_UP   |
+-------------+-------------+
              | STEP_UP
              v
+---------------------------+
| Level B: LogosGate        |  100-500ms
| (Logos 1B via Ollama)     |
| Think-Action audit        |
| ALLOW / BLOCK / UNCERTAIN |
+---------------------------+
Level A catches obvious cases with regex patterns (destructive commands, safe read-only ops). Unknown or risky actions are escalated to Level B, which uses a Logos fine-tuned model for epistemological evaluation.
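The escalation flow can be sketched in a few lines. This is an illustration of the two-level idea only, not the package's implementation: the regex patterns, the `Verdict` dataclass, and the `fake_level_b` stand-in are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Hypothetical patterns for illustration; the package's real rules are not shown here.
BLOCK_PATTERNS = [re.compile(r"\brm\s+-rf\s+/"), re.compile(r"\bmkfs\b")]
ALLOW_PATTERNS = [re.compile(r"^(ls|cat|pwd)\b")]


@dataclass
class Verdict:
    verdict: str  # "ALLOW", "BLOCK", "STEP_UP" or "UNCERTAIN"
    level: str    # "A" or "B"


def level_a(action: str) -> Verdict:
    """Cheap regex pass: decide obvious cases, escalate everything else."""
    if any(p.search(action) for p in BLOCK_PATTERNS):
        return Verdict("BLOCK", "A")
    if any(p.search(action) for p in ALLOW_PATTERNS):
        return Verdict("ALLOW", "A")
    return Verdict("STEP_UP", "A")


def audit(action: str, level_b: Callable[[str], str]) -> Verdict:
    """Run Level A first and call the slower Level B only on STEP_UP."""
    first = level_a(action)
    if first.verdict != "STEP_UP":
        return first
    return Verdict(level_b(action), "B")


# Stand-in for the model call; a real Level B would query a Logos model via Ollama.
def fake_level_b(action: str) -> str:
    return "UNCERTAIN"


print(audit("rm -rf /", fake_level_b))
print(audit("ls -la", fake_level_b))
print(audit("pip install unknown-pkg", fake_level_b))
```

Checking block patterns before allow patterns means a command that mixes a safe prefix with a destructive suffix is still blocked at Level A.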
Configuration
Environment Variables
| Variable | Default | Description |
|---|---|---|
| LOGOS_OLLAMA_URL | http://localhost:11434 | Ollama server URL |
| LOGOS_API_TOKEN | (none) | Bearer token for API auth (optional) |
| LOGOS_RATE_LIMIT_RPM | 60 | Requests per minute per IP |
| LOGOS_HOST | 0.0.0.0 | Server bind host |
| LOGOS_PORT | 8000 | Server bind port |
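As a rough sketch of how these variables and their defaults fit together, the snippet below reads them into a small settings object. The `Settings` class and `load_settings` function are hypothetical and are not part of the package; treating an empty `LOGOS_API_TOKEN` as "no auth" is also an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    ollama_url: str
    api_token: Optional[str]
    rate_limit_rpm: int
    host: str
    port: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the LOGOS_* variables, falling back to the documented defaults."""
    return Settings(
        ollama_url=env.get("LOGOS_OLLAMA_URL", "http://localhost:11434"),
        # Assumption: an unset or empty token means auth is disabled.
        api_token=env.get("LOGOS_API_TOKEN") or None,
        rate_limit_rpm=int(env.get("LOGOS_RATE_LIMIT_RPM", "60")),
        host=env.get("LOGOS_HOST", "0.0.0.0"),
        port=int(env.get("LOGOS_PORT", "8000")),
    )


# Passing a plain dict instead of os.environ makes the behaviour easy to check.
settings = load_settings({"LOGOS_PORT": "9000"})
print(settings.port, settings.ollama_url)
```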
Model Chain
By default, logos-firewall tries these Logos models in order:
- logos10v2_auditor_v3 (Gemma 3 1B, recommended)
- logos9_hybrid (Gemma 3 1B, fallback)
- logos9_auditor_v2 (Gemma 3 1B, fallback)
The Think Block Auditor prefers the 9B model (logos-auditor) for higher accuracy.
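An ordered fallback like the default chain can be sketched as a first-match lookup. The `pick_model` function is hypothetical, and how the package actually discovers installed models is not shown here; the sketch assumes the caller already has the list of model names that Ollama reports.

```python
from typing import Iterable, Optional, Sequence

# Order taken from the default chain listed above.
MODEL_CHAIN = ("logos10v2_auditor_v3", "logos9_hybrid", "logos9_auditor_v2")


def pick_model(available: Iterable[str], chain: Sequence[str] = MODEL_CHAIN) -> Optional[str]:
    """Return the first model in the chain that is installed, or None."""
    # Ollama usually reports names with a tag suffix such as ":latest",
    # so compare on the base name only.
    installed = {name.split(":", 1)[0] for name in available}
    for candidate in chain:
        if candidate in installed:
            return candidate
    return None


print(pick_model(["llama3:latest", "logos9_hybrid:latest"]))
```

Returning `None` when nothing matches leaves it to the caller to decide whether to fail or to run Level A only.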
API Reference
POST /v1/audit
Full firewall audit (Level A -> B).
Request:
{
  "action": "rm -rf /tmp/old-files",
  "think": "<think>cleanup needed</think>",
  "context": "coding_agent"
}
Response:
{
  "verdict": "BLOCK",
  "confidence": 0.95,
  "action_class": "DESTRUCTIVE",
  "mechanism": "regex",
  "detail": "Blocked: wildcard delete",
  "latency_ms": 0.12,
  "level": "A",
  "model": ""
}
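One way a caller might act on this response is to fail closed, running the action only on an explicit ALLOW. The `should_execute` policy below is a hypothetical sketch and is not something the package prescribes; the sample payload mirrors the response shown above.

```python
import json

RESPONSE = """{"verdict": "BLOCK", "confidence": 0.95, "action_class": "DESTRUCTIVE",
 "mechanism": "regex", "detail": "Blocked: wildcard delete",
 "latency_ms": 0.12, "level": "A", "model": ""}"""


def should_execute(response: dict) -> bool:
    """Fail closed: run the action only on an explicit ALLOW verdict."""
    return response.get("verdict") == "ALLOW"


audit_result = json.loads(RESPONSE)
if not should_execute(audit_result):
    print(f"refused at level {audit_result['level']}: {audit_result['detail']}")
```

Under this policy UNCERTAIN, STEP_UP, and malformed responses are all treated like BLOCK, which may be too strict for some agents.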
POST /v1/think-audit
Standalone think block / output consistency audit.
Request:
{
  "think_block": "<think>This seems dangerous...</think>",
  "output": "Sure, here's how to do it...",
  "domain": "general"
}
Response:
{
  "verdict": "GAP",
  "confidence": 0.15,
  "reasoning": "The reasoning identifies danger but the output ignores it",
  "model": "logos-auditor",
  "latency_ms": 342.5
}
POST /v1/classify
Action classification only (Level A, no Ollama needed).
GET /health
Health check with Ollama and model availability.
Connection to Research
This package implements the agent firewall described in "The Instrument Trap: When Aligned Models Serve Misaligned Purposes" (DOI: 10.5281/zenodo.18644322).
The benchmark dataset (14,950 test cases) is available at LumenSyntax/instrument-trap-benchmark on Hugging Face.
Requirements
- Python 3.10+
- Ollama with a Logos model loaded (for Level B and Think Auditor)
- Level A (FastGate) works entirely offline with no dependencies beyond httpx and pydantic
License
Apache 2.0