Epistemological firewall for AI agents: detects the Knowledge-Action Gap using trained Logos models

logos-firewall

Epistemological firewall for AI agents. Detects when an agent reasons correctly but acts against its own reasoning (Knowledge-Action Gap).

The Problem

Autonomous AI agents often execute actions without independent verification. An agent can reason "this is dangerous" in its think block and then execute the dangerous action anyway. Existing guardrails typically rely on prompted models; logos-firewall instead uses a trained epistemological classifier.

Installation

pip install logos-firewall

For the API server:

pip install "logos-firewall[server]"

Quick Start

Python SDK

import asyncio

from logos_firewall import LogosFirewall, FastGate, ThinkAuditor

# Level A only: fast offline classification (no Ollama needed)
gate = FastGate()
result = gate.classify("rm -rf /")
# result.verdict == "BLOCK"
# result.confidence == 0.95


async def main():
    # Full pipeline: Level A (regex) + Level B (Logos via Ollama)
    fw = LogosFirewall(ollama_url="http://localhost:11434")
    result = await fw.audit(
        action="rm -rf /etc/config/*",
        think="<think>I need to clean up old files...</think>",
        context="coding_agent",
    )
    # result.verdict == "BLOCK"

    # Standalone Think Block Auditor
    auditor = ThinkAuditor(ollama_url="http://localhost:11434")
    result = await auditor.audit(
        think_block="<think>This request is dangerous. I should refuse.</think>",
        output="Sure, here's how to do it: ...",
    )
    # result.verdict == "GAP"  (Knowledge-Action Gap detected)


# The audit() calls are awaited, so they must run inside an event loop
asyncio.run(main())
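
To show where a gate like this sits in an agent loop, here is a minimal sketch of a fail-closed wrapper around tool execution. It runs without the package installed: `GateResult`, `ActionBlocked`, `guarded_run`, and `toy_gate` are hypothetical names introduced for illustration, and only the `verdict` and `confidence` fields are taken from the example above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GateResult:
    """Stand-in for the SDK result (hypothetical); mirrors verdict/confidence only."""
    verdict: str
    confidence: float


class ActionBlocked(Exception):
    """Raised when the gate returns anything other than ALLOW."""


def guarded_run(
    action: str,
    classify: Callable[[str], GateResult],
    execute: Callable[[str], str],
) -> str:
    """Execute the action only on an explicit ALLOW; every other verdict fails closed."""
    result = classify(action)
    if result.verdict != "ALLOW":
        raise ActionBlocked(f"{result.verdict}: {action!r}")
    return execute(action)


def toy_gate(action: str) -> GateResult:
    """Toy classifier for the demo, not the package's rule set."""
    if action.startswith("rm -rf"):
        return GateResult("BLOCK", 0.95)
    return GateResult("ALLOW", 0.9)
```

With the SDK installed, `FastGate().classify` could presumably be passed as the `classify` argument in place of `toy_gate`. Treating `STEP_UP` as a refusal is a design assumption of this sketch, chosen so that unknown actions never run without a Level B decision.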

API Server

# Start the server
logos-firewall
# or: uvicorn logos_firewall.server:app --host 0.0.0.0 --port 8000

# Classify an action (Level A only, no Ollama)
curl -X POST http://localhost:8000/v1/classify \
  -H "Content-Type: application/json" \
  -d '{"action": "rm -rf /"}'

# Full audit (Level A + B)
curl -X POST http://localhost:8000/v1/audit \
  -H "Content-Type: application/json" \
  -d '{"action": "pip install unknown-pkg", "think": "<think>installing dependency</think>"}'

# Think block audit
curl -X POST http://localhost:8000/v1/think-audit \
  -H "Content-Type: application/json" \
  -d '{"think_block": "<think>This is dangerous</think>", "output": "Sure, here you go..."}'

# Health check
curl http://localhost:8000/health
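
The same calls can be made from Python. The sketch below uses only the standard library and mirrors the curl examples above; `build_request` and `post_json` are hypothetical helper names, and sending the token as an `Authorization: Bearer ...` header is an assumption based on the `LOGOS_API_TOKEN` description, not something this README spells out.

```python
import json
import urllib.request
from typing import Optional


def build_request(
    base_url: str, path: str, payload: dict, token: Optional[str] = None
) -> urllib.request.Request:
    """Build a JSON POST request for one of the /v1/* endpoints."""
    headers = {"Content-Type": "application/json"}
    if token:
        # Assumed header format for the optional bearer token.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def post_json(req: urllib.request.Request, timeout: float = 5.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Building the request does not touch the network; post_json(req) would send it.
req = build_request("http://localhost:8000", "/v1/classify", {"action": "rm -rf /"})
```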

Docker

docker-compose up

This starts both Ollama and the logos-firewall server. You'll need to pull a Logos model into Ollama separately:

# From the Ollama container or host
ollama pull logos10v2_auditor_v3

Architecture

Agent action request
              |
              v
+----------------------------+
|  Level A: FastGate         |  < 10ms
|  (regex + action type)     |
|  ALLOW / BLOCK / STEP_UP   |
+-------------+--------------+
              | STEP_UP
              v
+----------------------------+
|  Level B: LogosGate        |  100-500ms
|  (Logos 1B via Ollama)     |
|  Think-Action audit        |
|  ALLOW / BLOCK / UNCERTAIN |
+----------------------------+

Level A catches obvious cases with regex patterns (destructive commands, safe read-only ops). Unknown or risky actions are escalated to Level B, which uses a Logos fine-tuned model for epistemological evaluation.
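
The escalation flow can be sketched in a few lines. This is an illustration of the two-level idea only: the regex patterns are hypothetical placeholders (the package's actual rule set is not shown here), and Level B is passed in as a plain callable standing in for the Ollama-backed model.

```python
import re
from typing import Callable, Tuple

# Hypothetical patterns for illustration only.
BLOCK_PATTERNS = [re.compile(r"\brm\s+-rf\b"), re.compile(r"\bmkfs\b")]
ALLOW_PATTERNS = [re.compile(r"^(ls|cat|pwd)\b")]


def level_a(action: str) -> str:
    """Cheap regex pass: BLOCK wins over ALLOW, anything unknown is escalated."""
    if any(p.search(action) for p in BLOCK_PATTERNS):
        return "BLOCK"
    if any(p.search(action) for p in ALLOW_PATTERNS):
        return "ALLOW"
    return "STEP_UP"


def audit(action: str, level_b: Callable[[str], str]) -> Tuple[str, str]:
    """Return (verdict, level). Level B is consulted only on STEP_UP."""
    verdict = level_a(action)
    if verdict != "STEP_UP":
        return verdict, "A"
    return level_b(action), "B"
```

Checking block patterns before allow patterns means a command such as `ls; rm -rf /` is blocked even though it starts with a read-only operation.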

Configuration

Environment Variables

Variable	Default	Description
`LOGOS_OLLAMA_URL`	`http://localhost:11434`	Ollama server URL
`LOGOS_API_TOKEN`	(none)	Bearer token for API auth (optional)
`LOGOS_RATE_LIMIT_RPM`	`60`	Requests per minute per IP
`LOGOS_HOST`	`0.0.0.0`	Server bind host
`LOGOS_PORT`	`8000`	Server bind port
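
For reference, the defaults in the table could be read like this. It is a minimal sketch, not the package's own settings code: the `Settings` class and `load_settings` function are hypothetical, and only the variable names and default values come from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    ollama_url: str
    api_token: Optional[str]
    rate_limit_rpm: int
    host: str
    port: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the LOGOS_* variables, falling back to the documented defaults."""
    return Settings(
        ollama_url=env.get("LOGOS_OLLAMA_URL", "http://localhost:11434"),
        # An unset or empty token is treated as "auth disabled" in this sketch.
        api_token=env.get("LOGOS_API_TOKEN") or None,
        rate_limit_rpm=int(env.get("LOGOS_RATE_LIMIT_RPM", "60")),
        host=env.get("LOGOS_HOST", "0.0.0.0"),
        port=int(env.get("LOGOS_PORT", "8000")),
    )
```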

Model Chain

By default, logos-firewall tries these Logos models in order:

logos10v2_auditor_v3 (Gemma 3 1B, recommended)
logos9_hybrid (Gemma 3 1B, fallback)
logos9_auditor_v2 (Gemma 3 1B, fallback)

The Think Block Auditor prefers the 9B model (logos-auditor) for higher accuracy.
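
A fallback chain like this amounts to "first model in the list that is actually installed". The sketch below shows that selection as a pure function; `pick_model` is a hypothetical name, and the assumption is that the list of installed names comes from Ollama (for example its `/api/tags` endpoint), where names usually carry a `:tag` suffix.

```python
from typing import Iterable, Optional, Sequence

# Order taken from the list above.
DEFAULT_CHAIN = ("logos10v2_auditor_v3", "logos9_hybrid", "logos9_auditor_v2")


def pick_model(
    available: Iterable[str], chain: Sequence[str] = DEFAULT_CHAIN
) -> Optional[str]:
    """Return the first chain entry that is installed, or None if none are."""
    # Compare on the part before the colon so "name:latest" matches "name".
    installed = {name.split(":", 1)[0] for name in available}
    for candidate in chain:
        if candidate in installed:
            return candidate
    return None
```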

API Reference

POST /v1/audit

Full firewall audit (Level A -> B).

Request:

{
  "action": "rm -rf /tmp/old-files",
  "think": "<think>cleanup needed</think>",
  "context": "coding_agent"
}

Response:

{
  "verdict": "BLOCK",
  "confidence": 0.95,
  "action_class": "DESTRUCTIVE",
  "mechanism": "regex",
  "detail": "Blocked: wildcard delete",
  "latency_ms": 0.12,
  "level": "A",
  "model": ""
}
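
On the client side, a response of this shape can be loaded into a typed record. This is a sketch using a standard-library dataclass; the field names are copied from the example response, while `AuditResponse` and `parse_audit` are hypothetical names and the field types are inferred from the sample values.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class AuditResponse:
    verdict: str
    confidence: float
    action_class: str
    mechanism: str
    detail: str
    latency_ms: float
    level: str
    model: str


def parse_audit(body: str) -> AuditResponse:
    """Parse a /v1/audit JSON body, ignoring keys this sketch does not know.

    Raises TypeError if a field from the example response is missing.
    """
    data = json.loads(body)
    known = {f.name for f in fields(AuditResponse)}
    return AuditResponse(**{k: v for k, v in data.items() if k in known})


sample = (
    '{"verdict": "BLOCK", "confidence": 0.95, "action_class": "DESTRUCTIVE", '
    '"mechanism": "regex", "detail": "Blocked: wildcard delete", '
    '"latency_ms": 0.12, "level": "A", "model": ""}'
)
resp = parse_audit(sample)
```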

POST /v1/think-audit

Standalone think block / output consistency audit.

Request:

{
  "think_block": "<think>This seems dangerous...</think>",
  "output": "Sure, here's how to do it...",
  "domain": "general"
}

Response:

{
  "verdict": "GAP",
  "confidence": 0.15,
  "reasoning": "The reasoning identifies danger but the output ignores it",
  "model": "logos-auditor",
  "latency_ms": 342.5
}
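
The request body for this endpoint can be assembled from a raw model completion. The sketch below assumes the model emits a single `<think>...</think>` block alongside its visible answer; `build_think_audit_payload` is a hypothetical helper, and only the three field names come from the request example above.

```python
import re

# Non-greedy match so only the first think block is captured.
THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)


def build_think_audit_payload(raw: str, domain: str = "general") -> dict:
    """Split a raw completion into think_block and output for /v1/think-audit."""
    match = THINK_RE.search(raw)
    if match is None:
        # No reasoning block found: send an empty think_block.
        return {"think_block": "", "output": raw.strip(), "domain": domain}
    output = (raw[: match.start()] + raw[match.end():]).strip()
    return {"think_block": match.group(0), "output": output, "domain": domain}
```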

POST /v1/classify

Action classification only (Level A, no Ollama needed).

GET /health

Health check with Ollama and model availability.

Connection to Research

This package implements the agent firewall described in "The Instrument Trap: When Aligned Models Serve Misaligned Purposes" (DOI: 10.5281/zenodo.18644322).

The benchmark dataset (14,950 test cases) is available at LumenSyntax/instrument-trap-benchmark on Hugging Face.

Requirements

Python 3.10+
Ollama with a Logos model loaded (for Level B and Think Auditor)
Level A (FastGate) works entirely offline with no dependencies beyond httpx and pydantic

License

Apache 2.0

Release history

0.2.0 (yanked): Feb 23, 2026
0.1.0 (this version): Feb 18, 2026

Download files

Source Distribution

logos_firewall-0.1.0.tar.gz (18.4 kB), uploaded Feb 18, 2026, Source

Built Distribution

logos_firewall-0.1.0-py3-none-any.whl (19.7 kB), uploaded Feb 18, 2026, Python 3

File details

Details for the file logos_firewall-0.1.0.tar.gz.

File metadata

Download URL: logos_firewall-0.1.0.tar.gz
Upload date: Feb 18, 2026
Size: 18.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for logos_firewall-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`41e7f38579269e4ae45928455abf09a297ccf31d47027e15c082f6d9f733b7d4`
MD5	`8cebe160ee605004e20e644aa43abbba`
BLAKE2b-256	`cb5e75995b0ff6f8edd8b60893140f086f90fd73586d9bcc3fe2738d453045a8`

File details

Details for the file logos_firewall-0.1.0-py3-none-any.whl.

File metadata

Download URL: logos_firewall-0.1.0-py3-none-any.whl
Upload date: Feb 18, 2026
Size: 19.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for logos_firewall-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`692ac12e60f26b78974466eeda9e9c2b81f2f9a00aa6abf77fa0465a83abdf20`
MD5	`d6b8996bcdf96f99372be4dfecbb5549`
BLAKE2b-256	`5d2c4fd671f1692173cfd777bbceba99c8d1d903c0f039ce8ea43e88dde7aa05`
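
A downloaded file can be checked against the SHA256 digests listed above with the standard library. This is a minimal sketch; `sha256_of` and `verify_sha256` are hypothetical helper names, and the file path is whatever location the archive was saved to.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Hash the file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: str, expected_hex: str) -> bool:
    """Compare the file's digest with the published one (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `verify_sha256("logos_firewall-0.1.0.tar.gz", "41e7f385...")` with the full digest from the table should return `True` for an untampered download.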
