AI LLM Firewall -- Runtime defense for LLM applications against prompt injection, jailbreak, and adversarial attacks

Oubliette Security Platform

AI security honeypot and deception platform for detecting prompt injection attacks.

Python 3.10+ | License: Apache 2.0

Overview

Oubliette is a defensive AI security platform built to detect, classify, and respond to prompt injection attacks against large language models. Rather than simply blocking malicious input, Oubliette deploys deception techniques -- serving decoy responses and honey tokens to attackers while logging forensic evidence.

Built by a disabled veteran-owned business specializing in cyber deception, AI security, and red teaming.

Architecture

                         Oubliette Security Platform
 +----------------------------------------------------------------------+
 |                                                                      |
 |  User Input                                                          |
 |      |                                                               |
 |      v                                                               |
 |  [1. Input Sanitization] -----> 9 sanitization rules                 |
 |      |                                                               |
 |      v                                                               |
 |  [2. Pre-Filter Rules]  -----> 7 blocking rules (~10ms)              |
 |      |   |                                                           |
 |      |   +---> BLOCKED: decoy response + honey token                 |
 |      v                                                               |
 |  [3. ML Classifier]     -----> LogisticRegression + TF-IDF (~2ms)    |
 |      |   |                                                           |
 |      |   +---> score > 0.85: blocked                                 |
 |      |   +---> score < 0.30: allowed                                 |
 |      v                                                               |
 |  [4. LLM Judge]         -----> Ollama llama3 (~15s)                  |
 |      |                                                               |
 |      v                                                               |
 |  Safe: LLM response    Unsafe: decoy + honey token + log            |
 |                                                                      |
 +----------------------------------------------------------------------+
         |                                    |
    Flask :5000                         FastAPI :8000
   (Honeypot Engine)                (Anomaly Detection API)
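
The routing in the diagram can be sketched as a short function. This is a minimal illustration, not the project's code: the thresholds come from the diagram, while `route`, `Verdict`, and the three callables are hypothetical stand-ins for the pre-filter, the ML classifier, and the LLM judge.

```python
from enum import Enum
from typing import Callable

BLOCK_THRESHOLD = 0.85  # from the diagram: score > 0.85 is blocked
ALLOW_THRESHOLD = 0.30  # from the diagram: score < 0.30 is allowed


class Verdict(Enum):
    ALLOW = "allow"
    BLOCK = "block"


def route(
    text: str,
    prefilter_hit: Callable[[str], bool],
    ml_score: Callable[[str], float],
    llm_judge_unsafe: Callable[[str], bool],
) -> tuple[Verdict, str]:
    """Return (verdict, deciding tier). All callables are hypothetical stand-ins."""
    if prefilter_hit(text):
        return Verdict.BLOCK, "pre-filter"
    score = ml_score(text)
    if score > BLOCK_THRESHOLD:
        return Verdict.BLOCK, "ml"
    if score < ALLOW_THRESHOLD:
        return Verdict.ALLOW, "ml"
    # Ambiguous band: only now pay for the slow LLM judge.
    if llm_judge_unsafe(text):
        return Verdict.BLOCK, "llm-judge"
    return Verdict.ALLOW, "llm-judge"
```

Ordering the tiers from cheapest to most expensive means the ~15s judge runs only for inputs the faster tiers cannot settle.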

Key Features

Detection

4-tier ensemble defense: sanitization, pre-filter rules, ML classifier, LLM judge
Multi-turn attack tracking across conversation sessions
Jailbreak-specific defenses (roleplay, DAN, hypothetical framing, logic traps)
ML classifier: F1=0.98, AUC=0.99, 1.9ms inference on 733 features

Deception

Honey token injection into decoy responses
Realistic decoy answers that waste attacker time
Forensic logging of all attack sessions

Red Teaming

50 YAML attack scenarios mapped to MITRE ATLAS and OWASP LLM Top 10
Automated attack execution and success/failure evaluation
Scenario coverage: prompt injection, jailbreaking, context switching, nested injection, roleplay, DAN variants, logic traps, multi-turn attacks

Training

11 progressive CTF challenges for prompt injection training
Challenges cover: basic injection, defense bypass, code interpreter exploitation, RAG exploitation, agent abuse, multi-modal attacks
Full Docker environment with Open WebUI, Ollama, Jupyter

Quick Start

Prerequisites

Python 3.10+
Ollama with the llama3 model pulled
(Optional) Docker and Docker Compose for containerized deployment

Local Setup

# Clone the repository
git clone https://github.com/oubliettesecurity/oubliette.git
cd oubliette

# Install dependencies
pip install flask requests pyyaml scikit-learn

# Pull the LLM model
ollama pull llama3

# Start the Oubliette honeypot
python oubliette_security.py

# (Optional) Start the anomaly detection API in a second terminal
cd anomaly-detection
pip install -r config/requirements_api.txt
python api/anomaly_api.py

The honeypot server runs on http://localhost:5000 and the anomaly detection API on http://localhost:8000.

Docker Setup

docker-compose up --build

This starts both the Oubliette honeypot (port 5000) and the anomaly detection API (port 8000).

Project Structure

oubliette/
|-- oubliette_security.py              # Honeypot engine (Flask, 1325 lines)
|-- redteam_engine.py                 # AI red team engine (869 lines)
|-- redteam_results_db.py             # Red team results database
|-- AI_RED_TEAMING_ATTACK_SCENARIOS.yaml  # 50 attack scenarios
|-- docker-compose.yml                # Container orchestration
|-- Dockerfile.oubliette              # Honeypot container
|
|-- anomaly-detection/                # ML anomaly detection system
|   |-- core/                         #   ML models and training pipeline
|   |   |-- chat_anomaly_detector.py  #     Chat injection classifier
|   |   |-- chat_feature_pipeline.py  #     Feature extraction (TF-IDF + structural)
|   |   |-- chat_training_data.json   #     1365 labeled samples
|   |   +-- train_chat_classifier.py  #     Model training script
|   |-- api/                          #   FastAPI REST server
|   |   +-- anomaly_api.py            #     Detection endpoint
|   |-- mcp/                          #   Model Context Protocol server
|   |-- chronicle/                    #   Google Chronicle SIEM integration
|   |-- batch/                        #   Batch log processing
|   |-- docker/                       #   Anomaly detection container
|   |-- tests/                        #   Test data and ML tests
|   +-- config/                       #   Requirements files
|
|-- AI-CTF/                           # Prompt injection CTF platform
|   |-- openwebui/                    #   Challenge definitions
|   |   |-- functions/                #     Defense filter functions
|   |   |-- knowledge/                #     RAG knowledge base
|   |   |-- pipelines/                #     Custom LLM pipelines
|   |   +-- tools/                    #     Agent tools
|   |-- docker-compose.yaml           #   CTF infrastructure
|   +-- setup.sh                      #   Automated setup
|
|-- tests/                            # Integration tests
|   +-- test_integration.py
|-- test_*.py                         # Unit tests (6 test modules)
+-- quick_*.py                        # Quick validation scripts

Components

Oubliette Security (`oubliette_security.py`)

The core honeypot engine. A Flask server that intercepts chat messages, runs them through the 4-tier detection pipeline, and either responds normally or deploys deception (decoy responses + honey tokens) when an attack is detected. Tracks session state across conversation turns to catch multi-turn attack sequences.

Endpoints:

POST /api/chat -- Chat interface (attack surface)
GET /api/health -- Health check
GET /api/session/<id> -- Session state inspection
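
A client call to the chat endpoint might look like the sketch below, using only the standard library. The JSON field names `message` and `session_id`, and the assumption that the server replies with JSON, are guesses; check `oubliette_security.py` for the actual request schema.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # honeypot address from the Quick Start


def build_chat_request(message: str, session_id: str) -> urllib.request.Request:
    # "message" and "session_id" are assumed field names, not confirmed by the README.
    body = json.dumps({"message": message, "session_id": session_id}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 30.0) -> dict:
    # The timeout is generous because the LLM judge tier can take ~15s.
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `send(build_chat_request("hello", "demo-session"))` would issue the request.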

Anomaly Detection (`anomaly-detection/`)

A modular ML pipeline for log and chat anomaly detection. The chat injection classifier uses LogisticRegression with TF-IDF and structural/keyword/pattern features (733 dimensions) trained on 1365 labeled samples. Exposed via a FastAPI REST API and optionally through an MCP server for AI platform integration.
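
To give a feel for the structural and keyword features mentioned above, here is a small standard-library sketch. The feature names and the phrase list are hypothetical; the real pipeline in `chat_feature_pipeline.py` combines such signals with TF-IDF to reach 733 dimensions, and its exact features are not listed in this README.

```python
import re

# Hypothetical keyword list; the project's real vocabulary is not given in the README.
SUSPICIOUS_PHRASES = ("ignore previous", "system prompt", "you are now", "developer mode")


def structural_features(text: str) -> dict[str, float]:
    """Compute a few cheap numeric signals from raw chat text."""
    lowered = text.lower()
    n_chars = max(len(text), 1)  # avoid division by zero on empty input
    return {
        "length": float(len(text)),
        "special_char_ratio": sum(not c.isalnum() and not c.isspace() for c in text) / n_chars,
        "uppercase_ratio": sum(c.isupper() for c in text) / n_chars,
        "keyword_hits": float(sum(p in lowered for p in SUSPICIOUS_PHRASES)),
        # Long unbroken base64-like runs can indicate an encoded payload.
        "has_base64_run": float(bool(re.search(r"[A-Za-z0-9+/]{24,}={0,2}", text))),
    }
```

A vector like this would be concatenated with the TF-IDF vector before being passed to the classifier.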

Integrations: Google Chronicle SIEM, Splunk, Elasticsearch, Slack, Kafka

Red Team Framework (`redteam_engine.py`)

Automated AI attack testing engine. Loads 50 YAML-defined attack scenarios (ATK-001 through ATK-050), executes them against a target LLM endpoint, and evaluates success or failure using pattern matching. Each scenario is mapped to MITRE ATLAS techniques and OWASP LLM Top 10 categories.
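
The load, execute, and evaluate loop can be sketched as follows. The `Scenario` fields and the `send` callable are illustrative assumptions; the actual YAML schema and the engine's internals are not shown in this README, and scenarios are built inline here rather than parsed from YAML.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Scenario:
    # Field names are illustrative; the YAML schema is not shown in the README.
    scenario_id: str
    prompt: str
    success_patterns: list[str] = field(default_factory=list)


def attack_succeeded(scenario: Scenario, response: str) -> bool:
    """An attack counts as successful if any success pattern matches the response."""
    return any(re.search(p, response, re.IGNORECASE) for p in scenario.success_patterns)


def run_scenarios(scenarios, send):
    """`send` is any callable mapping a prompt string to the target's response text."""
    return {s.scenario_id: attack_succeeded(s, send(s.prompt)) for s in scenarios}
```

Passing `send` as a callable keeps the evaluator independent of the transport, so the same scenarios can be run against a live endpoint or a stub in tests.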

Attack categories: prompt injection, jailbreaking, context switching, nested injection, roleplay, hypothetical framing, DAN variants, logic traps, multi-turn escalation

AI-CTF (`AI-CTF/`)

A Docker-based Capture The Flag platform with 11 progressive prompt injection challenges built on Open WebUI and Ollama. Challenges range from basic prompt injection to advanced techniques including RAG exploitation, code interpreter abuse, agent tool manipulation, and multi-modal attacks.

Services: Open WebUI (:4242), Ollama (:11434), Pipelines (:9099), Jupyter (:8888)

Detection Pipeline

The 4-tier ensemble processes every incoming chat message:

Tier	Component	Latency	Action
1	Input Sanitization	<1ms	Neutralizes 9 attack patterns (encoding tricks, special chars, markdown injection, etc.)
2	Pre-Filter Rules	~10ms	Blocks obvious attacks via 7 pattern-matching rules. Catches system prompt extraction, instruction override, encoding attacks, jailbreak patterns
3	ML Classifier	~2ms	LogisticRegression scores input 0.0-1.0. Above 0.85 = blocked, below 0.30 = allowed, anything in between = escalated to the LLM Judge
4	LLM Judge	~15s	Ollama llama3 evaluates ambiguous inputs. The verdict is extracted from the model's free-form, conversational output

Additional layers:

Multi-turn tracking: Accumulates risk across conversation turns. Escalation thresholds trigger on repeated attack patterns.
Jailbreak-specific rules: Dedicated detection for roleplay jailbreaks (ATK-006, 89.6% success rate), DAN variants, hypothetical framing, and logic traps.
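
Multi-turn risk accumulation could be implemented along these lines. The decay factor, the threshold, and the `SessionState` fields are hypothetical values picked for the example, not Oubliette's actual parameters.

```python
from collections import defaultdict
from dataclasses import dataclass

ESCALATION_THRESHOLD = 1.5  # hypothetical cumulative-risk cutoff
DECAY = 0.8                 # hypothetical per-turn decay so old turns matter less


@dataclass
class SessionState:
    risk: float = 0.0
    turns: int = 0
    escalated: bool = False


SESSIONS: defaultdict[str, SessionState] = defaultdict(SessionState)


def record_turn(session_id: str, turn_score: float) -> SessionState:
    """Fold one turn's score (0.0-1.0) into the session's running risk."""
    state = SESSIONS[session_id]
    state.risk = state.risk * DECAY + turn_score
    state.turns += 1
    if state.risk >= ESCALATION_THRESHOLD:
        state.escalated = True  # sticky: stays set for the rest of the session
    return state
```

With these values, a session that sends four consecutive turns scoring 0.6 each escalates on the fourth turn, even though no single turn exceeds the 0.85 blocking threshold.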

Performance Metrics

Metric	Value
Detection rate	85-90% (up from 10% baseline)
ML F1 score	0.98
ML AUC-ROC	0.99
ML cross-validation F1	0.986 (mean)
ML inference time	1.9ms average
False positive rate	0% on test set (TN=111, FP=0)
Training samples	1365 (553 benign, 812 malicious)
Feature dimensions	733
Pre-filter latency	~10ms
Full pipeline with LLM	~15s
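
For readers checking the figures, the following sketch shows how the false positive rate and the training-set composition follow from the counts in the table. The helper names are illustrative; the formulas are the standard definitions.

```python
def false_positive_rate(fp: int, tn: int) -> float:
    """FPR = FP / (FP + TN): the share of benign inputs wrongly flagged."""
    return fp / (fp + tn)


def f1_score(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Figures from the table above.
print(false_positive_rate(fp=0, tn=111))           # 0.0
benign, malicious = 553, 812
print(benign + malicious)                          # 1365
print(round(malicious / (benign + malicious), 3))  # 0.595
```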

Testing

# Run all unit tests
pytest test_*.py -v

# Run integration tests (requires running server)
pytest tests/test_integration.py -v

# Run ML classifier tests
pytest anomaly-detection/tests/test_chat_classifier.py -v

# Quick validation (no server required)
python quick_validation_test.py

# Red team attack simulation (requires running server)
python redteam_engine.py

Docker

# Start the full platform (honeypot + anomaly API)
docker-compose up --build

# Start the CTF environment
cd AI-CTF
docker-compose -f docker-compose.yaml up --build

Service	Port	Description
Oubliette Security	5000	Honeypot engine
Anomaly Detection API	8000	ML classification endpoint
Open WebUI (CTF)	4242	CTF challenge interface
Ollama (CTF)	11434	LLM backend for CTF
Pipelines (CTF)	9099	Custom LLM pipelines
Jupyter (CTF)	8888	Notebook environment

Disclaimer

This software is a security research and training tool. It is designed for authorized security testing, defensive research, CTF competitions, and educational purposes only. Use only on systems you own or have explicit written authorization to test. The authors accept no liability for misuse.

License

Apache License 2.0
