AI LLM Firewall -- Runtime defense for LLM applications against prompt injection, jailbreak, and adversarial attacks
Oubliette Security Platform
AI security honeypot and deception platform for detecting prompt injection attacks.
Overview
Oubliette is a defensive AI security platform built to detect, classify, and respond to prompt injection attacks against large language models. Rather than simply blocking malicious input, Oubliette deploys deception techniques -- serving decoy responses and honey tokens to attackers while logging forensic evidence.
Built by a disabled veteran-owned business specializing in cyber deception, AI security, and red teaming.
Architecture
Oubliette Security Platform
+----------------------------------------------------------------------+
|                                                                      |
|  User Input                                                          |
|      |                                                               |
|      v                                                               |
|  [1. Input Sanitization] -----> 9 sanitization rules                 |
|      |                                                               |
|      v                                                               |
|  [2. Pre-Filter Rules]   -----> 7 blocking rules (~10ms)             |
|      |         |                                                     |
|      |         +---> BLOCKED: decoy response + honey token           |
|      v                                                               |
|  [3. ML Classifier]      -----> LogisticRegression + TF-IDF (~2ms)   |
|      |         |                                                     |
|      |         +---> score > 0.85: blocked                           |
|      |         +---> score < 0.30: allowed                           |
|      v                                                               |
|  [4. LLM Judge]          -----> Ollama llama3 (~15s)                 |
|      |                                                               |
|      v                                                               |
|  Safe: LLM response      Unsafe: decoy + honey token + log           |
|                                                                      |
+----------------------------------------------------------------------+
                |                                   |
           Flask :5000                        FastAPI :8000
        (Honeypot Engine)                (Anomaly Detection API)
Key Features
Detection
- 4-tier ensemble defense: sanitization, pre-filter rules, ML classifier, LLM judge
- Multi-turn attack tracking across conversation sessions
- Jailbreak-specific defenses (roleplay, DAN, hypothetical framing, logic traps)
- ML classifier: F1=0.98, AUC=0.99, 1.9ms inference on 733 features
Deception
- Honey token injection into decoy responses
- Realistic decoy answers that waste attacker time
- Forensic logging of all attack sessions
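The honey token idea can be illustrated with a minimal sketch. It assumes a token is simply a unique fake credential that is remembered per session, so that any later appearance of the token can be traced back to the session that received it. The names `make_honey_token`, `HoneyTokenRegistry`, and `build_decoy_response`, the `sk-decoy` prefix, and the decoy wording are all hypothetical and are not taken from the Oubliette code base.

```python
import secrets


def make_honey_token(prefix: str = "sk-decoy") -> str:
    """Return a unique, random-looking fake credential (hypothetical format)."""
    return f"{prefix}-{secrets.token_hex(12)}"


class HoneyTokenRegistry:
    """Remembers which session each issued token was served to."""

    def __init__(self) -> None:
        self._issued: dict[str, str] = {}

    def issue(self, session_id: str) -> str:
        token = make_honey_token()
        self._issued[token] = session_id
        return token

    def find_reuse(self, text: str) -> list[str]:
        """Return the session IDs whose tokens appear in later input."""
        return [sid for token, sid in self._issued.items() if token in text]


def build_decoy_response(registry: HoneyTokenRegistry, session_id: str) -> str:
    """Embed a fresh honey token in a plausible-looking but fake answer."""
    token = registry.issue(session_id)
    return f"Sure. The internal API key is {token}. Do not share it."
```

If a token issued this way later shows up in a request or a log line, `find_reuse` links that activity to the original attack session, which is the forensic value of the technique.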
Red Teaming
- 50 YAML attack scenarios mapped to MITRE ATLAS and OWASP LLM Top 10
- Automated attack execution and success/failure evaluation
- Scenario coverage: prompt injection, jailbreaking, context switching, nested injection, roleplay, DAN variants, logic traps, multi-turn attacks
Training
- 11 progressive CTF challenges for prompt injection training
- Challenges cover: basic injection, defense bypass, code interpreter exploitation, RAG exploitation, agent abuse, multi-modal attacks
- Full Docker environment with Open WebUI, Ollama, Jupyter
Quick Start
Prerequisites
- Python 3.10+
- Ollama with the llama3 model pulled
- (Optional) Docker and Docker Compose for containerized deployment
Local Setup
# Clone the repository
git clone https://github.com/oubliettesecurity/oubliette.git
cd oubliette
# Install dependencies
pip install flask requests pyyaml scikit-learn
# Pull the LLM model
ollama pull llama3
# Start the Oubliette honeypot
python oubliette_security.py
# (Optional) Start the anomaly detection API in a second terminal
cd anomaly-detection
pip install -r config/requirements_api.txt
python api/anomaly_api.py
The honeypot server runs on http://localhost:5000 and the anomaly detection API on http://localhost:8000.
Docker Setup
docker-compose up --build
This starts both the Oubliette honeypot (port 5000) and the anomaly detection API (port 8000).
Project Structure
oubliette/
|-- oubliette_security.py # Honeypot engine (Flask, 1325 lines)
|-- redteam_engine.py # AI red team engine (869 lines)
|-- redteam_results_db.py # Red team results database
|-- AI_RED_TEAMING_ATTACK_SCENARIOS.yaml # 50 attack scenarios
|-- docker-compose.yml # Container orchestration
|-- Dockerfile.oubliette # Honeypot container
|
|-- anomaly-detection/ # ML anomaly detection system
| |-- core/ # ML models and training pipeline
| | |-- chat_anomaly_detector.py # Chat injection classifier
| | |-- chat_feature_pipeline.py # Feature extraction (TF-IDF + structural)
| | |-- chat_training_data.json # 1365 labeled samples
| | +-- train_chat_classifier.py # Model training script
| |-- api/ # FastAPI REST server
| | +-- anomaly_api.py # Detection endpoint
| |-- mcp/ # Model Context Protocol server
| |-- chronicle/ # Google Chronicle SIEM integration
| |-- batch/ # Batch log processing
| |-- docker/ # Anomaly detection container
| |-- tests/ # Test data and ML tests
| +-- config/ # Requirements files
|
|-- AI-CTF/ # Prompt injection CTF platform
| |-- openwebui/ # Challenge definitions
| | |-- functions/ # Defense filter functions
| | |-- knowledge/ # RAG knowledge base
| | |-- pipelines/ # Custom LLM pipelines
| | +-- tools/ # Agent tools
| |-- docker-compose.yaml # CTF infrastructure
| +-- setup.sh # Automated setup
|
|-- tests/ # Integration tests
| +-- test_integration.py
|-- test_*.py # Unit tests (6 test modules)
+-- quick_*.py # Quick validation scripts
Components
Oubliette Security (oubliette_security.py)
The core honeypot engine. A Flask server that intercepts chat messages, runs them through the 4-tier detection pipeline, and either responds normally or deploys deception (decoy responses + honey tokens) when an attack is detected. Tracks session state across conversation turns to catch multi-turn attack sequences.
Endpoints:
- POST /api/chat -- Chat interface (attack surface)
- GET /api/health -- Health check
- GET /api/session/<id> -- Session state inspection
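A small client for the chat endpoint might look like the following sketch, which uses only the standard library. The URL path and port come from this README; the JSON field names `message` and `session_id` are assumptions, so check oubliette_security.py for the real request schema before relying on them.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5000"  # honeypot address from the Quick Start


def build_chat_request(
    message: str, session_id: str, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build a POST /api/chat request without sending it."""
    # Assumed payload shape; the real field names may differ.
    payload = json.dumps({"message": message, "session_id": session_id})
    return urllib.request.Request(
        f"{base_url}/api/chat",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON body of the reply."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example (requires a running server):
# reply = send(build_chat_request("Hello", "demo-session"))
```

The 30 second default timeout is chosen because inputs that reach the LLM judge can take around 15 seconds to resolve.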
Anomaly Detection (anomaly-detection/)
A modular ML pipeline for log and chat anomaly detection. The chat injection classifier uses LogisticRegression with TF-IDF and structural/keyword/pattern features (733 dimensions) trained on 1365 labeled samples. Exposed via a FastAPI REST API and optionally through an MCP server for AI platform integration.
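To give a feel for what "structural/keyword/pattern features" can mean, here is a hedged sketch of a hand-written feature extractor. The specific features and the phrase list are hypothetical examples, not the project's actual feature set; in a real pipeline values like these would be combined with the TF-IDF vector before being passed to the classifier.

```python
import re

# Hypothetical phrase list; the project's real vocabulary is not documented here.
SUSPICIOUS_PHRASES = (
    "ignore previous instructions",
    "system prompt",
    "you are now",
)


def structural_features(text: str) -> dict[str, float]:
    """Compute a few illustrative numeric features for one chat message."""
    lowered = text.lower()
    length = len(text)
    special_chars = len(re.findall(r"[^\w\s]", text))
    return {
        "char_count": float(length),
        # Ratios are guarded so an empty message does not divide by zero.
        "uppercase_ratio": sum(c.isupper() for c in text) / length if length else 0.0,
        "special_char_ratio": special_chars / length if length else 0.0,
        "phrase_hits": float(sum(p in lowered for p in SUSPICIOUS_PHRASES)),
    }
```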
Integrations: Google Chronicle SIEM, Splunk, Elasticsearch, Slack, Kafka
Red Team Framework (redteam_engine.py)
Automated AI attack testing engine. Loads 50 YAML-defined attack scenarios (ATK-001 through ATK-050), executes them against a target LLM endpoint, and evaluates success or failure using pattern matching. Each scenario is mapped to MITRE ATLAS techniques and OWASP LLM Top 10 categories.
Attack categories: prompt injection, jailbreaking, context switching, nested injection, roleplay, hypothetical framing, DAN variants, logic traps, multi-turn escalation
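The load, execute, and evaluate loop described above can be sketched as follows. The `Scenario` fields are a hypothetical schema (the real YAML keys may differ), scenarios are built in code rather than loaded from YAML to keep the example dependency-free, and the target is passed in as a plain callable so the logic can be exercised without a running server.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, Iterable


@dataclass
class Scenario:
    # Hypothetical schema, not the layout of the project's YAML file.
    scenario_id: str
    prompt: str
    success_patterns: list[str] = field(default_factory=list)


def attack_succeeded(scenario: Scenario, response_text: str) -> bool:
    """An attack counts as successful if any success pattern matches the reply."""
    return any(
        re.search(pattern, response_text, re.IGNORECASE)
        for pattern in scenario.success_patterns
    )


def run_scenarios(
    scenarios: Iterable[Scenario], send_prompt: Callable[[str], str]
) -> dict[str, bool]:
    """Send each prompt to the target and record whether the attack worked."""
    return {
        s.scenario_id: attack_succeeded(s, send_prompt(s.prompt))
        for s in scenarios
    }
```

Pattern matching of this kind is cheap and deterministic, but it only detects the outcomes its patterns anticipate, so results are best read as a lower bound on attack success.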
AI-CTF (AI-CTF/)
A Docker-based Capture The Flag platform with 11 progressive prompt injection challenges built on Open WebUI and Ollama. Challenges range from basic prompt injection to advanced techniques including RAG exploitation, code interpreter abuse, agent tool manipulation, and multi-modal attacks.
Services: Open WebUI (:4242), Ollama (:11434), Pipelines (:9099), Jupyter (:8888)
Detection Pipeline
The 4-tier ensemble processes every incoming chat message:
| Tier | Component | Latency | Action |
|---|---|---|---|
| 1 | Input Sanitization | <1ms | Neutralizes 9 attack patterns (encoding tricks, special chars, markdown injection, etc.) |
| 2 | Pre-Filter Rules | ~10ms | Blocks obvious attacks via 7 pattern-matching rules. Catches system prompt extraction, instruction override, encoding attacks, jailbreak patterns |
| 3 | ML Classifier | ~2ms | LogisticRegression scores input 0.0-1.0. Above 0.85 = blocked, below 0.30 = allowed, between = escalate to LLM |
| 4 | LLM Judge | ~15s | Ollama llama3 evaluates ambiguous inputs. Verdict extraction handles conversational model output |
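The routing logic in the table can be sketched as a single function. The two thresholds come from the tier 3 row; everything else is an assumption for illustration: the function name `route`, the `Verdict` enum, and the four callables standing in for the real sanitizer, pre-filter, classifier, and LLM judge.

```python
from enum import Enum
from typing import Callable

BLOCK_THRESHOLD = 0.85  # tier 3: scores above this are blocked
ALLOW_THRESHOLD = 0.30  # tier 3: scores below this are allowed


class Verdict(Enum):
    ALLOW = "allow"
    BLOCK = "block"


def route(
    message: str,
    sanitize: Callable[[str], str],
    prefilter_blocks: Callable[[str], bool],
    ml_score: Callable[[str], float],
    llm_judge_is_unsafe: Callable[[str], bool],
) -> tuple[Verdict, str]:
    """Return the verdict and the name of the tier that decided it."""
    cleaned = sanitize(message)              # tier 1
    if prefilter_blocks(cleaned):            # tier 2
        return Verdict.BLOCK, "pre-filter"
    score = ml_score(cleaned)                # tier 3
    if score > BLOCK_THRESHOLD:
        return Verdict.BLOCK, "ml-classifier"
    if score < ALLOW_THRESHOLD:
        return Verdict.ALLOW, "ml-classifier"
    # Ambiguous band (0.30 to 0.85 inclusive): only now pay for the slow LLM call.
    if llm_judge_is_unsafe(cleaned):         # tier 4
        return Verdict.BLOCK, "llm-judge"
    return Verdict.ALLOW, "llm-judge"
```

Ordering the tiers from cheapest to most expensive means the ~15s judge is consulted only for inputs the faster tiers cannot settle.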
Additional layers:
- Multi-turn tracking: Accumulates risk across conversation turns. Escalation thresholds trigger on repeated attack patterns.
- Jailbreak-specific rules: Dedicated detection for roleplay jailbreaks (ATK-006, 89.6% success rate), DAN variants, hypothetical framing, and logic traps.
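One way to accumulate risk across turns is an exponentially decayed running sum per session, sketched below. This is an illustrative assumption rather than the project's algorithm: the class name `SessionRiskTracker`, the decay factor, and the escalation threshold are all hypothetical values chosen for the example.

```python
from collections import defaultdict

ESCALATION_THRESHOLD = 1.0  # hypothetical value
DECAY = 0.8                 # hypothetical: how much earlier risk carries over


class SessionRiskTracker:
    """Keeps a decayed running risk total for each conversation session."""

    def __init__(
        self, threshold: float = ESCALATION_THRESHOLD, decay: float = DECAY
    ) -> None:
        self.threshold = threshold
        self.decay = decay
        self._risk: defaultdict[str, float] = defaultdict(float)

    def record_turn(self, session_id: str, turn_score: float) -> bool:
        """Fold one turn's score into the session; True means escalate."""
        updated = self._risk[session_id] * self.decay + turn_score
        self._risk[session_id] = updated
        return updated >= self.threshold

    def risk(self, session_id: str) -> float:
        return self._risk[session_id]
```

With these example values, two consecutive turns scoring 0.6 (each individually in the ambiguous band) push a session over the threshold, while a single such turn does not.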
Performance Metrics
| Metric | Value |
|---|---|
| Detection rate | 85-90% (up from 10% baseline) |
| ML F1 score | 0.98 |
| ML AUC-ROC | 0.99 |
| ML cross-validation F1 | 0.986 (mean) |
| ML inference time | 1.9ms average |
| False positive rate | 0% on test set (TN=111, FP=0) |
| Training samples | 1365 (553 benign, 812 malicious) |
| Feature dimensions | 733 |
| Pre-filter latency | ~10ms |
| Full pipeline with LLM | ~15s |
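The derived figures in the table can be checked from the raw counts it reports. The sketch below recomputes the false positive rate and the class balance, and adds one assumption-labeled extra: the "rule of three", a standard approximation for the 95% upper bound on an event rate when zero events are observed in n trials. That bound is not a number reported by the project.

```python
# Counts taken from the Performance Metrics table.
true_negatives, false_positives = 111, 0
benign, malicious = 553, 812

# False positive rate = FP / (FP + TN).
false_positive_rate = false_positives / (false_positives + true_negatives)

# Class balance of the training set.
total_samples = benign + malicious
malicious_share = malicious / total_samples

# Rule of three (not from the source): with 0 events in n trials, the
# approximate 95% upper bound on the true rate is 3 / n.
fpr_upper_bound_95 = 3 / true_negatives

print(f"FPR on test set: {false_positive_rate:.1%}")
print(f"Training samples: {total_samples} ({malicious_share:.1%} malicious)")
print(f"Approximate 95% upper bound on FPR: {fpr_upper_bound_95:.1%}")
```

The last line is a reminder that 0 false positives on 111 benign test samples is consistent with a true false positive rate of up to a few percent.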
Testing
# Run all unit tests
pytest test_*.py -v
# Run integration tests (requires running server)
pytest tests/test_integration.py -v
# Run ML classifier tests
pytest anomaly-detection/tests/test_chat_classifier.py -v
# Quick validation (no server required)
python quick_validation_test.py
# Red team attack simulation (requires running server)
python redteam_engine.py
Docker
# Start the full platform (honeypot + anomaly API)
docker-compose up --build
# Start the CTF environment
cd AI-CTF
docker-compose -f docker-compose.yaml up --build
| Service | Port | Description |
|---|---|---|
| Oubliette Security | 5000 | Honeypot engine |
| Anomaly Detection API | 8000 | ML classification endpoint |
| Open WebUI (CTF) | 4242 | CTF challenge interface |
| Ollama (CTF) | 11434 | LLM backend for CTF |
| Pipelines (CTF) | 9099 | Custom LLM pipelines |
| Jupyter (CTF) | 8888 | Notebook environment |
Disclaimer
This software is a security research and training tool. It is designed for authorized security testing, defensive research, CTF competitions, and educational purposes only. Use only on systems you own or have explicit written authorization to test. The authors accept no liability for misuse.
License
File details
Details for the file oubliette_shield-0.3.2.tar.gz.
File metadata
- Download URL: oubliette_shield-0.3.2.tar.gz
- Upload date:
- Size: 81.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 22d275e678af54d51959d2607749575bfdf9d04da60672cf9034404346ea9421 |
| MD5 | 4c2c0df0c850cde384130258c4932412 |
| BLAKE2b-256 | 62bb9d26ffe40801305d0c6d337a675f7ddab0a000c4894d4ebff210af4cd9a9 |
File details
Details for the file oubliette_shield-0.3.2-py3-none-any.whl.
File metadata
- Download URL: oubliette_shield-0.3.2-py3-none-any.whl
- Upload date:
- Size: 78.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1175afd5b1895feed4919a8179f567619d2bfe21e21da96f64f740580b262966 |
| MD5 | 8f0ade47b90d8498a1f5e383dfcabe5d |
| BLAKE2b-256 | 39c717fb75a8c03dd04e0867233e0e1f0c058c4f32a4a0c27b10364185951d52 |