Skip to main content

AI LLM Firewall - Protect LLM applications from prompt injection, jailbreak, and adversarial attacks

Project description

Oubliette Security Platform

AI security honeypot and deception platform for detecting prompt injection attacks.

Python 3.10+ License: Apache 2.0 Detection Rate ML F1


Overview

Oubliette is a defensive AI security platform built to detect, classify, and respond to prompt injection attacks against large language models. Rather than simply blocking malicious input, Oubliette deploys deception techniques -- serving decoy responses and honey tokens to attackers while logging forensic evidence.

Built by a disabled veteran-owned business specializing in cyber deception, AI security, and red teaming.

Architecture

                         Oubliette Security Platform
 +----------------------------------------------------------------------+
 |                                                                      |
 |  User Input                                                          |
 |      |                                                               |
 |      v                                                               |
 |  [1. Input Sanitization] -----> 9 sanitization rules                 |
 |      |                                                               |
 |      v                                                               |
 |  [2. Pre-Filter Rules]  -----> 7 blocking rules (~10ms)              |
 |      |   |                                                           |
 |      |   +---> BLOCKED: decoy response + honey token                 |
 |      v                                                               |
 |  [3. ML Classifier]     -----> LogisticRegression + TF-IDF (~2ms)    |
 |      |   |                                                           |
 |      |   +---> score > 0.85: blocked                                 |
 |      |   +---> score < 0.30: allowed                                 |
 |      v                                                               |
 |  [4. LLM Judge]         -----> Ollama llama3 (~15s)                  |
 |      |                                                               |
 |      v                                                               |
 |  Safe: LLM response    Unsafe: decoy + honey token + log            |
 |                                                                      |
 +----------------------------------------------------------------------+
         |                                    |
    Flask :5000                         FastAPI :8000
   (Honeypot Engine)                (Anomaly Detection API)

Key Features

Detection

  • 4-tier ensemble defense: sanitization, pre-filter rules, ML classifier, LLM judge
  • Multi-turn attack tracking across conversation sessions
  • Jailbreak-specific defenses (roleplay, DAN, hypothetical framing, logic traps)
  • ML classifier: F1=0.98, AUC=0.99, 1.9ms inference on 733 features

Deception

  • Honey token injection into decoy responses
  • Realistic decoy answers that waste attacker time
  • Forensic logging of all attack sessions

Red Teaming

  • 50 YAML attack scenarios mapped to MITRE ATLAS and OWASP LLM Top 10
  • Automated attack execution and success/failure evaluation
  • Scenario coverage: prompt injection, jailbreaking, context switching, nested injection, roleplay, DAN variants, logic traps, multi-turn attacks

Training

  • 11 progressive CTF challenges for prompt injection training
  • Challenges cover: basic injection, defense bypass, code interpreter exploitation, RAG exploitation, agent abuse, multi-modal attacks
  • Full Docker environment with Open WebUI, Ollama, Jupyter

Quick Start

Prerequisites

  • Python 3.10+
  • Ollama with the llama3 model pulled
  • (Optional) Docker and Docker Compose for containerized deployment

Local Setup

# Clone the repository
git clone https://github.com/oubliettesecurity/oubliette.git
cd oubliette

# Install dependencies
pip install flask requests pyyaml scikit-learn

# Pull the LLM model
ollama pull llama3

# Start the Oubliette honeypot
python oubliette_security.py

# (Optional) Start the anomaly detection API in a second terminal
cd anomaly-detection
pip install -r config/requirements_api.txt
python api/anomaly_api.py

The honeypot server runs on http://localhost:5000 and the anomaly detection API on http://localhost:8000.

Docker Setup

docker-compose up --build

This starts both the Oubliette honeypot (port 5000) and the anomaly detection API (port 8000).

Project Structure

oubliette/
|-- oubliette_security.py              # Honeypot engine (Flask, 1325 lines)
|-- redteam_engine.py                 # AI red team engine (869 lines)
|-- redteam_results_db.py             # Red team results database
|-- AI_RED_TEAMING_ATTACK_SCENARIOS.yaml  # 50 attack scenarios
|-- docker-compose.yml                # Container orchestration
|-- Dockerfile.oubliette              # Honeypot container
|
|-- anomaly-detection/                # ML anomaly detection system
|   |-- core/                         #   ML models and training pipeline
|   |   |-- chat_anomaly_detector.py  #     Chat injection classifier
|   |   |-- chat_feature_pipeline.py  #     Feature extraction (TF-IDF + structural)
|   |   |-- chat_training_data.json   #     1365 labeled samples
|   |   +-- train_chat_classifier.py  #     Model training script
|   |-- api/                          #   FastAPI REST server
|   |   +-- anomaly_api.py            #     Detection endpoint
|   |-- mcp/                          #   Model Context Protocol server
|   |-- chronicle/                    #   Google Chronicle SIEM integration
|   |-- batch/                        #   Batch log processing
|   |-- docker/                       #   Anomaly detection container
|   |-- tests/                        #   Test data and ML tests
|   +-- config/                       #   Requirements files
|
|-- AI-CTF/                           # Prompt injection CTF platform
|   |-- openwebui/                    #   Challenge definitions
|   |   |-- functions/                #     Defense filter functions
|   |   |-- knowledge/                #     RAG knowledge base
|   |   |-- pipelines/                #     Custom LLM pipelines
|   |   +-- tools/                    #     Agent tools
|   |-- docker-compose.yaml           #   CTF infrastructure
|   +-- setup.sh                      #   Automated setup
|
|-- tests/                            # Integration tests
|   +-- test_integration.py
|-- test_*.py                         # Unit tests (6 test modules)
+-- quick_*.py                        # Quick validation scripts

Components

Oubliette Security (oubliette_security.py)

The core honeypot engine. A Flask server that intercepts chat messages, runs them through the 4-tier detection pipeline, and either responds normally or deploys deception (decoy responses + honey tokens) when an attack is detected. Tracks session state across conversation turns to catch multi-turn attack sequences.

Endpoints:

  • POST /api/chat -- Chat interface (attack surface)
  • GET /api/health -- Health check
  • GET /api/session/<id> -- Session state inspection

Anomaly Detection (anomaly-detection/)

A modular ML pipeline for log and chat anomaly detection. The chat injection classifier uses LogisticRegression with TF-IDF and structural/keyword/pattern features (733 dimensions) trained on 1365 labeled samples. Exposed via a FastAPI REST API and optionally through an MCP server for AI platform integration.

Integrations: Google Chronicle SIEM, Splunk, Elasticsearch, Slack, Kafka

Red Team Framework (redteam_engine.py)

Automated AI attack testing engine. Loads 50 YAML-defined attack scenarios (ATK-001 through ATK-050), executes them against a target LLM endpoint, and evaluates success or failure using pattern matching. Each scenario is mapped to MITRE ATLAS techniques and OWASP LLM Top 10 categories.

Attack categories: prompt injection, jailbreaking, context switching, nested injection, roleplay, hypothetical framing, DAN variants, logic traps, multi-turn escalation

AI-CTF (AI-CTF/)

A Docker-based Capture The Flag platform with 11 progressive prompt injection challenges built on Open WebUI and Ollama. Challenges range from basic prompt injection to advanced techniques including RAG exploitation, code interpreter abuse, agent tool manipulation, and multi-modal attacks.

Services: Open WebUI (:4242), Ollama (:11434), Pipelines (:9099), Jupyter (:8888)

Detection Pipeline

The 4-tier ensemble processes every incoming chat message:

Tier Component Latency Action
1 Input Sanitization <1ms Neutralizes 9 attack patterns (encoding tricks, special chars, markdown injection, etc.)
2 Pre-Filter Rules ~10ms Blocks obvious attacks via 7 pattern-matching rules. Catches system prompt extraction, instruction override, encoding attacks, jailbreak patterns
3 ML Classifier ~2ms LogisticRegression scores input 0.0-1.0. Above 0.85 = blocked, below 0.30 = allowed, between = escalate to LLM
4 LLM Judge ~15s Ollama llama3 evaluates ambiguous inputs. Verdict extraction handles conversational model output

Additional layers:

  • Multi-turn tracking: Accumulates risk across conversation turns. Escalation thresholds trigger on repeated attack patterns.
  • Jailbreak-specific rules: Dedicated detection for roleplay jailbreaks (ATK-006, 89.6% success rate), DAN variants, hypothetical framing, and logic traps.

Performance Metrics

Metric Value
Detection rate 85-90% (up from 10% baseline)
ML F1 score 0.98
ML AUC-ROC 0.99
ML cross-validation F1 0.986 (mean)
ML inference time 1.9ms average
False positive rate 0% on test set (TN=111, FP=0)
Training samples 1365 (553 benign, 812 malicious)
Feature dimensions 733
Pre-filter latency ~10ms
Full pipeline with LLM ~15s

Testing

# Run all unit tests
pytest test_*.py -v

# Run integration tests (requires running server)
pytest tests/test_integration.py -v

# Run ML classifier tests
pytest anomaly-detection/tests/test_chat_classifier.py -v

# Quick validation (no server required)
python quick_validation_test.py

# Red team attack simulation (requires running server)
python redteam_engine.py

Docker

# Start the full platform (honeypot + anomaly API)
docker-compose up --build

# Start the CTF environment
cd AI-CTF
docker-compose -f docker-compose.yaml up --build
Service Port Description
Oubliette Security 5000 Honeypot engine
Anomaly Detection API 8000 ML classification endpoint
Open WebUI (CTF) 4242 CTF challenge interface
Ollama (CTF) 11434 LLM backend for CTF
Pipelines (CTF) 9099 Custom LLM pipelines
Jupyter (CTF) 8888 Notebook environment

Disclaimer

This software is a security research and training tool. It is designed for authorized security testing, defensive research, CTF competitions, and educational purposes only. Use only on systems you own or have explicit written authorization to test. The authors accept no liability for misuse.

License

Apache License 2.0

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

oubliette_shield-0.3.1.tar.gz (61.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

oubliette_shield-0.3.1-py3-none-any.whl (62.2 kB view details)

Uploaded Python 3

File details

Details for the file oubliette_shield-0.3.1.tar.gz.

File metadata

  • Download URL: oubliette_shield-0.3.1.tar.gz
  • Upload date:
  • Size: 61.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for oubliette_shield-0.3.1.tar.gz
Algorithm Hash digest
SHA256 9dd39d388afe446f2a91413e81cec6d358ca9a32eb45738e52e669cd2b006c15
MD5 6f7b7294169c0a323e94e66fef677ca9
BLAKE2b-256 d072e34af8dc68e50dce9eebe2d196ca217fbdffbc940513a6eaf2c8094a160f

See more details on using hashes here.

File details

Details for the file oubliette_shield-0.3.1-py3-none-any.whl.

File metadata

File hashes

Hashes for oubliette_shield-0.3.1-py3-none-any.whl
Algorithm Hash digest
SHA256 3d3da1782b687ee6cb45d0d6abf3ae6112c6f77797fe45573a634c916d9dbb8a
MD5 e9d592fc042d013265de4c7552b56459
BLAKE2b-256 0a50879eaee9636a45a3825b6d034b59bca404ef01e22607490c938f76a00984

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page