AI security framework: prompt injection firewall, tamper-proof integrity verification, ethical guardrails, and DDoS protection.

Sovereign Shield

A standalone AI security framework extracted from the KAIROS Autonomous Intelligence System.

Sovereign Shield provides a comprehensive, layered defense system for AI applications, APIs, and autonomous agents. Every component is tamper-proof, hash-verified, and designed to be impossible to bypass at runtime.

Everything happens before the LLM executes anything. Zero dependencies, negligible latency, and deterministic: the same input checked against the same rule set produces the same decision 100% of the time.

⚠️ Upgrading to 1.1.0

If you are upgrading from an earlier version, delete your data/.core_safety_lock and data/.conscience_lock files after installing. The hash integrity check seals the source code. Because the source has changed, your old lockfiles will no longer match and will trigger an integrity violation. The seal is recreated automatically on the next startup.
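A minimal sketch of that upgrade step, assuming the lockfiles sit directly inside the data directory named above. If you pass a different `data_dir` (the Quick Start uses `./security_data`), point the function there instead. The helper name `remove_stale_locks` is hypothetical and not part of the package.

```python
from pathlib import Path

# Lockfile names from the upgrade note; their location is an assumption.
LOCKFILES = (".core_safety_lock", ".conscience_lock")


def remove_stale_locks(data_dir: str = "data") -> list[str]:
    """Delete stale integrity lockfiles so the seals are recreated on next startup.

    Returns the names of the files that were actually removed.
    """
    removed = []
    for name in LOCKFILES:
        lock = Path(data_dir) / name
        if lock.is_file():
            lock.unlink()
            removed.append(name)
    return removed
```

Run it once, after `pip install --upgrade` and before the first start of the new version, for example `remove_stale_locks("./security_data")`.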

What changed in 1.0.4 → 1.1.0

**Self-Expanding Minefield (V2)**: AdaptiveShield now classifies attacks into categories (exfiltration, injection, impersonation, etc.) and learns keyword clusters. One report blocks an entire class of similar attacks it has never seen before.
**Self-Pruning False Positives**: The new report_false_positive() method removes learned keywords that wrongly block clean inputs, while the predefined rules stay immutable. The system becomes smarter and more precise at the same time.
**Multilingual Detection**: InputFilter now blocks injection attempts in 12 languages (French, German, Spanish, Portuguese, Italian, Dutch, Polish, Russian, Chinese, Japanese, Korean, Arabic).
**Multi-Decode Pipeline**: Automatic Base64, ROT13, leet-speak, and reversed-text decoding catches encoded bypass attempts.
**Benchmark**: 300 real-world attack payloads across 10 categories. Detection converges from 2.7% → 78.7% → 100% over 2 learning generations, with 0 false positives on 50 clean inputs.
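The multi-decode idea can be sketched as generating several candidate decodings of one input and running the same keyword check over each of them. This is an illustrative reimplementation, not the package's code: the function names, the small leet map, and the keyword list used in the example are assumptions.

```python
import base64
import codecs

# Hypothetical, deliberately small leet-speak map for illustration.
LEET = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "7": "t", "@": "a", "$": "s"})


def candidate_decodings(text: str) -> list[str]:
    """Return the text plus Base64, ROT13, leet-speak and reversed variants."""
    variants = [text]
    try:
        # validate=True rejects characters outside the Base64 alphabet.
        variants.append(base64.b64decode(text.strip(), validate=True).decode("utf-8"))
    except ValueError:
        # binascii.Error and UnicodeDecodeError both subclass ValueError:
        # the input was not Base64, or it did not decode to UTF-8 text.
        pass
    variants.append(codecs.decode(text, "rot13"))
    variants.append(text.translate(LEET))
    variants.append(text[::-1])
    return variants


def is_suspicious(text: str, keywords: list[str]) -> bool:
    """True if any (lowercase) keyword appears in any decoded variant."""
    return any(k in v.lower() for v in candidate_decodings(text) for k in keywords)
```

Checking every variant against one keyword list keeps the rule set in a single place: adding a new decoder widens coverage without duplicating keywords per encoding.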

What changed in 1.0.3 → 1.0.4

**AdaptiveShield (NEW)**: Self-improving security filter that learns from missed attacks. Reports trigger automatic rule generation → sandbox replay against historical traffic → threshold-gated deployment. Patent Pending.
InputFilter: Fixed Unicode homoglyph bypass — Greek/Cyrillic lookalike characters (e.g. Ι, Ρ, А, О) now fold to Latin equivalents before keyword matching. Added 40+ character mappings.
InputFilter: Fixed Base64/encoded payload bypass. Entropy detection now includes Base64 signature analysis (it checks for `=` padding and digit/symbol density).
Firewall: Fixed instant re-blocking — stale timestamps in the sliding window caused users to be re-blocked immediately after their block expired. History is now cleared on expiry.

What changed in 1.0.2 → 1.0.3

InputFilter: Added 18 missing prompt injection keywords (IGNORE ALL, ACT AS, PRETEND TO BE, DISREGARD ALL, BYPASS ALL, etc.). These phrasings previously bypassed detection because filler words such as "ALL" broke substring matching against the existing keywords.
CoreSafety: The rate limiter is now configurable via the rate_limit_interval parameter (default 0.5s). Set it to 0 to disable it when your application handles its own rate limiting.

Architecture

┌──────────────────────────────────────────────────────────────────────┐
│                         SOVEREIGN SHIELD                             │
├──────────┬──────────────┬───────────┬──────────────┬────────────────┤
│ Firewall │ InputFilter  │Conscience │  CoreSafety  │ AdaptiveShield │
│(Layer 1) │  (Layer 2)   │ (Layer 3) │  (Layer 4)   │   (Layer 5)    │
│          │              │           │              │                │
│• Identity│ • Unicode    │• Deception│ • Hash Seal  │• Self-Improving│
│  White-  │   Normalize  │  Detection│ • Integrity  │  Filter        │
│  list    │ • Injection  │• Harm     │   Verify     │• Scan Logging  │
│• Rate    │   Blocking   │  Patterns │ • Action     │• Report        │
│  Limiting│ • Gibberish  │• IP Leak  │   Auditing   │  Interface     │
│• DDoS    │   Detection  │  Detection│ • Killswitch │• Sandbox       │
│  Protect │ • LLM Token  │• Evasion  │ • Write/Read │  Replay        │
│• Persist-│   Blocking   │  Detection│   Whitelists │• Threshold     │
│  ent     │ • Keyword    │• Self-    │ • Malware    │  Gated Deploy  │
│  Ledger  │   Blocking   │  Preserve │   Syntax     │• Manual        │
│          │              │           │ • Budget     │  Approval      │
│          │              │           │ • Rate Limit │• SQLite        │
│          │              │           │              │  Persistence   │
└──────────┴──────────────┴───────────┴──────────────┴────────────────┘

Components

1. `CoreSafety` — The Immutable Constitution

The foundation. It uses a FrozenNamespace metaclass that blocks reassignment of the security laws at runtime, so they cannot be overwritten through attribute assignment, even by the application itself.
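A metaclass can make class-level constants read-only by overriding `__setattr__` and `__delattr__` on the metaclass itself. This is a minimal sketch of the idea, not the package's `FrozenNamespace`; the `Laws` class and its attributes are hypothetical. It stops ordinary assignment and deletion, though a caller that deliberately invokes `type.__setattr__` directly can still go around it.

```python
class FrozenNamespace(type):
    """Metaclass that rejects rebinding or deleting class attributes after creation."""

    def __setattr__(cls, name, value):
        raise AttributeError(f"{cls.__name__}.{name} is immutable")

    def __delattr__(cls, name):
        raise AttributeError(f"{cls.__name__}.{name} cannot be deleted")


class Laws(metaclass=FrozenNamespace):
    # Hypothetical example constants. Class-body assignments go through
    # type.__new__, not __setattr__, so defining them here is allowed.
    MAX_DAILY_ACTIONS = 500
    BLOCKED_DOMAINS = ("example.invalid",)  # tuple: cannot be mutated in place
```

After creation, `Laws.MAX_DAILY_ACTIONS = 10` raises `AttributeError`. Using tuples rather than lists matters: the metaclass only guards attribute rebinding, so a mutable list attribute could still be changed in place.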

Key Features:

SHA-256 Hash Seal: On first boot, hashes its own source file and writes it to a lockfile. Every subsequent boot verifies the file hasn't been tampered with. Mismatch = instant kill.
Action Auditor: Every action passes through audit_action(), which checks admin privileges, file whitelists, domain restrictions, the self-modification ban, code exfiltration patterns, malware syntax, and rate limits.
Hallucination Shield: Detects when an AI claims to be "analyzing" or "processing" in a text response without actually using a tool.
Budget Limiter: Thread-safe daily action counter to prevent runaway API costs.
Killswitch: A single killswitch file; when it is present, the process terminates immediately.
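The seal-and-verify cycle behind the SHA-256 Hash Seal can be sketched with `hashlib`: hash the source file, store the digest in a lockfile on first run, and compare on every later run. The function names and raising `SystemExit` on a mismatch are illustrative assumptions, not the package's API.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def verify_seal(source: Path, lockfile: Path) -> str:
    """Seal on first run; on later runs, stop if the source hash has changed."""
    current = sha256_of(source)
    if not lockfile.exists():
        lockfile.write_text(current)  # first boot: record the seal
        return "sealed"
    if lockfile.read_text().strip() != current:
        # Fail closed: exit instead of running with modified code.
        raise SystemExit(f"Integrity violation: {source} does not match its seal")
    return "verified"
```

This also shows why an upgrade needs the old lockfile removed: new source bytes produce a new digest, which no longer matches the stored seal.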

2. `Conscience` — The Moral Compass

Evaluates every action against ethical directives using pre-compiled regex patterns.

Key Features:

Deception Detection: Catches 22+ manipulation verbs (lie, fake, trick, roleplay, gaslight, etc.)
Harm Reduction: Blocks actions containing 24+ harm keywords
IP Protection: Detects attempts to extract source code, system prompts, or architecture details
Fake Tool Injection: Catches syntactically valid but unauthorized tool calls
Self-Preservation: Refuses self-termination or deletion of critical files
Hash-Sealed: Same lockfile integrity verification as CoreSafety
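Compiling the keyword patterns once at import time keeps each evaluation down to a couple of regex searches. A minimal sketch assuming whole-word, case-insensitive matching; the verb and harm lists are small hypothetical samples rather than the package's 22+/24+ keyword sets, and this `evaluate_action` only mimics the `(approved, reason)` shape used in the Quick Start.

```python
import re

# Hypothetical sample lists; the real directives are larger.
DECEPTION_VERBS = ["lie", "fake", "trick", "gaslight", "roleplay"]
HARM_KEYWORDS = ["malware", "ransomware", "self-harm"]


def _compile(words: list[str]) -> re.Pattern:
    # \b anchors stop "lie" from matching inside words such as "client".
    return re.compile(r"\b(?:" + "|".join(map(re.escape, words)) + r")\b", re.IGNORECASE)


DECEPTION_RE = _compile(DECEPTION_VERBS)
HARM_RE = _compile(HARM_KEYWORDS)


def evaluate_action(action: str, text: str) -> tuple[bool, str]:
    """Return (approved, reason). `action` is accepted but not inspected here."""
    if m := DECEPTION_RE.search(text):
        return False, f"deception directive violated: {m.group(0)!r}"
    if m := HARM_RE.search(text):
        return False, f"harm directive violated: {m.group(0)!r}"
    return True, "approved"
```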

3. `InputFilter` — The Sensory Cortex

Sanitizes all input before it reaches any processing logic.

Key Features:

Unicode Normalization + ASCII Folding: NFKC normalization plus 40+ Greek/Cyrillic homoglyph mappings (defeats lookalike character attacks)
ANSI Stripping: Removes terminal escape codes that could manipulate display
Gibberish Detection: Entropy analysis + Base64 signature detection catches encoded payloads
Escape Injection: Blocks raw Unicode/hex escape literals such as \u0057 and \x57
LLM Token Blocking: Catches ChatML (<|im_start|>), LLaMA ([INST]), and system tokens
Keyword Injection: 30+ jailbreak keywords (ignore previous, sudo, DAN mode, etc.)
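The sanitization steps above can be chained into a small pre-processing pipeline: reject raw escape literals, apply NFKC normalization, fold homoglyphs, strip ANSI codes, then check the cleaned text for LLM control tokens and injection keywords. This sketch uses a short excerpt of a homoglyph table (the package documents 40+ mappings) and hypothetical token and keyword lists; it is not the package's `InputFilter`.

```python
import re
import unicodedata

# Small illustrative excerpt of Greek/Cyrillic lookalikes folded to Latin.
HOMOGLYPHS = str.maketrans({
    "\u0399": "I", "\u03a1": "P", "\u0391": "A", "\u039f": "O",  # Greek capitals
    "\u0410": "A", "\u041e": "O", "\u0415": "E", "\u0420": "P",  # Cyrillic capitals
    "\u0430": "a", "\u043e": "o", "\u0435": "e", "\u0440": "p", "\u0441": "c",  # Cyrillic lowercase
})
ANSI_RE = re.compile(r"\x1b\[[0-9;]*[A-Za-z]")  # CSI escape sequences
ESCAPE_RE = re.compile(r"\\u[0-9a-fA-F]{4}|\\x[0-9a-fA-F]{2}")
LLM_TOKENS = ("<|im_start|>", "<|im_end|>", "[INST]", "[/INST]")  # hypothetical list
KEYWORDS = ("ignore previous", "dan mode")  # hypothetical list


def normalize(text: str) -> str:
    """NFKC (folds fullwidth forms), then homoglyph folding, then ANSI stripping."""
    text = unicodedata.normalize("NFKC", text)
    text = text.translate(HOMOGLYPHS)
    return ANSI_RE.sub("", text)


def check(text: str) -> tuple[bool, str]:
    if ESCAPE_RE.search(text):
        return False, "raw escape literal"
    clean = normalize(text)
    if any(tok in clean for tok in LLM_TOKENS):
        return False, "LLM control token"
    lowered = clean.lower()
    for kw in KEYWORDS:
        if kw in lowered:
            return False, f"injection keyword: {kw}"
    return True, "clean"
```

Order matters: folding and stripping happen before keyword matching, so a Greek Iota or an embedded colour code cannot split a keyword apart.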

4. `Firewall` — The Identity Gateway

Controls who can access the system and how fast.

Key Features:

User Whitelist: Only specified user IDs can interact
Sliding Window Rate Limiter: Configurable messages-per-window
Auto-Blocking: Violators are blocked for a configurable duration
Disk Persistence: Block ledger survives process restarts
Thread-Safe: All operations use locks
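A sliding-window limiter with temporary blocks can be sketched with a deque of timestamps per user. The parameter names mirror the Quick Start (`rate_limit`, `window`, `block_duration`), but this is a standalone illustration, not the package's `Firewall`: it omits the whitelist and the disk ledger, and the injectable `clock` argument is an assumption added for testing. It clears a user's history when their block expires, which is the fix described in the 1.0.4 notes.

```python
import threading
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `rate_limit` messages per `window` seconds per user."""

    def __init__(self, rate_limit=10, window=60.0, block_duration=300.0, clock=time.monotonic):
        self.rate_limit = rate_limit
        self.window = window
        self.block_duration = block_duration
        self.clock = clock                    # injectable time source
        self._history = defaultdict(deque)    # user_id -> recent timestamps
        self._blocked_until = {}              # user_id -> unblock time
        self._lock = threading.Lock()

    def check(self, user_id):
        with self._lock:
            now = self.clock()
            until = self._blocked_until.get(user_id)
            if until is not None:
                if now < until:
                    return False, "temporarily blocked"
                # Block expired: drop stale timestamps so the user is not re-blocked at once.
                del self._blocked_until[user_id]
                self._history[user_id].clear()
            hist = self._history[user_id]
            while hist and now - hist[0] >= self.window:
                hist.popleft()  # evict timestamps that have left the window
            hist.append(now)
            if len(hist) > self.rate_limit:
                self._blocked_until[user_id] = now + self.block_duration
                return False, "rate limit exceeded"
            return True, "ok"
```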

5. `AdaptiveShield` — The Self-Improving Filter (Patent Pending)

A closed-loop security filter that autonomously learns from missed attacks, deploys validated rules, and self-prunes false positives.

Key Features:

Scan Logging: Every input is logged with a unique scan ID, full text, allow/block decision, and timestamp
Report Interface: Users report false negatives by scan ID — the original input is retrieved for pattern extraction
Self-Expanding Minefield (V2): Extracts keywords from reported attacks, classifies them into attack categories, and stores them in a persistent keyword database. A single report blocks an entire class of similar attacks.
Category Threshold Matching: Requires 2+ keywords from the same attack category to trigger — dramatically reduces false positives while maintaining high detection
Self-Pruning (V2): report_false_positive() identifies and removes only the learned keywords that caused a wrongful block. Predefined rules are immutable.
Sandbox Replay: Candidate rules are tested against all historical allowed inputs to calculate false positive rates
Threshold-Gated Deployment: Rules whose false positive rate falls below a configurable threshold (default 1%) are auto-deployed; rules above it are flagged for manual review
Manual Approval Workflow: List, approve, reject, or bulk-approve pending rules
Two Deployment Modes: Automatic (rules deploy instantly) or manual (all rules require explicit approval)
Fully Offline: SQLite database, zero cloud dependencies, deterministic behavior
Thread-Safe: All database operations protected by mutual exclusion locks
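Two of the mechanisms above, category threshold matching and sandbox replay against historical allowed inputs, can be sketched in a few lines. The category names, keyword sets, and function names are hypothetical; the package stores learned keywords in SQLite, whereas this sketch keeps them in memory to stay self-contained.

```python
import re
from typing import Optional

# Hypothetical learned keyword clusters, keyed by attack category.
CATEGORIES = {
    "exfiltration": {"send", "upload", "credentials", "secrets", "webhook"},
    "impersonation": {"pretend", "administrator", "developer", "authorized"},
}
MIN_HITS = 2  # a category fires only when 2+ of its keywords co-occur


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))


def match_category(text: str, categories=CATEGORIES) -> Optional[str]:
    """Return the first category with at least MIN_HITS keyword hits, else None."""
    words = tokens(text)
    for name, keywords in categories.items():
        if len(words & keywords) >= MIN_HITS:
            return name
    return None


def replay_fp_rate(candidate: set[str], allowed_history: list[str]) -> float:
    """Fraction of previously allowed inputs that a candidate keyword set would now block."""
    if not allowed_history:
        return 0.0
    hits = sum(1 for text in allowed_history if len(tokens(text) & candidate) >= MIN_HITS)
    return hits / len(allowed_history)


def should_auto_deploy(candidate: set[str], allowed_history: list[str], fp_threshold: float = 0.01) -> bool:
    # Deploy only when the replayed false positive rate is below the threshold.
    return replay_fp_rate(candidate, allowed_history) < fp_threshold
```

Requiring two hits per category is what lets a single common word such as "upload" stay harmless on its own while still contributing to a block when it co-occurs with other exfiltration keywords.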

Quick Start

```python
from sovereign_shield import CoreSafety, Conscience, InputFilter, Firewall, AdaptiveShield

# 1. Initialize the hash seals (do this ONCE at startup)
CoreSafety.initialize_seal(data_dir="./security_data")
Conscience.initialize(data_dir="./security_data")

# 2. Create your firewall
fw = Firewall(
    allowed_users=[12345, 67890],  # Only these user IDs can interact
    rate_limit=10,                  # 10 messages per 60s window
    window=60,
    block_duration=300,             # 5min block for violators
    ledger_path="./security_data/ddos_ledger.json"
)

# 3. Create the input filter
input_filter = InputFilter(
    safe_keywords=["internal_command"],  # These bypass the filter
)

# 4. Create the adaptive shield (self-improving filter)
adaptive = AdaptiveShield(
    db_path="./security_data/adaptive.db",
    fp_threshold=0.01,  # 1% false positive threshold
    auto_deploy=True,   # Rules deploy automatically when validated
)

# Placeholder for your own application logic (e.g. the actual LLM call).
def process_safely(user_input):
    raise NotImplementedError("replace with your request handler")

# 5. Process a request
def handle_request(user_id, user_input):
    # Layer 1: Identity + Rate Limit
    allowed, reason = fw.check(user_id)
    if not allowed:
        return f"BLOCKED: {reason}"

    # Layer 2: Input Sanitization (static + adaptive rules)
    result = adaptive.scan(user_input)
    if not result["allowed"]:
        return f"REJECTED: {result['reason']}"

    # Layer 3: Ethical Check
    approved, ethics_reason = Conscience.evaluate_action("RESPOND", user_input)
    if not approved:
        return f"ETHICS BLOCK: {ethics_reason}"

    # Layer 4: Action Audit
    authorized, audit_reason = CoreSafety.audit_action("ANSWER", user_input)
    if not authorized:
        return f"SAFETY BLOCK: {audit_reason}"

    # All clear: process the request
    return process_safely(user_input)

# 6. Report a missed attack (triggers self-improvement)
# report = adaptive.report(scan_id="abc123", reason="data exfiltration attempt")

# 7. Report a false positive (triggers self-pruning)
# fp = adaptive.report_false_positive(scan_id="def456", reason="legitimate question")
```

Security Properties

| Property | Mechanism |
|---|---|
| Tamper-Proof | SHA-256 hash seal with lockfile. The process kills itself on mismatch. |
| Immutable Laws | `FrozenNamespace` metaclass prevents attribute modification. |
| Defense in Depth | Independent layers (Firewall, InputFilter/AdaptiveShield, Conscience, CoreSafety); compromising one doesn't bypass the others. |
| Fail-Closed | On verification failure, the system shuts down rather than running unprotected. |
| Thread-Safe | All shared state is protected by locks. |
| Persistent | Block ledgers and usage counters survive restarts. |
| Self-Improving | The adaptive filter learns from missed attacks via sandbox-validated rules. |
| Admin Detection | Refuses to run as root/admin (least-privilege enforcement). |
| Anti-Exfiltration | Blocks attempts to read source code, configs, or environment variables. |

File Structure

```
SovereignShield/
├── README.md                ← You are here
├── LICENSE                  ← BSL 1.1
├── pyproject.toml           ← Package config
├── test_shield.py           ← Test suite
└── sovereign_shield/
    ├── __init__.py          ← Public API (imports all components)
    ├── core.py              ← CoreSafety + FrozenNamespace
    ├── conscience.py        ← Ethical evaluation engine
    ├── input_filter.py      ← Input sanitization
    ├── firewall.py          ← Identity + rate limiting
    └── adaptive.py          ← AdaptiveShield (self-improving filter)
```

Tests

```bash
python -m unittest test_shield -v
```

155 test cases covering FrozenNamespace immutability, InputFilter (with homoglyph, entropy, and multilingual attacks), Firewall, Conscience, CoreSafety, AdaptiveShield V2 (self-expanding minefield, self-pruning), and FullShield integration.

License

Business Source License 1.1 — Free for non-production use (personal projects, research, testing, evaluation). Commercial license required for production use. Converts to Apache 2.0 ten years from each release.

Origin

Extracted from the KAIROS Autonomous Intelligence System — a sovereign AI entity with 24/7 autonomous operation. These security components protect KAIROS from prompt injection, jailbreaking, self-modification, data exfiltration, and all known AI manipulation techniques.

Patent Pending — Truth Adapter Validation System | Self-Improving Security Filter System

Built by Mattijs Moens

Release history

- 3.0.0 (Apr 13, 2026)
- 2.4.6 (Apr 6, 2026)
- 2.4.5 (Apr 6, 2026)
- 2.4.4 (Apr 6, 2026)
- 2.4.3 (Apr 2, 2026)
- 2.4.2 (Mar 28, 2026)
- 2.4.1 (Mar 28, 2026)
- 2.4.0 (Mar 28, 2026)
- 2.3.2 (Mar 27, 2026)
- 2.3.1 (Mar 27, 2026)
- 2.3.0 (Mar 27, 2026)
- 2.2.3 (Mar 26, 2026)
- 2.2.2 (Mar 26, 2026)
- 2.2.1 (Mar 25, 2026)
- 2.1.1 (Mar 19, 2026)
- 2.1.0 (Mar 15, 2026)
- 2.0.1 (Mar 15, 2026)
- 2.0.0 (Mar 15, 2026)
- 1.2.2 (Mar 13, 2026)
- 1.2.1 (Mar 13, 2026)
- 1.2.0 (Mar 13, 2026)
- 1.1.0 (Mar 12, 2026), this version
- 1.0.4 (Mar 11, 2026)
- 1.0.3 (Mar 11, 2026)
- 1.0.2 (Mar 11, 2026)
- 1.0.1 (Mar 9, 2026)
- 1.0.0 (Mar 9, 2026)

Download files

Source Distribution

sovereign_shield-1.1.0.tar.gz (40.4 kB)

Uploaded Mar 12, 2026 Source

Built Distribution

sovereign_shield-1.1.0-py3-none-any.whl (37.9 kB)

Uploaded Mar 12, 2026 Python 3

File details

Details for the file sovereign_shield-1.1.0.tar.gz.

File metadata

Download URL: sovereign_shield-1.1.0.tar.gz
Upload date: Mar 12, 2026
Size: 40.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for sovereign_shield-1.1.0.tar.gz

| Algorithm | Hash digest |
|---|---|
| SHA256 | `ec241ae3c8addffe1d6cd3f25561c336fed832783fcba887c6acf70e476d999b` |
| MD5 | `e447e549273371fb07b0417f374e6eee` |
| BLAKE2b-256 | `f5d14af55a0ce08091df7b1dc7725b2648f91ca98c8c756fecdc6e0730c2d1b8` |

File details

Details for the file sovereign_shield-1.1.0-py3-none-any.whl.

File metadata

Download URL: sovereign_shield-1.1.0-py3-none-any.whl
Upload date: Mar 12, 2026
Size: 37.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for sovereign_shield-1.1.0-py3-none-any.whl

| Algorithm | Hash digest |
|---|---|
| SHA256 | `3be931159daed989f525f9ffbe856991c909c07a53f12780f1bd12364867ff31` |
| MD5 | `107553279fb3b11108381edc00ece297` |
| BLAKE2b-256 | `8532ecb4615d8106fd9a87662537729a93280db69705d9b968157f5a0562cd0f` |
