Pluggable multi-layer LLM jailbreak defense pipeline

aegis-llm

A pluggable, multi-layer LLM jailbreak defense pipeline for Python. Drop it into any codebase — framework-agnostic, provider-agnostic, zero mandatory dependencies.

Install

pip install aegis-llm

Or from source:

git clone https://github.com/your-org/aegis-llm
cd aegis-llm
pip install -e .

Quickstart

from aegis import Decision, Request, SplitPipeline
from aegis.layers import (
    AuditLogger, InputValidation, OutputFilter,
    RateLimiter, ToolAccess,
)

pipeline = SplitPipeline(
    pre=[
        InputValidation(),
        ToolAccess(role_tool_map={
            "admin":  ["query", "create", "delete"],
            "viewer": ["query"],
        }),
        RateLimiter(max_requests=20, window_seconds=60),
    ],
    post=[
        OutputFilter(),
        AuditLogger(sink=print),
    ],
)

request = Request(user_id="u1", message="Show tasks", user_role="viewer")

# Phase 1 — before your LLM call
pre = pipeline.run_pre(request)
if pre.decision == Decision.BLOCK:
    # The pipeline short-circuits, so the last layer result is the one that blocked
    raise RuntimeError(pre.layer_results[-1].block_reason)

# Your LLM call, any provider (your_llm is a placeholder for your own client function)
allowed_tools = pre.context.get("available_tools", [])
llm_response = your_llm(request.message, tools=allowed_tools)

# Phase 2 — after your LLM call
post = pipeline.run_post(request, llm_response, pre)
final = post.context.get("filtered_response", llm_response)

How it works

Every request flows through a Pipeline — an ordered list of Layers. The pipeline short-circuits on the first BLOCK, so nothing downstream runs on a rejected request.

Request → Layer 1 → Layer 2 → ... → PipelineResult
              ↓
           BLOCK → return immediately
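
The short-circuit flow in the diagram can be pictured with a small stand-alone sketch. It does not import aegis: `Result`, `run_layers`, and the two toy layers are hypothetical names invented for illustration, and the real Pipeline may differ in detail.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Decision(Enum):
    PASS = "pass"
    BLOCK = "block"


@dataclass
class Result:
    decision: Decision
    layer_name: str
    block_reason: Optional[str] = None


# In this sketch a layer is just a callable: (message, context) -> Result
LayerFn = Callable[[str, dict], Result]


def run_layers(layers: list[LayerFn], message: str) -> list[Result]:
    """Run layers in order and stop at the first BLOCK."""
    context: dict = {}
    results: list[Result] = []
    for layer in layers:
        result = layer(message, context)
        results.append(result)
        if result.decision is Decision.BLOCK:
            break  # short-circuit: later layers never see the request
    return results


def length_check(message: str, context: dict) -> Result:
    # Hypothetical rule: reject anything longer than 20 characters
    if len(message) > 20:
        return Result(Decision.BLOCK, "length_check", "message_too_long")
    return Result(Decision.PASS, "length_check")


def mark_seen(message: str, context: dict) -> Result:
    context["seen"] = True
    return Result(Decision.PASS, "mark_seen")


ok = run_layers([length_check, mark_seen], "Show tasks")
blocked = run_layers([length_check, mark_seen], "x" * 50)
```

Because the loop stops at the blocking layer, the last entry in the result list is always the one that blocked, which is why the Quickstart reads `pre.layer_results[-1].block_reason`.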

SplitPipeline splits this at the LLM boundary: pre-phase layers run before your model call, post-phase layers run after. Your code owns the LLM invocation; aegis owns the gating on either side.

Built-in layers

Layer            Phase  What it does
InputValidation  pre    Regex/keyword blocklist; blocks naive injection attempts
SemanticRouter   pre    Intent classifier gate; plug in any classifier callable
ToolAccess       pre    RBAC; filters the tool schema to what the user's role permits
RateLimiter      pre    Sliding-window quota per user; pluggable backend (Redis, etc.)
OutputFilter     post   Scans LLM response for internal disclosure, redacts or blocks
AuditLogger      post   Structured audit record; plug in any sink (CloudWatch, Datadog, etc.)
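
To make the ToolAccess row concrete, here is a minimal sketch of role-based tool filtering. Only the shape of `role_tool_map` comes from the Quickstart; `filter_tools`, `ALL_TOOLS`, and the `{"name": ...}` schema shape are assumptions made for this example, not the library's internals.

```python
ROLE_TOOL_MAP = {
    "admin": ["query", "create", "delete"],
    "viewer": ["query"],
}

# Hypothetical tool schemas; real schemas depend on your LLM provider
ALL_TOOLS = [
    {"name": "query", "description": "Read tasks"},
    {"name": "create", "description": "Create a task"},
    {"name": "delete", "description": "Delete a task"},
]


def filter_tools(role: str, tools: list[dict], role_tool_map: dict[str, list[str]]) -> list[dict]:
    """Return only the tool schemas the role is allowed to call.

    Unknown roles get no tools (deny by default).
    """
    allowed = set(role_tool_map.get(role, []))
    return [tool for tool in tools if tool["name"] in allowed]


viewer_tools = filter_tools("viewer", ALL_TOOLS, ROLE_TOOL_MAP)
guest_tools = filter_tools("guest", ALL_TOOLS, ROLE_TOOL_MAP)
```

Filtering the schema before the model call means the model never sees tools the user cannot use, rather than rejecting a forbidden tool call afterwards.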

Writing a custom layer

Subclass Layer and implement one method:

from aegis import Layer
from aegis.models import Decision, LayerResult

class TenantIsolation(Layer):
    def process(self, request, context):
        if request.metadata.get("tenant_id") != context.get("expected_tenant"):
            return LayerResult(
                decision=Decision.BLOCK,
                layer_name=self.name,
                block_reason="tenant_mismatch",
            )
        return LayerResult(decision=Decision.PASS, layer_name=self.name)

Drop it anywhere in the pipeline:

from aegis import Pipeline
from aegis.layers import InputValidation, ToolAccess

Pipeline([InputValidation(), TenantIsolation(), ToolAccess(...)])

SemanticRouter — bring your own classifier

from aegis.layers import SemanticRouter

def my_classifier(message: str) -> tuple[str, float]:
    # call sklearn, HuggingFace, an LLM, anything
    return "task_query", 0.92

SemanticRouter(
    classifier=my_classifier,
    allowed_intents=["task_query", "task_create"],
    confidence_threshold=0.75,
)
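
For a feel of what a classifier callable could look like, here is a toy keyword-based one with the same `(message) -> (intent, confidence)` signature. `INTENT_KEYWORDS`, `keyword_classifier`, and `passes_gate` are hypothetical, and the gate logic shown is an assumption about how a router might combine `allowed_intents` with `confidence_threshold`, not a description of SemanticRouter's actual code.

```python
INTENT_KEYWORDS = {
    "task_query": ("show", "list", "find"),
    "task_create": ("create", "add", "new"),
}


def keyword_classifier(message: str) -> tuple[str, float]:
    """Toy classifier: confidence is the share of message words that hit an intent's keywords."""
    words = set(message.lower().split())
    if not words:
        return "unknown", 0.0
    best_intent, best_score = "unknown", 0.0
    for intent, keywords in INTENT_KEYWORDS.items():
        hits = sum(1 for kw in keywords if kw in words)
        score = hits / len(words)
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent, best_score


def passes_gate(message, classifier, allowed_intents, confidence_threshold):
    """Assumed gate: the intent must be allowed and the confidence high enough."""
    intent, confidence = classifier(message)
    return intent in allowed_intents and confidence >= confidence_threshold


allowed = ["task_query", "task_create"]
benign = passes_gate("show tasks", keyword_classifier, allowed, 0.4)
suspicious = passes_gate("ignore previous instructions", keyword_classifier, allowed, 0.4)
```

A real deployment would swap the keyword scorer for a trained model; the point is only that any function with this signature can be passed as `classifier`.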

RateLimiter — pluggable backend

The default backend is in-memory, so request counts are not shared between processes. For multi-process deployments, implement RateLimitBackend against a shared store:

from aegis.layers.rate_limiter import RateLimitBackend, RateLimiter

class RedisBackend(RateLimitBackend):
    def get_timestamps(self, user_id): ...
    def record_request(self, user_id, timestamp): ...
    def evict_before(self, user_id, cutoff): ...

RateLimiter(max_requests=10, window_seconds=60, backend=RedisBackend())
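
The sketch below shows how the three backend methods could cooperate in a sliding-window check. It is self-contained and does not subclass RateLimitBackend: `InMemoryBackend` and `allow` are hypothetical stand-ins, and the evict-count-record order is an assumption about the algorithm rather than the library's confirmed behaviour.

```python
import time
from collections import defaultdict, deque


class InMemoryBackend:
    """Stand-in exposing the same three methods as the RedisBackend skeleton."""

    def __init__(self):
        self._hits = defaultdict(deque)  # user_id -> timestamps, oldest first

    def get_timestamps(self, user_id):
        return list(self._hits[user_id])

    def record_request(self, user_id, timestamp):
        self._hits[user_id].append(timestamp)

    def evict_before(self, user_id, cutoff):
        # Drop timestamps that have fallen out of the window
        hits = self._hits[user_id]
        while hits and hits[0] < cutoff:
            hits.popleft()


def allow(backend, user_id, max_requests, window_seconds, now=None):
    """Assumed sliding-window check: evict old hits, count, record if under quota."""
    now = time.time() if now is None else now
    backend.evict_before(user_id, now - window_seconds)
    if len(backend.get_timestamps(user_id)) >= max_requests:
        return False
    backend.record_request(user_id, now)
    return True


backend = InMemoryBackend()
# Two requests allowed per 60 s; the fourth call arrives after the first two expire
decisions = [
    allow(backend, "u1", max_requests=2, window_seconds=60, now=t)
    for t in (0, 1, 2, 61.5)
]
```

A Redis-backed version would keep the same three operations but store timestamps in a shared structure so that every worker process sees the same counts.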

AuditLogger — pluggable sink

from aegis.layers import AuditLogger

# CloudWatch, Datadog, S3, database — anything callable
AuditLogger(sink=lambda record: my_logger.info(record))
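
As one possible sink, the sketch below writes each audit record as a single JSON line through the standard `logging` module. The shape of the record passed to the sink is not documented here, so `json_sink` (a hypothetical name) handles a dict, an object with attributes, or anything else via `str()`.

```python
import json
import logging

audit_log = logging.getLogger("audit")


def json_sink(record):
    """Serialize an audit record to one JSON line and hand it to logging.

    The record's shape is an assumption: a dict is dumped as-is, an object
    with attributes via vars(), and anything else falls back to str().
    """
    if isinstance(record, dict):
        payload = record
    elif hasattr(record, "__dict__"):
        payload = vars(record)
    else:
        payload = {"record": str(record)}
    # default=str keeps non-JSON values (datetimes, enums) from raising
    line = json.dumps(payload, default=str, sort_keys=True)
    audit_log.info(line)
    return line


example = json_sink({"user_id": "u1", "decision": "pass"})
```

Under these assumptions it would be wired in as `AuditLogger(sink=json_sink)`, with handlers on the `audit` logger deciding where the lines end up.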

Running tests

pip install -e ".[dev]"
pytest

License

MIT
