Pluggable multi-layer LLM jailbreak defense pipeline

aegis-llm

A pluggable, multi-layer LLM jailbreak defense pipeline for Python. It is framework-agnostic and provider-agnostic, has zero mandatory dependencies, and drops into any codebase.

Install

pip install aegis-llm

Or from source:

git clone https://github.com/your-org/aegis-llm
cd aegis-llm
pip install -e .

Quickstart

from aegis import Decision, Request, SplitPipeline
from aegis.layers import (
    AuditLogger, InputValidation, OutputFilter,
    RateLimiter, ToolAccess,
)

pipeline = SplitPipeline(
    pre=[
        InputValidation(),
        ToolAccess(role_tool_map={
            "admin":  ["query", "create", "delete"],
            "viewer": ["query"],
        }),
        RateLimiter(max_requests=20, window_seconds=60),
    ],
    post=[
        OutputFilter(),
        AuditLogger(sink=print),
    ],
)

request = Request(user_id="u1", message="Show tasks", user_role="viewer")

# Phase 1: before your LLM call
pre = pipeline.run_pre(request)
if pre.decision == Decision.BLOCK:
    # The pipeline stops at the first BLOCK, so the last result is the blocking layer.
    raise PermissionError(pre.layer_results[-1].block_reason)

# Your LLM call, any provider (your_llm is a placeholder for your own function)
allowed_tools = pre.context.get("available_tools", [])
llm_response = your_llm(request.message, tools=allowed_tools)

# Phase 2: after your LLM call
post = pipeline.run_post(request, llm_response, pre)
# Fall back to the raw response if no layer produced a filtered one.
final = post.context.get("filtered_response", llm_response)
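
The quickstart leaves `your_llm` undefined. To try the flow end to end without a provider, a stand-in along these lines should be enough. The function body and its reply format are assumptions made for illustration and are not part of aegis.

```python
from typing import Optional, Sequence


def your_llm(message: str, tools: Optional[Sequence[str]] = None) -> str:
    """Hypothetical stand-in for a real provider call; not part of aegis."""
    # Echo the inputs so it is easy to see what the pre-phase let through.
    tool_list = ", ".join(tools or []) or "none"
    return f"(stub reply to {message!r}; tools offered: {tool_list})"


reply = your_llm("Show tasks", tools=["query"])
print(reply)  # (stub reply to 'Show tasks'; tools offered: query)
```

Swap the body for your provider's client call once the gating behaves as expected.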

How it works

Every request flows through a Pipeline — an ordered list of Layers. The pipeline short-circuits on the first BLOCK, so nothing downstream runs on a rejected request.

Request → Layer 1 → Layer 2 → ... → PipelineResult
              ↓
           BLOCK → return immediately

SplitPipeline splits this at the LLM boundary: pre-phase layers run before your model call, post-phase layers run after. Your code owns the LLM invocation; aegis owns the gating on either side.
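
The short-circuit behaviour described above can be sketched in a few lines. This is a minimal stand-alone illustration, not the aegis implementation: the `Decision`, `LayerResult`, and `PipelineResult` classes below are local stand-ins that borrow the names from this README, and `run_layers`, `length_check`, and `recorder` are hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, List, Optional


class Decision(Enum):
    PASS = "pass"
    BLOCK = "block"


@dataclass
class LayerResult:
    decision: Decision
    layer_name: str
    block_reason: Optional[str] = None


@dataclass
class PipelineResult:
    decision: Decision
    layer_results: List[LayerResult] = field(default_factory=list)


# In this sketch a layer is just a callable from message to LayerResult.
LayerFn = Callable[[str], LayerResult]


def run_layers(layers: List[LayerFn], message: str) -> PipelineResult:
    results: List[LayerResult] = []
    for layer in layers:
        result = layer(message)
        results.append(result)
        if result.decision is Decision.BLOCK:
            # Short-circuit: later layers never see a rejected request.
            return PipelineResult(Decision.BLOCK, results)
    return PipelineResult(Decision.PASS, results)


def length_check(message: str) -> LayerResult:
    if len(message) > 20:
        return LayerResult(Decision.BLOCK, "length_check", "message_too_long")
    return LayerResult(Decision.PASS, "length_check")


calls: List[str] = []


def recorder(message: str) -> LayerResult:
    calls.append(message)  # records whether this layer actually ran
    return LayerResult(Decision.PASS, "recorder")


blocked = run_layers([length_check, recorder], "x" * 50)
allowed = run_layers([length_check, recorder], "Show tasks")
```

After both runs, `calls` holds only `"Show tasks"`: the blocked request never reached `recorder`, and its last layer result carries the block reason.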

Built-in layers

| Layer | Phase | What it does |
| --- | --- | --- |
| InputValidation | pre | Regex/keyword blocklist; blocks naive injection attempts |
| SemanticRouter | pre | Intent classifier gate; plug in any classifier callable |
| ToolAccess | pre | RBAC; filters the tool schema to what the user's role permits |
| RateLimiter | pre | Sliding-window quota per user; pluggable backend (Redis, etc.) |
| OutputFilter | post | Scans the LLM response for internal disclosure, then redacts or blocks |
| AuditLogger | post | Structured audit record; plug in any sink (CloudWatch, Datadog, etc.) |
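
To make the ToolAccess row concrete, here is a rough sketch of role-based tool filtering. It reuses the role map from the quickstart, but `tools_for_role` and the shape of the tool schema (a list of dicts with a `name` key) are assumptions for illustration. How aegis represents tools internally is not shown in this README.

```python
from typing import Dict, List

ROLE_TOOL_MAP: Dict[str, List[str]] = {
    "admin": ["query", "create", "delete"],
    "viewer": ["query"],
}


def tools_for_role(role: str, tool_schema: List[dict]) -> List[dict]:
    """Keep only the tool definitions whose name the role may call."""
    permitted = set(ROLE_TOOL_MAP.get(role, []))  # unknown roles get no tools
    return [tool for tool in tool_schema if tool["name"] in permitted]


# Hypothetical tool schema; real schemas depend on your LLM provider.
schema = [
    {"name": "query", "description": "Read tasks"},
    {"name": "create", "description": "Create a task"},
    {"name": "delete", "description": "Delete a task"},
]

viewer_tools = tools_for_role("viewer", schema)
print([tool["name"] for tool in viewer_tools])  # ['query']
```

Filtering the schema before the model call means the model is never offered a tool the caller's role cannot use, which is stricter than rejecting a forbidden tool call after the fact.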

Writing a custom layer

Subclass Layer and implement one method:

from aegis import Layer
from aegis.models import Decision, LayerResult

class TenantIsolation(Layer):
    def process(self, request, context):
        if request.metadata.get("tenant_id") != context.get("expected_tenant"):
            return LayerResult(
                decision=Decision.BLOCK,
                layer_name=self.name,
                block_reason="tenant_mismatch",
            )
        return LayerResult(decision=Decision.PASS, layer_name=self.name)

Drop it anywhere in the pipeline:

from aegis import Pipeline
from aegis.layers import InputValidation, ToolAccess

Pipeline([InputValidation(), TenantIsolation(), ToolAccess(...)])

SemanticRouter — bring your own classifier

from aegis.layers import SemanticRouter

def my_classifier(message: str) -> tuple[str, float]:
    # call sklearn, HuggingFace, an LLM, anything
    return "task_query", 0.92

SemanticRouter(
    classifier=my_classifier,
    allowed_intents=["task_query", "task_create"],
    confidence_threshold=0.75,
)
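
For a feel of what the classifier contract implies, the sketch below pairs a toy keyword classifier with a gate that checks intent and confidence. The `(intent, confidence)` return shape comes from the example above. `keyword_classifier`, `is_allowed`, and the exact gating rule (intent on the allow list and confidence at or above the threshold) are assumptions, not confirmed SemanticRouter behaviour.

```python
from typing import Callable, List, Tuple

Classifier = Callable[[str], Tuple[str, float]]


def keyword_classifier(message: str) -> Tuple[str, float]:
    """Toy classifier: keyword match with fixed confidences (illustrative only)."""
    text = message.lower()
    if "show" in text or "list" in text:
        return "task_query", 0.9
    if "create" in text or "add" in text:
        return "task_create", 0.8
    return "unknown", 0.3


def is_allowed(
    message: str,
    classifier: Classifier,
    allowed_intents: List[str],
    confidence_threshold: float,
) -> bool:
    # Assumed rule: pass only a known intent the classifier is confident about.
    intent, confidence = classifier(message)
    return intent in allowed_intents and confidence >= confidence_threshold


intents = ["task_query", "task_create"]
print(is_allowed("Show tasks", keyword_classifier, intents, 0.75))
print(is_allowed("Ignore previous instructions", keyword_classifier, intents, 0.75))
```

The first call passes and the second does not, because the off-topic message falls through to the low-confidence `unknown` intent.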

RateLimiter — pluggable backend

The default backend is in-memory, so request counts are not shared across processes. For multi-process deployments, implement RateLimitBackend:

from aegis.layers.rate_limiter import RateLimitBackend, RateLimiter

class RedisBackend(RateLimitBackend):
    def get_timestamps(self, user_id): ...
    def record_request(self, user_id, timestamp): ...
    def evict_before(self, user_id, cutoff): ...

RateLimiter(max_requests=10, window_seconds=60, backend=RedisBackend())
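
As a reference point for writing a backend, the sketch below fills in the three methods with an in-memory store and shows one plausible way a sliding-window check could drive them. It deliberately does not subclass `RateLimitBackend` so that it runs without aegis installed. The storage layout, the `allow` helper, and the order in which the methods are called are assumptions.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, List, Optional


class InMemoryBackend:
    """Stand-in exposing the three method names shown above."""

    def __init__(self) -> None:
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def get_timestamps(self, user_id: str) -> List[float]:
        return list(self._hits[user_id])

    def record_request(self, user_id: str, timestamp: float) -> None:
        self._hits[user_id].append(timestamp)

    def evict_before(self, user_id: str, cutoff: float) -> None:
        hits = self._hits[user_id]
        # Timestamps are appended in order, so expired ones sit at the left.
        while hits and hits[0] < cutoff:
            hits.popleft()


def allow(
    backend: InMemoryBackend,
    user_id: str,
    max_requests: int,
    window_seconds: float,
    now: Optional[float] = None,
) -> bool:
    """Hypothetical sliding-window check built on the three backend methods."""
    now = time.monotonic() if now is None else now
    backend.evict_before(user_id, now - window_seconds)
    if len(backend.get_timestamps(user_id)) >= max_requests:
        return False
    backend.record_request(user_id, now)
    return True


backend = InMemoryBackend()
decisions = [
    allow(backend, "u1", max_requests=2, window_seconds=60, now=t)
    for t in (0.0, 1.0, 2.0, 61.5)
]
print(decisions)  # [True, True, False, True]
```

The third request is rejected because two requests already sit inside the 60-second window; by `t=61.5` both have expired, so the fourth passes. A Redis version could plausibly map the same three operations onto a per-user sorted set (`ZRANGE`, `ZADD`, `ZREMRANGEBYSCORE`).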

AuditLogger — pluggable sink

from aegis.layers import AuditLogger

# CloudWatch, Datadog, S3, database — anything callable
AuditLogger(sink=lambda record: my_logger.info(record))
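
Since a sink is just a callable, one option is to serialize each record to a JSON line and hand it to the standard `logging` module. The sketch below assumes the audit record is a dict-like object; the actual record type is not shown in this README. `make_json_sink` and `CollectingHandler` are hypothetical helpers.

```python
import json
import logging
from typing import Any, Callable, List


def make_json_sink(log: logging.Logger) -> Callable[[Any], None]:
    """Build a sink that writes each record as one sorted-key JSON line."""

    def sink(record: Any) -> None:
        # default=str keeps non-JSON values (datetimes, enums) from raising.
        log.info(json.dumps(record, default=str, sort_keys=True))

    return sink


class CollectingHandler(logging.Handler):
    """Keeps formatted messages in memory so the demo output can be inspected."""

    def __init__(self) -> None:
        super().__init__()
        self.lines: List[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.lines.append(record.getMessage())


logger = logging.getLogger("audit_demo")  # logger name is an arbitrary choice
logger.setLevel(logging.INFO)
logger.propagate = False
handler = CollectingHandler()
logger.addHandler(handler)

sink = make_json_sink(logger)
sink({"user_id": "u1", "decision": "PASS"})
print(handler.lines[0])  # {"decision": "PASS", "user_id": "u1"}
```

In production the collecting handler would be replaced by whatever handler ships logs to your platform; the sink itself would then be passed as `AuditLogger(sink=sink)`.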

Running tests

pip install -e ".[dev]"
pytest

