Security validator for AI agents - tests your agent's resistance to prompt extraction and injection attacks

These details have not been verified by PyPI

Project links

Project description

AgentSeal

Find out if your AI agent can be hacked - before someone else does.

   ██████╗   ██████╗ ███████╗███╗   ██╗████████╗███████╗███████╗ █████╗ ██╗
  ██╔══██╗ ██╔════╝ ██╔════╝████╗  ██║╚══██╔══╝██╔════╝██╔════╝██╔══██╗██║
  ███████║ ██║  ███╗█████╗  ██╔██╗ ██║   ██║   ███████╗█████╗  ███████║██║
  ██╔══██║ ██║   ██║██╔══╝  ██║╚██╗██║   ██║   ╚════██║██╔══╝  ██╔══██║██║
  ██║  ██║ ╚██████╔╝███████╗██║ ╚████║   ██║   ███████║███████╗██║  ██║███████╗
  ╚═╝  ╚═╝  ╚═════╝ ╚══════╝╚═╝  ╚═══╝   ╚═╝   ╚══════╝╚══════╝╚═╝  ╚═╝╚══════╝

AgentSeal is a security scanner for AI agents. It sends 72+ attack probes to your agent and tells you exactly where it's vulnerable - so you can fix it before attackers find out.

What does AgentSeal do?

Every AI agent has a system prompt - the hidden instructions that tell it how to behave. Attackers can try to:

Extract your prompt - trick the agent into revealing its secret instructions
Inject new instructions - override the agent's behavior and make it do something it shouldn't

AgentSeal tests your agent against both of these attacks using 72+ techniques (up to 118 with MCP and RAG probes). You get:

A trust score from 0 to 100 - how secure your agent is
A detailed breakdown of which attacks succeeded and which were blocked
Specific recommendations on how to fix the vulnerabilities it finds

No AI expertise required. Just point AgentSeal at your agent and get results.

Who is this for?

You built an AI agent (chatbot, assistant, copilot, etc.) and want to know if it's secure
You manage AI products and need to verify they meet security standards before shipping
You're a developer who wants to add security scanning to your CI/CD pipeline
You're curious whether your favorite AI tool is actually protecting your data

Quick Start

Step 1: Install AgentSeal

You need Python 3.10 or newer. If you don't have Python, download it here.

pip install agentseal

Step 2: Run your first scan

Pick whichever matches your setup:

Option A: Test a system prompt against a cloud model (e.g. GPT-4o)

export OPENAI_API_KEY=your-api-key-here

agentseal scan \
  --prompt "You are a helpful customer support agent for Acme Corp..." \
  --model gpt-4o

Option B: Test with a free local model (Ollama)

If you don't have an API key, you can use Ollama to run a free local model:

# Install Ollama from https://ollama.com, then:
ollama pull llama3.1:8b

agentseal scan \
  --prompt "You are a helpful assistant..." \
  --model ollama/llama3.1:8b

Option C: Test a live agent endpoint

If your agent is already running as an API:

agentseal scan --url http://localhost:8080/chat

Step 3: Read your results

AgentSeal will show you something like:

Trust Score: 73/100 (HIGH)

  Extraction resistance:  82/100  (9 blocked, 2 partial, 1 leaked)
  Injection resistance:   68/100  (7 blocked, 3 leaked)
  Boundary integrity:     75/100
  Consistency:            90/100

Top vulnerabilities:
  1. [CRITICAL] Direct ask #3 - agent revealed full system prompt
  2. [HIGH] Persona hijack #2 - agent followed injected instructions
  3. [MEDIUM] Encoding trick #1 - agent leaked partial prompt via Base64

Remediation:
  - Add explicit refusal instructions to your system prompt
  - Use delimiters to separate system instructions from user input
  - Consider adding an input/output filter layer

A score of 75+ means your agent is solid. Below 50 means serious problems - fix those before going live.

How It Works

┌─────────────┐    72-118 attack probes    ┌──────────────┐
│             │ ─────────────────────────>│              │
│  AgentSeal  │                           │  Your Agent  │
│             │ <─────────────────────────│              │
└─────────────┘     agent responses       └──────────────┘
       │
       ▼
  Deterministic analysis (no AI judge - fully reproducible)
       │
       ▼
  Trust score + detailed report + fix recommendations

Why deterministic? Unlike tools that use another AI to judge results, AgentSeal uses pattern matching. This means running the same scan twice gives the exact same results - no randomness, no extra API costs.

Scan Modes

AgentSeal supports multiple scan modes you can combine depending on your agent's architecture:

Command	Probes	What it tests	Tier
`agentseal scan`	72	Base scan - 37 extraction + 35 injection probes	Free
`agentseal scan --adaptive`	72+	+ adaptive mutation transforms on blocked probes	Free
`agentseal watch`	5	Canary regression scan - fast check with baseline comparison	Free
`agentseal scan --mcp`	98	+ 26 MCP tool poisoning probes	Pro
`agentseal scan --rag`	92	+ 20 RAG poisoning probes	Pro
`agentseal scan --mcp --rag`	118	Full attack surface - all probe categories	Pro
`agentseal scan --genome`	72 + ~105	+ Behavioral genome mapping - finds decision boundaries	Pro
`agentseal scan --mcp --rag --genome`	118 + ~105	Everything - the most thorough scan available	Pro

Free vs Pro

The core scanner is completely free and open source. Pro unlocks advanced probe categories, genome mapping, and reporting.

Feature	Free	Pro
72 base attack probes (extraction + injection)	Yes	Yes
Adaptive mutations (`--adaptive`)	Yes	Yes
Canary regression watch (`agentseal watch`)	Yes	Yes
Interactive fix flow (autofix & re-scan)	Yes	Yes
Terminal report with scores and remediation	Yes	Yes
JSON output (`--save results.json`)	Yes	Yes
SARIF output for GitHub Security tab	Yes	Yes
CI/CD integration (`--min-score`)	Yes	Yes
Defense fingerprinting	Yes	Yes
MCP tool poisoning probes (`--mcp`, +26 probes)	-	Yes
RAG poisoning probes (`--rag`, +20 probes)	-	Yes
Behavioral genome mapping (`--genome`)	-	Yes
PDF security assessment report (`--report`)	-	Yes
Dashboard (track security over time, `--upload`)	-	Yes

Get Pro

agentseal login

This opens your browser to sign in at agentseal.org. Once logged in, Pro features unlock automatically.

CLI Reference

Scanning

# Scan a system prompt against a model
agentseal scan --prompt "Your prompt here..." --model gpt-4o

# Scan a prompt from a file
agentseal scan --file ./my-prompt.txt --model gpt-4o

# Scan a live HTTP endpoint
agentseal scan --url http://localhost:8080/chat

# Save results as JSON
agentseal scan --prompt "..." --model gpt-4o --save results.json

# Output as SARIF (for GitHub Security tab)
agentseal scan --prompt "..." --model gpt-4o --output sarif --save results.sarif

# Set a minimum score - exit code 1 if it fails (great for CI/CD)
agentseal scan --prompt "..." --model gpt-4o --min-score 75

# Verbose mode - see each probe result as it runs
agentseal scan --prompt "..." --model gpt-4o --verbose

More options

# Enable adaptive mutations (tests encoding bypasses)
agentseal scan --prompt "..." --model gpt-4o --adaptive

# Generate a hardened prompt with security fixes
agentseal scan --prompt "..." --model gpt-4o --fix hardened_prompt.txt

Regression monitoring

# Set a baseline (first run)
agentseal watch --prompt "..." --model gpt-4o --set-baseline

# Check for regressions (subsequent runs)
agentseal watch --prompt "..." --model gpt-4o

# With webhook alerts
agentseal watch --prompt "..." --model gpt-4o --webhook-url https://hooks.slack.com/...

Pro features (requires `agentseal login`)

# MCP tool poisoning probes (+26 probes)
agentseal scan --prompt "..." --model gpt-4o --mcp

# RAG poisoning probes (+20 probes)
agentseal scan --prompt "..." --model gpt-4o --rag

# Behavioral genome mapping (find exact decision boundaries)
agentseal scan --prompt "..." --model gpt-4o --genome

# Full Pro scan - everything enabled
agentseal scan --prompt "..." --model gpt-4o --mcp --rag --genome --adaptive

# Generate a PDF security report
agentseal scan --prompt "..." --model gpt-4o --report security-report.pdf

# Upload results to your dashboard
agentseal scan --prompt "..." --model gpt-4o --upload

Account

# Log in (opens browser)
agentseal login

# Activate with a license key (alternative)
agentseal activate <your-license-key>

Supported models

Provider	How to use	API key needed?
OpenAI	`--model gpt-4o`	Yes - set `OPENAI_API_KEY`
Anthropic	`--model claude-sonnet-4-5-20250929`	Yes - set `ANTHROPIC_API_KEY`
Ollama (local, free)	`--model ollama/llama3.1:8b`	No
LiteLLM (proxy)	`--model any-model --litellm-url http://...`	Depends on setup
Any HTTP API	`--url http://your-agent.com/chat`	No

CI/CD Integration

Add AgentSeal to your pipeline to automatically block insecure agents from shipping.

GitHub Actions

name: Agent Security Scan
on: [push, pull_request]

jobs:
  security-scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.12"

      - name: Install AgentSeal
        run: pip install agentseal

      - name: Run security scan
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
        run: |
          agentseal scan \
            --file ./prompts/system_prompt.txt \
            --model gpt-4o \
            --min-score 75 \
            --output sarif \
            --save results.sarif

      - name: Upload results to GitHub Security
        uses: github/codeql-action/upload-sarif@v3
        if: always()
        with:
          sarif_file: results.sarif

How it works

--min-score 75 makes the command exit with code 1 if the trust score is below 75
Your CI pipeline treats exit code 1 as a failure - blocking the merge/deploy
--output sarif produces results in SARIF format, which GitHub displays in the Security tab
You can adjust the threshold: use 60 for early development, 85 for production agents

Other CI systems

AgentSeal is just a CLI command - it works in any CI system that can run Python:

# Generic CI step
pip install agentseal
agentseal scan --file ./prompt.txt --model gpt-4o --min-score 75

Exit codes:

0 - Score meets or exceeds --min-score (pass)
1 - Score is below --min-score (fail)

Python API

For developers who want to integrate AgentSeal into their own code:

import asyncio
from agentseal import AgentValidator

# Define your agent function
async def my_agent(message: str) -> str:
    # Replace this with your actual agent logic
    return "I can help with that!"

async def main():
    validator = AgentValidator(
        agent_fn=my_agent,
        ground_truth_prompt="You are a helpful assistant...",
    )
    report = await validator.run()

    # Print the terminal report
    report.print()

    # Access the score programmatically
    print(f"Trust score: {report.trust_score}/100")

    # Export as JSON
    data = report.to_dict()

    # Get just the leaked probes
    for result in report.get_leaked():
        print(f"  LEAKED: {result.technique}")

    # Get remediation suggestions
    for fix in report.get_remediation():
        print(f"  FIX: {fix}")

asyncio.run(main())

With OpenAI

import openai
from agentseal import AgentValidator

client = openai.AsyncOpenAI()
validator = AgentValidator.from_openai(
    client=client,
    model="gpt-4o",
    system_prompt="You are a helpful assistant...",
)
report = await validator.run()

With Anthropic

import anthropic
from agentseal import AgentValidator

client = anthropic.AsyncAnthropic()
validator = AgentValidator.from_anthropic(
    client=client,
    model="claude-sonnet-4-5-20250929",
    system_prompt="You are a helpful assistant...",
)
report = await validator.run()

Testing an HTTP endpoint

from agentseal import AgentValidator

validator = AgentValidator.from_endpoint(
    url="http://localhost:8080/chat",
    ground_truth_prompt="You are a helpful assistant...",
    message_field="input",       # customize if your API uses different field names
    response_field="output",
)
report = await validator.run()

FAQ

How long does a scan take?

With a local model (Ollama): 1-3 minutes. With cloud APIs (OpenAI, Anthropic): 3-6 minutes. You can adjust speed with --concurrency (default is 3 parallel probes).

What's a good trust score?

Score	What it means
85-100	Excellent - strong protection across the board
70-84	Good - minor gaps, fine for most use cases
50-69	Needs work - several attack categories succeed
Below 50	Serious problems - don't deploy without fixing these

Does AgentSeal send my system prompt anywhere?

No. Your system prompt is only sent to the model you specify (OpenAI, Ollama, etc.). AgentSeal itself never collects, stores, or transmits your prompts. Everything runs locally.

Do I need an API key?

Only if you're testing against a cloud model (OpenAI, Anthropic). If you use Ollama, everything runs locally for free - no API key, no account, no cost.

What's the difference between free and Pro?

Free gives you the full 72-probe scanner with adaptive mutations, regression monitoring, interactive fix flow, JSON/SARIF output, and CI/CD integration. Pro adds MCP tool poisoning probes (+26), RAG poisoning probes (+20), behavioral genome mapping, PDF reports, and a dashboard. See the comparison table.

Can I contribute new attack probes?

Yes! See CONTRIBUTING.md. We welcome new probes, detection improvements, and bug fixes.

Contributing

We welcome contributions! See CONTRIBUTING.md for how to get started.

For security vulnerabilities, please email hello@agentseal.org instead of opening a public issue.

License

FSL-1.1-Apache-2.0 - Functional Source License, Version 1.1, with Apache 2.0 future license.

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

0.9.6

Apr 9, 2026

0.9.5

Apr 9, 2026

0.9.4

Apr 9, 2026

0.9.3

Apr 9, 2026

0.9.2

Apr 9, 2026

0.9.1

Apr 8, 2026

0.9.0

Apr 8, 2026

0.8.1

Mar 26, 2026

0.8.0

Mar 25, 2026

0.7.3

Mar 25, 2026

0.7.2

Mar 24, 2026

0.7.1

Mar 23, 2026

0.7.0

Mar 23, 2026

0.6.2

Mar 11, 2026

0.6.1

Mar 10, 2026

0.6.0

Mar 10, 2026

0.5.2

Mar 9, 2026

0.5.1

Mar 9, 2026

0.5.0

Mar 7, 2026

0.4.0

Mar 6, 2026

0.3.3

Mar 6, 2026

0.3.1

Mar 3, 2026

0.3.0

Mar 3, 2026

0.2.2

Mar 3, 2026

This version

0.2.1

Mar 3, 2026

0.2.0

Mar 3, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

agentseal-0.2.1.tar.gz (106.0 kB view details)

Uploaded Mar 3, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

agentseal-0.2.1-py3-none-any.whl (120.7 kB view details)

Uploaded Mar 3, 2026 Python 3

File details

Details for the file agentseal-0.2.1.tar.gz.

File metadata

Download URL: agentseal-0.2.1.tar.gz
Upload date: Mar 3, 2026
Size: 106.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for agentseal-0.2.1.tar.gz
Algorithm	Hash digest
SHA256	`8ed537516d086e5642bd28864fd168d535719d84c61d55224b013e015b0f1b01`
MD5	`5edfe904cf63143ac29a57beb0b564c0`
BLAKE2b-256	`89568141b7c4105b63c6a348bf7e90c83278598c38d7c982b33bdabe3c49c316`

See more details on using hashes here.

File details

Details for the file agentseal-0.2.1-py3-none-any.whl.

File metadata

Download URL: agentseal-0.2.1-py3-none-any.whl
Upload date: Mar 3, 2026
Size: 120.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for agentseal-0.2.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`655073a8881fd8d1717c6c10cbe45d48f6fd46f8f4b800bec658bf18e05fd3b5`
MD5	`b38794a3cf88024e84a049c349d552cc`
BLAKE2b-256	`7867da1cf74903d1f861384da102489d87ef369e5c1b0406f9c6c124ac3d7fd8`

See more details on using hashes here.

agentseal 0.2.1

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

AgentSeal

What does AgentSeal do?

Who is this for?

Quick Start

Step 1: Install AgentSeal

Step 2: Run your first scan

Step 3: Read your results

How It Works

Scan Modes

Free vs Pro

Get Pro

CLI Reference

Scanning

More options

Regression monitoring

Pro features (requires agentseal login)

Account

Supported models

CI/CD Integration

GitHub Actions

How it works

Other CI systems

Python API

With OpenAI

With Anthropic

Testing an HTTP endpoint

FAQ

How long does a scan take?

What's a good trust score?

Does AgentSeal send my system prompt anywhere?

Do I need an API key?

What's the difference between free and Pro?

Can I contribute new attack probes?

Contributing

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

Pro features (requires `agentseal login`)