AutoGPT TrustGuard Integration

Protect AutoGPT agents from prompt injection, malicious web content, and other AI security threats.

⚠️ Note on AutoGPT Architecture

AutoGPT has moved from its legacy plugin system to a component-based architecture. External components are not yet easily distributable, so you may need to:

  1. Copy this package into your AutoGPT installation, or
  2. Use the standalone functions/hooks approach

Installation

pip install autogpt-trustguard

Or copy the autogpt_trustguard folder into your AutoGPT project.

Quick Start

Option 1: TrustGuard Component

Use the component for full integration:

from autogpt_trustguard import TrustGuardComponent

# Create the component
trustguard = TrustGuardComponent(
    api_key="ta_xxx...",
    on_threat="block"  # "block", "warn", or "sanitize"
)

# Use in your agent
class MyAgent:
    def __init__(self):
        self.trustguard = trustguard
    
    def browse_web(self, url):
        # Fetch and scan in one step
        safe_content = self.trustguard.fetch_url(url)
        return self.process(safe_content)
    
    def read_file(self, path):
        # Read the raw file with your own I/O, then scan it before use
        with open(path) as f:
            content = f.read()
        safe_content = self.trustguard.scan_document(content, filename=path)
        return safe_content

Option 2: Command Registration

Register TrustGuard commands with AutoGPT:

from autogpt_trustguard import register_commands

# During agent initialization
register_commands(agent.command_registry, api_key="ta_xxx...")

# Now your agent can use these commands:
# - scan_web_content(content, source_url)
# - scan_document(content, filename)
# - scan_url(url)
# - scan_memory(content, context)

Option 3: Hook-Based Protection

Use hooks to automatically scan all relevant commands:

from autogpt_trustguard import TrustGuardHooks

# Create hooks
hooks = TrustGuardHooks(
    api_key="ta_xxx...",
    on_threat="block",
    scan_web_results=True,
    scan_document_results=True,
    scan_memory_inputs=True,
)

# Register with AutoGPT (method depends on your version)
agent.register_pre_command_hook(hooks.pre_command)
agent.register_post_command_hook(hooks.post_command)

# Now all web browsing, file reading, and memory storage are automatically scanned!

Option 4: Standalone Functions

Use scanning functions directly in your code:

from autogpt_trustguard import scan_url, scan_document, scan_memory

# Fetch and scan a URL
result = scan_url("https://example.com/article")
if result["safe"]:
    content = result["content"]
    process(content)
else:
    print(f"Blocked: {result['threats']}")

# Scan a document
result = scan_document(file_content, filename="report.pdf")
if result["safe"]:
    analyze(file_content)

# Scan before storing in memory
result = scan_memory(user_input, context="User chat message")
if result["safe"]:
    memory.store(user_input)

Component API

TrustGuardComponent

component = TrustGuardComponent(
    api_key="ta_xxx...",        # Your TrustGuard API key
    timeout=30.0,                # Request timeout in seconds
    strict_mode=False,           # True = block on MEDIUM threats
    on_threat="block",           # "block", "warn", or "sanitize"
    enabled=True,                # Toggle scanning on/off
)

# Methods
result = component.scan(content, source_type="web")
safe_content = component.scan_or_raise(content, source_type="document")
safe_content = component.scan_web(content, source_url="...")
safe_content = component.scan_document(content, filename="...")
safe_content = component.fetch_url(url)
safe_content = component.scan_memory_content(content)
is_safe = component.is_safe(content)
stats = component.get_stats()

TrustGuardHooks

hooks = TrustGuardHooks(
    api_key="ta_xxx...",
    on_threat="block",           # "block", "warn", "sanitize"
    scan_web_results=True,       # Scan web command results
    scan_document_results=True,  # Scan file command results
    scan_memory_inputs=True,     # Scan before memory storage
    strict_mode=False,           # Block on MEDIUM threats
)

# Hook methods (register with AutoGPT)
hooks.pre_command(command_name, arguments)  # Returns (name, args)
hooks.post_command(command_name, result)    # Returns result

Commands Automatically Scanned

When using hooks, these command types are automatically protected:

Web Commands (results scanned):

  • browse_website, browse_web
  • fetch_url, scrape_website
  • google_search, search_web

Document Commands (results scanned):

  • read_file, read_document
  • analyze_code, list_files

Memory Commands (inputs scanned):

  • add_memory, store_memory
  • save_memory, update_memory
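The routing logic behind these lists can be pictured with a minimal stand-in. This is not the real TrustGuardHooks internals; the command sets mirror the lists above, and `post_command`/`toy_scan` are illustrative names:

```python
# Illustrative stand-in: a post-command hook that routes results to a
# scanner based on command name. Web and document command results are
# scanned; everything else passes through untouched.
WEB_COMMANDS = {"browse_website", "browse_web", "fetch_url",
                "scrape_website", "google_search", "search_web"}
DOCUMENT_COMMANDS = {"read_file", "read_document", "analyze_code", "list_files"}

def post_command(command_name, result, scan):
    """Scan results of web and document commands; pass others through."""
    if command_name in WEB_COMMANDS:
        return scan(result, source_type="web")
    if command_name in DOCUMENT_COMMANDS:
        return scan(result, source_type="document")
    return result

def toy_scan(content, source_type):
    # Toy scanner: block content containing an obvious injection marker.
    if "IGNORE PREVIOUS INSTRUCTIONS" in content:
        raise ValueError(f"Threat detected in {source_type} content")
    return content
```

Memory commands are the mirror image: their inputs would be checked in the pre-command hook, before anything reaches storage.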

Threat Types Detected

TrustGuard detects multiple threat categories:

  • Prompt Injection: Hidden instructions in web pages or documents
  • Jailbreak Attempts: Attempts to bypass agent restrictions
  • Data Exfiltration: Patterns designed to leak sensitive data
  • Memory Poisoning: Malicious content targeting agent memory
  • RAG Poisoning: Content designed to corrupt vector stores
  • Tool Description Poisoning: Malicious tool descriptions
  • Identity Manipulation: Attempts to override agent identity

Configuration via Environment Variables

You can set the API key via environment variable:

export TRUSTGUARD_API_KEY=ta_xxx...

Then omit the api_key parameter:

component = TrustGuardComponent()  # Uses env var
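The fallback presumably resolves the key along these lines; `resolve_api_key` is an illustrative helper, not part of the package API:

```python
import os

def resolve_api_key(explicit_key=None):
    """Prefer an explicitly passed key; otherwise read TRUSTGUARD_API_KEY."""
    key = explicit_key or os.environ.get("TRUSTGUARD_API_KEY")
    if not key:
        raise ValueError("No API key: pass api_key= or set TRUSTGUARD_API_KEY")
    return key
```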

Error Handling

from autogpt_trustguard import TrustGuardComponent
from autogpt_trustguard.component import ThreatDetectedError

component = TrustGuardComponent(api_key="...", on_threat="block")

try:
    content = component.fetch_url("https://malicious-site.com")
except ThreatDetectedError as e:
    print(f"Blocked: {e.reasoning}")
    print(f"Threats: {e.threats}")
    print(f"Severity: {e.threat_level}")
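Only "block" raises; the other two modes return content. Their behavior can be sketched as follows, assuming a scan-result dict shaped like `{"safe": bool, "threats": [...], "sanitized": str}`; that shape and `apply_policy` are illustrative, not the package's actual API:

```python
import warnings

def apply_policy(result, content, on_threat="block"):
    """Map a scan result to the three on_threat behaviors."""
    if result["safe"]:
        return content
    if on_threat == "block":
        raise RuntimeError(f"Blocked: {result['threats']}")
    if on_threat == "warn":
        warnings.warn(f"Threats detected: {result['threats']}")
        return content  # let it through, but loudly
    if on_threat == "sanitize":
        return result.get("sanitized", "")  # cleaned content, or empty
    raise ValueError(f"Unknown on_threat mode: {on_threat}")
```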

Integration with AutoGPT Forge

If using AutoGPT Forge to build custom agents:

from forge.agent import Agent
from autogpt_trustguard import TrustGuardComponent

class SecureAgent(Agent):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.trustguard = TrustGuardComponent(
            api_key="ta_xxx...",
            on_threat="block"
        )
    
    async def execute_step(self, task, step):
        # Your logic here, using self.trustguard for protection
        ...

Statistics and Monitoring

Track scanning activity:

stats = component.get_stats()
print(f"Total scans: {stats['scans_total']}")
print(f"Safe content: {stats['scans_safe']}")
print(f"Blocked threats: {stats['scans_blocked']}")
print(f"Errors: {stats['scans_errored']}")

# Get recent threats
threats = component.get_recent_threats(limit=10)
for threat in threats:
    print(f"{threat['source_type']}: {threat['threats']}")

License

MIT License - see LICENSE file for details.
