TrustGuard security integration for AutoGPT agents

AutoGPT TrustGuard Integration

Protect AutoGPT agents from prompt injection, malicious web content, and other AI security threats.

⚠️ Note on AutoGPT Architecture

AutoGPT has moved from a plugin system to a Component-based architecture. External components aren't easily distributable yet, so you may need to either copy this package into your AutoGPT installation or use the standalone functions/hooks approach.

Installation

pip install autogpt-trustguard

Or copy the autogpt_trustguard folder into your AutoGPT project.

Quick Start

Option 1: TrustGuard Component

Use the component for full integration:

from pathlib import Path

from autogpt_trustguard import TrustGuardComponent

# Create the component
trustguard = TrustGuardComponent(
    api_key="ta_xxx...",
    on_threat="block"  # "block", "warn", or "sanitize"
)

# Use in your agent
class MyAgent:
    def __init__(self):
        self.trustguard = trustguard

    def browse_web(self, url):
        # Fetch and scan in one step
        safe_content = self.trustguard.fetch_url(url)
        # self.process is your agent's own handler for the scanned content
        return self.process(safe_content)

    def read_file(self, path):
        # Read the raw file content from disk
        content = Path(path).read_text(encoding="utf-8")
        # Scan document content before returning it to the agent
        safe_content = self.trustguard.scan_document(content, filename=path)
        return safe_content

Option 2: Command Registration

from autogpt_trustguard import register_commands

# During agent initialization
register_commands(agent.command_registry, api_key="ta_xxx...")

# Now your agent can use these commands:
# - scan_web_content(content, source_url)
# - scan_document(content, filename)
# - scan_url(url)
# - scan_memory(content, context)

Option 3: Hook-Based Protection

Use hooks to automatically scan all relevant commands:

from autogpt_trustguard import TrustGuardHooks

# Create hooks
hooks = TrustGuardHooks(
    api_key="ta_xxx...",
    on_threat="block",
    scan_web_results=True,
    scan_document_results=True,
    scan_memory_inputs=True,
)

# Register with AutoGPT (method depends on your version)
agent.register_pre_command_hook(hooks.pre_command)
agent.register_post_command_hook(hooks.post_command)

# Now all web browsing, file reading, and memory storage are scanned automatically.

Option 4: Standalone Functions

Use scanning functions directly in your code:

from autogpt_trustguard import scan_url, scan_document, scan_memory

# Fetch and scan a URL
result = scan_url("https://example.com/article")
if result["safe"]:
    content = result["content"]
    process(content)
else:
    print(f"Blocked: {result['threats']}")

# Scan a document
result = scan_document(file_content, filename="report.pdf")
if result["safe"]:
    analyze(file_content)

# Scan before storing in memory
result = scan_memory(user_input, context="User chat message")
if result["safe"]:
    memory.store(user_input)
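
The three checks above repeat the same `result["safe"]` test. One way to centralize it is a small helper, sketched here under the assumption that a scan result is a dict with the `safe`, `content`, and `threats` keys shown above. `require_safe` and `UnsafeContentError` are hypothetical names for illustration, not part of autogpt_trustguard.

```python
class UnsafeContentError(Exception):
    """Hypothetical error raised when a scan result is not safe."""

    def __init__(self, threats):
        super().__init__(f"Blocked: {threats}")
        self.threats = threats


def require_safe(result, fallback=None):
    """Return the scanned content if the result is safe, else raise.

    Assumes `result` is a dict with a boolean "safe" key and optional
    "content" and "threats" keys, as in the examples above.
    """
    if not result.get("safe", False):
        raise UnsafeContentError(result.get("threats", []))
    # Results that carry no "content" key fall back to the caller's
    # original input, passed in as `fallback`.
    return result.get("content", fallback)
```

With this helper, a call site such as `memory.store(require_safe(result, fallback=user_input))` stores only content that passed the scan.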

Component API

TrustGuardComponent

component = TrustGuardComponent(
    api_key="ta_xxx...",        # Your TrustGuard API key
    timeout=30.0,                # Request timeout in seconds
    strict_mode=False,           # True = block on MEDIUM threats
    on_threat="block",           # "block", "warn", or "sanitize"
    enabled=True,                # Toggle scanning on/off
)

# Methods
result = component.scan(content, source_type="web")
safe_content = component.scan_or_raise(content, source_type="document")
safe_content = component.scan_web(content, source_url="...")
safe_content = component.scan_document(content, filename="...")
safe_content = component.fetch_url(url)
safe_content = component.scan_memory_content(content)
is_safe = component.is_safe(content)
stats = component.get_stats()
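
The package does not document how `strict_mode` and `on_threat` combine, so the following is only a sketch of one plausible policy, not the library's implementation. It assumes a three-level ordering (LOW, MEDIUM, HIGH), a default threshold of HIGH, and a placeholder sanitizer. `is_actionable` and `apply_policy` are hypothetical names.

```python
import warnings

# Assumed ordering; the README only names the MEDIUM level explicitly.
LEVELS = {"LOW": 1, "MEDIUM": 2, "HIGH": 3}


def is_actionable(threat_level, strict_mode=False):
    """True if the level crosses the threshold (MEDIUM when strict, else HIGH)."""
    threshold = LEVELS["MEDIUM"] if strict_mode else LEVELS["HIGH"]
    return LEVELS[threat_level] >= threshold


def apply_policy(content, threat_level, on_threat="block", strict_mode=False):
    """Apply a hypothetical on_threat policy to already-classified content."""
    if not is_actionable(threat_level, strict_mode):
        return content
    if on_threat == "block":
        raise ValueError(f"blocked content with threat level {threat_level}")
    if on_threat == "warn":
        warnings.warn(f"threat level {threat_level} detected; passing content through")
        return content
    if on_threat == "sanitize":
        # Placeholder: a real sanitizer would strip only the offending spans.
        return "[content removed by scanner]"
    raise ValueError(f"unknown on_threat mode: {on_threat!r}")
```

Under these assumptions, a MEDIUM threat passes through unchanged by default and is acted on only when `strict_mode=True`.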

TrustGuardHooks

hooks = TrustGuardHooks(
    api_key="ta_xxx...",
    on_threat="block",           # "block", "warn", "sanitize"
    scan_web_results=True,       # Scan web command results
    scan_document_results=True,  # Scan file command results
    scan_memory_inputs=True,     # Scan before memory storage
    strict_mode=False,           # Block on MEDIUM threats
)

# Hook methods (register with AutoGPT)
hooks.pre_command(command_name, arguments)  # Returns (name, args)
hooks.post_command(command_name, result)    # Returns result
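
If your AutoGPT version exposes no hook registration methods, the same two calls can be wrapped around command execution by hand. The sketch below relies only on the return shapes noted above (`pre_command` returns `(name, args)`, `post_command` returns the result). `run_with_hooks` and `RecordingHooks` are hypothetical names, and `RecordingHooks` is a stand-in used only to show the call order.

```python
def run_with_hooks(hooks, execute, command_name, arguments):
    """Run one command with a pre-hook on its inputs and a post-hook on its result.

    `hooks` must provide pre_command(name, args) -> (name, args) and
    post_command(name, result) -> result. `execute` is your own callable
    that actually runs the command.
    """
    command_name, arguments = hooks.pre_command(command_name, arguments)
    result = execute(command_name, arguments)
    return hooks.post_command(command_name, result)


class RecordingHooks:
    """Stand-in for TrustGuardHooks that records calls and changes nothing."""

    def __init__(self):
        self.calls = []

    def pre_command(self, command_name, arguments):
        self.calls.append(("pre", command_name))
        return command_name, arguments

    def post_command(self, command_name, result):
        self.calls.append(("post", command_name))
        return result


hooks = RecordingHooks()
output = run_with_hooks(
    hooks,
    lambda name, args: f"fetched {args['url']}",
    "browse_website",
    {"url": "https://example.com"},
)
```

Swapping `RecordingHooks()` for a real `TrustGuardHooks` instance should work as long as its methods keep the signatures listed above.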

Commands Automatically Scanned

When using hooks, these command types are automatically protected:

Web Commands (results scanned):

browse_website, browse_web
fetch_url, scrape_website
google_search, search_web

Document Commands (results scanned):

read_file, read_document
analyze_code, list_files

Memory Commands (inputs scanned):

add_memory, store_memory
save_memory, update_memory
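
The grouping above amounts to a lookup from command name to scan category and phase. A minimal sketch of that lookup, using only the command names listed here, follows. The `classify_command` function and its return format are assumptions for illustration, and the package's internal matching may differ.

```python
WEB_COMMANDS = {
    "browse_website", "browse_web",
    "fetch_url", "scrape_website",
    "google_search", "search_web",
}
DOCUMENT_COMMANDS = {"read_file", "read_document", "analyze_code", "list_files"}
MEMORY_COMMANDS = {"add_memory", "store_memory", "save_memory", "update_memory"}


def classify_command(command_name):
    """Return (category, phase) for a scanned command, or None if it is not listed.

    phase is "result" when the command's output is scanned and "input"
    when its arguments are scanned before execution.
    """
    if command_name in WEB_COMMANDS:
        return ("web", "result")
    if command_name in DOCUMENT_COMMANDS:
        return ("document", "result")
    if command_name in MEMORY_COMMANDS:
        return ("memory", "input")
    return None
```

A command that is not in any of the three sets returns `None`, meaning it would pass through unscanned under this sketch.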

Threat Types Detected

TrustGuard detects multiple threat categories:

Prompt Injection: Hidden instructions in web pages or documents
Jailbreak Attempts: Attempts to bypass agent restrictions
Data Exfiltration: Patterns designed to leak sensitive data
Memory Poisoning: Malicious content targeting agent memory
RAG Poisoning: Content designed to corrupt vector stores
Tool Description Poisoning: Malicious tool descriptions
Identity Manipulation: Attempts to override agent identity

Configuration via Environment Variables

You can set the API key via the TRUSTGUARD_API_KEY environment variable:

export TRUSTGUARD_API_KEY=ta_xxx...

Then omit the api_key parameter:

component = TrustGuardComponent()  # Uses env var
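
The lookup this implies can be sketched as follows. The precedence (an explicit argument wins over the environment) and the error raised when neither is set are assumptions, and `resolve_api_key` is a hypothetical helper rather than a function exported by the package.

```python
import os


def resolve_api_key(api_key=None, env=None):
    """Pick the API key from the argument, else from TRUSTGUARD_API_KEY.

    `env` defaults to os.environ; passing a dict makes the lookup easy to test.
    """
    env = os.environ if env is None else env
    key = api_key or env.get("TRUSTGUARD_API_KEY")
    if not key:
        raise ValueError("No API key: pass api_key or set TRUSTGUARD_API_KEY")
    return key
```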

Error Handling

from autogpt_trustguard import TrustGuardComponent
from autogpt_trustguard.component import ThreatDetectedError

component = TrustGuardComponent(api_key="...", on_threat="block")

try:
    content = component.fetch_url("https://malicious-site.com")
except ThreatDetectedError as e:
    print(f"Blocked: {e.reasoning}")
    print(f"Threats: {e.threats}")
    print(f"Severity: {e.threat_level}")
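
When a blocked fetch should not abort the agent's step, the exception can be converted into a structured record and a `None` return. The sketch below uses a local stand-in for `ThreatDetectedError` that mirrors only the three attributes shown above, so it runs without the package installed. `fetch_or_none` and the record layout are hypothetical.

```python
class ThreatDetectedError(Exception):
    """Local stand-in mirroring the attributes used above (not the real class)."""

    def __init__(self, reasoning, threats, threat_level):
        super().__init__(reasoning)
        self.reasoning = reasoning
        self.threats = threats
        self.threat_level = threat_level


def fetch_or_none(fetch, url, on_blocked=None):
    """Call fetch(url); on a detected threat, report it and return None."""
    try:
        return fetch(url)
    except ThreatDetectedError as e:
        if on_blocked is not None:
            on_blocked({
                "url": url,
                "reasoning": e.reasoning,
                "threats": e.threats,
                "threat_level": e.threat_level,
            })
        return None
```

In real use you would import `ThreatDetectedError` from `autogpt_trustguard.component` instead of defining it, and pass `component.fetch_url` as `fetch`.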

Integration with AutoGPT Forge

If using AutoGPT Forge to build custom agents:

from forge.agent import Agent
from autogpt_trustguard import TrustGuardComponent

class SecureAgent(Agent):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.trustguard = TrustGuardComponent(
            api_key="ta_xxx...",
            on_threat="block"
        )
    
    async def execute_step(self, task, step):
        # Your logic here, using self.trustguard for protection
        ...

Statistics and Monitoring

Track scanning activity:

stats = component.get_stats()
print(f"Total scans: {stats['scans_total']}")
print(f"Safe content: {stats['scans_safe']}")
print(f"Blocked threats: {stats['scans_blocked']}")
print(f"Errors: {stats['scans_errored']}")

# Get recent threats
threats = component.get_recent_threats(limit=10)
for threat in threats:
    print(f"{threat['source_type']}: {threat['threats']}")
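
The raw counters are easier to monitor as rates. A minimal sketch, assuming `get_stats()` returns a dict with the four keys printed above, follows. `summarize_stats` is a hypothetical helper.

```python
def summarize_stats(stats):
    """Turn raw scan counters into blocked and error rates (0.0 when no scans ran)."""
    total = stats.get("scans_total", 0)
    if total == 0:
        return {"blocked_rate": 0.0, "error_rate": 0.0}
    return {
        "blocked_rate": stats.get("scans_blocked", 0) / total,
        "error_rate": stats.get("scans_errored", 0) / total,
    }
```

A rising `error_rate` is worth alerting on separately from `blocked_rate`, since errored scans say nothing about whether the content was safe.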

Support

Documentation: https://trustagents.dev/docs
Issues: https://github.com/trustagents/autogpt-trustguard/issues
Discord: https://discord.gg/trustagents

License

MIT License - see LICENSE file for details.


This version: 0.1.0 (Feb 7, 2026)

Download files


Source Distribution

autogpt_trustguard-0.1.0.tar.gz (13.8 kB)

Uploaded Feb 7, 2026 Source

Built Distribution


autogpt_trustguard-0.1.0-py3-none-any.whl (13.9 kB)

Uploaded Feb 7, 2026 Python 3

File details

Details for the file autogpt_trustguard-0.1.0.tar.gz.

File metadata

Download URL: autogpt_trustguard-0.1.0.tar.gz
Upload date: Feb 7, 2026
Size: 13.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for autogpt_trustguard-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`22829edfedf3c9314513ebf53dcaf31df0a43c29eaa5d5d50b11b7bd78c55258`
MD5	`e515f087221693e7748b8d1cf77ea00f`
BLAKE2b-256	`e6f806773f8c3a9f2391711bfebe20af81b05822305b2d58a445b5085f599050`
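
A downloaded archive can be checked against the SHA256 digest listed above with Python's standard `hashlib` module. This is a generic sketch, not a tool shipped with the package, and `sha256_of_stream`, `sha256_of`, and `verify_sha256` are hypothetical helper names.

```python
import hashlib


def sha256_of_stream(stream, chunk_size=65536):
    """Hash a binary file-like object in chunks and return the hex digest."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def sha256_of(path):
    """Hash the file at `path` without loading it into memory at once."""
    with open(path, "rb") as f:
        return sha256_of_stream(f)


def verify_sha256(path, expected_hex):
    """True if the file's SHA256 digest matches the published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For the source distribution, `verify_sha256("autogpt_trustguard-0.1.0.tar.gz", "22829edfedf3c9314513ebf53dcaf31df0a43c29eaa5d5d50b11b7bd78c55258")` should return `True` for an unmodified download.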


File details

Details for the file autogpt_trustguard-0.1.0-py3-none-any.whl.

File metadata

Download URL: autogpt_trustguard-0.1.0-py3-none-any.whl
Upload date: Feb 7, 2026
Size: 13.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for autogpt_trustguard-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`56849daedad18bd3375f3a4e89aa4553db6971f90b97115cb8fdd93dbbc6613d`
MD5	`0bc7008ab9e19173be7a3121a45666fb`
BLAKE2b-256	`6cc1d63b758689780b6e0b7fcbd019680be9b003200a0da4884720468ff9670d`

