watchllm

AI agent reliability testing platform

AI agent reliability testing infrastructure. Stress test your agents against 8 attack categories, including prompt injection, tool abuse, jailbreaks, and data exfiltration, before they reach production.

Observability, evaluation, and adversarial testing platform for LLM agents.
Ship reliable, secure, and debuggable agentic systems.

Installation

Install the official Python SDK via pip:

pip install watchllm

Authentication

WatchLLM requires an API key to securely log your simulations to your dashboard.

Option A: Interactive CLI Login (Recommended for Local Dev)

watchllm auth login

This will prompt you for your API key and securely save it to ~/.watchllm/config.

Option B: Environment Variables (Recommended for CI/CD)

Set the following environment variable in your pipeline or .env file:

export WATCHLLM_API_KEY="your_api_key_here"
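In CI it can help to fail fast when the key is missing, before any simulation starts. The sketch below is a minimal pre-flight check that reads only the WATCHLLM_API_KEY variable named above. The helper name require_api_key is hypothetical and not part of the WatchLLM SDK.

```python
import os


def require_api_key(env=None):
    """Return the WatchLLM API key from the environment, or raise if it is unset.

    `require_api_key` is a hypothetical helper, not a WatchLLM SDK function.
    `env` defaults to os.environ; passing a dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    key = env.get("WATCHLLM_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "WATCHLLM_API_KEY is not set; export it in the pipeline "
            "or run `watchllm auth login` locally."
        )
    return key
```

Calling require_api_key() at the start of a CI job surfaces a missing secret as a clear error instead of a failed simulation later on.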

Running Simulations via CLI

The fastest way to test an agent is with the WatchLLM CLI.

# Run a specific attack category
watchllm simulate --agent my_module.my_agent --categories prompt_injection

# Run multiple categories simultaneously
watchllm simulate --agent my_module.my_agent --categories prompt_injection,tool_abuse,hallucination

# Run all 8 attack categories
watchllm simulate --agent my_module.my_agent --categories all

Note: Replace my_module.my_agent with the Python import path to your agent function. Your agent function must accept a string and return a string.
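To check locally that a dotted path points at a function with the string-in, string-out contract described in the note, something like the following can be used. It is a sketch of one common way to resolve an import path with importlib; how the WatchLLM CLI resolves the path internally is not documented here, and load_agent and check_contract are hypothetical helper names.

```python
import importlib


def load_agent(dotted_path):
    """Resolve 'package.module.function' to a callable (hypothetical helper)."""
    module_name, _, attr = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.function', got {dotted_path!r}")
    module = importlib.import_module(module_name)
    agent = getattr(module, attr)
    if not callable(agent):
        raise TypeError(f"{dotted_path!r} is not callable")
    return agent


def check_contract(agent, probe="hello"):
    """Call the agent with a string and confirm that it returns a string."""
    result = agent(probe)
    if not isinstance(result, str):
        raise TypeError(f"agent returned {type(result).__name__}, expected str")
    return result
```

For example, check_contract(load_agent("my_module.my_agent")) raises early if the path is wrong or the function returns something other than a string.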

Native Python Integration (CI/CD)

For automated testing inside your test suite or CI/CD pipelines, wrap your agent with the @watchllm.test decorator. This exposes your agent's signature to the WatchLLM runner.

import watchllm

@watchllm.test(
    categories=["prompt_injection", "data_exfiltration"],
    threshold=0.3,   # Raises WatchLLMThresholdError if severity >= 0.3
)
def my_agent(user_input: str) -> str:
    # Your agent logic here; this placeholder simply echoes the input
    response = f"Echo: {user_input}"
    return response

If the measured severity reaches the threshold, the SDK raises WatchLLMThresholdError, so the pipeline fails and exits with a non-zero status code.
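The threshold comment in the example above describes an inclusive comparison: a severity equal to the threshold already counts as a failure. The sketch below is a local stand-in that illustrates only that comparison. It is not the SDK's implementation, and SeverityThresholdExceeded and enforce_threshold are hypothetical names.

```python
class SeverityThresholdExceeded(Exception):
    """Hypothetical stand-in for the SDK's WatchLLMThresholdError."""


def enforce_threshold(severity, threshold=0.3):
    """Raise when severity is at or above the threshold (inclusive)."""
    if severity >= threshold:
        raise SeverityThresholdExceeded(
            f"severity {severity:.2f} >= threshold {threshold:.2f}"
        )
    return severity
```

Under this reading, a lower threshold makes the gate stricter: with threshold=0.3, a severity of 0.29 passes while 0.30 fails.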

Analyzing Results

Once a simulation completes, the CLI prints the simulation's final state:

Simulation ID: sim_abc123
Status: completed
Severity: 0.90
Verdict: Critical vulnerability detected

To view the complete execution graph, scrub through the node lineage, and replay specific forks, visit the dashboard: https://dashboard.watchllm.dev
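If a script needs the values from the CLI summary shown above, the "Key: value" lines can be parsed into a dictionary. This sketch assumes the output matches the sample exactly; the real output format may differ or change between versions, and parse_summary is a hypothetical helper.

```python
def parse_summary(text):
    """Parse 'Key: value' lines into a dict with snake_case keys.

    Assumes the layout of the sample summary; lines without a colon are skipped.
    """
    summary = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            summary[key.strip().lower().replace(" ", "_")] = value.strip()
    if "severity" in summary:
        # Convert the score so it can be compared numerically.
        summary["severity"] = float(summary["severity"])
    return summary


sample = """Simulation ID: sim_abc123
Status: completed
Severity: 0.90
Verdict: Critical vulnerability detected"""

result = parse_summary(sample)
```

Here result["severity"] is a float, so a script can compare it against its own threshold.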

Command Reference

Command	Description
`watchllm doctor`	Verifies your setup, API key validity, and agent reachability.
`watchllm simulate`	Launches a new attack simulation.
`watchllm replay`	Prints a text-based tree of the execution graph directly in your terminal.
`watchllm status`	Returns the current completion percentage and live severity scores of a running simulation.
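The commands above can also be driven from a Python script, for example in a CI job. The sketch below builds the argument list for watchllm simulate using only the --agent and --categories flags shown earlier. The helper names are hypothetical, and treating a non-zero exit code as a failed run is an assumption about the CLI rather than documented behavior.

```python
import subprocess


def simulate_command(agent_path, categories):
    """Build the argv list for `watchllm simulate` (hypothetical helper).

    `categories` may be a single string such as "all" or a list of names.
    """
    if isinstance(categories, str):
        joined = categories
    else:
        joined = ",".join(categories)
    return ["watchllm", "simulate", "--agent", agent_path, "--categories", joined]


def run_simulation(agent_path, categories):
    """Run the CLI and return its exit code.

    Assumption: a non-zero exit code signals a failed or unsafe run.
    """
    completed = subprocess.run(simulate_command(agent_path, categories))
    return completed.returncode
```

Passing the arguments as a list avoids shell quoting issues when category names or module paths come from configuration.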

Requirements

Python 3.9 or higher
The requests library (installed automatically)

Release history

0.1.3 (Jun 28, 2026)
0.1.2 (Jun 28, 2026)
0.1.1 (May 12, 2026), this version

Download files

Source Distribution

watchllm-0.1.1.tar.gz (13.4 kB)

Uploaded May 12, 2026 Source

Built Distribution

watchllm-0.1.1-py3-none-any.whl (13.3 kB)

Uploaded May 12, 2026 Python 3

File details

Details for the file watchllm-0.1.1.tar.gz.

File metadata

Download URL: watchllm-0.1.1.tar.gz
Upload date: May 12, 2026
Size: 13.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for watchllm-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`e5e1fb6b69ad88d1b355a61b909f0eac77320bc5a44f2b7586556622d43dc541`
MD5	`a07509e5567fad459f897e201cd14f25`
BLAKE2b-256	`c8fa46f2000749e480a1dc7c6ede509d1ea8257fa5fd9b91f3922b631cb2a8cf`

File details

Details for the file watchllm-0.1.1-py3-none-any.whl.

File metadata

Download URL: watchllm-0.1.1-py3-none-any.whl
Upload date: May 12, 2026
Size: 13.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for watchllm-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d1e0cade54561d10eac5e8aada78a59732a46abf61cd9960283a7243b51ad63a`
MD5	`997d4729ea907a4a471512a1e13f17eb`
BLAKE2b-256	`573e52466efe5e1298c213a86cc24f1105d1e83707684aad32c4428f02619dd8`
