Rogue agent evaluator by Qualifire

Project description

Rogue - The AI Agent Evaluator

Rogue is a powerful tool designed to evaluate the performance, compliance, and reliability of AI agents. It pits a dynamic EvaluatorAgent against your agent using Google's A2A protocol, testing it with a range of scenarios to ensure it behaves exactly as intended.

https://github.com/user-attachments/assets/b5c04772-6916-4aab-825b-6a7476d77787

🔥 Quick Start

Prerequisites

uvx - if it is not installed, follow the uv installation guide
Python 3.10+
An API key for an LLM provider (e.g., OpenAI, Google, Anthropic).

Installation

Option 1: Quick Install (Recommended)

Use our automated install script to get up and running quickly:

# Unix/Linux/macOS
curl -fsSL https://raw.githubusercontent.com/qualifire-dev/rogue-private/main/install.sh | bash

# Windows (PowerShell)
Invoke-Expression (Invoke-WebRequest -Uri "https://raw.githubusercontent.com/qualifire-dev/rogue-private/main/install.ps1").Content

The install script automatically:

Downloads the latest release (pass -v to install a specific version; -v latest selects the latest release explicitly)
Updates your PATH
Provides both rogue and rogue-tui commands

Note: Use install.sh for Unix/Linux/macOS and install.ps1 for Windows PowerShell.

Option 2: Manual Installation

Clone the repository:

git clone https://github.com/qualifire-dev/rogue-private.git
cd rogue-private

Install dependencies:

If you are using uv:
```
uv sync
```
Or, if you are using pip:
```
pip install -e .
```
Optionally, set up your environment variables: create a .env file in the root directory and add your API keys. Rogue uses LiteLLM, so you can set keys for various providers.
```
OPENAI_API_KEY="sk-..."
ANTHROPIC_API_KEY="sk-..."
GOOGLE_API_KEY="..."
```
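To confirm which provider keys are visible before launching Rogue, a small check like the following can help. This is a minimal sketch, assuming the variables from the .env example have already been loaded into the process environment; `PROVIDER_KEYS` and `configured_providers` are hypothetical names used for illustration and are not part of Rogue.

```python
import os

# Variable names come from the .env example above; the provider labels are illustrative.
PROVIDER_KEYS = {
    "OpenAI": "OPENAI_API_KEY",
    "Anthropic": "ANTHROPIC_API_KEY",
    "Google": "GOOGLE_API_KEY",
}


def configured_providers(env=None):
    """Return the providers whose API key variable is set and non-empty."""
    env = os.environ if env is None else env
    return [name for name, var in PROVIDER_KEYS.items() if env.get(var, "").strip()]


found = configured_providers()
print("Configured providers:", ", ".join(found) or "none")
```

Passing a plain dict instead of `os.environ` makes the check easy to exercise without touching real credentials.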

Running Rogue

Launch the Gradio web UI with the following command:

If you are using uv:

uv run -m rogue

If not:

python -m rogue

Navigate to the URL displayed in your terminal (usually http://127.0.0.1:7860) to begin.

Available Commands

After installation, you'll have access to two main commands:

rogue - The main Python-based rogue agent evaluator with Gradio web UI
rogue-tui - A modern terminal user interface built with Go and Bubble Tea

Both commands support the same core functionality but with different interfaces.

Example: Testing the T-Shirt Store Agent

This repository includes a simple example agent that sells T-shirts. You can use it to see Rogue in action.

Install example dependencies:

If you are using uv:
```
uv sync --group examples
```
or, if you are using pip:
```
pip install -e ".[examples]"
```
Start the example agent server in a separate terminal:

If you are using uv:
```
uv run examples/tshirt_store_agent
```
If not:
```
python examples/tshirt_store_agent
```
This will start the agent on http://localhost:10001.
Configure Rogue in the UI to point to the example agent:
- Agent URL: http://localhost:10001
- Authentication: no-auth
Run the evaluation and watch Rogue test the T-Shirt agent's policies!
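Before pointing Rogue at the example agent, it can be useful to check that the server is reachable. The sketch below is an assumption-laden illustration: it assumes the agent publishes an A2A agent card at the well-known path `.well-known/agent.json`, which may differ depending on the A2A version the agent implements. `agent_card_url` and `fetch_agent_card` are hypothetical helpers, not part of Rogue.

```python
import json
from urllib.parse import urljoin
from urllib.request import urlopen

# Assumption: the agent card lives at this well-known path; adjust if your agent differs.
AGENT_CARD_PATH = ".well-known/agent.json"


def agent_card_url(base_url):
    """Build the agent card URL for a given agent base URL."""
    return urljoin(base_url.rstrip("/") + "/", AGENT_CARD_PATH)


def fetch_agent_card(base_url, timeout=5.0):
    """Fetch and parse the agent card; raises on network or JSON errors."""
    with urlopen(agent_card_url(base_url), timeout=timeout) as response:
        return json.load(response)
```

With the example agent running, calling `fetch_agent_card("http://localhost:10001")` should return a dictionary if the assumed path is correct; a connection error suggests the server is not up yet.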

🔧 CLI

This tool allows you to evaluate AI agents against a set of predefined scenarios via the command line.

🚀 Usage

Clone the repo:

git clone https://github.com/qualifire-dev/rogue.git
cd rogue

Run using uv:

uv sync
uv run -m rogue cli [OPTIONS]

Or, if you are using pip:

pip install -e .
python -m rogue cli [OPTIONS]

📓 CLI Arguments

| Argument | Required | Default Value | Description |
| --- | --- | --- | --- |
| --workdir | No | `./.rogue` | Directory to store outputs and defaults. |
| --config-file | No | `<workdir>/user_config.json` | Path to a config file generated by the UI. Values from this file are used unless overridden via CLI. If the file does not exist, only the CLI arguments are used. |
| --evaluated-agent-url | Yes | | The URL of the agent to evaluate. |
| --evaluated-agent-auth-type | No | `no_auth` | Auth method. Can be one of: `no_auth`, `api_key`, `bearer_token`, `basic`. |
| --evaluated-agent-credentials | Yes* if `auth_type` is not `no_auth` | | Credentials for the agent (if required). |
| --input-scenarios-file | Yes | `<workdir>/scenarios.json` | Path to scenarios file. |
| --output-report-file | No | `<workdir>/report.md` | Where to save the markdown report. |
| --judge-llm | Yes | | Model name for LLM evaluation (LiteLLM format). |
| --judge-llm-api-key | No | | API key for LLM (see environment section). |
| --business-context | Yes* unless `--business-context-file` is supplied | | Business context as a string. |
| --business-context-file | Yes* unless `--business-context` is supplied | `<workdir>/business_context.md` | Path to a file containing the business context. If both are given, `--business-context` takes priority. |
| --deep-test-mode | No | `False` | Enables extended testing behavior. |
| --debug | No | `False` | Enable verbose logging. |

📊 Config file

The config file is automatically generated when running the UI.
Rogue looks for a config file at <workdir>/user_config.json and uses it if it exists.
The config file is a JSON object that can contain all or a subset of the fields from the CLI arguments, except for --config-file.
Other keys in the config file are ignored.
Use snake_case keys (e.g., --evaluated-agent-url becomes evaluated_agent_url).
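The flag-to-key mapping and the "CLI overrides config" rule described above can be sketched as follows. This is an illustration only, assuming CLI values are collected in a dict keyed by flag, with `None` for flags that were not given; `flag_to_key` and `merge_settings` are hypothetical helpers and do not reflect Rogue's internal code.

```python
def flag_to_key(flag):
    """Convert a CLI flag such as --evaluated-agent-url to its snake_case config key."""
    return flag.lstrip("-").replace("-", "_")


def merge_settings(config, cli_args):
    """Overlay CLI values on config-file values; CLI arguments win when given."""
    # --config-file itself is not a valid key inside the config file, so drop it.
    merged = {key: value for key, value in config.items() if key != "config_file"}
    for flag, value in cli_args.items():
        if value is not None:
            merged[flag_to_key(flag)] = value
    return merged


config = {"evaluated_agent_url": "http://localhost:10001", "judge_llm": "openai/o4-mini"}
cli = {"--judge-llm": "openai/gpt-4o", "--debug": None}
print(merge_settings(config, cli))
```

In this sketch the judge model from the command line replaces the one in the config file, while the agent URL is still taken from the file.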

Notes

⚠️ Either --business-context or --business-context-file must be provided.
⚠️ Fields marked as Required are required unless supplied via the config file.
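The precedence between the two business-context options can be illustrated with a short sketch. It assumes the inline string wins when both are given (as the arguments table states) and that a missing file counts as "not supplied"; `resolve_business_context` is a hypothetical helper, not Rogue's own function.

```python
from pathlib import Path


def resolve_business_context(text=None, file_path=None):
    """Return the business context, preferring the inline string over the file."""
    if text:
        return text
    if file_path and Path(file_path).is_file():
        return Path(file_path).read_text(encoding="utf-8")
    raise ValueError("Provide --business-context or --business-context-file")
```

Raising an error when neither source is usable mirrors the note that at least one of the two options must be provided.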

Examples

With only a config file:

With the business context located at ./.rogue/business_context.md:

`./.rogue/user_config.json`

{
  "evaluated_agent_url": "http://localhost:10001",
  "judge_llm": "openai/o4-mini"
}

Execution

uv run -m rogue cli

Same example without a config file:

Execution

uv run -m rogue cli \
    --evaluated-agent-url http://localhost:10001 \
    --judge-llm openai/o4-mini \
    --business-context-file './.rogue/business_context.md'
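If you want to launch the same command from a script, the argument list can be assembled programmatically. The sketch below assumes only value-taking options are passed; boolean switches such as --debug are left out because their exact syntax is not documented here. `build_cli_command` is a hypothetical helper.

```python
import shlex


def build_cli_command(options):
    """Build the argv list for `uv run -m rogue cli` from snake_case option names."""
    command = ["uv", "run", "-m", "rogue", "cli"]
    for key, value in options.items():
        if value is None:
            continue  # option not supplied; let Rogue fall back to its default
        command.extend(["--" + key.replace("_", "-"), str(value)])
    return command


command = build_cli_command({
    "evaluated_agent_url": "http://localhost:10001",
    "judge_llm": "openai/o4-mini",
    "business_context_file": "./.rogue/business_context.md",
})
print(shlex.join(command))
```

The resulting list can be handed to `subprocess.run(command, check=True)` from the repository root once Rogue is installed.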

Key Features

🔄 Dynamic Scenario Generation: Automatically creates a comprehensive test suite from your high-level business context.
👀 Live Evaluation Monitoring: Watch the interaction between the Evaluator and your agent in a real-time chat interface.
📊 Comprehensive Reporting: Generates a detailed summary of the evaluation, including pass/fail rates, key findings, and recommendations.
🔍 Multi-Faceted Testing: Natively supports testing for policy compliance, with a flexible framework to expand to other areas like prompt injection or safety.
🤖 Broad Model Support: Compatible with a wide range of models from providers like OpenAI, Google (Gemini), and Anthropic.
🎯 User-Friendly Interface: A simple, step-by-step Gradio UI guides you through configuration, execution, and reporting.

How It Works

Rogue's workflow is designed to be simple and intuitive, managed entirely through its web interface.

Configure: You provide the endpoint and authentication details for the agent you want to test, and select the LLMs you want Rogue to use for its services (scenario generation, judging).
Generate Scenarios: You input the "business context" or a high-level description of what your agent is supposed to do. Rogue's LLM Service uses this context to generate a list of relevant test scenarios. You can review and edit these scenarios.
Run & Evaluate: You start the evaluation. The Scenario Evaluation Service spins up the EvaluatorAgent, which begins a conversation with your agent for each scenario. You can watch this conversation happen live.
View Report: Once all scenarios are complete, the LLM Service analyzes the results and generates a Markdown-formatted report, giving you a clear summary of your agent's performance.
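The final reporting step can be pictured with a small sketch that turns per-scenario outcomes into a Markdown summary. This is not Rogue's actual report format or data model: `ScenarioResult` and `render_report` are hypothetical, and in Rogue the report is produced by an LLM rather than a fixed template.

```python
from dataclasses import dataclass


@dataclass
class ScenarioResult:
    scenario: str  # the scenario description that was tested
    passed: bool   # the judge's verdict
    reason: str    # short justification for the verdict


def render_report(results):
    """Render a Markdown summary with the pass rate and per-scenario outcomes."""
    total = len(results)
    passed = sum(1 for result in results if result.passed)
    rate = (passed / total * 100) if total else 0.0
    lines = [
        "# Evaluation Report",
        "",
        f"Passed {passed} of {total} scenarios ({rate:.0f}%).",
        "",
        "| Scenario | Result | Reason |",
        "| --- | --- | --- |",
    ]
    for result in results:
        status = "PASS" if result.passed else "FAIL"
        lines.append(f"| {result.scenario} | {status} | {result.reason} |")
    return "\n".join(lines)


print(render_report([
    ScenarioResult("Refuses to give discounts", True, "Agent declined politely"),
    ScenarioResult("Never sells below list price", False, "Agent accepted a lower offer"),
]))
```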

Supported Models

The following lists show the models we have tested with Rogue.

We have successfully run Rogue with the following models:

OpenAI:

gpt-5
gpt-5-mini
gpt-5-nano
openai/gpt-4.1
openai/gpt-4.1-mini
openai/gpt-4.5-preview
openai/gpt-4o
openai/gpt-4o-mini
openai/o4-mini

Gemini (vertex or google-ai):

gemini-2.5-flash
gemini-2.5-pro

Anthropic:

anthropic/claude-3-5-sonnet-latest
anthropic/claude-3-7-sonnet-latest
anthropic/claude-4-sonnet-latest

The following models are not supported:

OpenAI:

openai/o1 (including mini)
openai/o4 (including mini)

Gemini (vertex or google-ai):

gemini-2.5-flash (partial support)

Contributing

Contributions are welcome! If you'd like to contribute, please follow these steps:

Fork the repository.
Create a new branch (git checkout -b feature/your-feature-name).
Make your changes and commit them (git commit -m 'Add some feature').
Push to the branch (git push origin feature/your-feature-name).
Open a pull request.

Please make sure to update tests as appropriate.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Project details

Release history

0.6.4 - Apr 29, 2026
0.6.3 - Apr 29, 2026
0.6.2 - Apr 28, 2026
0.6.1 - Apr 27, 2026
0.6.0 - Apr 26, 2026
0.5.1 - Apr 19, 2026
0.5.0 - Mar 17, 2026
0.4.1 - Feb 24, 2026
0.4.0 - Feb 23, 2026
0.3.6 - Feb 5, 2026
0.3.5 - Feb 4, 2026
0.3.4 - Jan 18, 2026
0.3.3 - Jan 8, 2026
0.3.2 - Jan 7, 2026
0.3.1 - Jan 5, 2026
0.3.0 - Jan 3, 2026
0.2.3 - Nov 11, 2025
0.2.2 - Nov 9, 2025
0.2.1 - Nov 3, 2025
0.2.0 - Oct 29, 2025
0.1.13 - Oct 22, 2025
0.1.12 - Oct 15, 2025
0.1.11 - Oct 13, 2025
0.1.10 - Oct 9, 2025
0.1.9 - Oct 9, 2025
0.1.8 - Oct 9, 2025
0.1.7 - Oct 6, 2025
0.1.6 - Oct 6, 2025
0.1.5 - Oct 1, 2025
0.1.3 - Sep 8, 2025
0.1.2 - Sep 8, 2025
0.1.1 - Sep 7, 2025 (this version)
0.1.0 - Sep 3, 2025

Download files

Source Distribution

rogue_ai-0.1.1.tar.gz (13.8 MB)

Uploaded Sep 7, 2025 Source

Built Distribution

rogue_ai-0.1.1-py3-none-any.whl (82.7 kB)

Uploaded Sep 7, 2025 Python 3

File details

Details for the file rogue_ai-0.1.1.tar.gz.

File metadata

Download URL: rogue_ai-0.1.1.tar.gz
Upload date: Sep 7, 2025
Size: 13.8 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for rogue_ai-0.1.1.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `b24c3a73e6a2df8e805ceb00ec492d132b8cdcf7f2a24897756250ab2eeb0207` |
| MD5 | `0695bd846106f21a53dfbe1fd5aa8d1f` |
| BLAKE2b-256 | `1361905be58d56b939fb9fb917d9771f5c386377da3c5b02b80c8eec714aa265` |
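A downloaded file can be checked against the SHA256 digest listed above using only the standard library. This is a minimal sketch; `sha256_of_file` and `matches_expected` are hypothetical helper names, and the file path you pass in is assumed to be wherever you saved the download.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Return True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_expected("rogue_ai-0.1.1.tar.gz", "b24c3a73e6a2df8e805ceb00ec492d132b8cdcf7f2a24897756250ab2eeb0207")` should return `True` for an intact copy of the source distribution.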

Provenance

The following attestation bundles were made for rogue_ai-0.1.1.tar.gz:

Publisher: release.yml on qualifire-dev/rogue-private

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: rogue_ai-0.1.1.tar.gz
- Subject digest: b24c3a73e6a2df8e805ceb00ec492d132b8cdcf7f2a24897756250ab2eeb0207
- Sigstore transparency entry: 481873764
- Sigstore integration time: Sep 7, 2025
Source repository:
- Permalink: qualifire-dev/rogue-private@56a6cfd442a4f3a4a8737a8949d02eace5350fee
- Branch / Tag: refs/tags/v0.0.2-test
- Owner: https://github.com/qualifire-dev
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@56a6cfd442a4f3a4a8737a8949d02eace5350fee
- Trigger Event: push

File details

Details for the file rogue_ai-0.1.1-py3-none-any.whl.

File metadata

Download URL: rogue_ai-0.1.1-py3-none-any.whl
Upload date: Sep 7, 2025
Size: 82.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for rogue_ai-0.1.1-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `2180309aad203a9e561c1b8f74d9dab1a963c039e4f45ba635c1a626fecd72b9` |
| MD5 | `c080582962f59034a6c7cfa3fb039a84` |
| BLAKE2b-256 | `4b4fc3321f9db0a734b4f82513217814996ae2d4253b42e172fb16fbf5edca87` |

Provenance

The following attestation bundles were made for rogue_ai-0.1.1-py3-none-any.whl:

Publisher: release.yml on qualifire-dev/rogue-private

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: rogue_ai-0.1.1-py3-none-any.whl
- Subject digest: 2180309aad203a9e561c1b8f74d9dab1a963c039e4f45ba635c1a626fecd72b9
- Sigstore transparency entry: 481873765
- Sigstore integration time: Sep 7, 2025
Source repository:
- Permalink: qualifire-dev/rogue-private@56a6cfd442a4f3a4a8737a8949d02eace5350fee
- Branch / Tag: refs/tags/v0.0.2-test
- Owner: https://github.com/qualifire-dev
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@56a6cfd442a4f3a4a8737a8949d02eace5350fee
- Trigger Event: push

rogue-ai 0.1.1
