Rogue agent evaluator by Qualifire
Rogue - The AI Agent Evaluator
Rogue is a powerful tool designed to evaluate the performance, compliance, and reliability of AI agents. It pits a dynamic EvaluatorAgent against your agent using Google's A2A protocol, testing it with a range of scenarios to ensure it behaves exactly as intended.
https://github.com/user-attachments/assets/b5c04772-6916-4aab-825b-6a7476d77787
🔥 Quick Start
Prerequisites
- uvx: if not installed, follow the uv installation guide
- Python 3.10+
- An API key for an LLM provider (e.g., OpenAI, Google, Anthropic).
Installation
Option 1: Quick Install (Recommended)
Use our automated install script to get up and running quickly:
# Unix/Linux/macOS
curl -fsSL https://raw.githubusercontent.com/qualifire-dev/rogue-private/main/install.sh | bash
# Windows (PowerShell)
Invoke-Expression (Invoke-WebRequest -Uri "https://raw.githubusercontent.com/qualifire-dev/rogue-private/main/install.ps1").Content
The install script automatically:
- Downloads the latest release (or a specific version with the -v flag, or explicitly with -v latest)
- Updates your PATH
- Provides both the rogue and rogue-tui commands
Note: Use install.sh for Unix/Linux/macOS and install.ps1 for Windows PowerShell.
Option 2: Manual Installation
- Clone the repository:
  git clone https://github.com/qualifire-dev/rogue-private.git
  cd rogue-private
- Install dependencies:
  If you are using uv:
  uv sync
  Or, if you are using pip:
  pip install -e .
- Optionally, set up your environment variables: create a .env file in the root directory and add your API keys. Rogue uses LiteLLM, so you can set keys for various providers.
  OPENAI_API_KEY="sk-..."
  ANTHROPIC_API_KEY="sk-..."
  GOOGLE_API_KEY="..."
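Before launching Rogue, it can help to confirm which provider keys are actually visible to the process. The snippet below is a minimal sketch, not part of Rogue: the helper name `configured_providers` is hypothetical, and it assumes the variables from the .env example have already been exported into the environment (a standalone script does not read .env on its own).

```python
import os

# Variable names taken from the .env example; extend the tuple as needed.
PROVIDER_KEYS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY", "GOOGLE_API_KEY")


def configured_providers(env=None):
    """Return the provider key names that are set to a non-empty value."""
    env = os.environ if env is None else env
    return [name for name in PROVIDER_KEYS if env.get(name, "").strip()]


if __name__ == "__main__":
    found = configured_providers()
    if found:
        print("Keys found:", ", ".join(found))
    else:
        print("No provider API keys found in the environment.")
```

Passing a plain dictionary instead of relying on `os.environ` keeps the helper easy to check without touching real credentials.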
Running Rogue
Launch the Gradio web UI with the following command:
If you are using uv:
uv run -m rogue
If not:
python -m rogue
Navigate to the URL displayed in your terminal (usually http://127.0.0.1:7860) to begin.
Available Commands
After installation, you'll have access to two main commands:
- rogue: The main Python-based Rogue agent evaluator with a Gradio web UI
- rogue-tui: A modern terminal user interface built with Go and Bubble Tea
Both commands support the same core functionality but with different interfaces.
Example: Testing the T-Shirt Store Agent
This repository includes a simple example agent that sells T-shirts. You can use it to see Rogue in action.
- Install example dependencies:
  If you are using uv:
  uv sync --group examples
  Or, if you are using pip:
  pip install -e .[examples]
- Start the example agent server in a separate terminal:
  If you are using uv:
  uv run examples/tshirt_store_agent
  If not:
  python examples/tshirt_store_agent
  This will start the agent on http://localhost:10001.
- Configure Rogue in the UI to point to the example agent:
  - Agent URL: http://localhost:10001
  - Authentication: no-auth
- Run the evaluation and watch Rogue test the T-Shirt agent's policies!
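If the evaluation cannot connect, a quick way to rule out the obvious cause is to check that something is listening on the agent's port. The following is a rough sketch using only the standard library; `agent_is_reachable` is a hypothetical helper, and it only tests the TCP connection, not whether the server speaks the A2A protocol.

```python
import socket
from urllib.parse import urlparse


def agent_is_reachable(url, timeout=2.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(url)
    if not parsed.hostname:
        return False
    # Fall back to the scheme's default port when the URL omits one.
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((parsed.hostname, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    url = "http://localhost:10001"
    print(url, "reachable" if agent_is_reachable(url) else "not reachable")
```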
🔧 CLI
This tool allows you to evaluate AI agents against a set of predefined scenarios via the command line.
🚀 Usage
Clone the repo:
git clone https://github.com/qualifire-dev/rogue.git
cd rogue
Run using uv:
uv sync
uv run -m rogue cli [OPTIONS]
Or, if you are using pip:
pip install -e .
python -m rogue cli [OPTIONS]
📓 CLI Arguments
| Argument | Required | Default Value | Description |
|---|---|---|---|
| --workdir | No | ./.rogue | Directory to store outputs and defaults. |
| --config-file | No | <workdir>/user_config.json | Path to a config file generated by the UI. Values from this file are used unless overridden via CLI. If the file does not exist, only CLI arguments are used. |
| --evaluated-agent-url | Yes | | The URL of the agent to evaluate. |
| --evaluated-agent-auth-type | No | no_auth | Auth method. Can be one of: no_auth, api_key, bearer_token, basic. |
| --evaluated-agent-credentials | Yes* if auth_type is not no_auth | | Credentials for the agent (if required). |
| --input-scenarios-file | Yes | <workdir>/scenarios.json | Path to scenarios file. |
| --output-report-file | No | <workdir>/report.md | Where to save the markdown report. |
| --judge-llm | Yes | | Model name for LLM evaluation (LiteLLM format). |
| --judge-llm-api-key | No | | API key for LLM (see environment section). |
| --business-context | Yes* unless --business-context-file is supplied | | Business context as a string. |
| --business-context-file | Yes* unless --business-context is supplied | <workdir>/business_context.md | Path to a file containing the business context (alternative to --business-context). If both are given, --business-context has priority. |
| --deep-test-mode | No | False | Enables extended testing behavior. |
| --debug | No | False | Enable verbose logging. |
📊 Config file
The config file is automatically generated when running the UI.
We will check for a config file in <workdir>/user_config.json and use it if it exists.
The config file is a JSON object that can contain all or a subset of the fields from the CLI arguments, except for --config-file.
Other keys in the config file are ignored.
Just remember to use snake_case keys. (e.g. --evaluated-agent-url becomes evaluated_agent_url).
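The precedence described here (config file values, overridden by CLI values) can be illustrated with a short sketch. The function names `flag_to_key`, `load_config`, and `merge_settings` are hypothetical and are not Rogue's internal API; the sketch also does not filter out unknown keys, which Rogue is described as ignoring.

```python
import json
from pathlib import Path


def flag_to_key(flag):
    """Convert a CLI flag such as --judge-llm to its snake_case config key."""
    return flag.lstrip("-").replace("-", "_")


def load_config(workdir=".rogue"):
    """Read <workdir>/user_config.json, returning {} when the file is absent."""
    path = Path(workdir) / "user_config.json"
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def merge_settings(config, cli_args):
    """Overlay CLI values (keyed by flag) on top of config-file values."""
    merged = dict(config)
    for flag, value in cli_args.items():
        if value is not None:  # unset CLI flags do not mask config values
            merged[flag_to_key(flag)] = value
    return merged
```

For example, `merge_settings(load_config(), {"--judge-llm": "openai/gpt-4o"})` would keep every value from the config file except `judge_llm`, which the CLI value replaces.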
Notes
- ⚠️ Either --business-context or --business-context-file must be provided.
- ⚠️ Fields marked as Required are required unless supplied via the config file.
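The requirement rules from the notes and the arguments table can be written down as a small check over the combined settings. This is only a sketch of the rules as documented, not Rogue's actual validation: `missing_settings` is a hypothetical helper, it expects snake_case keys with defaults already applied, and it leaves out --input-scenarios-file because that argument has a default path.

```python
def missing_settings(settings):
    """Return the names of required settings absent from a merged settings dict."""
    missing = [
        key
        for key in ("evaluated_agent_url", "judge_llm")
        if not settings.get(key)
    ]
    # Credentials are only needed when an auth method other than no_auth is used.
    auth_type = settings.get("evaluated_agent_auth_type", "no_auth")
    if auth_type != "no_auth" and not settings.get("evaluated_agent_credentials"):
        missing.append("evaluated_agent_credentials")
    # One of the two business-context options must be present.
    if not (settings.get("business_context") or settings.get("business_context_file")):
        missing.append("business_context or business_context_file")
    return missing
```

An empty list means the documented requirements are met; otherwise the list names what still has to be supplied via the CLI or the config file.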
Examples
With only a config file (and the business context located at ./.rogue/business_context.md):
./.rogue/user_config.json
{
  "evaluated_agent_url": "http://localhost:10001",
  "judge_llm": "openai/o4-mini"
}
Execution
uv run -m rogue cli
Same example without a config file:
Execution
uv run -m rogue cli \
--evaluated-agent-url http://localhost:10001 \
--judge-llm openai/o4-mini \
--business-context-file './.rogue/business_context.md'
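When the CLI is driven from a script (for example in CI), the same invocation can be assembled programmatically. The sketch below is an illustration only: `build_cli_command` is a hypothetical helper, and it assumes that --debug and --deep-test-mode are plain boolean switches, which the arguments table suggests but does not state.

```python
import shlex


def build_cli_command(options, runner=("uv", "run", "-m", "rogue", "cli")):
    """Turn a mapping of snake_case settings into a `rogue cli` argument list."""
    command = list(runner)
    for key, value in options.items():
        flag = "--" + key.replace("_", "-")
        if value is True:
            command.append(flag)  # assumed to be a boolean switch
        elif value is None or value is False:
            continue  # unset options are simply left out
        else:
            command.extend([flag, str(value)])
    return command


if __name__ == "__main__":
    cmd = build_cli_command(
        {
            "evaluated_agent_url": "http://localhost:10001",
            "judge_llm": "openai/o4-mini",
            "business_context_file": "./.rogue/business_context.md",
        }
    )
    print(shlex.join(cmd))  # pass `cmd` to subprocess.run(cmd, check=True) to execute
```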
Key Features
- 🔄 Dynamic Scenario Generation: Automatically creates a comprehensive test suite from your high-level business context.
- 👀 Live Evaluation Monitoring: Watch the interaction between the Evaluator and your agent in a real-time chat interface.
- 📊 Comprehensive Reporting: Generates a detailed summary of the evaluation, including pass/fail rates, key findings, and recommendations.
- 🔍 Multi-Faceted Testing: Natively supports testing for policy compliance, with a flexible framework to expand to other areas like prompt injection or safety.
- 🤖 Broad Model Support: Compatible with a wide range of models from providers like OpenAI, Google (Gemini), and Anthropic.
- 🎯 User-Friendly Interface: A simple, step-by-step Gradio UI guides you through configuration, execution, and reporting.
How It Works
Rogue's workflow is designed to be simple and intuitive, managed entirely through its web interface.
- Configure: You provide the endpoint and authentication details for the agent you want to test, and select the LLMs you want Rogue to use for its services (scenario generation, judging).
- Generate Scenarios: You input the "business context" or a high-level description of what your agent is supposed to do. Rogue's LLM Service uses this context to generate a list of relevant test scenarios. You can review and edit these scenarios.
- Run & Evaluate: You start the evaluation. The Scenario Evaluation Service spins up the EvaluatorAgent, which begins a conversation with your agent for each scenario. You can watch this conversation happen live.
- View Report: Once all scenarios are complete, the LLM Service analyzes the results and generates a Markdown-formatted report, giving you a clear summary of your agent's performance.
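To make the final step more concrete, here is a toy sketch of how per-scenario outcomes could be rolled up into a Markdown summary with a pass rate. It does not reflect Rogue's real data model or report layout: `ScenarioResult` and `render_report` are hypothetical names, and the real report is produced by an LLM rather than a fixed template.

```python
from dataclasses import dataclass


@dataclass
class ScenarioResult:
    """Hypothetical record of one evaluated scenario (not Rogue's own type)."""
    scenario: str
    passed: bool
    reason: str = ""


def render_report(results):
    """Summarize scenario results as a small Markdown report."""
    total = len(results)
    passed = sum(1 for r in results if r.passed)
    rate = (passed / total * 100) if total else 0.0
    lines = [
        "# Evaluation Report",
        "",
        f"Passed {passed} of {total} scenarios ({rate:.0f}%).",
        "",
        "| Scenario | Result | Reason |",
        "|---|---|---|",
    ]
    for r in results:
        status = "PASS" if r.passed else "FAIL"
        lines.append(f"| {r.scenario} | {status} | {r.reason} |")
    return "\n".join(lines)
```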
Supported Models
The following lists show the models we have tested with Rogue.
We have successfully run our agent with the following models:
OpenAI:
- gpt-5
- gpt-5-mini
- gpt-5-nano
- openai/gpt-4.1
- openai/gpt-4.1-mini
- openai/gpt-4.5-preview
- openai/gpt-4o
- openai/gpt-4o-mini
- openai/o4-mini
Gemini (vertex or google-ai):
- gemini-2.5-flash
- gemini-2.5-pro
Anthropic:
- anthropic/claude-3-5-sonnet-latest
- anthropic/claude-3-7-sonnet-latest
- anthropic/claude-4-sonnet-latest
The following models are not supported:
OpenAI:
- openai/o1 (including mini)
- openai/o4 (including mini)
Gemini (vertex or google-ai):
- gemini-2.5-flash (partial support)
Contributing
Contributions are welcome! If you'd like to contribute, please follow these steps:
- Fork the repository.
- Create a new branch (git checkout -b feature/your-feature-name).
- Make your changes and commit them (git commit -m 'Add some feature').
- Push to the branch (git push origin feature/your-feature-name).
- Open a pull request.
Please make sure to update tests as appropriate.
License
This project is licensed under the MIT License - see the LICENSE file for details.