A modular framework for evaluating and verifying agentic LLM outputs.
Hyperplane Eval
Hyperplane Eval is a Python testing framework that helps you find when and where your AI agents break. Instead of writing test cases by hand, you give Hyperplane a target function and a set of rules, and it systematically generates edge cases to map out your agent's "Safe Polytope": the operational volume where your agent is reliable.
🚀 How It Works: Breadth-First Evaluation
Testing an AI agent is hard because the potential input space is infinite. Hyperplane solves this by breaking down inputs into "dimensions" of complexity (e.g., Urgency, Ambiguity, Formatting).
Instead of randomly guessing inputs, Hyperplane uses a breadth-first evaluation approach:
- Dimension Extraction: It automatically extracts relevant dimensions based on the rules you want to test.
- Grid Generation: It generates an evenly spread set of test scenarios across these dimensions (using Sobol sequences, a low-discrepancy method that covers the space more uniformly than random sampling).
- Input Synthesis: It uses a strong LLM to generate realistic user inputs that match those specific dimension coordinates.
- Evaluation: It executes your local agent code with the generated inputs, and evaluates the output against your rules using a Chain-of-Thought (CoT) judge.
By doing this breadth-first scan across multiple dimensions simultaneously, Hyperplane creates a mathematical map of your agent's reliability and calculates its "Reliability Coverage" as a clear, comparable percentage.
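The sampling and coverage steps above can be sketched in a few lines. This is a minimal illustration, not Hyperplane's implementation: it swaps in a pure-Python Halton sequence as a stand-in for Sobol sequences so it runs without dependencies, and the dimension names, `toy_judge`, and every function name here are hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical dimension names, taken from the examples in the text above.
DIMENSIONS = ["urgency", "ambiguity", "formatting"]
PRIMES = [2, 3, 5]  # one Halton base per dimension


def radical_inverse(index: int, base: int) -> float:
    """Mirror the base-`base` digits of `index` around the radix point."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        result += (index % base) * fraction
        index //= base
        fraction /= base
    return result


def halton_points(n: int) -> List[Dict[str, float]]:
    """Return n low-discrepancy points in [0, 1)^d, keyed by dimension name."""
    return [
        {name: radical_inverse(i, base) for name, base in zip(DIMENSIONS, PRIMES)}
        for i in range(1, n + 1)
    ]


def reliability_coverage(
    points: List[Dict[str, float]],
    passes: Callable[[Dict[str, float]], bool],
) -> float:
    """Percentage of sampled points at which the rule check passed."""
    results = [passes(p) for p in points]
    return 100.0 * sum(results) / len(results)


def toy_judge(point: Dict[str, float]) -> bool:
    # Stand-in for "synthesize input, run agent, judge output".
    # Assumption: this toy agent fails whenever ambiguity is high.
    return point["ambiguity"] < 0.75


points = halton_points(64)
coverage = reliability_coverage(points, toy_judge)
print(f"Reliability Coverage: {coverage:.1f}%")
```

In a real run, the judge callback would wrap the input synthesis, agent execution, and CoT evaluation steps; only the pass/fail result per coordinate feeds the percentage.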
🚦 CLI Integration
You don't need to write evaluation scripts or boilerplate code to use Hyperplane; everything is handled through an interactive CLI.
Setup & Installation
Install the framework via pip:
pip install hyperplane-eval
Running the CLI
Run the interactive CLI directly in your terminal from inside your project directory:
hyperplane
The wizard will immediately guide you through the evaluation setup:
- Target Selection: It will automatically scan your local Python files and let you pick the function that acts as your agent's entry point.
- Rule Definition: You define the rules your agent must follow in plain English (e.g., "Never offer a refund over $50").
- Configuration: You configure the depth (how many points to test) and breadth (how many dimensions to extract).
- Execution: The framework will spin up workers, generate the test space, execute your local code, and render a real-time terminal dashboard.
Once complete, Hyperplane generates an interactive HTML report showing exactly which dimensions cause your agent to fail, allowing you to easily identify blind spots in your system prompts.
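To make the Target Selection and Rule Definition steps concrete, here is a sketch of what a target function and its rules might look like. The source does not specify the signature Hyperplane expects, so `support_agent` (one string in, one string out) is an assumption, as are `RULES` and `max_refund_offered`. The regex check is only a local sanity check for the example rule; Hyperplane itself evaluates rules with its CoT judge.

```python
import re
from typing import List

# Rules are plain-English strings; this one is the example from the text above.
RULES: List[str] = [
    "Never offer a refund over $50",
]


def support_agent(user_input: str) -> str:
    """Hypothetical agent entry point: one user message in, one reply out.

    A real agent would call an LLM here; this stub returns a canned reply
    so the example runs offline.
    """
    if "refund" in user_input.lower():
        return "I can offer you a refund of $40 for the inconvenience."
    return "Thanks for reaching out. How can I help?"


def max_refund_offered(reply: str) -> float:
    """Largest dollar amount mentioned in a reply, or 0.0 if none."""
    amounts = re.findall(r"\$(\d+(?:\.\d+)?)", reply)
    return max((float(a) for a in amounts), default=0.0)


reply = support_agent("My order arrived broken, I want a refund NOW.")
print(reply)
print("Within refund limit:", max_refund_offered(reply) <= 50)
```

Keeping the entry point a plain function with simple argument and return types should make it easier for any scanner to discover and call it, though the exact requirements are for the tool's own documentation to confirm.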
🛠 Technology Stack
- Language: Python 3.10+
- Data Modeling: pydantic
- Math/Geometry: numpy, scipy (Sobol sequences, ConvexHull analysis)
- LLM Integration: litellm for universal API connectivity (OpenAI, Gemini, Anthropic, or any local vLLM).
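The source lists ConvexHull analysis but does not say how it is applied. One plausible reading is that a hull around the passing test points approximates the "Safe Polytope". The sketch below illustrates that idea in two dimensions with a pure-Python monotone-chain hull, used here as a stand-in for scipy's ConvexHull so the example has no dependencies. The sample points and all function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """Z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def polygon_area(vertices: List[Point]) -> float:
    """Shoelace formula for the area of a simple polygon."""
    n = len(vertices)
    twice_area = sum(
        vertices[i][0] * vertices[(i + 1) % n][1]
        - vertices[(i + 1) % n][0] * vertices[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2.0


# Hypothetical (urgency, ambiguity) coordinates where the agent passed.
passing = [(0.0, 0.0), (0.5, 0.0), (0.5, 0.5), (0.0, 0.5), (0.25, 0.25)]
hull = convex_hull(passing)
safe_fraction = polygon_area(hull)  # the unit square has area 1.0
print(f"Hull vertices: {len(hull)}, safe fraction: {safe_fraction:.2f}")
```

A convex hull can overstate the safe region when failures occur between passing points, so a measure like this is best read alongside the per-point pass rate.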
📄 License
This project is licensed under the Apache License, Version 2.0. See the LICENSE file for more information.
File details
Details for the file hyperplane_eval-0.1.5.tar.gz.
File metadata
- Download URL: hyperplane_eval-0.1.5.tar.gz
- Upload date:
- Size: 67.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2f6d5c9348994d4b3b738eb84056434467574e2ada72457ca4a93c40cbf13648 |
| MD5 | 682f0ce4a507629335be07699214cb70 |
| BLAKE2b-256 | 6d8d0c49027098d405a521e99a0d5bf807b33ea2a904a17deb97bb66e6a259d6 |
File details
Details for the file hyperplane_eval-0.1.5-py3-none-any.whl.
File metadata
- Download URL: hyperplane_eval-0.1.5-py3-none-any.whl
- Upload date:
- Size: 80.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e87a691758ad428cf267f47491acbd23cae1b7e803602c1cc03c8764e0a68e47 |
| MD5 | 3715d2cd4ca1fd793a5536397a52cc59 |
| BLAKE2b-256 | 9b9af7a9cdef318b38fc2bd59608c1ff906e60dceddf0d8360049c1ceebed7a7 |