A modular framework for evaluating and verifying agentic LLM outputs.

Hyperplane Eval

Hyperplane Eval is a Python testing framework that helps you find out when and where your AI agents break. Instead of writing test cases by hand, you give Hyperplane a target function and a set of rules, and it systematically generates edge cases to map out your agent's "Safe Polytope": the operational volume (the region of the input space) where your agent is reliable.

🚀 How It Works: Breadth-First Evaluation

Testing an AI agent is hard because the space of possible inputs is effectively infinite. Hyperplane makes this tractable by breaking inputs down into "dimensions" of complexity (e.g., Urgency, Ambiguity, Formatting).

Instead of randomly guessing inputs, Hyperplane uses a breadth-first evaluation approach:

  1. Dimension Extraction: It automatically extracts relevant dimensions based on the rules you want to test.
  2. Grid Generation: It generates test scenarios spread evenly across these dimensions (using Sobol sequences, which are low-discrepancy quasi-random sequences that cover the space more evenly than random sampling).
  3. Input Synthesis: It uses a strong LLM to generate realistic user inputs that match those specific dimension coordinates.
  4. Evaluation: It executes your local agent code with the generated inputs, and evaluates the output against your rules using a Chain-of-Thought (CoT) judge.
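
The sampling in step 2 can be illustrated with SciPy's quasi-Monte Carlo module. This is a minimal sketch, not Hyperplane's internal code: the dimension names and the sample_scenarios helper are hypothetical, and it assumes each dimension is scored on a 0-to-1 scale.

```python
from scipy.stats import qmc

# Hypothetical dimension names; Hyperplane extracts its own from your rules.
DIMENSIONS = ["urgency", "ambiguity", "formatting"]


def sample_scenarios(n_log2: int = 4, seed: int = 0) -> list[dict[str, float]]:
    """Draw 2**n_log2 Sobol points in the unit cube, one coordinate per dimension."""
    sampler = qmc.Sobol(d=len(DIMENSIONS), scramble=True, seed=seed)
    # Sobol sequences keep their balance properties when the sample size is a power of two.
    points = sampler.random_base2(m=n_log2)  # shape: (2**n_log2, len(DIMENSIONS))
    return [dict(zip(DIMENSIONS, map(float, row))) for row in points]


scenarios = sample_scenarios()
print(len(scenarios), "scenarios, first:", scenarios[0])
```

Each resulting dictionary is one coordinate in the test space; in steps 3 and 4 a coordinate like this would be turned into a realistic user input and then judged.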

By doing this breadth-first scan across multiple dimensions simultaneously, Hyperplane creates a mathematical map of your agent's reliability and calculates its "Reliability Coverage" as a clear, comparable percentage.
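
The exact formula behind "Reliability Coverage" is not spelled out here. As a rough sketch, assuming it is simply the share of sampled scenarios that pass every rule, it could be computed like this (reliability_coverage is a hypothetical helper, not part of the package):

```python
def reliability_coverage(results: list[bool]) -> float:
    """Percentage of evaluated scenarios whose output satisfied every rule."""
    if not results:
        raise ValueError("no results to score")
    return 100.0 * sum(results) / len(results)


# One pass/fail verdict per evaluated scenario (made-up values for illustration).
verdicts = [True, True, False, True]
coverage = reliability_coverage(verdicts)
print(f"Reliability Coverage: {coverage:.1f}%")  # prints "Reliability Coverage: 75.0%"
```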

🚦 CLI Integration

Hyperplane runs entirely through an interactive CLI, so you don't need to write evaluation scripts or boilerplate code.

Setup & Installation

Install the framework via pip:

pip install hyperplane-eval

Running the CLI

Run the interactive CLI directly in your terminal from inside your project directory:

hyperplane

The wizard will immediately guide you through the evaluation setup:

  1. Target Selection: It will automatically scan your local Python files and let you pick the function that acts as your agent's entry point.
  2. Rule Definition: You define the rules your agent must follow in plain English (e.g., "Never offer a refund over $50").
  3. Configuration: You configure the depth (how many points to test) and breadth (how many dimensions to extract).
  4. Execution: The framework will spin up workers, generate the test space, execute your local code, and render a real-time terminal dashboard.
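
For orientation, a target function for step 1 might look like the stub below. The signature Hyperplane expects is not stated here, so the text-in, text-out shape, the function name, and the refund logic are all assumptions made for illustration.

```python
import re

REFUND_LIMIT = 50  # dollars; mirrors the example rule "Never offer a refund over $50"


def handle_support_request(message: str) -> str:
    """Hypothetical agent entry point: takes user text, returns the agent's reply."""
    # A real agent would call an LLM here; this stub only caps the requested amount.
    match = re.search(r"\$(\d+(?:\.\d+)?)", message)
    requested = float(match.group(1)) if match else 0.0
    offered = min(requested, REFUND_LIMIT)
    return f"I can offer a refund of ${offered:.2f}."


print(handle_support_request("My order arrived broken, I want $120 back!"))
```

A rule such as "Never offer a refund over $50" would then be checked against the returned string by the judge.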

Once complete, Hyperplane generates an interactive HTML report showing exactly which dimensions cause your agent to fail, allowing you to easily identify blind spots in your system prompts.
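
To make "which dimensions cause your agent to fail" concrete, here is one way such a breakdown could be computed from raw results. It is a sketch under assumptions: the record layout, the 0-to-1 coordinate scale, and the failure_rate_by_bucket helper are hypothetical and not taken from the report's actual implementation.

```python
from collections import defaultdict


def failure_rate_by_bucket(records, dimension, n_buckets=4):
    """Group scenarios by one dimension's coordinate and return the failure rate per bucket."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for coords, passed in records:
        # Map a coordinate in [0, 1] to a bucket index, clamping 1.0 into the last bucket.
        bucket = min(int(coords[dimension] * n_buckets), n_buckets - 1)
        totals[bucket] += 1
        if not passed:
            failures[bucket] += 1
    return {b: failures[b] / totals[b] for b in sorted(totals)}


# (coordinates, passed) pairs with made-up values for illustration.
records = [
    ({"urgency": 0.1, "ambiguity": 0.9}, False),
    ({"urgency": 0.2, "ambiguity": 0.2}, True),
    ({"urgency": 0.8, "ambiguity": 0.95}, False),
    ({"urgency": 0.9, "ambiguity": 0.1}, True),
]
rates = failure_rate_by_bucket(records, "ambiguity")
print(rates)  # prints {0: 0.0, 3: 1.0}
```

In this toy data every failure sits in the highest ambiguity bucket, while failures are spread evenly across urgency, which points at ambiguity as the blind spot.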

🛠 Technology Stack

  • Language: Python 3.10+
  • Data Modeling: pydantic
  • Math/Geometry: numpy, scipy (Sobol sequences, ConvexHull analysis)
  • LLM Integration: litellm for provider-agnostic API connectivity (OpenAI, Gemini, Anthropic, or a local vLLM server).
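
As an illustration of the ConvexHull analysis listed above, the snippet below measures the volume enclosed by the coordinates where an agent passed. How Hyperplane actually derives the "Safe Polytope" is not described here, so treat this as one plausible reading; safe_hull_volume and the sample points are hypothetical.

```python
import numpy as np
from scipy.spatial import ConvexHull


def safe_hull_volume(passing_points: np.ndarray) -> float:
    """Volume of the convex hull around coordinates where the agent passed."""
    # ConvexHull needs at least d + 1 non-degenerate points in d dimensions.
    return float(ConvexHull(passing_points).volume)


# Passing scenarios in a two-dimensional test space (made-up values).
corners = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
area = safe_hull_volume(corners)
print(f"Hull covers {area:.0%} of the unit square")
```

In two dimensions the hull's volume attribute is its area, so passing points at all four corners of the unit square give a value of 1.0. A convex hull can overstate the safe region if failures sit inside it, which is worth keeping in mind when reading a figure like this.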

📄 License

This project is licensed under the Apache License, Version 2.0. See the LICENSE file for more information.
