Local tool to evaluate AI agents and find their weak points.

These details have not been verified by PyPI

License
- OSI Approved :: Apache Software License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

Hyperplane Eval

Hyperplane Eval helps teams discover behavioral failures before deployment.

It works as a CLI and programmatic tool for evaluating AI agents against business, security, and ethical requirements using intelligent test generation rather than manually written test cases.

Why Hyperplane?

Most AI teams evaluate agents with a few dozen manually written prompts.

The problem is that manually written tests only cover a tiny fraction of the agent's behavioral space.

Hyperplane automatically explores that space, generating thousands of semantically diverse inputs to uncover failures before users do.

Features

Generate thousands of semantically diverse test cases
Evaluate business, security, and ethical requirements of your agent
Automatically find edge cases and breaking points
Local-first CLI workflow
Framework-agnostic agent integration
Detailed evaluation reports. See example here

CLI Integration

Hyperplane operates entirely locally through a terminal-based orchestration wizard.

Setup & Installation

Install the framework via pip:

pip install hyperplane-eval

Running the CLI

Run the interactive CLI directly in your terminal from inside your project directory:

hyperplane

The CLI wizard will guide you through:

Target Binding: Automatically finding your agent's code.
Constraint Definition: Setting natural language rules for your agent.
Configuration: Setting the scale and depth of the evaluation.
Execution: Running the evaluation with a live dashboard.

Hyperplane outputs a structured dataset and an HTML report identifying the specific types and characteristics of inputs that cause the agent to fail, helping you quickly isolate prompt engineering or logic regressions.

Hyperplane Evaluation Dashboard

Programmatic Integration

You can also integrate Hyperplane Eval directly into your Python codebase or CI/CD testing suites using the Evaluator API:

from hyperplane import Evaluator
from litellm import Router

# 1. Initialize your litellm Router
router = Router(
    model_list=[{
        "model_name": "gpt-4o", 
        "litellm_params": {"model": "gpt-4o"}
    }]
)

# 2. Define your target agent
def my_agent(prompt: str) -> str:
    return "I am a safe agent."

# 3. Initialize the Evaluator
evaluator = Evaluator(
    agent_desc="A helpful AI assistant",
    param_desc={"prompt": "The user input prompt"},
    target_callable=my_agent,
    llm_client=router
)

# 4. Add constraints and run
evaluator.run(rules=["Never execute tool calls with unsafe parameters.", "Always respond in English."])

Architecture & Methodology

Evaluating agentic systems presents a curse-of-dimensionality problem due to the infinite input space. Hyperplane mitigates this via a dimensional reduction and bounded sampling approach:

Orthogonal Dimension Extraction: The target heuristic or constraint (e.g., a business logic rule) is passed to an LLM, which extracts a set of orthogonal, continuous axes representing the variance in potential inputs (e.g., user_frustration [0,1], budget_constraint [0,1]).
Quasi-Random Initialization: The framework maps a bounded multi-dimensional continuous input space $S \in [0, 1]^D$. It utilizes Sobol sequences to generate a low-discrepancy initialization grid to ensure uniform volumetric coverage without clustering.
Synthetic Input Generation: Bounded coordinate vectors are mapped back into natural language. A Generator LLM synthesizes adversarial or conversational inputs as structured payloads that strictly adhere to the defined vector coordinates.
Response Classification The target agent executes the synthesized inputs. An Evaluator LLM utilizes Chain-of-Thought (CoT) reasoning to classify the agent's response as a pass (1) or fail (0) against the constraint.
Surrogate Modeling & Active Search: The framework fits a Random Forest surrogate classifier over the evaluated points to approximate the failure boundary. It utilizes an active search algorithm to sample points near the decision boundary until volumetric saturation is reached, stopping early via dimension-scaled mismatch rate thresholds.

The resulting artifact is a detailed report allowing engineers to identify the specific input themes and characteristics that reliably induce constraint violations.

Privacy & Security (BYOK)

Hyperplane Eval is designed for enterprise privacy using a Bring Your Own Key (BYOK) architecture:

100% Local Execution: The orchestrator and test synthesis engine run entirely on your local machine or CI/CD runner.
No Telemetry or Data Logging: We do not collect product telemetry, execution logs, or source code.
Direct Vendor Routing: Your API keys are routed strictly to the LLM provider you configure (via litellm) and are never intercepted or proxied.
Budget Safe: Built-in safeguards and configurable depth/breadth parameters ensure the evaluation pipeline generates high coverage without blowing up your token bill.
Open Source Auditing: The entire orchestration pipeline is open-source, allowing your security team to fully audit the codebase.

🛠 Technology Stack

Language: Python 3.10+
Data Modeling: pydantic
Math/Geometry: numpy, scipy (Sobol sequences, ConvexHull analysis)
LLM Integration: litellm for universal API connectivity (OpenAI, Gemini, Anthropic, or any local vLLM).

📄 License

This project is licensed under the Apache License, Version 2.0. See the LICENSE file for more information.

Project details

These details have not been verified by PyPI

License
- OSI Approved :: Apache Software License
Operating System
- OS Independent
Programming Language
- Python :: 3

Release history Release notifications | RSS feed

0.1.14

Jun 17, 2026

0.1.13

Jun 17, 2026

0.1.12

Jun 17, 2026

0.1.11

Jun 16, 2026

0.1.10

Jun 16, 2026

0.1.9

Jun 16, 2026

0.1.8

Jun 16, 2026

This version

0.1.7

Jun 16, 2026

0.1.6

Jun 16, 2026

0.1.5

Jun 15, 2026

0.1.4

Jun 15, 2026

0.1.3

Jun 15, 2026

0.1.2

Jun 15, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

hyperplane_eval-0.1.7.tar.gz (58.4 kB view details)

Uploaded Jun 16, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

hyperplane_eval-0.1.7-py3-none-any.whl (71.2 kB view details)

Uploaded Jun 16, 2026 Python 3

File details

Details for the file hyperplane_eval-0.1.7.tar.gz.

File metadata

Download URL: hyperplane_eval-0.1.7.tar.gz
Upload date: Jun 16, 2026
Size: 58.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for hyperplane_eval-0.1.7.tar.gz
Algorithm	Hash digest
SHA256	`a02253a448d3860fda668cd6d4a0658b522b35ec5ef7e39fa0d4935bdb6ce7d8`
MD5	`76e6c8c5217b64b148daf93256303470`
BLAKE2b-256	`da2b6b074e7d0c5e624bd126c5f78b45e89d9f5a691be90336a63f0e353b9077`

See more details on using hashes here.

Provenance

The following attestation bundles were made for hyperplane_eval-0.1.7.tar.gz:

Publisher: publish.yml on Aquithm/hyperplane

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: hyperplane_eval-0.1.7.tar.gz
- Subject digest: a02253a448d3860fda668cd6d4a0658b522b35ec5ef7e39fa0d4935bdb6ce7d8
- Sigstore transparency entry: 1842359335
- Sigstore integration time: Jun 16, 2026
Source repository:
- Permalink: Aquithm/hyperplane@559c26c9fdeebe67edae84b8d084b544504662f9
- Branch / Tag: refs/heads/main
- Owner: https://github.com/Aquithm
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@559c26c9fdeebe67edae84b8d084b544504662f9
- Trigger Event: workflow_dispatch

File details

Details for the file hyperplane_eval-0.1.7-py3-none-any.whl.

File metadata

Download URL: hyperplane_eval-0.1.7-py3-none-any.whl
Upload date: Jun 16, 2026
Size: 71.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for hyperplane_eval-0.1.7-py3-none-any.whl
Algorithm	Hash digest
SHA256	`949a0d2f681fd6a6103211d4227fad70f381662dd897b372d881a0b976963169`
MD5	`a73b0075e2775f0dbf420a9e8a0ed60f`
BLAKE2b-256	`f5081b345491fe0ec883465e7917830491b6e494c710af8f041e378370ef6c8b`

See more details on using hashes here.

Provenance

The following attestation bundles were made for hyperplane_eval-0.1.7-py3-none-any.whl:

Publisher: publish.yml on Aquithm/hyperplane

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: hyperplane_eval-0.1.7-py3-none-any.whl
- Subject digest: 949a0d2f681fd6a6103211d4227fad70f381662dd897b372d881a0b976963169
- Sigstore transparency entry: 1842359513
- Sigstore integration time: Jun 16, 2026
Source repository:
- Permalink: Aquithm/hyperplane@559c26c9fdeebe67edae84b8d084b544504662f9
- Branch / Tag: refs/heads/main
- Owner: https://github.com/Aquithm
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@559c26c9fdeebe67edae84b8d084b544504662f9
- Trigger Event: workflow_dispatch

hyperplane-eval 0.1.7

Navigation

Verified details

Maintainers

Unverified details

Meta

Classifiers

Project description

Hyperplane Eval

Why Hyperplane?

Features

CLI Integration

Setup & Installation

Running the CLI

Programmatic Integration

Architecture & Methodology

Privacy & Security (BYOK)

🛠 Technology Stack

📄 License

Project details

Verified details

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance