A modular framework for evaluating and verifying agentic LLM outputs.

These details have not been verified by PyPI

License
- OSI Approved :: Apache Software License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

Asymptotic Agent Evaluation Framework

The Asymptotic Agent Evaluation Framework is a Python-based system that evaluates AI agents across a multi-dimensional input space using a linear pipeline architecture. It probes the boundaries of agent reliability to map the "Safe Polytope" of operation.

🚀 Overview

The framework evaluates how AI agents perform as they move along different "dimensions" of difficulty (e.g., urgency, ambiguity, complexity). By sampling these dimensions and generating synthetic test cases, the system maps the Safe Polytope: the region of the input space where the agent's performance score meets the reliability threshold ($P_{sat} \ge 0.95$).
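
As a minimal sketch of how the threshold could be applied, the example below assumes that $P_{sat}$ is estimated as the fraction of repeated runs that the judge marks compliant. The README does not spell out the estimator, and `estimate_p_sat` and `is_safe` are hypothetical names, not part of the package.

```python
from typing import Sequence

SAFE_THRESHOLD = 0.95  # reliability threshold quoted in the overview


def estimate_p_sat(verdicts: Sequence[bool]) -> float:
    """Fraction of runs judged compliant (assumed estimator, not the package API)."""
    if not verdicts:
        raise ValueError("at least one run is required")
    return sum(verdicts) / len(verdicts)


def is_safe(verdicts: Sequence[bool], threshold: float = SAFE_THRESHOLD) -> bool:
    """A coordinate counts as inside the Safe Polytope when P_sat >= threshold."""
    return estimate_p_sat(verdicts) >= threshold


runs = [True] * 19 + [False]  # 19 of 20 runs pass
print(estimate_p_sat(runs))   # 0.95
print(is_safe(runs))          # True
```

With 18 of 20 passing runs the estimate drops to 0.9, so the same coordinate would fall outside the safe region.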

Key Features

Adaptive Navigation: Uses Sobol sequences for uniform expansion from stable regions and binary search to pinpoint failure boundaries.
Chain-of-Thought (CoT) Evaluation: A reasoning-first judge LLM analyzes compliance before scoring, ensuring high-fidelity boundary detection.
Geometric Analysis: Calculates the hyper-volume of the agent's safe operational space using N-dimensional Convex Hulls.
Persistence: Automatically saves the evaluation state to JSON after each iteration for crash-resilience.
Linear Pipeline: Strictly typed state progression from raw coordinates to final performance metrics.
Local Execution: Evaluates code directly through AST parsing and local binding execution without WebSocket overhead.
Provider Agnostic: Uses LiteLLM to dynamically connect to any LLM provider (OpenAI, Gemini, vLLM, etc.) via simple API keys.
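
The Local Execution feature above can be illustrated with the standard library alone. This is a sketch under assumptions: the function names (`discover_functions`, `bind_functions`) and the sample agent source are hypothetical, and the package's own scanner may work differently.

```python
import ast
from typing import Callable, Dict, List

# Hypothetical target file contents, inlined so the sketch is self-contained.
AGENT_SOURCE = '''
def triage(ticket: str) -> str:
    return "escalate" if "urgent" in ticket.lower() else "queue"

def _helper():
    return None
'''


def discover_functions(source: str) -> List[str]:
    """Return public top-level function names found by parsing, without executing."""
    tree = ast.parse(source)
    return [
        node.name
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and not node.name.startswith("_")
    ]


def bind_functions(source: str, names: List[str]) -> Dict[str, Callable]:
    """Execute the source in an isolated namespace and pick out the chosen callables."""
    namespace: dict = {}
    exec(compile(source, "<agent>", "exec"), namespace)  # only run trusted code this way
    return {name: namespace[name] for name in names}


names = discover_functions(AGENT_SOURCE)
bindings = bind_functions(AGENT_SOURCE, names)
print(names)                                  # ['triage']
print(bindings["triage"]("URGENT: db down"))  # escalate
```

Parsing first and executing second means the candidate functions can be listed for selection before any target code runs.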

🔄 The Linear Pipeline & Iterative Mutation

The evaluation framework operates as a streaming loop that systematically explores the boundary of an agent's failure.

[!NOTE] Algorithm Update: The framework has transitioned from Zero-Shot Synthesis to the Iterative Mutation Pipeline to prevent semantic drift, reduce API overhead, and achieve perfect variable isolation.

graph LR
    A[InputSpace] -->|Origin Prompt| B[Sequential Mutation Engine]
    B -->|Mutated Prompt| C[AgentRunner]
    C -->|Executed Vector| D[AgentOutputEvaluator]
    D -->|Evaluated Vector| E[InputSpace History]
    E -->|Evaluated Vector| F[Adaptive Navigator]
    F -->|Next Coord| B
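
The boundary-refinement part of the Adaptive Navigator step can be sketched as a bisection along the segment between a known-safe and a known-failing coordinate. That segment-based formulation is an assumption (the README only states that binary search is used), and `find_boundary` and the toy `is_safe` oracle are hypothetical.

```python
from typing import Callable, Sequence, Tuple

Point = Tuple[float, ...]


def lerp(a: Sequence[float], b: Sequence[float], t: float) -> Point:
    """Point at fraction t of the way from a to b."""
    return tuple(x + t * (y - x) for x, y in zip(a, b))


def find_boundary(
    safe: Sequence[float],
    failing: Sequence[float],
    is_safe: Callable[[Point], bool],
    tol: float = 1e-3,
) -> Point:
    """Bisect the segment safe -> failing; return the last point still judged safe."""
    lo, hi = 0.0, 1.0  # invariant: lerp(lo) is safe, lerp(hi) is failing
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if is_safe(lerp(safe, failing, mid)):
            lo = mid
        else:
            hi = mid
    return lerp(safe, failing, lo)


# Toy oracle standing in for a full evaluation of one coordinate.
def toy_is_safe(point: Point) -> bool:
    return sum(point) <= 1.0


print(find_boundary((0.0, 0.0), (1.0, 1.0), toy_is_safe))  # (0.5, 0.5)
```

Each oracle call would correspond to a full mutate, run, and judge cycle, so halving the interval keeps the number of evaluations logarithmic in the desired precision (about ten for `tol=1e-3`).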

State Transitions (Core Vector Hierarchy)

The framework uses a strictly typed progression to ensure data integrity:

ScenarioVector (Stage 1): Raw N-dimensional mutation coordinates (Urgency, Flooding, Bijection).
SynthesizedVector (Stage 2): Origin prompt mutated by the coordinates.
ExecutedVector (Stage 3): Mutated prompt run through the target agent (N runs).
EvaluatedVector (Stage 4): Performance score ($P_{sat}$) evaluated against the target rubric rule.
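
The four stages above can be sketched as immutable types where each stage extends the previous one. The framework lists pydantic for data modeling; this sketch uses standard-library dataclasses to stay dependency-free, and all field names and helper functions are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class ScenarioVector:  # Stage 1: raw mutation coordinates
    coords: Tuple[float, ...]


@dataclass(frozen=True)
class SynthesizedVector(ScenarioVector):  # Stage 2: adds the mutated prompt
    prompt: str


@dataclass(frozen=True)
class ExecutedVector(SynthesizedVector):  # Stage 3: adds the outputs of N runs
    outputs: Tuple[str, ...]


@dataclass(frozen=True)
class EvaluatedVector(ExecutedVector):  # Stage 4: adds the score
    p_sat: float


def synthesize(v: ScenarioVector, origin_prompt: str) -> SynthesizedVector:
    # Stand-in for the mutation engine: it only annotates the origin prompt.
    return SynthesizedVector(v.coords, f"{origin_prompt} [coords={v.coords}]")


def execute(v: SynthesizedVector, agent: Callable[[str], str], n_runs: int) -> ExecutedVector:
    return ExecutedVector(v.coords, v.prompt, tuple(agent(v.prompt) for _ in range(n_runs)))


def evaluate(v: ExecutedVector, passes: Callable[[str], bool]) -> EvaluatedVector:
    p_sat = sum(passes(out) for out in v.outputs) / len(v.outputs)
    return EvaluatedVector(v.coords, v.prompt, v.outputs, p_sat)


stage1 = ScenarioVector((0.2, 0.7, 0.1))
stage2 = synthesize(stage1, "Summarise the ticket.")
stage3 = execute(stage2, agent=lambda prompt: "ok", n_runs=3)
stage4 = evaluate(stage3, passes=lambda out: out == "ok")
print(type(stage4).__name__, stage4.p_sat)  # EvaluatedVector 1.0
```

Because every function accepts one stage type and returns the next, a type checker can flag a stage that is skipped or applied out of order.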

🛠 Technology Stack

Language: Python 3.11+
Data Modeling: pydantic
Math/Geometry: numpy, scipy (Sobol sequences, ConvexHull analysis)
Orchestration: asyncio for parallel scenario generation and execution.
LLM Integration: litellm for universal API connectivity.
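
As an illustration of the asyncio orchestration role, the sketch below runs one prompt through an agent several times with bounded concurrency. The `run_agent_n_times` helper, the `max_concurrency` parameter, and the fake agent are assumptions for demonstration and are not taken from the package.

```python
import asyncio
from typing import Awaitable, Callable, List


async def run_agent_n_times(
    agent: Callable[[str], Awaitable[str]],
    prompt: str,
    n_runs: int,
    max_concurrency: int = 4,
) -> List[str]:
    """Run the agent n_runs times, with at most max_concurrency calls in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def one_run() -> str:
        async with semaphore:
            return await agent(prompt)

    return await asyncio.gather(*(one_run() for _ in range(n_runs)))


async def fake_agent(prompt: str) -> str:
    await asyncio.sleep(0.01)  # stands in for a network call to an LLM provider
    return prompt.upper()


results = asyncio.run(run_agent_n_times(fake_agent, "ping", n_runs=5))
print(results)  # ['PING', 'PING', 'PING', 'PING', 'PING']
```

The semaphore caps the number of simultaneous requests, which helps when the provider enforces rate limits.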

📂 Project Structure

├── engine/                      # Core Evaluation Logic
│   ├── domain/                  # Vector state hierarchy and Prompt features
│   ├── stages/                  # Pipeline stages (input_space, generator, evaluator, navigator, etc.)
│   ├── pipeline.py              # Central evaluation pipeline orchestrator
│   └── prompt_loader.py         # Utility for loading external LLM prompts
├── adapters/                    # External Interfaces
│   ├── llms/                    # Universal LLM Client (LiteLLM Wrapper)
│   ├── runners/                 # Target agent execution abstractions
│   └── local_bindings/          # AST scanning and local function execution
├── cli/                         # Interactive CLI Interface
│   └── app.py                   # CLI Application and real-time dashboard logic
├── prompts/                     # Externalized LLM prompt templates (.txt)
├── reporting/                   # Results Analysis
│   └── analyser.py              # Report compiler & vulnerability analyser
├── results/                     # Evaluated state & reports
└── README.md                    # Project documentation

🚦 Getting Started

Prerequisites

Python 3.11+
Access to an LLM inference server or API Key (e.g., OpenAI, Gemini, or a local vLLM instance)

Setup & Installation

Environment Setup:

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Configuration: The framework has an interactive configuration step built in. Launch the CLI to pick a target Python file, select or enter rules, configure scenarios, and supply your chosen LLM model and API key.

Running the Evaluation

Execute the interactive CLI:

PYTHONPATH=. python3 cli/app.py

The system will:

Prompt you for setup configurations in a CLI Wizard.
Initialize the search space and extract safety dimensions.
Spawn workers to systematically stress-test the agent rules locally.
Render a real-time CLI dashboard with a progress bar and events log.
Save evolving states to results/ and automatically launch the final HTML results dashboard in your browser.

[!NOTE] The results/ directory is tracked by Git, but its contents (JSON states, HTML reports, logs) are ignored to keep the repository clean.
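
Saving state after every iteration could look like the sketch below. The temp-file-then-rename pattern, the state layout, and the `save_state` / `load_state` names are assumptions; the README only states that state is written to JSON after each iteration.

```python
import json
import os
import tempfile
from pathlib import Path


def save_state(state: dict, path: Path) -> None:
    """Write to a temp file in the same directory, then atomically swap it in."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(state, handle, indent=2)
        os.replace(tmp_name, path)  # a crash never leaves a half-written state file
    except BaseException:
        os.unlink(tmp_name)
        raise


def load_state(path: Path) -> dict:
    """Resume from disk if a state file exists, otherwise start fresh."""
    if not path.exists():
        return {"iteration": 0, "history": []}
    return json.loads(path.read_text(encoding="utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    state_file = Path(tmp) / "results" / "state.json"
    state = load_state(state_file)
    for iteration in range(1, 4):
        state["iteration"] = iteration
        state["history"].append({"coords": [0.1 * iteration], "p_sat": 1.0})
        save_state(state, state_file)  # persisted after each iteration
    resumed = load_state(state_file)
    print(resumed["iteration"], len(resumed["history"]))  # 3 3
```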

📈 Analysis Results

The framework outputs a hyper-volume metric (Reliability Coverage) that represents the percentage of the input space where the agent meets the reliability threshold. Because the metric is a single number, different model versions or system prompts can be compared directly.
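
A two-dimensional sketch of the coverage calculation follows, assuming Reliability Coverage is the convex hull volume of the safe points divided by the volume of the whole input space (here the unit square). The pure-Python hull below is a stand-in for illustration; for N dimensions, `scipy.spatial.ConvexHull` exposes the hull volume through its `volume` attribute. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Point2D = Tuple[float, float]


def convex_hull(points: Sequence[Point2D]) -> List[Point2D]:
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point2D, a: Point2D, b: Point2D) -> float:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point2D] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point2D] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(vertices: Sequence[Point2D]) -> float:
    """Shoelace formula for a simple polygon."""
    total = 0.0
    for i, (x1, y1) in enumerate(vertices):
        x2, y2 = vertices[(i + 1) % len(vertices)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


def reliability_coverage(safe_points: Sequence[Point2D], space_area: float = 1.0) -> float:
    """Percentage of the input space covered by the hull of the safe points."""
    return 100.0 * polygon_area(convex_hull(safe_points)) / space_area


safe = [(0.0, 0.0), (0.5, 0.0), (0.5, 0.5), (0.0, 0.5), (0.25, 0.25)]
print(reliability_coverage(safe))  # 25.0
```

Here the safe points span a quarter of the unit square, so the coverage is 25 percent; the interior point at (0.25, 0.25) does not affect the hull.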

📄 License

This project is licensed under the Apache License, Version 2.0. See the LICENSE file for more information.

Release history

0.1.14

Jun 17, 2026

0.1.13

Jun 17, 2026

0.1.12

Jun 17, 2026

0.1.11

Jun 16, 2026

0.1.10

Jun 16, 2026

0.1.9

Jun 16, 2026

0.1.8

Jun 16, 2026

0.1.7

Jun 16, 2026

0.1.6

Jun 16, 2026

0.1.5

Jun 15, 2026

0.1.4

Jun 15, 2026

0.1.3

Jun 15, 2026

This version

0.1.2

Jun 15, 2026

Download files

Source Distribution

hyperplane_eval-0.1.2.tar.gz (64.5 kB)

Uploaded Jun 15, 2026 Source

Built Distribution

hyperplane_eval-0.1.2-py3-none-any.whl (66.2 kB)

Uploaded Jun 15, 2026 Python 3

File details

Details for the file hyperplane_eval-0.1.2.tar.gz.

File metadata

Download URL: hyperplane_eval-0.1.2.tar.gz
Upload date: Jun 15, 2026
Size: 64.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for hyperplane_eval-0.1.2.tar.gz
Algorithm	Hash digest
SHA256	`3838bd07d2c72cfee4a114ea2dc1c46ee8ccef45d8bb18f346e6138ed14adfcd`
MD5	`f8c45a2858905cbd5db0bd515882d52f`
BLAKE2b-256	`e034d024cbdb2b454b7a098b43a3016ae86cf46d63e080d5269de2bd707b4bae`

File details

Details for the file hyperplane_eval-0.1.2-py3-none-any.whl.

File metadata

Download URL: hyperplane_eval-0.1.2-py3-none-any.whl
Upload date: Jun 15, 2026
Size: 66.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for hyperplane_eval-0.1.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6a9de4f45a037c70636666e4ff9022a9d3ef9a477b5a186f67a7a6e97ce1fa1e`
MD5	`797e708c0d324f479fcfbce3a0bc1929`
BLAKE2b-256	`e49fca817a89dabffc61af6525e923cbc9222bf79af441e8dc4271ef4cef65cb`

hyperplane-eval 0.1.2
