A tool to oversee and evaluate agentic AI performance with validation and guidance
Project description
AgentOversight
A modular tool to monitor, validate, and guide agentic AI performance.
AgentOversight is a Python-based platform for overseeing AI agents. It lets users define custom validation rules, receive directional guidance, and track performance metrics, all through a web interface or programmatically. With support for multiple models such as OpenAI, DeepSeek, and Grok, it is well suited to developers and researchers who need to ensure AI reliability.
Why AgentOversight
Delegating validation to the agent itself (for example, by prompting it to check its own output) has several drawbacks:
- Consistency: Agents might interpret rules differently across runs, leading to inconsistent validation (e.g., word counting could vary slightly).
- Latency: Sending requests to an external AI (via API) adds network delay compared to local computation.
- Cost: If using a paid API (e.g., OpenAI), each validation request costs money, whereas local code is free.
- Control: You're dependent on the agent's capabilities and can't easily tweak the validation logic without changing the prompt, which might not scale for a UI-driven tool.
- Metrics: Tracking performance (e.g., response time) becomes trickier if the agent handles everything remotely.
The AgentOversight class exists as a dedicated local component for these reasons:
- Independence: It decouples validation, guidance, and metrics from any specific AI agent, making the hub a standalone tool that can oversee any agent's output (e.g., Grok, ChatGPT, or a custom model). You don't need an AI to use it, just the output text.
- Performance: Local processing is faster and doesn't rely on external API calls, which is critical for real-time monitoring in a web app.
- Customizability: You control the logic. Want to add a new rule type (e.g., max_sentences)? Just update the class; no need to rewrite prompts or rely on an agent understanding it.
- Transparency: The rules and results are deterministic and visible in code, not hidden in an AI's black-box reasoning.
- Metrics Tracking: It's easier to log and analyze metrics (e.g., response time, accuracy trends) locally with a database than to extract them from agent responses.
Why Not Just Prompt?
The hub’s goal is to oversee agents, not replace their work. If the agent itself validates its output, you lose the independent “second opinion” that an oversight tool provides.
Users might want to apply consistent rules across multiple agents or outputs without crafting prompts each time—AgentOversight makes this reusable and UI-driven.
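The "second opinion" idea can be sketched in a few lines: the same deterministic rule set applied to outputs from different agents, with no prompting involved. The rule set, helper function, and agent outputs below are hypothetical examples, not the tool's actual API.

```python
# Deterministic local validation, independent of any agent.
# check(), the rule values, and the sample outputs are illustrative only.
def check(output: str, max_words: int, must_contain: str) -> bool:
    """Return True if the text satisfies both rules."""
    return len(output.split()) <= max_words and must_contain in output

outputs = {
    "agent_a": "This is a short test sentence.",
    "agent_b": "A much longer response that rambles on well past the limit we set for it here.",
}

# One rule set, many agents: no prompt engineering per output.
results = {name: check(text, max_words=10, must_contain="test")
           for name, text in outputs.items()}
print(results)  # agent_a passes, agent_b exceeds the word limit
```

Because the checks run locally, the same inputs always produce the same verdicts, which is exactly the consistency a prompted self-check cannot guarantee.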
Features
- Custom Validation: Define rules (e.g., max_words=50) to validate agent outputs.
- Directional Guidance: Get actionable suggestions (e.g., "Shorten the response").
- Performance Metrics: Track response time and accuracy, logged automatically to SQLite (oversight.db).
- Multi-Model Support: Test outputs from OpenAI, DeepSeek, Grok, and more.
- Auto-Correction: Optionally refine outputs based on guidance with a configurable retry limit.
- Web Interface: Built with Flask for easy interaction and visualization.
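The auto-correction feature can be pictured as a validate-and-retry loop. The real tool re-prompts the model for a refined output; this standalone sketch substitutes a crude truncation so it runs without an API key. The single max_words rule and the retry limit default are assumptions for illustration.

```python
def validate(text: str, max_words: int = 10) -> bool:
    """One hypothetical rule: word count at or under the limit."""
    return len(text.split()) <= max_words

def auto_correct(text: str, max_retries: int = 3) -> tuple[str, int]:
    """Retry until the text validates or the configurable limit is hit."""
    retries = 0
    while not validate(text) and retries < max_retries:
        # Stand-in for re-prompting the model: just truncate to 10 words.
        text = " ".join(text.split()[:10])
        retries += 1
    return text, retries

fixed, retries = auto_correct(
    "one two three four five six seven eight nine ten eleven twelve")
print(retries)  # 1: a single correction pass was enough
```

The retry cap matters: if a correction can never satisfy the rules, the loop still terminates and reports how many attempts were made.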
Installation Methods
Via pip
pip install agent-oversight
From Source
Clone the repository:
git clone https://github.com/LogeswaranA/AgentOversight.git
cd AgentOversight
Set up a virtual environment (recommended):
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
Install dependencies:
pip install -r requirements.txt
Usage
Web Interface
Run the Flask app:
agent-oversight
Open your browser to http://127.0.0.1:5000. Metrics are automatically logged to oversight.db in your working directory.
Programmatic Example
from AgentOversight.agent_logic import AgentOversight
# Initialize with optional API keys
oversight = AgentOversight(openai_api_key="your-openai-key")
oversight.set_rules("max_words=10,must_contain=test")
# Process input with auto-correction
result = oversight.process_input("openai", "Write a short test sentence.", auto_correct=True)
print(result) # Metrics are logged to oversight.db automatically
Sample Output:
{
'output': 'This is a test.',
'validation': "Word count: 5 (max: 10) - Valid | Contains 'test': Yes",
'guidance': 'Looks good!',
'metrics': {'response_time': 0.85, 'accuracy': 1.0},
'retries': 0
}
Note: The SQLite database (oversight.db) is created and metrics are logged automatically whenever process_input or track_metrics is called. No separate initialization is required.
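The automatic logging described above might look like the following sketch. The actual schema of oversight.db is not documented here, so the table layout and column names are assumptions; an in-memory database is used so the example leaves no file behind.

```python
import sqlite3
import time

# Hypothetical metrics table; swap ":memory:" for "oversight.db" to persist.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE IF NOT EXISTS metrics "
    "(ts REAL, model TEXT, response_time REAL, accuracy REAL)"
)

def log_metrics(model: str, response_time: float, accuracy: float) -> None:
    """Append one metrics row, timestamped at call time."""
    conn.execute("INSERT INTO metrics VALUES (?, ?, ?, ?)",
                 (time.time(), model, response_time, accuracy))
    conn.commit()

log_metrics("openai", 0.85, 1.0)
rows = conn.execute(
    "SELECT model, response_time, accuracy FROM metrics").fetchall()
print(rows)  # [('openai', 0.85, 1.0)]
```

Keeping metrics in SQLite means trends (e.g., average response time per model) are a single SQL query away, rather than something scraped out of agent responses.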
Supported Rules
Text Length
- max_words=<int>: Max word count (e.g., max_words=50).
- min_words=<int>: Min word count (e.g., min_words=10).
- max_chars=<int>: Max character count (e.g., max_chars=200).
- min_chars=<int>: Min character count (e.g., min_chars=20).
- max_sentences=<int>: Max sentence count (e.g., max_sentences=3).
- min_sentences=<int>: Min sentence count (e.g., min_sentences=1).
Content
- must_contain=<text>: Must include text (e.g., must_contain=data).
- must_not_contain=<text>: Must exclude text (e.g., must_not_contain=error).
- exact_match=<text>: Must match exactly (e.g., exact_match=Hello world).
- starts_with=<text>: Must start with text (e.g., starts_with=The).
- ends_with=<text>: Must end with text (e.g., ends_with=.).
Structural
- has_punctuation=<yes/no>: Check for punctuation (e.g., has_punctuation=yes).
- has_numbers=<yes/no>: Check for numbers (e.g., has_numbers=no).
- max_unique_words=<int>: Max unique words (e.g., max_unique_words=20).
- min_unique_words=<int>: Min unique words (e.g., min_unique_words=5).
Advanced (OpenAI)
- is_coherent=<yes>: Must be coherent (e.g., is_coherent=yes).
- tone=<positive/negative/neutral>: Must match tone (e.g., tone=positive).
- is_factual=<yes>: Must be factually plausible (e.g., is_factual=yes).
- readability=<easy/medium/hard>: Must match readability level (e.g., readability=easy).
- improve=<yes>: Suggest an improvement (e.g., improve=yes).
Performance
- max_response_time=<float>: Max response time in seconds (e.g., max_response_time=1.5).
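A rule string like "max_words=10,must_contain=test" can be parsed into individual checks along these lines. This is a minimal sketch covering just three rule types; the real parser inside AgentOversight may differ.

```python
# Parse "key=value,key=value" rule strings and report which rules fail.
# Only max_words, must_contain, and ends_with are implemented here.
def parse_rules(spec: str) -> dict:
    """Split a comma-separated rule string into a name -> value dict."""
    return dict(pair.split("=", 1) for pair in spec.split(","))

def apply_rules(text: str, rules: dict) -> list:
    """Return the names of rules the text violates (empty list = valid)."""
    failures = []
    if "max_words" in rules and len(text.split()) > int(rules["max_words"]):
        failures.append("max_words")
    if "must_contain" in rules and rules["must_contain"] not in text:
        failures.append("must_contain")
    if "ends_with" in rules and not text.endswith(rules["ends_with"]):
        failures.append("ends_with")
    return failures

rules = parse_rules("max_words=10,must_contain=test,ends_with=.")
print(apply_rules("This is a test.", rules))  # [] -> all rules pass
```

Note the split("=", 1): rule values may themselves contain "=" is unlikely here, but splitting only on the first "=" keeps values like exact_match=a=b intact.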
Supported Models
- openai: GPT-3.5-turbo (requires OpenAI API key).
- deepseek: DeepSeek R1 (requires DeepSeek API key).
- grok: Grok 3 (requires xAI API key; placeholder until an official API is available).
Requirements
Create a requirements.txt file with:
Flask==2.3.3
nltk==3.8.1
openai==1.10.0
requests==2.31.0
textblob==0.18.0
Configuration
API Keys: Provide keys during initialization (e.g., AgentOversight(openai_api_key="your-key")). For security, use environment variables:
import os
oversight = AgentOversight(openai_api_key=os.getenv("OPENAI_API_KEY"))
Contributing
- Fork the repository: GitHub Repository.
- Create a feature branch: git checkout -b feature/your-feature.
- Commit your changes: git commit -m "Add your feature".
- Push to the branch: git push origin feature/your-feature.
- Open a Pull Request.
License
MIT License - see LICENSE for details.
Contact
For questions or suggestions, reach out via GitHub Issues or email at loks2cool@gmail.com.
File details
Details for the file agentoversight-0.1.0.tar.gz.
File metadata
- Download URL: agentoversight-0.1.0.tar.gz
- Upload date:
- Size: 9.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6f5636fe8859c754ef298b1fdc735ad1bbd6091889c00d64dccc70d0931af923 |
| MD5 | 5519200aa7748759b42a70c226feeac0 |
| BLAKE2b-256 | ae406f5882a6d10ae5d79710ac08af6fa2dedc33b679a0d16a52b7466a228e3d |
File details
Details for the file agentoversight-0.1.0-py3-none-any.whl.
File metadata
- Download URL: agentoversight-0.1.0-py3-none-any.whl
- Upload date:
- Size: 10.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ac8cbc41b48fc1827546d17101d5f834af99b4bde74d82030534ba25f982338a |
| MD5 | 18e4a68effcc7740d8431eba13912c8b |
| BLAKE2b-256 | 6cf81d2b0db7401146ca61e7ceb81882e201d1632ceb94e85cfac7e1a480d928 |