A tool to oversee and evaluate agentic AI performance with validation and guidance
Project description
AgentOversight
A modular tool to monitor, validate, and guide agentic AI performance.
AgentOversight is a Python-based platform for overseeing AI agents. It lets users define custom validation rules, receive directional guidance, and track performance metrics, all through a web interface or programmatically. With support for multiple models such as OpenAI, DeepSeek, and Grok, it is well suited to developers and researchers who need to ensure AI reliability.
Why AgentOversight
Delegating validation to the agent itself (for example, by prompting it to check its own output) has several drawbacks:
- Consistency: Agents might interpret rules differently across runs, leading to inconsistent validation (e.g., word counting could vary slightly).
- Latency: Sending requests to an external AI (via API) adds network delay compared to local computation.
- Cost: If using a paid API (e.g., OpenAI), each validation request costs money, whereas local code is free.
- Control: You're dependent on the agent's capabilities and can't easily tweak the validation logic without changing the prompt, which might not scale for a UI-driven tool.
- Metrics: Tracking performance (e.g., response time) becomes trickier if the agent handles everything remotely.
The AgentOversight class exists as a dedicated local component for these reasons:
- Independence: It decouples validation, guidance, and metrics from any specific AI agent, making the hub a standalone tool that can oversee any agent's output (e.g., Grok, ChatGPT, or a custom model). You don't need an AI to use it, just the output text.
- Performance: Local processing is faster and doesn't rely on external API calls, which is critical for real-time monitoring in a web app.
- Customizability: You control the logic. Want to add a new rule type (e.g., max_sentences)? Just update the class; no need to rewrite prompts or rely on an agent understanding it.
- Transparency: The rules and results are deterministic and visible in code, not hidden in an AI's black-box reasoning.
- Metrics Tracking: It's easier to log and analyze metrics (e.g., response time, accuracy trends) locally with a database than to extract them from agent responses.
Why Not Just Prompt?
The hub’s goal is to oversee agents, not replace their work. If the agent itself validates its output, you lose the independent “second opinion” that an oversight tool provides.
Users might want to apply consistent rules across multiple agents or outputs without crafting prompts each time—AgentOversight makes this reusable and UI-driven.
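The "second opinion" idea can be sketched in a few lines: the same deterministic rule set applied to outputs from different agents, with no prompting involved. The rule set, helper function, and agent outputs below are hypothetical examples, not the tool's actual API.

```python
# Deterministic local validation, independent of any agent.
# check(), the rule values, and the sample outputs are illustrative only.
def check(output: str, max_words: int, must_contain: str) -> bool:
    """Return True if the text satisfies both rules."""
    return len(output.split()) <= max_words and must_contain in output

outputs = {
    "agent_a": "This is a short test sentence.",
    "agent_b": "A much longer response that rambles on well past the limit we set for it here.",
}

# One rule set, many agents: no prompt engineering per output.
results = {name: check(text, max_words=10, must_contain="test")
           for name, text in outputs.items()}
print(results)  # agent_a passes, agent_b exceeds the word limit
```

Because the checks run locally, the same inputs always produce the same verdicts, which is exactly the consistency a prompted self-check cannot guarantee.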
Features
- Custom Validation: Define rules (e.g., max_words=50) to validate agent outputs.
- Directional Guidance: Get actionable suggestions (e.g., "Shorten the response").
- Performance Metrics: Track response time and accuracy, logged automatically to SQLite (oversight.db).
- Multi-Model Support: Test outputs from OpenAI, DeepSeek, Grok, and more.
- Auto-Correction: Optionally refine outputs based on guidance with a configurable retry limit.
- Web Interface: Built with Flask for easy interaction and visualization.
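The auto-correction feature can be pictured as a validate-and-retry loop. The real tool re-prompts the model for a refined output; this standalone sketch substitutes a crude truncation so it runs without an API key. The single max_words rule and the retry limit default are assumptions for illustration.

```python
def validate(text: str, max_words: int = 10) -> bool:
    """One hypothetical rule: word count at or under the limit."""
    return len(text.split()) <= max_words

def auto_correct(text: str, max_retries: int = 3) -> tuple[str, int]:
    """Retry until the text validates or the configurable limit is hit."""
    retries = 0
    while not validate(text) and retries < max_retries:
        # Stand-in for re-prompting the model: just truncate to 10 words.
        text = " ".join(text.split()[:10])
        retries += 1
    return text, retries

fixed, retries = auto_correct(
    "one two three four five six seven eight nine ten eleven twelve")
print(retries)  # 1: a single correction pass was enough
```

The retry cap matters: if a correction can never satisfy the rules, the loop still terminates and reports how many attempts were made.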
Installation Methods
Via pip
pip install agent-oversight
From Source
Clone the repository:
git clone https://github.com/LogeswaranA/AgentOversight.git
cd AgentOversight
Set up a virtual environment (recommended):
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
Install dependencies:
pip install -r requirements.txt
Usage
Web Interface
Run the Flask app:
agent-oversight
Open your browser to http://127.0.0.1:5000. Metrics are automatically logged to oversight.db in your working directory.
Programmatic Example
from AgentOversight.agent_logic import AgentOversight
# Initialize with optional API keys
oversight = AgentOversight(openai_api_key="your-openai-key")
oversight.set_rules("max_words=10,must_contain=test")
# Process input with auto-correction
result = oversight.process_input("openai", "Write a short test sentence.", auto_correct=True)
print(result) # Metrics are logged to oversight.db automatically
Sample Output:
{
'output': 'This is a test.',
'validation': "Word count: 5 (max: 10) - Valid | Contains 'test': Yes",
'guidance': 'Looks good!',
'metrics': {'response_time': 0.85, 'accuracy': 1.0},
'retries': 0
}
Note: The SQLite database (oversight.db) is created and metrics are logged automatically whenever process_input or track_metrics is called. No separate initialization is required.
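The automatic logging described above might look like the following sketch. The actual schema of oversight.db is not documented here, so the table layout and column names are assumptions; an in-memory database is used so the example leaves no file behind.

```python
import sqlite3
import time

# Hypothetical metrics table; swap ":memory:" for "oversight.db" to persist.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE IF NOT EXISTS metrics "
    "(ts REAL, model TEXT, response_time REAL, accuracy REAL)"
)

def log_metrics(model: str, response_time: float, accuracy: float) -> None:
    """Append one metrics row, timestamped at call time."""
    conn.execute("INSERT INTO metrics VALUES (?, ?, ?, ?)",
                 (time.time(), model, response_time, accuracy))
    conn.commit()

log_metrics("openai", 0.85, 1.0)
rows = conn.execute(
    "SELECT model, response_time, accuracy FROM metrics").fetchall()
print(rows)  # [('openai', 0.85, 1.0)]
```

Keeping metrics in SQLite means trends (e.g., average response time per model) are a single SQL query away, rather than something scraped out of agent responses.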
Supported Rules
Text Length
- max_words=<int>: Max word count (e.g., max_words=50).
- min_words=<int>: Min word count (e.g., min_words=10).
- max_chars=<int>: Max character count (e.g., max_chars=200).
- min_chars=<int>: Min character count (e.g., min_chars=20).
- max_sentences=<int>: Max sentence count (e.g., max_sentences=3).
- min_sentences=<int>: Min sentence count (e.g., min_sentences=1).
Content
- must_contain=<text>: Must include text (e.g., must_contain=data).
- must_not_contain=<text>: Must exclude text (e.g., must_not_contain=error).
- exact_match=<text>: Must match exactly (e.g., exact_match=Hello world).
- starts_with=<text>: Must start with text (e.g., starts_with=The).
- ends_with=<text>: Must end with text (e.g., ends_with=.).
Structural
- has_punctuation=<yes/no>: Check for punctuation (e.g., has_punctuation=yes).
- has_numbers=<yes/no>: Check for numbers (e.g., has_numbers=no).
- max_unique_words=<int>: Max unique words (e.g., max_unique_words=20).
- min_unique_words=<int>: Min unique words (e.g., min_unique_words=5).
Advanced (OpenAI)
- is_coherent=<yes>: Must be coherent (e.g., is_coherent=yes).
- tone=<positive/negative/neutral>: Must match tone (e.g., tone=positive).
- is_factual=<yes>: Must be factually plausible (e.g., is_factual=yes).
- readability=<easy/medium/hard>: Must match readability level (e.g., readability=easy).
- improve=<yes>: Suggest an improvement (e.g., improve=yes).
Performance
- max_response_time=<float>: Max response time in seconds (e.g., max_response_time=1.5).
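A rule string like "max_words=10,must_contain=test" can be parsed into individual checks along these lines. This is a minimal sketch covering just three rule types; the real parser inside AgentOversight may differ.

```python
# Parse "key=value,key=value" rule strings and report which rules fail.
# Only max_words, must_contain, and ends_with are implemented here.
def parse_rules(spec: str) -> dict:
    """Split a comma-separated rule string into a name -> value dict."""
    return dict(pair.split("=", 1) for pair in spec.split(","))

def apply_rules(text: str, rules: dict) -> list:
    """Return the names of rules the text violates (empty list = valid)."""
    failures = []
    if "max_words" in rules and len(text.split()) > int(rules["max_words"]):
        failures.append("max_words")
    if "must_contain" in rules and rules["must_contain"] not in text:
        failures.append("must_contain")
    if "ends_with" in rules and not text.endswith(rules["ends_with"]):
        failures.append("ends_with")
    return failures

rules = parse_rules("max_words=10,must_contain=test,ends_with=.")
print(apply_rules("This is a test.", rules))  # [] -> all rules pass
```

Note the split("=", 1): rule values may themselves contain "=" is unlikely here, but splitting only on the first "=" keeps values like exact_match=a=b intact.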
Supported Models
- openai: GPT-3.5-turbo (requires OpenAI API key).
- deepseek: DeepSeek R1 (requires DeepSeek API key).
- grok: Grok 3 (requires xAI API key; placeholder until an official API is available).
Requirements
Create a requirements.txt file with:
Flask==2.3.3
nltk==3.8.1
openai==1.10.0
requests==2.31.0
textblob==0.18.0
Configuration
API Keys: Provide keys during initialization (e.g., AgentOversight(openai_api_key="your-key")). For security, use environment variables:
import os
oversight = AgentOversight(openai_api_key=os.getenv("OPENAI_API_KEY"))
Contributing
- Fork the repository: GitHub Repository.
- Create a feature branch: git checkout -b feature/your-feature.
- Commit your changes: git commit -m "Add your feature".
- Push to the branch: git push origin feature/your-feature.
- Open a Pull Request.
License
MIT License - see LICENSE for details.
Contact
For questions or suggestions, reach out via GitHub Issues or email at loks2cool@gmail.com.
File details
Details for the file agentoversight-0.1.0.tar.gz.
File metadata
- Download URL: agentoversight-0.1.0.tar.gz
- Upload date:
- Size: 9.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6f5636fe8859c754ef298b1fdc735ad1bbd6091889c00d64dccc70d0931af923 |
| MD5 | 5519200aa7748759b42a70c226feeac0 |
| BLAKE2b-256 | ae406f5882a6d10ae5d79710ac08af6fa2dedc33b679a0d16a52b7466a228e3d |
File details
Details for the file agentoversight-0.1.0-py3-none-any.whl.
File metadata
- Download URL: agentoversight-0.1.0-py3-none-any.whl
- Upload date:
- Size: 10.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ac8cbc41b48fc1827546d17101d5f834af99b4bde74d82030534ba25f982338a |
| MD5 | 18e4a68effcc7740d8431eba13912c8b |
| BLAKE2b-256 | 6cf81d2b0db7401146ca61e7ceb81882e201d1632ceb94e85cfac7e1a480d928 |