A tool to oversee and evaluate agentic AI performance with validation and guidance

Project description

AgentOversight

A modular tool to monitor, validate, and guide agentic AI performance.

AgentOversight is a Python-based platform designed to oversee AI agents. It lets users define custom validation rules, receive directional guidance, and track performance metrics, all through a web interface or programmatically. Supporting multiple models (OpenAI, DeepSeek, and Grok), it is well suited for developers and researchers who need to ensure AI reliability.

Why AgentOversight

Asking an agent to validate its own output via prompting has several drawbacks:

  • Consistency: Agents may interpret rules differently across runs, leading to inconsistent validation (e.g., word counts could vary slightly).

  • Latency: Sending requests to an external AI (via API) adds network delay compared to local computation.

  • Cost: With a paid API (e.g., OpenAI), each validation request costs money, whereas local code is free to run.

  • Control: You depend on the agent's capabilities and can't easily tweak the validation logic without rewriting the prompt, which doesn't scale for a UI-driven tool.

  • Metrics: Tracking performance (e.g., response time) is harder when the agent handles everything remotely.


The AgentOversight class exists as a dedicated local component for these reasons:

  • Independence: It decouples validation, guidance, and metrics from any specific AI agent, making the hub a standalone tool that can oversee any agent's output (e.g., Grok, ChatGPT, or a custom model). You don't need an AI to use it, just the output text.

  • Performance: Local processing is faster and doesn't rely on external API calls, which is critical for real-time monitoring in a web app.

  • Customizability: You control the logic. Want a new rule type (e.g., max_sentences)? Update the class; there is no need to rewrite prompts or rely on an agent understanding them.

  • Transparency: The rules and results are deterministic and visible in code, not hidden in an AI's black-box reasoning.

  • Metrics Tracking: It is easier to log and analyze metrics (e.g., response time, accuracy trends) locally with a database than to extract them from agent responses.
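As a concrete illustration of the customizability and transparency points, a local rule check is just ordinary Python: deterministic, fast, and easy to extend. The function below is a hypothetical sketch, not AgentOversight's actual implementation:

```python
def check_max_words(text: str, limit: int) -> tuple[bool, str]:
    """Deterministic local check for a max_words rule.

    Returns (is_valid, message); the same input always yields
    the same result, unlike an LLM-based judgment.
    """
    count = len(text.split())
    return count <= limit, f"Word count: {count} (max: {limit})"

ok, msg = check_max_words("This is a test.", 10)
print(ok, msg)  # True Word count: 4 (max: 10)
```

Adding a new rule is just another small function like this one; no prompt engineering is involved.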

Why Not Just Prompt?
The hub’s goal is to oversee agents, not replace their work. If the agent itself validates its output, you lose the independent “second opinion” that an oversight tool provides.

Users might want to apply consistent rules across multiple agents or outputs without crafting prompts each time—AgentOversight makes this reusable and UI-driven.

Features

  • Custom Validation: Define rules (e.g., max_words=50) to validate agent outputs.
  • Directional Guidance: Get actionable suggestions (e.g., "Shorten the response").
  • Performance Metrics: Track response time and accuracy, logged automatically to SQLite (oversight.db).
  • Multi-Model Support: Test outputs from OpenAI, DeepSeek, Grok, and more.
  • Auto-Correction: Optionally refine outputs based on guidance with a configurable retry limit.
  • Web Interface: Built with Flask for easy interaction and visualization.
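Conceptually, the auto-correction feature is a bounded retry loop: validate the output, and if it fails, feed the guidance back to the agent and regenerate. A minimal sketch with hypothetical helper names (not the library's API):

```python
def auto_correct(generate, validate, prompt, max_retries=3):
    """Regenerate until validation passes or the retry limit is hit.

    generate(prompt) -> output text
    validate(output) -> (ok, guidance)
    Returns (final_output, retries_used).
    """
    output = generate(prompt)
    ok, guidance = validate(output)
    retries = 0
    while not ok and retries < max_retries:
        retries += 1
        # Feed the guidance back so the next attempt can improve.
        output = generate(prompt + "\nRevise: " + guidance)
        ok, guidance = validate(output)
    return output, retries
```

The configurable retry limit bounds both latency and API cost, since each retry is another model call.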

Installation Methods

Via pip

pip install agent-oversight

From Source

Clone the repository:

git clone https://github.com/LogeswaranA/AgentOversight.git
cd AgentOversight

Set up a virtual environment (recommended):

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies:

pip install -r requirements.txt

Usage

Web Interface

Run the Flask app:

agent-oversight

Open your browser to http://127.0.0.1:5000. Metrics are automatically logged to oversight.db in your working directory.

Programmatic Example

from AgentOversight.agent_logic import AgentOversight

# Initialize with optional API keys
oversight = AgentOversight(openai_api_key="your-openai-key")
oversight.set_rules("max_words=10,must_contain=test")

# Process input with auto-correction
result = oversight.process_input("openai", "Write a short test sentence.", auto_correct=True)
print(result)  # Metrics are logged to oversight.db automatically

Sample Output:

{
    'output': 'This is a test.',
    'validation': "Word count: 4 (max: 10) - Valid | Contains 'test': Yes",
    'guidance': 'Looks good!',
    'metrics': {'response_time': 0.85, 'accuracy': 1.0},
    'retries': 0
}

Note: The SQLite database (oversight.db) is created and metrics are logged automatically whenever process_input or track_metrics is called. No separate initialization is required.
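Because metrics land in a plain SQLite file, you can analyze them with the standard library. The table and column names below ("metrics", response_time, accuracy) are assumptions for illustration; inspect the real schema with SELECT name FROM sqlite_master before querying oversight.db:

```python
import sqlite3

# Demo against an in-memory database; point connect() at "oversight.db"
# to query a real run. Schema here is assumed, not guaranteed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metrics (response_time REAL, accuracy REAL)")
conn.executemany("INSERT INTO metrics VALUES (?, ?)",
                 [(0.85, 1.0), (1.20, 0.5)])
avg_time, avg_acc = conn.execute(
    "SELECT AVG(response_time), AVG(accuracy) FROM metrics").fetchone()
print(round(avg_time, 3), round(avg_acc, 2))
conn.close()
```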

Supported Rules

Text Length

  • max_words=<int>: Max word count (e.g., max_words=50).
  • min_words=<int>: Min word count (e.g., min_words=10).
  • max_chars=<int>: Max character count (e.g., max_chars=200).
  • min_chars=<int>: Min character count (e.g., min_chars=20).
  • max_sentences=<int>: Max sentence count (e.g., max_sentences=3).
  • min_sentences=<int>: Min sentence count (e.g., min_sentences=1).

Content

  • must_contain=<text>: Must include text (e.g., must_contain=data).
  • must_not_contain=<text>: Must exclude text (e.g., must_not_contain=error).
  • exact_match=<text>: Must match exactly (e.g., exact_match=Hello world).
  • starts_with=<text>: Must start with text (e.g., starts_with=The).
  • ends_with=<text>: Must end with text (e.g., ends_with=.).

Structural

  • has_punctuation=<yes/no>: Check for punctuation (e.g., has_punctuation=yes).
  • has_numbers=<yes/no>: Check for numbers (e.g., has_numbers=no).
  • max_unique_words=<int>: Max unique words (e.g., max_unique_words=20).
  • min_unique_words=<int>: Min unique words (e.g., min_unique_words=5).

Advanced (OpenAI)

  • is_coherent=<yes>: Must be coherent (e.g., is_coherent=yes).
  • tone=<positive/negative/neutral>: Must match tone (e.g., tone=positive).
  • is_factual=<yes>: Must be factually plausible (e.g., is_factual=yes).
  • readability=<easy/medium/hard>: Must match readability level (e.g., readability=easy).
  • improve=<yes>: Suggest an improvement (e.g., improve=yes).

Performance

  • max_response_time=<float>: Max response time in seconds (e.g., max_response_time=1.5).
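All of the rules above share one wire format: a comma-separated string of key=value pairs, as passed to set_rules. A minimal parser sketch (hypothetical, not the library's internal code) shows how such a string maps to a rule dictionary:

```python
def parse_rules(rule_string: str) -> dict[str, str]:
    """Turn "max_words=10,must_contain=test" into a dict of rules.

    Note: values containing commas or '=' would need escaping;
    this sketch assumes simple values.
    """
    rules = {}
    for pair in rule_string.split(","):
        key, _, value = pair.strip().partition("=")
        rules[key] = value
    return rules

print(parse_rules("max_words=10,must_contain=test,tone=positive"))
```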

Supported Models

  • openai: GPT-3.5-turbo (requires OpenAI API key).
  • deepseek: DeepSeek R1 (requires DeepSeek API key).
  • grok: Grok 3 (requires xAI API key, placeholder until official API is available).

Requirements

Dependencies (as pinned in requirements.txt):

Flask==2.3.3
nltk==3.8.1
openai==1.10.0
requests==2.31.0
textblob==0.18.0

Configuration

API Keys: Provide keys during initialization (e.g., AgentOversight(openai_api_key="your-key")). For security, use environment variables:

import os
oversight = AgentOversight(openai_api_key=os.getenv("OPENAI_API_KEY"))

Contributing

  1. Fork the repository: https://github.com/LogeswaranA/AgentOversight.
  2. Create a feature branch: git checkout -b feature/your-feature.
  3. Commit your changes: git commit -m "Add your feature".
  4. Push to the branch: git push origin feature/your-feature.
  5. Open a Pull Request.

License

MIT License - see LICENSE for details.

Contact

For questions or suggestions, reach out via GitHub Issues or email at loks2cool@gmail.com.

Download files

Source Distribution

  • agentoversight-0.1.0.tar.gz (9.4 kB)

Built Distribution

  • agentoversight-0.1.0-py3-none-any.whl (10.2 kB)

File details

agentoversight-0.1.0.tar.gz

  • Size: 9.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.1

Hashes:

  • SHA256: 6f5636fe8859c754ef298b1fdc735ad1bbd6091889c00d64dccc70d0931af923
  • MD5: 5519200aa7748759b42a70c226feeac0
  • BLAKE2b-256: ae406f5882a6d10ae5d79710ac08af6fa2dedc33b679a0d16a52b7466a228e3d

agentoversight-0.1.0-py3-none-any.whl

  • Size: 10.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.1

Hashes:

  • SHA256: ac8cbc41b48fc1827546d17101d5f834af99b4bde74d82030534ba25f982338a
  • MD5: 18e4a68effcc7740d8431eba13912c8b
  • BLAKE2b-256: 6cf81d2b0db7401146ca61e7ceb81882e201d1632ceb94e85cfac7e1a480d928
