Project description
openQA
openQA is an open-source framework designed to automate the testing and evaluation of AI models. It provides a comprehensive architecture for defining audit configurations, running tests, analyzing results, and generating reports.
Testing AI models has traditionally relied on complex, subjective evaluations. openQA takes a different approach: it uses a second Large Language Model (LLM) for distance-based scoring. The framework lets you:
- Automate AI testing: Define clear audit configurations with pre-defined and programmatic inputs/outputs for consistent evaluation.
- Achieve more objective scoring: The second LLM assesses the discrepancy between the AI model's output and the desired outcome, reducing reliance on subjective human judgment.
- Gain deeper insights: Generate detailed reports highlighting identified issues, distance scores, and areas for improvement.
openQA goes beyond basic testing by combining configurable audits, automated scoring, and reporting to help you build trust in your AI models.
Key Features
- Configurable Audits: Define audits with pre-defined and programmatic inputs/outputs for flexible testing scenarios.
- Modular Architecture: Leverage separate components for configuration management, test execution, evaluation, and reporting.
- Distance-Based Scoring: Employ a second LLM to calculate the distance between desired and actual outputs for objective scoring.
- Detailed Reports: Generate comprehensive reports summarizing audit results, identified discrepancies, and scores.
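The distance-based scoring idea can be illustrated with a minimal sketch. It is not openQA's actual API: `distance_score`, `PROMPT`, the 0.0 to 1.0 scale, and the `judge` callable (a stand-in for the second LLM) are all assumptions made for illustration.

```python
import re
from typing import Callable

# Hypothetical prompt template; the 0.0-1.0 scale is an assumption, not openQA's.
PROMPT = (
    "Rate how far the ACTUAL answer is from the DESIRED answer on a scale "
    "from 0.0 (equivalent) to 1.0 (unrelated). Reply with the number only.\n"
    "DESIRED: {desired}\n"
    "ACTUAL: {actual}"
)


def distance_score(desired: str, actual: str, judge: Callable[[str], str]) -> float:
    """Ask a judge (any callable mapping a prompt to a text reply) for a distance."""
    reply = judge(PROMPT.format(desired=desired, actual=actual))
    # Take the first number in the reply, since a model may add extra words.
    match = re.search(r"\d+(?:\.\d+)?", reply)
    if match is None:
        raise ValueError(f"judge reply contains no number: {reply!r}")
    # Clamp to the assumed [0.0, 1.0] range.
    return min(1.0, max(0.0, float(match.group())))


# A canned judge stands in for a real LLM call so the sketch runs offline.
canned_judge = lambda prompt: "Distance: 0.25"
score = distance_score("Paris", "The capital is Paris.", canned_judge)
print(score)  # 0.25
```

In practice the `judge` argument would wrap a call to whichever LLM client you use; keeping it a plain callable makes the scoring logic testable without network access.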
Architecture:
The core architecture of openQA consists of several interacting components:
- Audit Config: This component defines the configuration for a specific audit. It includes:
  - Pre-defined inputs: Specific data or prompts to be fed to the AI model under test.
  - Programmatic input generation: Python or similar code to dynamically generate inputs based on specific criteria.
  - Desired outputs: Expected outputs from the AI model for the provided inputs. These can be pre-defined text, data structures, or scoring criteria.
- Configuration Management: This component manages and stores audit configurations, allowing for easy creation, modification, and version control.
- Runner: This component executes the AI model under test according to the specified configuration. It provides the defined inputs to the model and captures the generated outputs.
- LLM Evaluator: This component utilizes a second Large Language Model (LLM) to compare the AI model's outputs with the desired outputs from the configuration. It calculates a distance score based on the closeness of the outputs, indicating potential discrepancies.
- Reporting: This component generates comprehensive reports summarizing the audit results. It includes:
  - Tested AI model and configuration details.
  - Pre-defined and programmatic inputs used.
  - Desired outputs from the configuration.
  - Actual outputs generated by the AI model under test.
  - Distance score calculated by the LLM Evaluator.
  - Identified discrepancies or areas for improvement.
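How these components might fit together can be sketched as follows. This is a hedged illustration, not the package's real interface: `AuditConfig`, `run_audit`, the `threshold` parameter, and the report keys are hypothetical names, and the model and evaluator are plain callables standing in for the AI model under test and the LLM Evaluator.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Optional

Case = tuple[str, str]  # (input, desired output)


@dataclass
class AuditConfig:
    """Hypothetical audit config: pre-defined cases plus an optional generator."""
    name: str
    predefined_cases: list[Case] = field(default_factory=list)
    case_generator: Optional[Callable[[], Iterable[Case]]] = None

    def cases(self) -> list[Case]:
        generated = list(self.case_generator()) if self.case_generator else []
        return self.predefined_cases + generated


def run_audit(
    config: AuditConfig,
    model: Callable[[str], str],             # AI model under test
    evaluator: Callable[[str, str], float],  # (desired, actual) -> distance
    threshold: float = 0.5,                  # assumed cut-off for flagging
) -> dict:
    """Runner + evaluator + reporting in one pass; returns a report dict."""
    results = []
    for prompt, desired in config.cases():
        actual = model(prompt)  # Runner: feed the input, capture the output
        distance = evaluator(desired, actual)
        results.append({
            "input": prompt,
            "desired": desired,
            "actual": actual,
            "distance": distance,
            "flagged": distance > threshold,  # marks a discrepancy
        })
    mean = sum(r["distance"] for r in results) / len(results) if results else 0.0
    return {"audit": config.name, "results": results, "mean_distance": mean}


# Toy stand-ins so the sketch runs without any model or LLM access.
config = AuditConfig(
    name="arithmetic",
    predefined_cases=[("2+2", "4")],
    case_generator=lambda: ((f"{n}+1", str(n + 1)) for n in range(2)),
)
always_four = lambda prompt: "4"
exact_match = lambda desired, actual: 0.0 if desired == actual else 1.0

report = run_audit(config, always_four, exact_match)
for row in report["results"]:
    print(row["input"], row["distance"], row["flagged"])
```

Passing the model and evaluator in as callables mirrors the modular split described above: either one can be swapped without touching the configuration or the reporting code.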
Benefits:
- Standardizes and simplifies AI model testing procedures.
- Enables objective and consistent evaluation through distance-based scoring.
- Improves transparency and interpretability of AI model behavior.
- Generates detailed reports for informed decision-making.
File details
Details for the file openqa-0.1.0.tar.gz.
File metadata
- Download URL: openqa-0.1.0.tar.gz
- Upload date:
- Size: 34.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.12.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | becbbdba858b51b010514ffd13fe58e9d7af7ada5f168d6c5d7adefb9dc7d056
MD5 | bbcf7c5e2d87dae870e57cc607c6ddf0
BLAKE2b-256 | 91937dc73ea44f41b037289e8f5e8839819e92f3bb1e2cad71c2dc3fbe70e797
File details
Details for the file openqa-0.1.0-py3-none-any.whl.
File metadata
- Download URL: openqa-0.1.0-py3-none-any.whl
- Upload date:
- Size: 14.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.12.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 2a359cd4de18ca20eab623ff443c20488d3225f4ca2d3c01eea38e0b6c610588
MD5 | 7cbfe0adf47f9bf9a980f4e9b2f5a2d0
BLAKE2b-256 | e4fd492c461fa6378890aa395528dd5dbd914512279992ca0d43c2dbee6a4e01