Graph-based evaluation engine for machine learning models
Project description
ai-critic 3.2.0
pip install ai-critic
Released: 2026
AI Critic — Evaluation Graph Engine for Machine Learning.
AI Critic
AI Critic is a graph-based evaluation engine for machine learning models.
Instead of reporting isolated metrics, AI Critic runs a structured evaluation pipeline that analyzes several dimensions of model quality:
- performance
- robustness
- explainability
- dataset risks
- overfitting signals
The system produces structured diagnostics and a unified score.
AI Critic is designed to be:
- deterministic
- extensible
- plugin-driven
- CI-friendly
No telemetry.
No AutoML.
Just structured model evaluation.
Why AI Critic?
Typical ML evaluation focuses only on metrics like accuracy.
But real-world models fail for many reasons:
- data leakage
- class imbalance
- unstable features
- fragile robustness
AI Critic evaluates these risks in a structured pipeline.
Key Features
Model Audit
Run a full diagnostic of your model and dataset.
from ai_critic.audit import audit
report = audit(model, X, y)
Audit detects:
- dataset size issues
- class imbalance
- potential overfitting
- suspiciously perfect validation scores
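The checks listed above can be approximated in a few lines. This is a minimal sketch, not AI Critic's implementation: the function audit_sketch and every threshold in it (MIN_SAMPLES, IMBALANCE_RATIO, OVERFIT_GAP, PERFECT_SCORE) are hypothetical, and it takes precomputed training and cross-validation scores instead of a model.

```python
from collections import Counter

# Hypothetical thresholds, not AI Critic's documented values.
MIN_SAMPLES = 50        # fewer samples than this is flagged as too small
IMBALANCE_RATIO = 0.2   # minority/majority class ratio below this is flagged
OVERFIT_GAP = 0.1       # train score minus CV mean above this is flagged
PERFECT_SCORE = 0.999   # CV scores all at or above this are suspicious


def audit_sketch(y, train_score, cv_scores):
    """Return dataset and model checks shaped like the audit() output keys."""
    counts = Counter(y)
    minority, majority = min(counts.values()), max(counts.values())
    cv_mean = sum(cv_scores) / len(cv_scores)
    dataset_checks = {
        "n_samples": len(y),
        "too_small": len(y) < MIN_SAMPLES,
        "class_counts": dict(counts),
        "imbalanced": minority / majority < IMBALANCE_RATIO,
    }
    model_checks = {
        "overfitting_gap": train_score - cv_mean,
        "possible_overfitting": train_score - cv_mean > OVERFIT_GAP,
        "suspiciously_perfect": min(cv_scores) >= PERFECT_SCORE,
    }
    return {"dataset_checks": dataset_checks, "model_checks": model_checks}


report = audit_sketch([0] * 90 + [1] * 10, train_score=1.0, cv_scores=[0.8, 0.82, 0.78])
print(report["dataset_checks"]["imbalanced"], report["model_checks"]["possible_overfitting"])
```

With 90 samples of one class and 10 of the other, the sketch flags imbalance (ratio about 0.11) and treats the train/CV gap of about 0.2 as possible overfitting.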
Example output:
{
"dataset_checks": {...},
"model_checks": {...},
"scores": {...}
}
Benchmark Multiple Models
Compare several models on the same dataset.
from ai_critic.benchmark import benchmark
results = benchmark(
models=[model1, model2, model3],
X=X,
y=y
)
Output:
RandomForestClassifier score: 0.91
SVC score: 0.87
LogisticRegression score: 0.82
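Conceptually, a benchmark scores every model on the same data and ranks the results, as in the output above. A minimal sketch, assuming models are plain prediction functions and accuracy is the score; benchmark_sketch and accuracy are hypothetical helpers, not part of ai_critic.benchmark.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def benchmark_sketch(models, X, y):
    """Score each (name, predict) pair on the same X, y and sort best-first."""
    results = [(name, accuracy(y, [predict(x) for x in X])) for name, predict in models]
    return sorted(results, key=lambda item: item[1], reverse=True)


X = [0.1, 0.4, 0.6, 0.9]
y = [0, 0, 1, 1]
models = [
    ("AlwaysZero", lambda x: 0),
    ("Threshold", lambda x: int(x >= 0.5)),
]
ranking = benchmark_sketch(models, X, y)
for name, score in ranking:
    print(f"{name} score: {score:.2f}")
```

Sorting by score in descending order is what puts the strongest model on the first line of the report.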
Evaluation Graph Engine
AI Critic executes evaluations using a structured Evaluation Graph.
Each evaluator is a node:
- PerformanceEvaluator
- RobustnessEvaluator
- ExplainabilityEvaluator
Nodes:
- run independently
- produce structured output
- return normalized scores
- can declare dependencies
The graph aggregates them into a final score.
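One way to picture the graph: nodes are functions, declared dependencies decide execution order, and the final score averages the normalized node scores. The sketch below uses the standard library's graphlib.TopologicalSorter for ordering; the node functions, their scores, the plain-mean aggregation, and the verdict thresholds are all assumptions for illustration, not AI Critic's internals.

```python
from graphlib import TopologicalSorter  # Python 3.9+


# Hypothetical evaluator nodes. Each receives the results of nodes that already
# ran and returns a dict with a normalized "score" in [0, 1].
def performance(results):
    return {"score": 0.9}


def robustness(results):
    # Declared dependency: reads the performance node's output.
    return {"score": results["performance"]["score"] * 0.9}


def explainability(results):
    return {"score": 0.8}


NODES = {"performance": performance, "robustness": robustness, "explainability": explainability}
# Each node maps to the set of nodes it depends on.
DEPENDENCIES = {"performance": set(), "robustness": {"performance"}, "explainability": set()}


def verdict_for(score):
    # Hypothetical bands; only "good" appears in the documented output.
    if score >= 0.8:
        return "good"
    if score >= 0.6:
        return "fair"
    return "poor"


def run_graph(nodes, dependencies):
    results = {}
    # static_order() yields every node after all of its dependencies.
    for name in TopologicalSorter(dependencies).static_order():
        results[name] = nodes[name](results)
    overall = round(sum(r["score"] for r in results.values()) / len(results), 3)
    return {"scores": {"overall": overall, "verdict": verdict_for(overall)}, "details": results}


report = run_graph(NODES, DEPENDENCIES)
print(report["scores"])
```

Here the overall score is the plain mean of 0.9, 0.81 and 0.8, rounded to 0.837; a weighted mean would be an equally plausible design.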
Graph Visualization
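Visualize the evaluation pipeline of an AICritic instance, such as the critic created in the Quick Start below (rendering uses the optional graphviz dependency):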
critic.visualize()
Generates a graph representation of the evaluation pipeline.
Example structure:
performance
↓
robustness
↓
explainability
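A pipeline like the one above can be described as Graphviz DOT text. This minimal sketch only builds the DOT string from (source, target) edge pairs; to_dot is a hypothetical helper, not what critic.visualize() calls internally.

```python
def to_dot(edges, name="evaluation"):
    """Render (source, target) edge pairs as Graphviz DOT source text."""
    lines = [f"digraph {name} {{"]
    for source, target in edges:
        lines.append(f'    "{source}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)


pipeline = [("performance", "robustness"), ("robustness", "explainability")]
dot = to_dot(pipeline)
print(dot)
```

If the graphviz Python package and the Graphviz system binaries are installed, text like this can typically be rendered with graphviz.Source(dot).render(...).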
Cross-Validation Intelligence
Automatically selects a validation strategy:
- StratifiedKFold for classification
- KFold for regression
Reports:
- CV mean score
- standard deviation
- suspiciously perfect scores
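The selection rule and the reported statistics can be sketched without scikit-learn. Assumptions: choose_strategy treats integer or string labels as classification (a hypothetical heuristic), stratified_folds deals each class's indices round-robin across folds (the idea behind stratification, not scikit-learn's exact algorithm), and the 0.999 cut-off for a "perfect" score is illustrative.

```python
import statistics
from collections import defaultdict


def choose_strategy(y):
    """Hypothetical heuristic: int or str labels -> classification, else regression."""
    if all(isinstance(v, (int, str)) for v in y):
        return "StratifiedKFold"
    return "KFold"


def stratified_folds(y, k):
    """Deal each class's indices round-robin so every fold gets a similar class mix."""
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for j, idx in enumerate(indices):
            folds[j % k].append(idx)
    return folds


def summarize(scores, perfect=0.999):
    """CV mean, population standard deviation, and a perfect-score flag."""
    return {
        "cv_mean": statistics.mean(scores),
        "cv_std": statistics.pstdev(scores),
        "suspiciously_perfect": min(scores) >= perfect,
    }


y = [0, 0, 0, 1, 1, 1]
print(choose_strategy(y), stratified_folds(y, 3))
print(summarize([0.95, 0.97, 0.96]))
```

Each of the three folds receives one sample of each class, which is the property stratification is meant to guarantee.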
Robustness Testing
Tests model stability under controlled noise injection.
Reports:
- performance degradation
- stability classification
- robustness score
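A minimal sketch of noise-injection testing, assuming a model exposed as a predict(row) function over lists of floats and Gaussian noise with a fixed seed to keep results deterministic. The noise scale, the stability bands, and defining the robustness score as noisy accuracy divided by clean accuracy are hypothetical choices, not documented AI Critic behavior.

```python
import random


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def robustness_sketch(predict, X, y, noise_scale=0.1, seed=0):
    rng = random.Random(seed)  # fixed seed: same noise on every run
    clean = accuracy(y, [predict(row) for row in X])
    # Add zero-mean Gaussian noise to every feature value.
    noisy_X = [[v + rng.gauss(0.0, noise_scale) for v in row] for row in X]
    noisy = accuracy(y, [predict(row) for row in noisy_X])
    degradation = clean - noisy
    # Hypothetical stability bands based on the accuracy drop.
    if degradation <= 0.02:
        stability = "stable"
    elif degradation <= 0.10:
        stability = "moderate"
    else:
        stability = "fragile"
    return {
        "degradation": degradation,
        "stability": stability,
        "robustness_score": noisy / clean if clean else 0.0,
    }


def threshold_model(row):
    return int(row[0] > 0.5)


X = [[0.0], [0.1], [0.9], [1.0]]
y = [0, 0, 1, 1]
result = robustness_sketch(threshold_model, X, y, noise_scale=0.01)
print(result)
```

With such small noise every input stays far from the 0.5 decision boundary, so the toy model keeps its accuracy; larger noise_scale values or points near the boundary are what expose fragile models.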
Explainability Signal
Uses permutation sensitivity analysis to estimate how strongly the model relies on each feature.
Detects:
- shortcut learning
- unstable features
- potential leakage signals
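Permutation sensitivity can be illustrated with classic permutation importance: shuffle one feature column, re-score, and measure the accuracy drop. A minimal sketch with a fixed seed; permutation_importance_sketch and the toy model are hypothetical, and AI Critic's own analysis may differ in metric, number of repeats, and reporting.

```python
import random


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance_sketch(predict, X, y, n_repeats=5, seed=0):
    """Mean accuracy drop when each feature column is shuffled independently."""
    rng = random.Random(seed)  # fixed seed keeps the analysis deterministic
    baseline = accuracy(y, [predict(row) for row in X])
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[col] for row in X]
            rng.shuffle(column)
            # Rebuild rows with only this column permuted.
            permuted = [row[:col] + [value] + row[col + 1:] for row, value in zip(X, column)]
            drops.append(baseline - accuracy(y, [predict(row) for row in permuted]))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(row):
    # Uses only feature 0; feature 1 is ignored.
    return int(row[0] > 0.5)


X = [[0, 5], [1, 3], [0, 9], [1, 1], [0, 2], [1, 8], [0, 4], [1, 6]]
y = [0, 1, 0, 1, 0, 1, 0, 1]
importances = permutation_importance_sketch(toy_model, X, y)
print(importances)
```

A feature the model ignores scores exactly zero. A single feature that carries almost all of the importance is the kind of pattern that can hint at shortcut learning or leakage and deserves a closer look.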
Quick Start
from ai_critic import AICritic
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
data = load_iris()
X, y = data.data, data.target
model = RandomForestClassifier()
model.fit(X, y)
critic = AICritic()
report = critic.evaluate(model, X, y)
print(report["scores"])
Output:
{
"overall": 0.87,
"verdict": "good"
}
Output Structure
{
"scores": {
"overall": 0.83,
"verdict": "good"
},
"details": {
"performance": {...},
"robustness": {...},
"explainability": {...}
}
}
Each evaluator returns diagnostic metadata.
Plugin System
AI Critic 3.0 introduced a plugin architecture.
You can create custom evaluators by subclassing EvaluatorPlugin.
Example:
from ai_critic.plugins.base import EvaluatorPlugin
class FairnessEvaluator(EvaluatorPlugin):
    name = "fairness"

    def evaluate(self, model, dataset, context):
        # Placeholder result; a real plugin would compute fairness metrics here.
        return {
            "score": 0.9,
            "message": "Model fairness acceptable"
        }
Register the plugin:
from ai_critic.plugins.registry import EvaluatorRegistry
EvaluatorRegistry.register(FairnessEvaluator())
AI Critic will automatically include it in the evaluation pipeline.
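For readers curious how registration and automatic inclusion can fit together, here is a minimal registry pattern. PluginBase and SketchRegistry are hypothetical stand-ins written for this example; they are not the real EvaluatorPlugin or EvaluatorRegistry, whose internals are not documented here.

```python
class PluginBase:
    """Hypothetical stand-in for an evaluator plugin base class."""
    name = "base"

    def evaluate(self, model, dataset, context):
        raise NotImplementedError


class SketchRegistry:
    """Hypothetical registry: stores plugins by name and runs them all."""
    _plugins = {}

    @classmethod
    def register(cls, plugin):
        cls._plugins[plugin.name] = plugin

    @classmethod
    def run_all(cls, model, dataset, context=None):
        context = {} if context is None else context
        return {name: p.evaluate(model, dataset, context) for name, p in cls._plugins.items()}


class FairnessEvaluator(PluginBase):
    name = "fairness"

    def evaluate(self, model, dataset, context):
        return {"score": 0.9, "message": "Model fairness acceptable"}


SketchRegistry.register(FairnessEvaluator())
results = SketchRegistry.run_all(model=None, dataset=None)
print(results)
```

Keying plugins by name means registering a second plugin with the same name replaces the first; whether the real registry behaves this way is not documented.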
CLI Usage
ai-critic --model model.pkl --data dataset.csv --target label
Output:
=== AI CRITIC REPORT ===
Overall score: 0.812
Verdict: good
JSON mode:
ai-critic --json
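A sketch of how the documented flags (--model, --data, --target, --json) could be parsed with the standard library's argparse and rendered in the two output modes. It is not the real ai-critic entry point: loading the model and CSV is omitted, and the JSON field names are assumed to mirror report["scores"].

```python
import argparse
import json


def build_parser():
    parser = argparse.ArgumentParser(prog="ai-critic")
    parser.add_argument("--model", help="path to a pickled model")
    parser.add_argument("--data", help="path to a CSV dataset")
    parser.add_argument("--target", help="name of the label column")
    parser.add_argument("--json", action="store_true", help="print JSON instead of text")
    return parser


def render(scores, as_json):
    """Format a scores dict as JSON or as the plain-text report."""
    if as_json:
        return json.dumps(scores)
    return "\n".join([
        "=== AI CRITIC REPORT ===",
        f"Overall score: {scores['overall']:.3f}",
        f"Verdict: {scores['verdict']}",
    ])


args = build_parser().parse_args(
    ["--model", "model.pkl", "--data", "dataset.csv", "--target", "label"]
)
scores = {"overall": 0.812, "verdict": "good"}  # placeholder scores for illustration
print(render(scores, args.json))
```

Without --json the sketch prints the same three-line text report shown above; adding the flag switches the output to a single JSON object.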
Installation
pip install ai-critic
Dependencies:
- numpy
- scikit-learn
- graphviz (optional, for visualization)
Design Philosophy
AI Critic follows three principles:
- Deterministic evaluation
- Modular architecture
- No hidden ML layers
AI Critic is not:
- an AutoML tool
- a model trainer
- a black box evaluator
It is an evaluation engine.
Project details
Download files
Source Distribution
Built Distribution
File details
Details for the file ai_critic-3.2.0.tar.gz.
File metadata
- Download URL: ai_critic-3.2.0.tar.gz
- Upload date:
- Size: 19.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c0519924407d334e2cee9d0ce1ccf2e219c309b59819b0b4bffaccc9ac25c2e3 |
| MD5 | 60b09377091fd4009cf6682e51aee8e8 |
| BLAKE2b-256 | a1d6e2360b89c18cb2d8fff331cdb728c28bc6a76b6e07c8c3056b31aa2b76b8 |
File details
Details for the file ai_critic-3.2.0-py3-none-any.whl.
File metadata
- Download URL: ai_critic-3.2.0-py3-none-any.whl
- Upload date:
- Size: 26.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bb08525536c5bff536e5b9a8d70280ca9570ef34a1c7899ed5fa3ee8a3735ee4 |
| MD5 | ed0df486931f61209874ab5259dca7f0 |
| BLAKE2b-256 | 91fe0309c15ab145a259c723caf31fcb00f12ef38b30487604c4b6bd4f91900a |