Graph-based evaluation engine for machine learning models
🚀 AI Critic 3.5.0 (Production Readiness Engine)
pip install ai-critic
AI Critic is a graph-based evaluation engine for machine learning models, designed to go beyond isolated metrics.
It runs a structured evaluation pipeline that analyzes multiple dimensions — performance, robustness, explainability, data quality, and structure — delivering a unified, interpretable, and actionable report.
🔥 WHAT’S NEW IN 3.5.0
🧠 Production-First Design
- One-line evaluation: evaluate()
- Simplified API for fast adoption
- Built for real-world deployment decisions
⚡ Standard Usage (NEW)
AI Critic is now designed to be used right after training:
import ai_critic
report = ai_critic.evaluate(model, X, y)
🚫 Quality Gate (NEW — CRITICAL)
Turn evaluation into a deployment decision:
from ai_critic import evaluate
from ai_critic.gate import enforce
report = evaluate(model, X, y)
enforce(report, threshold=75)
If the model does not meet the threshold, deployment is blocked.
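The internals of the gate are not described here. As a rough sketch only, a check of this kind could compare the report's global risk score with the threshold. The names `QualityGateError` and `enforce_sketch` are hypothetical and not part of ai_critic, and the sketch assumes that a higher `global_score` means a safer model.

```python
class QualityGateError(RuntimeError):
    """Hypothetical exception raised when a report fails the gate."""


def enforce_sketch(report, threshold=75):
    """Raise if the report's global risk score is below the threshold.

    Assumption: a higher global_score means a safer model.
    """
    score = report["risk"]["global_score"]
    if score < threshold:
        raise QualityGateError(
            f"global_score {score} is below threshold {threshold}"
        )
    return score


# Hand-written report fragments that mirror the documented "risk" block.
passing = {"risk": {"global_score": 78.5}}
failing = {"risk": {"global_score": 61.0}}

enforce_sketch(passing, threshold=75)  # passes silently
try:
    enforce_sketch(failing, threshold=75)
except QualityGateError as exc:
    print(f"blocked: {exc}")
```

Raising an exception lets the same check stop a notebook cell, a training script, or a deployment job without extra wiring.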
📦 Standardized Report (JSON-first)
All results follow the same schema:
report = {
    "scores": {},       # technical scores (0–1)
    "details": {},      # raw evaluator outputs
    "risk": {},         # interpretable score (0–100)
    "summary": {},      # human-readable insights
    "suggestions": []   # recommended actions
}
👉 This makes AI Critic:
- API-ready
- Easy to log and persist
- Production-ready
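To illustrate what "easy to log and persist" can mean in practice, here is a small sketch that round-trips a report through the standard `json` module. The report values are made up for illustration, and the sketch assumes the report holds only plain JSON types; real evaluator output containing NumPy values, for example, would need conversion first.

```python
import json

# Hand-written example that follows the documented schema; values are illustrative.
report = {
    "scores": {"performance": 0.91, "robustness": 0.74},
    "details": {},
    "risk": {"global_score": 78.5, "verdict": "usable_with_caution"},
    "summary": {"executive_summary": {"deploy_recommended": False}},
    "suggestions": ["Check for data leakage"],
}

# Serialize for a log line or an API response, then read it back.
payload = json.dumps(report, indent=2, sort_keys=True)
restored = json.loads(payload)

# The round trip is lossless for plain JSON types.
print(restored == report)
```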
⚡ Improved Graph Engine
- Dependency-aware execution (topological sort)
- Parallel execution support
- Deterministic evaluation order
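The three properties above can coexist. The following sketch is not the engine's actual code; it uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+) and a thread pool to run every node whose dependencies are satisfied as one batch. The dependency map and `run_node` are hypothetical stand-ins for real evaluators.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter

# Hypothetical dependency map: node -> set of nodes it depends on.
dependencies = {
    "data_quality": set(),
    "performance": {"data_quality"},
    "robustness": {"performance"},
    "explainability": {"performance"},
}


def run_node(name):
    """Stand-in for an evaluator; returns a placeholder result."""
    return {"node": name, "score": 1.0}


def run_graph(deps, max_workers=4):
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on cyclic graphs
    results = {}
    batches = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Sort the ready nodes so each batch is recorded in a fixed order.
            ready = sorted(sorter.get_ready())
            batches.append(ready)
            # Nodes in the same batch have no mutual dependencies.
            for name, result in zip(ready, pool.map(run_node, ready)):
                results[name] = result
                sorter.done(name)
    return results, batches


results, batches = run_graph(dependencies)
print(batches)
# [['data_quality'], ['performance'], ['explainability', 'robustness']]
```

Sorting each ready batch keeps the recorded order stable between runs even though the nodes inside a batch execute concurrently.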
🎯 Multi-layer Scoring System
- Technical score (0–1) → aggregation layer
- Risk score (0–100) → decision layer
💡 Integrated Suggestion Engine
- Automatically generates recommendations based on model behavior
🧩 Plugin System
- Clean evaluator interface
- Dependency-aware plugins
- Easily extensible evaluation pipeline
⚡ QUICK START
🧠 One-liner (recommended)
import ai_critic
report = ai_critic.evaluate(model, X, y)
print(report["risk"])
print(report["summary"])
🔐 Production usage (recommended)
from ai_critic import evaluate
from ai_critic.gate import enforce
report = evaluate(model, X, y)
# 🚫 blocks bad models
enforce(report, threshold=75)
🧪 Full control (advanced)
from api.client import AICritic
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
data = load_iris()
X, y = data.data, data.target
model = RandomForestClassifier().fit(X, y)
critic = AICritic(weights={
    "performance": 1.0,
    "robustness": 1.5
})
report = critic.evaluate(model, X, y, parallel=True)
🧩 INTERNAL PIPELINE
evaluate()
↓
EvaluationGraph (nodes)
↓
raw_results
↓
ScoreAggregator (0–1)
↓
build_report()
↓
scoring.py (risk 0–100)
↓
summary.py (human-readable)
↓
SuggestionEngine
🧱 CORE COMPONENTS
1. Evaluation Graph
A DAG-based execution system:
- Automatically resolves dependencies
- Executes nodes in correct order
- Enables parallel execution
Example:
performance → robustness → explainability
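As an illustration of dependency resolution for the chain shown above, the sketch below orders the nodes with the standard library's `graphlib` module (Python 3.9+) and shows that a cyclic graph is rejected. It is a generic example of topological sorting, not the EvaluationGraph implementation.

```python
from graphlib import CycleError, TopologicalSorter

# node -> nodes that must run first (mirrors the chain in the example above)
graph = {
    "performance": set(),
    "robustness": {"performance"},
    "explainability": {"robustness"},
}

order = list(TopologicalSorter(graph).static_order())
print(order)  # ['performance', 'robustness', 'explainability']

# A cycle cannot be ordered, so it is rejected up front.
cyclic = {"a": {"b"}, "b": {"a"}}
try:
    list(TopologicalSorter(cyclic).static_order())
    cycle_found = False
except CycleError:
    cycle_found = True
print(cycle_found)  # True
```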
2. Score Aggregator
Combines evaluator outputs:
critic = AICritic(weights={
    "performance": 1.0,
    "robustness": 2.0
})
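How the weights are combined is not specified here. One common choice is a weighted mean, sketched below under that assumption; `aggregate_scores` is a hypothetical helper, not the ScoreAggregator itself, and the scores are invented for the example.

```python
def aggregate_scores(scores, weights=None, default_weight=1.0):
    """Weighted mean of per-evaluator scores, each expected in [0, 1]."""
    weights = weights or {}
    total_weight = 0.0
    weighted_sum = 0.0
    for name, score in scores.items():
        # Evaluators without an explicit weight fall back to the default.
        weight = weights.get(name, default_weight)
        weighted_sum += weight * score
        total_weight += weight
    if total_weight == 0:
        raise ValueError("at least one evaluator needs a non-zero weight")
    return weighted_sum / total_weight


scores = {"performance": 0.9, "robustness": 0.6}
weights = {"performance": 1.0, "robustness": 2.0}

technical = aggregate_scores(scores, weights)  # (0.9 * 1.0 + 0.6 * 2.0) / 3.0
print(round(technical, 3))  # 0.7
```

With robustness weighted twice as heavily as performance, the weaker robustness score pulls the aggregate down from the unweighted mean of 0.75 to 0.7.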
3. Evaluator Plugins
Fully extensible via plugins:
from ai_critic.plugins.base import EvaluatorPlugin
from ai_critic.plugins.registry import EvaluatorRegistry
class FairnessEvaluator(EvaluatorPlugin):
    name = "fairness"
    dependencies = ["performance"]
    weight = 1.0

    def evaluate(self, model, dataset, context=None):
        return {
            "score": 0.92,
            "verdict": "stable",
            "message": "Fairness is acceptable"
        }

EvaluatorRegistry.register(FairnessEvaluator())
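To make the shape of the plugin contract concrete without depending on the library, the following self-contained sketch uses stand-in classes. `SketchPlugin` and `SketchRegistry` are hypothetical and are not the real `EvaluatorPlugin` or `EvaluatorRegistry`; the point is only to show how declared dependencies can be checked at registration time.

```python
class SketchPlugin:
    """Stand-in base class; not the real EvaluatorPlugin."""

    name = ""
    dependencies = []
    weight = 1.0

    def evaluate(self, model, dataset, context=None):
        raise NotImplementedError


class SketchRegistry:
    """Stand-in registry keyed by plugin name."""

    def __init__(self):
        self._plugins = {}

    def register(self, plugin):
        if not plugin.name:
            raise ValueError("plugin needs a non-empty name")
        if plugin.name in self._plugins:
            raise ValueError(f"duplicate plugin: {plugin.name}")
        self._plugins[plugin.name] = plugin

    def missing_dependencies(self):
        """Map each plugin to the dependencies that are not registered."""
        missing = {}
        for name, plugin in self._plugins.items():
            absent = [d for d in plugin.dependencies if d not in self._plugins]
            if absent:
                missing[name] = absent
        return missing


class Fairness(SketchPlugin):
    name = "fairness"
    dependencies = ["performance"]

    def evaluate(self, model, dataset, context=None):
        return {"score": 0.92, "verdict": "stable", "message": "Fairness is acceptable"}


registry = SketchRegistry()
registry.register(Fairness())
print(registry.missing_dependencies())  # {'fairness': ['performance']}
```

Reporting missing dependencies before execution gives a clearer error than a failure halfway through the graph.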
4. Risk Scoring (0–100)
Transforms technical signals into decision-ready output:
report["risk"] = {
    "global_score": 78.5,
    "verdict": "usable_with_caution",
    "component_scores": {...},
    "penalties": [...]
}
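The exact scoring rules are not given here. As a sketch under stated assumptions, the function below scales averaged 0-1 component scores to 0-100, subtracts point penalties, and maps the result to a verdict. The cut-offs, the penalty fields (`reason`, `points`), and the verdict labels other than `usable_with_caution` are all hypothetical.

```python
def risk_report(component_scores, penalties=()):
    """Build a risk block from 0-1 component scores and point penalties."""
    base = 100.0 * sum(component_scores.values()) / len(component_scores)
    deducted = sum(p["points"] for p in penalties)
    # Clamp so that heavy penalties cannot push the score out of range.
    global_score = max(0.0, min(100.0, base - deducted))
    # Hypothetical cut-offs and labels, chosen only for this illustration.
    if global_score >= 85:
        verdict = "production_ready"
    elif global_score >= 60:
        verdict = "usable_with_caution"
    else:
        verdict = "not_recommended"
    return {
        "global_score": round(global_score, 1),
        "verdict": verdict,
        "component_scores": dict(component_scores),
        "penalties": list(penalties),
    }


risk = risk_report(
    {"performance": 0.9, "robustness": 0.75},
    penalties=[{"reason": "possible data leakage", "points": 4.0}],
)
print(risk["global_score"], risk["verdict"])  # 78.5 usable_with_caution
```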
5. Human Summary
High-level interpretation:
report["summary"] = {
    "executive_summary": {
        "verdict": "⚠️ Risky",
        "deploy_recommended": False
    }
}
6. Suggestion Engine
Actionable insights:
[
    "Check for data leakage",
    "Improve robustness with regularization"
]
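One simple way to produce suggestions like these is a list of rules evaluated against the technical scores. The sketch below assumes that approach; the rule thresholds are invented, and `suggest` is a hypothetical helper rather than the SuggestionEngine API.

```python
# Each rule pairs a predicate over the scores dict with a suggestion text.
# Thresholds are illustrative assumptions, not values used by AI Critic.
RULES = [
    (lambda s: s.get("performance", 0.0) >= 0.99, "Check for data leakage"),
    (lambda s: s.get("robustness", 1.0) < 0.7, "Improve robustness with regularization"),
]


def suggest(scores):
    """Return the suggestions whose rule matches, in rule order."""
    return [text for predicate, text in RULES if predicate(scores)]


print(suggest({"performance": 0.995, "robustness": 0.55}))
# ['Check for data leakage', 'Improve robustness with regularization']
print(suggest({"performance": 0.9, "robustness": 0.9}))
# []
```

A near-perfect performance score is treated as suspicious here because it often signals leakage between training and evaluation data.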
🖥️ CLI
Run directly from terminal:
ai-critic --model model.pkl --data dataset.csv --target label
🔥 CI/CD Mode (recommended)
ai-critic --model model.pkl --data dataset.csv --target label --fail-on-risk
👉 Fails automatically if model risk is too high.
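CI systems decide pass or fail from a process exit code. How the CLI computes that code is not documented here, so the sketch below only illustrates the general pattern: read a saved JSON report and map it to 0 or 1. `ci_exit_code`, the report file, and the assumption that a higher `global_score` is better are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def ci_exit_code(report_path, threshold=75.0):
    """Return 0 when the saved report passes the gate, 1 otherwise."""
    report = json.loads(Path(report_path).read_text(encoding="utf-8"))
    return 0 if report["risk"]["global_score"] >= threshold else 1


# Write a small illustrative report to a temporary file and check it.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "report.json"
    path.write_text(json.dumps({"risk": {"global_score": 61.0}}), encoding="utf-8")
    code = ci_exit_code(path)

print(code)  # 1, which a CI runner treats as a failed step
```

A wrapper script would pass the returned value to `sys.exit` so that the pipeline step fails when the gate does.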
🧠 DESIGN PHILOSOPHY
1. Single Source of Truth
One unified data format → no inconsistencies
2. Graph-first Thinking
Evaluation is a dependency-driven pipeline, not isolated functions
3. JSON-native
Everything is ready for:
- APIs
- dashboards
- logging
- SaaS platforms
4. Actionable AI
Not just metrics — decisions:
- Should you deploy?
- Where is the risk?
- What should be improved?
🔥 POSITIONING
AI Critic is not just a metrics library.
It is a:
🧠 Production gatekeeper for machine learning models
🚀 ROADMAP
- REST API (/evaluate)
- Visual dashboard
- Model monitoring (post-deployment)
- Continuous evaluation (CI/CD)
- Global benchmarking between models
📄 LICENSE
MIT License
File details
Details for the file ai_critic-3.5.0.tar.gz.
File metadata
- Download URL: ai_critic-3.5.0.tar.gz
- Upload date:
- Size: 22.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1c30985756fa416bd47c8fd0211a0429dd9e52e29c3e143dc90ba61c3abbfe38 |
| MD5 | bdc03a5c205f628f9251c1f5c56abf54 |
| BLAKE2b-256 | 3be11297cecfca80d120a486fe3852bc359b4750de66f98abd17fdaaa5fe319a |
File details
Details for the file ai_critic-3.5.0-py3-none-any.whl.
File metadata
- Download URL: ai_critic-3.5.0-py3-none-any.whl
- Upload date:
- Size: 29.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e1ccdcd3b740c6a652a2417ab9de0a3f1dd75cefa5efa9e21eb770f532317664 |
| MD5 | 44e234ced832dcc06df6255abc9779d4 |
| BLAKE2b-256 | ceb8793697695ff2255757446532d9a979b3f0aae9ba3773a5fcd3f3be9c9476 |