Fast AI evaluator for scikit-learn models
ai-critic 🧠
The Quality Gate for Machine Learning Models
ai-critic is an intelligent evaluation and decision system designed to determine whether a machine learning model is safe, reliable, and trustworthy enough to be deployed in real-world environments.
Unlike traditional ML evaluation tools that focus almost exclusively on performance metrics, ai-critic acts as a Quality Gate — a final checkpoint that actively probes models to uncover hidden risks that frequently cause silent failures in production.
ai-critic does not ask “How accurate is this model?” It asks “Can this model be trusted in the real world?”
🎯 Why ai-critic Exists
Most production ML failures are not accuracy problems.
They are caused by:
- Data leakage hidden inside features
- Overfitting disguised as strong validation scores
- Models that collapse under small noise
- Fragile dependency on a single feature
- Structurally unsafe configurations
These failures usually appear after deployment, when they are already expensive — or dangerous — to fix.
ai-critic exists to detect these risks before deployment.
🚀 Installation
Install directly from PyPI:
pip install ai-critic
Python 3.8+ is recommended.
⚡ Quick Start (Fast Verdict)
If you want a clear, conservative deployment recommendation, this is all you need.
from ai_critic import AICritic
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

# Synthetic classification data for the demo
X, y = make_classification(
    n_samples=1000,
    n_features=20,
    random_state=42
)

# A deliberately shallow forest
model = RandomForestClassifier(
    max_depth=5,
    random_state=42
)

# Audit the model and print the executive summary
critic = AICritic(model, X, y)
report = critic.evaluate(view="executive")
print(report)
Example Output
Verdict: ⚠️ Risk Detected
Risk Level: medium
Deploy Recommended: False
Main Reason: Structural or robustness risks detected
If ai-critic approves deployment, it means that none of its multiple independent checks detected a meaningful risk.
The system is intentionally skeptical by design.
🧭 What Does the Verdict Mean?
| Field | Meaning |
|---|---|
| verdict | Human-readable summary |
| risk_level | low / medium / high |
| deploy_recommended | Final quality gate decision |
| main_reason | Primary blocking factor |
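In a CI pipeline, these four fields are enough to drive a pass/fail step. The sketch below assumes the executive report can be read as a plain dictionary with the keys listed above. That shape is an assumption, so the example uses a hand-written dictionary mirroring the sample output rather than a real ai-critic call, and the helper name `gate_exit_code` is hypothetical.

```python
def gate_exit_code(report: dict) -> int:
    """Map an executive-style report to a process exit code (0 = pass, 1 = block)."""
    # A missing field counts as a failure: a quality gate should fail closed.
    if not report.get("deploy_recommended", False):
        risk = report.get("risk_level", "unknown")
        reason = report.get("main_reason", "no reason given")
        print(f"Blocked ({risk} risk): {reason}")
        return 1
    print("Approved for deployment")
    return 0


# Hand-written stand-in that mirrors the example output (assumed shape).
example_report = {
    "verdict": "⚠️ Risk Detected",
    "risk_level": "medium",
    "deploy_recommended": False,
    "main_reason": "Structural or robustness risks detected",
}

exit_code = gate_exit_code(example_report)
```

Returning a non-zero code lets the surrounding job stop the release without parsing any text.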
Clarity is prioritized over ambiguity.
🧠 How ai-critic Thinks (Core Concept)
ai-critic is not a metric calculator. It is a decision system.
Internally, it works in three layers:
- Evaluators → Detect signals and risks
- Critic Gate → Decide if intervention is needed
- Deployment Policy → Decide if deployment is safe
🧱 The Four Pillars of the Audit
ai-critic evaluates models across four independent risk dimensions:
| Pillar | Detects | Why It Matters |
|---|---|---|
| 📊 Data Integrity | Leakage, shortcuts, correlations | Inflated metrics |
| 🧠 Model Structure | Over-complexity, unsafe configs | Poor generalization |
| 📈 Performance Sanity | Suspicious CV behavior | False confidence |
| 🧪 Robustness | Noise sensitivity | Production collapse |
Each pillar emits signals, not binary judgments.
Those signals are aggregated by the Critic Gate.
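To make "signals, not binary judgments" concrete, here is a minimal sketch of graded signals being combined into a single risk level. The `Signal` class, the 0 to 1 severity scale, the thresholds, and the worst-case aggregation rule are all illustrative assumptions, not ai-critic's internal implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    pillar: str      # "data", "structure", "performance" or "robustness"
    name: str
    severity: float  # 0.0 (harmless) to 1.0 (critical); assumed scale


def aggregate_risk(signals, medium_at=0.4, high_at=0.7):
    """Combine graded signals into one risk level.

    Keeps the worst signal per pillar, then takes the worst pillar overall,
    so one critical finding is not averaged away by many harmless ones.
    """
    worst_by_pillar = {}
    for s in signals:
        current = worst_by_pillar.get(s.pillar, 0.0)
        worst_by_pillar[s.pillar] = max(current, s.severity)
    worst = max(worst_by_pillar.values(), default=0.0)
    if worst >= high_at:
        level = "high"
    elif worst >= medium_at:
        level = "medium"
    else:
        level = "low"
    return level, worst_by_pillar


signals = [
    Signal("data", "feature_target_correlation", 0.2),
    Signal("robustness", "noise_sensitivity", 0.55),
    Signal("structure", "deep_trees", 0.1),
]
level, per_pillar = aggregate_risk(signals)
print(level, per_pillar)
```

Taking the maximum rather than the mean is a deliberately conservative choice, in keeping with a skeptical gate.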
🧪 Robustness Testing (Noise Injection)
Production data is never clean.
ai-critic injects controlled noise into inputs and measures degradation:
robustness = report["details"]["robustness"]
print(robustness["performance_drop"])
print(robustness["verdict"])
Possible outcomes:
- stable → acceptable degradation
- fragile → high sensitivity
- misleading → likely inflated performance
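The idea behind noise injection can be reproduced with plain scikit-learn and NumPy. The sketch below is not ai-critic's own routine: it adds Gaussian noise scaled to each feature's standard deviation and compares accuracy before and after. The `noise_scale` value and the 0.05 cut-off for "fragile" are hypothetical choices for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def noise_robustness(model, X_test, y_test, noise_scale=0.1, seed=0):
    """Return (clean_score, noisy_score, performance_drop) under Gaussian noise."""
    rng = np.random.default_rng(seed)
    clean_score = model.score(X_test, y_test)
    # Scale the noise per feature so every column is perturbed proportionally.
    noise = rng.normal(0.0, noise_scale * X_test.std(axis=0), size=X_test.shape)
    noisy_score = model.score(X_test + noise, y_test)
    return clean_score, noisy_score, clean_score - noisy_score


X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
model = RandomForestClassifier(max_depth=5, random_state=42).fit(X_train, y_train)

clean, noisy, drop = noise_robustness(model, X_test, y_test)
# Hypothetical threshold: a drop above 5 accuracy points counts as fragile.
verdict = "fragile" if drop > 0.05 else "stable"
print(f"clean={clean:.3f} noisy={noisy:.3f} drop={drop:.3f} -> {verdict}")
```

Evaluating on a held-out split matters here: measuring the drop on training data would understate how much the model depends on exact feature values.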
🔍 Explainability & Feature Sensitivity
ai-critic performs feature sensitivity analysis to detect:
- Feature-level leakage
- Over-reliance on a single signal
- Shortcut learning
How it works:
- A feature is perturbed or permuted
- The model is re-evaluated
- Performance drop is measured
Large drops indicate critical dependency.
This approach is:
- Model-agnostic
- Lightweight
- Interpretable
- Framework-independent
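The perturb, re-evaluate, measure loop described above can be sketched in a few lines. This is a generic permutation-based version written against scikit-learn's estimator API, shown under the assumption that ai-critic does something similar in spirit; `feature_sensitivity` is a hypothetical helper name.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split


def feature_sensitivity(model, X, y, seed=0):
    """Score drop when each feature column is shuffled, one at a time."""
    rng = np.random.default_rng(seed)
    baseline = model.score(X, y)
    drops = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        X_perm = X.copy()
        # Shuffling breaks the link between feature j and the target
        # while keeping the feature's marginal distribution intact.
        X_perm[:, j] = rng.permutation(X_perm[:, j])
        drops[j] = baseline - model.score(X_perm, y)
    return baseline, drops


X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
model = RandomForestClassifier(max_depth=5, random_state=42).fit(X_train, y_train)

baseline, drops = feature_sensitivity(model, X_test, y_test)
top = int(np.argmax(drops))
print(f"baseline={baseline:.3f}, most sensitive feature={top}, drop={drops[top]:.3f}")
```

A single feature whose shuffle removes most of the score is the pattern to look for. For a repeated, averaged variant of the same measurement, scikit-learn provides `sklearn.inspection.permutation_importance`.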
🧠 Recommendations Engine
ai-critic does not stop at “deploy or not”.
It also reports specific findings and actionable recommendations, such as:
- Reduce model complexity
- Increase regularization
- Possible data leakage detected
- High noise sensitivity
- Structural overfitting signals
These recommendations are rule-based and data-driven, not LLM hallucinations.
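A rule-based engine of this kind is easy to picture as a table of predicates over measured quantities. The sketch below is illustrative only: the metric names (`train_score`, `cv_score`, `max_feature_target_corr`, `noise_drop`) and every threshold are hypothetical, and the messages are borrowed from the list above.

```python
# Each rule is (predicate over metrics, message). Metric names and thresholds
# are hypothetical, chosen only to illustrate the pattern.
RULES = [
    (lambda m: m["train_score"] - m["cv_score"] > 0.10,
     "Reduce model complexity or increase regularization"),
    (lambda m: m["max_feature_target_corr"] > 0.95,
     "Possible data leakage detected"),
    (lambda m: m["noise_drop"] > 0.05,
     "High noise sensitivity"),
]


def recommend(metrics):
    """Return the message of every rule whose predicate fires, in rule order."""
    return [message for predicate, message in RULES if predicate(metrics)]


metrics = {
    "train_score": 0.99,
    "cv_score": 0.84,
    "max_feature_target_corr": 0.41,
    "noise_drop": 0.08,
}
recommendations = recommend(metrics)
for r in recommendations:
    print("-", r)
```

Because every message traces back to one explicit condition, the output is reproducible and auditable.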
🚦 Deployment Decision
The final decision is produced via:
decision = critic.deploy_decision()
print(decision)
Output includes:
- Deployment approval or rejection
- Risk level
- ML confidence score
- Blocking issues
- Recommendations
🧠 Critic Gate (New)
The Critic Gate decides whether suggestions should even be made.
This prevents:
- Over-criticism
- Noise-based warnings
- Fatigue from excessive suggestions
The gate considers:
- Overall score
- Dataset size
- Verdict severity
- Structural risk
This turns ai-critic into a judgment system, not a nagging tool.
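As a rough illustration of how the four inputs listed above might combine, here is a minimal gate sketch. The `GateInput` fields, the ordering of the checks, and the `min_samples` and `score_ok` thresholds are assumptions made for the example, not ai-critic's actual policy.

```python
from dataclasses import dataclass


@dataclass
class GateInput:
    overall_score: float   # 0 to 1, higher is better (assumed scale)
    n_samples: int
    verdict_severity: str  # "low", "medium" or "high"
    structural_risk: bool


def should_intervene(g, min_samples=200, score_ok=0.85):
    """Decide whether suggestions are worth showing at all."""
    # Serious findings always get through.
    if g.verdict_severity == "high" or g.structural_risk:
        return True
    # On tiny datasets most remaining signals are noise, so stay quiet.
    if g.n_samples < min_samples:
        return False
    # A healthy model with only low-severity findings needs no commentary.
    if g.overall_score >= score_ok and g.verdict_severity == "low":
        return False
    return True


healthy = should_intervene(GateInput(0.92, 5000, "low", False))
structural = should_intervene(GateInput(0.92, 5000, "low", True))
tiny_data = should_intervene(GateInput(0.60, 50, "medium", False))
borderline = should_intervene(GateInput(0.70, 5000, "medium", False))
print(healthy, structural, tiny_data, borderline)
```

Checking severity and structural risk first means the quieting rules can never suppress a serious finding.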
🔄 Feedback Loop & Learning Critic
ai-critic can learn from outcomes.
You can optionally provide feedback:
ai-critic --feedback success
This enables:
- Smarter future decisions
- Better thresholds
- Context-aware criticism
The critic improves without exposing your data.
🖥️ Command Line Interface (CLI)
ai-critic ships with a professional CLI:
ai-critic \
  --model model.pkl \
  --data dataset.csv \
  --target label
CLI output includes:
- Gate decision
- Deployment recommendation
- Risk level
- Suggestions
Use --json for automation and pipelines.
🧩 Multi-Framework Support
Supported via adapters:
- scikit-learn
- PyTorch
- TensorFlow
The API remains consistent.
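One common way to keep an API consistent across frameworks is a thin adapter that normalizes every model to the same `predict` method. The classes below are a hypothetical sketch of that pattern, not ai-critic's actual adapters; a plain NumPy function stands in for a PyTorch or TensorFlow forward pass so the example runs without either framework installed.

```python
from typing import Callable, Protocol

import numpy as np
from sklearn.dummy import DummyClassifier


class ModelAdapter(Protocol):
    def predict(self, X: np.ndarray) -> np.ndarray: ...


class SklearnAdapter:
    """Wraps any estimator that already exposes predict()."""

    def __init__(self, estimator):
        self.estimator = estimator

    def predict(self, X):
        return np.asarray(self.estimator.predict(X))


class CallableAdapter:
    """Wraps a forward function returning class scores of shape
    (n_samples, n_classes), as a deep learning model call typically would."""

    def __init__(self, forward: Callable[[np.ndarray], np.ndarray]):
        self.forward = forward

    def predict(self, X):
        return np.argmax(np.asarray(self.forward(X)), axis=1)


def accuracy(adapter: ModelAdapter, X, y) -> float:
    """Evaluation code sees only the adapter, never the framework."""
    return float(np.mean(adapter.predict(X) == y))


X = np.array([[-1.0], [2.0], [3.0]])
y = np.array([0, 1, 1])


def toy_forward(X):
    # Stand-in for a network: class 1 scores higher when the feature is positive.
    return np.stack([-X[:, 0], X[:, 0]], axis=1)


net_acc = accuracy(CallableAdapter(toy_forward), X, y)
sk_acc = accuracy(SklearnAdapter(DummyClassifier(strategy="most_frequent").fit(X, y)), X, y)
print(net_acc, sk_acc)
```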
🛡️ What ai-critic Is NOT
- ❌ A hyperparameter optimizer
- ❌ A leaderboard benchmarking tool
- ❌ A replacement for domain expertise
- ❌ A blind approval system
🧠 Design Philosophy
ai-critic assumes:
- Metrics can lie
- Data is imperfect
- Models fail silently
- Trust must be earned
That makes it ideal as a final quality gate, not a tuning toy.
🧠 Final Note
ai-critic is not here to make models look good. It exists to prevent unsafe models from looking good enough to deploy.
A failed audit does not mean your model is bad. It means your model is not yet safe to trust.
That distinction is everything.