Fast AI evaluator for scikit-learn models
ai-critic 🧠: The Quality Gate for Machine Learning Models
ai-critic is a specialized decision-making tool designed to audit the reliability and readiness for deployment of scikit-learn–compatible Machine Learning models.
Instead of merely measuring performance (accuracy, F1 score), ai-critic acts as a Quality Gate, actively probing the model to uncover hidden risks that commonly cause production failures — such as data leakage, structural overfitting, and fragility under noise.
ai-critic does not ask “How good is this model?” It asks “Can this model be trusted?”
🚀 Getting Started (The Basics)
This section is ideal for beginners who need a fast and reliable verdict on a trained model.
Installation
Install directly from PyPI:
pip install ai-critic
The Quick Verdict
With just a few lines of code, you obtain an executive-level assessment and a deployment recommendation.
from ai_critic import AICritic
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
# 1. Prepare data and model
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
model = RandomForestClassifier(max_depth=5, random_state=42)
# 2. Initialize the Critic
critic = AICritic(model, X, y)
# 3. Run the audit (executive mode)
report = critic.evaluate(view="executive")
print(f"Verdict: {report['verdict']}")
print(f"Risk Level: {report['risk_level']}")
print(f"Main Reason: {report['main_reason']}")
Expected Output (example):
Verdict: ⚠️ Risky
Risk Level: medium
Main Reason: Structural or robustness-related risks detected.
This output is intentionally conservative: when ai-critic recommends deployment, its checks found no meaningful risks.
💡 Understanding the Critique (Intermediate)
This section is for data scientists who want to understand why the model received a given verdict and how to improve it.
The Four Pillars of the Audit
ai-critic evaluates models across four independent risk dimensions:
| Pillar | Main Risk Detected | Internal Module |
|---|---|---|
| 📊 Data Integrity | Target Leakage & Correlation Artifacts | evaluators.data |
| 🧠 Model Structure | Over-complexity & Misconfiguration | evaluators.config |
| 📈 Performance | Suspicious CV or Learning Curves | evaluators.performance |
| 🧪 Robustness | Sensitivity to Noise | evaluators.robustness |
Each pillar contributes signals used later in the deployment gate.
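To make the Data Integrity pillar more concrete, here is a minimal sketch of one common leakage signal: a feature that correlates almost perfectly with the target. This is not ai-critic's implementation. The helper names `pearson` and `suspicious_features` and the `0.95` threshold are assumptions chosen for illustration.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def suspicious_features(columns, target, threshold=0.95):
    """Return the names of features whose |correlation| with the target
    reaches `threshold` (an illustrative value, not taken from ai-critic)."""
    flagged = []
    for name, values in columns.items():
        if abs(pearson(values, target)) >= threshold:
            flagged.append(name)
    return flagged


target = [0, 1, 0, 1, 1, 0]
columns = {
    "age": [23, 31, 47, 52, 38, 29],
    "label_copy": [0.0, 1.0, 0.0, 1.0, 1.0, 0.0],  # duplicates the target
}
flagged = suspicious_features(columns, target)
print(flagged)
```

A flagged feature is only a prompt for investigation: a strong correlation can be legitimate, so the column's origin still needs a manual check.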
Full Technical & Visual Analysis
To access all internal diagnostics, including plots and recommendations, use view="all".
full_report = critic.evaluate(view="all", plot=True)
technical_summary = full_report["technical"]
print("\n--- Key Risks Detected ---")
for i, risk in enumerate(technical_summary["key_risks"], start=1):
    print(f"{i}. {risk}")
print("\n--- Recommendations ---")
for rec in technical_summary["recommendations"]:
    print(f"- {rec}")
Generated plots may include:
- Feature correlation heatmaps
- Learning curves
- Robustness degradation charts
Robustness Test (Noise Injection)
A model that collapses under small perturbations is not production-safe.
robustness = full_report["details"]["robustness"]
print("\n--- Robustness Analysis ---")
print(f"Original CV Score: {robustness['cv_score_original']:.4f}")
print(f"Noisy CV Score: {robustness['cv_score_noisy']:.4f}")
print(f"Performance Drop: {robustness['performance_drop']:.4f}")
print(f"Verdict: {robustness['verdict']}")
Possible Verdicts:
- stable → acceptable degradation
- fragile → high sensitivity to noise
- misleading → performance likely inflated by leakage
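The mapping from scores to the three verdicts can be sketched as follows. This is a simplified illustration, not the library's actual rule: the function name `robustness_verdict` and both thresholds (`max_drop`, `suspicious_score`) are assumptions. The dictionary keys mirror those shown in the robustness report above.

```python
def robustness_verdict(cv_original, cv_noisy, max_drop=0.05, suspicious_score=0.99):
    """Map two cross-validation scores to a verdict label.

    Both thresholds are illustrative assumptions, not ai-critic's values.
    """
    performance_drop = cv_original - cv_noisy
    if cv_original >= suspicious_score:
        # A near-perfect score is treated here as a possible leakage signal.
        verdict = "misleading"
    elif performance_drop > max_drop:
        verdict = "fragile"
    else:
        verdict = "stable"
    return {
        "cv_score_original": cv_original,
        "cv_score_noisy": cv_noisy,
        "performance_drop": performance_drop,
        "verdict": verdict,
    }


result = robustness_verdict(0.90, 0.70)
print(f"Performance Drop: {result['performance_drop']:.4f}")
print(f"Verdict: {result['verdict']}")
```

The leakage check runs first in this sketch so that a suspiciously perfect model is never reported as merely stable.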
⚙️ Integration and Governance (The Advanced)
This section targets MLOps engineers, architects, and teams operating automated pipelines.
The Deployment Gate (deploy_decision)
The deploy_decision() method aggregates all detected risks and produces a final gate decision.
decision = critic.deploy_decision()
if decision["deploy"]:
    print("✅ Deployment Approved")
else:
    print("❌ Deployment Blocked")
print(f"Risk Level: {decision['risk_level']}")
print(f"Confidence Score: {decision['confidence']:.2f}")
print("\nBlocking Issues:")
for issue in decision["blocking_issues"]:
    print(f"- {issue}")
Conceptual model:
- Hard Blockers → deployment denied
- Soft Blockers → deployment discouraged
- Confidence Score (0–1) → heuristic trust level
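The conceptual model above can be sketched as a small aggregation function. This is a hypothetical illustration, not how deploy_decision() is implemented: the name `gate_decision`, the `soft_penalty` weight, and the "low" and "high" risk labels are assumptions. Only the returned keys follow the example output shown earlier.

```python
def gate_decision(hard_blockers, soft_blockers, soft_penalty=0.15):
    """Combine blocker lists into a gate decision dictionary.

    `soft_penalty` is an illustrative weight: each soft blocker lowers
    the confidence score by that amount, floored at 0.
    """
    if hard_blockers:
        risk_level = "high"
        confidence = 0.0  # any hard blocker denies deployment outright
    elif soft_blockers:
        risk_level = "medium"
        confidence = max(0.0, 1.0 - soft_penalty * len(soft_blockers))
    else:
        risk_level = "low"
        confidence = 1.0
    return {
        "deploy": not hard_blockers,
        "risk_level": risk_level,
        "confidence": confidence,
        "blocking_issues": list(hard_blockers) + list(soft_blockers),
    }


decision = gate_decision(
    hard_blockers=[],
    soft_blockers=["high noise sensitivity", "tree depth unusually large"],
)
print(decision["deploy"], decision["risk_level"], f"{decision['confidence']:.2f}")
```

In this sketch soft blockers never deny deployment on their own; they only lower the confidence score, which leaves the final call to a reviewer or a pipeline threshold.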
Modes & Views (API Design)
The evaluate() method supports multiple modes via the view parameter:
| View | Description |
|---|---|
| "executive" | High-level verdict (non-technical) |
| "technical" | Risks & recommendations |
| "details" | Raw evaluator outputs |
| "all" | Complete payload |
Example:
critic.evaluate(view="technical")
critic.evaluate(view=["executive", "performance"])
Session Tracking & Model Comparison (New in 1.0.0)
You can persist evaluations and compare model versions over time.
critic_v1 = AICritic(model, X, y, session="v1")
critic_v1.evaluate()
critic_v2 = AICritic(model, X, y, session="v2")
critic_v2.evaluate()
comparison = critic_v2.compare_with("v1")
print(comparison["score_diff"])
This enables:
- Regression tracking
- Risk drift detection
- Governance & audit trails
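As a rough illustration of regression tracking and risk drift detection, the sketch below diffs two stored evaluation summaries. It does not reflect the internals of compare_with(): the function `compare_reports`, the "score" and "key_risks" report layout, and the "new_risks" and "resolved_risks" keys are all assumptions. Only "score_diff" appears in the example above.

```python
def compare_reports(current, previous):
    """Diff two evaluation summaries stored under different sessions.

    The "score" and "key_risks" keys are a hypothetical report layout.
    """
    current_risks = set(current["key_risks"])
    previous_risks = set(previous["key_risks"])
    return {
        "score_diff": current["score"] - previous["score"],
        "new_risks": sorted(current_risks - previous_risks),
        "resolved_risks": sorted(previous_risks - current_risks),
    }


# Hypothetical persisted summaries, keyed by session name.
sessions = {
    "v1": {"score": 0.75, "key_risks": ["high noise sensitivity"]},
    "v2": {
        "score": 0.5,
        "key_risks": ["high noise sensitivity", "possible target leakage"],
    },
}
comparison = compare_reports(sessions["v2"], sessions["v1"])
print(comparison["score_diff"])  # negative: v2 scores lower than v1
print(comparison["new_risks"])
```

A negative score difference combined with a non-empty list of new risks is the kind of signal an audit trail would want to record.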
Best Practices & Use Cases
| Scenario | Recommended Usage |
|---|---|
| CI/CD | Block merges using deploy_decision() |
| Model Tuning | Use technical view for guidance |
| Governance | Persist session outputs |
| Stakeholder Reports | Share executive summaries |
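For the CI/CD scenario, one possible approach is to turn the gate decision into a process exit code so the pipeline step fails when deployment is blocked. The sketch below assumes the dictionary shape shown in the deploy_decision() example. The helper `gate_exit_code` and the `min_confidence` threshold are hypothetical additions, not part of ai-critic.

```python
def gate_exit_code(decision, min_confidence=0.5):
    """Return 0 when the gate passes and 1 when the CI step should fail.

    `min_confidence` is an illustrative extra threshold, not an ai-critic
    setting.
    """
    if not decision["deploy"]:
        return 1
    if decision["confidence"] < min_confidence:
        return 1
    return 0


# In a real pipeline this dict would come from critic.deploy_decision().
example_decision = {
    "deploy": False,
    "risk_level": "high",
    "confidence": 0.2,
    "blocking_issues": ["possible target leakage"],
}
exit_code = gate_exit_code(example_decision)
for issue in example_decision["blocking_issues"]:
    print(f"blocking issue: {issue}")
print(f"exit code: {exit_code}")
```

Passing the result to `sys.exit()` at the end of the CI script makes most CI systems mark the job as failed on any non-zero code.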
🔒 API Stability
Starting from version 1.0.0, the public API of ai-critic follows semantic versioning. Breaking changes will only occur in major releases.
📄 License
Distributed under the MIT License.
🧠 Final Note
ai-critic is not a benchmarking tool. It is a decision-making system.
A failed audit does not mean the model is bad — it means the model is not ready to be trusted.
The purpose of ai-critic is to introduce structured skepticism into machine learning workflows — exactly where it belongs.