Fast AI evaluator for scikit-learn models
ai-critic 🧠: The Quality Gate for Machine Learning Models
ai-critic is a specialized decision-making tool that audits the reliability and deployment readiness of scikit-learn-compatible machine learning models.
Instead of just measuring performance (accuracy, F1 score), ai-critic acts as a "Quality Gate," probing the model for hidden risks that can lead to production failures, such as data leakage, structural overfitting, and vulnerability to noise.
🚀 1. Getting Started (The Basics)
This section is ideal for beginners who need a quick verdict on the health of their model.
1.1. Installation
Install the library directly from PyPI:
```bash
pip install ai-critic
```
1.2. The Quick Verdict
With just a few lines, you can get an executive evaluation and a deployment recommendation.
```python
from ai_critic import AICritic
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

# 1. Prepare your data and model
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
model = RandomForestClassifier(max_depth=5, random_state=42)

# 2. Initialize the critic
# AICritic performs all audits internally
critic = AICritic(model, X, y)

# 3. Obtain the executive summary
report = critic.evaluate(view="executive")
print(f"Verdict: {report['verdict']}")
print(f"Risk: {report['risk_level']}")
print(f"Main Reason: {report['main_reason']}")

# Expected output:
# Verdict: ✅ Acceptable
# Risk: Low
# Main Reason: No critical risks detected.
```
💡 2. Understanding the Critique (Intermediate)
This section is for data scientists who need to understand why the model received its verdict and what the next steps are.
2.1. The Four Pillars of the Audit
ai-critic evaluates your model across four critical dimensions.
| Category | Main Risk | Code Module |
|---|---|---|
| 📈 Validation | Suspicious CV Scores | ai_critic.performance |
| 🧪 Robustness | Noise Vulnerability | ai_critic.robustness |
2.2. Visual and Technical Analysis
The evaluate method lets you plot the results and access the complete technical report.
```python
# Continuing the previous example...

# 1. Generate the full report and visualizations
# plot=True generates Correlation, Learning Curve, and Robustness graphs
full_report = critic.evaluate(view="all", plot=True)

# 2. Access the technical summary for recommendations
technical_summary = full_report["technical"]
print("\n--- Technical Recommendations ---")
for i, risk in enumerate(technical_summary["key_risks"]):
    print(f"Risk {i+1}: {risk}")
    print(f"Recommendation: {technical_summary['recommendations'][i]}")

# Example of a risk (if there were one):
# Risk 1: The depth of the tree may be too high for the size of the dataset.
# Recommendation: Reduce model complexity or adjust hyperparameters.
```
2.3. Robustness Test
A robust model should maintain its performance even with small disturbances in the data. The `ai-critic` test assesses this by injecting noise into the input data.
```python
# Accessing the specific result of the Robustness module
robustness_result = full_report["details"]["robustness"]

print("\n--- Robustness Test ---")
print(f"Original CV Score: {robustness_result['cv_score_original']:.4f}")
print(f"CV Score with Noise: {robustness_result['cv_score_noisy']:.4f}")
print(f"Performance Drop: {robustness_result['performance_drop']:.4f}")
print(f"Robustness Verdict: {robustness_result['verdict']}")

# Possible Verdicts:
# - Stable: Acceptable drop.
# - Fragile: Significant drop (risk).
# - Misleading: Original performance inflated by leakage.
```
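To make the idea of a noise-injection test concrete, here is a minimal sketch built only on scikit-learn and NumPy. It is not ai-critic's internal implementation, which this page does not describe. The function name `noise_robustness`, the `noise_scale` parameter, and the choice of Gaussian noise scaled by each feature's standard deviation are all assumptions made for illustration; only the result keys mirror the ones shown above.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


def noise_robustness(model, X, y, noise_scale=0.1, cv=5, seed=0):
    """Compare cross-validated scores on clean and noise-perturbed inputs.

    Hypothetical helper: ai-critic may perturb the data differently.
    """
    rng = np.random.default_rng(seed)
    # Scale the Gaussian noise by each feature's standard deviation so that
    # features measured in different units are perturbed comparably.
    noise = rng.normal(0.0, noise_scale * X.std(axis=0), size=X.shape)
    original = cross_val_score(model, X, y, cv=cv).mean()
    noisy = cross_val_score(model, X + noise, y, cv=cv).mean()
    return {
        "cv_score_original": original,
        "cv_score_noisy": noisy,
        "performance_drop": original - noisy,
    }


X, y = make_classification(n_samples=300, n_features=10, random_state=42)
result = noise_robustness(LogisticRegression(max_iter=1000), X, y)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

A small drop suggests the model tolerates perturbed inputs; how large a drop counts as "Fragile" is a threshold this sketch deliberately leaves open.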
⚙️ 3. Integration and Governance (Advanced)
This section is for MLOps engineers and architects looking to integrate ai-critic into automated pipelines and create custom deployment logic.
3.1. The Deployment Gate (deploy_decision)
The deploy_decision() method is the final control point. It returns a structured object that classifies problems into Hard Blockers (prevent deployment) and Soft Blockers (require attention, but can be accepted with reservations).
```python
# Example of use in a CI/CD pipeline
decision = critic.deploy_decision()

if decision["deploy"]:
    print("✅ Deployment Approved. Risk Level: Low.")
else:
    print(f"❌ Deployment Blocked. Risk Level: {decision['risk_level'].upper()}")
    print("Blocking Issues:")
    for issue in decision["blocking_issues"]:
        print(f"- {issue}")

# The decision object also includes a heuristic confidence score (0.0 to 1.0)
print(f"Heuristic Confidence in Model: {decision['confidence']:.2f}")
```
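In an automated pipeline, the decision usually has to become a process exit code so the CI job can fail. The sketch below shows one way to do that in plain Python. The helper `gate_exit_code` and its `min_confidence` threshold are hypothetical and not part of ai-critic. The dictionary is a hand-written stand-in that only mirrors the keys used above (`deploy`, `risk_level`, `blocking_issues`, `confidence`).

```python
def gate_exit_code(decision, min_confidence=0.7):
    """Map a deployment decision to an exit code (0 = pass, 1 = fail).

    min_confidence is an illustrative threshold, not an ai-critic default.
    """
    if not decision["deploy"]:
        print(f"Deployment blocked. Risk level: {decision['risk_level']}")
        for issue in decision["blocking_issues"]:
            print(f"- {issue}")
        return 1
    if decision["confidence"] < min_confidence:
        print(f"Confidence {decision['confidence']:.2f} is below {min_confidence:.2f}")
        return 1
    print("Deployment approved.")
    return 0


# Hand-written stand-in for the object returned by critic.deploy_decision().
example = {
    "deploy": False,
    "risk_level": "high",
    "blocking_issues": ["Data leakage suspected"],
    "confidence": 0.35,
}
exit_code = gate_exit_code(example)
# In a CI job, finish the script with: sys.exit(exit_code)
```

Returning the code instead of calling `sys.exit` inside the function keeps the gate easy to unit-test.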
3.2. Access
For custom *governance* rules or logic, you can access the raw data of each module through the `"details"` view.
```python
# Run the audit once and reuse the result for every module
details = critic.evaluate(view="details")

# Accessing data leakage details
data_details = details["data"]
if data_details["data_leakage"]["suspected"]:
    print("\n--- Data Leak Alert ---")
    for detail in data_details["data_leakage"]["details"]:
        print(f"Feature {detail['feature_index']} with correlation of {detail['correlation']:.4f}")

# Accessing structural overfitting details
config_details = details["config"]
if config_details["structural_warnings"]:
    print("\n--- Structural Alert ---")
    for warning in config_details["structural_warnings"]:
        print(f"Warning: {warning['message']} (Max Depth: {warning['max_depth']}, Recommended: {warning['recommended_max_depth']})")
```
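The leakage details above report a feature index and a correlation. As an illustration of what such a check could look like, the following sketch flags features whose Pearson correlation with the target is suspiciously high. It is an assumption about the general technique, not ai-critic's actual code: `suspicious_features` and the `threshold` value of 0.95 are hypothetical, and the data is synthetic.

```python
import numpy as np


def suspicious_features(X, y, threshold=0.95):
    """Return (feature_index, correlation) pairs with |Pearson r| above threshold.

    Hypothetical helper; the threshold is an illustrative choice.
    """
    flagged = []
    for index in range(X.shape[1]):
        column = X[:, index]
        if column.std() == 0:  # constant feature: correlation is undefined
            continue
        corr = float(np.corrcoef(column, y)[0, 1])
        if abs(corr) > threshold:
            flagged.append((index, corr))
    return flagged


rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=500)
X = rng.normal(size=(500, 4))
# Column 2 is the target plus a little noise, simulating a leaked feature.
X[:, 2] = y + rng.normal(scale=0.01, size=500)

flagged = suspicious_features(X, y)
for index, corr in flagged:
    print(f"Feature {index} with correlation of {corr:.4f}")
```

A linear correlation check like this only catches direct, near-copy leakage; leaks that pass through nonlinear transformations would need other tests.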
3.3. Best Practices and Use Cases
| Use | Recommended Action |
|---|---|
| CI/CD | Use deploy_decision() as an automated quality gate. |
| Tuning | Use the technical view to guide hyperparameter optimization. |
| Governance | Log the details view for auditing and compliance. |
| Communication | Use the executive view to report risks to non-technical stakeholders. |
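For the governance use case, one simple option is to append each audit result to a JSON Lines file. The sketch below uses only the standard library. The helper `append_audit_record`, the record layout, and the model name are hypothetical, and the `details` dictionary is a made-up stand-in for the output of `critic.evaluate(view="details")`.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def append_audit_record(details, log_path, model_name):
    """Append one timestamped audit record to a JSON Lines file."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model": model_name,
        "details": details,
    }
    with open(log_path, "a", encoding="utf-8") as handle:
        # default=str keeps the write from failing on values that json
        # cannot serialize natively (for example NumPy scalars).
        handle.write(json.dumps(record, default=str) + "\n")
    return record


# Stand-in for critic.evaluate(view="details"); the values are made up.
details = {"data": {"data_leakage": {"suspected": False, "details": []}}}
log_path = Path(tempfile.mkdtemp()) / "audit_log.jsonl"
append_audit_record(details, log_path, model_name="rf-demo")
print(log_path.read_text(encoding="utf-8"))
```

An append-only file with one record per line is easy to diff and to load later for compliance reviews.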
📄 License
Distributed under the MIT License.
--
🧠 Final Note
ai-critic is not a benchmarking tool. It's a decision-making tool.
If a model fails here, it doesn't mean it's "bad," but rather that it shouldn't be trusted yet. The goal is to inject the necessary skepticism to build truly robust AI systems.
File details
Details for the file ai_critic-0.2.5.tar.gz.
File metadata
- Download URL: ai_critic-0.2.5.tar.gz
- Upload date:
- Size: 12.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2bebb9fcb951d325aaa882592c733e2d19f7c4ff412b578be3a1f5aeefff6626 |
| MD5 | f91a6d140f06060c2b430365fb932670 |
| BLAKE2b-256 | ad05d2ce1a562539af2b5d420a2b99256d44c2cbbb6b7a2118aaca46bc6650d4 |
File details
Details for the file ai_critic-0.2.5-py3-none-any.whl.
File metadata
- Download URL: ai_critic-0.2.5-py3-none-any.whl
- Upload date:
- Size: 11.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 35a5b49ed7a683b29d442232e41b5c2a1251800032522b5d31b032d5e80b89d4 |
| MD5 | 969f2a478e13232733c4564130df5cdd |
| BLAKE2b-256 | d6c101fc421fd7c7c9303c532a7853cc3541ccf2f1fc62347108c07a911e5b28 |