
Fast AI evaluator for scikit-learn models


ai-critic 🧠: The Quality Gate for Machine Learning Models

ai-critic is a specialized decision-making tool that audits the reliability and deployment readiness of scikit-learn-compatible Machine Learning models.

Instead of only measuring performance (accuracy, F1 score), ai-critic acts as a "Quality Gate": it probes the model for hidden risks that can cause production failures, such as data leakage, structural overfitting, and vulnerability to noise.


🚀 1. Getting Started (The Basics)

This section is ideal for beginners who need a quick verdict on the health of their model.

1.1. Installation

Install the library directly from PyPI:

pip install ai-critic

1.2. The Quick Verdict

With just a few lines, you can get an executive evaluation and a deployment recommendation.

```python
from ai_critic import AICritic
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

# 1. Prepare your data and model
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
model = RandomForestClassifier(max_depth=5, random_state=42)

# 2. Initialize the critic
# AICritic performs all audits internally
critic = AICritic(model, X, y)

# 3. Obtain the executive summary
report = critic.evaluate(view="executive")

print(f"Verdict: {report['verdict']}")
print(f"Risk: {report['risk_level']}")
print(f"Main Reason: {report['main_reason']}")

# Expected output:
# Verdict: ✅ Acceptable
# Risk: Low
# Main Reason: No critical risks detected.
```

💡 2. Understanding the Critique (Intermediate)

This section is for data scientists who need to understand why the model received its verdict and what the next steps are.

2.1. The Four Pillars of the Audit

ai-critic evaluates your model across four critical dimensions.

| Category | Main Risk | Code Module |
|---|---|---|
| 📈 Validation | Suspicious CV Scores | `ai_critic.performance` |
| 🧪 Robustness | Noise Vulnerability | `ai_critic.robustness` |
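
As a rough illustration of what a "suspicious CV score" check might look like, the sketch below flags cross-validation results that seem too good to be true. It is not ai-critic's implementation: the function name `flag_suspicious_cv` and the 0.99 threshold are assumptions made up for this example.

```python
from statistics import mean


def flag_suspicious_cv(fold_scores, max_plausible=0.99):
    """Flag a set of per-fold CV scores whose mean is implausibly high.

    max_plausible is an illustrative placeholder, not an ai-critic default.
    """
    average = mean(fold_scores)
    return {"mean_score": average, "suspicious": average >= max_plausible}


# Hypothetical fold scores, e.g. as returned by sklearn's cross_val_score
healthy = flag_suspicious_cv([0.91, 0.89, 0.93, 0.90, 0.92])
too_good = flag_suspicious_cv([1.0, 1.0, 0.999, 1.0, 1.0])
print(healthy["suspicious"], too_good["suspicious"])  # False True
```

A near-perfect mean is only a hint, not proof: it is usually worth checking for leaked target information before trusting the score.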

2.2. Visual and Technical Analysis

The evaluate method can plot the results and return the complete technical report.

```python
# Continuing the previous example...

# 1. Generate the full report and visualizations
# plot=True generates Correlation, Learning Curve, and Robustness graphs
full_report = critic.evaluate(view="all", plot=True)

# 2. Access the Technical Summary for Recommendations
technical_summary = full_report["technical"]

print("\n--- Technical Recommendations ---")
for i, risk in enumerate(technical_summary["key_risks"]):
    print(f"Risk {i+1}: {risk}")
    print(f"Recommendation: {technical_summary['recommendations'][i]}")

# Example output (if a risk were found):
# Risk 1: The depth of the tree may be too high for the size of the dataset.
# Recommendation: Reduce model complexity or adjust hyperparameters.
```


### 2.3. Robustness Test

A robust model should maintain its performance even with small disturbances in the data. The `ai-critic` test assesses this by injecting noise into the input data.
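
To make the idea concrete, here is a minimal sketch of a noise-injection check built only on scikit-learn and NumPy. It is an illustration, not ai-critic's internal implementation: the function name `noise_robustness`, the `noise_level` parameter, and the choice of Gaussian noise scaled by each feature's standard deviation are all assumptions. The returned keys mirror the ones printed from the robustness report.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def noise_robustness(model, X, y, noise_level=0.1, cv=3, seed=0):
    """Compare mean CV accuracy on clean inputs and on noise-perturbed inputs.

    noise_level scales Gaussian noise relative to each feature's standard
    deviation. The noise is added to the whole matrix, so both the training
    and the validation folds are perturbed.
    """
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, 1.0, size=X.shape) * X.std(axis=0) * noise_level
    score_original = cross_val_score(model, X, y, cv=cv).mean()
    score_noisy = cross_val_score(model, X + noise, y, cv=cv).mean()
    return {
        "cv_score_original": float(score_original),
        "cv_score_noisy": float(score_noisy),
        "performance_drop": float(score_original - score_noisy),
    }


X, y = make_classification(n_samples=300, n_features=10, random_state=42)
model = RandomForestClassifier(n_estimators=25, max_depth=5, random_state=42)
result = noise_robustness(model, X, y)
print(result)
```

A large `performance_drop` would suggest that the model depends on fine-grained feature values; what counts as "large" is a project-specific choice.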

```python
# Accessing the specific result of the Robustness module
robustness_result = full_report["details"]["robustness"]

print("\n--- Robustness Test ---")
print(f"Original CV Score: {robustness_result['cv_score_original']:.4f}")
print(f"CV Score with Noise: {robustness_result['cv_score_noisy']:.4f}")
print(f"Performance Drop: {robustness_result['performance_drop']:.4f}")
print(f"Robustness Verdict: {robustness_result['verdict']}")

# Possible verdicts:
# - Stable: Acceptable drop.
# - Fragile: Significant drop (risk).
# - Misleading: Original performance inflated by leakage.
```

⚙️ 3. Integration and Governance (Advanced)

This section is for MLOps engineers and architects looking to integrate ai-critic into automated pipelines and create custom deployment logic.

### 3.1. The Deployment Gate (deploy_decision)

The deploy_decision() method is the final control point. It returns a structured object that classifies problems into Hard Blockers (prevent deployment) and Soft Blockers (require attention, but can be accepted with reservations).

```python
# Example of use in a CI/CD pipeline
decision = critic.deploy_decision()

if decision["deploy"]:
    print("✅ Deployment Approved. Risk Level: Low.")
else:
    print(f"❌ Deployment Blocked. Risk Level: {decision['risk_level'].upper()}")
    print("Blocking Issues:")
    for issue in decision["blocking_issues"]:
        print(f"- {issue}")

# The decision object also includes a heuristic confidence score (0.0 to 1.0)
print(f"Heuristic Confidence in Model: {decision['confidence']:.2f}")
```


### 3.2. Accessing Module Details

For custom *governance* rules or logic, you can access the raw data of each module through the `"details"` view.

```python
# Run the audit once and reuse the raw per-module results
details = critic.evaluate(view="details")

# Accessing Data Leakage Details
data_details = details["data"]
if data_details["data_leakage"]["suspected"]:
    print("\n--- Data Leak Alert ---")
    for detail in data_details["data_leakage"]["details"]:
        print(f"Feature {detail['feature_index']} with correlation of {detail['correlation']:.4f}")

# Accessing Structural Overfitting Details
config_details = details["config"]
if config_details["structural_warnings"]:
    print("\n--- Structural Alert ---")
    for warning in config_details["structural_warnings"]:
        print(
            f"Warning: {warning['message']} "
            f"(Max Depth: {warning['max_depth']}, "
            f"Recommended: {warning['recommended_max_depth']})"
        )
```
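
The leakage details above report a per-feature correlation. As a hedged sketch of how such a value could be computed, the example below flags features whose absolute Pearson correlation with the target is very high. The function name `find_leaky_features` and the 0.95 threshold are assumptions for illustration; the source does not say how ai-critic computes its own figures.

```python
import numpy as np


def find_leaky_features(X, y, threshold=0.95):
    """Return features whose absolute Pearson correlation with y exceeds threshold.

    threshold is an illustrative placeholder, not an ai-critic default.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    suspects = []
    for index in range(X.shape[1]):
        column = X[:, index]
        if column.std() == 0.0:  # constant column: correlation is undefined
            continue
        correlation = float(np.corrcoef(column, y)[0, 1])
        if abs(correlation) > threshold:
            suspects.append({"feature_index": index, "correlation": correlation})
    return suspects


# Synthetic data: feature 1 is deliberately built as a near-copy of the target
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=200)
X = rng.normal(size=(200, 3))
X[:, 1] = y + rng.normal(scale=0.01, size=200)

suspects = find_leaky_features(X, y)
for suspect in suspects:
    print(f"Feature {suspect['feature_index']} with correlation of {suspect['correlation']:.4f}")
```

Linear correlation only catches the simplest kind of leak; a feature can encode the target non-linearly and still pass this check.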

3.3. Best Practices and Use Cases

| Use Case | Recommended Action |
|---|---|
| CI/CD | Use `deploy_decision()` as an automated quality gate. |
| Tuning | Use the technical view to guide hyperparameter optimization. |
| Governance | Log the details view for auditing and compliance. |
| Communication | Use the executive view to report risks to non-technical stakeholders. |
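
For the CI/CD use case, one possible way to wire the gate into a pipeline is to turn the decision into a process exit code. The sketch below assumes only the dictionary keys shown in section 3.1 (`deploy`, `risk_level`, `blocking_issues`, `confidence`); the helper `gate_exit_code`, its `min_confidence` parameter, and the sample values are hypothetical and not part of ai-critic.

```python
def gate_exit_code(decision, min_confidence=0.0):
    """Map a deploy_decision()-style dict to a process exit code (0 = pass).

    min_confidence is a hypothetical extra guard, not an ai-critic option.
    """
    if decision["deploy"] and decision["confidence"] >= min_confidence:
        print("Deployment approved.")
        return 0
    print(f"Deployment blocked. Risk level: {decision['risk_level'].upper()}")
    for issue in decision.get("blocking_issues", []):
        print(f"- {issue}")
    return 1


# Stand-in for critic.deploy_decision(); the values are made up for illustration.
decision = {
    "deploy": False,
    "risk_level": "high",
    "blocking_issues": ["Data leakage suspected"],
    "confidence": 0.35,
}
exit_code = gate_exit_code(decision)
# In a real CI job the script would end with: sys.exit(exit_code)
```

Most CI systems treat a non-zero exit code as a failed step, so returning 1 is enough to stop the pipeline before the deployment stage.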

📄 License

Distributed under the MIT License.

--

🧠 Final Note

ai-critic is not a benchmarking tool. It's a decision-making tool.

If a model fails here, it doesn't mean it's "bad," but rather that it shouldn't be trusted yet. The goal is to inject the necessary skepticism to build truly robust AI systems.

