
Fast AI evaluator for scikit-learn models


ai-critic 🧠

The Quality Gate for Machine Learning Models

ai-critic is an intelligent evaluation and decision system designed to determine whether a machine learning model is safe, reliable, and trustworthy enough to be deployed in real-world environments.

Unlike traditional ML evaluation tools that focus almost exclusively on performance metrics, ai-critic acts as a Quality Gate — a final checkpoint that actively probes models to uncover hidden risks that frequently cause silent failures in production.

ai-critic does not ask “How accurate is this model?” It asks “Can this model be trusted in the real world?”


🎯 Why ai-critic Exists

Most production ML failures are not accuracy problems.

They are caused by:

  • Data leakage hidden inside features
  • Overfitting disguised as strong validation scores
  • Models that collapse under small noise
  • Fragile dependency on a single feature
  • Structurally unsafe configurations

These failures usually appear after deployment, when they are already expensive — or dangerous — to fix.

ai-critic exists to detect these risks before deployment.


🚀 Installation

Install directly from PyPI:

pip install ai-critic

Python 3.8+ is recommended.


⚡ Quick Start (Fast Verdict)

If you want a clear, conservative deployment recommendation, this is all you need.

from ai_critic import AICritic
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification

X, y = make_classification(
    n_samples=1000,
    n_features=20,
    random_state=42
)

model = RandomForestClassifier(
    max_depth=5,
    random_state=42
)

critic = AICritic(model, X, y)

report = critic.evaluate(view="executive")

print(report)

Example Output

Verdict: ⚠️ Risk Detected
Risk Level: medium
Deploy Recommended: False
Main Reason: Structural or robustness risks detected

If ai-critic approves deployment, it means that none of its independent checks detected a meaningful risk.

The system is intentionally skeptical by design.


🧭 What Does the Verdict Mean?

| Field | Meaning |
| --- | --- |
| verdict | Human-readable summary |
| risk_level | low / medium / high |
| deploy_recommended | Final quality gate decision |
| main_reason | Primary blocking factor |

Each field is meant to have a single, unambiguous reading.
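As a minimal sketch, the four fields above could drive an automated gate, for example in a CI job. This assumes the executive report can be read like a dict with exactly the keys listed in the table. The `gate_exit_code` and `summarize` helpers and the sample report are hypothetical illustrations, not part of ai-critic.

```python
from typing import Any, Mapping


def gate_exit_code(report: Mapping[str, Any]) -> int:
    """Return 0 when deployment is recommended, 1 otherwise.

    A missing or non-boolean field counts as a failure, so the gate
    stays conservative (an assumption made for this sketch).
    """
    return 0 if report.get("deploy_recommended") is True else 1


def summarize(report: Mapping[str, Any]) -> str:
    """Build a one-line summary suitable for a CI log."""
    verdict = report.get("verdict", "unknown")
    risk = report.get("risk_level", "unknown")
    reason = report.get("main_reason", "no reason given")
    return f"{verdict} (risk: {risk}): {reason}"


# Hypothetical report mirroring the example output shown earlier.
sample_report = {
    "verdict": "Risk Detected",
    "risk_level": "medium",
    "deploy_recommended": False,
    "main_reason": "Structural or robustness risks detected",
}

print(summarize(sample_report))
print("exit code:", gate_exit_code(sample_report))
```

In a pipeline, the returned code would typically be passed to `sys.exit` so that a rejected model fails the job.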


🧠 How ai-critic Thinks (Core Concept)

ai-critic is not a metric calculator. It is a decision system.

Internally, it works in three layers:

  1. Evaluators → Detect signals and risks
  2. Critic Gate → Decide if intervention is needed
  3. Deployment Policy → Decide if deployment is safe

🧱 The Four Pillars of the Audit

ai-critic evaluates models across four independent risk dimensions:

| Pillar | Detects | Why It Matters |
| --- | --- | --- |
| 📊 Data Integrity | Leakage, shortcuts, correlations | Inflated metrics |
| 🧠 Model Structure | Over-complexity, unsafe configs | Poor generalization |
| 📈 Performance Sanity | Suspicious CV behavior | False confidence |
| 🧪 Robustness | Noise sensitivity | Production collapse |

Each pillar emits signals, not binary judgments.

Those signals are aggregated by the Critic Gate.
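The flow from graded signals to a final decision can be sketched roughly as follows. Everything here is an assumption made for illustration: the `Signal` type, the 0 to 1 severity scale, the cut-offs, and the "only low risk passes" policy are hypothetical and do not describe ai-critic's internals.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Signal:
    pillar: str      # e.g. "data_integrity" or "robustness"
    severity: float  # 0.0 (no concern) to 1.0 (severe); assumed scale
    message: str


def aggregate_risk(signals: List[Signal]) -> str:
    """Map graded signals to low / medium / high.

    The worst signal dominates; 0.4 and 0.7 are illustrative cut-offs.
    """
    worst = max((s.severity for s in signals), default=0.0)
    if worst >= 0.7:
        return "high"
    if worst >= 0.4:
        return "medium"
    return "low"


def deploy_recommended(risk_level: str) -> bool:
    """Conservative policy for this sketch: only low risk passes."""
    return risk_level == "low"


signals = [
    Signal("data_integrity", 0.1, "no suspicious feature/target correlation"),
    Signal("robustness", 0.5, "accuracy degrades under moderate noise"),
]
risk = aggregate_risk(signals)
print(risk, deploy_recommended(risk))
```

Taking the maximum rather than the mean is one possible design choice: a single severe signal cannot be averaged away by several harmless ones.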


🧪 Robustness Testing (Noise Injection)

Production data is never clean.

ai-critic injects controlled noise into inputs and measures degradation:

robustness = report["details"]["robustness"]

print(robustness["performance_drop"])
print(robustness["verdict"])

Possible outcomes:

  • stable → acceptable degradation
  • fragile → high sensitivity
  • misleading → likely inflated performance
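To make the idea concrete, here is a minimal, independent sketch of noise injection using only NumPy and scikit-learn. It is not ai-critic's implementation: the `noise_performance_drop` helper, the Gaussian noise model, the `noise_scale` convention, and the 0.05 cut-off are all assumptions, and the `misleading` outcome is not modeled.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def noise_performance_drop(model, X, y, noise_scale=0.1, seed=0):
    """Accuracy lost when Gaussian noise is added to every feature.

    noise_scale is the noise standard deviation expressed as a fraction
    of each feature's own standard deviation (an assumed convention).
    """
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, noise_scale * X.std(axis=0), size=X.shape)
    clean = accuracy_score(y, model.predict(X))
    noisy = accuracy_score(y, model.predict(X + noise))
    return clean - noisy


X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
model = RandomForestClassifier(max_depth=5, random_state=42)
model.fit(X_train, y_train)

drop = noise_performance_drop(model, X_test, y_test)
# 0.05 is an illustrative cut-off, not ai-critic's actual threshold.
label = "stable" if drop <= 0.05 else "fragile"
print(f"performance_drop={drop:.3f} verdict={label}")
```

Scaling the noise per feature keeps the perturbation comparable across features with very different ranges.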

🔍 Explainability & Feature Sensitivity

ai-critic performs feature sensitivity analysis to detect:

  • Feature-level leakage
  • Over-reliance on a single signal
  • Shortcut learning

How it works:

  1. A feature is perturbed or permuted
  2. The model is re-evaluated
  3. Performance drop is measured

Large drops indicate critical dependency.

This approach is:

  • Model-agnostic
  • Lightweight
  • Interpretable
  • Framework-independent
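The perturb, re-evaluate, measure loop described above can be sketched with plain NumPy and scikit-learn. This is an illustration under stated assumptions, not ai-critic's code: the `permutation_drops` helper is hypothetical, and accuracy is assumed as the metric.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def permutation_drops(model, X, y, seed=0):
    """Accuracy drop per feature after shuffling that feature's column."""
    rng = np.random.default_rng(seed)
    baseline = accuracy_score(y, model.predict(X))
    drops = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        X_perm = X.copy()
        # Step 1: permute one feature, breaking its link to the target.
        X_perm[:, j] = rng.permutation(X_perm[:, j])
        # Steps 2 and 3: re-evaluate and record the performance drop.
        drops[j] = baseline - accuracy_score(y, model.predict(X_perm))
    return drops


X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
model = RandomForestClassifier(max_depth=5, random_state=42)
model.fit(X_train, y_train)

drops = permutation_drops(model, X_test, y_test)
top = int(np.argmax(drops))
print(f"most critical feature: {top} (accuracy drop {drops[top]:.3f})")
```

scikit-learn ships a comparable utility, `sklearn.inspection.permutation_importance`, which also repeats the shuffle several times and reports the spread.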

🧠 Recommendations Engine

ai-critic does not stop at “deploy or not”.

It generates actionable recommendations, such as:

  • Reduce model complexity
  • Increase regularization
  • Possible data leakage detected
  • High noise sensitivity
  • Structural overfitting signals

These recommendations are rule-based and derived from the measured signals; they are not generated by an LLM.


🚦 Deployment Decision

The final decision is produced via:

decision = critic.deploy_decision()

print(decision)

Output includes:

  • Deployment approval or rejection
  • Risk level
  • ML confidence score
  • Blocking issues
  • Recommendations

🧠 Critic Gate (New)

The Critic Gate decides whether suggestions should even be made.

This prevents:

  • Over-criticism
  • Noise-based warnings
  • Fatigue from excessive suggestions

The gate considers:

  • Overall score
  • Dataset size
  • Verdict severity
  • Structural risk

This turns ai-critic into a judgment system, not a nagging tool.
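One way such a gate could combine the four inputs listed above is sketched below. The `GateInputs` type, the `should_suggest` function, and both thresholds are hypothetical, chosen only to illustrate the idea of suppressing low-value warnings; the real gate's rules are not documented here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateInputs:
    overall_score: float   # 0.0 to 1.0, higher is better (assumed scale)
    n_samples: int         # dataset size
    risk_level: str        # "low", "medium", or "high"
    structural_risk: bool  # True if a structural problem was flagged


def should_suggest(inputs: GateInputs,
                   min_samples: int = 200,
                   score_floor: float = 0.8) -> bool:
    """Decide whether suggestions are worth showing at all."""
    # Severe verdicts and structural risks are always surfaced.
    if inputs.risk_level == "high" or inputs.structural_risk:
        return True
    # On very small datasets weaker signals are mostly noise: stay quiet.
    if inputs.n_samples < min_samples:
        return False
    # Otherwise speak up for medium risk or a weak overall score.
    return inputs.risk_level == "medium" or inputs.overall_score < score_floor


print(should_suggest(GateInputs(0.95, 5000, "low", False)))    # healthy model
print(should_suggest(GateInputs(0.70, 5000, "medium", False))) # worth a warning
```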


🔄 Feedback Loop & Learning Critic

ai-critic can learn from outcomes.

You can optionally provide feedback:

ai-critic --feedback success

This enables:

  • Smarter future decisions
  • Better thresholds
  • Context-aware criticism

The critic improves without exposing your data.


🖥️ Command Line Interface (CLI)

ai-critic ships with a professional CLI:

ai-critic \
  --model model.pkl \
  --data dataset.csv \
  --target label

CLI output includes:

  • Gate decision
  • Deployment recommendation
  • Risk level
  • Suggestions

Use --json for automation and pipelines.
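The two input files named in the command above could be produced along these lines. This sketch assumes the CLI reads a standard pickle of a fitted scikit-learn estimator and a CSV whose target column matches `--target`; neither detail is confirmed by this page, and the `write_cli_inputs` helper and the `f0`, `f1`, ... column names are hypothetical.

```python
import pickle
import tempfile
from pathlib import Path

import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def write_cli_inputs(out_dir: Path, target: str = "label"):
    """Write model.pkl and dataset.csv into out_dir and return their paths."""
    X, y = make_classification(n_samples=200, n_features=5, random_state=42)
    model = RandomForestClassifier(max_depth=5, random_state=42).fit(X, y)

    # Assumption: the CLI can load a plain pickle of the estimator.
    model_path = out_dir / "model.pkl"
    with model_path.open("wb") as fh:
        pickle.dump(model, fh)

    # Features plus one target column named after the --target argument.
    frame = pd.DataFrame(X, columns=[f"f{i}" for i in range(X.shape[1])])
    frame[target] = y
    data_path = out_dir / "dataset.csv"
    frame.to_csv(data_path, index=False)
    return model_path, data_path


# Demo in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    model_path, data_path = write_cli_inputs(Path(tmp))
    print(model_path.name, data_path.name)
```

Only unpickle model files from sources you trust, since loading a pickle can execute arbitrary code.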


🧩 Multi-Framework Support

Supported via adapters:

  • scikit-learn
  • PyTorch
  • TensorFlow

The API remains consistent.


🛡️ What ai-critic Is NOT

  • ❌ A hyperparameter optimizer
  • ❌ A leaderboard benchmarking tool
  • ❌ A replacement for domain expertise
  • ❌ A blind approval system

🧠 Design Philosophy

ai-critic assumes:

  • Metrics can lie
  • Data is imperfect
  • Models fail silently
  • Trust must be earned

That makes it ideal as a final quality gate, not a tuning toy.


🧠 Final Note

ai-critic is not here to make models look good. It exists to prevent unsafe models from looking good enough to deploy.

A failed audit does not mean your model is bad. It means your model is not yet safe to trust.

That distinction is everything.
