Graph-based evaluation engine for machine learning models

ai-critic 3.2.0

Released: 2026

AI Critic — Evaluation Graph Engine for Machine Learning.


AI Critic

AI Critic is a graph-based evaluation engine for machine learning models.

Instead of reporting isolated metrics, AI Critic runs a structured evaluation pipeline that analyzes several dimensions of model quality, such as:

  • performance
  • robustness
  • explainability
  • dataset risks
  • overfitting signals

The system produces structured diagnostics and a unified score.

AI Critic is designed to be:

  • deterministic
  • extensible
  • plugin-driven
  • CI friendly

No telemetry.
No AutoML.
Just structured model evaluation.


Why AI Critic?

Typical ML evaluation often stops at a single metric such as accuracy, but real-world models fail for many other reasons:

  • data leakage
  • class imbalance
  • unstable features
  • fragility under input noise

AI Critic evaluates these risks in a structured pipeline.

Key Features

Model Audit

Run a full diagnostic of your model and dataset.

from ai_critic.audit import audit

report = audit(model, X, y)

Audit detects:

  • dataset size issues
  • class imbalance
  • potential overfitting
  • suspiciously perfect validation scores

Example output:

{
  "dataset_checks": {...},
  "model_checks": {...},
  "scores": {...}
}
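
The dataset checks listed above can be illustrated with a few lines of standard-library Python. This is a minimal sketch, not AI Critic's implementation: the function name `basic_dataset_checks` and the thresholds `min_samples` and `max_imbalance_ratio` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def basic_dataset_checks(y, min_samples=100, max_imbalance_ratio=5.0):
    """Illustrative dataset checks; thresholds are assumptions, not AI Critic defaults."""
    counts = Counter(y)                      # samples per class label
    n_samples = sum(counts.values())
    # Ratio between the most and least frequent class
    ratio = max(counts.values()) / min(counts.values())
    return {
        "n_samples": n_samples,
        "too_small": n_samples < min_samples,
        "imbalance_ratio": ratio,
        "imbalanced": ratio > max_imbalance_ratio,
    }


# 90 samples of class 0 and 10 of class 1
labels = [0] * 90 + [1] * 10
checks = basic_dataset_checks(labels)
print(checks)
```

The sketch assumes `y` is non-empty; an audit would normally guard that case before computing the ratio.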

Benchmark Multiple Models

Compare several models on the same dataset.

from ai_critic.benchmark import benchmark

results = benchmark(
    models=[model1, model2, model3],
    X=X,
    y=y
)

Output:

RandomForestClassifier   score: 0.91
SVC                      score: 0.87
LogisticRegression       score: 0.82
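
For comparison, a similar ranking can be produced directly with scikit-learn. The sketch below assumes the score is mean cross-validated accuracy, which the text above does not confirm; `rank_models` is a hypothetical helper, not part of AI Critic.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier


def rank_models(models, X, y, cv=5):
    """Return (model name, mean CV score) pairs, best first."""
    scored = []
    for model in models:
        # cross_val_score clones the estimator, so the input models stay unfitted
        scores = cross_val_score(model, X, y, cv=cv)
        scored.append((type(model).__name__, float(scores.mean())))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


X, y = load_iris(return_X_y=True)
ranking = rank_models(
    [LogisticRegression(max_iter=1000), DecisionTreeClassifier(random_state=0)],
    X,
    y,
)
for name, score in ranking:
    print(f"{name:<24} score: {score:.2f}")
```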

Evaluation Graph Engine

AI Critic executes evaluations using a structured Evaluation Graph.

Each evaluator is a node:

  • PerformanceEvaluator
  • RobustnessEvaluator
  • ExplainabilityEvaluator

Nodes:

  • run independently
  • produce structured output
  • return normalized scores
  • can declare dependencies

The graph aggregates them into a final score.
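
To make the idea concrete, here is a minimal sketch of a dependency-ordered evaluation graph built on the standard library's `graphlib`. It is not AI Critic's engine: `run_graph` is a hypothetical function, the evaluators are stand-in lambdas with fixed scores, and averaging is only one assumed way to aggregate node scores.

```python
from graphlib import TopologicalSorter


def run_graph(nodes, dependencies):
    """Run evaluator callables in dependency order and average their scores.

    nodes: dict mapping node name -> callable(results_so_far) returning {"score": float}
    dependencies: dict mapping node name -> set of node names it depends on
    """
    graph = {name: dependencies.get(name, set()) for name in nodes}
    # static_order() yields every node after all of its dependencies
    order = list(TopologicalSorter(graph).static_order())
    details = {}
    for name in order:
        # Each node can read the results of the nodes that ran before it
        details[name] = nodes[name](details)
    # Assumed aggregation: unweighted mean of the node scores
    overall = sum(d["score"] for d in details.values()) / len(details)
    return {"order": order, "details": details, "overall": overall}


# Stand-in evaluators with fixed scores, for illustration only
nodes = {
    "performance": lambda results: {"score": 0.9},
    "robustness": lambda results: {"score": 0.8},
    "explainability": lambda results: {"score": 0.7},
}
dependencies = {
    "robustness": {"performance"},
    "explainability": {"robustness"},
}
result = run_graph(nodes, dependencies)
print(result["order"], round(result["overall"], 3))
```

`TopologicalSorter` raises `CycleError` if the declared dependencies form a cycle, which is a convenient way to reject an invalid graph before any evaluator runs.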


Graph Visualization

Visualize the evaluation pipeline.

critic.visualize()

Generates a graph representation of the evaluation pipeline.

Example structure:

performance
   ↓
robustness
   ↓
explainability
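
The same structure can be written as Graphviz DOT text without any extra dependency. This is an illustrative sketch only; `to_dot` is a hypothetical helper, and the output format of `critic.visualize()` is not documented here.

```python
def to_dot(edges):
    """Render (source, target) edge pairs as Graphviz DOT text."""
    lines = ["digraph evaluation {"]
    for source, target in edges:
        lines.append(f'    "{source}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)


dot = to_dot([("performance", "robustness"), ("robustness", "explainability")])
print(dot)
```

The resulting text can be rendered with the `dot` command-line tool if Graphviz is installed.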

Cross-Validation Intelligence

Automatically selects a validation strategy:

  • StratifiedKFold for classification
  • KFold for regression

Reports:

  • CV mean score
  • standard deviation
  • suspiciously perfect scores
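
The selection rule and the reported values can be sketched with scikit-learn as follows. This is an assumption-laden illustration rather than AI Critic's code: `cv_summary` and the `perfect_threshold` value are hypothetical.

```python
import numpy as np
from sklearn.base import is_classifier
from sklearn.datasets import load_iris
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import KFold, StratifiedKFold, cross_val_score


def cv_summary(model, X, y, n_splits=5, perfect_threshold=0.999):
    """Pick a splitter by estimator type and summarize the CV scores."""
    if is_classifier(model):
        # Stratification keeps class proportions similar in every fold
        splitter = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    else:
        splitter = KFold(n_splits=n_splits, shuffle=True, random_state=0)
    scores = cross_val_score(model, X, y, cv=splitter)
    mean = float(np.mean(scores))
    return {
        "strategy": type(splitter).__name__,
        "mean": mean,
        "std": float(np.std(scores)),
        # Hypothetical rule: a near-perfect mean is worth a second look
        "suspiciously_perfect": mean >= perfect_threshold,
    }


X, y = load_iris(return_X_y=True)
clf_summary = cv_summary(LogisticRegression(max_iter=1000), X, y)
# Treat the labels as a numeric target only to exercise the regression branch
reg_summary = cv_summary(LinearRegression(), X, y.astype(float))
print(clf_summary["strategy"], reg_summary["strategy"])
```

Fixing `random_state` keeps the folds, and therefore the reported scores, reproducible between runs.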

Robustness Testing

Tests model stability under controlled noise injection.

Reports:

  • performance degradation
  • stability classification
  • robustness score
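
A noise-injection test of this kind can be sketched as below. The details are assumptions: the Gaussian noise, its scaling by each feature's standard deviation, the `noise_scale` value, and the function name `noise_degradation` are all hypothetical and not taken from AI Critic.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier


def noise_degradation(model, X, y, noise_scale=0.1, seed=0):
    """Compare a fitted model's score on clean and noise-perturbed inputs."""
    rng = np.random.default_rng(seed)        # seeded for reproducible noise
    baseline = float(model.score(X, y))
    # Gaussian noise, scaled per feature by that feature's standard deviation
    noise = rng.normal(0.0, noise_scale * X.std(axis=0), size=X.shape)
    noisy = float(model.score(X + noise, y))
    return {"baseline": baseline, "noisy": noisy, "degradation": baseline - noisy}


X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(random_state=0).fit(X, y)
report = noise_degradation(model, X, y)
print(report)
```

Because the generator is seeded, repeated calls with the same arguments return the same numbers, which suits a deterministic evaluation pipeline.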

Explainability Signal

Uses permutation sensitivity analysis to estimate how strongly the model depends on each feature.

Detects:

  • shortcut learning
  • unstable features
  • potential leakage signals
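
Permutation-based sensitivity can be illustrated with scikit-learn's `permutation_importance`. Whether AI Critic uses this function internally is not stated, so treat the following as a sketch; the `DOMINANCE_THRESHOLD` value and the rule for flagging a dominant feature are hypothetical.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

data = load_iris()
X, y = data.data, data.target
model = RandomForestClassifier(random_state=0).fit(X, y)

# Shuffle each feature column in turn and measure the drop in score
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)

# Share of the total non-negative importance carried by each feature
positive = result.importances_mean.clip(min=0)
total = positive.sum()
shares = positive / total if total > 0 else positive

# Hypothetical rule: one feature carrying most of the importance may
# point to shortcut learning or leakage and deserves manual review
DOMINANCE_THRESHOLD = 0.8
dominant = [
    name
    for name, share in zip(data.feature_names, shares)
    if share > DOMINANCE_THRESHOLD
]

for name, share in zip(data.feature_names, shares):
    print(f"{name:<20} share: {share:.2f}")
print("dominant features:", dominant)
```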

Quick Start

from ai_critic import AICritic
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris

# Load a small example dataset
data = load_iris()
X, y = data.data, data.target

# Train the model to be evaluated
model = RandomForestClassifier()
model.fit(X, y)

critic = AICritic()

# Run the evaluation pipeline on the model and dataset
report = critic.evaluate(model, X, y)

print(report["scores"])

Output:

{
  "overall": 0.87,
  "verdict": "good"
}

Output Structure

{
  "scores": {
      "overall": 0.83,
      "verdict": "good"
  },
  "details": {
      "performance": {...},
      "robustness": {...},
      "explainability": {...}
  }
}

Each evaluator returns diagnostic metadata.
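
Because the report is a plain dictionary, it is easy to gate a CI job on it. The sketch below works on a hand-written report with the structure shown above; `passes_gate` and the `min_overall` threshold are hypothetical and not part of AI Critic.

```python
def passes_gate(report, min_overall=0.8):
    """Return True when the overall score meets the required minimum."""
    return report["scores"]["overall"] >= min_overall


# Hand-written report following the documented structure
report = {
    "scores": {"overall": 0.83, "verdict": "good"},
    "details": {"performance": {}, "robustness": {}, "explainability": {}},
}

# A CI script could pass this value to sys.exit()
exit_code = 0 if passes_gate(report) else 1
print(exit_code)
```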


Plugin System

AI Critic 3.0 introduced a plugin architecture, so you can create custom evaluators.

Example:

from ai_critic.plugins.base import EvaluatorPlugin

class FairnessEvaluator(EvaluatorPlugin):

    name = "fairness"

    def evaluate(self, model, dataset, context):
        # Placeholder result; a real plugin would compute this from model and dataset
        return {
            "score": 0.9,
            "message": "Model fairness acceptable"
        }

Register the plugin:

from ai_critic.plugins.registry import EvaluatorRegistry

EvaluatorRegistry.register(FairnessEvaluator())

AI Critic will automatically include it in the evaluation pipeline.


CLI Usage

ai-critic --model model.pkl --data dataset.csv --target label

Output:

=== AI CRITIC REPORT ===

Overall score: 0.812
Verdict: good

JSON mode:

ai-critic --json

Installation

pip install ai-critic

Dependencies:

  • numpy
  • scikit-learn
  • graphviz (optional, for visualization)

Design Philosophy

AI Critic follows three principles:

  1. Deterministic evaluation
  2. Modular architecture
  3. No hidden ML layers

AI Critic is not:

  • an AutoML tool
  • a model trainer
  • a black-box evaluator

It is an evaluation engine.
