Graph-based evaluation engine for machine learning models

🚀 AI Critic 3.4.5 (Unified Evaluation Engine)

pip install ai-critic

AI Critic is a graph-based evaluation engine for machine learning models, designed to go beyond isolated metrics.

It runs a structured evaluation pipeline that analyzes multiple dimensions (performance, robustness, explainability, data quality, and structure) and delivers a unified, interpretable, and actionable report.


🔥 WHAT’S NEW IN 3.4.5

🧠 Fully Unified Architecture

  • Single entry point: evaluate()
  • Single output format: report
  • Removal of fragmented and inconsistent outputs

📦 Standardized Report (JSON-first)

All results follow the same schema:

report = {
    "scores": {},        # technical scores (0–1)
    "details": {},       # raw evaluator outputs
    "risk": {},          # interpretable score (0–100)
    "summary": {},       # human-readable insights
    "suggestions": []    # recommended actions
}

👉 This makes AI Critic:

  • API-ready
  • Easy to log and persist
  • Production-ready
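The sketch below assumes the five top-level keys shown above and plain numbers in `scores`. It checks a report against that schema and serializes it to a single JSON line for logging. `validate_report` and `to_json_line` are hypothetical helpers, not part of ai_critic:

```python
import json
from typing import Any

# Top-level keys of the unified report and their expected container types.
REPORT_KEYS = {
    "scores": dict,
    "details": dict,
    "risk": dict,
    "summary": dict,
    "suggestions": list,
}


def validate_report(report: dict[str, Any]) -> list[str]:
    """Return a list of schema problems; an empty list means the report looks valid."""
    problems = []
    for key, expected_type in REPORT_KEYS.items():
        if key not in report:
            problems.append(f"missing key: {key}")
        elif not isinstance(report[key], expected_type):
            problems.append(f"{key} should be a {expected_type.__name__}")

    # Technical scores are documented as 0–1; assume they are plain numbers.
    scores = report.get("scores")
    if isinstance(scores, dict):
        for name, value in scores.items():
            if not isinstance(value, (int, float)) or not 0.0 <= value <= 1.0:
                problems.append(f"score out of range: {name}={value}")
    return problems


def to_json_line(report: dict[str, Any]) -> str:
    """Serialize a report as one JSON line, e.g. to append to a log file."""
    return json.dumps(report, sort_keys=True, ensure_ascii=False)
```

Because every report has the same shape, a single check like this can guard an API response or a log writer.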

⚡ Improved Graph Engine

  • Dependency-aware execution (topological sort)
  • Parallel execution support
  • Deterministic evaluation order
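You can sketch dependency-aware ordering with the standard library's `graphlib.TopologicalSorter` (Python 3.9+). This illustrates the technique and is not ai_critic's engine. The `execution_batches` helper and the evaluator names are hypothetical. Each batch holds evaluators whose dependencies are already satisfied, so members of a batch could run in parallel. Sorting each batch keeps the order deterministic:

```python
from graphlib import TopologicalSorter


def execution_batches(dependencies: dict[str, list[str]]) -> list[list[str]]:
    """Group evaluators into batches that respect their dependencies.

    `dependencies` maps each evaluator to the evaluators it depends on.
    Nodes in the same batch do not depend on each other, so they could run
    in parallel; sorting each batch makes the order reproducible.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches
```

Consider `{"performance": [], "robustness": ["performance"], "explainability": ["performance"], "data_quality": []}`. For this input the function returns `[["data_quality", "performance"], ["explainability", "robustness"]]`.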

🎯 Multi-layer Scoring System

  • Technical score (0–1) → aggregation layer
  • Risk score (0–100) → decision layer

💡 Integrated Suggestion Engine

  • Automatically generates recommendations based on model behavior

🧩 Plugin System Stabilization

  • Cleaner evaluator interface
  • Improved dependency resolution
  • Easier extension of the evaluation pipeline

⚡ QUICK START

from ai_critic import AICritic
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris

# Data
data = load_iris()
X, y = data.data, data.target

# Model
model = RandomForestClassifier().fit(X, y)

# Critic
critic = AICritic(weights={
    "performance": 1.0,
    "robustness": 1.5
})

# Evaluation
report = critic.evaluate(model, X, y, parallel=True)

# 🔹 Technical scores
print(report["scores"])

# 🔹 Risk score (0–100)
print(report["risk"])

# 🔹 Human summary
print(report["summary"])

# 🔹 Suggestions
for s in report["suggestions"]:
    print("-", s)

🧩 INTERNAL PIPELINE

evaluate()
   ↓
EvaluationGraph (nodes)
   ↓
raw_results
   ↓
ScoreAggregator (0–1)
   ↓
build_report()
   ↓
scoring.py (risk 0–100)
   ↓
summary.py (human-readable)
   ↓
SuggestionEngine

🧱 CORE COMPONENTS

1. Evaluation Graph

A DAG-based execution system:

  • Automatically resolves dependencies
  • Executes nodes in correct order
  • Enables parallel execution

Example:

performance → robustness → explainability
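The following minimal sketch shows how a chain like this could pass results downstream. It assumes each evaluator is a plain function that receives its dependencies' results in a `context` dict, loosely mirroring the `context=None` parameter of the plugin interface below. `run_graph` and the lambda evaluators are hypothetical, and execution is sequential for clarity:

```python
from graphlib import TopologicalSorter
from typing import Any, Callable

Result = dict[str, Any]
# Hypothetical evaluator type: receives its dependencies' results, returns its own.
Evaluator = Callable[[dict[str, Result]], Result]


def run_graph(evaluators: dict[str, Evaluator],
              dependencies: dict[str, list[str]]) -> dict[str, Result]:
    """Run evaluators in dependency order, passing upstream results as context."""
    results: dict[str, Result] = {}
    for name in TopologicalSorter(dependencies).static_order():
        # Expose only the results of this evaluator's declared dependencies.
        context = {dep: results[dep] for dep in dependencies.get(name, [])}
        results[name] = evaluators[name](context)
    return results


# The chain from the example: performance → robustness → explainability.
# Each stub records which upstream results it was given.
evaluators = {
    "performance": lambda ctx: {"score": 0.95, "inputs": sorted(ctx)},
    "robustness": lambda ctx: {"score": 0.80, "inputs": sorted(ctx)},
    "explainability": lambda ctx: {"score": 0.70, "inputs": sorted(ctx)},
}
dependencies = {
    "performance": [],
    "robustness": ["performance"],
    "explainability": ["robustness"],
}
results = run_graph(evaluators, dependencies)
```

In this setup, `robustness` sees only the `performance` result and `explainability` sees only the `robustness` result.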

2. Score Aggregator

Combines per-evaluator scores into a single technical score (0–1). Weights control how much each evaluator counts:

critic = AICritic(weights={
    "performance": 1.0,
    "robustness": 2.0
})
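One plausible reading of the weighting assumes the aggregator takes a weighted mean of 0–1 scores. ai_critic's exact formula is not documented here, and `aggregate_scores` is a hypothetical helper:

```python
def aggregate_scores(scores: dict[str, float],
                     weights: dict[str, float],
                     default_weight: float = 1.0) -> float:
    """Weighted mean of per-evaluator scores, each assumed to lie in 0–1.

    Evaluators without an explicit weight use `default_weight`.
    """
    total_weight = sum(weights.get(name, default_weight) for name in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted_sum = sum(
        score * weights.get(name, default_weight) for name, score in scores.items()
    )
    return weighted_sum / total_weight
```

Take scores `{"performance": 0.9, "robustness": 0.6}` with the weights above. Robustness counts twice, giving (0.9 + 2 × 0.6) / 3 = 0.7.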

3. Evaluator Plugins

Fully extensible via plugins:

from ai_critic.plugins.base import EvaluatorPlugin
from ai_critic.plugins.registry import EvaluatorRegistry

class FairnessEvaluator(EvaluatorPlugin):
    name = "fairness"
    dependencies = ["performance"]
    weight = 1.0

    def evaluate(self, model, dataset, context=None):
        return {
            "score": 0.92,
            "verdict": "stable",
            "message": "Fairness is acceptable"
        }

EvaluatorRegistry.register(FairnessEvaluator())
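Conceptually, each plugin's `dependencies` list becomes a set of edges in the evaluation graph. The sketch below shows one way dependency resolution could reject duplicate names and dependencies on unregistered evaluators. It uses a hypothetical `MiniRegistry` and `PluginSpec`; they are not ai_critic's `EvaluatorRegistry` or `EvaluatorPlugin`:

```python
from dataclasses import dataclass, field


@dataclass
class PluginSpec:
    """Hypothetical stand-in for an evaluator plugin's class attributes."""
    name: str
    dependencies: list[str] = field(default_factory=list)
    weight: float = 1.0


class MiniRegistry:
    """Hypothetical registry for illustration; not ai_critic's EvaluatorRegistry."""

    def __init__(self) -> None:
        self._plugins: dict[str, PluginSpec] = {}

    def register(self, plugin: PluginSpec) -> None:
        if plugin.name in self._plugins:
            raise ValueError(f"duplicate evaluator name: {plugin.name}")
        self._plugins[plugin.name] = plugin

    def dependency_graph(self) -> dict[str, list[str]]:
        """Map each evaluator to its dependencies, rejecting unknown ones."""
        for plugin in self._plugins.values():
            missing = [d for d in plugin.dependencies if d not in self._plugins]
            if missing:
                raise LookupError(f"{plugin.name} depends on unregistered {missing}")
        return {name: list(p.dependencies) for name, p in self._plugins.items()}
```

The returned mapping has the shape that `graphlib.TopologicalSorter` accepts, so it can feed an ordering step directly.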

4. Risk Scoring (0–100)

Transforms technical signals into decision-ready output:

report["risk"] = {
    "global_score": 78.5,
    "verdict": "usable_with_caution",
    "component_scores": {...},
    "penalties": [...]
}
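Here is a sketch of a decision layer under stated assumptions. The technical score (0–1) is scaled to 0–100, fixed penalty points are subtracted, and the result is bucketed into verdicts. The thresholds (85 and 65), the penalty values, and every verdict label except `usable_with_caution` are hypothetical. `component_scores` is omitted for brevity:

```python
def risk_report(technical_score: float,
                penalties: list[tuple[str, float]]) -> dict:
    """Turn a 0–1 technical score into a 0–100 risk summary.

    Penalties are (reason, points) pairs. Thresholds and labels are
    illustrative assumptions, not ai_critic's actual values.
    """
    raw = 100.0 * technical_score - sum(points for _, points in penalties)
    score = round(min(100.0, max(0.0, raw)), 1)  # clamp to 0–100
    if score >= 85:
        verdict = "deployable"  # hypothetical label
    elif score >= 65:
        verdict = "usable_with_caution"
    else:
        verdict = "not_recommended"  # hypothetical label
    return {
        "global_score": score,
        "verdict": verdict,
        "penalties": [reason for reason, _ in penalties],
    }
```

`risk_report(0.9, [("possible data leakage", 11.5)])` gives a `global_score` of 78.5 with the verdict `usable_with_caution`.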

5. Human Summary

High-level interpretation:

report["summary"] = {
    "executive_summary": {
        "verdict": "⚠️ Risky",
        "deploy_recommended": False
    }
}
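The executive summary can be derived from the risk verdict. This minimal sketch uses a hypothetical mapping from verdicts to labels; `VERDICT_LABELS` and `executive_summary` are illustrative names. Pairing `usable_with_caution` with `⚠️ Risky` is itself an assumption, since the examples above show them separately:

```python
# Hypothetical mapping from risk verdicts to display labels.
VERDICT_LABELS = {
    "deployable": "✅ Ready",
    "usable_with_caution": "⚠️ Risky",
    "not_recommended": "❌ Not ready",
}


def executive_summary(risk: dict) -> dict:
    """Build the executive-summary part of report["summary"] from report["risk"]."""
    verdict = risk["verdict"]
    return {
        "executive_summary": {
            # Unknown verdicts fall back to their raw name.
            "verdict": VERDICT_LABELS.get(verdict, verdict),
            # Assumption: only a clean verdict recommends deployment.
            "deploy_recommended": verdict == "deployable",
        }
    }
```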

6. Suggestion Engine

Actionable insights:

[
    "Check for data leakage",
    "Improve robustness with regularization"
]
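A rule-based engine is one simple way to produce suggestions like these. In this sketch each rule pairs a condition on the report with a message. The rules, their thresholds, and the `details["performance"]["accuracy"]` field are hypothetical; they are not ai_critic's actual rules:

```python
from typing import Any, Callable

Report = dict[str, Any]

# Hypothetical rules: (condition on the report, suggestion text).
RULES: list[tuple[Callable[[Report], bool], str]] = [
    # Near-perfect accuracy is a common symptom of leakage between features and target.
    (lambda r: r["details"].get("performance", {}).get("accuracy", 0.0) > 0.99,
     "Check for data leakage"),
    # A low robustness score suggests sensitivity to small input changes.
    (lambda r: r["scores"].get("robustness", 1.0) < 0.7,
     "Improve robustness with regularization"),
]


def suggest(report: Report) -> list[str]:
    """Return the message of every rule whose condition holds for the report."""
    return [message for condition, message in RULES if condition(report)]
```

Keeping rules as data makes it easy to add new checks without touching the engine itself.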

🖥️ CLI

ai-critic --model model.pkl --data dataset.csv --target label

Output includes:

  • scores
  • risk analysis
  • summary

🧠 DESIGN PHILOSOPHY

1. Single Source of Truth

One unified data format → no inconsistencies


2. Graph-first Thinking

Evaluation is a dependency-driven pipeline, not isolated functions


3. JSON-native

Everything is ready for:

  • APIs
  • dashboards
  • logging
  • SaaS platforms

4. Actionable AI

Not just metrics, but decisions:

  • Should you deploy?
  • Where is the risk?
  • What should be improved?

🔥 POSITIONING

AI Critic is not just a metrics library.

It is a:

🧠 Linting engine for machine learning models
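In the same spirit, a report can gate a CI job the way a linter does: list findings and return a non-zero exit code when the model fails. This minimal sketch assumes the `risk` layout shown earlier. `lint_gate` and its 70-point threshold are hypothetical:

```python
def lint_gate(report: dict, min_score: float = 70.0) -> tuple[int, list[str]]:
    """Return (exit_code, findings) in the style of a linter.

    Exit code 0 means the model passes the gate; 1 means it fails.
    `min_score` is a hypothetical threshold on report["risk"]["global_score"].
    """
    risk = report["risk"]
    findings = [f"penalty: {p}" for p in risk.get("penalties", [])]
    findings += [f"suggestion: {s}" for s in report.get("suggestions", [])]
    exit_code = 0 if risk["global_score"] >= min_score else 1
    return exit_code, findings
```

In a CI script, calling `code, findings = lint_gate(report)`, printing the findings, and then calling `sys.exit(code)` would fail the job when the score drops below the threshold.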


🚀 ROADMAP

  • REST API (/evaluate)
  • Visual dashboard
  • Model telemetry
  • Continuous learning (feedback loop)
  • Global benchmarking between models

📄 LICENSE

MIT License


Download files

Download the file for your platform.

Source Distribution

ai_critic-3.4.6.tar.gz (21.0 kB)

Uploaded Source

Built Distribution

ai_critic-3.4.6-py3-none-any.whl (27.6 kB)

Uploaded Python 3

File details

Details for the file ai_critic-3.4.6.tar.gz.

File metadata

  • Download URL: ai_critic-3.4.6.tar.gz
  • Upload date:
  • Size: 21.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.7

File hashes

Hashes for ai_critic-3.4.6.tar.gz
Algorithm Hash digest
SHA256 05c11b504c866fd1a52eedbf4ffcb6a724734336f0877bd5416708536422be64
MD5 09fc868d6a12aa4948eaa2bdb4cc2b56
BLAKE2b-256 f40b25294f7068e660d88b0ed1c7ea4972c813270013782e951f9b889edea7f6

File details

Details for the file ai_critic-3.4.6-py3-none-any.whl.

File metadata

  • Download URL: ai_critic-3.4.6-py3-none-any.whl
  • Upload date:
  • Size: 27.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.7

File hashes

Hashes for ai_critic-3.4.6-py3-none-any.whl
Algorithm Hash digest
SHA256 498d2e743cf26e1063c6347fc8f41fc973beafd9105cc15e1d1fc9ecf4ae3c44
MD5 6c43f01a87893628485066b2b8e605c4
BLAKE2b-256 28819fb88fad8325cbf95507eb62f6db22e5be672e1519ded362c2e8ac40b946
