
diagnost

A model diagnostics library for data scientists.
Performance, calibration, drift detection, and dataset health checks, all in a few lines of Python.


Why diagnost?

Most ML libraries help you build models. diagnost helps you trust them.

After training, the real questions start:

  • Is my model actually reliable, or just accurate on average?
  • Does it perform equally across different groups?
  • Are its confidence scores meaningful?
  • Has my data drifted since I trained it?

diagnost answers all of these cleanly, quickly, and in plain English.

Installation

pip install diagnost

Quickstart

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
import diagnost

X, y = load_iris(return_X_y=True, as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

model = RandomForestClassifier().fit(X_train, y_train)

report = diagnost.evaluate(model, X_test, y_test, task="classification")
report.summary()

Features

1. Model Evaluation

Evaluate classification, regression, and clustering models with one call.

# Classification
report = diagnost.evaluate(model, X_test, y_test, task="classification")

# Regression
report = diagnost.evaluate(model, X_test, y_test, task="regression")

# Clustering
report = diagnost.evaluate(model, X_test, task="clustering")

Subgroup / fairness analysis — check performance across sensitive groups:

report = diagnost.evaluate(
    model, X_test, y_test,
    task="classification",
    sensitive_features=["gender", "age_group"]
)
report.summary()
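Conceptually, subgroup analysis amounts to slicing a metric by a sensitive attribute. As an illustration of the idea only (not diagnost's implementation), per-group accuracy can be computed directly with pandas on a small hypothetical dataset:

```python
import pandas as pd

# Hypothetical predictions alongside a sensitive attribute.
df = pd.DataFrame({
    "gender": ["f", "f", "m", "m", "f", "m"],
    "y_true": [1, 0, 1, 1, 1, 0],
    "y_pred": [1, 0, 0, 1, 1, 1],
})

# Accuracy per group: mean of the row-wise correctness flag.
per_group = (df["y_true"] == df["y_pred"]).groupby(df["gender"]).mean()
print(per_group)
```

A gap between groups in such a table is exactly what the `sensitive_features` argument is meant to surface.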

2. Model Comparison

Compare multiple models side by side with a winner declared automatically.

from diagnost.compare import compare

report = compare(
    models={"Random Forest": rf, "Logistic Regression": lr},
    X=X_test,
    y=y_test,
    task="classification"
)

df = report.to_dataframe()  # returns a pandas DataFrame

3. Calibration Analysis

Check whether your model's predicted probabilities are actually reliable.

from diagnost.calibration import check_calibration

check_calibration(model, X_test, y_test)

Output includes:

  • Expected Calibration Error (ECE) per class
  • Plain-English verdict ("Well calibrated", "Poorly calibrated")
  • Reliability diagram
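ECE is also straightforward to compute by hand, which makes a useful cross-check. Below is a minimal binary-classification sketch with numpy; it is independent of diagnost, and the equal-width binning scheme is an assumption (diagnost's own binning may differ):

```python
import numpy as np

def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Binary ECE: bin predictions by confidence, then average
    |accuracy - mean confidence| per bin, weighted by bin size."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (y_prob > lo) & (y_prob <= hi)
        if not mask.any():
            continue
        confidence = y_prob[mask].mean()      # how sure the model was
        accuracy = (y_true[mask] == 1).mean() # how often it was right
        ece += mask.mean() * abs(accuracy - confidence)
    return ece

print(expected_calibration_error([0, 1, 1, 0], [0.05, 0.95, 0.9, 0.1]))
```

A well-calibrated model has confidence close to accuracy in every bin, driving this number toward zero.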

4. Drift Detection

Detect whether your input data has shifted since training.

from diagnost.drift import check_drift

check_drift(X_train, X_new)

Output includes:

  • Kolmogorov-Smirnov test for numeric features
  • Chi-square test for categorical features
  • Per-feature drift verdict with p-values
  • Distribution plots for drifted features
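Both tests come from scipy.stats, so the per-feature check can be sketched standalone. This mirrors the idea, not diagnost's exact API; the 0.05 significance threshold here is an assumption:

```python
import numpy as np
import pandas as pd
from scipy import stats

def drift_report(train: pd.DataFrame, new: pd.DataFrame, alpha: float = 0.05):
    """Per-feature drift check: KS test for numeric columns,
    chi-square test of the category counts for everything else."""
    rows = []
    for col in train.columns:
        if pd.api.types.is_numeric_dtype(train[col]):
            _, p = stats.ks_2samp(train[col], new[col])
            test = "KS"
        else:
            # Contingency table: category counts in each sample.
            table = pd.crosstab(
                pd.concat([train[col], new[col]], ignore_index=True),
                ["train"] * len(train) + ["new"] * len(new),
            )
            _, p, _, _ = stats.chi2_contingency(table)
            test = "chi2"
        rows.append({"feature": col, "test": test, "p_value": p, "drifted": p < alpha})
    return pd.DataFrame(rows)

rng = np.random.default_rng(0)
train = pd.DataFrame({"x": rng.normal(0, 1, 500), "cat": rng.choice(list("ab"), 500)})
shifted = pd.DataFrame({"x": rng.normal(3, 1, 500), "cat": rng.choice(list("ab"), 500)})
print(drift_report(train, shifted))
```

With the numeric column shifted by three standard deviations, the KS p-value collapses and the feature is flagged.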

5. Dataset Diagnostics

Inspect your dataset before modelling.

results = diagnost.inspect_dataset(df)

Checks for:

  • Missing values
  • Highly correlated features (r > 0.85)
  • Outliers (IQR method)
  • Feature distributions (visual)
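For intuition, the first three checks can be approximated in a few lines of pandas. This is a sketch of the underlying criteria, not diagnost's implementation; the r > 0.85 threshold and the 1.5×IQR rule follow the bullets above:

```python
import pandas as pd

def quick_inspect(df: pd.DataFrame, corr_threshold: float = 0.85, iqr_k: float = 1.5):
    """Rough dataset health check: missingness, highly correlated
    numeric pairs, and IQR-rule outlier counts per numeric column."""
    missing = df.isna().sum()

    numeric = df.select_dtypes("number")
    corr = numeric.corr().abs()
    pairs = [
        (a, b, corr.loc[a, b])
        for i, a in enumerate(corr.columns)
        for b in corr.columns[i + 1:]
        if corr.loc[a, b] > corr_threshold
    ]

    # A point is an outlier if it falls outside [Q1 - k*IQR, Q3 + k*IQR].
    q1, q3 = numeric.quantile(0.25), numeric.quantile(0.75)
    iqr = q3 - q1
    outliers = ((numeric < q1 - iqr_k * iqr) | (numeric > q3 + iqr_k * iqr)).sum()

    return {"missing": missing, "correlated_pairs": pairs, "outliers": outliers}

df = pd.DataFrame({"a": [1, 2, 3, 4, 100], "b": [2, 4, 6, 8, 200], "c": [5, 1, 4, 2, 3]})
result = quick_inspect(df)
print(result["correlated_pairs"])  # a and b move in lockstep
print(result["outliers"])
```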

Saving Reports

report = diagnost.evaluate(model, X_test, y_test, task="classification")
report.save("report.json")  # exports as JSON

Supported Model Types

Task Supported Frameworks
Classification scikit-learn, XGBoost, LightGBM, CatBoost
Regression scikit-learn, XGBoost, LightGBM, CatBoost
Clustering scikit-learn

Any model with a .predict() method will work.

Requirements

  • Python >= 3.9
  • numpy, pandas, scipy, matplotlib, scikit-learn

Contributing

Contributions are welcome. To get started:

git clone https://github.com/Eklavya20/diagnost.git
cd diagnost
python -m venv venv
source venv/bin/activate   # macOS/Linux
venv\Scripts\activate      # Windows
pip install -e ".[dev]"
pytest tests/ -v

Please open an issue before submitting a large pull request.

License

MIT License — free to use, modify, and distribute.
See LICENSE for details.

Author

Eklavya Jumnani
MSc Data Science, FAU Erlangen-Nürnberg
GitHub · LinkedIn
