Marginal Baseline Eval (MBE): A framework for rigorously auditing representation metrics in deep neural networks.


The Marginal Baseline Eval (MBE)

Welcome to the Marginal Baseline Eval (MBE) repository!

This repository provides the formal implementation of the MBE protocol, a strict four-stage validation methodology for auditing representation metrics in deep neural networks.

It was originally built during a large case study that falsified the Gradient Effective Rank (FIM_norm) metric as a generalization predictor.

Why Do We Need MBE?

The AI safety and interpretability communities frequently propose internal structural metrics (e.g., representation geometry, effective rank, gradient coherence) to predict generalization or track model health.

However, many of these metrics are secretly Loss Proxies. Because early validation loss already predicts final test accuracy, any metric that correlates with the magnitude of the loss will automatically correlate with generalization as well. Such a metric provides no independent structural insight.

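The MBE protocol catches these false-positive metrics with a rigorous partial-correlation baseline control: it asks whether the metric still predicts the target once the baseline (for example, early validation loss) has been controlled for.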
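The idea behind the control can be sketched in a few lines of NumPy. This is a minimal illustration, not the MBEEvaluator internals: it assumes a Pearson partial correlation computed by residualizing both the metric and the target on the baseline, and the `partial_corr` helper and the simulated `loss`, `accuracy`, `proxy`, and `informative` arrays are hypothetical. pingouin's `partial_corr` function computes the same statistic from a DataFrame.

```python
import numpy as np


def partial_corr(metric, target, baseline):
    """Pearson partial correlation of metric and target, controlling for baseline.

    Hypothetical helper: regresses each variable on [1, baseline] by least
    squares and returns the correlation of the two residual vectors.
    """
    metric = np.asarray(metric, dtype=float)
    target = np.asarray(target, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    design = np.column_stack([np.ones_like(baseline), baseline])
    res_metric = metric - design @ np.linalg.lstsq(design, metric, rcond=None)[0]
    res_target = target - design @ np.linalg.lstsq(design, target, rcond=None)[0]
    return float(np.corrcoef(res_metric, res_target)[0, 1])


# Simulated data, one entry per model (not real experimental results).
rng = np.random.default_rng(0)
n_models = 200
loss = rng.normal(size=n_models)        # stand-in for early validation loss
signal = rng.normal(size=n_models)      # structure the loss does not capture
accuracy = -loss + 0.5 * signal + 0.3 * rng.normal(size=n_models)
proxy = 2.0 * loss + 0.3 * rng.normal(size=n_models)     # tracks the loss only
informative = signal + 0.3 * rng.normal(size=n_models)   # tracks the extra signal

raw_proxy = float(np.corrcoef(proxy, accuracy)[0, 1])
partial_proxy = partial_corr(proxy, accuracy, loss)
partial_informative = partial_corr(informative, accuracy, loss)
print(f"proxy: raw r = {raw_proxy:.2f}, partial r = {partial_proxy:.2f}")
print(f"informative: partial r = {partial_informative:.2f}")
```

The loss-only metric keeps a strong raw correlation with accuracy, but its partial correlation collapses toward zero once the loss is partialled out; the metric carrying information the loss lacks keeps a high partial correlation.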

Installation

You can install the framework directly from PyPI:

pip install mbe-eval

Or, if you want to run the PyTorch demos, clone the repository:

git clone https://github.com/AparajeetS/metric-audit-paper-code.git
cd metric-audit-paper-code
pip install -r requirements.txt

The MBE API

MBE is a fully importable Python framework powered by pandas and pingouin. You can integrate it directly into your own model evaluation pipelines.

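from mbe_eval import MBEEvaluator

# metric_vals, baseline_vals, target_vals: 1-D numpy arrays of equal length,
# one entry per trained model (e.g. the candidate metric, epoch-20 validation
# loss, and final test accuracy).
evaluator = MBEEvaluator(metric_name="My Cool Metric", baseline_name="Epoch 20 Val Loss")
report = evaluator.evaluate(metric_vals, baseline_vals, target_vals)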

Calling evaluate prints a rich diagnostics table to the console and saves a high-resolution seaborn graphical report to the mbe_reports/ directory.
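Before calling `evaluate`, the three arrays have to be aligned run by run. The helper below is a hypothetical sketch, not part of mbe_eval: it assumes per-run results are stored as a list of dicts (the key names and numbers are made-up placeholders), drops any run with a missing or non-finite value in one of the three fields, and returns equal-length float arrays in the order `evaluate` takes them.

```python
import math

import numpy as np


def to_mbe_arrays(runs, metric_key, baseline_key, target_key):
    """Return aligned (metric, baseline, target) float arrays from per-run records.

    Hypothetical helper: a run is kept only if all three fields are present
    and finite, so the three arrays stay aligned model by model.
    """
    keys = (metric_key, baseline_key, target_key)
    kept = []
    for run in runs:
        values = [run.get(key) for key in keys]
        if all(v is not None and math.isfinite(v) for v in values):
            kept.append(values)
    # With one covariate, fewer than 4 points leaves the residual correlation
    # degenerate (always +/-1 or undefined), so refuse to continue.
    if len(kept) < 4:
        raise ValueError(f"need at least 4 complete runs, got {len(kept)}")
    table = np.asarray(kept, dtype=float)
    return table[:, 0], table[:, 1], table[:, 2]


# Made-up placeholder records; key names are illustrative.
runs = [
    {"erank": 41.2, "val_loss_ep20": 0.91, "test_acc": 0.72},
    {"erank": 37.5, "val_loss_ep20": 1.05, "test_acc": 0.68},
    {"erank": float("nan"), "val_loss_ep20": 0.88, "test_acc": 0.74},  # diverged run
    {"erank": 44.0, "val_loss_ep20": 0.80, "test_acc": 0.77},
    {"erank": 39.1, "val_loss_ep20": None, "test_acc": 0.70},          # missing baseline
    {"erank": 35.8, "val_loss_ep20": 1.12, "test_acc": 0.65},
]
metric_vals, baseline_vals, target_vals = to_mbe_arrays(
    runs, "erank", "val_loss_ep20", "test_acc"
)
print(metric_vals, baseline_vals, target_vals)
```

The resulting arrays can then be passed straight to `evaluator.evaluate(metric_vals, baseline_vals, target_vals)`.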

Real PyTorch Demos

We provide two end-to-end PyTorch scripts in the examples/ directory that actually train neural networks and run the evaluation live.

1. The Acid Test (Stage 1): shows how a metric can successfully track capacity and noise, giving false assurance.

python examples/01_run_acid_test.py

2. The Heterogeneous Grid (Stage 4): the key demo. Trains 20 models with randomized hyperparameters, computes the Gradient Effective Rank for each, and runs the final MBE partial-correlation control to show that the metric is a disguised loss proxy.

python examples/02_run_heterogeneous_grid.py
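Why can passing Stage 1 give false assurance? A minimal synthetic sketch, assuming the acid test checks whether a metric moves monotonically with a controlled knob such as a noise level: a metric that is nothing but a rescaled loss passes that check, because the knob also moves the loss. The `spearman` helper, the noise levels, and the loss model are hypothetical and are not taken from examples/01_run_acid_test.py.

```python
import numpy as np


def spearman(x, y):
    """Spearman rank correlation via Pearson on ranks (assumes no ties)."""
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return float(np.corrcoef(rank_x, rank_y)[0, 1])


rng = np.random.default_rng(1)
noise_levels = np.linspace(0.0, 0.8, 9)   # hypothetical controlled manipulation
# Assumed loss model: more noise -> higher validation loss, plus small jitter.
val_loss = 0.5 + 1.5 * noise_levels + 0.02 * rng.normal(size=noise_levels.size)
# A "metric" that is only a monotone transform of the loss.
loss_proxy_metric = 10.0 * np.log(val_loss)

rho = spearman(noise_levels, loss_proxy_metric)
print(f"Spearman(noise, metric) = {rho:.2f}")
```

A near-perfect rank correlation here says nothing about structure beyond the loss, which is what the Stage 4 partial-correlation control is designed to expose.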

Repository Structure

metric-audit-paper-code/
├── mbe_eval/               # The core MBE evaluation API
│   ├── __init__.py
│   ├── core.py             # MBEEvaluator class
│   ├── utils.py            # PyTorch FIM_norm extraction
│   └── sample_eval.py      # Basic synthetic simulation
├── examples/               # Real end-to-end PyTorch demos
│   ├── 01_run_acid_test.py
│   └── 02_run_heterogeneous_grid.py
├── experiments/            # All 12 original paper experiment scripts
├── metric_audit/           # Core FIM_norm computation library
├── docs/
│   └── RESULTS.md          # Raw numerical results for the paper
├── PAPER.md                # Full technical writeup
├── requirements.txt
├── LICENSE
└── README.md
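For readers unfamiliar with the metric under audit, here is a minimal NumPy sketch of an entropy-based effective rank of a per-example gradient matrix, following Roy and Vetterli's definition (the exponential of the Shannon entropy of the normalized singular values). This is an assumption about the general form only: the FIM_norm computed in mbe_eval/utils.py and metric_audit/ may normalize or define it differently, and `effective_rank` is a hypothetical name.

```python
import numpy as np


def effective_rank(grads, eps=1e-12):
    """Entropy-based effective rank of a (n_examples, n_params) gradient matrix.

    Singular values are normalized to a probability vector p and the result is
    exp(-sum(p * log p)). For a nonzero matrix it lies between 1 and
    min(grads.shape); an all-zero matrix returns 0.0.
    """
    singular_values = np.linalg.svd(np.asarray(grads, dtype=float), compute_uv=False)
    total = singular_values.sum()
    if total <= eps:
        return 0.0  # all-zero gradients: no meaningful rank
    p = singular_values / total
    p = p[p > eps]  # drop numerical zeros so 0 * log 0 is treated as 0
    return float(np.exp(-np.sum(p * np.log(p))))


rng = np.random.default_rng(0)
isotropic = rng.normal(size=(64, 16))                             # spread over many directions
collapsed = np.outer(rng.normal(size=64), rng.normal(size=16))   # rank-1 gradients
print(effective_rank(np.eye(8)), effective_rank(isotropic), effective_rank(collapsed))
```

Gradients spread evenly across directions give a value close to the number of parameters, while gradients collapsed onto one direction give a value close to 1.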

Citation

If you use the Marginal Baseline Eval in your own representation evaluation, please cite the accompanying manuscript:

Shadangi, A. (2026). Does It Beat the Baseline? A Comprehensive Negative Result on Gradient Effective Rank as a Generalization Predictor. arXiv preprint.

Project details


This version: 0.1.0 (Jun 23, 2026)

Download files


Source Distribution

mbe_eval-0.1.0.tar.gz (10.5 kB, source), uploaded Jun 23, 2026

Built Distribution


mbe_eval-0.1.0-py3-none-any.whl (10.7 kB, Python 3), uploaded Jun 23, 2026

File details

Details for the file mbe_eval-0.1.0.tar.gz.

File metadata

Download URL: mbe_eval-0.1.0.tar.gz
Upload date: Jun 23, 2026
Size: 10.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for mbe_eval-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`996131c5bc493dda4b33c34330c2e55c369ca56083c134e524d9b1efa158ea11`
MD5	`d3e2203f8af7308ce186ed5dd6154ce4`
BLAKE2b-256	`0b502b8349d3eb6b179233435d9359b439dadb204a08b20504f198e4330e0c7b`


File details

Details for the file mbe_eval-0.1.0-py3-none-any.whl.

File metadata

Download URL: mbe_eval-0.1.0-py3-none-any.whl
Upload date: Jun 23, 2026
Size: 10.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for mbe_eval-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`80d914984ddcd312e36fb4d11995ab0dd295f4b54728368fc928a2b0f242b23e`
MD5	`69557b6e6625bf8ee841da33b8f004b1`
BLAKE2b-256	`976ffbc9e98209fae9f2bd9fffe956f91e0cdf749ea91e5b3653285d893f266f`

