# cl-metrics 📐

Stateless CIL/CL evaluation metrics library with SNN energy-aware extensions.

The missing `scikit-learn.metrics` for Continual Learning.

Feed it a matrix. Get your metrics. No framework. No boilerplate. No pain.

## Documentation

- 📖 Interactive FAQ — 40 questions answered, searchable by category
## The Problem Every CIL Researcher Knows
You trained your model. You have your accuracy matrix. Now you need AA, BWT, FWT, Intransigence.
So you open Avalanche. Or PyCIL. Or FACIL.
And you find out that every framework buries its metrics inside its own training loop. You can't just pass a numpy array. You have to simulate their data stream, wrap your model in their classes, and fight their abstractions — just to compute a mean and a difference.
So you write your own NumPy script. Again. Like everyone else.
cl-metrics ends this.
## Install

```bash
pip install cl-metrics
```
## 30-Second Quick Start

```python
import numpy as np
from cl_metrics import CLMetrics

# Your N x N accuracy matrix:
# R[i, j] = accuracy on task j after training on task i
R = np.array([
    [0.90, 0.00, 0.00],
    [0.72, 0.85, 0.00],
    [0.65, 0.78, 0.88],
])

m = CLMetrics(R)
m.summary()
```
```text
=== cl-metrics Summary ===
AA         :  0.7700
BWT        : -0.1350
FWT        :  0.0000
Plasticity :  0.8767
Stability  :  0.8217
Forgetting :  0.1350
==========================
```
That's it. No imports beyond numpy. No framework. No training loop.
## What You Get

### Standard CIL Metrics

All implemented to their canonical formulations — no improvisation, no drift.
| Metric | What it measures | Canonical Reference |
|---|---|---|
| AA | Mean final accuracy across all tasks | Lopez-Paz & Ranzato (2017) |
| BWT | How much new learning hurts old tasks (forgetting) | Lopez-Paz & Ranzato (2017) |
| FWT | Zero-shot performance on future tasks | Lopez-Paz & Ranzato (2017) |
| Intransigence | Resistance to learning new tasks vs. oracle | Chaudhry et al. (2018) |
| Plasticity Index | How well the model learns each new task | Serra et al. (2018) |
| Stability Index | How much past knowledge is retained | Serra et al. (2018) |
| Forgetting Measure | Maximum accuracy drop per task | Chaudhry et al. (2018) |
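As a cross-check, the core definitions are short enough to write out directly. Below is a minimal NumPy sketch of the textbook formulations (a sketch only: averaging conventions vary between papers, so any given implementation, including this one, may reduce the matrix slightly differently):

```python
import numpy as np

def average_accuracy(R):
    """AA: mean accuracy over all tasks after training on the last one."""
    return R[-1].mean()

def backward_transfer(R):
    """BWT (Lopez-Paz & Ranzato, 2017): average change on each old task
    between when it was learned and the end of training. Negative = forgetting."""
    N = R.shape[0]
    return np.mean([R[-1, j] - R[j, j] for j in range(N - 1)])

def forgetting_measure(R):
    """Forgetting (Chaudhry et al., 2018): per old task, the best accuracy
    ever reached minus the final accuracy, averaged over old tasks."""
    N = R.shape[0]
    return np.mean([R[:-1, j].max() - R[-1, j] for j in range(N - 1)])
```

On the quick-start matrix these textbook reductions give AA = 0.7700, BWT = −0.16 and Forgetting = 0.16; where a codebase reports slightly different values, the gap is the averaging-convention issue discussed under the Reproducibility section below.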
### SNN Energy-Aware Metrics ⚡ (First standardised suite)

If you work with Spiking Neural Networks, accuracy alone is not enough. A model that reaches 90% accuracy at a 40% spike rate is not necessarily better than one that reaches 88% at a 5% spike rate. Until now, there was no standard way to measure this trade-off.
```python
from cl_metrics import SNNMetrics

spike_rates = np.array([0.12, 0.09, 0.11])  # mean firing rate per task
snn = SNNMetrics(R, spike_rates=spike_rates)
snn.summary()
```
```text
=== cl-metrics Summary ===
AA  :  0.7700
BWT : -0.1350
...
==========================

=== SNN Energy Metrics ===
SRP    :  0.1067   ← Spike Rate Proxy (energy proxy)
SR-AA  :  0.6878   ← Accuracy penalised by energy cost
EA-BWT : -0.1208   ← Energy-weighted forgetting
EER    :  2.1563   ← Error-to-Energy Ratio (lower = better)
==========================
```
| Metric | Formula | What it captures |
|---|---|---|
| SRP | mean(spike_rates) | Dynamic energy proxy |
| SR-AA | AA × (1 − SRP) | Accuracy adjusted for energy cost |
| EA-BWT | Energy-weighted BWT per task | High-energy forgetting penalised more |
| EER | (1 − AA) / SRP | Combined error + energy in one scalar |
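The scalar formulas in the table are easy to sanity-check by hand. A minimal sketch reusing `R` and `spike_rates` from the examples above (EA-BWT is left out because it additionally weights each task's forgetting term by its spike rate):

```python
import numpy as np

R = np.array([
    [0.90, 0.00, 0.00],
    [0.72, 0.85, 0.00],
    [0.65, 0.78, 0.88],
])
spike_rates = np.array([0.12, 0.09, 0.11])

aa    = R[-1].mean()        # 0.7700, mean final accuracy
srp   = spike_rates.mean()  # 0.1067, dynamic energy proxy
sr_aa = aa * (1 - srp)      # 0.6878, accuracy discounted by energy cost
eer   = (1 - aa) / srp      # 2.1563, error per unit of energy (lower = better)
```

All three match the `snn.summary()` output shown above.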
## Full API Reference

### `CLMetrics(matrix, task_weights=None)`

```python
from cl_metrics import CLMetrics

m = CLMetrics(R)                             # macro-average (equal task weights)
m = CLMetrics(R, task_weights=[10, 20, 10])  # micro-average by class count

m.average_accuracy()    # → float
m.backward_transfer()   # → float (negative = forgetting)
m.forward_transfer()    # → float
m.intransigence(ref)    # → float (pass oracle accuracies per task)
m.plasticity_index()    # → float
m.stability_index()     # → float
m.forgetting_measure()  # → float
m.summary()             # → dict (prints + returns all metrics)
```
### `SNNMetrics(matrix, spike_rates=None, task_weights=None)`

```python
import numpy as np
from cl_metrics import SNNMetrics

snn = SNNMetrics(R, spike_rates=np.array([0.12, 0.09, 0.11]))

snn.spike_rate_proxy()          # → float
snn.spike_rate_normalized_aa()  # → float
snn.energy_adjusted_bwt()       # → float
snn.energy_to_error_ratio()     # → float
snn.summary()                   # → dict (all CL + SNN metrics)
```
## Input Format

`R[i, j]` = accuracy on task j, evaluated after training on task i:

```text
          Task 0  Task 1  Task 2
After 0 [  0.90    0.00    0.00 ]  ← only trained on task 0
After 1 [  0.72    0.85    0.00 ]  ← trained on tasks 0-1
After 2 [  0.65    0.78    0.88 ]  ← trained on all tasks
```

- Values must be in [0, 1] (not percentages)
- Shape must be (N, N)
- Lower triangle = retention | Upper triangle = zero-shot transfer
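How you produce `R` is entirely up to you; cl-metrics only ever sees the finished matrix. A minimal sketch of the usual fill-it-as-you-train loop, where `train_on` and `evaluate` are hypothetical stand-ins for your own training and evaluation code:

```python
import numpy as np

def build_accuracy_matrix(model, tasks, train_on, evaluate):
    """Fill R[i, j] = accuracy on task j after training on task i.

    `train_on` and `evaluate` are your own callables (hypothetical here);
    cl-metrics never touches the model, only the finished matrix.
    """
    N = len(tasks)
    R = np.zeros((N, N))
    for i, task in enumerate(tasks):
        train_on(model, task)         # continue training on task i
        for j in range(N):            # evaluate on every task, including future ones
            R[i, j] = evaluate(model, tasks[j])
    return R
```

Evaluating on tasks the model has not seen yet fills the upper triangle (zero-shot transfer); if you skip those evaluations, the entries simply stay at 0.0, as in the quick-start example.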
## Why Not Just Use Avalanche / PyCIL?

| | Avalanche / PyCIL | cl-metrics |
|---|---|---|
| Input | Requires live data stream + model | Raw numpy array |
| Framework dependency | PyTorch required | numpy only |
| Works with JAX / TF / C++ | ❌ | ✅ |
| Works with neuromorphic chips | ❌ | ✅ |
| SNN energy metrics | ❌ | ✅ |
| Lines of code to get BWT | ~50 (wrapper code) | 3 |
| Install size | Heavy | ~2 MB |
## Intransigence: Pass Your Oracle

```python
# oracle_accs[j] = accuracy of a model trained *only* on task j
oracle_accs = np.array([0.92, 0.89, 0.91])
m.intransigence(reference_accuracies=oracle_accs)
```

If you don't have oracle accuracies, intransigence returns 0.0 by default (mathematically correct — the model is its own reference).
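Numerically, intransigence is the gap between what an oracle reaches on each task and what the continual learner retained. A minimal sketch under the assumption that the per-task gaps are then averaged (the exact reduction inside cl-metrics may differ):

```python
import numpy as np

oracle_accs = np.array([0.92, 0.89, 0.91])   # model trained only on each task
final_accs  = np.array([0.65, 0.78, 0.88])   # last row of R from the quick start

# Per-task gap between the oracle and what the continual learner retained
gaps = oracle_accs - final_accs   # [0.27, 0.11, 0.03]
intransigence = gaps.mean()       # ≈ 0.1367
```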
## Validated Against

cl-metrics is validated against the Maya Research Series (P3–P7), a 7-paper neuromorphic SNN continual-learning benchmark on Split-CIFAR-10 and Split-CIFAR-100.
| Paper | Benchmark | Reported AA | Reported BWT |
|---|---|---|---|
| Maya-CL (P3) | Split-CIFAR-10 TIL | 62.38% | — |
| Maya-Smriti (P4) | Split-CIFAR-10 CIL | 31.84% | — |
| Maya-Viveka (P5) | Split-CIFAR-100 CIL | 16.03% | −50.50% |
| Maya-Chitta (P6) | Split-CIFAR-100 CIL | 14.42% | — |
| Maya-Manas (P7) | Split-CIFAR-100 CIL | 15.19% | −50.91% |
## The Reproducibility Problem This Fixes
The CIL community has a well-documented metric inconsistency crisis:
- AA is computed with both macro-averaging (by task) and micro-averaging (by class count) — these give different numbers (see the example below)
- BWT formulations differ in how they handle early stopping and buffer sizes
- FWT has two completely different definitions in common use (zero-shot vs. curriculum acceleration)
- Intransigence is routinely approximated without the oracle, breaking comparability
cl-metrics implements each metric to its original published formulation, documented and unit-tested. When you report metrics computed with cl-metrics, reviewers can verify your numbers independently.
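To make the macro/micro point concrete, here is a toy illustration on the quick-start matrix; the class counts are hypothetical, chosen only to show the two conventions diverging:

```python
import numpy as np

final_accs   = np.array([0.65, 0.78, 0.88])  # last row of R
class_counts = np.array([10, 20, 10])        # hypothetical classes per task

macro_aa = final_accs.mean()                             # 0.7700, every task counts equally
micro_aa = np.average(final_accs, weights=class_counts)  # 0.7725, weighted by class count
```

The gap is only a quarter of a percentage point here, but with strongly imbalanced task sizes the two conventions can disagree by several points, which is exactly why a reported number needs to name its convention.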
## Contributing

Found a metric formulation that differs from what's implemented? Open an issue with the paper reference. Correctness over convenience — always.

```bash
git clone https://github.com/venky2099/cl-metrics
cd cl-metrics
pip install -e ".[dev]"
pytest tests/ -v
```
## Citation

If cl-metrics saved you from writing another NumPy script, please cite:

```bibtex
@software{swaminathan2026clmetrics,
  author    = {Swaminathan, Venkatesh},
  title     = {cl-metrics: Stateless Continual Learning Evaluation Metrics
               with SNN Energy-Aware Extensions},
  year      = {2026},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.19388144},
  url       = {https://doi.org/10.5281/zenodo.19388144},
  orcid     = {0000-0002-3315-7907}
}
```
## Author

**Venkatesh Swaminathan**
Founder, Nexus Learning Labs · Bengaluru, India
M.Sc. Data Science & AI, BITS Pilani
ORCID: 0000-0002-3315-7907
GitHub: @venky2099

Built because the community deserved a tool that just works.