# cl-metrics 📐

Stateless CIL/CL evaluation metrics library with SNN energy-aware extensions.

The missing `scikit-learn.metrics` for Continual Learning.

Feed it a matrix. Get your metrics. No framework. No boilerplate. No pain.

## Documentation

- 📖 Interactive FAQ — 40 questions answered, searchable by category
## The Problem Every CIL Researcher Knows
You trained your model. You have your accuracy matrix. Now you need AA, BWT, FWT, Intransigence.
So you open Avalanche. Or PyCIL. Or FACIL.
And you find out that every framework buries its metrics inside its own training loop. You can't just pass a numpy array. You have to simulate their data stream, wrap your model in their classes, and fight their abstractions — just to compute a mean and a difference.
So you write your own NumPy script. Again. Like everyone else.
cl-metrics ends this.
## Install

```bash
pip install cl-metrics
```
## 30-Second Quick Start

```python
import numpy as np
from cl_metrics import CLMetrics

# Your N x N accuracy matrix:
# R[i, j] = accuracy on task j after training on task i
R = np.array([
    [0.90, 0.00, 0.00],
    [0.72, 0.85, 0.00],
    [0.65, 0.78, 0.88],
])

m = CLMetrics(R)
m.summary()
```
```text
=== cl-metrics Summary ===
AA         :  0.7700
BWT        : -0.1350
FWT        :  0.0000
Plasticity :  0.8767
Stability  :  0.8217
Forgetting :  0.1350
==========================
```
That's it. No imports beyond numpy. No framework. No training loop.
## What You Get

### Standard CIL Metrics

All implemented to their canonical formulations — no improvisation, no drift.
| Metric | What it measures | Canonical Reference |
|---|---|---|
| AA | Mean final accuracy across all tasks | Lopez-Paz & Ranzato (2017) |
| BWT | How much new learning hurts old tasks (forgetting) | Lopez-Paz & Ranzato (2017) |
| FWT | Zero-shot performance on future tasks | Lopez-Paz & Ranzato (2017) |
| Intransigence | Resistance to learning new tasks vs. oracle | Chaudhry et al. (2018) |
| Plasticity Index | How well the model learns each new task | Serra et al. (2018) |
| Stability Index | How much past knowledge is retained | Serra et al. (2018) |
| Forgetting Measure | Maximum accuracy drop per task | Chaudhry et al. (2018) |
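As a cross-check, the core definitions are short enough to write out directly. Below is a minimal NumPy sketch of the textbook formulations (a sketch only: averaging conventions vary between papers, so any given implementation, including this one, may reduce the matrix slightly differently):

```python
import numpy as np

def average_accuracy(R):
    """AA: mean accuracy over all tasks after training on the last one."""
    return R[-1].mean()

def backward_transfer(R):
    """BWT (Lopez-Paz & Ranzato, 2017): average change on each old task
    between when it was learned and the end of training. Negative = forgetting."""
    N = R.shape[0]
    return np.mean([R[-1, j] - R[j, j] for j in range(N - 1)])

def forgetting_measure(R):
    """Forgetting (Chaudhry et al., 2018): per old task, the best accuracy
    ever reached minus the final accuracy, averaged over old tasks."""
    N = R.shape[0]
    return np.mean([R[:-1, j].max() - R[-1, j] for j in range(N - 1)])
```

On the quick-start matrix these textbook reductions give AA = 0.7700, BWT = −0.16 and Forgetting = 0.16; where a codebase reports slightly different values, the gap is the averaging-convention issue discussed under the Reproducibility section below.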
### SNN Energy-Aware Metrics ⚡ (First standardised suite)

If you work with Spiking Neural Networks, accuracy alone is not enough. A model that reaches 90% accuracy at a 40% spike rate is not necessarily better than one that reaches 88% at a 5% spike rate. Until now, there was no standard way to measure this trade-off.
```python
from cl_metrics import SNNMetrics

spike_rates = np.array([0.12, 0.09, 0.11])  # mean firing rate per task
snn = SNNMetrics(R, spike_rates=spike_rates)
snn.summary()
```
```text
=== cl-metrics Summary ===
AA  :  0.7700
BWT : -0.1350
...
==========================

=== SNN Energy Metrics ===
SRP    :  0.1067   ← Spike Rate Proxy (energy proxy)
SR-AA  :  0.6878   ← Accuracy penalised by energy cost
EA-BWT : -0.1208   ← Energy-weighted forgetting
EER    :  2.1563   ← Error-to-Energy Ratio (lower = better)
==========================
```
| Metric | Formula | What it captures |
|---|---|---|
| SRP | mean(spike_rates) | Dynamic energy proxy |
| SR-AA | AA × (1 − SRP) | Accuracy adjusted for energy cost |
| EA-BWT | Energy-weighted BWT per task | High-energy forgetting penalised more |
| EER | (1 − AA) / SRP | Combined error + energy in one scalar |
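The scalar formulas in the table are easy to sanity-check by hand. A minimal sketch reusing `R` and `spike_rates` from the examples above (EA-BWT is left out because it additionally weights each task's forgetting term by its spike rate):

```python
import numpy as np

R = np.array([
    [0.90, 0.00, 0.00],
    [0.72, 0.85, 0.00],
    [0.65, 0.78, 0.88],
])
spike_rates = np.array([0.12, 0.09, 0.11])

aa    = R[-1].mean()        # 0.7700, mean final accuracy
srp   = spike_rates.mean()  # 0.1067, dynamic energy proxy
sr_aa = aa * (1 - srp)      # 0.6878, accuracy discounted by energy cost
eer   = (1 - aa) / srp      # 2.1563, error per unit of energy (lower = better)
```

All three match the `snn.summary()` output shown above.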
## Full API Reference

### `CLMetrics(matrix, task_weights=None)`

```python
from cl_metrics import CLMetrics

m = CLMetrics(R)                             # macro-average (equal task weights)
m = CLMetrics(R, task_weights=[10, 20, 10])  # micro-average by class count

m.average_accuracy()    # → float
m.backward_transfer()   # → float (negative = forgetting)
m.forward_transfer()    # → float
m.intransigence(ref)    # → float (pass oracle accuracies per task)
m.plasticity_index()    # → float
m.stability_index()     # → float
m.forgetting_measure()  # → float
m.summary()             # → dict (prints + returns all metrics)
```
### `SNNMetrics(matrix, spike_rates=None, task_weights=None)`

```python
import numpy as np
from cl_metrics import SNNMetrics

snn = SNNMetrics(R, spike_rates=np.array([0.12, 0.09, 0.11]))

snn.spike_rate_proxy()          # → float
snn.spike_rate_normalized_aa()  # → float
snn.energy_adjusted_bwt()       # → float
snn.energy_to_error_ratio()     # → float
snn.summary()                   # → dict (all CL + SNN metrics)
```
## Input Format

`R[i, j]` = accuracy on task j, evaluated after training on task i:

```text
          Task 0  Task 1  Task 2
After 0 [  0.90    0.00    0.00 ]  ← only trained on task 0
After 1 [  0.72    0.85    0.00 ]  ← trained on tasks 0-1
After 2 [  0.65    0.78    0.88 ]  ← trained on all tasks
```

- Values must be in [0, 1] (not percentages)
- Shape must be (N, N)
- Lower triangle = retention | Upper triangle = zero-shot transfer
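How you produce `R` is entirely up to you; cl-metrics only ever sees the finished matrix. A minimal sketch of the usual fill-it-as-you-train loop, where `train_on` and `evaluate` are hypothetical stand-ins for your own training and evaluation code:

```python
import numpy as np

def build_accuracy_matrix(model, tasks, train_on, evaluate):
    """Fill R[i, j] = accuracy on task j after training on task i.

    `train_on` and `evaluate` are your own callables (hypothetical here);
    cl-metrics never touches the model, only the finished matrix.
    """
    N = len(tasks)
    R = np.zeros((N, N))
    for i, task in enumerate(tasks):
        train_on(model, task)         # continue training on task i
        for j in range(N):            # evaluate on every task, including future ones
            R[i, j] = evaluate(model, tasks[j])
    return R
```

Evaluating on tasks the model has not seen yet fills the upper triangle (zero-shot transfer); if you skip those evaluations, the entries simply stay at 0.0, as in the quick-start example.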
## Why Not Just Use Avalanche / PyCIL?

| | Avalanche / PyCIL | cl-metrics |
|---|---|---|
| Input | Requires live data stream + model | Raw numpy array |
| Framework dependency | PyTorch required | numpy only |
| Works with JAX / TF / C++ | ❌ | ✅ |
| Works with neuromorphic chips | ❌ | ✅ |
| SNN energy metrics | ❌ | ✅ |
| Lines of code to get BWT | ~50 (wrapper code) | 3 |
| Install size | Heavy | ~2 MB |
## Intransigence: Pass Your Oracle

```python
# oracle_accs[j] = accuracy of a model trained *only* on task j
oracle_accs = np.array([0.92, 0.89, 0.91])
m.intransigence(reference_accuracies=oracle_accs)
```

If you don't have oracle accuracies, intransigence returns 0.0 by default (mathematically correct — the model is its own reference).
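Numerically, intransigence is the gap between what an oracle reaches on each task and what the continual learner retained. A minimal sketch under the assumption that the per-task gaps are then averaged (the exact reduction inside cl-metrics may differ):

```python
import numpy as np

oracle_accs = np.array([0.92, 0.89, 0.91])   # model trained only on each task
final_accs  = np.array([0.65, 0.78, 0.88])   # last row of R from the quick start

# Per-task gap between the oracle and what the continual learner retained
gaps = oracle_accs - final_accs   # [0.27, 0.11, 0.03]
intransigence = gaps.mean()       # ≈ 0.1367
```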
## Validated Against

cl-metrics is validated against the Maya Research Series (P3–P7), a 7-paper neuromorphic SNN continual-learning benchmark on Split-CIFAR-10 and Split-CIFAR-100.
| Paper | Benchmark | Reported AA | Reported BWT |
|---|---|---|---|
| Maya-CL (P3) | Split-CIFAR-10 TIL | 62.38% | — |
| Maya-Smriti (P4) | Split-CIFAR-10 CIL | 31.84% | — |
| Maya-Viveka (P5) | Split-CIFAR-100 CIL | 16.03% | −50.50% |
| Maya-Chitta (P6) | Split-CIFAR-100 CIL | 14.42% | — |
| Maya-Manas (P7) | Split-CIFAR-100 CIL | 15.19% | −50.91% |
## The Reproducibility Problem This Fixes
The CIL community has a well-documented metric inconsistency crisis:
- AA is computed with both macro-averaging (by task) and micro-averaging (by class count) — these give different numbers (see the example below)
- BWT formulations differ in how they handle early stopping and buffer sizes
- FWT has two completely different definitions in common use (zero-shot vs. curriculum acceleration)
- Intransigence is routinely approximated without the oracle, breaking comparability
cl-metrics implements each metric to its original published formulation, documented and unit-tested. When you report metrics computed with cl-metrics, reviewers can verify your numbers independently.
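To make the macro/micro point concrete, here is a toy illustration on the quick-start matrix; the class counts are hypothetical, chosen only to show the two conventions diverging:

```python
import numpy as np

final_accs   = np.array([0.65, 0.78, 0.88])  # last row of R
class_counts = np.array([10, 20, 10])        # hypothetical classes per task

macro_aa = final_accs.mean()                             # 0.7700, every task counts equally
micro_aa = np.average(final_accs, weights=class_counts)  # 0.7725, weighted by class count
```

The gap is only a quarter of a percentage point here, but with strongly imbalanced task sizes the two conventions can disagree by several points, which is exactly why a reported number needs to name its convention.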
## Contributing

Found a metric formulation that differs from what's implemented? Open an issue with the paper reference. Correctness over convenience — always.

```bash
git clone https://github.com/venky2099/cl-metrics
cd cl-metrics
pip install -e ".[dev]"
pytest tests/ -v
```
## Citation

If cl-metrics saved you from writing another NumPy script, please cite:

```bibtex
@software{swaminathan2026clmetrics,
  author    = {Swaminathan, Venkatesh},
  title     = {cl-metrics: Stateless Continual Learning Evaluation Metrics
               with SNN Energy-Aware Extensions},
  year      = {2026},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.19388144},
  url       = {https://doi.org/10.5281/zenodo.19388144},
  orcid     = {0000-0002-3315-7907}
}
```
## Author

**Venkatesh Swaminathan**
Founder, Nexus Learning Labs · Bengaluru, India
M.Sc. Data Science & AI, BITS Pilani
ORCID: 0000-0002-3315-7907
GitHub: @venky2099

Built because the community deserved a tool that just works.