Lightweight model registry with experiment tracking and dynamic Excel export
⚡ LightML
Lightweight experiment tracking for LLM evaluation.
Three days into your experiment sprint: models scattered across five directories, evaluation results in a notebook you can't find, and that one promising checkpoint you forgot to save. Sound familiar? LightML is a zero-config experiment tracker that turns that mess into structured, searchable, exportable knowledge -- in four lines of Python.
pip install light-ml-registry
lightml init --path ./my_registry --name main
Table of Contents
- Why LightML
- Installation
- Quick Start (5 minutes)
- Core Concepts
- Python API Reference
- CLI Reference
- Dashboard (GUI)
- Excel Export
- Walkthrough: lm_eval pipeline
- Database Schema
- Project Structure
Why LightML
| Feature | LightML | MLflow | W&B |
|---|---|---|---|
| Setup | pip install light-ml-registry | Server + DB | Cloud signup |
| Storage | Single SQLite file | Postgres/MySQL | Cloud |
| Dependencies | 4 packages | 20+ packages | API key required |
| Dashboard | Built-in (lightml gui) | Separate server | Web app |
| Excel export | Built-in | No | No |
| Offline | ✅ | Partially | ❌ |
LightML is ideal when you need structured experiment tracking without the infrastructure.
Installation
From PyPI (recommended)
pip install light-ml-registry
From source (development)
git clone <repo-url> && cd LightML
pip install -e ".[dev]"
Dependencies (auto-installed):
- pydantic: schema validation
- fastapi + uvicorn: dashboard server
- openpyxl: Excel export
For the lm_eval example you also need:
pip install lm-eval pyyaml
Quick Start (5 minutes)
1. Create a registry
lightml init --path ./my_registry --name main
This creates ./my_registry/main.db with all required tables.
2. Register a model and log metrics
from lightml.handle import LightMLHandle
# Connect to registry and create an experiment run
handle = LightMLHandle(db="./my_registry/main.db", run_name="gpt2-eval")
# Register the model
handle.register_model(
model_name="gpt2-eval",
path="openai-community/gpt2",
)
# Log metrics — family groups related metrics together
handle.log_model_metric(
model_name="gpt2-eval",
family="hellaswag_0shot",
metric_name="hellaswag_acc",
value=0.289,
)
handle.log_model_metric(
model_name="gpt2-eval",
family="hellaswag_0shot",
metric_name="hellaswag_acc_norm",
value=0.312,
)
3. View results
lightml gui --db ./my_registry/main.db --port 5050
Open http://localhost:5050 in your browser.
4. Export to Excel
lightml export --db ./my_registry/main.db --output report.xlsx
Generates one sheet per metric family with automatic color-scale formatting.
Core Concepts
LightML organizes data around four entities:
Run (experiment)
└── Model
├── Metrics (family / metric_name / value)
└── Checkpoint (step N)
└── Metrics
Run
An experiment context. Every model belongs to a run. Runs are created automatically when you instantiate LightMLHandle.
Model
A trained model registered under a run. Supports parent-child lineage to track fine-tuning chains (e.g., base → SFT → DPO).
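A lineage chain can be recovered by following parent links upward. The sketch below is not LightML's own code: it assumes a `model` table with `id`, `model_name`, and `parent_id` columns (as listed under Database Schema) and walks it with a recursive SQLite query. The `lineage` helper is a hypothetical name, and the sketch ignores run scoping for brevity.

```python
import sqlite3


def lineage(conn, model_name):
    """Return model names from the root ancestor down to `model_name`."""
    rows = conn.execute(
        """
        WITH RECURSIVE chain(id, model_name, parent_id, depth) AS (
            SELECT id, model_name, parent_id, 0
            FROM model WHERE model_name = ?
            UNION ALL
            SELECT m.id, m.model_name, m.parent_id, c.depth + 1
            FROM model AS m JOIN chain AS c ON m.id = c.parent_id
        )
        SELECT model_name FROM chain ORDER BY depth DESC
        """,
        (model_name,),
    ).fetchall()
    return [name for (name,) in rows]


# Toy data: base -> SFT -> DPO
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE model (id INTEGER PRIMARY KEY, model_name TEXT NOT NULL, "
    "parent_id INTEGER REFERENCES model(id))"
)
conn.executemany(
    "INSERT INTO model (id, model_name, parent_id) VALUES (?, ?, ?)",
    [(1, "llama-base", None), (2, "llama-sft", 1), (3, "llama-dpo", 2)],
)
chain = lineage(conn, "llama-dpo")
print(" -> ".join(chain))  # llama-base -> llama-sft -> llama-dpo
```

Because the schema only guarantees that a model name is unique within a run, a real query would likely also filter on `run_id`.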
Checkpoint
An intermediate training snapshot linked to a model. Identified by step number.
Metrics
Numeric values attached to either a model or a checkpoint. Organized by family (a logical group like "hellaswag_0shot") and metric_name (like "hellaswag_acc").
Python API Reference
LightMLHandle
The main entry point. All operations go through this handle.
from lightml.handle import LightMLHandle
handle = LightMLHandle(db="path/to/registry.db", run_name="my-experiment")
register_model(model_name, path, parent_name=None)
Register a model in the current run. Idempotent — calling twice with the same name is safe.
handle.register_model(
model_name="llama-sft",
path="/models/llama-3-8b-sft",
parent_name="llama-base", # optional: link to parent model
)
register_checkpoint(model_name, step, path)
Register a training checkpoint.
ckpt_id = handle.register_checkpoint(
model_name="llama-sft",
step=5000,
path="/checkpoints/llama-sft/step-5000",
)
log_model_metric(model_name, family, metric_name, value, force=False)
Log a metric on a model. Returns a status code.
from lightml.metrics import METRIC_INSERTED, METRIC_UPDATED, METRIC_SKIPPED
rc = handle.log_model_metric(
model_name="llama-sft",
family="mmlu_5shot",
metric_name="mmlu_acc",
value=0.634,
force=False, # True = overwrite if exists
)
if rc == METRIC_INSERTED: print("New metric logged")
if rc == METRIC_SKIPPED: print("Already existed, skipped")
if rc == METRIC_UPDATED: print("Overwritten (force=True)")
log_checkpoint_metric(checkpoint_id, family, metric_name, value, force=False)
Same as above, but attached to a checkpoint instead of a model.
handle.log_checkpoint_metric(
checkpoint_id=ckpt_id,
family="hellaswag_0shot",
metric_name="hellaswag_acc_norm",
value=0.412,
)
Bulk Metric Logging
Instead of calling log_model_metric() once per metric, use log_metrics() to log an entire evaluation result in one call:
# Nested dict: {family: {metric_name: value}}
counts = handle.log_metrics("llama-sft", {
"ENG 5-shot": {"MMLU": 56.2, "ARC": 48.7, "HellaSwag": 71.9},
"ITA 0-shot": {"MMLU": 52.8, "HellaSwag": 62.1},
})
print(counts) # {"inserted": 5, "updated": 0, "skipped": 0}
For a single family, use the flat variant:
counts = handle.log_metrics_flat("llama-sft", {
"MMLU": 56.2,
"ARC": 48.7,
}, family="ENG 5-shot")
Both methods support force=True to overwrite existing metrics, and return a summary dict with insert/update/skip counts.
Compare Models
Compare two models side-by-side to see per-metric deltas:
from lightml.compare import compare_models
result = compare_models(
db="./registry/main.db",
model_a="llama-base", # baseline
model_b="llama-sft", # candidate
run_name="my-experiment", # optional filter
family="ENG 5-shot", # optional filter
)
# Convenience properties
print(f"Improved: {len(result.improved)}")
print(f"Regressed: {len(result.regressed)}")
print(f"Unchanged: {len(result.unchanged)}")
print(f"Missing: {len(result.missing)}")
# Pretty terminal output (color-coded)
print(result.to_text())
# JSON-serializable dict (for APIs)
data = result.to_dict()
Each delta contains family, metric_name, value_a, value_b, delta (B−A), and pct_change.
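As a worked example of those two fields, the sketch below computes them by hand. It assumes pct_change is measured relative to model A's value, which agrees with the sample `lightml compare` output in the CLI Reference (52.10 to 56.20 gives +4.10 and +7.9%). The `metric_delta` helper is hypothetical, and returning `None` for a zero baseline is this sketch's choice, not documented LightML behavior.

```python
def metric_delta(value_a, value_b):
    """Return (delta, pct_change) for a baseline A and a candidate B."""
    delta = value_b - value_a
    # A percentage is undefined when the baseline is zero.
    pct_change = (delta / value_a * 100.0) if value_a else None
    return delta, pct_change


delta, pct = metric_delta(52.10, 56.20)
print(f"{delta:+.2f} {pct:+.1f}%")  # +4.10 +7.9%
```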
Auto-import (Scan)
Bulk-import eval results from a directory tree without writing any Python:
from lightml.scan import scan_and_import
stats = scan_and_import(
db="./registry/main.db",
run_name="lm-eval-run",
path="./eval_results", # each subfolder = one model
format="lm_eval", # or "json"
model_prefix="eval/", # optional prefix
force=False, # True = overwrite duplicates
)
print(f"Models: {stats.models_registered}")
print(f"Metrics: {stats.metrics_logged}")
print(f"Skipped: {stats.skipped_dirs}")
Directory layout expected:
eval_results/
├── model-alpha/
│ └── results_2026-01-15T10-30-00.json # lm_eval format
├── model-beta/
│ └── results_2026-01-16T09-00-00.json
└── model-gamma/
  └── metrics.json # generic JSON format
Supported formats:
| Format | File pattern | Structure |
|---|---|---|
| lm_eval | results_*.json | {"results": {"task": {"metric": value}}} |
| json | metrics*.json / *.json | {"metric": value} or {"family": {"metric": value}} |
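To make the lm_eval structure concrete, here is a minimal sketch that flattens such a payload into (task, metric, value) triples. It is an illustration, not the parser LightML uses: the `flatten_lm_eval` name is hypothetical, and treating the task name as the grouping key and skipping non-numeric entries are assumptions of this sketch.

```python
import json


def flatten_lm_eval(payload):
    """Yield (task, metric_name, value) for numeric entries under "results"."""
    for task, metrics in payload.get("results", {}).items():
        for name, value in metrics.items():
            # Keep numbers only; bool is excluded because it subclasses int.
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                yield task, name, float(value)


raw = (
    '{"results": {"hellaswag": '
    '{"acc": 0.289, "acc_norm": 0.312, "alias": "hellaswag"}}}'
)
triples = list(flatten_lm_eval(json.loads(raw)))
print(triples)
# [('hellaswag', 'acc', 0.289), ('hellaswag', 'acc_norm', 0.312)]
```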
Metric Deduplication
LightML prevents accidental duplicate metrics:
| Scenario | force=False (default) | force=True |
|---|---|---|
| Metric does not exist | INSERT → METRIC_INSERTED | INSERT → METRIC_INSERTED |
| Metric already exists | SKIP → METRIC_SKIPPED | UPDATE → METRIC_UPDATED |
This means you can safely re-run evaluation scripts without polluting your database.
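The decision table above can be sketched against a bare SQLite table. This is a simplified illustration, not LightML's implementation: the status strings and the `log_metric` function are hypothetical, and a metric is assumed to be identified by (model_id, family, metric_name).

```python
import sqlite3

# Hypothetical status values for this sketch; LightML exposes its own
# METRIC_INSERTED / METRIC_UPDATED / METRIC_SKIPPED constants.
INSERTED, UPDATED, SKIPPED = "inserted", "updated", "skipped"


def log_metric(conn, model_id, family, metric_name, value, force=False):
    """Insert a metric, or skip/overwrite it when one already exists."""
    row = conn.execute(
        "SELECT id FROM metrics "
        "WHERE model_id = ? AND family = ? AND metric_name = ?",
        (model_id, family, metric_name),
    ).fetchone()
    if row is None:
        conn.execute(
            "INSERT INTO metrics (model_id, family, metric_name, value) "
            "VALUES (?, ?, ?, ?)",
            (model_id, family, metric_name, value),
        )
        return INSERTED
    if not force:
        return SKIPPED  # keep the stored value untouched
    conn.execute("UPDATE metrics SET value = ? WHERE id = ?", (value, row[0]))
    return UPDATED


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE metrics (id INTEGER PRIMARY KEY, model_id INTEGER, "
    "family TEXT NOT NULL, metric_name TEXT NOT NULL, value REAL NOT NULL)"
)
statuses = [
    log_metric(conn, 1, "mmlu_5shot", "mmlu_acc", 0.634),
    log_metric(conn, 1, "mmlu_5shot", "mmlu_acc", 0.700),
    log_metric(conn, 1, "mmlu_5shot", "mmlu_acc", 0.700, force=True),
]
stored = conn.execute("SELECT value FROM metrics").fetchall()
print(statuses)  # ['inserted', 'skipped', 'updated']
print(stored)    # [(0.7,)]
```

Re-running the same script therefore leaves a single row per metric, which is the property the table describes.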
CLI Reference
lightml <command> [options]
init — Create a new registry
lightml init --path ./registry --name main [--overwrite]
model-register — Register a model
lightml model-register \
--db ./registry/main.db \
--run my-experiment \
--name llama-sft \
--path /models/llama-sft \
--parent llama-base # optional
checkpoint-register — Register a checkpoint
lightml checkpoint-register \
--db ./registry/main.db \
--run my-experiment \
--model llama-sft \
--step 5000 \
--path /checkpoints/step-5000
metric-log — Log a single metric
lightml metric-log \
--db ./registry/main.db \
--run my-experiment \
--model llama-sft \
--family mmlu_5shot \
--metric mmlu_acc \
--value 0.634 \
--force # optional: overwrite
export — Export Excel report
lightml export --db ./registry/main.db [--output report.xlsx]
scan — Auto-import eval results
Scan a directory tree and bulk-import models + metrics:
# --format also accepts "json"; --prefix and --force are optional
lightml scan \
--db ./registry/main.db \
--run lm-eval-run \
--path ./eval_results \
--format lm_eval \
--prefix "eval/" \
--force
Each immediate subdirectory of --path is treated as one model.
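The "one subdirectory, one model" rule can be sketched with `pathlib`. This is an assumption-laden illustration rather than the scanner's actual logic: the `find_result_files` helper is hypothetical, and picking the lexicographically last `results_*.json` (the newest, given the timestamped names shown earlier) is this sketch's choice.

```python
import tempfile
from pathlib import Path


def find_result_files(root, pattern="results_*.json"):
    """Map each immediate subdirectory name to its last matching file."""
    found = {}
    for sub in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        matches = sorted(sub.glob(pattern))
        if matches:
            # Timestamped names sort chronologically, so the last is newest.
            found[sub.name] = matches[-1]
    return found


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "model-alpha").mkdir()
    (root / "model-alpha" / "results_2026-01-14T08-00-00.json").write_text("{}")
    (root / "model-alpha" / "results_2026-01-15T10-30-00.json").write_text("{}")
    (root / "model-beta").mkdir()  # no result file, so it is not reported
    found = {name: path.name for name, path in find_result_files(root).items()}

print(found)  # {'model-alpha': 'results_2026-01-15T10-30-00.json'}
```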
compare — Compare two models
Print a side-by-side metric delta table:
# --run and --family are optional filters
lightml compare \
--db ./registry/main.db \
--model-a llama-base \
--model-b llama-sft \
--run my-experiment \
--family "ENG 5-shot"
Output:
Compare: llama-base vs llama-sft
Run: my-experiment
──────────────────────────────────────────────────────────────────────────
Family Metric A B Δ %
──────────────────────────────────────────────────────────────────────────
ENG 5-shot MMLU 52.10 56.20 +4.10 +7.9%
ENG 5-shot ARC 44.30 48.70 +4.40 +9.9%
ENG 5-shot HellaSwag 69.50 71.90 +2.40 +3.5%
──────────────────────────────────────────────────────────────────────────
✅ 3 improved ❌ 0 regressed ➖ 0 unchanged ❓ 0 missing
gui — Launch dashboard
lightml gui --db ./registry/main.db [--port 5050] [--host 0.0.0.0]
Dashboard (GUI)
LightML ships with an interactive web dashboard — no external tools needed.
lightml gui --db ./registry/main.db
Table View
Pivoted metrics table with:
- Family tabs — one tab per metric family, plus "All Families" (properly scoped — same metric name across different families shows distinct values)
- Sorting — click any column header
- Search — filter models by name
- Color coding — best values highlighted in green, worst in red
- Checkpoints toggle — show/hide checkpoint rows
- Run filter — dropdown to isolate a specific run
- Model selection — checkbox column for selecting models
Graph View
D3.js force-directed graph showing model lineage:
- Nodes = models, colored by run
- Edges = parent → child relationships
- Checkpoints hidden by default — toggle "Show checkpoints" in the control bar to reveal them
- Hover = tooltip with green/red dots showing which benchmarks have been evaluated
- Search — filter nodes by name, path, or run
- Drag & zoom — fully interactive
Model Selection & Compare
Select models from either view and compare them side-by-side:
- Select: click checkboxes in the table, or click nodes in the graph — selections sync across both views
- Selection bar: appears at the top showing count and actions
- Filter table: click "Filter table" to show only selected models
- Compare: select exactly 2 models, click "Compare" → a modal shows per-metric deltas with color-coded improvements (green) and regressions (red)
- Clear: reset selection in both views
Excel Export
Click ⬇ Excel in the header to download a formatted .xlsx report directly from the dashboard.
Excel Export
The export engine creates professional Excel reports from the database:
- One sheet per metric family — keeps related metrics grouped
- Automatic color scales — red → yellow → green formatting on all metric columns
- Frozen headers — first row + model name column stay visible while scrolling
- Models (Phase F) and Checkpoints (Phase S) on the same sheet
from pathlib import Path
from lightml.export import export_excel
export_excel(
db_path=Path("./registry/main.db"),
output_path=Path("./report.xlsx"),
)
Or via CLI:
lightml export --db ./registry/main.db --output report.xlsx
Walkthrough: lm_eval Pipeline
This walkthrough shows how to use LightML with lm-evaluation-harness to evaluate an LLM and track results. The complete example is in examples/lm_eval/.
Step 1 — Configure
Edit examples/lm_eval/config.yaml:
# ── LightML settings ──────────────────────────────
db: ./my_registry/main.db
run_name: llama-3-eval
# ── Model to evaluate ────────────────────────────
model_path: meta-llama/Llama-3-8B
# ── Evaluation matrix ────────────────────────────
lang: [eng]
benchmarks: [hellaswag, mmlu]
shots: [0, 5]
num_gpus: 1
Every field is explained inline. The key LightML fields are db (path to registry) and run_name (experiment name).
Step 2 — Run evaluation
cd examples/lm_eval
python run_eval.py
The script does three things:
- Connects to LightML and registers the model (2 lines of setup)
- Runs lm_eval for each (benchmark × language × shots) combination
- Logs every metric to the registry with handle.log_model_metric()
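The evaluation matrix from the config expands into one combination per (language, benchmark, shots) triple. The sketch below enumerates it with `itertools.product`; the `{lang}_{benchmark}_{shots}shot` family naming is inferred from the `eng_hellaswag_0shot` example in this walkthrough and may differ from what `run_eval.py` actually does.

```python
from itertools import product

# Same evaluation matrix as the config.yaml shown in Step 1.
cfg = {"lang": ["eng"], "benchmarks": ["hellaswag", "mmlu"], "shots": [0, 5]}

families = [
    f"{lang}_{bench}_{shots}shot"
    for lang, bench, shots in product(cfg["lang"], cfg["benchmarks"], cfg["shots"])
]
print(families)
# ['eng_hellaswag_0shot', 'eng_hellaswag_5shot', 'eng_mmlu_0shot', 'eng_mmlu_5shot']
```

With this config that is four lm_eval invocations, each producing the metrics for one family.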
Here's the core LightML integration — it's just 4 API calls:
from lightml.handle import LightMLHandle
# Setup — 2 lines
handle = LightMLHandle(db=cfg["db"], run_name=cfg["run_name"])
handle.register_model(model_name=cfg["run_name"], path=cfg["model_path"])
# After each benchmark completes — 1 call per metric
handle.log_model_metric(
model_name=handle.run_name,
family="eng_hellaswag_0shot",
metric_name="hellaswag_acc",
value=0.452,
)
Step 3 — Explore in dashboard
lightml gui --db ./my_registry/main.db
Step 4 — Export report
Click ⬇ Excel in the dashboard header, or:
lightml export --db ./my_registry/main.db
Database Schema
LightML uses a single SQLite file with 5 tables:
-- Experiment container
CREATE TABLE run (
id INTEGER PRIMARY KEY AUTOINCREMENT,
run_name TEXT UNIQUE NOT NULL,
description TEXT,
metadata TEXT -- JSON blob
);
-- Trained model, scoped to a run
CREATE TABLE model (
id INTEGER PRIMARY KEY AUTOINCREMENT,
model_name TEXT NOT NULL,
path TEXT,
parent_id INTEGER REFERENCES model(id),
run_id INTEGER NOT NULL REFERENCES run(id),
UNIQUE(model_name, run_id)
);
-- Training checkpoint, linked to a model
CREATE TABLE checkpoint (
id INTEGER PRIMARY KEY AUTOINCREMENT,
model_id INTEGER NOT NULL REFERENCES model(id),
step INTEGER NOT NULL,
path TEXT,
created_at TEXT DEFAULT (datetime('now'))
);
-- Metric value, linked to a model OR a checkpoint
CREATE TABLE metrics (
id INTEGER PRIMARY KEY AUTOINCREMENT,
model_id INTEGER REFERENCES model(id),
checkpoint_id INTEGER REFERENCES checkpoint(id),
family TEXT NOT NULL,
metric_name TEXT NOT NULL,
value REAL NOT NULL
);
-- Optional: restrict allowed metrics
CREATE TABLE registry_schema (
id INTEGER PRIMARY KEY AUTOINCREMENT,
family TEXT NOT NULL,
metric_name TEXT NOT NULL
);
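Since the registry is a plain SQLite file, it can also be queried directly. The sketch below rebuilds a reduced copy of the run, model, and metrics tables in memory (columns not needed here are omitted) and lists every model-level metric of one family within one run. The sample rows are illustrative values, not real results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE run (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    run_name TEXT UNIQUE NOT NULL
);
CREATE TABLE model (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    model_name TEXT NOT NULL,
    run_id INTEGER NOT NULL REFERENCES run(id),
    UNIQUE(model_name, run_id)
);
CREATE TABLE metrics (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    model_id INTEGER REFERENCES model(id),
    checkpoint_id INTEGER,
    family TEXT NOT NULL,
    metric_name TEXT NOT NULL,
    value REAL NOT NULL
);
INSERT INTO run (run_name) VALUES ('my-experiment');
INSERT INTO model (model_name, run_id) VALUES ('llama-base', 1), ('llama-sft', 1);
INSERT INTO metrics (model_id, family, metric_name, value) VALUES
    (1, 'ENG 5-shot', 'MMLU', 52.1),
    (2, 'ENG 5-shot', 'MMLU', 56.2),
    (2, 'ENG 5-shot', 'ARC', 48.7);
""")

# Join metrics to their model and run, then filter by run and family.
rows = conn.execute(
    """
    SELECT m.model_name, x.metric_name, x.value
    FROM metrics AS x
    JOIN model AS m ON m.id = x.model_id
    JOIN run AS r ON r.id = m.run_id
    WHERE r.run_name = ? AND x.family = ?
    ORDER BY m.model_name, x.metric_name
    """,
    ("my-experiment", "ENG 5-shot"),
).fetchall()

for model_name, metric_name, value in rows:
    print(f"{model_name:12s} {metric_name:6s} {value:.1f}")
```

Checkpoint-level metrics would be reached the same way through `checkpoint_id` instead of `model_id`.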
Project Structure
LightML/
├── pyproject.toml # Package config, CLI entry point
├── README.md # This file
│
├── lightml/ # Library source
│ ├── __init__.py
│ ├── handle.py # LightMLHandle — main API (incl. bulk log_metrics)
│ ├── registry.py # Run & model registration logic
│ ├── checkpoints.py # Checkpoint registration
│ ├── metrics.py # Metric logging + deduplication
│ ├── database.py # SQLite schema initialization
│ ├── export.py # Excel export engine
│ ├── compare.py # Model comparison (Pydantic models + compare_models)
│ ├── scan.py # Auto-import from eval result directories
│ ├── gui.py # FastAPI dashboard server + /api/compare
│ ├── cli.py # CLI entry point (lightml command)
│ ├── models/ # Pydantic schemas
│ ├── templates/
│ │ └── dashboard.html # Single-file SPA dashboard
│ └── tests/
│ ├── test_bugfix.py # Core regression tests (41 tests)
│ ├── test_compare.py # Compare feature tests (15 tests)
│ ├── test_scan.py # Scan / auto-import tests (17 tests)
│ ├── test_bulk.py # Bulk metric API tests (15 tests)
│ └── conftest.py # Shared fixtures
│
├── examples/
│ └── lm_eval/ # End-to-end evaluation example
│ ├── run_eval.py # lm_eval + LightML pipeline
│ └── config.yaml # Example configuration
│
└── docs/
└── gifs/ # GIF recordings for README