reptimeline
Track how discrete representations evolve during neural network training.
reptimeline monitors lifecycle events in discrete representation systems: when concepts are "born" (first become distinguishable), when they "die" (collapse), when relationships form, and where phase transitions occur. It then discovers what each feature means, labels it, and tests causal effects.
Backend-agnostic: works with triadic bits, VQ-VAE codebooks, FSQ levels, sparse autoencoders, binary codes, or any discrete bottleneck.
Features
- Lifecycle tracking -- birth, death, and connection events for individual code elements across training
- Phase transition detection -- automatic discovery of training regime changes via metric discontinuities
- Bottom-up ontology discovery -- duals, dependencies, 3-way interactions, and hierarchical structure without pre-defined primitives
- Auto-labeling -- three strategies: embedding-based, contrastive, and LLM-based
- Causal verification -- intervention testing with bootstrap CIs, permutation p-values, and BH-FDR correction
- Theory reconciliation -- compare discovered structure against manually-defined domain primitives
- Visualizations -- static (matplotlib) and interactive (Plotly) swimlane, phase dashboard, churn heatmap, layer emergence, causal heatmap
- Export -- JSON round-trip (save_json/load_json), CSV export (events, curves, codes, stability)
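To make "phase transition detection via metric discontinuities" concrete, here is a minimal sketch, not reptimeline's actual detector. It assumes a single scalar metric recorded per checkpoint and flags a step when the jump from the previous value is much larger than the typical jump. The function name `detect_discontinuities` and the parameters `factor` and `min_jump` are hypothetical.

```python
import numpy as np

def detect_discontinuities(steps, values, factor=5.0, min_jump=1e-6):
    """Return the steps where a metric jumps far more than it usually does.

    A jump is flagged when |value[i+1] - value[i]| exceeds `factor` times the
    median absolute jump (with `min_jump` as an absolute floor).
    """
    values = np.asarray(values, dtype=float)
    deltas = np.abs(np.diff(values))
    if deltas.size == 0:
        return []
    threshold = max(factor * float(np.median(deltas)), min_jump)
    flagged = np.flatnonzero(deltas > threshold)
    # A jump between index i and i+1 is attributed to the later checkpoint.
    return [steps[i + 1] for i in flagged]

steps = [0, 100, 200, 300, 400, 500, 600, 700]
metric = [0.10, 0.11, 0.12, 0.13, 0.45, 0.46, 0.47, 0.48]
transitions = detect_discontinuities(steps, metric)
print(transitions)  # [400]
```

A median-based threshold is used here because a single large jump barely moves the median, so the jump itself does not hide the transition it creates.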
Tech Stack
| Component | Details |
|---|---|
| Language | Python 3.10 -- 3.13 |
| Core dependencies | numpy >= 1.24, matplotlib >= 3.7, tqdm >= 4.60 |
| Optional | torch >= 2.0 (extractors), plotly >= 5.0 (interactive plots) |
| Testing | pytest, pytest-cov, 224 tests |
| Linting | ruff (zero warnings), mypy (zero errors) |
| CI | GitHub Actions (tests + lint + typecheck + coverage) |
| Docs | pdoc, auto-deployed to GitHub Pages |
| License | BUSL-1.1 (converts to AGPL-3.0 on 2030-03-21) |
Installation
pip install reptimeline
From source (for development):
git clone https://github.com/arturoornelasb/reptimeline.git
cd reptimeline
pip install -e ".[dev]"
To run the examples (MNIST, Pythia SAE, causal experiments):
pip install -r requirements-examples.txt
Quick Start
1. Use a built-in extractor or implement your own
Three backends ship ready to use:
from reptimeline.extractors import SAEExtractor, VQVAEExtractor, FSQExtractor
# Sparse Autoencoder (top-k binarization, intervention support)
sae = SAEExtractor(n_features=32768, encode_fn=my_sae.encode,
                   decode_fn=my_sae.decode, feature_indices=selected)
# VQ-VAE (codebook index → binary indicator)
vqvae = VQVAEExtractor(n_codebook=512, encode_fn=my_vqvae.encode)
# FSQ (finite scalar quantization, nonzero or one-hot binarization)
fsq = FSQExtractor(n_levels=[3, 5, 3, 3], encode_fn=my_fsq.encode)
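The comments above mention three binarization schemes. The sketch below shows what each could look like in plain NumPy. It is an illustration under assumptions, not the code inside the built-in extractors: the helper names are hypothetical, and the FSQ example assumes quantized levels are centered so that 0 means "inactive".

```python
import numpy as np

def topk_binarize(activations: np.ndarray, k: int) -> np.ndarray:
    """Set the k largest activations to 1 and everything else to 0."""
    code = np.zeros(activations.shape[0], dtype=np.int64)
    if k > 0:
        top = np.argsort(activations)[-k:]  # indices of the k largest values
        code[top] = 1
    return code

def index_to_indicator(index: int, n_codebook: int) -> np.ndarray:
    """One-hot binary indicator for a single VQ-VAE codebook index."""
    code = np.zeros(n_codebook, dtype=np.int64)
    code[index] = 1
    return code

def fsq_nonzero(levels: np.ndarray) -> np.ndarray:
    """Mark an FSQ dimension as active when its quantized level is nonzero."""
    return (levels != 0).astype(np.int64)

sae_code = topk_binarize(np.array([0.1, 2.0, 0.0, 1.5, 0.3]), k=2)
vq_code = index_to_indicator(3, n_codebook=5)
fsq_code = fsq_nonzero(np.array([0, 2, -1, 0]))
print(sae_code.tolist())  # [0, 1, 0, 1, 0]
print(vq_code.tolist())   # [0, 0, 0, 1, 0]
print(fsq_code.tolist())  # [0, 1, 1, 0]
```

All three produce a fixed-length list of 0/1 values, which is the common shape the rest of the pipeline works with.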
Or implement RepresentationExtractor for any other discrete bottleneck:
from reptimeline.extractors.base import RepresentationExtractor
from reptimeline.core import ConceptSnapshot
class MyExtractor(RepresentationExtractor):
    def extract(self, checkpoint_path, concepts, device='cpu'):
        # model, get_discrete_code, and parse_step stand in for your own loading code
        codes = {}
        for concept in concepts:
            codes[concept] = get_discrete_code(model, concept)  # List[int]
        return ConceptSnapshot(step=parse_step(checkpoint_path), codes=codes)

    def similarity(self, code_a, code_b):
        ...  # Jaccard, Hamming, or domain-specific

    def shared_features(self, code_a, code_b):
        ...  # Indices where both codes are active
See examples/ for complete pipelines (MNIST binary AE, Pythia-70M SAE, triadic bits).
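If you are unsure what to put in similarity and shared_features, the following standalone functions are one possible starting point. They assume codes are equal-length lists of 0/1 integers. The function names are illustrative, and treating two all-zero codes as identical is a convention chosen here, not something the library prescribes.

```python
from typing import List

def jaccard_similarity(code_a: List[int], code_b: List[int]) -> float:
    """Size of the intersection over size of the union of active positions."""
    active_a = {i for i, bit in enumerate(code_a) if bit}
    active_b = {i for i, bit in enumerate(code_b) if bit}
    union = active_a | active_b
    if not union:
        return 1.0  # convention: two all-zero codes count as identical
    return len(active_a & active_b) / len(union)

def hamming_similarity(code_a: List[int], code_b: List[int]) -> float:
    """Fraction of positions where the two codes agree."""
    if len(code_a) != len(code_b):
        raise ValueError("codes must have the same length")
    if not code_a:
        return 1.0
    matches = sum(1 for a, b in zip(code_a, code_b) if a == b)
    return matches / len(code_a)

def shared_features(code_a: List[int], code_b: List[int]) -> List[int]:
    """Indices where both codes are active."""
    return [i for i, (a, b) in enumerate(zip(code_a, code_b)) if a and b]

a = [1, 0, 1, 1, 0]
b = [1, 1, 0, 1, 0]
print(jaccard_similarity(a, b))  # 0.5
print(hamming_similarity(a, b))  # 0.6
print(shared_features(a, b))     # [0, 3]
```

Jaccard ignores positions where both codes are inactive, which tends to suit sparse codes such as SAE features. Hamming counts shared zeros as agreement, which is more natural for dense binary codes.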
2. Analyze representation evolution
from reptimeline import TimelineTracker
extractor = MyExtractor()
snapshots = extractor.extract_sequence("checkpoints/", concepts)
tracker = TimelineTracker(extractor)
timeline = tracker.analyze(snapshots)
timeline.print_summary()
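For intuition about what birth and death events mean for a code element, here is a small sketch that works on plain Python data rather than on ConceptSnapshot objects. The definitions are assumptions made for illustration: an element is "born" at the first step where it is active for any concept, and "dies" at the first step after its last activation. TimelineTracker may define these events differently.

```python
from typing import Dict, List, Optional, Tuple

Snapshot = Tuple[int, Dict[str, List[int]]]  # (training step, {concept: binary code})

def element_lifecycle(
    snapshots: List[Snapshot], n_bits: int
) -> Dict[int, Tuple[Optional[int], Optional[int]]]:
    """Return {bit index: (birth_step, death_step)}; None means the event never occurred."""
    ordered = sorted(snapshots, key=lambda snap: snap[0])
    result: Dict[int, Tuple[Optional[int], Optional[int]]] = {}
    for bit in range(n_bits):
        active_steps = [
            step for step, codes in ordered
            if any(code[bit] for code in codes.values())
        ]
        if not active_steps:
            result[bit] = (None, None)  # never active, so never born
            continue
        birth = active_steps[0]
        # Steps recorded after the last activation; the first one marks the death.
        later = [step for step, _ in ordered if step > active_steps[-1]]
        death = later[0] if later else None
        result[bit] = (birth, death)
    return result

history = [
    (0,   {"cat": [0, 0, 1], "dog": [0, 0, 1]}),
    (100, {"cat": [1, 0, 1], "dog": [0, 0, 1]}),
    (200, {"cat": [1, 0, 0], "dog": [1, 0, 0]}),
]
lifecycle = element_lifecycle(history, n_bits=3)
print(lifecycle)  # {0: (100, None), 1: (None, None), 2: (0, 200)}
```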
3. Discover what each code element means
from reptimeline import BitDiscovery, AutoLabeler
discovery = BitDiscovery()
report = discovery.discover(snapshots[-1], timeline=timeline)
discovery.print_report(report)
# Auto-label with embeddings (no API needed)
labeler = AutoLabeler()
labels = labeler.label_by_embedding(report, embeddings)
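The "duals" that discovery reports are described in the results below as anti-correlated pairs. A minimal way to find such pairs in a binary code matrix is sketched here. This is not BitDiscovery's implementation: the function name `find_dual_pairs` and the correlation threshold of -0.8 are assumptions chosen for the example.

```python
import numpy as np

def find_dual_pairs(codes: np.ndarray, threshold: float = -0.8):
    """Find pairs of code elements whose activations are strongly anti-correlated.

    codes: array of shape (n_concepts, n_bits) with 0/1 entries.
    Returns a list of (bit_i, bit_j, pearson_r) with pearson_r <= threshold.
    """
    codes = codes.astype(float)
    # Constant columns have undefined correlation, so they are skipped.
    varying = np.flatnonzero(codes.std(axis=0) > 0)
    if varying.size < 2:
        return []
    corr = np.corrcoef(codes[:, varying], rowvar=False)
    pairs = []
    for a in range(len(varying)):
        for b in range(a + 1, len(varying)):
            if corr[a, b] <= threshold:
                pairs.append((int(varying[a]), int(varying[b]), float(corr[a, b])))
    return pairs

codes = np.array([
    [1, 0, 1, 1],
    [0, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 1, 1, 0],
    [1, 0, 1, 0],
])
duals = find_dual_pairs(codes)
print([(i, j) for i, j, _ in duals])  # [(0, 1)]
```

In the example, bit 1 is the exact complement of bit 0, bit 2 is always on and is therefore skipped, and bit 3 is only moderately correlated with the others.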
4. Test causal effects
from reptimeline import CausalVerifier
verifier = CausalVerifier(labels)
causal_report = verifier.verify(intervene_fn, concepts)
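The statistics behind causal verification (permutation p-values and BH-FDR correction) can be illustrated independently of the library. The sketch below is a generic NumPy version, not the contents of stats.py. The function names, the difference-of-means test statistic, and the add-one p-value correction are assumptions for the example.

```python
import numpy as np

def permutation_p_value(treated, control, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference of means."""
    rng = np.random.default_rng(seed)
    treated = np.asarray(treated, dtype=float)
    control = np.asarray(control, dtype=float)
    observed = abs(treated.mean() - control.mean())
    pooled = np.concatenate([treated, control])
    n_treated = treated.size
    hits = 0
    for _ in range(n_permutations):
        shuffled = rng.permutation(pooled)
        diff = abs(shuffled[:n_treated].mean() - shuffled[n_treated:].mean())
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)

def benjamini_hochberg(p_values, alpha=0.05):
    """Boolean mask of hypotheses rejected at false discovery rate alpha."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    below = p[order] <= thresholds
    rejected = np.zeros(m, dtype=bool)
    if below.any():
        cutoff = np.max(np.flatnonzero(below))  # largest rank that passes
        rejected[order[: cutoff + 1]] = True
    return rejected

# Effect of intervening on one feature versus leaving it untouched (made-up numbers).
p_strong = permutation_p_value([5.1, 4.9, 5.3, 5.0, 5.2, 4.8],
                               [0.1, -0.2, 0.0, 0.2, -0.1, 0.1])
p_none = permutation_p_value([1, 2, 3, 4], [1, 2, 3, 4])
mask = benjamini_hochberg([0.001, 0.008, 0.039, 0.041, 0.60])
print(p_strong < 0.05, p_none)
print(mask.tolist())  # [True, True, False, False, False]
```

BH-FDR matters here because one test is run per feature, and with many features some small p-values are expected by chance alone.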
5. Export results
# JSON round-trip
from reptimeline.core import Timeline
timeline.save_json("results/timeline.json")
restored = Timeline.load_json("results/timeline.json")
# CSV export (events, curves, codes, stability)
timeline.to_csv("results/csv/")
6. Interactive plots (requires plotly)
from reptimeline.viz.interactive import plot_phase_dashboard_interactive
fig = plot_phase_dashboard_interactive(timeline, save_html="dashboard.html")
7. CLI
reptimeline --snapshots data.json --discover --plot
reptimeline --snapshots data.json --overlay primitivos.json --output result.json
reptimeline --snapshots data.json --causal effects.json --plot-dir plots/
Architecture
Your model checkpoints
|
v
RepresentationExtractor (SAE, VQ-VAE, FSQ built-in, or your own)
| ConceptSnapshot objects
v
TimelineTracker (births, deaths, connections, phase transitions)
|
v
BitDiscovery (duals, dependencies, 3-way interactions, hierarchy)
|
v
AutoLabeler (embedding / contrastive / LLM labeling)
|
v
CausalVerifier (intervention effects + statistical testing)
|
v
Reconciler (compare discovered vs. expected structure)
|
v
Visualizations (swimlane, phase dashboard, churn, causal heatmap)
Validated Results
MNIST Binary Autoencoder (32-bit)
| Metric | Value |
|---|---|
| Decoder determinism | 100% (32-bit code fully determines output; n=100 swaps) |
| Dual pairs discovered | 9 anti-correlated |
| Phase transitions | 3 detected automatically |
| Lifecycle tracking | 6 epochs, 10 digit classes |
Pythia-70M Sparse Autoencoder (32K features)
| Metric | Value |
|---|---|
| Causal selectivity (KL) | 8 features with finite selectivity (1.96x--98.4x, mean 26.8x L2); 8 with zero cross-activation |
| Dual pairs discovered | 34 anti-correlated |
| Lifecycle tracking | 12 checkpoints (step 1 to 143K) |
Causal intervention on Pythia-70M SAE features. Yellow = no effect; dark red = strong effect.
Limitations
- Prediction experiments did not improve over baseline. Using discovered SAE features for next-token prediction changed accuracy by -0.13% (embedding-based) and -4.20% (MLP-based) relative to baseline. Features are individually meaningful but do not yet translate to prediction improvements.
- Sentinel features. 8 of 16 tested SAE features showed zero cross-activation, which may reflect SAE sparsity rather than proven causal selectivity. These are reported separately.
- Statistical corrections. Discovery includes Bonferroni and BH-FDR correction. Use null_baseline() to estimate false positive rates for your data dimensions.
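To see why a null baseline matters, the sketch below estimates how many "dual" pairs appear by chance in purely random binary codes of a given shape. It is a standalone illustration and does not call or reproduce null_baseline(), whose signature is not shown here. The function name `null_pair_rate`, the -0.8 threshold, and the uniform random codes are assumptions.

```python
import numpy as np

def null_pair_rate(n_concepts, n_bits, threshold=-0.8, n_trials=100, seed=0):
    """Average number of bit pairs with correlation <= threshold in random codes."""
    rng = np.random.default_rng(seed)
    counts = []
    for _ in range(n_trials):
        codes = rng.integers(0, 2, size=(n_concepts, n_bits)).astype(float)
        keep = codes.std(axis=0) > 0  # drop constant columns (undefined correlation)
        if keep.sum() < 2:
            counts.append(0)
            continue
        corr = np.corrcoef(codes[:, keep], rowvar=False)
        upper = corr[np.triu_indices_from(corr, k=1)]  # each pair counted once
        counts.append(int((upper <= threshold).sum()))
    return float(np.mean(counts))

few_concepts = null_pair_rate(n_concepts=10, n_bits=32)
many_concepts = null_pair_rate(n_concepts=1000, n_bits=32, n_trials=20)
print(few_concepts, many_concepts)
```

With only a handful of concepts, random codes produce strongly anti-correlated pairs fairly often, so discovered pairs should be compared against a baseline of this kind before being interpreted.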
Project Structure
reptimeline/
  __init__.py              # Public API
  __main__.py              # python -m reptimeline
  core.py                  # ConceptSnapshot, Timeline, lifecycle events
  tracker.py               # TimelineTracker
  discovery.py             # BitDiscovery: bottom-up ontology
  autolabel.py             # AutoLabeler: 3 labeling strategies
  reconcile.py             # Reconciler: discovered vs. theory
  causal.py                # CausalVerifier: intervention testing
  exceptions.py            # Domain-specific exception hierarchy
  stats.py                 # Bootstrap, permutation tests, BH-FDR
  cli.py                   # Command-line interface
  extractors/
    base.py                # RepresentationExtractor ABC
    sae.py                 # Sparse autoencoder extractor
    vqvae.py               # VQ-VAE extractor
    fsq.py                 # FSQ extractor
  overlays/
    primitive_overlay.py   # Domain-specific primitive overlay
  viz/
    swimlane.py            # Concept activation swimlane
    phase_dashboard.py     # Metric trends + phase transitions
    churn_heatmap.py       # Per-concept code churn
    layer_emergence.py     # Layer stabilization order (dynamic colors)
    causal_heatmap.py      # Causal intervention effects
    interactive.py         # Plotly interactive versions (optional)
tests/                     # 18 test modules, 224 tests (pytest)
examples/                  # Reference pipelines and extractors
results/                   # Pre-computed results (MNIST, Pythia-70M)
Development
# Install with dev deps + pre-commit hooks
pip install -e ".[dev]"
pre-commit install
# Run tests with coverage
pytest tests/ -v --cov=reptimeline
# Lint + type check
ruff check reptimeline/ tests/
mypy reptimeline/
CI runs lint, typecheck, and tests on every push and PR (Python 3.10 -- 3.13).
See CONTRIBUTING.md for contribution guidelines.
License
Business Source License 1.1 (BUSL-1.1)
- Free for research, education, evaluation, development, and personal use
- Commercial production use requires a license -- contact arturoornelas62@gmail.com
- Converts to AGPL-3.0 on 2030-03-21
- All dependencies are commercially compatible (BSD, MIT, Apache-2.0, MPL-2.0 -- zero copyleft)
Citation
@software{ornelas2026reptimeline,
  author = {Ornelas Brand, J. Arturo},
  title = {reptimeline: Tracking Discrete Representation Evolution During Training},
  year = {2026},
  url = {https://github.com/arturoornelasb/reptimeline}
}
Origin
Extracted from triadic-microgpt. Paper: "Prime Factorization as a Neurosymbolic Bridge" (Ornelas Brand, J.A., 2026).
Coming from triadic-microgpt? See the migration guide.