Live Model Memory Inspector: a visual debugger for Transformers, SSMs (Mamba), RWKV, and RNNs
🔬 MEMOSCOPE: Live Model Memory Inspector
Real-time visual debugging for Transformers, SSMs (Mamba), RWKV, and RNNs. Watch your model's memory live (drift, decay, collapse), all in one dashboard.
MEMOSCOPE · Live Model Memory Inspector · v0.1.0
What is MEMOSCOPE?
When your LLM starts forgetting earlier tokens, produces incoherent long outputs, or silently collapses its context representation, you won't see it in loss curves.
MEMOSCOPE gives you a live window into the model's memory:
| What you see | What it means |
|---|---|
| Hidden State Drift | How much the internal representation changed after each new token |
| Token Retention Heatmap | Which early tokens the model is still "attending to" |
| Memory Decay Curve | Exponential fall-off of early-token attention mass |
| Context Collapse Score | Composite of attention entropy and layer-norm variation; a high score suggests the model has run out of useful context |
Quickstart (zero configuration)
# Install
pip install memoscope
# Run the demo (opens the browser automatically, no GPU needed)
memoscope
# Try different architectures
memoscope --model mamba # State Space Model
memoscope --model rnn # LSTM
Or from source:
git clone https://github.com/your-org/memoscope
cd memoscope
pip install -e .
python app.py
One-liner API:
from memoscope import inspect_memory
inspect_memory() # that's it: browser opens, data flows
Architecture
┌──────────────────────────────────────────────────────────────┐
│                          YOUR MODEL                          │
│                                                              │
│  token_1 ──▶ [Layer 0] ──▶ [Layer 1] ──▶ ... ──▶ logits      │
│                  │             │                             │
│                hook_0        hook_1    ← zero-overhead       │
│                  │             │         forward hooks       │
└──────────────────┼─────────────┼─────────────────────────────┘
                   │             │
                   ▼             ▼
           ┌─────────────────────────────┐
           │       MemoryInspector       │
           │                             │
           │  • cosine drift D(t)        │
           │  • layer norms ‖h_t‖₂       │
           │  • attention entropy H      │
           │  • token retention          │
           │  • collapse score           │
           └──────────────┬──────────────┘
                          │ asyncio.Queue
           ┌──────────────▼──────────────┐
           │     FastAPI + WebSocket     │
           │      server (uvicorn)       │
           │                             │
           │  GET /         → SPA        │
           │  GET /history  → replay     │
           │  WS  /ws       → live       │
           └──────────────┬──────────────┘
                          │ JSON frames
           ┌──────────────▼──────────────┐
           │      Browser Dashboard      │
           │                             │
           │  Chart.js live charts       │
           │  CSS grid heatmap           │
           │  TailwindCSS dark UI        │
           └─────────────────────────────┘
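The diagram shows metrics travelling from the inspector to the server over an `asyncio.Queue`. A minimal sketch of that producer/consumer hand-off, using only the standard library, might look like the following. The function names, the `None` sentinel, and the placeholder metric values are assumptions made for illustration, and the list append stands in for a WebSocket send; this is not memoscope's actual server code.

```python
import asyncio
import json
from typing import List, Optional


async def producer(queue: "asyncio.Queue[Optional[dict]]", steps: int) -> None:
    """Stand-in for the inspector: push one metrics dict per step."""
    for step in range(steps):
        await queue.put({"step": step, "mean_drift": 0.1})  # placeholder metrics
    await queue.put(None)  # sentinel: no more snapshots (an assumption)


async def broadcaster(queue: "asyncio.Queue[Optional[dict]]", sent: List[str]) -> None:
    """Stand-in for the server: drain the queue and emit JSON frames."""
    while True:
        snapshot = await queue.get()
        if snapshot is None:
            break
        sent.append(json.dumps(snapshot))  # a real server would send this over the socket


async def main() -> List[str]:
    queue: "asyncio.Queue[Optional[dict]]" = asyncio.Queue()
    sent: List[str] = []
    await asyncio.gather(producer(queue, 3), broadcaster(queue, sent))
    return sent


frames = asyncio.run(main())
print(frames)
```

Both coroutines run on one event loop here. If the model runs in a separate thread, a thread-safe hand-off would be needed instead, since `asyncio.Queue` is not thread-safe.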
Metrics: The Math
1. Hidden State Drift
How much does the model's internal representation change after each new token?
$$D(t) = 1 - \frac{h_t \cdot h_{t-1}}{|h_t| \cdot |h_{t-1}|}$$
- D(t) = 0 → state identical to previous step (stagnation)
- D(t) = 1 → fully orthogonal (healthy exploration)
- D(t) = 2 → state has reversed (instability / collapse)
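As a quick check of the three reference values, here is a minimal pure-Python sketch of the drift formula. The function name `cosine_drift` and the `eps` guard against zero-length vectors are assumptions for illustration, not necessarily how memoscope computes it.

```python
import math
from typing import Sequence


def cosine_drift(h_t: Sequence[float], h_prev: Sequence[float], eps: float = 1e-12) -> float:
    """Return D(t) = 1 - cos(h_t, h_prev) for two hidden-state vectors."""
    dot = sum(a * b for a, b in zip(h_t, h_prev))
    norm_t = math.sqrt(sum(a * a for a in h_t))
    norm_prev = math.sqrt(sum(b * b for b in h_prev))
    # eps avoids division by zero for all-zero states (an assumption).
    return 1.0 - dot / max(norm_t * norm_prev, eps)


print(cosine_drift([1.0, 0.0], [1.0, 0.0]))   # identical states  -> 0.0
print(cosine_drift([1.0, 0.0], [0.0, 1.0]))   # orthogonal states -> 1.0
print(cosine_drift([1.0, 0.0], [-1.0, 0.0]))  # reversed states   -> 2.0
```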
2. Attention Entropy
How "spread out" is attention across tokens?
$$H(t) = -\sum_{i} p_i \log(p_i + \varepsilon), \quad \text{normalised by} \log(T)$$
Low entropy = model attending to only a few tokens = memory narrowing.
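The normalised entropy can be sketched in a few lines of plain Python. The function name, the value of `eps`, and the choice to return 0 for a single-token context (where log(T) = 0) are assumptions for illustration.

```python
import math
from typing import Sequence


def normalised_entropy(p: Sequence[float], eps: float = 1e-9) -> float:
    """Shannon entropy of one attention row, divided by log(T)."""
    T = len(p)
    if T < 2:
        return 0.0  # log(1) = 0, so treat a one-token context as zero entropy (assumption)
    h = -sum(p_i * math.log(p_i + eps) for p_i in p)
    return h / math.log(T)


uniform = normalised_entropy([0.25, 0.25, 0.25, 0.25])  # close to 1: attention spread evenly
one_hot = normalised_entropy([1.0, 0.0, 0.0, 0.0])      # close to 0: attention on one token
print(round(uniform, 4), round(abs(one_hot), 4))
```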
3. Context Collapse Score
A composite indicator of state saturation:
$$\text{collapse}(t) = 0.5 \times (1 - \bar{H}) + 0.5 \times \text{CV}(|h|)$$
where CV is the coefficient of variation of layer norms.
Score > 0.55 → warning. Score > 0.78 → critical.
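A minimal sketch of the composite score, assuming the per-layer entropies are already normalised to [0, 1]. Using the population standard deviation for CV and clamping the result to [0, 1] are assumptions; the formula above does not specify either.

```python
import statistics
from typing import Sequence


def collapse_score(entropies: Sequence[float], layer_norms: Sequence[float]) -> float:
    """0.5 * (1 - mean entropy) + 0.5 * CV(layer norms), clamped to [0, 1]."""
    mean_h = statistics.fmean(entropies)
    mean_norm = statistics.fmean(layer_norms)
    # Coefficient of variation: std / mean (population std is an assumption).
    cv = statistics.pstdev(layer_norms) / mean_norm if mean_norm > 0 else 0.0
    score = 0.5 * (1.0 - mean_h) + 0.5 * cv
    return min(max(score, 0.0), 1.0)  # clamp is an assumption


# High entropy and uniform norms: nothing pushes the score up.
print(collapse_score([1.0, 1.0], [2.0, 2.0]))
# Zero entropy with uniform norms: only the entropy term contributes.
print(collapse_score([0.0, 0.0], [3.0, 3.0]))
```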
4. Token Retention
Column sums of the attention matrix, normalised:
$$r_j = \frac{\sum_i A_{ij}}{\sum_{i,j} A_{ij}}$$
Plotted as a bar chart: early positions (j = 0, 1, 2, …) should retain mass in healthy models. Flat curves near zero = the model has forgotten everything.
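The retention formula is a normalised column sum. A small sketch on a hand-written 3×3 causal attention matrix follows; the function name and the zero-total fallback are assumptions for illustration.

```python
from typing import List, Sequence


def token_retention(attn: Sequence[Sequence[float]]) -> List[float]:
    """r_j = sum_i A[i][j] / sum_{i,j} A[i][j] for one attention matrix."""
    n_cols = len(attn[0])
    total = sum(sum(row) for row in attn)
    if total == 0:
        return [0.0] * n_cols  # no attention mass at all (assumption)
    return [sum(row[j] for row in attn) / total for j in range(n_cols)]


# Causal attention over 3 tokens: query i attends uniformly to positions 0..i.
A = [
    [1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0],
    [1 / 3, 1 / 3, 1 / 3],
]
retention = token_retention(A)
print([round(r, 4) for r in retention])  # early positions hold the most mass
```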
API Reference
inspect_memory()
from memoscope import inspect_memory
inspector = inspect_memory(
model=None, # nn.Module or None (uses mock)
data_stream=None, # Iterable[Tensor] or None (uses synthetic)
host="127.0.0.1",
port=8765,
open_browser=True,
mock_model_type="transformer", # "transformer" | "mamba" | "rnn"
stream_delay=0.15, # seconds between steps
)
MemoryInspector
from memoscope import MemoryInspector
import torch.nn as nn
model = MyModel() # MyModel: your own nn.Module subclass
inspector = MemoryInspector(model)
# Run inference manually (input_ids: a tensor of token ids for your model)
output = model(input_ids)
snapshot = inspector.step() # compute metrics for this step
print(snapshot.mean_drift)
print(snapshot.collapse_score)
print(snapshot.layer_norms)
# Detach hooks when done
inspector.detach()
StepSnapshot fields
| Field | Type | Description |
|---|---|---|
| `step` | `int` | Global token position |
| `layer_norms` | `List[float]` | L2 norm per layer |
| `layer_drift` | `List[float]` | Cosine drift per layer |
| `token_retention` | `List[List[float]]` | Attention retention per layer |
| `attention_entropy` | `List[float]` | Shannon entropy per layer |
| `collapse_score` | `float` | Context collapse indicator [0,1] |
| `mean_drift` | `float` | Average drift across layers |
| `model_type` | `str` | "transformer" / "mamba" / "rnn" |
| `seq_len` | `int` | Current context window length |
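To make the field layout concrete, here is a hypothetical dataclass that mirrors the table and serialises to a JSON frame. The class name `StepSnapshotSketch` and the example values are assumptions for illustration; the real `StepSnapshot` may be defined differently.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class StepSnapshotSketch:
    """Hypothetical mirror of the StepSnapshot fields listed above."""
    step: int
    layer_norms: List[float]
    layer_drift: List[float]
    token_retention: List[List[float]]
    attention_entropy: List[float]
    collapse_score: float
    mean_drift: float
    model_type: str
    seq_len: int


snap = StepSnapshotSketch(
    step=247,
    layer_norms=[4.21, 3.89, 2.14, 1.02],
    layer_drift=[0.18, 0.19, 0.18, 0.18],
    token_retention=[[0.6, 0.3, 0.1]] * 4,  # illustrative values only
    attention_entropy=[0.612] * 4,
    collapse_score=0.321,
    mean_drift=0.183,
    model_type="transformer",
    seq_len=247,
)
frame = json.dumps(asdict(snap))  # one JSON frame, as a dashboard might receive it
print(frame)
```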
Mock Models
from memoscope import get_mock_model, MockTransformer, MockMamba, MockRNN
# Factory
model = get_mock_model("mamba")
# Direct instantiation with custom config
model = MockTransformer(
vocab_size=512,
d_model=256,
num_heads=4,
num_layers=4,
max_seq=128,
)
model = MockMamba(
vocab_size=512,
d_model=128,
d_state=16,
num_layers=4,
)
model = MockRNN(
vocab_size=512,
d_model=256,
hidden_size=256,
num_layers=3,
)
Synthetic Token Stream
from memoscope.core.mock_models import synthetic_token_stream
stream = synthetic_token_stream(
vocab_size=512,
seq_len=1024,
batch_size=1,
device="cpu",
)
for token_batch in stream: # shape: [1, t+1]
    output = model(token_batch)
Attach to a Real HuggingFace Model
import torch
from transformers import GPT2LMHeadModel
from memoscope import MemoryInspector, inspect_memory
# Load any HF model
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()
# Option A: full dashboard
inspect_memory(model, mock_model_type="transformer")
# Option B: programmatic access only
inspector = MemoryInspector(model, model_type="transformer")
input_ids = torch.randint(0, 50257, (1, 32)) # random ids from the GPT-2 vocabulary
with torch.no_grad():
    _ = model(input_ids, output_attentions=True)
snapshot = inspector.step()
print(f"Drift: {snapshot.mean_drift:.4f}")
print(f"Collapse risk: {snapshot.collapse_score:.4f}")
CLI Reference
Usage: memoscope [OPTIONS]
Options:
--model TEXT transformer | mamba | rnn | ssm | lstm [default: transformer]
--host TEXT Server host [default: 127.0.0.1]
--port INTEGER Server port [default: 8765]
--delay FLOAT Seconds between steps [default: 0.15]
--seq-len INTEGER Tokens to stream [default: 512]
--no-browser Skip auto-opening browser
--help Show this message and exit.
File Structure
memoscope/
├── app.py                 ← Demo launcher (run this)
├── pyproject.toml         ← Package metadata & deps
├── README.md              ← You are here
│
└── memoscope/
    ├── __init__.py        ← Public API: inspect_memory()
    ├── __main__.py        ← python -m memoscope
    ├── cli.py             ← CLI entry point
    │
    ├── core/
    │   ├── hooks.py       ← PyTorch hook engine + metric math
    │   └── mock_models.py ← Transformer / Mamba / RNN mocks
    │
    └── server/
        ├── app.py         ← FastAPI + WebSocket server
        └── templates/
            └── index.html ← SPA dashboard (Chart.js + Tailwind)
Dashboard Panels
The dashboard is laid out top to bottom as follows (example values):
- Header bar: MEMOSCOPE [TRANSFORMER], step: 247, drift: 0.183
- Stat tiles: Context Len (247), Mem Entropy (0.612), Collapse Score (progress bar, 0.321)
- Hidden State Drift: live line chart of D(t) = 1 - cos(h_t, h_{t-1})
- Attention Entropy: per layer (live)
- Layer Norms: one bar per layer (L00 4.21, L01 3.89, L02 2.14, L03 1.02)
- Memory Decay: retention curve over token position
- Token Retention Heatmap (layer × token position): one row of cells per step (step 230, step 231, …)
- Log console: [SYS] MEMOSCOPE v0.1.0 booting…, [NET] WebSocket connected, [INFO] Step 200 | drift 0.183 | entropy 0.612
Interpreting the Signals
Hidden State Drift: What to look for
| Pattern | Interpretation |
|---|---|
| Stable low drift (~0.05–0.15) | Model in steady auto-regressive rhythm |
| Periodic spikes | Semantic boundaries (sentence ends, topic shifts) |
| Sustained high drift (>0.8) | Unstable generation / degenerate outputs |
| Drift ≈ 0 plateau | State has frozen; model is ignoring new input |
Context Collapse: When to worry
| Score | Status | Meaning |
|---|---|---|
| 0.00–0.54 | ✅ Healthy | Normal operation |
| 0.55–0.77 | ⚠️ Warning | Context narrowing; watch entropy |
| 0.78–1.00 | 🔴 Critical | Likely generating garbage |
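The status bands in the table can be turned into a small helper for logging or alerting. This is a sketch: the function name is hypothetical, and the boundaries follow the table, with 0.55 and 0.78 treated as inclusive lower bounds.

```python
def collapse_status(score: float) -> str:
    """Map a collapse score in [0, 1] to the status bands from the table."""
    if score >= 0.78:
        return "critical"
    if score >= 0.55:
        return "warning"
    return "healthy"


for value in (0.321, 0.60, 0.80):
    print(f"{value:.3f} -> {collapse_status(value)}")
```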
Memory Decay: Architecture differences
| Architecture | Decay behaviour |
|---|---|
| Transformer (causal) | Soft decay via softmax dilution |
| RNN/LSTM | Hard exponential forgetting |
| Mamba/SSM | Selective retention: learned decay rate |
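The three decay behaviours can be illustrated with toy formulas. These are simplified models chosen for illustration, not what memoscope measures: softmax dilution is approximated by one token competing against T-1 equal-logit tokens, recurrent forgetting by a fixed decay factor, and selective retention by a running product of per-step decay rates. All function names and parameters are hypothetical.

```python
import math
from typing import List, Sequence


def softmax_dilution(seq_lens: Sequence[int], logit_advantage: float = 0.0) -> List[float]:
    """Attention mass on token 0 when the other T-1 tokens all have logit 0."""
    w = math.exp(logit_advantage)
    return [w / (w + (T - 1)) for T in seq_lens]


def exponential_forgetting(steps: Sequence[int], decay: float = 0.9) -> List[float]:
    """Weight of token 0 in a recurrent state after k steps: decay ** k."""
    return [decay ** k for k in steps]


def selective_retention(decays: Sequence[float]) -> List[float]:
    """Running product of per-step decay rates, which may vary with the input."""
    out: List[float] = []
    acc = 1.0
    for a in decays:
        acc *= a
        out.append(acc)
    return out


print(softmax_dilution([1, 2, 4]))              # mass shrinks as 1/T
print(exponential_forgetting([0, 1, 2], 0.5))   # mass halves every step
print(selective_retention([1.0, 1.0, 0.5]))     # held for two steps, then decays
```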
Performance Notes
- No GPU required. All mock models run on CPU in <5ms per step.
- Zero model modification. Hooks are read-only; they never affect gradients or outputs.
- Memory overhead. ~20MB for the inspector + rolling buffer of 512 snapshots.
- Real models. Hook overhead on GPT-2 (124M) is under 0.3ms per step.
- Thread safety. All state transitions are protected by threading.Lock.
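The rolling buffer and lock mentioned above could be combined roughly as follows. This is a sketch built on `collections.deque` and `threading.Lock`; the class name `SnapshotBuffer` and its methods are hypothetical and do not describe memoscope's internals.

```python
import threading
from collections import deque
from typing import Any, Deque, List


class SnapshotBuffer:
    """Fixed-size rolling buffer whose reads and writes share one lock."""

    def __init__(self, maxlen: int = 512) -> None:
        self._items: Deque[Any] = deque(maxlen=maxlen)  # oldest entries drop off automatically
        self._lock = threading.Lock()

    def append(self, snapshot: Any) -> None:
        with self._lock:
            self._items.append(snapshot)

    def history(self) -> List[Any]:
        with self._lock:
            return list(self._items)  # copy, so callers never see a half-updated buffer


buf = SnapshotBuffer(maxlen=3)
for i in range(5):
    buf.append(i)
print(buf.history())  # only the 3 most recent entries remain
```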
Roadmap
- HuggingFace model auto-detection (parse config.json)
- RWKV-specific WKV state inspector
- Export snapshots to Parquet / W&B
- Gradient-weighted attention maps (GRAD-CAM style)
- Multi-model side-by-side comparison view
- Alerting webhooks (Slack / Discord) on collapse events
- Plugin system for custom metrics
Contributing
git clone https://github.com/your-org/memoscope
cd memoscope
pip install -e ".[dev]"
ruff check .
pytest tests/
Pull requests welcome. See CONTRIBUTING.md.
Citation
@software{memoscope2024,
title = {MEMOSCOPE: Live Model Memory Inspector},
year = {2024},
url = {https://github.com/your-org/memoscope},
}
License
MIT. See LICENSE.
Built for the curious minds who want to see inside the model, not just its outputs.