Skip to main content

Live Model Memory Inspector โ€” visual debugger for Transformers, SSMs (Mamba), RWKV, and RNNs

Project description

๐Ÿ”ฌ MEMOSCOPE โ€” Live Model Memory Inspector

Real-time visual debugging for Transformers, SSMs (Mamba), RWKV, and RNNs. Watch your model's memory live โ€” drift, decay, collapse, all in one dashboard.

Python 3.10+ PyTorch FastAPI License: MIT Stars

  __  __ _____ __  __  ___  ____   ____ ___  ____  _____
 |  \/  | ____|  \/  |/ _ \/ ___| / ___/ _ \|  _ \| ____|
 | |\/| |  _| | |\/| | | | \___ \| |  | | | | |_) |  _|
 | |  | | |___| |  | | |_| |___) | |__| |_| |  __/| |___
 |_|  |_|_____|_|  |_|\___/|____/ \____\___/|_|   |_____|

 Live Model Memory Inspector  ยท  v0.1.0

What is MEMOSCOPE?

When your LLM starts forgetting earlier tokens, produces incoherent long outputs, or silently collapses its context representation โ€” you won't see it in loss curves.

MEMOSCOPE gives you a live window into the model's memory:

What you see What it means
๐ŸŒŠ Hidden State Drift How much the internal representation changed after each new token
๐Ÿง  Token Retention Heatmap Which early tokens the model is still "attending to"
๐Ÿ“‰ Memory Decay Curve Exponential fall-off of early-token attention mass
โšก Context Collapse Score When entropy spikes โ€” the model has run out of useful context

Quickstart (zero configuration)

# Install
pip install memoscope

# Run the demo โ€” opens browser automatically, no GPU needed
memoscope

# Try different architectures
memoscope --model mamba       # State Space Model
memoscope --model rnn         # LSTM

Or from source:

git clone https://github.com/your-org/memoscope
cd memoscope
pip install -e .
python app.py

One-liner API:

from memoscope import inspect_memory
inspect_memory()   # that's it โ€” browser opens, data flows

Architecture

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚                        YOUR MODEL                            โ”‚
โ”‚                                                              โ”‚
โ”‚  token_1 โ”€โ”€โ–ถ [Layer 0] โ”€โ”€โ–ถ [Layer 1] โ”€โ”€โ–ถ ... โ”€โ”€โ–ถ logits    โ”‚
โ”‚                   โ”‚              โ”‚                           โ”‚
โ”‚               hook_0         hook_1    โ† zero-overhead       โ”‚
โ”‚                   โ”‚              โ”‚       forward hooks       โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                    โ”‚              โ”‚
                    โ–ผ              โ–ผ
          โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
          โ”‚      MemoryInspector        โ”‚
          โ”‚                             โ”‚
          โ”‚  โ€ข cosine drift  D(t)       โ”‚
          โ”‚  โ€ข layer norms   โ€–h_tโ€–โ‚‚    โ”‚
          โ”‚  โ€ข attention entropy  H     โ”‚
          โ”‚  โ€ข token retention          โ”‚
          โ”‚  โ€ข collapse score           โ”‚
          โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                         โ”‚  asyncio.Queue
          โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
          โ”‚   FastAPI + WebSocket       โ”‚
          โ”‚   server (uvicorn)          โ”‚
          โ”‚                             โ”‚
          โ”‚   GET  /          โ†’ SPA     โ”‚
          โ”‚   GET  /history   โ†’ replay  โ”‚
          โ”‚   WS   /ws        โ†’ live    โ”‚
          โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜
                         โ”‚  JSON frames
          โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ–ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
          โ”‚   Browser Dashboard         โ”‚
          โ”‚                             โ”‚
          โ”‚   Chart.js   live charts    โ”‚
          โ”‚   CSS grid   heatmap        โ”‚
          โ”‚   TailwindCSS  dark UI      โ”‚
          โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Metrics โ€” The Math

1. Hidden State Drift

How much does the model's internal representation change after each new token?

$$D(t) = 1 - \frac{h_t \cdot h_{t-1}}{|h_t| \cdot |h_{t-1}|}$$

  • D(t) = 0 โ†’ state identical to previous step (stagnation)
  • D(t) = 1 โ†’ fully orthogonal (healthy exploration)
  • D(t) = 2 โ†’ state has reversed (instability / collapse)

2. Attention Entropy

How "spread out" is attention across tokens?

$$H(t) = -\sum_{i} p_i \log(p_i + \varepsilon), \quad \text{normalised by} \log(T)$$

Low entropy = model attending to only a few tokens = memory narrowing.

3. Context Collapse Score

A composite indicator of state saturation:

$$\text{collapse}(t) = 0.5 \times (1 - \bar{H}) + 0.5 \times \text{CV}(|h|)$$

where CV is the coefficient of variation of layer norms.
Score > 0.55 โ†’ warning. Score > 0.78 โ†’ critical.

4. Token Retention

Column sums of the attention matrix, normalised:

$$r_j = \frac{\sum_i A_{ij}}{\sum_{i,j} A_{ij}}$$

Plotted as a bar chart: early positions (j=0,1,2โ€ฆ) should retain mass in healthy models. Flat curves near zero = the model has forgotten everything.


API Reference

inspect_memory()

from memoscope import inspect_memory

inspector = inspect_memory(
    model=None,           # nn.Module or None (uses mock)
    data_stream=None,     # Iterable[Tensor] or None (uses synthetic)
    host="127.0.0.1",
    port=8765,
    open_browser=True,
    mock_model_type="transformer",  # "transformer" | "mamba" | "rnn"
    stream_delay=0.15,    # seconds between steps
)

MemoryInspector

from memoscope import MemoryInspector
import torch.nn as nn

model = MyModel()
inspector = MemoryInspector(model)

# Run inference manually
output = model(input_ids)
snapshot = inspector.step()   # compute metrics for this step

print(snapshot.mean_drift)
print(snapshot.collapse_score)
print(snapshot.layer_norms)

# Detach hooks when done
inspector.detach()

StepSnapshot fields

Field Type Description
step int Global token position
layer_norms List[float] L2 norm per layer
layer_drift List[float] Cosine drift per layer
token_retention List[List[float]] Attention retention per layer
attention_entropy List[float] Shannon entropy per layer
collapse_score float Context collapse indicator [0,1]
mean_drift float Average drift across layers
model_type str "transformer" / "mamba" / "rnn"
seq_len int Current context window length

Mock Models

from memoscope import get_mock_model, MockTransformer, MockMamba, MockRNN

# Factory
model = get_mock_model("mamba")

# Direct instantiation with custom config
model = MockTransformer(
    vocab_size=512,
    d_model=256,
    num_heads=4,
    num_layers=4,
    max_seq=128,
)

model = MockMamba(
    vocab_size=512,
    d_model=128,
    d_state=16,
    num_layers=4,
)

model = MockRNN(
    vocab_size=512,
    d_model=256,
    hidden_size=256,
    num_layers=3,
)

Synthetic Token Stream

from memoscope.core.mock_models import synthetic_token_stream

stream = synthetic_token_stream(
    vocab_size=512,
    seq_len=1024,
    batch_size=1,
    device="cpu",
)

for token_batch in stream:   # shape: [1, t+1]
    output = model(token_batch)

Attach to a Real HuggingFace Model

from transformers import GPT2LMHeadModel
from memoscope import MemoryInspector, inspect_memory

# Load any HF model
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

# Option A: full dashboard
inspect_memory(model, mock_model_type="transformer")

# Option B: programmatic access only
inspector = MemoryInspector(model, model_type="transformer")

import torch
input_ids = torch.randint(0, 50257, (1, 32))
with torch.no_grad():
    _ = model(input_ids, output_attentions=True)

snapshot = inspector.step()
print(f"Drift: {snapshot.mean_drift:.4f}")
print(f"Collapse risk: {snapshot.collapse_score:.4f}")

CLI Reference

Usage: memoscope [OPTIONS]

Options:
  --model TEXT       transformer | mamba | rnn | ssm | lstm  [default: transformer]
  --host TEXT        Server host                              [default: 127.0.0.1]
  --port INTEGER     Server port                              [default: 8765]
  --delay FLOAT      Seconds between steps                    [default: 0.15]
  --seq-len INTEGER  Tokens to stream                         [default: 512]
  --no-browser       Skip auto-opening browser
  --help             Show this message and exit.

File Structure

memoscope/
โ”œโ”€โ”€ app.py                          โ† Demo launcher (run this)
โ”œโ”€โ”€ pyproject.toml                  โ† Package metadata & deps
โ”œโ”€โ”€ README.md                       โ† You are here
โ”‚
โ””โ”€โ”€ memoscope/
    โ”œโ”€โ”€ __init__.py                 โ† Public API: inspect_memory()
    โ”œโ”€โ”€ __main__.py                 โ† python -m memoscope
    โ”œโ”€โ”€ cli.py                      โ† CLI entry point
    โ”‚
    โ”œโ”€โ”€ core/
    โ”‚   โ”œโ”€โ”€ hooks.py                โ† PyTorch hook engine + metric math
    โ”‚   โ””โ”€โ”€ mock_models.py          โ† Transformer / Mamba / RNN mocks
    โ”‚
    โ””โ”€โ”€ server/
        โ”œโ”€โ”€ app.py                  โ† FastAPI + WebSocket server
        โ””โ”€โ”€ templates/
            โ””โ”€โ”€ index.html          โ† SPA dashboard (Chart.js + Tailwind)

Dashboard Panels

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚  MEMOSCOPE  [TRANSFORMER]              step: 247   drift: 0.183โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚ Context Len  โ”‚ Mem Entropy  โ”‚   Collapse Score                 โ”‚
โ”‚    247       โ”‚   0.612      โ”‚   โ–ˆโ–ˆโ–ˆโ–ˆโ–‘โ–‘โ–‘โ–‘โ–‘โ–‘ 0.321               โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚  Hidden State Drift  D(t) = 1 - cos(h_t ยท h_{t-1})            โ”‚
โ”‚  ~~~~~~~~~~~~~~~~~~~~~~/\/\~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~โ”‚
โ”‚  _____________________/    \___________________________________โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚ Attention Entropyโ”‚  Layer Norms    โ”‚  Memory Decay             โ”‚
โ”‚ per layer (live) โ”‚  L00 โ–ˆโ–ˆโ–ˆโ–ˆ  4.21 โ”‚  โ–‡โ–…โ–ƒโ–‚โ–โ–โ–โ–โ–โ–โ–โ–โ–โ–โ–โ–โ–โ–โ–โ–  โ”‚
โ”‚                  โ”‚  L01 โ–ˆโ–ˆโ–ˆ   3.89 โ”‚  token position โ†’         โ”‚
โ”‚                  โ”‚  L02 โ–ˆโ–ˆ    2.14 โ”‚                           โ”‚
โ”‚                  โ”‚  L03 โ–ˆ     1.02 โ”‚                           โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚  Token Retention Heatmap (layer ร— token position)              โ”‚
โ”‚  โ–‘โ–‘โ–’โ–’โ–“โ–“โ–ˆโ–ˆโ–ˆโ–ˆโ–“โ–’โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘  โ† step 230               โ”‚
โ”‚  โ–‘โ–‘โ–‘โ–‘โ–’โ–’โ–“โ–“โ–ˆโ–ˆโ–“โ–’โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘โ–‘  โ† step 231               โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€  โ”ค
โ”‚  [SYS] MEMOSCOPE v0.1.0 bootingโ€ฆ                               โ”‚
โ”‚  [NET] WebSocket connected                                      โ”‚
โ”‚  [INFO] Step 200 โ€” drift 0.183 โ€” entropy 0.612                 โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Interpreting the Signals

Hidden State Drift โ€” What to look for

Pattern Interpretation
Stable low drift (~0.05โ€“0.15) Model in steady auto-regressive rhythm
Periodic spikes Semantic boundaries (sentence ends, topic shifts)
Sustained high drift (>0.8) Unstable generation / degenerate outputs
Drift โ†’ 0 plateau State has frozen โ€” model ignoring new input

Context Collapse โ€” When to worry

Score Status Meaning
0.00โ€“0.54 โœ… Healthy Normal operation
0.55โ€“0.77 โš ๏ธ Warning Context narrowing โ€” watch entropy
0.78โ€“1.00 ๐Ÿ”ด Critical Likely generating garbage

Memory Decay โ€” Architecture differences

Transformer (causal):       RNN/LSTM:              Mamba/SSM:
โ–‡โ–…โ–„โ–ƒโ–‚โ–โ–โ–โ–โ–โ–โ–โ–โ–โ–โ–โ–          โ–‡โ–„โ–‚โ–โ–โ–โ–โ–โ–โ–โ–โ–โ–โ–โ–โ–โ–      โ–‡โ–†โ–…โ–…โ–„โ–„โ–ƒโ–ƒโ–‚โ–‚โ–โ–โ–โ–โ–โ–โ–
Soft decay via               Hard exponential        Selective retention โ€”
  softmax dilution            forgetting             learned decay rate

Performance Notes

  • No GPU required. All mock models run on CPU in <5ms per step.
  • Zero model modification. Hooks are read-only; they never affect gradients or outputs.
  • Memory overhead. ~20MB for the inspector + rolling buffer of 512 snapshots.
  • Real models. Hook overhead on GPT-2 (124M) is under 0.3ms per step.
  • Thread safety. All state transitions are protected by threading.Lock.

Roadmap

  • HuggingFace model auto-detection (parse config.json)
  • RWKV-specific WKV state inspector
  • Export snapshots to Parquet / W&B
  • Gradient-weighted attention maps (GRAD-CAM style)
  • Multi-model side-by-side comparison view
  • Alerting webhooks (Slack / Discord) on collapse events
  • Plugin system for custom metrics

Contributing

git clone https://github.com/your-org/memoscope
cd memoscope
pip install -e ".[dev]"
ruff check .
pytest tests/

Pull requests welcome. See CONTRIBUTING.md.


Citation

@software{memoscope2024,
  title  = {MEMOSCOPE: Live Model Memory Inspector},
  year   = {2024},
  url    = {https://github.com/your-org/memoscope},
}

License

MIT โ€” see LICENSE.


Built for the curious minds who want to see inside the model, not just its outputs.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

memoscope-0.1.0.tar.gz (36.4 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

memoscope-0.1.0-py3-none-any.whl (30.3 kB view details)

Uploaded Python 3

File details

Details for the file memoscope-0.1.0.tar.gz.

File metadata

  • Download URL: memoscope-0.1.0.tar.gz
  • Upload date:
  • Size: 36.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for memoscope-0.1.0.tar.gz
Algorithm Hash digest
SHA256 f207b57285598abe5018491a5cbeff2505fb70e325551a75cf1b97286838511c
MD5 a296a543438b91f2b89f5cefd9bf9bb5
BLAKE2b-256 f4496836f02166017ae880c81eb707fa9c28b922615dfdb50a8625a1ddb48054

See more details on using hashes here.

File details

Details for the file memoscope-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: memoscope-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 30.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for memoscope-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 8e010dc978fffa48ce59401e7cb8d9fc61dd7cf9d50104ca3976f02a20ce018c
MD5 7a84041d21cd6df9a90410d6fffaf200
BLAKE2b-256 a1eb3df00c8c5484efa899e37d8d4bed90e53696c9a976f6f8222c2fac07b32f

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page