Scientific instrumentation for LLM inference memory trace collection and MRM research

mrm-trace

A Python research package for collecting, parsing, labelling, and analysing LLM inference memory access traces. Designed as scientific instrumentation for Managed-Retention Memory (MRM) research - it characterises how model weights, KV cache, activations, and runtime allocations are actually accessed during inference.

Primary metrics: retention duration · write-once ratio · read frequency · working set size

Install from PyPI

pip install mrm-trace

Linux is required for perf mem collection (bare-metal or a PMU-capable VM; WSL2 is not sufficient). memray collection also works on WSL2. See Requirements below.

Requirements

Requirement	Notes
Linux (WSL2 supported)	memray works on Linux, including WSL2; `perf mem` requires bare-metal or a PMU-capable VM
Python ≥ 3.11	Tested on 3.11 and 3.12
sudo / root	Required for `native_traces=True` (memray) and `perf mem`

Collector capability by environment

Environment	Best collector	`region_map`	Timestamps	Cache level
WSL2 (non-root)	`memray`	empty	0 (no HW timestamps)	n/a
WSL2 as root / sudo	`memray --native-traces`	empty †	0 (memray limitation)	n/a
Bare-metal Linux (root)	`perf mem`	populated	nanoseconds	L1/L2/L3/DRAM
Cloud VM with PMU passthrough	`perf mem`	populated	nanoseconds	L1/L2/L3/DRAM

WSL2 note: The Microsoft WSL2 kernel (*-microsoft-standard-WSL2) does not expose hardware PMU counters to the guest. perf mem requires hardware PMU — it will not produce data on WSL2. Use memray with native_traces=True (sudo) for WSL2 development. For publication-quality retention and cache-level data, run on bare-metal Linux.

native_traces note: memray's native_traces=True captures C-level allocations from llama.cpp/llama-cpp-python. In theory this makes ggml_init and llama_kv_cache_update symbols visible to the labeller, but pip-installed llama-cpp-python strips C symbols — so region_map will still be empty (†). Populating region_map via memray requires a debug-symbol build of llama-cpp-python (CMAKE_BUILD_TYPE=Debug pip install llama-cpp-python). The reliable path is perf mem on bare-metal Linux. native_traces=True requires root or CAP_SYS_PTRACE.

Timestamps note: memray does not record per-allocation timestamps — timestamp_ns is always 0. Retention duration (retention_p99_s) will be 0 in all memray runs. Only perf mem provides real nanosecond timestamps.
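
The capability table and notes above can be condensed into a small decision helper. This is a minimal sketch, not part of mrm-trace: `is_wsl2` and `pick_collector` are hypothetical names, and the `has_pmu` flag is supplied by the caller because the sketch does not probe the hardware. The WSL2 check relies only on the kernel release suffix mentioned in the WSL2 note.

```python
import os
import platform


def is_wsl2(kernel_release: str) -> bool:
    """True when the kernel release string looks like the Microsoft WSL2 kernel."""
    return "microsoft-standard-wsl2" in kernel_release.lower()


def pick_collector(kernel_release: str, is_root: bool, has_pmu: bool) -> str:
    """Return the best collector for an environment, following the table above.

    has_pmu is an input, not a measurement: this sketch does not detect PMU support.
    """
    if is_wsl2(kernel_release) or not has_pmu:
        # No hardware PMU counters: perf mem would produce no data.
        return "memray --native-traces" if is_root else "memray"
    if is_root:
        return "perf mem"
    # PMU present but no root: fall back to Python-level memray.
    return "memray"


if __name__ == "__main__":
    root = hasattr(os, "geteuid") and os.geteuid() == 0
    # has_pmu=False is a conservative placeholder; set it from what you know about the host.
    print(pick_collector(platform.release(), root, has_pmu=False))
```

Whatever the helper returns, memray runs still report timestamp_ns = 0, so retention figures only become meaningful on the perf mem path.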

Install

# Clone and set up a virtual environment
git clone https://github.com/DhiSys-AI/MRM-Trace
cd MRM-Trace
python -m venv venv
source venv/bin/activate      # Windows WSL: same command

# Install package + test dependencies
pip install -e ".[test]"

# Optional: install matplotlib/seaborn for figures
pip install -e ".[test,plots]"

Quick start

# Validate a config file
mrm-trace validate --config config/default_experiment.yaml

# Preview what a run would do (dry run)
mrm-trace plan --config config/default_experiment.yaml

# Run a full experiment (requires model files + sudo for perf)
mrm-trace run --config config/default_experiment.yaml

Live demo scripts

End-to-end scripts that run real inference against small models and write all mrm-trace artifacts to a timestamped results directory. Located in notebooks/scripts/.

Setup

# From the repo root (WSL2 or Linux)
source venv/bin/activate
pip install -e ".[test]"
pip install memray

TinyLlama 1.1B (llama-cpp-python + GGUF)

# Install backend
pip install llama-cpp-python \
  --extra-index-url https://abetlen.github.io/llama-cpp-python/whl/cpu

# Download model (~670 MB, one-time)
mkdir -p models
wget -P models/ https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v1.0-GGUF/resolve/main/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf

# Run (non-root — Python-level trace only)
python notebooks/scripts/demo_tinyllama.py \
  --model models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf

# Run as root: enables native_traces=True (C-level allocations; region_map stays empty unless llama-cpp-python is built with debug symbols)
sudo -E python notebooks/scripts/demo_tinyllama.py \
  --model models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf \
  --native-traces

Qwen2.5-0.5B-Instruct (transformers, no GGUF needed)

# Install backend (model auto-downloads from HuggingFace, ~1 GB)
pip install transformers torch accelerate

python notebooks/scripts/demo_qwen_hf.py

# Larger variant
python notebooks/scripts/demo_qwen_hf.py --model Qwen/Qwen2.5-1.5B-Instruct

perf mem (bare-metal Linux only)

# Requires root and hardware PMU — does NOT work on WSL2
sudo -E python notebooks/scripts/demo_perf_mem.py \
  --model models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf

Script options

Flag	Default	Description
`--model PATH`	`models/tinyllama-*.gguf`	Path to GGUF file
`--ctx N`	2048	Context length
`--tokens N`	128	Max output tokens per prompt
`--out DIR`	`results/`	Output base directory
`--native-traces`	auto (root check)	Force `native_traces=True` for memray
`--no-native-traces`	—	Force `native_traces=False`

All scripts run 5 real prompts and write trace.parquet, region_map.parquet, kv_block_lifecycle.parquet, metrics.csv, metadata.json, and manifest.json to results/<model_id>/<run_id>/.

Notebooks

Notebook	Description
001 - Getting Started	Install, synthetic trace, label, analyse, export, schema versioning, validity
002 - YAML Config & Experiment Planning	Write & validate configs, sweep expansion, multi-model runs, collector tuning
003 - Real Collection Walkthrough	Real memray capture, parse raw trace, understand symbols, real-model guide

All three notebooks run without root or model files (001 and 002 use synthetic data; 003 uses memray on a simulated workload). They are a good first stop for new contributors and researchers.

Running tests

# Every commit - fast, no I/O
pytest -m unit

# Pre-merge - includes integration tests
pytest -m "unit or integration"

# Before dataset release - scientific correctness checks
pytest -m validity

# Property-based invariant tests (Hypothesis)
pytest tests/property/

# Performance benchmarks (excluded from default run)
pytest -m performance --benchmark-only

# Full suite (excludes slow + performance)
pytest

The test suite has three tiers:

Tier	Marker	Purpose
1	`unit`	Individual functions behave correctly
2	`integration`	Components work together
3	`validity`	Measurements are scientifically sound

Tier-3 validity tests are the most important: they verify that known synthetic inputs produce known metric outputs (e.g. a 30s weight retention window must yield retention_p99_s ≈ 30.0).
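
To illustrate what a validity-style check looks like, here is a self-contained sketch in pytest form. It does not use the package's analyser: `percentile` and `retention_p99_s` below are hypothetical stand-ins, and the retention definition (last access minus first write, per region) is an assumption made for the example.

```python
import math

NS = 1_000_000_000  # nanoseconds per second


def percentile(values: list[float], q: float) -> float:
    """Linear-interpolation percentile, with q in [0, 100]."""
    if not values:
        raise ValueError("no values")
    ordered = sorted(values)
    rank = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(rank), math.ceil(rank)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (rank - lo)


def retention_p99_s(events: list[tuple[int, str, int]]) -> float:
    """events are (region_id, op, timestamp_ns) tuples.

    ASSUMPTION: retention of a region = last access time - first write time.
    """
    first_write: dict[int, int] = {}
    last_access: dict[int, int] = {}
    for region, op, ts in events:
        if op == "write":
            first_write[region] = min(first_write.get(region, ts), ts)
        last_access[region] = max(last_access.get(region, ts), ts)
    durations = [(last_access[r] - w) / NS for r, w in first_write.items()]
    return percentile(durations, 99)


def test_weight_retention_window():
    # Known input: one weight region written at t=0 and last read at t=30 s.
    events = [(0, "write", 0), (0, "read", 10 * NS), (0, "read", 30 * NS)]
    # Known output: the retention p99 must come back as 30 s.
    assert math.isclose(retention_p99_s(events), 30.0, rel_tol=1e-9)
```

The point of the pattern is that the expected value is fixed by construction of the synthetic input, so a failure indicates a measurement bug rather than a noisy run.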

Output layout

Each run writes to results/<model_id>/<run_id>/:

results/llama-7b/run_20240101_120000/
├── trace.parquet                  ← labelled memory access trace
├── region_map.parquet             ← one row per region (weight, kv_cache, …)
├── kv_block_lifecycle.parquet     ← per-block write / read / eviction timestamps
├── metrics.csv                    ← per-region-type summary (human-readable)
├── metadata.json                  ← hardware, software, observer effect, run validity
├── manifest.json                  ← SHA-256 checksums for all files
└── raw/
    ├── perf.data
    ├── perf_script.txt
    └── memray.bin
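
Because manifest.json holds SHA-256 checksums, a run directory can be re-verified before analysis. The sketch below assumes a manifest layout that the text above does not specify: a flat JSON object mapping relative file paths to hex digests. `sha256_of` and `verify_manifest` are hypothetical helpers, so check the real manifest format before relying on this.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large traces are never read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(run_dir: Path) -> list[str]:
    """Return the relative paths that are missing or whose checksum differs.

    ASSUMPTION: manifest.json looks like {"trace.parquet": "<hex sha256>", ...}.
    """
    manifest = json.loads((run_dir / "manifest.json").read_text())
    bad = []
    for rel_path, expected in manifest.items():
        target = run_dir / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            bad.append(rel_path)
    return bad
```

An empty list means every file listed in the manifest is present and unchanged.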

Run validity classification

Every run is automatically classified based on observer overhead:

Class	Criteria
`clean`	observer CPU < 10 %, observer mem < 5 % of target RSS, no throttle, baseline CPU < 15 %
`marginal`	observer CPU < 20 %, observer mem < 15 % of target RSS, ≤ 2 throttle events
`contaminated`	anything worse than marginal

Contaminated runs are archived but excluded from aggregated metrics and paper figures.
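
The thresholds in the table translate directly into code. The following is a sketch of that logic only, not the package's classifier: `ObserverStats`, its field names, and `classify_run` are hypothetical. It follows the table literally, so the baseline CPU limit applies to `clean` but not to `marginal`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObserverStats:
    observer_cpu_pct: float   # CPU used by the collectors, in percent
    observer_mem_pct: float   # collector memory as a percentage of target RSS
    throttle_events: int      # throttle events seen during the run
    baseline_cpu_pct: float   # background CPU load before the run, in percent


def classify_run(s: ObserverStats) -> str:
    """Map observer overhead to clean / marginal / contaminated."""
    if (
        s.observer_cpu_pct < 10
        and s.observer_mem_pct < 5
        and s.throttle_events == 0
        and s.baseline_cpu_pct < 15
    ):
        return "clean"
    if s.observer_cpu_pct < 20 and s.observer_mem_pct < 15 and s.throttle_events <= 2:
        return "marginal"
    return "contaminated"
```

For example, a run with 12 % observer CPU, 4 % observer memory, and one throttle event falls outside `clean` but inside `marginal`.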

Architecture

mrm_trace/
├── cli.py              CLI (typer)
├── api.py              Python API (Experiment class)
├── schema_version.py   Schema version registry and compatibility checking
├── engines/            llama.cpp / vLLM wrappers
├── collector/          perf mem / memray / process_monitor
├── parser/             perf script + memray parsers → trace.parquet
├── labeller/           symbol + address-range region classification
├── analyser/           retention / write-once / read-freq / working-set / IAI / suitability
├── telemetry/          baseline capture / thermal / observer effect / validity classifier
├── reporter/           CSV + Parquet export / figures / manifest / RunExporter
└── utils/              logging / IDs / file helpers

Key design decisions:

- Streaming parser: generators throughout; never loads the full trace into RAM (ADR-2)
- Phase-aware tracing: weight_load / generation / teardown phases distinguish weight from KV (ADR-3)
- Observer effect as mandatory output: every run records overhead and validity class (ADR-4)
- Parquet + zstd: column-oriented, ~3× better compression than gzip (ADR-8)
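
The streaming design can be illustrated with a generator pipeline. This is a sketch of the general technique, not the package's parser: the three-field line format, the `Access` tuple, and both function names are hypothetical and chosen only to keep the example short.

```python
from collections import Counter
from typing import Iterable, Iterator, NamedTuple


class Access(NamedTuple):
    timestamp_ns: int
    address: int
    op: str  # "load" or "store"


def parse_lines(lines: Iterable[str]) -> Iterator[Access]:
    """Yield one Access per well-formed line without buffering the input.

    HYPOTHETICAL format: "<timestamp_ns> <hex address> <load|store>".
    """
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        ts, addr, op = parts
        yield Access(int(ts), int(addr, 16), op)


def count_ops(accesses: Iterable[Access]) -> Counter:
    """Consume the stream once, keeping only a running aggregate in memory."""
    counts: Counter = Counter()
    for access in accesses:
        counts[access.op] += 1
    return counts
```

Passing an open file object as `lines` keeps memory use independent of trace size, since each row is discarded as soon as it has been counted.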

MRM suitability labels

Label	Criteria
`high_mrm`	write-once ratio ≥ 0.8 and retention p99 ≥ 10 s
`medium_mrm`	write-once ratio ≥ 0.5 and retention p50 ≥ 1 s
`low_mrm`	everything else

In practice: model weights → high_mrm, short-lived KV blocks → low_mrm.
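
Expressed as code, the label criteria look like the sketch below. `suitability_label` is a hypothetical helper written from the table, not the package's own function.

```python
def suitability_label(
    write_once_ratio: float,
    retention_p50_s: float,
    retention_p99_s: float,
) -> str:
    """Apply the thresholds from the suitability table, strictest label first."""
    if write_once_ratio >= 0.8 and retention_p99_s >= 10.0:
        return "high_mrm"
    if write_once_ratio >= 0.5 and retention_p50_s >= 1.0:
        return "medium_mrm"
    return "low_mrm"
```

One consequence worth noting: memray runs report a retention of 0, so under these thresholds every region from a memray-only run lands in `low_mrm` regardless of its write-once ratio.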

Schema versioning

All Parquet output files carry a mrm_trace_schema_version in their Parquet metadata. The version registry is in mrm_trace/schema_version.py. Readers validate major-version compatibility on load; a major bump is a breaking change.

from mrm_trace.schema_version import check_parquet_schema
check_parquet_schema("results/.../trace.parquet", "trace")  # raises on incompatibility
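
The major-version rule itself is simple to state in code. The sketch below is not the implementation in mrm_trace/schema_version.py: it assumes versions are written as MAJOR.MINOR.PATCH strings, and `parse_version`, `check_compatible`, and `SchemaIncompatibleError` are hypothetical names. Reading the version out of the Parquet metadata is left out.

```python
class SchemaIncompatibleError(ValueError):
    """Raised when a file's schema major version differs from the reader's."""


def parse_version(text: str) -> tuple[int, int, int]:
    """Split a 'MAJOR.MINOR.PATCH' string into integers (assumed format)."""
    major, minor, patch = (int(part) for part in text.split("."))
    return major, minor, patch


def check_compatible(file_version: str, reader_version: str) -> None:
    """Accept any minor/patch difference; reject a different major version."""
    file_major = parse_version(file_version)[0]
    reader_major = parse_version(reader_version)[0]
    if file_major != reader_major:
        raise SchemaIncompatibleError(
            f"file schema {file_version} is not readable by reader schema {reader_version}"
        )
```

Under this rule a reader at 1.4.0 accepts a file written at 1.0.2 but rejects one written at 2.0.0.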

Python API

import pandas as pd

from mrm_trace.labeller import TraceLabeller
from mrm_trace.analyser import compute_all
from mrm_trace.reporter import RunExporter

# raw_rows is assumed to be an iterable of raw trace rows produced upstream
# (for example by the parser package); it is not defined in this snippet.

# Label a stream of raw trace rows
labeller = TraceLabeller()
labelled = list(labeller.label(raw_rows))
region_map   = labeller.region_map()    # call after consuming label()
kv_lifecycle = labeller.kv_lifecycle()

# Analyse
trace = pd.DataFrame(labelled)
results = compute_all(trace)
# results keys: retention_per_region, retention_summary, write_once,
#               read_freq, working_set_per_region, working_set_summary,
#               locality_per_region, locality_summary, iai, suitability

# Export a publication-ready run directory
exporter = RunExporter("results/llama-7b/run_001")
exporter.export(trace, region_map, kv_lifecycle, results,
                metadata={"run_id": "run_001"}, run_id="run_001")

Collector hierarchy

- perf mem: primary; requires Linux PMU + root; bare-metal or PMU-capable VM only; does not work on WSL2
- memray: fallback; Python-level allocations (no root) or C-level (root + native_traces=True); works on Linux, including WSL2
- process_monitor: always runs in parallel as a coarse RSS/CPU baseline (psutil)

See Collector capability by environment for a full comparison.

Reporting issues and contact

Bug reports / feature requests: GitHub Issues
Email: info@dhisys.co.uk

Project details

Maintainers

pat2echo

Release history

Version	Released
0.1.6 (this version)	May 30, 2026
0.1.5	May 30, 2026
0.1.4	May 23, 2026
0.1.3	May 23, 2026
0.1.2	May 23, 2026
0.1.0	May 23, 2026

Download files

Source Distribution

mrm_trace-0.1.6.tar.gz (50.8 kB)

Uploaded May 30, 2026 Source

Built Distribution

mrm_trace-0.1.6-py3-none-any.whl (63.0 kB)

Uploaded May 30, 2026 Python 3

File details

Details for the file mrm_trace-0.1.6.tar.gz.

File metadata

Download URL: mrm_trace-0.1.6.tar.gz
Upload date: May 30, 2026
Size: 50.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for mrm_trace-0.1.6.tar.gz
Algorithm	Hash digest
SHA256	`2dc9552bfb1f9f1ab312dbe008976474d8bec04cce77a8773d3b34cbc36ffc33`
MD5	`d579bbba63c63459e2e97b75a3cc3446`
BLAKE2b-256	`2ba06f8f50e34240a6d02b5e4450bb0a0f0f5085936efa8365c8b50fb65a1b77`

Provenance

The following attestation bundles were made for mrm_trace-0.1.6.tar.gz:

Publisher: publish.yml on DhiSys-AI/MRM-Trace

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: mrm_trace-0.1.6.tar.gz
- Subject digest: 2dc9552bfb1f9f1ab312dbe008976474d8bec04cce77a8773d3b34cbc36ffc33
- Sigstore transparency entry: 1674689017
- Sigstore integration time: May 30, 2026
Source repository:
- Permalink: DhiSys-AI/MRM-Trace@d6540234d80b8b952b3ab42f1a9b6b57c32e20dd
- Branch / Tag: refs/tags/v0.1.6
- Owner: https://github.com/DhiSys-AI
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@d6540234d80b8b952b3ab42f1a9b6b57c32e20dd
- Trigger Event: release

File details

Details for the file mrm_trace-0.1.6-py3-none-any.whl.

File metadata

Download URL: mrm_trace-0.1.6-py3-none-any.whl
Upload date: May 30, 2026
Size: 63.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for mrm_trace-0.1.6-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8e212243ed9942f6d68efe40a90330037a75d86c87658353958d4083330bc8bc`
MD5	`c370a968c446fb2351e7f9ebb6909470`
BLAKE2b-256	`eb74653c7c036d23799861f93b56fcb398f76cc28d454976195c658187f99d9a`

Provenance

The following attestation bundles were made for mrm_trace-0.1.6-py3-none-any.whl:

Publisher: publish.yml on DhiSys-AI/MRM-Trace

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: mrm_trace-0.1.6-py3-none-any.whl
- Subject digest: 8e212243ed9942f6d68efe40a90330037a75d86c87658353958d4083330bc8bc
- Sigstore transparency entry: 1674689057
- Sigstore integration time: May 30, 2026
Source repository:
- Permalink: DhiSys-AI/MRM-Trace@d6540234d80b8b952b3ab42f1a9b6b57c32e20dd
- Branch / Tag: refs/tags/v0.1.6
- Owner: https://github.com/DhiSys-AI
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@d6540234d80b8b952b3ab42f1a9b6b57c32e20dd
- Trigger Event: release
