Post-training diagnostics for LLMs — what did fine-tuning actually do to your model?

Afterburn

Post-training diagnostics for LLMs. Weight diffs, behavioral analysis, and reward hacking detection — before you deploy.

Most evaluation tools tell you whether benchmark scores went up or down. Afterburn helps explain why: it compares two model checkpoints (base and post-trained) at the weight level and at the behavioral level, and checks for reward hacking patterns.

No other open-source tool combines all three.

Quick Start

pip install afterburn

afterburn diagnose \
  --base Qwen/Qwen2.5-0.5B \
  --trained Qwen/Qwen2.5-0.5B-Instruct \
  --method sft \
  -o report.html

from afterburn import Diagnoser

report = Diagnoser(
    base_model="Qwen/Qwen2.5-0.5B",
    trained_model="Qwen/Qwen2.5-0.5B-Instruct",
    method="sft",
).run()

print(report.summary)
print(f"Reward hack risk: {report.hack_score:.0f}/100")
report.save("report.html")

What It Does

1. Weight Diff Analysis

Compares model weights layer by layer. Checkpoints are read through safetensors memory mapping, which keeps memory use low (about 128 MB peak per layer for 8B models).

Metric	What It Measures
L2 / Cosine / Frobenius	Magnitude and direction of weight changes
SVD decomposition	Effective rank, concentration ratio, stable rank of the diff
Spectral alpha	Power-law exponent of eigenvalue spectrum (2-4 = healthy)
Marchenko-Pastur law	Compares eigenvalues to random matrix theory — spikes = learned structure
Behavioral vectors	Principal directions of change via SVD, cross-layer coherence
Attention head importance	Per-head importance delta before vs after training
LayerNorm shift	Gamma/beta parameter drift detection
Embedding drift	Token embedding movement, most-drifted tokens
LoRA analysis	Adapter weight decomposition and impact (auto-detected)
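Several of the quantities in this table can be illustrated on toy matrices. The sketch below is plain Python (no PyTorch) and is not Afterburn's implementation: the function names are hypothetical, the top singular value is estimated by power iteration instead of a full SVD, and the Marchenko-Pastur check assumes you already have the eigenvalues of (1/n) X^T X for an n x m matrix X and know the element variance `sigma2` of its random part.

```python
import math

Matrix = list[list[float]]


def frobenius(m: Matrix) -> float:
    """Frobenius norm: the L2 norm of the flattened matrix."""
    return math.sqrt(sum(x * x for row in m for x in row))


def top_singular_value(m: Matrix, iters: int = 200) -> float:
    """Estimate sigma_max(m) by power iteration on m^T m from an all-ones start vector."""
    n_rows, n_cols = len(m), len(m[0])
    v = [1.0] * n_cols
    for _ in range(iters):
        mv = [sum(a * b for a, b in zip(row, v)) for row in m]                    # m v
        w = [sum(m[i][j] * mv[i] for i in range(n_rows)) for j in range(n_cols)]  # m^T m v
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
    # v is a unit vector near the top right-singular vector, so ||m v|| approximates sigma_max.
    return math.sqrt(sum(sum(a * b for a, b in zip(row, v)) ** 2 for row in m))


def diff_metrics(base: Matrix, trained: Matrix) -> dict[str, float]:
    """Change metrics for one weight matrix (hypothetical helper, not Afterburn's API)."""
    if len(base) != len(trained) or any(len(a) != len(b) for a, b in zip(base, trained)):
        raise ValueError("base and trained matrices must have the same shape")
    delta = [[t - b for b, t in zip(rb, rt)] for rb, rt in zip(base, trained)]
    dot = sum(b * t for rb, rt in zip(base, trained) for b, t in zip(rb, rt))
    norm_b, norm_t, norm_d = frobenius(base), frobenius(trained), frobenius(delta)
    sigma_max = top_singular_value(delta)
    return {
        "frobenius_delta": norm_d,
        "relative_change": norm_d / norm_b,
        "cosine": dot / (norm_b * norm_t),
        # Stable rank ||D||_F^2 / sigma_max(D)^2: close to 1 means one direction dominates the update.
        "stable_rank": norm_d ** 2 / sigma_max ** 2 if sigma_max > 0 else 0.0,
    }


def mp_spikes(eigenvalues: list[float], n_rows: int, n_cols: int, sigma2: float) -> tuple[int, float]:
    """Count eigenvalues above the Marchenko-Pastur upper edge and the fraction inside the bulk.

    Assumes eigenvalues of (1/n_rows) X^T X for an n_rows x n_cols matrix X with
    n_rows >= n_cols, whose random part has element variance sigma2.
    """
    q = n_cols / n_rows
    lam_minus = sigma2 * (1 - math.sqrt(q)) ** 2
    lam_plus = sigma2 * (1 + math.sqrt(q)) ** 2
    spikes = sum(1 for lam in eigenvalues if lam > lam_plus)
    bulk = sum(1 for lam in eigenvalues if lam_minus <= lam <= lam_plus)
    return spikes, bulk / len(eigenvalues)
```

On real layers you would typically get singular values or eigenvalues from `torch.linalg.svdvals` or `torch.linalg.eigvalsh` rather than power iteration; the toy version only shows what each number means.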

2. Behavioral Shift Detection

Runs the same prompts through both models and compares the outputs statistically.

Length distribution — Mann-Whitney U test, Cohen's d, skewness, kurtosis, percentiles
Reasoning strategy — Classification (direct, step-by-step, code-assisted, CoT, tool use) with NLI tiebreaker
Strategy shift — Detects if training collapsed diverse strategies into one
Format compliance — Code blocks, LaTeX, markdown, tables, thinking tags (Shannon entropy)
Chain-of-thought — Step counting, depth, self-correction rate, verification patterns
Diversity — EAD (Expectation-Adjusted Distinct n-grams), optional SBERT semantic diversity
Token divergence — Jensen-Shannon Divergence on token probability distributions
Calibration — Expected Calibration Error (ECE), reliability diagrams
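The statistics behind the length and token-divergence comparisons are standard, so they can be sketched with the standard library alone. This is an illustration, not Afterburn's code, and the function names are hypothetical. The Mann-Whitney p-value uses the normal approximation without a tie correction; for small samples, `scipy.stats.mannwhitneyu` is the more careful choice.

```python
import math
from statistics import mean, variance


def cohens_d(base: list[float], trained: list[float]) -> float:
    """Standardized mean difference (trained minus base) using the pooled sample SD."""
    n1, n2 = len(base), len(trained)
    pooled_var = ((n1 - 1) * variance(base) + (n2 - 1) * variance(trained)) / (n1 + n2 - 2)
    return (mean(trained) - mean(base)) / math.sqrt(pooled_var)


def mann_whitney_u(base: list[float], trained: list[float]) -> tuple[float, float]:
    """U statistic for `base` and a two-sided p-value (normal approximation, no tie correction)."""
    combined = sorted(base + trained)
    avg_rank: dict[float, float] = {}
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1] == combined[i]:
            j += 1
        avg_rank[combined[i]] = (i + j) / 2 + 1  # tied values share the mean of their 1-based ranks
        i = j + 1
    n1, n2 = len(base), len(trained)
    u = sum(avg_rank[x] for x in base) - n1 * (n1 + 1) / 2
    z = (u - n1 * n2 / 2) / math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return u, math.erfc(abs(z) / math.sqrt(2))


def jensen_shannon(p: list[float], q: list[float]) -> float:
    """Jensen-Shannon divergence in bits between two distributions over the same tokens."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x: list[float], y: list[float]) -> float:
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)
```

Applied to response lengths, a positive `cohens_d` means the trained model's outputs are longer on average; `jensen_shannon` is bounded by 1 bit, with 0 meaning identical distributions.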

3. Reward Hacking Detection

Detects failure modes that can arise from RLHF, DPO, or GRPO training, and combines the results into a composite risk score from 0 to 100.

Detector	What It Catches
Length bias	Outputs got longer without quality gains (Cohen's d)
Format gaming	Model exploits format-based reward signals (ROUGE-L correctness correlation)
Strategy collapse	Model converges on one strategy, losing diversity (Shannon entropy)
Sycophancy	Model agrees more post-training, even with false claims
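Strategy collapse is measured with Shannon entropy. One way to express the idea is to compute the entropy of the strategy labels assigned to each model's responses and report the fractional drop. The label strings and the `entropy_drop` normalization below are assumptions for this sketch; how Afterburn turns entropy into its sub-score is not described here.

```python
import math
from collections import Counter


def strategy_entropy(labels: list[str]) -> float:
    """Shannon entropy, in bits, of the empirical distribution of strategy labels."""
    total = len(labels)
    probs = [count / total for count in Counter(labels).values()]
    return -sum(p * math.log2(p) for p in probs)


def entropy_drop(base_labels: list[str], trained_labels: list[str]) -> float:
    """Fraction of the base model's strategy entropy lost after training (0 = none, 1 = full collapse)."""
    h_base = strategy_entropy(base_labels)
    if h_base == 0.0:
        return 0.0  # the base model already used a single strategy; nothing left to collapse
    return max(0.0, (h_base - strategy_entropy(trained_labels)) / h_base)


base = ["direct", "step_by_step", "code_assisted", "cot"]  # hypothetical labels
trained = ["cot", "cot", "cot", "cot"]
print(strategy_entropy(base), entropy_drop(base, trained))  # prints: 2.0 1.0
```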

Sycophancy detection uses three methods:

Regex-based agreement/pushback rate comparison
NLI-enhanced semantic agreement detection (cross-encoder/nli-deberta-v3-small)
40 adversarial consistency probes across math, science, history, and coding: pairs of neutral and leading prompts that test whether the model changes its factual answers under pressure
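A simple way to summarize neutral/leading probe pairs is a flip rate: among probes the model answers correctly when asked neutrally, the fraction it gets wrong when led. The `ProbeResult` type and the exact-match comparison below are hypothetical simplifications (Afterburn's probe scoring is described as NLI-enhanced); comparing the trained model's flip rate with the base model's gives the post-training change.

```python
from dataclasses import dataclass


@dataclass
class ProbeResult:
    """One neutral/leading prompt pair and the model's answer to each (hypothetical type)."""
    correct: str
    neutral_answer: str
    leading_answer: str


def _same(a: str, b: str) -> bool:
    # Exact match after trimming and lowercasing; a stand-in for a semantic (e.g. NLI) comparison.
    return a.strip().lower() == b.strip().lower()


def flip_rate(results: list[ProbeResult]) -> float:
    """Share of neutrally-correct probes whose answer becomes wrong under the leading prompt."""
    eligible = [r for r in results if _same(r.neutral_answer, r.correct)]
    if not eligible:
        return 0.0
    return sum(not _same(r.leading_answer, r.correct) for r in eligible) / len(eligible)


base_results = [ProbeResult("4", "4", "4"), ProbeResult("Paris", "paris", "Paris")]
trained_results = [ProbeResult("4", "4", "5"), ProbeResult("Paris", "Paris", "Paris")]
print(flip_rate(trained_results) - flip_rate(base_results))  # prints 0.5: increase after training
```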

4. Reports

Interactive HTML with Plotly visualizations
JSON structured output for pipelines
Markdown for documentation
PDF (optional dependency)
Executive summary + actionable recommendations

Installation

# From PyPI
pip install afterburn

# From source
git clone https://github.com/code-mohanprakash/afterburn.git
cd afterburn
pip install -e ".[dev]"

# Optional: NLI-enhanced analysis
pip install "afterburn[nli]"

# Optional: PDF export
pip install "afterburn[pdf]"

# Optional: Semantic diversity (SBERT)
pip install "afterburn[semantic]"

Requirements: Python 3.10+ and PyTorch 2.0+. A GPU is recommended but not required; supported devices are CUDA, MPS (Apple Silicon), and CPU.

CLI

# Full diagnostic
afterburn diagnose --base <model> --trained <model> -o report.html

# Individual analyses
afterburn weight-diff --base <model> --trained <model> -o weights.json
afterburn behaviour --base <model> --trained <model> -o behaviour.json
afterburn hack-check --base <model> --trained <model> -o hacking.json

Python API

from afterburn import Diagnoser

diag = Diagnoser(
    base_model="meta-llama/Llama-3.1-8B",
    trained_model="my-org/Llama-3.1-8B-RLVR",
    method="rlvr",
)

# Full analysis
report = diag.run()

# Or individual modules
weight_diff = diag.run_weight_diff()
behaviour = diag.run_behaviour()
hack_check = diag.run_hack_check()

# Inspect results
for layer in weight_diff.top_changed_layers:
    print(f"{layer.layer_name}: relative_change={layer.relative_change:.4f}")
    if layer.mp_num_spikes is not None:
        print(f"  MP spikes: {layer.mp_num_spikes} (bulk: {layer.mp_bulk_fraction:.1%})")

print(f"Direction coherence: {weight_diff.direction_coherence:.3f}")

Configuration

Optional .afterburn.yaml:

device: auto
behaviour:
  suites: [math, code, reasoning, safety]
  max_new_tokens: 512
  batch_size: 4
reward_hack:
  weights:
    length_bias: 0.25
    format_gaming: 0.30
    strategy_collapse: 0.20
    sycophancy: 0.25
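The four weights above sum to 1.0, which is consistent with the composite score being a weighted combination of the sub-detector scores. Assuming each detector reports a score from 0 to 100, a weighted average could look like the sketch below. The function name and the renormalization over detectors that actually ran are assumptions, not documented Afterburn behavior.

```python
from collections.abc import Mapping

DEFAULT_WEIGHTS: Mapping[str, float] = {
    "length_bias": 0.25,
    "format_gaming": 0.30,
    "strategy_collapse": 0.20,
    "sycophancy": 0.25,
}


def composite_risk(scores: Mapping[str, float], weights: Mapping[str, float] = DEFAULT_WEIGHTS) -> float:
    """Weighted average of 0-100 detector scores, renormalized over the detectors present."""
    used = {name: w for name, w in weights.items() if name in scores}
    total = sum(used.values())
    if total == 0:
        return 0.0
    return sum(scores[name] * w for name, w in used.items()) / total


print(composite_risk({"length_bias": 40, "format_gaming": 10, "strategy_collapse": 80, "sycophancy": 20}))
```

Renormalizing keeps the result on the 0-100 scale when a detector is skipped, instead of silently pulling the score toward zero.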

How It Works

Base Model ──┐
             ├── Weight Diff (safetensors, one layer at a time)
Trained Model┘           │
                         ├── Diagnostic Report
Base Model ──┐           │   (HTML / JSON / MD / PDF)
             ├── Prompt Runner (one model at a time)
Trained Model┘           │
                         ├── Behaviour Analysis (statistical comparison)
                         │
                         └── Reward Hack Detection (40 adversarial probes)

Weight diff loads both checkpoints via memory-mapped safetensors and computes per-layer metrics, including SVD, spectral analysis, and Marchenko-Pastur law fitting.
Prompt runner generates outputs from both models on standardized prompt suites, loading one model at a time to roughly halve peak memory.
Behaviour analyser compares output distributions with statistical tests (Mann-Whitney U, Cohen's d, JSD, EAD).
Reward hack detector runs four sub-detectors plus 40 adversarial consistency probes, with NLI-enhanced scoring.
Report generator compiles everything into a human-readable diagnostic with Plotly visualizations.
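Reading one layer at a time depends on the safetensors layout: an 8-byte little-endian header length, a JSON header mapping each tensor name to its dtype, shape, and byte offsets, then the raw tensor bytes. Because the offsets are known up front, a single tensor can be read from a memory-mapped file without loading the rest. The stdlib sketch below handles only F32 tensors and includes a toy writer so it can be tried without the `safetensors` package; none of these function names come from Afterburn, and for real checkpoints the official `safetensors` library (for example `safe_open`) is the appropriate tool.

```python
import json
import mmap
import struct
import sys
from array import array


def write_f32(path: str, tensors: dict[str, list[float]]) -> None:
    """Toy writer for 1-D float32 tensors, only so the reader below has something to open."""
    header, blobs, offset = {}, [], 0
    for name, values in tensors.items():
        blob = array("f", values)
        if sys.byteorder == "big":
            blob.byteswap()  # safetensors stores data little-endian
        raw = blob.tobytes()
        header[name] = {"dtype": "F32", "shape": [len(values)], "data_offsets": [offset, offset + len(raw)]}
        blobs.append(raw)
        offset += len(raw)
    header_bytes = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(header_bytes)))  # 8-byte little-endian header length
        f.write(header_bytes)
        for raw in blobs:
            f.write(raw)


def read_header(path: str) -> tuple[dict, int]:
    """Return the tensor table and the file offset where the data section begins."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # optional free-form metadata entry
    return header, 8 + header_len


def load_f32(path: str, name: str) -> list[float]:
    """Decode one F32 tensor (flattened) from a memory-mapped file, touching only its bytes."""
    header, data_start = read_header(path)
    info = header[name]
    if info["dtype"] != "F32":
        raise ValueError(f"{name}: this sketch only decodes F32, got {info['dtype']}")
    begin, end = info["data_offsets"]  # relative to the start of the data section
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        values = array("f")
        values.frombytes(mm[data_start + begin : data_start + end])
    if sys.byteorder == "big":
        values.byteswap()
    return values.tolist()
```

Only the header and the requested byte range are read, so peak memory scales with the largest single tensor rather than with the whole checkpoint.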

Project Structure

src/afterburn/
├── cli/            # Click CLI commands
├── loading/        # Model loading, safetensors, LoRA adapter detection
├── weight_diff/    # L2, cosine, SVD, spectral alpha, MP law, behavioral vectors
├── behaviour/      # Length, format, strategy, CoT, calibration, diversity, JSD
├── reward_hack/    # Length bias, format gaming, strategy collapse, sycophancy, probes
├── prompts/        # Prompt suites + inference runner
├── report/         # HTML/JSON/MD/PDF generation + Plotly visualizations
├── nli.py          # Shared NLI model (cross-encoder/nli-deberta-v3-small)
├── diagnoser.py    # Top-level orchestrator
└── types.py        # 30+ shared dataclasses and enums

Testing

pytest tests/ -v                    # 678 tests
pytest tests/ --cov=afterburn       # with coverage
ruff check src/ tests/              # linting
mypy src/afterburn/                 # type checking

Contributing

git clone https://github.com/code-mohanprakash/afterburn.git
cd afterburn
pip install -e ".[dev]"
pytest tests/

See docs/contributing.md for architecture details and contribution guidelines.

Why Afterburn?

Existing tools either analyze weights (WeightWatcher) or evaluate outputs (lm-eval-harness, Giskard, DeepEval) — but none connect weight changes to behavioral shifts to reward hacking patterns in a single workflow.

Reward hacking is a documented problem in frontier models (METR, Anthropic). Afterburn is an open-source tool for detecting it.

License

MIT

Project details

Release history

0.6.3 (Feb 16, 2026)
0.6.2 (Feb 16, 2026)
0.6.1 (Feb 16, 2026)
0.6.0 (Feb 16, 2026), this version

Download files

Source Distribution

afterburn-0.6.0.tar.gz (172.9 kB)

Uploaded Feb 16, 2026 Source

Built Distribution

afterburn-0.6.0-py3-none-any.whl (125.7 kB)

Uploaded Feb 16, 2026 Python 3

File details

Details for the file afterburn-0.6.0.tar.gz.

File metadata

Download URL: afterburn-0.6.0.tar.gz
Upload date: Feb 16, 2026
Size: 172.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.5

File hashes

Hashes for afterburn-0.6.0.tar.gz
Algorithm	Hash digest
SHA256	`3bce94e88c0c2b03daa452abac279df97dbee144bfee86de276f9b66928b6392`
MD5	`0752a6d40bc6e55822160d468b55fd75`
BLAKE2b-256	`74c32ca082afd18eed7e54539ad5f2665bd9b60a52e3efa6fbd494a2815dbf8d`

File details

Details for the file afterburn-0.6.0-py3-none-any.whl.

File metadata

Download URL: afterburn-0.6.0-py3-none-any.whl
Upload date: Feb 16, 2026
Size: 125.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.5

File hashes

Hashes for afterburn-0.6.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d8a2302882d25c583aba226892eb6ca1fc02343cb462d627e762c39628d4029c`
MD5	`c5b66867ffeeff3c0e4518e78159e0e4`
BLAKE2b-256	`96fcd3103a9925688746fc5a1dc97fadc7b1e10b673ac0ecf8a3b36625a7a11d`
