
Automated LLM layer duplication configuration scanner with heatmap visualization

Project description

layer-scan

Automated LLM layer duplication config scanner — find the optimal (i,j) for any model + task




Given any open-source LLM and an evaluation probe, layer-scan finds the optimal layer duplication configuration (i, j) that maximizes model capability — without modifying a single weight.

Why layer-scan?

|                 | Without layer-scan                  | With layer-scan                      |
|-----------------|-------------------------------------|--------------------------------------|
| Process         | Manually test 3,000+ (i,j) configs  | One command                          |
| Time            | Days of GPU time                    | Hours (automated)                    |
| Output          | Spreadsheet of scores               | Interactive heatmap + mergekit YAML  |
| Reproducibility | Ad-hoc scripts                      | Deterministic logit scoring          |

The RYS authors manually scanned 3,241 configurations over several days. layer-scan automates this entire process.

Interactive Heatmap

layer-scan heatmap — Qwen2-1.5B / math probe

Score delta heatmap for Qwen2-1.5B with math probe. Green = improvement over baseline, red = regression. Gold stars mark the top-5 configurations.

Features

  • Full (i,j) Configuration Scanning — automated search across all valid layer duplication configs
  • Logit Distribution Scoring — deterministic scoring without text generation, with coverage diagnostics
  • Multi-probe Cross-analysis — scan multiple probes at once, find Pareto-optimal configs
  • Cross-tool Annotation — overlay neuro-scan layer labels on heatmaps
  • Scoring Diagnostics — coverage field measures how much probability mass falls on scored tokens
  • Sparse-then-Dense Scanning — two-phase strategy for faster exploration of large models
  • mergekit Integration — one-click export of scan results as mergekit-compatible YAML
  • Interactive HTML Heatmaps — Plotly-powered visualizations with hover details
  • Pre-computed Lookup — fetch community scan results from HuggingFace Hub (no GPU needed)

Installation

# pipx (recommended, isolated environment)
pipx install layer-scan

# pip
pip install layer-scan

# For ExLlamaV2 backend (recommended for 70B+ models on consumer GPUs):
pip install layer-scan[exllamav2]

# For pre-computed lookup (no GPU required):
pip install layer-scan[lookup]

Quick Start

# Scan with math reasoning probe
layer-scan scan --model Qwen/Qwen2-7B --probe math

# Scan and export mergekit config in one step
layer-scan scan --model Qwen/Qwen2-7B --probe math --export-mergekit config.yaml

# Multi-probe cross-analysis (find Pareto-optimal configs)
layer-scan multi-probe --model Qwen/Qwen2-7B --probes "math,eq,json"

# Cross-tool annotation (overlay neuro-scan labels on heatmap)
layer-scan annotate --results results.json --neuro-report neuro_report.json

# Look up pre-computed results (no GPU needed)
layer-scan lookup --model Qwen/Qwen2-7B --probe math

# Then merge with mergekit
mergekit-yaml config.yaml ./merged-model

More examples

# JSON compliance probe (detects IFEval regressions)
layer-scan scan --model Qwen/Qwen2-7B --probe json

# EQ probe (emotional intelligence)
layer-scan scan --model Qwen/Qwen2-7B --probe eq

# ExLlamaV2 for large quantized models
layer-scan scan \
  --model /models/qwen2-72b-exl2 \
  --probe math \
  --backend exllamav2 \
  --gpu-split "22000,22000"

# Custom probe from JSON file
layer-scan scan --model <path> --probe custom --custom-probe my_probe.json

# Sparse scan first, then refine (faster for large models)
layer-scan scan --model <path> --sparse-first --sparse-step 4

Commands

| Command     | Description                                    |
|-------------|------------------------------------------------|
| scan        | Scan (i,j) configs with a single probe         |
| multi-probe | Cross-probe scan, find Pareto-optimal configs  |
| annotate    | Overlay neuro-scan labels on layer-scan heatmap |
| lookup      | Fetch pre-computed results from HuggingFace Hub |
| probes      | List available evaluation probes               |
| version     | Show version                                   |

How It Works

Logit Distribution Scoring

Unlike traditional evaluation (generate text -> parse -> score), layer-scan scores directly from the logit probability distribution:

Restrict to digit tokens [0-9]
-> Softmax over restricted set
-> Expected score = sum(value x probability)
-> Uncertainty = sum((value - expected)^2 x probability)

This is:

  • Deterministic (no sampling variance)
  • Fast (no autoregressive generation)
  • Information-rich (uses full distribution, not just argmax)
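In code, this scoring rule is only a few lines. The sketch below is a minimal illustration, not the library's actual implementation; digit_logits stands for the model's logits at the answer position, already restricted to the ten tokens "0" through "9":

```python
import math

def score_from_digit_logits(digit_logits):
    """Expected score and uncertainty from logits over the digit tokens 0-9.

    digit_logits: ten floats, the logits for tokens "0".."9" at the answer
    position. The softmax is taken over this restricted set only.
    """
    m = max(digit_logits)                            # stabilize the softmax
    exps = [math.exp(x - m) for x in digit_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    expected = sum(v * p for v, p in enumerate(probs))
    uncertainty = sum((v - expected) ** 2 * p for v, p in enumerate(probs))
    return expected, uncertainty
```

A model that is certain of "7" yields an expected score near 7.0 with near-zero uncertainty; uniform logits yield 4.5 with maximal uncertainty. No sampling occurs anywhere, which is what makes the score deterministic.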

Scoring Diagnostics

Each score includes a coverage field, defined as the fraction of the full-vocabulary probability mass that falls on the scored digit tokens:

| Coverage  | Interpretation                                                                    |
|-----------|-----------------------------------------------------------------------------------|
| > 0.5     | Score is reliable: the model is genuinely choosing among digits                    |
| 0.1 - 0.5 | Use with caution: a substantial share of the probability mass is on non-digit tokens |
| < 0.1     | Score is noise: the model is overwhelmingly predicting non-digit tokens            |

Coverage is reported in scan summaries and included in results.json output.
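The coverage computation itself is simple. A minimal sketch (illustrative only; full_logits and digit_ids are assumed inputs: the logits over the whole vocabulary and the vocabulary indices of "0".."9"):

```python
import math

def coverage(full_logits, digit_ids):
    """Fraction of full-vocabulary probability mass on the scored digit tokens.

    full_logits: logits over the entire vocabulary at the answer position.
    digit_ids:   vocabulary indices of the tokens "0".."9".
    """
    m = max(full_logits)                       # stabilize the softmax
    exps = [math.exp(x - m) for x in full_logits]
    total = sum(exps)
    return sum(exps[i] for i in digit_ids) / total
```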

Layer Duplication

For configuration (i=45, j=52) on an 80-layer model:

Standard:    [0, 1, ..., 79]              -> 80 layers
Duplicated:  [0, 1, ..., 51, 45, ..., 79] -> 87 layers
                           ^^^^^^
                     these 7 layers execute twice

The model processes the same input through its "reasoning cortex" twice, enhancing depth of analysis.
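The duplicated execution order can be built directly from (i, j). A small sketch (not the tool's internal code) that mirrors the two mergekit passthrough slices [0, j] and [i, n_layers]:

```python
def duplicated_layer_order(i, j, n_layers):
    """Layer execution order for config (i, j): run 0..j-1, then i..n_layers-1.

    The block i..j-1 (j - i layers) executes twice, matching mergekit's
    passthrough slices [0, j] and [i, n_layers].
    """
    if not (0 <= i < j <= n_layers):
        raise ValueError("require 0 <= i < j <= n_layers")
    return list(range(0, j)) + list(range(i, n_layers))
```

For (i=45, j=52) on an 80-layer model this yields the 87-entry sequence shown above, with layers 45 through 51 appearing twice.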

Multi-probe Analysis

The multi-probe command scans multiple probes in a single session and identifies Pareto-optimal configurations — configs that are not dominated by any other config across all probes.

layer-scan multi-probe --model Qwen/Qwen2-7B --probes "math,eq,json"

Pareto Frontier

A config is Pareto-optimal if no other config is at least as good on every probe and strictly better on at least one. This finds balanced configs that improve the model broadly rather than overfitting to a single task.
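A straightforward way to filter for the Pareto frontier (an illustrative sketch; the config keys and probe names here are hypothetical, and the tool's own implementation may differ):

```python
def pareto_optimal(configs):
    """Keep only configs not dominated by any other config.

    configs: dict mapping (i, j) -> dict of per-probe scores (higher = better).
    A config is dominated if some other config is at least as good on every
    probe and strictly better on at least one.
    """
    def dominates(a, b):
        return all(a[k] >= b[k] for k in b) and any(a[k] > b[k] for k in b)

    return {
        cfg: scores for cfg, scores in configs.items()
        if not any(dominates(other, scores)
                   for oc, other in configs.items() if oc != cfg)
    }
```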

Output

The command produces multi_probe.json containing:

  • pareto_configs — all Pareto-optimal (i,j) configs with per-probe scores
  • per_probe_best — the single best config for each probe independently
  • normalized_score — a balanced score for ranking Pareto configs
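One plausible way to compute such a balanced score is per-probe min-max normalization averaged across probes. The sketch below is an illustrative guess at the idea, not necessarily the tool's actual formula:

```python
def normalized_scores(configs):
    """Min-max normalize each probe, then average: one balanced number per config.

    configs: dict mapping config -> dict of per-probe raw scores.
    """
    probes = sorted({p for scores in configs.values() for p in scores})
    lo = {p: min(s[p] for s in configs.values()) for p in probes}
    hi = {p: max(s[p] for s in configs.values()) for p in probes}

    def norm(p, v):
        # Degenerate probe (all configs equal) contributes 0 rather than NaN.
        return 0.0 if hi[p] == lo[p] else (v - lo[p]) / (hi[p] - lo[p])

    return {cfg: sum(norm(p, s[p]) for p in probes) / len(probes)
            for cfg, s in configs.items()}
```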

Cross-tool Annotation

The annotate command overlays neuro-scan layer labels onto layer-scan heatmaps, creating a unified visualization.

Workflow

# Step 1: Scan layer duplication configs
layer-scan scan --model ./my-model --probe math

# Step 2: Run neuroanatomy analysis
neuro-scan map --model ./my-model --probe math

# Step 3: Annotate — overlay neuro-scan labels on heatmap
layer-scan annotate \
  --results ./results/results.json \
  --neuro-report ./results/report.json \
  --output annotated_heatmap.html

The annotated heatmap shows:

  • neuro-scan layer labels (reasoning/syntax/output) as color bands
  • How many "reasoning layers" each top config duplicates
  • Explanation text: "Config (i=12, j=20) is optimal because it duplicates layers 14, 16, 18 (all reasoning layers)"

Pre-computed Lookup

The lookup command fetches community-contributed scan results from HuggingFace Hub — no GPU required.

# Fetch pre-computed results
layer-scan lookup --model Qwen/Qwen2-7B --probe math

# Download full results.json locally
layer-scan lookup --model Qwen/Qwen2-7B --probe math --download

Requires the lookup extra: pip install layer-scan[lookup]

Results are sourced from the XXO47OXX/layer-scan-results HuggingFace dataset. Community contributions welcome.

CLI Reference

scan command

| Option            | Type   | Default      | Description                                  |
|-------------------|--------|--------------|----------------------------------------------|
| --model, -m       | string | required     | Model path or HuggingFace ID                 |
| --probe, -p       | string | math         | Probe name: math, eq, json, custom           |
| --backend, -b     | string | transformers | Backend: transformers, exllamav2             |
| --min-block       | int    | 7            | Minimum duplicated block size                |
| --step, -s        | int    | 1            | Step size for scanning i and j               |
| --skip-early      | int    | 0            | Skip N early layers                          |
| --skip-late       | int    | 0            | Skip N late layers                           |
| --batch-size      | int    | 16           | Samples per evaluation                       |
| --top-k, -k       | int    | 5            | Number of top configs to report              |
| --output, -o      | string | ./results    | Output directory                             |
| --sparse-first    | flag   | off          | Do sparse scan first, then refine            |
| --sparse-step     | int    | 4            | Step size for sparse scanning                |
| --custom-probe    | string |              | Path to custom probe JSON file               |
| --dtype           | string | float16      | Model dtype: float16, bfloat16, float32      |
| --gpu-split       | string |              | GPU memory split in MB, e.g. "22000,22000"   |
| --export-mergekit | string |              | Export top config as mergekit YAML to path   |
| --verbose, -v     | flag   | off          | Verbose logging                              |
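To see how the grid-shaping options interact, here is a sketch of the (i, j) candidate set they imply. The semantics below are inferred from the option descriptions above, not taken from the tool's source:

```python
def scan_grid(n_layers, min_block=7, step=1, skip_early=0, skip_late=0):
    """Enumerate candidate (i, j) configs implied by the scan options.

    Assumed semantics (illustrative): i and j advance in strides of `step`,
    the duplicated block j - i must be at least `min_block`, and
    `skip_early`/`skip_late` exclude layers at either end of the stack.
    """
    lo, hi = skip_early, n_layers - skip_late
    return [(i, j)
            for i in range(lo, hi, step)
            for j in range(i + min_block, hi + 1, step)]
```

Under these assumptions, an 80-layer model with defaults yields a few thousand candidates, which is why --sparse-first and --step exist for large models.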

Output

Interactive Heatmap (HTML)

The heatmap shows score delta vs. baseline for each (i, j) configuration. Green = improvement, red = regression. Gold stars mark top-k configs.

mergekit Integration

# Scan and export in one command
layer-scan scan --model Qwen/Qwen2-72B --probe math --export-mergekit config.yaml

# The generated YAML is ready for mergekit
mergekit-yaml config.yaml ./merged-model --copy-tokenizer

Generated config.yaml:

merge_method: passthrough
slices:
  - sources:
      - model: Qwen/Qwen2-72B
        layer_range: [0, 52]
  - sources:
      - model: Qwen/Qwen2-72B
        layer_range: [45, 80]
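Producing this YAML from a winning (i, j) is mechanical. A sketch using plain string formatting to avoid a YAML dependency (the tool itself may build the file differently):

```python
def mergekit_passthrough_yaml(model, i, j, n_layers):
    """Render a mergekit passthrough config that duplicates layers i..j-1.

    Two slices: [0, j] followed by [i, n_layers], so the block i..j-1
    appears in both and executes twice in the merged model.
    """
    return (
        "merge_method: passthrough\n"
        "slices:\n"
        "  - sources:\n"
        f"      - model: {model}\n"
        f"        layer_range: [0, {j}]\n"
        "  - sources:\n"
        f"      - model: {model}\n"
        f"        layer_range: [{i}, {n_layers}]\n"
    )
```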

Text Summary

============================================================
LAYER-SCAN RESULTS
============================================================
Model: Qwen2-72B-EXL2
Probe: math
Total layers: 80
Configs scanned: 342

Baseline score: 6.2341 (+-1.2045)

TOP CONFIGURATIONS:
------------------------------------------------------------
  #1: i= 45, j= 52 (block= 7 layers) -> score=6.8912 (delta=+0.6571)
  #2: i= 44, j= 52 (block= 8 layers) -> score=6.8734 (delta=+0.6393)
  ...

JSON Results

Full results exported to results.json for programmatic analysis.

Built-in Probes

| Probe  | What it measures                             | Samples  | Best for                   |
|--------|----------------------------------------------|----------|----------------------------|
| math   | Arithmetic, geometry, calculus, probability  | 16       | Reasoning-focused models   |
| eq     | Social cues, sarcasm, psychology             | 12       | Chat/assistant models      |
| json   | JSON extraction, escaping, schema compliance | 10       | IFEval / tool-use models   |
| custom | User-defined from JSON file                  | Variable | Domain-specific evaluation |

Backends

| Feature    | Transformers        | ExLlamaV2                        |
|------------|---------------------|----------------------------------|
| GPU memory | Full model in VRAM  | Quantized (EXL2/GPTQ)            |
| Best for   | Small-medium models | 70B+ on consumer GPUs            |
| Multi-GPU  |                     | --gpu-split                      |
| Precision  | fp16/bf16/fp32      | Quantized                        |
| Install    | Included            | pip install layer-scan[exllamav2] |

Custom Probes

Create a JSON file:

{
  "name": "my_task",
  "description": "What this probe measures",
  "scoring": "digits",
  "samples": [
    {
      "prompt": "Rate from 0-9 how well...\nAnswer: ",
      "expected_score": 7.0,
      "metadata": {"category": "test"}
    }
  ]
}
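A small loader for this format might look as follows (a sketch that checks only the fields shown in the example above; the tool itself may validate more):

```python
import json

REQUIRED_SAMPLE_KEYS = {"prompt", "expected_score"}

def load_probe(path):
    """Load and sanity-check a custom probe JSON file."""
    with open(path) as f:
        probe = json.load(f)
    for key in ("name", "scoring", "samples"):
        if key not in probe:
            raise ValueError(f"probe missing required key: {key!r}")
    for n, sample in enumerate(probe["samples"]):
        missing = REQUIRED_SAMPLE_KEYS - sample.keys()
        if missing:
            raise ValueError(f"sample {n} missing keys: {sorted(missing)}")
    return probe
```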

Architecture

layer_scan/
├── cli.py              # Typer CLI (scan, multi-probe, annotate, lookup, probes, version)
├── scanner.py          # Core scan engine
├── scoring.py          # Logit distribution scoring with coverage diagnostics
├── heatmap.py          # Plotly visualization
├── export.py           # mergekit YAML export
├── config.py           # Configuration dataclasses
├── multi_probe.py      # Multi-probe Pareto analysis
├── annotate.py         # Cross-tool annotation with neuro-scan
├── lookup.py           # Pre-computed results from HF Hub
├── probes/
│   ├── base.py         # Probe ABC
│   ├── math_probe.py   # Math reasoning
│   ├── eq_probe.py     # Emotional intelligence
│   ├── json_probe.py   # JSON compliance
│   └── custom.py       # JSON file loader
└── backends/
    ├── base.py         # Backend ABC
    ├── transformers_backend.py  # HuggingFace (reference)
    └── exllamav2.py    # ExLlamaV2 (optimized)

Roadmap

  • v0.1.0: Core scanning, heatmaps, mergekit export
  • v0.2.0: Multi-probe Pareto analysis, cross-tool annotation, scoring diagnostics
  • v0.3.0: Pre-computed heatmap database (lookup command), vLLM backend
  • v1.0.0: Custom probe DSL, Web UI, API server


Contributing

See CONTRIBUTING.md for development setup, testing, and PR guidelines.

Attribution & AI Policy

Copyright Notice: This project is Copyright (c) 2026 XXO47OXX and licensed under the MIT License.

Original Design & Concept Protection (First Published: 2026-03-11)

This project represents the original implementation of the following design innovations:

Core Architecture Decisions:

  • Logit distribution scoring for layer duplication evaluation — deterministic scoring without text generation or sampling
  • Full (i,j) configuration space scanning — automated search across all valid layer duplication configs
  • Task-specific probe system — different probes discover different optimal configurations (math vs json heatmaps are completely different)
  • mergekit passthrough YAML one-click export — scan results directly usable with mergekit

Design rationale documented since: 2026-03-11 (Initial release)

If you build derivative works inspired by these architectural decisions, please acknowledge the original source in your project's README:

**Inspired by:** [layer-scan](https://github.com/XXO47OXX/layer-scan)

For Forks and Derivatives

If you fork or significantly adapt this codebase, please:

  • Retain the copyright notice in all source files
  • Include the NOTICE file in your distribution
  • Credit the original repository in your README

For AI Model Training & Web Scraping

This codebase's use for AI/LLM training is governed by the llms.txt standard. We request that:

  • Models trained on this code retain the copyright attribution
  • Training pipelines respect the opt-out signals in llms.txt
  • Verbatim code reproduction includes a reference to the original repository

Provenance Identifier: LS-XXO47OXX-a3f7c9e1-2026

See llms.txt and NOTICE for the complete policy.

License

MIT

Download files

Download the file for your platform.

Source Distribution

layer_scan-0.2.1.tar.gz (118.8 kB)

Uploaded Source

Built Distribution


layer_scan-0.2.1-py3-none-any.whl (48.8 kB)

Uploaded Python 3

File details

Details for the file layer_scan-0.2.1.tar.gz.

File metadata

  • Download URL: layer_scan-0.2.1.tar.gz
  • Upload date:
  • Size: 118.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.13

File hashes

Hashes for layer_scan-0.2.1.tar.gz
| Algorithm   | Hash digest                                                      |
|-------------|------------------------------------------------------------------|
| SHA256      | 83862e3c59c82fcc1aee7ca626926b84f45bdfee1cd6b286d00ddd7fa7d32872 |
| MD5         | 525f1bf6faf5a7ed8a5cd98266b5e81a                                 |
| BLAKE2b-256 | 4b8fc67e7b70f3114c14bb2cfd9db553ee7008d83fd593d6d24805bcedead1df |


File details

Details for the file layer_scan-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: layer_scan-0.2.1-py3-none-any.whl
  • Upload date:
  • Size: 48.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.13

File hashes

Hashes for layer_scan-0.2.1-py3-none-any.whl
| Algorithm   | Hash digest                                                      |
|-------------|------------------------------------------------------------------|
| SHA256      | 258af727d340667c9dc641401d2738bb3f6882114ac1d4ab4e49755afcac35ed |
| MD5         | 362edafb973843d7bb8b6b9c3a41bea5                                 |
| BLAKE2b-256 | 59c0900066a26f41aa54a3670f1219a6b5170c09d719618055604cee3b96c46e |

