layer-scan
Automated LLM layer duplication config scanner — find the optimal (i,j) for any model + task
Given any open-source LLM and an evaluation probe, layer-scan finds the layer duplication configuration (i, j) that maximizes the probe score — without modifying a single weight.
Why layer-scan?
| | Without layer-scan | With layer-scan |
|---|---|---|
| Process | Manually test 3,000+ (i,j) configs | One command |
| Time | Days of GPU time | Hours (automated) |
| Output | Spreadsheet of scores | Interactive heatmap + mergekit YAML |
| Reproducibility | Ad-hoc scripts | Deterministic logit scoring |
The RYS authors manually scanned 3,241 configurations over several days. layer-scan automates this entire process.
Interactive Heatmap
Score delta heatmap for Qwen2-1.5B with math probe. Green = improvement over baseline, red = regression. Gold stars mark the top-5 configurations.
Installation
```bash
# pipx (recommended, isolated environment)
pipx install layer-scan

# pip
pip install layer-scan

# For ExLlamaV2 backend (recommended for 70B+ models on consumer GPUs):
pip install layer-scan[exllamav2]
```
Quick Start
```bash
# Scan with math reasoning probe
layer-scan scan --model Qwen/Qwen2-7B --probe math

# Scan and export mergekit config in one step
layer-scan scan --model Qwen/Qwen2-7B --probe math --export-mergekit config.yaml

# Then merge with mergekit
mergekit-yaml config.yaml ./merged-model
```
More examples
```bash
# JSON compliance probe (detects IFEval regressions)
layer-scan scan --model Qwen/Qwen2-7B --probe json

# EQ probe (emotional intelligence)
layer-scan scan --model Qwen/Qwen2-7B --probe eq

# ExLlamaV2 for large quantized models
layer-scan scan \
    --model /models/qwen2-72b-exl2 \
    --probe math \
    --backend exllamav2 \
    --gpu-split "22000,22000"

# Custom probe from JSON file
layer-scan scan --model <path> --probe custom --custom-probe my_probe.json

# Sparse scan first, then refine (faster for large models)
layer-scan scan --model <path> --sparse-first --sparse-step 4
```
How It Works
Logit Distribution Scoring
Unlike traditional evaluation (generate text -> parse -> score), layer-scan scores directly from the logit probability distribution:
```
Restrict to digit tokens [0-9]
  -> Softmax over restricted set
  -> Expected score = sum(value x probability)
  -> Uncertainty    = sum((value - expected)^2 x probability)
```
This is:
- Deterministic (no sampling variance)
- Fast (no autoregressive generation)
- Information-rich (uses full distribution, not just argmax)
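A minimal sketch of this scoring rule, assuming a HuggingFace transformers model/tokenizer pair (the helper name `digit_score` is illustrative; the actual implementation lives in `scoring.py` and may differ):

```python
import torch

def digit_score(model, tokenizer, prompt: str) -> tuple[float, float]:
    """Expected score and uncertainty from the next-token digit distribution."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # next-token logits

    # Restrict to the ten digit tokens and renormalize.
    digit_ids = [tokenizer.encode(str(d), add_special_tokens=False)[0]
                 for d in range(10)]
    probs = torch.softmax(logits[digit_ids], dim=-1)

    values = torch.arange(10, dtype=probs.dtype, device=probs.device)
    expected = (values * probs).sum()                       # sum(value x p)
    uncertainty = ((values - expected) ** 2 * probs).sum()  # sum((v - E)^2 x p)
    return expected.item(), uncertainty.item()
```

One forward pass per sample, no generation loop: the same configuration always produces the same score.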
Layer Duplication
For configuration (i=45, j=52) on an 80-layer model:
```
Standard:    [0, 1, ..., 79]               -> 80 layers
Duplicated:  [0, 1, ..., 51, 45, ..., 79]  -> 87 layers
                             ^^^^^^^^^
                             these 7 layers execute twice
```
The hidden states pass through the duplicated block (the model's "reasoning cortex") twice, adding effective depth without introducing any new weights.
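The bookkeeping reduces to an index list; a sketch of the idea (not mergekit's internals):

```python
def duplicated_order(n_layers: int, i: int, j: int) -> list[int]:
    """Execution order for config (i, j): layers i..j-1 run twice."""
    return list(range(j)) + list(range(i, n_layers))

order = duplicated_order(80, i=45, j=52)
assert len(order) == 87  # 80 original passes + 7 duplicated passes
assert order[45:59] == list(range(45, 52)) * 2  # the block runs back-to-back
```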
CLI Reference
scan command
| Option | Type | Default | Description |
|---|---|---|---|
| `--model`, `-m` | string | required | Model path or HuggingFace ID |
| `--probe`, `-p` | string | `math` | Probe name: `math`, `eq`, `json`, `custom` |
| `--backend`, `-b` | string | `transformers` | Backend: `transformers`, `exllamav2` |
| `--min-block` | int | 7 | Minimum duplicated block size |
| `--step`, `-s` | int | 1 | Step size for scanning i and j |
| `--skip-early` | int | 0 | Skip N early layers |
| `--skip-late` | int | 0 | Skip N late layers |
| `--batch-size` | int | 16 | Samples per evaluation |
| `--top-k`, `-k` | int | 5 | Number of top configs to report |
| `--output`, `-o` | string | `./results` | Output directory |
| `--sparse-first` | flag | off | Do sparse scan first, then refine |
| `--sparse-step` | int | 4 | Step size for sparse scanning |
| `--custom-probe` | string | — | Path to custom probe JSON file |
| `--dtype` | string | `float16` | Model dtype: `float16`, `bfloat16`, `float32` |
| `--gpu-split` | string | — | GPU memory split in MB, e.g. `"22000,22000"` |
| `--export-mergekit` | string | — | Export top config as mergekit YAML to path |
| `--verbose`, `-v` | flag | off | Verbose logging |
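For example, a coarser scan that avoids the outermost layers and reports ten candidates (flags as documented above; model and output path are illustrative):

```bash
layer-scan scan \
    --model Qwen/Qwen2-7B \
    --probe math \
    --step 2 \
    --skip-early 8 \
    --skip-late 4 \
    --top-k 10 \
    --output ./qwen2-7b-math
```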
Output
Interactive Heatmap (HTML)
The heatmap shows score delta vs. baseline for each (i, j) configuration. Green = improvement, red = regression. Gold stars mark top-k configs.
mergekit Integration
```bash
# Scan and export in one command
layer-scan scan --model Qwen/Qwen2-72B --probe math --export-mergekit config.yaml

# The generated YAML is ready for mergekit
mergekit-yaml config.yaml ./merged-model --copy-tokenizer
```
Generated config.yaml:

```yaml
merge_method: passthrough
slices:
  - sources:
      - model: Qwen/Qwen2-72B
        layer_range: [0, 52]
  - sources:
      - model: Qwen/Qwen2-72B
        layer_range: [45, 80]
```

The two passthrough slices concatenate layers 0-51 with layers 45-79: the 87-layer duplicated stack from the example above.
Text Summary
```
============================================================
LAYER-SCAN RESULTS
============================================================
Model:           Qwen2-72B-EXL2
Probe:           math
Total layers:    80
Configs scanned: 342
Baseline score:  6.2341 (±1.2045)

TOP CONFIGURATIONS:
------------------------------------------------------------
#1: i= 45, j= 52 (block= 7 layers) -> score=6.8912 (delta=+0.6571)
#2: i= 44, j= 52 (block= 8 layers) -> score=6.8734 (delta=+0.6393)
...
```
JSON Results
Full results are exported to results.json for programmatic analysis.
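A sketch of such analysis; the schema shown here (a top-level `configs` list of `{i, j, score, delta}` records) is an assumption, so inspect the actual file before relying on it:

```python
import json

with open("results/results.json") as f:
    results = json.load(f)

# Hypothetical schema: one record per scanned (i, j) configuration.
configs = sorted(results["configs"], key=lambda c: c["delta"], reverse=True)
for c in configs[:5]:
    print(f"i={c['i']:3d}  j={c['j']:3d}  block={c['j'] - c['i']:2d}  "
          f"delta={c['delta']:+.4f}")
```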
Built-in Probes
| Probe | What it measures | Samples | Best for |
|---|---|---|---|
| `math` | Arithmetic, geometry, calculus, probability | 16 | Reasoning-focused models |
| `eq` | Social cues, sarcasm, psychology | 12 | Chat/assistant models |
| `json` | JSON extraction, escaping, schema compliance | 10 | IFEval / tool-use models |
| `custom` | User-defined from JSON file | Variable | Domain-specific evaluation |
Backends
| Feature | Transformers | ExLlamaV2 |
|---|---|---|
| GPU Memory | Full model in VRAM | Quantized (EXL2/GPTQ) |
| Best for | Small-medium models | 70B+ on consumer GPUs |
| Multi-GPU | — | --gpu-split |
| Precision | fp16/bf16/fp32 | Quantized |
| Install | Included | pip install layer-scan[exllamav2] |
Custom Probes
Create a JSON file:
```json
{
  "name": "my_task",
  "description": "What this probe measures",
  "scoring": "digits",
  "samples": [
    {
      "prompt": "Rate from 0-9 how well...\nAnswer: ",
      "expected_score": 7.0,
      "metadata": {"category": "test"}
    }
  ]
}
```
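With `"scoring": "digits"`, each sample's prompt is scored by the digit-distribution rule from How It Works. A sketch of evaluating such a file (illustrative helper names, reusing the `digit_score` sketch from above):

```python
import json

def evaluate_probe(path: str, score_fn) -> float:
    """Average expected digit score across a custom probe's samples."""
    with open(path) as f:
        probe = json.load(f)
    assert probe["scoring"] == "digits", "only digit scoring is sketched here"
    scores = [score_fn(sample["prompt"])[0] for sample in probe["samples"]]
    return sum(scores) / len(scores)
```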
Architecture
```
layer_scan/
├── cli.py                       # Typer CLI
├── scanner.py                   # Core scan engine
├── scoring.py                   # Logit distribution scoring
├── heatmap.py                   # Plotly visualization
├── export.py                    # mergekit YAML export
├── config.py                    # Configuration dataclasses
├── probes/
│   ├── base.py                  # Probe ABC
│   ├── math_probe.py            # Math reasoning
│   ├── eq_probe.py              # Emotional intelligence
│   ├── json_probe.py            # JSON compliance
│   └── custom.py                # JSON file loader
└── backends/
    ├── base.py                  # Backend ABC
    ├── transformers_backend.py  # HuggingFace (reference)
    └── exllamav2.py             # ExLlamaV2 (optimized)
```
Roadmap
- v0.2.0: Multi-probe cross-analysis, vLLM backend, pre-computed heatmap database
- v0.3.0: Sparse sampling acceleration, HuggingFace Hub integration
- v1.0.0: Custom probe DSL, Web UI, API server
References
- Repeat Yourself: Layer Duplication for LLMs — Original RYS research
- SOLAR 10.7B: Depth Up-Scaling — DUS technique
- MergeKit — Model merging toolkit (6.9k stars)
- ExLlamaV2 — Optimized inference
Contributing
See CONTRIBUTING.md for development setup, testing, and PR guidelines.
Attribution & AI Policy
Copyright Notice: This project is Copyright (c) 2026 XXO47OXX and licensed under the MIT License.
Original Design & Concept Protection (First Published: 2026-03-11)
This project represents the original implementation of the following design innovations:
Core Architecture Decisions:
- Logit distribution scoring for layer duplication evaluation — deterministic scoring without text generation or sampling
- Full (i,j) configuration space scanning — automated search across all valid layer duplication configs
- Task-specific probe system — different probes discover different optimal configurations (math vs json heatmaps are completely different)
- mergekit passthrough YAML one-click export — scan results directly usable with mergekit
Design rationale documented since: 2026-03-11 (Initial release)
If you build derivative works inspired by these architectural decisions, please acknowledge the original source in your project's README:
**Inspired by:** [layer-scan](https://github.com/XXO47OXX/layer-scan)
For Forks and Derivatives
If you fork or significantly adapt this codebase, please:
- Retain the copyright notice in all source files
- Include the NOTICE file in your distribution
- Credit the original repository in your README
For AI Model Training & Web Scraping
This codebase's use for AI/LLM training is governed by the llms.txt standard. We request that:
- Models trained on this code retain the copyright attribution
- Training pipelines respect the opt-out signals in llms.txt
- Verbatim code reproduction includes a reference to the original repository
Provenance Identifier: LS-XXO47OXX-a3f7c9e1-2026
See llms.txt and NOTICE for the complete policy.
License
MIT. See the copyright notice in the Attribution & AI Policy section above.