
Automated LLM layer duplication configuration scanner with heatmap visualization

Project description

layer-scan

Automated LLM layer duplication config scanner — find the optimal (i,j) for any model + task



Given any open-source LLM and an evaluation probe, layer-scan finds the optimal layer duplication configuration (i, j) that maximizes model capability — without modifying a single weight.

Why layer-scan?

| | Without layer-scan | With layer-scan |
|---|---|---|
| Process | Manually test 3,000+ (i,j) configs | One command |
| Time | Days of GPU time | Hours (automated) |
| Output | Spreadsheet of scores | Interactive heatmap + mergekit YAML |
| Reproducibility | Ad-hoc scripts | Deterministic logit scoring |

The RYS authors manually scanned 3,241 configurations over several days. layer-scan automates this entire process.

Interactive Heatmap

*Score delta heatmap for Qwen2-1.5B with the math probe. Green = improvement over baseline, red = regression; gold stars mark the top-5 configurations.*

Installation

```bash
# pipx (recommended, isolated environment)
pipx install layer-scan

# pip
pip install layer-scan

# For the ExLlamaV2 backend (recommended for 70B+ models on consumer GPUs):
pip install layer-scan[exllamav2]
```

Quick Start

```bash
# Scan with the math reasoning probe
layer-scan scan --model Qwen/Qwen2-7B --probe math

# Scan and export a mergekit config in one step
layer-scan scan --model Qwen/Qwen2-7B --probe math --export-mergekit config.yaml

# Then merge with mergekit
mergekit-yaml config.yaml ./merged-model
```

More examples

```bash
# JSON compliance probe (detects IFEval regressions)
layer-scan scan --model Qwen/Qwen2-7B --probe json

# EQ probe (emotional intelligence)
layer-scan scan --model Qwen/Qwen2-7B --probe eq

# ExLlamaV2 for large quantized models
layer-scan scan \
  --model /models/qwen2-72b-exl2 \
  --probe math \
  --backend exllamav2 \
  --gpu-split "22000,22000"

# Custom probe from a JSON file
layer-scan scan --model <path> --probe custom --custom-probe my_probe.json

# Sparse scan first, then refine (faster for large models)
layer-scan scan --model <path> --sparse-first --sparse-step 4
```
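
The sparse-then-refine strategy can be sketched as follows. This is a hypothetical reconstruction, not layer-scan's actual code: the helper names `scan_points` and `refine_window`, the grid rule, and the refinement radius are all assumptions.

```python
def scan_points(n_layers, min_block=7, step=1):
    """All valid (i, j) configs: duplicate layers [i, j) with
    block size j - i >= min_block, sampled on a grid of `step`."""
    return [(i, j)
            for i in range(n_layers)
            for j in range(i + min_block, n_layers + 1)
            if i % step == 0 and j % step == 0]

def refine_window(best, n_layers, radius=2, min_block=7):
    """Dense (i, j) neighborhood around the best sparse hit."""
    bi, bj = best
    return [(i, j)
            for i in range(max(0, bi - radius), min(n_layers, bi + radius + 1))
            for j in range(max(i + min_block, bj - radius),
                           min(n_layers, bj + radius) + 1)]
```

With `step=4` the coarse pass evaluates roughly a sixteenth of the grid, and only the neighborhood of the winner is rescanned densely.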

How It Works

Logit Distribution Scoring

Unlike traditional evaluation (generate text -> parse -> score), layer-scan scores directly from the logit probability distribution:

```text
Restrict to digit tokens [0-9]
-> Softmax over the restricted set
-> Expected score = sum(value x probability)
-> Uncertainty    = sum((value - expected)^2 x probability)
```

This is:

  • Deterministic (no sampling variance)
  • Fast (no autoregressive generation)
  • Information-rich (uses full distribution, not just argmax)
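
The scoring rule above can be written out directly. This is a minimal reimplementation for illustration, not layer-scan's own code; how the ten digit-token logits are pulled from a backend is omitted.

```python
import math

def digit_expected_score(digit_logits):
    """Score from the logits of the ten digit tokens.

    digit_logits[d] is the raw logit for token "d" (d = 0..9).
    Softmax over this restricted set gives a distribution; return
    its expectation and variance as (score, uncertainty).
    """
    m = max(digit_logits)                       # subtract max for stability
    exps = [math.exp(x - m) for x in digit_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    expected = sum(d * p for d, p in enumerate(probs))
    variance = sum((d - expected) ** 2 * p for d, p in enumerate(probs))
    return expected, variance

score, var = digit_expected_score([0.1] * 5 + [2.0, 3.0, 2.0] + [0.1] * 2)
```

Because the score is a pure function of one forward pass's logits, repeated runs on the same model and probe are bit-for-bit reproducible.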

Layer Duplication

For configuration (i=45, j=52) on an 80-layer model:

```text
Standard:    [0, 1, ..., 79]              -> 80 layers
Duplicated:  [0, 1, ..., 51, 45, ..., 79] -> 87 layers
                           ^^^^^^^
                     these 7 layers execute twice
```

The model processes the same input through its "reasoning cortex" twice: layers 45-51 run a second time, adding seven layers of effective depth without changing any weights.
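
Assuming the half-open convention implied by the diagram (first slice `[0, j)`, then `[i, n_layers)`), the execution order can be built like this. A hypothetical sketch of the bookkeeping, not the tool's internals:

```python
def duplicated_layer_order(n_layers, i, j):
    """Layer execution order for config (i, j): layers [0, j)
    followed by [i, n_layers). Layers i..j-1 run twice."""
    if not (0 <= i < j <= n_layers):
        raise ValueError("need 0 <= i < j <= n_layers")
    return list(range(0, j)) + list(range(i, n_layers))

order = duplicated_layer_order(80, 45, 52)   # 87 entries; 45-51 appear twice
```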

CLI Reference

scan command

| Option | Type | Default | Description |
|---|---|---|---|
| `--model`, `-m` | string | required | Model path or HuggingFace ID |
| `--probe`, `-p` | string | `math` | Probe name: `math`, `eq`, `json`, `custom` |
| `--backend`, `-b` | string | `transformers` | Backend: `transformers`, `exllamav2` |
| `--min-block` | int | 7 | Minimum duplicated block size |
| `--step`, `-s` | int | 1 | Step size for scanning i and j |
| `--skip-early` | int | 0 | Skip N early layers |
| `--skip-late` | int | 0 | Skip N late layers |
| `--batch-size` | int | 16 | Samples per evaluation |
| `--top-k`, `-k` | int | 5 | Number of top configs to report |
| `--output`, `-o` | string | `./results` | Output directory |
| `--sparse-first` | flag | off | Do a sparse scan first, then refine |
| `--sparse-step` | int | 4 | Step size for the sparse scan |
| `--custom-probe` | string | (none) | Path to custom probe JSON file |
| `--dtype` | string | `float16` | Model dtype: `float16`, `bfloat16`, `float32` |
| `--gpu-split` | string | (none) | GPU memory split in MB, e.g. `"22000,22000"` |
| `--export-mergekit` | string | (none) | Export the top config as mergekit YAML to this path |
| `--verbose`, `-v` | flag | off | Verbose logging |

Output

Interactive Heatmap (HTML)

The heatmap shows score delta vs. baseline for each (i, j) configuration. Green = improvement, red = regression. Gold stars mark top-k configs.
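
The coloring and stars boil down to ranking score deltas. A minimal sketch of that step (a hypothetical helper, not the actual heatmap code, which renders with Plotly):

```python
def top_k_configs(scores, baseline, k=5):
    """Rank (i, j) configs by score delta over the baseline.

    `scores` maps (i, j) -> probe score. The deltas are the values
    the heatmap colors (green > 0, red < 0); the k largest get stars.
    """
    deltas = {cfg: s - baseline for cfg, s in scores.items()}
    ranked = sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k]
```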

mergekit Integration

```bash
# Scan and export in one command
layer-scan scan --model Qwen/Qwen2-72B --probe math --export-mergekit config.yaml

# The generated YAML is ready for mergekit
mergekit-yaml config.yaml ./merged-model --copy-tokenizer
```

Generated config.yaml:

```yaml
merge_method: passthrough
slices:
  - sources:
      - model: Qwen/Qwen2-72B
        layer_range: [0, 52]
  - sources:
      - model: Qwen/Qwen2-72B
        layer_range: [45, 80]
```
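
The exported file is simple enough to generate by hand. A sketch that reproduces the format above for any (i, j); the function name is hypothetical:

```python
def mergekit_passthrough_yaml(model, i, j, n_layers):
    """Render a passthrough merge config for duplication config (i, j):
    slice [0, j) followed by slice [i, n_layers)."""
    return (
        "merge_method: passthrough\n"
        "slices:\n"
        "  - sources:\n"
        f"      - model: {model}\n"
        f"        layer_range: [0, {j}]\n"
        "  - sources:\n"
        f"      - model: {model}\n"
        f"        layer_range: [{i}, {n_layers}]\n"
    )

print(mergekit_passthrough_yaml("Qwen/Qwen2-72B", 45, 52, 80))
```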

Text Summary

```text
============================================================
LAYER-SCAN RESULTS
============================================================
Model: Qwen2-72B-EXL2
Probe: math
Total layers: 80
Configs scanned: 342

Baseline score: 6.2341 (+-1.2045)

TOP CONFIGURATIONS:
------------------------------------------------------------
  #1: i= 45, j= 52 (block= 7 layers) -> score=6.8912 (delta=+0.6571)
  #2: i= 44, j= 52 (block= 8 layers) -> score=6.8734 (delta=+0.6393)
  ...
```

JSON Results

Full results exported to results.json for programmatic analysis.

Built-in Probes

| Probe | What it measures | Samples | Best for |
|---|---|---|---|
| `math` | Arithmetic, geometry, calculus, probability | 16 | Reasoning-focused models |
| `eq` | Social cues, sarcasm, psychology | 12 | Chat/assistant models |
| `json` | JSON extraction, escaping, schema compliance | 10 | IFEval / tool-use models |
| `custom` | User-defined from a JSON file | Variable | Domain-specific evaluation |

Backends

| Feature | Transformers | ExLlamaV2 |
|---|---|---|
| GPU memory | Full model in VRAM | Quantized (EXL2/GPTQ) |
| Best for | Small to medium models | 70B+ on consumer GPUs |
| Multi-GPU | | `--gpu-split` |
| Precision | fp16/bf16/fp32 | Quantized |
| Install | Included | `pip install layer-scan[exllamav2]` |

Custom Probes

Create a JSON file:

```json
{
  "name": "my_task",
  "description": "What this probe measures",
  "scoring": "digits",
  "samples": [
    {
      "prompt": "Rate from 0-9 how well...\nAnswer: ",
      "expected_score": 7.0,
      "metadata": {"category": "test"}
    }
  ]
}
```

Architecture

```text
layer_scan/
├── cli.py              # Typer CLI
├── scanner.py          # Core scan engine
├── scoring.py          # Logit distribution scoring
├── heatmap.py          # Plotly visualization
├── export.py           # mergekit YAML export
├── config.py           # Configuration dataclasses
├── probes/
│   ├── base.py         # Probe ABC
│   ├── math_probe.py   # Math reasoning
│   ├── eq_probe.py     # Emotional intelligence
│   ├── json_probe.py   # JSON compliance
│   └── custom.py       # JSON file loader
└── backends/
    ├── base.py         # Backend ABC
    ├── transformers_backend.py  # HuggingFace (reference)
    └── exllamav2.py    # ExLlamaV2 (optimized)
```

Roadmap

  • v0.2.0: Multi-probe cross-analysis, vLLM backend, pre-computed heatmap database
  • v0.3.0: Sparse sampling acceleration, HuggingFace Hub integration
  • v1.0.0: Custom probe DSL, Web UI, API server

Contributing

See CONTRIBUTING.md for development setup, testing, and PR guidelines.

Attribution & AI Policy

Copyright Notice: This project is Copyright (c) 2026 XXO47OXX and licensed under the MIT License.

Original Design & Concept Protection (First Published: 2026-03-11)

This project represents the original implementation of the following design innovations:

Core Architecture Decisions:

  • Logit distribution scoring for layer duplication evaluation — deterministic scoring without text generation or sampling
  • Full (i,j) configuration space scanning — automated search across all valid layer duplication configs
  • Task-specific probe system — different probes discover different optimal configurations (math vs json heatmaps are completely different)
  • mergekit passthrough YAML one-click export — scan results directly usable with mergekit

Design rationale documented since: 2026-03-11 (Initial release)

If you build derivative works inspired by these architectural decisions, please acknowledge the original source in your project's README:

**Inspired by:** [layer-scan](https://github.com/XXO47OXX/layer-scan)

For Forks and Derivatives

If you fork or significantly adapt this codebase, please:

  • Retain the copyright notice in all source files
  • Include the NOTICE file in your distribution
  • Credit the original repository in your README

For AI Model Training & Web Scraping

This codebase's use for AI/LLM training is governed by the llms.txt standard. We request that:

  • Models trained on this code retain the copyright attribution
  • Training pipelines respect the opt-out signals in llms.txt
  • Verbatim code reproduction includes a reference to the original repository

Provenance Identifier: LS-XXO47OXX-a3f7c9e1-2026

See llms.txt and NOTICE for the complete policy.

License

MIT

Project details


Download files

Download the file for your platform.

Source Distribution

layer_scan-0.2.0.tar.gz (114.8 kB)


Built Distribution


layer_scan-0.2.0-py3-none-any.whl (45.6 kB)


File details

Details for the file layer_scan-0.2.0.tar.gz.

File metadata

  • Download URL: layer_scan-0.2.0.tar.gz
  • Upload date:
  • Size: 114.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.13

File hashes

Hashes for layer_scan-0.2.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | `f9a908b2de1714d0aadbc1fb3522374703cfaf4bfe98521f665bd88311342e0f` |
| MD5 | `2cdeb5a2dc86940586cb22dc8cf732e5` |
| BLAKE2b-256 | `2e87f8a74e0b52ebcf2fb18bc17bd0e07584d6aff53871f8152e8b62c28b072b` |


File details

Details for the file layer_scan-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: layer_scan-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 45.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.13

File hashes

Hashes for layer_scan-0.2.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | `c9d8e923018d49be795e3b01dcbe3dbb310d9dd5aecbd8b46a4a94919fa23659` |
| MD5 | `48791ca39f3aaea7dbcdd247c32286fc` |
| BLAKE2b-256 | `cd78ef5bbfe64bf0936aca891b0be87ea37bfb2324e74bc4c851ba78cbf13972` |

