Compare LLM architectures without downloading weights — structural fingerprint & proxy-test advisor
modelsig
Compare LLM architectures without downloading weights.
modelsig extracts a multi-layer structural fingerprint from any HuggingFace model and tells you whether two models are architecturally equivalent — so the smaller one can act as a valid proxy for testing the larger one.
What problem does it solve?
Testing inference engines (vLLM, TensorRT-LLM, SGLang, llama.cpp, ONNX Runtime, etc.) against every large model is prohibitively expensive. modelsig answers:
- "Can I test Qwen3-72B correctness using Qwen3-7B instead?"
- "Is Nemotron-120B-FP4 architecturally equivalent to the BF16 variant?"
- "Does this ONNX export match the original safetensors model?"
It compares structural fingerprints — shape ratios, operator sets, KV cache patterns, layer topology — without ever downloading a single weight tensor.
Key Features
- Zero weight download: safetensors header via HTTP Range (~20 bytes), ONNX graph-only (no .onnx_data), or config-only fast mode
- 5-layer fingerprint: static weights, arch config, op types, KV cache pattern, optional hook shapes
- 3-phase isomorphism comparison: key overlap, substructure, algebraic scaling
- Substitution verdicts: FULL_SUBSTITUTE / PARTIAL_SUBSTITUTE / NO_SUBSTITUTE
- 4-level multi-fidelity test plan: maps models to test coverage levels L1–L4
- Wide model support: dense decoder, GQA, MoE, vision-language, speech, ONNX classification
- Both HF and local models: supports local:/path/to/model
- JSON / table / markdown output: CI-friendly JSON, human-readable table, shareable markdown
Installation
From PyPI (recommended)
uv add modelsig # add to a uv project
# or
uv tool install modelsig # install as a standalone CLI tool
From source
git clone https://github.com/joe0731/modelsig
cd modelsig
uv sync # install deps + editable package
uv run modelsig --help # run inside the uv environment
Full (all parsers enabled)
uv sync --extra full
# Adds: onnx, transformers, torch, safetensors
Still using pip?
pip install modelsig
pip install "modelsig[full]" # all optional parsers
Dependency breakdown:
| Package | Required | Purpose |
|---|---|---|
| requests | ✅ | HTTP Range fetching for safetensors headers |
| huggingface_hub | ✅ | Model file listing, downloads, auth |
| onnx | optional | ONNX graph parsing (falls back to built-in protobuf) |
| transformers | optional | AutoConfig normalization, FX trace, hook capture |
| torch | optional | FX symbolic trace and forward-hook shape capture |
| safetensors | optional | Local safetensors file parsing |
Quick Start
# Analyze a single model
modelsig Qwen/Qwen3-7B --output table
# Compare two models (proxy-test decision)
modelsig Qwen/Qwen3-7B Qwen/Qwen3-72B --compare --output table
# Fast mode for large models (config only, no download)
modelsig nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16 --fast --output table
# ONNX model
modelsig onnx-community/Qwen3.5-0.8B-ONNX --output json
# Private/gated model
modelsig org/private-model --token hf_xxx
# or: export HF_TOKEN=hf_xxx
How It Works
Zero-Weight-Download
For safetensors models, only the file header is fetched via HTTP Range requests (~20 bytes per shard). No weights are transferred.
For ONNX models, only the .onnx graph file is downloaded (typically 1–5 MB). The paired .onnx_data weight file (which can be GBs) is never touched.
For fast mode (--fast), only config.json is fetched (a few KB). No tensors at all.
5-Layer Signature System
| Layer | What it captures | Source |
|---|---|---|
| L1 Static weight signature | Per-tensor {abstract_key → shape, dtype, layer_type}; layer indices normalized to .N. | safetensors header / ONNX initializers |
| L2 Architecture fingerprint | hidden_size, num_hidden_layers, num_attention_heads, num_key_value_heads, intermediate_size, head_dim, MoE config | config.json via AutoConfig |
| L3 Op type set | Canonical operator vocabulary: aten/mm, attention, rms_norm, rope, silu, topk/router … | tensor key patterns / ONNX opset |
| L4 KV cache shape pattern | [batch, num_kv_heads, seq_len, head_dim] | derived from L2 |
| L5 Hook shapes (optional) | Per-module I/O shapes from a forward pass on meta device | torch forward hooks |
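The L1 row mentions that layer indices are normalized to .N. so that repeated layers collapse into one abstract key. A minimal sketch of that idea follows; the function names normalize_key and abstract_keys are hypothetical, and the regex is an assumption about what counts as a layer index (any purely numeric path segment), not the package's actual rule.

```python
import re

# A numeric path segment between two dots, e.g. the "12" in "layers.12.self_attn".
_INDEX_SEGMENT = re.compile(r"(?<=\.)\d+(?=\.)")


def normalize_key(key: str) -> str:
    """Replace every numeric path segment with the placeholder N."""
    return _INDEX_SEGMENT.sub("N", key)


def abstract_keys(tensor_names):
    """Collapse concrete tensor names into a sorted list of abstract keys."""
    return sorted({normalize_key(name) for name in tensor_names})


names = [
    "model.embed_tokens.weight",
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.1.self_attn.q_proj.weight",
    "model.layers.27.self_attn.q_proj.weight",
]
keys = abstract_keys(names)
```

Because the abstract key set no longer depends on depth, a 28-layer and an 80-layer model from the same family can produce the same keys.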
3-Phase Isomorphism Comparison
Phase 1 — Key coverage : normalized key set overlap ≥ 80%
Phase 2 — Substructure : attention / FFN / norm submodules match
Phase 3 — Algebraic scale : hidden_size / intermediate_size / head_dim ratios uniform within 20%
Result: ISOMORPHIC / SCALE_ONLY / DIFFERENT_ARCH
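Phases 1 and 3 can be sketched under two stated assumptions, neither confirmed by the text above: that "overlap" means Jaccard similarity of the normalized key sets, and that "uniform within 20%" means the per-dimension scale factors (large / small) differ from each other by at most 20%. All function names are hypothetical.

```python
DIMS = ("hidden_size", "intermediate_size", "head_dim")


def key_overlap(keys_a: set, keys_b: set) -> float:
    """Jaccard overlap of two normalized key sets (assumed metric)."""
    union = keys_a | keys_b
    if not union:
        return 1.0
    return len(keys_a & keys_b) / len(union)


def scale_ratios(small: dict, large: dict, dims=DIMS) -> dict:
    """Per-dimension scale factor from the small model to the large one."""
    return {d: large[d] / small[d] for d in dims}


def ratios_uniform(ratios: dict, tol: float = 0.20) -> bool:
    """True if the largest and smallest scale factor differ by at most `tol`."""
    lo, hi = min(ratios.values()), max(ratios.values())
    return (hi - lo) / lo <= tol


# Toy configurations, not taken from any real model.
small = {"hidden_size": 1024, "intermediate_size": 4096, "head_dim": 64}
large = {"hidden_size": 2048, "intermediate_size": 8192, "head_dim": 128}
skewed = {"hidden_size": 2048, "intermediate_size": 8192, "head_dim": 64}

phase1_ok = key_overlap({"a", "b", "c", "d"}, {"a", "b", "c", "d", "e"}) >= 0.80
phase3_ok = ratios_uniform(scale_ratios(small, large))
phase3_skewed = ratios_uniform(scale_ratios(small, skewed))
```

Under this reading, a model pair that doubles hidden_size while keeping head_dim fixed would fail Phase 3, so the actual tool may well define uniformity differently.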
Substitution Verdict
| Verdict | Meaning |
|---|---|
| FULL_SUBSTITUTE | All 3 phases pass + shape ratios uniform + layer_type_coverage ≥ 95% |
| PARTIAL_SUBSTITUTE | Phase 1+2 pass or op coverage ≥ 80% |
| NO_SUBSTITUTE | Different arch, MoE vs Dense mismatch, or key divergence |
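The verdict table can be read as a small decision function. The sketch below is one possible encoding, not the package's implementation: the function name and parameters are hypothetical, coverage values are assumed to be fractions in [0, 1], and the precedence (MoE/dense mismatch checked first, then FULL, then PARTIAL) is an assumption, since the table does not say how overlapping conditions are resolved.

```python
def substitution_verdict(
    phase1: bool,
    phase2: bool,
    phase3: bool,
    ratios_uniform: bool,
    layer_type_coverage: float,
    op_coverage: float,
    moe_mismatch: bool = False,
) -> str:
    """Map phase results and coverage numbers to a verdict string."""
    # Assumed precedence: an MoE vs dense mismatch rules out substitution outright.
    if moe_mismatch:
        return "NO_SUBSTITUTE"
    if phase1 and phase2 and phase3 and ratios_uniform and layer_type_coverage >= 0.95:
        return "FULL_SUBSTITUTE"
    if (phase1 and phase2) or op_coverage >= 0.80:
        return "PARTIAL_SUBSTITUTE"
    return "NO_SUBSTITUTE"


full = substitution_verdict(True, True, True, True, 0.97, 0.90)
partial = substitution_verdict(True, True, False, False, 0.90, 0.50)
none = substitution_verdict(False, False, False, False, 0.10, 0.30)
```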
Multi-Fidelity Test Plan (4 levels)
L1 Structure — cheapest: model loading, tensor shapes, dtype validation
L2 Numerical — cosine similarity, perplexity on calibration set
L3 Runtime — prefill latency, decode throughput, KV cache eviction
L4 Canary — large/MoE model: peak memory, TP/PP correctness
Usage
Basic — analyze a single model
modelsig Qwen/Qwen3-7B --output table
==============================================================================
modelsig v2.0 | 2026-03-17T10:00:00Z
==============================================================================
Model: Qwen/Qwen3-7B
type qwen3
hidden_size 3584
num_hidden_layers 28
num_attention_heads 28 (kv: 8)
intermediate_size 18944
head_dim 128
is_moe False
ffn_expansion 5.285714
gqa_ratio 3.5
kv_cache_pattern [batch, 8, seq_len, 128]
op_types aten/mm, attention, embedding, rms_norm, rope, silu, swiglu
layer_types AttentionLayer, EmbeddingLayer, FFN_SwiGLU, LMHead, RMSNorm
abstract_keys 14
source safetensors
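The derived fields in the sample output follow from the config values by simple arithmetic. As a quick check, assuming ffn_expansion is intermediate_size / hidden_size and gqa_ratio is num_attention_heads / num_key_value_heads (the field names come from the output above; the function derived_fields is hypothetical):

```python
def derived_fields(cfg: dict) -> dict:
    """Compute the ratio fields and KV cache pattern from raw config values."""
    return {
        "ffn_expansion": round(cfg["intermediate_size"] / cfg["hidden_size"], 6),
        "gqa_ratio": cfg["num_attention_heads"] / cfg["num_key_value_heads"],
        "kv_cache_pattern": ["batch", cfg["num_key_value_heads"], "seq_len", cfg["head_dim"]],
    }


# Values copied from the sample output above.
cfg = {
    "hidden_size": 3584,
    "intermediate_size": 18944,
    "num_attention_heads": 28,
    "num_key_value_heads": 8,
    "head_dim": 128,
}
fields = derived_fields(cfg)
# fields["ffn_expansion"] == 5.285714 and fields["gqa_ratio"] == 3.5
```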
Compare models (proxy-testing decision)
modelsig Qwen/Qwen3-7B Qwen/Qwen3-72B --compare --output table
Full analysis with multi-fidelity plan
modelsig \
Qwen/Qwen3-7B Qwen/Qwen3-30B-A3B Qwen/Qwen3-235B-A22B \
--compare --multi-fidelity --output markdown --save report.md
ONNX model
modelsig onnx-community/Qwen3-4B-ONNX --output json
Config-only fast mode (no safetensors/ONNX fetch; only config.json is retrieved)
modelsig Qwen/Qwen3-235B-A22B --fast --output table
Local model directory
modelsig local:/path/to/model --output json
modelsig local:/path/to/7b local:/path/to/72b --compare
Private / gated models
modelsig org/private-model --token hf_xxx
# or: export HF_TOKEN=hf_xxx
Save report
modelsig Qwen/Qwen3-7B Qwen/Qwen3-72B \
--compare --output markdown --save report.md
Models with custom code
# Only use --trust-remote-code for models you trust.
# This allows execution of arbitrary Python code from the model repository.
modelsig org/custom-model --trust-remote-code --no-fx-trace
Scenario Examples
Scenario 1 — Inference Engine Regression Testing
Problem: You want to validate a new vLLM kernel for Qwen3-72B but CI is limited to A10G GPUs (24 GB VRAM).
modelsig Qwen/Qwen3-7B Qwen/Qwen3-72B --compare --output table
Expected result: ISOMORPHIC / FULL_SUBSTITUTE — same GQA pattern, same op set, uniform scaling. You can run full functional tests on 7B and gate the 72B behind a nightly canary run.
Scenario 2 — MoE vs Dense Compatibility Check
Problem: Can Qwen3-30B-A3B (MoE) serve as a drop-in proxy for Qwen3-235B-A22B?
modelsig Qwen/Qwen3-30B-A3B Qwen/Qwen3-235B-A22B \
--compare --multi-fidelity --output markdown
Both are MoE models from the same family → ISOMORPHIC. The multi-fidelity plan shows:
- L1: use 30B-A3B for structure/conversion tests
- L2: numerical validation on 30B
- L4: 235B-A22B as canary for routing correctness and peak memory
Scenario 3 — Cross-Family Sanity Check
Problem: Can Llama-3.1-8B proxy-test a Mistral-7B?
modelsig meta-llama/Llama-3.1-8B-Instruct mistralai/Mistral-7B-v0.1 \
--compare --output json
Both are dense GQA decoders with the same op set → ISOMORPHIC / FULL_SUBSTITUTE. Despite different model_type labels, the structural fingerprint matches.
Scenario 4 — ONNX Runtime Compatibility
Problem: You converted GPT-2 to ONNX and want to verify the ONNX version matches the torch version structurally.
modelsig openai-community/gpt2 onnx-community/gpt2 --compare --output table
The ONNX version is parsed from the .onnx graph file. The safetensors version is parsed from the header. Both share the same abstract key set → ISOMORPHIC.
Scenario 5 — Quantized Model Compatibility
Problem: Will nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 (quantized to FP4) behave the same as the BF16 variant?
modelsig \
nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16 \
nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 \
--compare --fast --output table
Both share the same architecture (120B MoE). --fast uses config-only mode to avoid downloading large safetensors headers. Result: ISOMORPHIC — same layer topology, only dtype differs.
Scenario 6 — Batch Analysis with QuantPathSignature
Problem: Prepare a quantization validation plan for a fleet of Qwen models.
modelsig \
Qwen/Qwen3-0.6B Qwen/Qwen3-1.7B Qwen/Qwen3-4B Qwen/Qwen3-7B \
--compare --quant-path --output json --save qwen3_fleet.json
The quant_path_signature block for each model documents arch_template (gqa_decoder), kv_cache_dtype, group_size, scale_scheme — feeding directly into a quantization config generator.
CLI Reference
modelsig MODEL_ID [MODEL_ID ...] [OPTIONS]
Arguments:
MODEL_ID HF model ID (e.g. Qwen/Qwen3-7B) or local:PATH
Options:
--output json | table | markdown (default: json)
--compare Compute pairwise coverage for all model pairs
--save FILE Save output to file
--fast Config-only mode — no safetensors/ONNX download
--quant-path Include QuantPathSignature block
--multi-fidelity Include 4-level multi-fidelity test plan
--no-fx-trace Disable FX symbolic trace (on by default)
--no-hook-capture Disable forward-hook capture (on by default)
--token TOKEN HF Hub token for private/gated models
--timeout SEC HTTP timeout (default: 30)
--no-color Disable ANSI colors in table output
--trust-remote-code Allow trust_remote_code=True for custom model code
⚠ enables arbitrary code execution — use only for trusted models
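The --compare option is described as computing coverage for all model pairs. Assuming that means unordered pairs (an assumption; the tool might also compare in both directions), the number of comparisons grows quadratically with the number of model IDs. The helper model_pairs below is hypothetical and only enumerates the pairs.

```python
from itertools import combinations


def model_pairs(model_ids):
    """Return every unordered pair of model IDs, in input order."""
    return list(combinations(model_ids, 2))


fleet = ["Qwen/Qwen3-0.6B", "Qwen/Qwen3-1.7B", "Qwen/Qwen3-4B", "Qwen/Qwen3-7B"]
pairs = model_pairs(fleet)
# Four models yield six unordered pairs.
```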
Module Structure
modelsig/
├── analyze.py CLI entry point (~190 lines)
├── constants.py Shared constants: TOOL_NAME, _OP_RULES, _LAYER_TYPE_RULES, …
│
├── hf/
│ └── client.py HF Hub client: token management, HTTP GET + backoff,
│ model_info().siblings, hf_hub_download
│
├── parsers/
│ ├── safetensors.py HTTP Range header fetch + local shard discovery
│ └── config.py AutoConfig.from_pretrained() + _flatten_config() aliases
│
├── onnx/
│ ├── ops.py _ONNX_DTYPE map, _ONNX_OP_MAP, canonical op mapping
│ ├── parser.py onnx.load(load_external_data=False) + protobuf fallback
│ ├── selector.py Primary .onnx file selection heuristics
│ └── collector.py Orchestrates HF download → parse pipeline
│
├── torch/
│ ├── fx_trace.py FX symbolic trace on meta device (lazy torch import)
│ └── hooks.py Forward-hook I/O shape capture (lazy torch import)
│
├── signature/
│ ├── static.py L1: build_static_weight_signature, norm_key, norm_dtype
│ ├── arch.py L2: build_arch_fingerprint, KV cache pattern, dim ratios
│ ├── quant.py QuantPathSignature builder
│ ├── template.py Per-layer canonical submodule template (for phase-2)
│ └── fingerprint.py ModelFingerprint dataclass + build_fingerprint orchestrator
│
├── comparison/
│ ├── phases.py Phase 1/2/3 isomorphism tests
│ ├── ratios.py Shape ratio uniformity analysis
│ ├── coverage.py Unified compute_coverage + test strategy verdict
│ └── multifidelity.py 4-level multi-fidelity test plan builder
│
└── output/
├── colors.py ANSI color helpers
├── json_fmt.py JSON formatter + fp_to_dict
├── table_fmt.py ANSI table formatter
└── markdown_fmt.py Markdown report formatter
Security
- No arbitrary code execution by default. trust_remote_code is False unless explicitly set via --trust-remote-code.
- Token safety. The HF token is passed via HTTP headers only, never embedded in URLs or logged to stderr.
- No weight download. Only metadata (safetensors header, ONNX graph, config.json) is fetched.
Design Principles
| Principle | Implementation |
|---|---|
| Zero weight download | HTTP Range (safetensors), graph-only .onnx, config-only fast path |
| Framework-driven parsing | AutoConfig.from_pretrained() for config normalization; onnx.load() for graph parsing |
| Graceful degradation | Every heavy dependency is optional — falls back to built-in parsers |
| Architecture-agnostic | Works on dense decoders, GQA models, MoE, vision-language, speech, classification |
| Single CLI, composable API | Import any module independently or use the unified CLI |
| Safe by default | trust_remote_code=False; token in headers not URLs |
Supported Model Families
Validated weekly against 57 models (29 safetensors + 28 ONNX):
Safetensors (full header fetch): Qwen3.5-{0.8B,4B,9B,27B,35B-A3B,397B-A17B}, Qwen2.5-7B-Instruct, Qwen3-Coder-Next, DeepSeek-V3.2, Kimi-K2.5, MiniMax-M2.5, GLM-5, Nemotron-3-{Nano-4B, Super-120B}-{BF16,NVFP4,FP8}, Granite-4.0-1b-speech, BitNet-b1.58-2B-4T, MiroThinker-{1.7,1.7-mini}, Sarvam-{30b,105b}, Reka-edge-2603, LocoTrainer-4B, OmniCoder-9B, Nanbeige4.1-3B, Param2-17B-A2.4B, gpt-oss-20b, all-MiniLM-L6-v2
ONNX (graph-only, no weight download): Qwen3.5-{0.8B,2B,4B}-ONNX, Qwen3-{4B-VL,VL-2B,Reranker-0.6B}-ONNX, Qwen2.5-{0.5B,VL-3B}-ONNX, LFM2-24B-A2B, Olmo-Hybrid-{SFT,DPO,Think}-7B, Voxtral-Mini-4B, Granite-4.0-1b-speech, Nemotron-Nano-4B, BERT-multilingual-NER, chinese-RoBERTa, multilingual-MiniLMv2, CodeT5, Jan-code-4b, Josiefied-Qwen3.5-0.8B, IndoBERT-news-classification, ai-image-detection × 4, vehicle-classification, tmr-text-detector
Contributing
All logic lives in the modelsig/ package, and each subdirectory has a single responsibility. Tests live in tests/ and cover 130+ unit and integration scenarios.
git clone https://github.com/joe0731/modelsig
cd modelsig
uv sync # installs modelsig + dev deps
uv run pytest tests/ -v
Weekly validation against the full model zoo runs via GitHub Actions (.github/workflows/weekly-validation.yml).
Related Projects
- huggingface_hub — HF Hub Python client
- safetensors — safe, zero-copy tensor serialization
- vLLM — high-throughput LLM inference
- ONNX Runtime — cross-platform inference accelerator
License
Apache 2.0 — see LICENSE.