Skip to main content

Compare LLM architectures without downloading weights — structural fingerprint & proxy-test advisor

Project description

modelsig

Compare LLM architectures without downloading weights.

modelsig extracts a multi-layer structural fingerprint from any HuggingFace model and tells you whether two models are architecturally equivalent — so the smaller one can act as a valid proxy for testing the larger one.

Weekly Validation Python 3.9+ License: Apache 2.0


What problem does it solve?

Testing inference engines (vLLM, TensorRT-LLM, SGLang, llama.cpp, ONNX Runtime, etc.) against every large model is prohibitively expensive. modelsig answers:

"Can I test Qwen3-72B correctness using Qwen3-7B instead?" "Is Nemotron-120B-FP4 architecturally equivalent to the BF16 variant?" "Does this ONNX export match the original safetensors model?"

It compares structural fingerprints — shape ratios, operator sets, KV cache patterns, layer topology — without ever downloading a single weight tensor.


Key Features

  • Zero weight download — safetensors header via HTTP Range (~20 bytes), ONNX graph-only (no .onnx_data), or config-only fast mode
  • 5-layer fingerprint — static weights, arch config, op types, KV cache pattern, optional hook shapes
  • 3-phase isomorphism comparison — key overlap, substructure, algebraic scaling
  • Substitution verdictsFULL_SUBSTITUTE / PARTIAL_SUBSTITUTE / NO_SUBSTITUTE
  • 4-level multi-fidelity test plan — maps models to test coverage levels L1–L4
  • Wide model support — dense decoder, GQA, MoE, vision-language, speech, ONNX classification
  • Both HF and local models — supports local:/path/to/model
  • JSON / table / markdown output — CI-friendly JSON, human-readable table, shareable markdown

Installation

From PyPI (recommended)

uv add modelsig           # add to a uv project
# or
uv tool install modelsig  # install as a standalone CLI tool

From source

git clone https://github.com/joe0731/modelsig
cd modelsig
uv sync                   # install deps + editable package
uv run modelsig --help    # run inside the uv environment

Full (all parsers enabled)

uv sync --extra full
# Adds: onnx, transformers, torch, safetensors

Still using pip?

pip install modelsig
pip install "modelsig[full]"   # all optional parsers

Dependency breakdown:

Package Required Purpose
requests HTTP Range fetching for safetensors headers
huggingface_hub Model file listing, downloads, auth
onnx optional ONNX graph parsing (falls back to built-in protobuf)
transformers optional AutoConfig normalization, FX trace, hook capture
torch optional FX symbolic trace and forward-hook shape capture
safetensors optional Local safetensors file parsing

Quick Start

# Analyze a single model
modelsig Qwen/Qwen3-7B --output table

# Compare two models (proxy-test decision)
modelsig Qwen/Qwen3-7B Qwen/Qwen3-72B --compare --output table

# Fast mode for large models (config only, no download)
modelsig nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16 --fast --output table

# ONNX model
modelsig onnx-community/Qwen3.5-0.8B-ONNX --output json

# Private/gated model
modelsig org/private-model --token hf_xxx
# or: export HF_TOKEN=hf_xxx

How It Works

Zero-Weight-Download

For safetensors models, only the file header is fetched via HTTP Range requests (~20 bytes per shard). No weights are transferred.

For ONNX models, only the .onnx graph file is downloaded (typically 1–5 MB). The paired .onnx_data weight file (which can be GBs) is never touched.

For fast mode (--fast), only config.json is fetched (a few KB). No tensors at all.

5-Layer Signature System

Layer What it captures Source
L1 Static weight signature Per-tensor {abstract_key → shape, dtype, layer_type} — layer indices normalized to .N. safetensors header / ONNX initializers
L2 Architecture fingerprint hidden_size, num_hidden_layers, num_attention_heads, num_key_value_heads, intermediate_size, head_dim, MoE config config.json via AutoConfig
L3 Op type set Canonical operator vocabulary: aten/mm, attention, rms_norm, rope, silu, topk/router tensor key patterns / ONNX opset
L4 KV cache shape pattern [batch, num_kv_heads, seq_len, head_dim] derived from L2
L5 Hook shapes (optional) Per-module I/O shapes from a forward pass on meta device torch forward hooks

3-Phase Isomorphism Comparison

Phase 1 — Key coverage    : normalized key set overlap ≥ 80%
Phase 2 — Substructure    : attention / FFN / norm submodules match
Phase 3 — Algebraic scale : hidden_size / intermediate_size / head_dim ratios uniform within 20%

Result: ISOMORPHIC / SCALE_ONLY / DIFFERENT_ARCH

Substitution Verdict

Verdict Meaning
FULL_SUBSTITUTE All 3 phases pass + shape ratios uniform + layer_type_coverage ≥ 95%
PARTIAL_SUBSTITUTE Phase 1+2 pass or op coverage ≥ 80%
NO_SUBSTITUTE Different arch, MoE vs Dense mismatch, or key divergence

Multi-Fidelity Test Plan (4 levels)

L1 Structure    — cheapest: model loading, tensor shapes, dtype validation
L2 Numerical    — cosine similarity, perplexity on calibration set
L3 Runtime      — prefill latency, decode throughput, KV cache eviction
L4 Canary       — large/MoE model: peak memory, TP/PP correctness

Usage

Basic — analyze a single model

modelsig Qwen/Qwen3-7B --output table
==============================================================================
  modelsig v2.0  |  2026-03-17T10:00:00Z
==============================================================================

   Model: Qwen/Qwen3-7B
  type                   qwen3
  hidden_size            3584
  num_hidden_layers      28
  num_attention_heads    28  (kv: 8)
  intermediate_size      18944
  head_dim               128
  is_moe                 False
  ffn_expansion          5.285714
  gqa_ratio              3.5
  kv_cache_pattern       [batch, 8, seq_len, 128]
  op_types               aten/mm, attention, embedding, rms_norm, rope, silu, swiglu
  layer_types            AttentionLayer, EmbeddingLayer, FFN_SwiGLU, LMHead, RMSNorm
  abstract_keys          14
  source                 safetensors

Compare models (proxy-testing decision)

modelsig Qwen/Qwen3-7B Qwen/Qwen3-72B --compare --output table

Full analysis with multi-fidelity plan

modelsig \
    Qwen/Qwen3-7B Qwen/Qwen3-30B-A3B Qwen/Qwen3-235B-A22B \
    --compare --multi-fidelity --output markdown --save report.md

ONNX model

modelsig onnx-community/Qwen3-4B-ONNX --output json

Config-only fast mode (no safetensors/ONNX fetch, instantaneous)

modelsig Qwen/Qwen3-235B-A22B --fast --output table

Local model directory

modelsig local:/path/to/model --output json
modelsig local:/path/to/7b local:/path/to/72b --compare

Private / gated models

modelsig org/private-model --token hf_xxx
# or: export HF_TOKEN=hf_xxx

Save report

modelsig Qwen/Qwen3-7B Qwen/Qwen3-72B \
    --compare --output markdown --save report.md

Models with custom code

# Only use --trust-remote-code for models you trust.
# This allows execution of arbitrary Python code from the model repository.
modelsig org/custom-model --trust-remote-code --no-fx-trace

Scenario Examples

Scenario 1 — Inference Engine Regression Testing

Problem: You want to validate a new vLLM kernel for Qwen3-72B but CI is limited to A10G GPUs (24 GB VRAM).

modelsig Qwen/Qwen3-7B Qwen/Qwen3-72B --compare --output table

Expected result: ISOMORPHIC / FULL_SUBSTITUTE — same GQA pattern, same op set, uniform scaling. You can run full functional tests on 7B and gate the 72B behind a nightly canary run.


Scenario 2 — MoE vs Dense Compatibility Check

Problem: Does Qwen3-30B-A3B (MoE) behave like a drop-in proxy for Qwen3-235B-A22B?

modelsig Qwen/Qwen3-30B-A3B Qwen/Qwen3-235B-A22B \
    --compare --multi-fidelity --output markdown

Both are MoE models from the same family → ISOMORPHIC. The multi-fidelity plan shows:

  • L1: use 30B-A3B for structure/conversion tests
  • L2: numerical validation on 30B
  • L4: 235B-A22B as canary for routing correctness and peak memory

Scenario 3 — Cross-Family Sanity Check

Problem: Can Llama-3.1-8B proxy-test a Mistral-7B?

modelsig meta-llama/Llama-3.1-8B-Instruct mistralai/Mistral-7B-v0.1 \
    --compare --output json

Both are dense GQA decoders with the same op set → ISOMORPHIC / FULL_SUBSTITUTE. Despite different model_type labels, the structural fingerprint matches.


Scenario 4 — ONNX Runtime Compatibility

Problem: You converted GPT-2 to ONNX and want to verify the ONNX version matches the torch version structurally.

modelsig openai-community/gpt2 onnx-community/gpt2 --compare --output table

The ONNX version is parsed from the .onnx graph file. The safetensors version is parsed from the header. Both share the same abstract key set → ISOMORPHIC.


Scenario 5 — Quantized Model Compatibility

Problem: Will nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 (quantized to FP4) behave the same as the BF16 variant?

modelsig \
    nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16 \
    nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 \
    --compare --fast --output table

Both share the same architecture (120B MoE). --fast uses config-only mode to avoid downloading large safetensors headers. Result: ISOMORPHIC — same layer topology, only dtype differs.


Scenario 6 — Batch Analysis with QuantPathSignature

Problem: Prepare a quantization validation plan for a fleet of Qwen models.

modelsig \
    Qwen/Qwen3-0.6B Qwen/Qwen3-1.7B Qwen/Qwen3-4B Qwen/Qwen3-7B \
    --compare --quant-path --output json --save qwen3_fleet.json

The quant_path_signature block for each model documents arch_template (gqa_decoder), kv_cache_dtype, group_size, scale_scheme — feeding directly into a quantization config generator.


CLI Reference

modelsig MODEL_ID [MODEL_ID ...] [OPTIONS]

Arguments:
  MODEL_ID              HF model ID (e.g. Qwen/Qwen3-7B) or local:PATH

Options:
  --output              json | table | markdown  (default: json)
  --compare             Compute pairwise coverage for all model pairs
  --save FILE           Save output to file
  --fast                Config-only mode — no safetensors/ONNX download
  --quant-path          Include QuantPathSignature block
  --multi-fidelity      Include 4-level multi-fidelity test plan
  --no-fx-trace         Disable FX symbolic trace (on by default)
  --no-hook-capture     Disable forward-hook capture (on by default)
  --token TOKEN         HF Hub token for private/gated models
  --timeout SEC         HTTP timeout (default: 30)
  --no-color            Disable ANSI colors in table output
  --trust-remote-code   Allow trust_remote_code=True for custom model code
                        ⚠ enables arbitrary code execution — use only for trusted models

Module Structure

modelsig/
├── analyze.py              CLI entry point (~190 lines)
├── constants.py            Shared constants: TOOL_NAME, _OP_RULES, _LAYER_TYPE_RULES, …
│
├── hf/
│   └── client.py           HF Hub client: token management, HTTP GET + backoff,
│                           model_info().siblings, hf_hub_download
│
├── parsers/
│   ├── safetensors.py      HTTP Range header fetch + local shard discovery
│   └── config.py           AutoConfig.from_pretrained() + _flatten_config() aliases
│
├── onnx/
│   ├── ops.py              _ONNX_DTYPE map, _ONNX_OP_MAP, canonical op mapping
│   ├── parser.py           onnx.load(load_external_data=False) + protobuf fallback
│   ├── selector.py         Primary .onnx file selection heuristics
│   └── collector.py        Orchestrates HF download → parse pipeline
│
├── torch/
│   ├── fx_trace.py         FX symbolic trace on meta device (lazy torch import)
│   └── hooks.py            Forward-hook I/O shape capture (lazy torch import)
│
├── signature/
│   ├── static.py           L1: build_static_weight_signature, norm_key, norm_dtype
│   ├── arch.py             L2: build_arch_fingerprint, KV cache pattern, dim ratios
│   ├── quant.py            QuantPathSignature builder
│   ├── template.py         Per-layer canonical submodule template (for phase-2)
│   └── fingerprint.py      ModelFingerprint dataclass + build_fingerprint orchestrator
│
├── comparison/
│   ├── phases.py           Phase 1/2/3 isomorphism tests
│   ├── ratios.py           Shape ratio uniformity analysis
│   ├── coverage.py         Unified compute_coverage + test strategy verdict
│   └── multifidelity.py    4-level multi-fidelity test plan builder
│
└── output/
    ├── colors.py           ANSI color helpers
    ├── json_fmt.py         JSON formatter + fp_to_dict
    ├── table_fmt.py        ANSI table formatter
    └── markdown_fmt.py     Markdown report formatter

Security

  • No arbitrary code execution by default. trust_remote_code is False unless explicitly set via --trust-remote-code.
  • Token safety. The HF token is passed via HTTP headers only — never embedded in URLs or logged to stderr.
  • No weight download. Only metadata (safetensors header, ONNX graph, config.json) is fetched.

Design Principles

Principle Implementation
Zero weight download HTTP Range (safetensors), graph-only .onnx, config-only fast path
Framework-driven parsing AutoConfig.from_pretrained() for config normalization; onnx.load() for graph parsing
Graceful degradation Every heavy dependency is optional — falls back to built-in parsers
Architecture-agnostic Works on dense decoders, GQA models, MoE, vision-language, speech, classification
Single CLI, composable API Import any module independently or use the unified CLI
Safe by default trust_remote_code=False; token in headers not URLs

Supported Model Families

Validated weekly against 57 models (29 safetensors + 28 ONNX):

Safetensors (full header fetch): Qwen3.5-{0.8B,4B,9B,27B,35B-A3B,397B-A17B}, Qwen2.5-7B-Instruct, Qwen3-Coder-Next, DeepSeek-V3.2, Kimi-K2.5, MiniMax-M2.5, GLM-5, Nemotron-3-{Nano-4B, Super-120B}-{BF16,NVFP4,FP8}, Granite-4.0-1b-speech, BitNet-b1.58-2B-4T, MiroThinker-{1.7,1.7-mini}, Sarvam-{30b,105b}, Reka-edge-2603, LocoTrainer-4B, OmniCoder-9B, Nanbeige4.1-3B, Param2-17B-A2.4B, gpt-oss-20b, all-MiniLM-L6-v2

ONNX (graph-only, no weight download): Qwen3.5-{0.8B,2B,4B}-ONNX, Qwen3-{4B-VL,VL-2B,Reranker-0.6B}-ONNX, Qwen2.5-{0.5B,VL-3B}-ONNX, LFM2-24B-A2B, Olmo-Hybrid-{SFT,DPO,Think}-7B, Voxtral-Mini-4B, Granite-4.0-1b-speech, Nemotron-Nano-4B, BERT-multilingual-NER, chinese-RoBERTa, multilingual-MiniLMv2, CodeT5, Jan-code-4b, Josiefied-Qwen3.5-0.8B, IndoBERT-news-classification, ai-image-detection × 4, vehicle-classification, tmr-text-detector


Contributing

All logic is in the modelsig/ package. Each subdirectory has a single responsibility. Tests live in tests/ and cover 130+ unit + integration scenarios.

git clone https://github.com/joe0731/modelsig
cd modelsig
uv sync                   # installs modelsig + dev deps
uv run pytest tests/ -v

Weekly validation against the full model zoo runs via GitHub Actions (.github/workflows/weekly-validation.yml).


Related Projects


License

Apache 2.0 — see LICENSE.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

modelsig-2.0.0.tar.gz (169.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

modelsig-2.0.0-py3-none-any.whl (39.8 kB view details)

Uploaded Python 3

File details

Details for the file modelsig-2.0.0.tar.gz.

File metadata

  • Download URL: modelsig-2.0.0.tar.gz
  • Upload date:
  • Size: 169.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for modelsig-2.0.0.tar.gz
Algorithm Hash digest
SHA256 fb141c1b677d878afe7329483dce4b8a4bdd07d263c2886a9d605b9ad8d0f4f6
MD5 b91f2ad23ec8a3e0aac8b3db53f9a5e1
BLAKE2b-256 ec6ea30d5ed8c89a3a35d142ea3203b7537813e3781e755d720f6057b258e659

See more details on using hashes here.

File details

Details for the file modelsig-2.0.0-py3-none-any.whl.

File metadata

  • Download URL: modelsig-2.0.0-py3-none-any.whl
  • Upload date:
  • Size: 39.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for modelsig-2.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 43b5f5af61d81fa39a670e23f6df9b2f9074f6e30cd4d97e4185812c8c5d6b21
MD5 953737c0816cbcbf9cca2dd95509f8e0
BLAKE2b-256 743e6cc481be997648155901fa7191a98a60f7e9a65d1409a8350af09015ba5c

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page