Publication-quality LLM architecture diagrams, fact sheets, and diffs from HuggingFace config.json — no weights needed
llmviz
Publication-quality LLM architecture figures from config.json alone — no weights, no GPU, no transformers install.
Point it at any model — a Hugging Face id, a local GGUF file, or a model installed in Ollama — and get a hand-drawn-quality architecture figure in the visual language of Sebastian Raschka's LLM Architecture Gallery: the decoder tower with dotted-leader callouts, the MoE router inset, the SwiGLU module, and parameter counts computed from the config, not scraped from a model card.
Why
Hand-drawn galleries are wonderful and update episodically. llmviz generates the same figure in milliseconds, for any model, the day its config lands on the Hub — including architectures that didn't exist when this tool was written. The parser is generic-first (field-name synonyms, capability detection, graceful degradation), so hybrid Mamba mixers, linear attention, MLA, sandwich norms, and whatever ships next all render correctly or degrade honestly.
The numbers are the point: total and active parameters, KV-cache bytes per token, and VRAM footprints are reconstructed from per-layer math. The test suite pins them against published figures — Llama-3-8B at 8.03B, DeepSeek-V3 at 671B/37.5B, Qwen3-235B-A22B at 235B/22B — within 0.5–3%. When Kimi-Linear-48B-A3B (a model the code was never tuned for) parses to 48.9B total / 3.3B active, you know the math is doing the work.
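To make "reconstructed from per-layer math" concrete, here is a minimal sketch of that arithmetic for a dense, bias-free, Llama-style decoder. It is not llmviz's own code: the function name and argument names are hypothetical, and the config values are the ones published in Llama-3-8B's config.json.

```python
def dense_decoder_params(vocab, hidden, layers, heads, kv_heads, intermediate,
                         tied_embeddings=False):
    """Count parameters of a bias-free, Llama-style dense decoder (illustrative only)."""
    head_dim = hidden // heads
    # Attention: Q and O are hidden x hidden; K and V shrink under grouped-query attention.
    attn = 2 * hidden * hidden + 2 * hidden * (kv_heads * head_dim)
    # SwiGLU MLP: gate, up, and down projections.
    mlp = 3 * hidden * intermediate
    # Two RMSNorm weight vectors per layer.
    norms = 2 * hidden
    per_layer = attn + mlp + norms
    embed = vocab * hidden
    lm_head = 0 if tied_embeddings else vocab * hidden
    final_norm = hidden
    return layers * per_layer + embed + lm_head + final_norm


llama3_8b = dense_decoder_params(
    vocab=128256, hidden=4096, layers=32, heads=32, kv_heads=8, intermediate=14336
)
print(f"{llama3_8b:,} parameters = {llama3_8b / 1e9:.2f}B")
# 8,030,261,248 parameters = 8.03B
```

MoE, MLA, and hybrid models need extra terms (routers, experts, latent projections), which this sketch deliberately leaves out.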
Install
pip install llmviz # SVG figures
pip install "llmviz[png]" # + PNG export (cairosvg)
pip install "llmviz[explain]" # + LLM-written notes (LiteLLM: Ollama, llama.cpp, any provider)
pip install "llmviz[mcp]" # + MCP server for agents
Sixty seconds
llmviz render deepseek-ai/DeepSeek-V3 # writes DeepSeek-V3.svg
llmviz render ollama:deepseek-r1 # a model installed in YOUR Ollama
llmviz inspect ./model-q4.gguf # any local or remote .gguf (header-only read)
llmviz fit Qwen/Qwen3-235B-A22B -c 131072 # can I run it? fp16/q8/q4 + your GPU verdict
llmviz diff deepseek-ai/DeepSeek-V3 NousResearch/Meta-Llama-3-8B
Commands
| Command | What it produces |
|---|---|
| render <model> | The architecture figure (SVG/PNG). --animate adds a staggered build-up. |
| diff <a> <b> | Two towers side by side + a comparison table with every difference flagged |
| lineage <m1> <m2> … | Family evolution strip with per-generation "what changed" deltas |
| card <model> | 1200×630 social card: headline stats + the tower, share-ready |
| poster models.yaml | Print-ready grid of towers on one sheet (--cols, --title) |
| gallery models.yaml | Self-contained static HTML gallery with search/sort. --space user/name deploys it to a free HF Space |
| watch | Gallery of the Hub's trending models right now; pair with --space on a cron for a self-updating public gallery |
| inspect <model> | The normalized fact sheet as a terminal table |
| fit <model> | Quantization-aware memory needs (weights + KV cache) and which GPUs fit; detects your local GPU via nvidia-smi |
| explain <model> | Five LLM-written notes on what's architecturally notable, local-first via Ollama |
| mcp | MCP server (stdio): inspect_architecture, memory_to_run, render_architecture_figure, diff_architectures as agent tools |
Every command accepts a Hugging Face id (org/name), a local config.json path, a .gguf file or URL, or ollama:<name>. Gated repos (meta-llama, google) need --token or hf auth login.
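The "weights + KV cache" arithmetic behind fit can be sketched as follows. This is an assumption-laden illustration, not the tool's implementation: the function names are hypothetical, quantization is modeled as a flat bits-per-weight figure (real q8/q4 formats carry some extra overhead), and the KV formula covers standard MHA/GQA attention only, not MLA or sliding windows.

```python
def kv_cache_bytes_per_token(layers, kv_heads, head_dim, bytes_per_value=2):
    # One key vector and one value vector per KV head, per layer.
    return 2 * layers * kv_heads * head_dim * bytes_per_value


def weight_bytes(total_params, bits_per_weight):
    # Flat bits-per-weight model; ignores quantization block overhead.
    return total_params * bits_per_weight / 8


def memory_needed_gib(total_params, bits_per_weight, layers, kv_heads, head_dim,
                      context_tokens):
    kv = kv_cache_bytes_per_token(layers, kv_heads, head_dim) * context_tokens
    return (weight_bytes(total_params, bits_per_weight) + kv) / 2**30


# Llama-3-8B shape: 32 layers, 8 KV heads, head_dim 128, fp16 cache.
per_token = kv_cache_bytes_per_token(layers=32, kv_heads=8, head_dim=128)
print(per_token)  # 131072 bytes, i.e. 128 KiB per token

for label, bits in (("fp16", 16), ("q8", 8), ("q4", 4)):
    gib = memory_needed_gib(8_030_261_248, bits, 32, 8, 128, context_tokens=8192)
    print(f"{label}: {gib:.1f} GiB")
```

At an 8192-token context the fp16 KV cache for this shape is exactly 1 GiB, so long contexts can dominate the footprint of a heavily quantized model.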
What it reads
| Signal | Source fields |
|---|---|
| MHA / GQA / MQA | num_key_value_heads vs num_attention_heads, multi_query |
| MLA (DeepSeek-style latent KV) | kv_lora_rank, q_lora_rank, decoupled-RoPE head dims |
| MoE | num_experts / n_routed_experts / num_local_experts, five spellings of top-k, shared experts, leading dense layers |
| Hybrid token mixers | layer_types, linear_attn_config, full_attn_idxs, attn_type_list, mamba_* — summarized as e.g. "20 linear-attention (KDA) : 7 full attention layers" |
| Norm placement | pre (default), post (OLMo-2), sandwich (Gemma) — drawn structurally |
| The rest | sliding windows and local:global ratios, QK-norm, RoPE θ / ALiBi / learned, tied embeddings, activation |
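
The first rows of the table can be read as a small set of rules over config keys. The sketch below shows one plausible way to express them; the helper names and the rule ordering are assumptions made for illustration, and only the config field names come from the table.

```python
def first_present(config, names, default=None):
    """Return the value of the first key in `names` that the config defines."""
    for name in names:
        if config.get(name) is not None:
            return config[name]
    return default


# Synonyms for the routed-expert count, as listed in the table.
EXPERT_COUNT_KEYS = ("num_experts", "n_routed_experts", "num_local_experts")


def attention_kind(config):
    """Classify attention as MLA, MQA, GQA, or MHA from config fields alone."""
    heads = config["num_attention_heads"]
    if config.get("kv_lora_rank"):
        return "MLA"  # DeepSeek-style latent KV
    if config.get("multi_query"):
        return "MQA"
    kv_heads = config.get("num_key_value_heads") or heads
    if kv_heads == 1 and heads > 1:
        return "MQA"
    return "MHA" if kv_heads == heads else "GQA"


print(attention_kind({"num_attention_heads": 32, "num_key_value_heads": 8}))  # GQA
print(first_present({"n_routed_experts": 256}, EXPERT_COUNT_KEYS))            # 256
```

A missing field falls back to a default instead of raising, which is one way to get the graceful degradation described earlier.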
GGUF sources are read header-only (a few MB, never the weights), including vocab size recovered from the tokenizer array length — so llmviz render ollama:qwen2.5-coder:14b diagrams a 9 GB model in under a second. For remote GGUF URLs only the metadata bytes are fetched via ranged HTTP.
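A header-only read is possible because GGUF puts its metadata ahead of the tensor data. As a rough sketch, assuming the version 2 and 3 layout (little-endian magic, version, tensor count, and metadata key-value count), the fixed preamble can be parsed like this; read_gguf_header is a hypothetical name, and the sketch stops before the key-value pairs that hold the architecture fields.

```python
import struct

GGUF_MAGIC = b"GGUF"
PREAMBLE = "<4sIQQ"  # magic, uint32 version, uint64 tensor count, uint64 KV count


def read_gguf_header(data: bytes) -> dict:
    """Parse the fixed 24-byte GGUF preamble (assumes versions 2 and 3)."""
    size = struct.calcsize(PREAMBLE)
    if len(data) < size:
        raise ValueError(f"need at least {size} bytes")
    magic, version, tensor_count, kv_count = struct.unpack_from(PREAMBLE, data, 0)
    if magic != GGUF_MAGIC:
        raise ValueError("not a GGUF file")
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }


# Synthetic preamble for demonstration; a real file would be opened in binary
# mode and only its first bytes read.
header = struct.pack(PREAMBLE, GGUF_MAGIC, 3, 291, 24)
print(read_gguf_header(header))
# {'version': 3, 'tensor_count': 291, 'metadata_kv_count': 24}
```

For a remote file the same bytes could be requested with an HTTP Range header, so the weights never cross the network.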
Counting convention: "active" means every parameter touched in a forward pass, including embeddings and the LM head — some vendors report actives excluding the unembedding, so their number may read slightly lower.
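The convention is easiest to see with toy numbers. The sketch below is illustrative only: the function name, the split into "always-on" and per-expert parameters, and every size in it are made up for the example.

```python
def moe_param_counts(always_on_params, expert_params, n_experts, top_k, moe_layers):
    """Return (total, active), counting everything touched in a forward pass.

    always_on_params: parameters every token passes through (embeddings, LM head,
        attention, norms, routers, shared experts, dense layers).
    expert_params: parameters of ONE routed expert in ONE layer.
    """
    total = always_on_params + moe_layers * n_experts * expert_params
    active = always_on_params + moe_layers * top_k * expert_params
    return total, active


# Toy model: hidden 64, expert intermediate 128, SwiGLU experts.
expert = 3 * 64 * 128            # 24,576 parameters per expert
lm_head = 1000 * 64              # vocab 1000, untied
total, active = moe_param_counts(1_000_000, expert, n_experts=8, top_k=2, moe_layers=4)

print(total)             # 1786432
print(active)            # 1196608
print(active - lm_head)  # 1132608, what a vendor excluding the unembedding would report
```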
explain providers
explain is local-first through LiteLLM:
llmviz explain zai-org/GLM-4.5-Air # local Ollama (auto-picks an installed model)
llmviz explain <m> --llm openai/local --api-base http://localhost:8080/v1 # llama.cpp server
llmviz explain <m> --llm groq/llama-3.3-70b-versatile # any hosted LiteLLM provider
export LLMVIZ_LLM="ollama/deepseek-r1:latest" # set your default
Reasoning models (DeepSeek-R1, Qwen3) are handled — thinking is stripped, the answer is kept.
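How the stripping is implemented is not stated here. One plausible approach, assuming the model wraps its reasoning in <think>...</think> tags as DeepSeek-R1 and Qwen3 do, is a small regex pass; strip_thinking is a hypothetical name.

```python
import re

THINK_BLOCK = re.compile(r"<think>.*?</think>", flags=re.DOTALL)


def strip_thinking(text: str) -> str:
    """Drop <think>...</think> spans and return the remaining answer."""
    answer = THINK_BLOCK.sub("", text)
    # Some servers omit the opening tag; keep only what follows a stray closing tag.
    if "</think>" in answer:
        answer = answer.split("</think>", 1)[1]
    return answer.strip()


raw = "<think>\nThe config has kv_lora_rank, so...\n</think>\n\nUses MLA."
print(strip_thinking(raw))  # Uses MLA.
```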
MCP
Give any agent architecture facts computed from configs instead of recalled from training data:
{"mcpServers": {"llmviz": {"command": "llmviz", "args": ["mcp"]}}}
Python API
from llmviz.fetch import load_spec
from llmviz.render.block import render_model
spec = load_spec("Qwen/Qwen3-235B-A22B") # or "ollama:deepseek-r1", "./model.gguf"
spec.total_params, spec.active_params, spec.attention.kind, spec.hybrid_note
svg = render_model(spec)
ArchSpec is a Pydantic model — spec.model_dump_json() gives you the normalized architecture for your own tooling.
Development
git clone https://github.com/h9-tec/llmviz && cd llmviz
python -m venv .venv && .venv/bin/pip install -e ".[dev,png]"
.venv/bin/pytest # offline; fixtures are real Hub configs
.venv/bin/ruff check src tests
Tests treat published parameter counts as ground truth: if the per-layer math doesn't reproduce a model's documented size, the parser is wrong. CI runs on Python 3.11 and 3.12; a nightly workflow rebuilds the trending gallery and deploys it to a Hugging Face Space; tagging v* publishes to PyPI via trusted publishing.
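A test in that spirit might look like the sketch below. It is hypothetical, not copied from the repository: the helper name and tolerance are assumptions, and the computed value is Llama-3-8B's exact parameter count hard-coded in place of a call into the parser.

```python
def within(computed: float, published: float, rel_tol: float) -> bool:
    """True when the computed count is within rel_tol of the published one."""
    return abs(computed - published) <= rel_tol * published


def test_llama3_8b_matches_published_size():
    computed = 8_030_261_248  # stand-in for the parser's per-layer result
    published = 8.03e9        # the figure quoted for Llama-3-8B
    assert within(computed, published, rel_tol=0.005)


test_llama3_8b_matches_published_size()
```

A relative tolerance is the natural choice because published sizes are rounded to a few significant figures.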
Acknowledgements
The visual language is a faithful implementation of Sebastian Raschka's LLM Architecture Gallery figures (sebastianraschka.com/llm-architecture-gallery) — colors were sampled from his published figures with admiration. If you want the hand-crafted originals with his commentary, go read Ahead of AI.
License
Apache-2.0