
Publication-quality LLM architecture diagrams, fact sheets, and diffs from HuggingFace config.json — no weights needed

llmviz

Publication-quality LLM architecture figures from config.json alone — no weights, no GPU, no transformers install.

Python 3.11+ · License: Apache-2.0

Point it at any model — a Hugging Face id, a local GGUF file, or a model installed in Ollama — and get a hand-drawn-quality architecture figure in the visual language of Sebastian Raschka's LLM Architecture Gallery: the decoder tower with dotted-leader callouts, the MoE router inset, the SwiGLU module, and parameter counts computed from the config, not scraped from a model card.

[Figure: DeepSeek-V3 architecture]

Why

Hand-drawn galleries are wonderful and update episodically. llmviz generates the same figure in milliseconds, for any model, the day its config lands on the Hub — including architectures that didn't exist when this tool was written. The parser is generic-first (field-name synonyms, capability detection, graceful degradation), so hybrid Mamba mixers, linear attention, MLA, sandwich norms, and whatever ships next all render correctly or degrade honestly.

The numbers are the point: total and active parameters, KV-cache bytes per token, and VRAM footprints are reconstructed from per-layer math. The test suite pins them against published figures — Llama-3-8B at 8.03B, DeepSeek-V3 at 671B/37.5B, Qwen3-235B-A22B at 235B/22B — within 0.5–3%. When Kimi-Linear-48B-A3B (a model the code was never tuned for) parses to 48.9B total / 3.3B active, you know the math is doing the work.
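To make "per-layer math" concrete, here is a minimal sketch of how a dense decoder's parameter count and KV-cache size could be reconstructed from config fields. It is not llmviz's own code: the `DenseConfig` class and both function names are hypothetical, the layout assumed is a Llama-style block (GQA attention without biases, a three-matrix SwiGLU MLP, two RMSNorms per layer), and the Llama-3-8B values are the ones I understand its published config to contain.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DenseConfig:
    """Hypothetical container for the config.json fields used below."""
    hidden_size: int
    intermediate_size: int
    num_hidden_layers: int
    num_attention_heads: int
    num_key_value_heads: int
    vocab_size: int
    tie_word_embeddings: bool = False


def dense_param_count(cfg: DenseConfig) -> int:
    """Total parameters of a Llama-style dense decoder (assumed layout)."""
    head_dim = cfg.hidden_size // cfg.num_attention_heads
    kv_dim = cfg.num_key_value_heads * head_dim
    # Q and O projections are hidden x hidden; K and V are hidden x kv_dim.
    attn = 2 * cfg.hidden_size * cfg.hidden_size + 2 * cfg.hidden_size * kv_dim
    # SwiGLU MLP: gate, up and down projections.
    mlp = 3 * cfg.hidden_size * cfg.intermediate_size
    # Two RMSNorm weight vectors per layer.
    norms = 2 * cfg.hidden_size
    per_layer = attn + mlp + norms
    embed = cfg.vocab_size * cfg.hidden_size
    lm_head = 0 if cfg.tie_word_embeddings else embed
    final_norm = cfg.hidden_size
    return cfg.num_hidden_layers * per_layer + embed + lm_head + final_norm


def kv_cache_bytes_per_token(cfg: DenseConfig, bytes_per_value: int = 2) -> int:
    """Bytes of K and V stored per token across all layers (fp16 by default)."""
    head_dim = cfg.hidden_size // cfg.num_attention_heads
    return (2 * cfg.num_hidden_layers * cfg.num_key_value_heads
            * head_dim * bytes_per_value)


# Assumed Llama-3-8B config values.
LLAMA3_8B = DenseConfig(
    hidden_size=4096,
    intermediate_size=14336,
    num_hidden_layers=32,
    num_attention_heads=32,
    num_key_value_heads=8,
    vocab_size=128256,
)

if __name__ == "__main__":
    print(dense_param_count(LLAMA3_8B))
    print(kv_cache_bytes_per_token(LLAMA3_8B))
```

With these inputs the sketch gives 8,030,261,248 parameters, consistent with the 8.03B figure quoted above, and 131,072 bytes of KV cache per token at fp16.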

Install

pip install llmviz                # SVG figures
pip install "llmviz[png]"         # + PNG export (cairosvg)
pip install "llmviz[explain]"     # + LLM-written notes (LiteLLM: Ollama, llama.cpp, any provider)
pip install "llmviz[mcp]"         # + MCP server for agents

Sixty seconds

llmviz render deepseek-ai/DeepSeek-V3            # the figure above → DeepSeek-V3.svg
llmviz render ollama:deepseek-r1                 # a model installed in YOUR Ollama
llmviz inspect ./model-q4.gguf                   # any local or remote .gguf (header-only read)
llmviz fit Qwen/Qwen3-235B-A22B -c 131072        # can I run it? fp16/q8/q4 + your GPU verdict
llmviz diff deepseek-ai/DeepSeek-V3 NousResearch/Meta-Llama-3-8B
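
The `fit` line above answers "can I run it?". As a rough illustration of the arithmetic such a check might involve, the sketch below adds weight bytes at a nominal bit width to KV-cache bytes at a given context length. Everything here is an assumption rather than llmviz's implementation: the function names, the nominal 16/8/4 bits per weight (real quantization formats carry some extra overhead), and the 90% headroom factor.

```python
# Nominal bits per weight; an assumption, real quant formats differ slightly.
BITS_PER_WEIGHT = {"fp16": 16, "q8": 8, "q4": 4}


def memory_needed_bytes(total_params: int, kv_bytes_per_token: int,
                        context_tokens: int, quant: str) -> int:
    """Weights at the chosen precision plus an unquantized KV cache."""
    weight_bytes = total_params * BITS_PER_WEIGHT[quant] // 8
    kv_bytes = kv_bytes_per_token * context_tokens
    return weight_bytes + kv_bytes


def fits(needed_bytes: int, vram_bytes: int, headroom: float = 0.9) -> bool:
    """True if the estimate fits in a (hypothetical) usable share of VRAM."""
    return needed_bytes <= vram_bytes * headroom


if __name__ == "__main__":
    # Illustrative inputs: an 8.03B-parameter dense model with an assumed
    # 131,072 bytes of KV cache per token, at an 8,192-token context.
    params, kv_per_token, context = 8_030_261_248, 131_072, 8_192
    gib = 1024 ** 3
    for quant in BITS_PER_WEIGHT:
        needed = memory_needed_bytes(params, kv_per_token, context, quant)
        print(quant, needed, fits(needed, 24 * gib))
```

The KV-cache term grows linearly with context, which is why the `-c` value matters as much as the quantization level for long-context runs.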

Commands

| Command | What it produces |
| --- | --- |
| render <model> | The architecture figure (SVG/PNG). --animate adds a staggered build-up. |
| diff <a> <b> | Two towers side by side + a comparison table with every difference flagged |
| lineage <m1> <m2> … | Family evolution strip with per-generation "what changed" deltas |
| card <model> | 1200×630 social card: headline stats + the tower, share-ready |
| poster models.yaml | Print-ready grid of towers on one sheet (--cols, --title) |
| gallery models.yaml | Self-contained static HTML gallery with search/sort. --space user/name deploys it to a free HF Space |
| watch | Gallery of the Hub's trending models right now; pair with --space on a cron for a self-updating public gallery |
| inspect <model> | The normalized fact sheet as a terminal table |
| fit <model> | Quantization-aware memory needs (weights + KV cache) and which GPUs fit; detects your local GPU via nvidia-smi |
| explain <model> | Five LLM-written notes on what's architecturally notable; local-first via Ollama |
| mcp | MCP server (stdio): inspect_architecture, memory_to_run, render_architecture_figure, diff_architectures as agent tools |

Every command accepts a Hugging Face id (org/name), a local config.json path, a .gguf file or URL, or ollama:<name>. Gated repos (meta-llama, google) need --token or hf auth login.
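
The four accepted source forms can be told apart from the string alone. The following is a hypothetical sketch of such a dispatch, not the way llmviz necessarily does it; the function name, the returned labels, and the precedence of the checks are all assumptions.

```python
import re

# org/name with the characters commonly seen in Hub ids (an assumption).
HF_ID = re.compile(r"^[\w.\-]+/[\w.\-]+$")


def classify_source(source: str) -> str:
    """Guess which kind of model source a command-line argument names."""
    if source.startswith("ollama:"):
        return "ollama"
    lowered = source.lower()
    if lowered.endswith(".gguf"):
        is_url = lowered.startswith(("http://", "https://"))
        return "gguf-url" if is_url else "gguf-file"
    if lowered.endswith(".json"):
        return "config-json"
    if HF_ID.match(source):
        return "hf-id"
    raise ValueError(f"unrecognized model source: {source!r}")


if __name__ == "__main__":
    for arg in ("deepseek-ai/DeepSeek-V3", "./model-q4.gguf",
                "ollama:deepseek-r1", "./config.json"):
        print(arg, "->", classify_source(arg))
```

Checking the `ollama:` prefix and the file extensions before the `org/name` pattern keeps a relative path such as `models/config.json` from being mistaken for a Hub id.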

[Figure: DeepSeek-V3 vs Llama-3-8B]

What it reads

| Signal | Source fields |
| --- | --- |
| MHA / GQA / MQA | num_key_value_heads vs num_attention_heads, multi_query |
| MLA (DeepSeek-style latent KV) | kv_lora_rank, q_lora_rank, decoupled-RoPE head dims |
| MoE | num_experts / n_routed_experts / num_local_experts, five spellings of top-k, shared experts, leading dense layers |
| Hybrid token mixers | layer_types, linear_attn_config, full_attn_idxs, attn_type_list, mamba_*; summarized as e.g. "20 linear-attention (KDA) : 7 full attention layers" |
| Norm placement | pre (default), post (OLMo-2), sandwich (Gemma); drawn structurally |
| The rest | sliding windows and local:global ratios, QK-norm, RoPE θ / ALiBi / learned, tied embeddings, activation |
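
As an illustration of the first rows of this table, a generic-first reader might classify attention and resolve synonym fields roughly as follows. This is a sketch under assumptions, not the actual parser: the helper names are hypothetical, only the three expert-count spellings listed above are covered, and treating any non-zero `kv_lora_rank` as MLA is a simplification.

```python
# The three expert-count spellings named in the table above.
EXPERT_COUNT_KEYS = ("num_experts", "n_routed_experts", "num_local_experts")


def first_present(config: dict, keys, default=None):
    """Return the value of the first key that is set, else the default."""
    for key in keys:
        if config.get(key) is not None:
            return config[key]
    return default


def attention_kind(config: dict) -> str:
    """Classify attention as MLA, MHA, MQA or GQA from config fields."""
    if config.get("kv_lora_rank"):
        return "MLA"
    heads = config["num_attention_heads"]
    kv_heads = config.get("num_key_value_heads")
    if kv_heads is None:
        # Older configs flag multi-query attention with a boolean instead.
        kv_heads = 1 if config.get("multi_query") else heads
    if kv_heads == heads:
        return "MHA"
    if kv_heads == 1:
        return "MQA"
    return "GQA"


if __name__ == "__main__":
    print(attention_kind({"num_attention_heads": 32, "num_key_value_heads": 8}))
    print(first_present({"n_routed_experts": 256}, EXPERT_COUNT_KEYS))
```

Falling back to a default when no synonym is present is one way to get the "degrade honestly" behavior described earlier: an unknown field leaves a gap in the fact sheet rather than a wrong number.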

GGUF sources are read header-only (a few MB, never the weights), including vocab size recovered from the tokenizer array length — so llmviz render ollama:qwen2.5-coder:14b diagrams a 9 GB model in under a second. For remote GGUF URLs only the metadata bytes are fetched via ranged HTTP.
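
Header-only reading works because a GGUF file starts with a small fixed header followed by the metadata key/value section, with tensor data coming afterwards. The sketch below parses just that fixed header from the first bytes of a file. It assumes GGUF version 2 or later (little-endian, 64-bit counts); the function name is hypothetical and this is not llmviz's reader.

```python
import struct

GGUF_MAGIC = b"GGUF"
# magic (4 bytes), version (uint32), tensor_count (uint64),
# metadata_kv_count (uint64); little-endian, no padding.
HEADER = struct.Struct("<4sIQQ")


def parse_gguf_header(blob: bytes) -> dict:
    """Decode the fixed-size GGUF header from the first bytes of a file."""
    if len(blob) < HEADER.size:
        raise ValueError(f"need at least {HEADER.size} bytes, got {len(blob)}")
    magic, version, tensor_count, kv_count = HEADER.unpack_from(blob)
    if magic != GGUF_MAGIC:
        raise ValueError(f"not a GGUF file (magic={magic!r})")
    return {
        "version": version,
        "tensor_count": tensor_count,
        "metadata_kv_count": kv_count,
    }


if __name__ == "__main__":
    # Synthetic header for demonstration; a real caller would read the
    # first bytes of a .gguf file instead.
    demo = HEADER.pack(GGUF_MAGIC, 3, 291, 24)
    print(parse_gguf_header(demo))
```

For a remote file, the same bytes can be requested with an HTTP `Range` header, so only the start of the file crosses the network.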

Counting convention: "active" means every parameter touched in a forward pass, including embeddings and the LM head — some vendors report actives excluding the unembedding, so their number may read slightly lower.
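
Under this convention, the gap between total and active parameters in an MoE model comes entirely from the experts that are not routed to. A minimal sketch, assuming a Llama-style block with a linear router and no shared experts: the function name is hypothetical, and the example values are Mixtral-8x7B-style numbers that are not taken from this README.

```python
def moe_param_counts(hidden: int, intermediate: int, layers: int, heads: int,
                     kv_heads: int, vocab: int, experts: int, top_k: int):
    """Return (total, active) parameters for a simple MoE decoder."""
    head_dim = hidden // heads
    # Q and O are hidden x hidden; K and V are hidden x (kv_heads * head_dim).
    attn = 2 * hidden * hidden + 2 * hidden * kv_heads * head_dim
    expert = 3 * hidden * intermediate      # one SwiGLU expert
    router = hidden * experts               # linear gate over experts
    norms = 2 * hidden                      # two RMSNorms per layer
    always_on = attn + router + norms
    # Input embedding, untied LM head and final norm all count as active.
    outside_layers = 2 * vocab * hidden + hidden
    total = layers * (always_on + experts * expert) + outside_layers
    active = layers * (always_on + top_k * expert) + outside_layers
    return total, active


if __name__ == "__main__":
    # Assumed Mixtral-8x7B-style values: 8 experts, top-2 routing.
    total, active = moe_param_counts(
        hidden=4096, intermediate=14336, layers=32, heads=32,
        kv_heads=8, vocab=32000, experts=8, top_k=2,
    )
    print(total, active)
```

With these inputs the sketch gives 46,702,792,704 total and 12,879,925,248 active parameters. Dropping the LM head term from `outside_layers` in the active count would reproduce the lower vendor-style figure mentioned above.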

explain providers

explain is local-first through LiteLLM:

llmviz explain zai-org/GLM-4.5-Air                                    # local Ollama (auto-picks an installed model)
llmviz explain <m> --llm openai/local --api-base http://localhost:8080/v1   # llama.cpp server
llmviz explain <m> --llm groq/llama-3.3-70b-versatile                 # any hosted LiteLLM provider
export LLMVIZ_LLM="ollama/deepseek-r1:latest"                         # set your default

Reasoning models (DeepSeek-R1, Qwen3) are handled — thinking is stripped, the answer is kept.
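
One common way to do this, shown here as a sketch rather than as llmviz's actual logic, is to drop anything between `<think>` and `</think>` tags before using the reply. The tag format is an assumption about how these models delimit their reasoning, and `strip_thinking` is a hypothetical name.

```python
import re

# Non-greedy match so multiple thinking blocks are removed independently.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def strip_thinking(text: str) -> str:
    """Remove <think>...</think> spans and surrounding whitespace."""
    return THINK_BLOCK.sub("", text).strip()


if __name__ == "__main__":
    reply = "<think>\nCompare head counts first.\n</think>\n\nUses GQA with 8 KV heads."
    print(strip_thinking(reply))
```

Text without any thinking block passes through unchanged apart from trimmed whitespace.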

MCP

Give any agent architecture facts computed from configs instead of recalled from training data:

{"mcpServers": {"llmviz": {"command": "llmviz", "args": ["mcp"]}}}

Python API

from llmviz.fetch import load_spec
from llmviz.render.block import render_model

spec = load_spec("Qwen/Qwen3-235B-A22B")      # or "ollama:deepseek-r1", "./model.gguf"
print(spec.total_params, spec.active_params, spec.attention.kind, spec.hybrid_note)
svg = render_model(spec)

ArchSpec is a Pydantic model — spec.model_dump_json() gives you the normalized architecture for your own tooling.

Development

git clone https://github.com/h9-tec/llmviz && cd llmviz
python -m venv .venv && .venv/bin/pip install -e ".[dev,png]"
.venv/bin/pytest                              # offline; fixtures are real Hub configs
.venv/bin/ruff check src tests

Tests treat published parameter counts as ground truth — if the per-layer math doesn't reproduce a model's documented size, the parser is wrong. CI runs on 3.11/3.12; a nightly workflow rebuilds the trending gallery and deploys it to a Hugging Face Space; tagging v* publishes to PyPI via trusted publishing.
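
The "published counts as ground truth" idea boils down to a relative-tolerance assertion. A minimal sketch of such a check follows; the helper name and the sample computed value are illustrative assumptions, not code or fixtures from the test suite.

```python
def within_tolerance(computed: float, published: float, rel_tol: float) -> bool:
    """True if computed is within rel_tol (a fraction) of the published value."""
    return abs(computed - published) <= rel_tol * published


def test_llama3_8b_total_params():
    # Hypothetical computed value checked against the published 8.03B,
    # using the tight end (0.5%) of the tolerance range.
    computed = 8_030_261_248
    assert within_tolerance(computed, 8.03e9, rel_tol=0.005)


if __name__ == "__main__":
    test_llama3_8b_total_params()
    print("ok")
```

Measuring the error relative to the published value, rather than the computed one, keeps the check anchored to the documented size that the tests treat as authoritative.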

Acknowledgements

The visual language is a faithful implementation of Sebastian Raschka's LLM Architecture Gallery figures (sebastianraschka.com/llm-architecture-gallery) — colors were sampled from his published figures with admiration. If you want the hand-crafted originals with his commentary, go read Ahead of AI.

License

Apache-2.0
