⚗ Distillarium

Distill any task into a pocket-sized Spirit. Pure model. No API.
Replace narrow LLM pipeline steps with tiny CPU-runnable Spirits. Distill a teacher API into a deployable 20–50M-param model. Same outputs, $0 at inference.
distillarium is a Python toolkit for distilling a teacher LLM (Gemini, Claude, GPT) into a tiny, task-specific model — a Spirit — that runs on CPU, edge, or browser with zero API dependency at inference.
v0.1.1 — alpha. The reference Spirit (Needle, tool calling) is published; the toolkit itself is single-task at the moment with more recipes coming. Pipeline-replacement claims in the docs are based on Needle's measured numbers (78% tool-name accuracy on held-out cuts). See Status below.
Pick this if: you're doing the same narrow LLM call thousands of times a day (intent, routing, NER, classification, tool calling) and want to stop paying frontier-API prices for it. Don't pick this if: you need open-ended generation, multi-step reasoning, or a generalist assistant — keep the frontier model for that.
Why distillation, not fine-tuning
| | LoRA / fine-tuning | Distillation (Distillarium) |
|---|---|---|
| Final size | 7B+ (same as base) | 5M–50M |
| Inference target | GPU | CPU, edge, browser |
| API dependency at inference | Sometimes (for hosted base) | Zero |
| Cost at inference | $$$ per call | $0 |
| Cold start | Seconds | Milliseconds |
| Best for | Capability extension on a generalist | Single-task production |
Install
```bash
pip install distillarium[gemini]
```
Hello, Spirit
Three commands, one Spirit:
```bash
# 1. Distill a teacher into a Spirit using a recipe
distillery distill recipes/needle.tool-calling-v1.yaml

# 2. Taste the Spirit — held-out eval against the teacher
distillery taste spirits/needle.pt

# 3. Bottle the Spirit — export to ONNX / GGUF for deployment
distillery bottle spirits/needle.pt --format onnx
```
A real example — Needle (tool calling, 20.7M params)
Reproduces the reference Spirit at 78° proof on tool-name accuracy (held-out cut of 100 examples):
```yaml
# recipes/needle.tool-calling-v1.yaml
name: needle.tool-calling
version: 1

teacher:
  provider: gemini
  model: gemini-2.5-flash
  temperature: 0.9

mash:
  total_examples: 1000
  examples_per_call: 10
  tools_per_call: { min: 3, max: 6 }

student:
  arch: attention-only-glu
  d_model: 384
  n_heads: 6
  n_layers: 8
  max_seq_len: 256
  tokenizer: wordpiece-4096

cuts: { train: 0.9, eval: 0.1 }

still:
  epochs: 8
  batch_size: 16
  lr: 3.0e-4

tasting:
  metrics: [tool_name_accuracy, arg_key_f1, exact_call_accuracy]
  held_out: 100
```
Run it:
```bash
distillery distill recipes/needle.tool-calling-v1.yaml --out spirits/
```
Expected result on a single RTX 5090: ~30 minutes, ~$0.30 in Gemini Flash API, 78% tool-name accuracy on held-out, 0.73 arg-key F1, and 3% exact-call accuracy (the value-prediction weak spot we're working on — see Status).
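For orientation, here is how the recipe's numbers line up. This is plain arithmetic, not toolkit code, and it assumes each teacher call yields `examples_per_call` examples:

```python
# Plain arithmetic behind the recipe above (not a Distillarium API).
total_examples = 1000       # mash.total_examples
examples_per_call = 10      # mash.examples_per_call

teacher_calls = total_examples // examples_per_call  # 100 Gemini Flash calls
train_cut = int(total_examples * 0.9)                # 900 examples (cuts.train)
eval_cut = total_examples - train_cut                # 100 examples (cuts.eval)

print(teacher_calls, train_cut, eval_cut)            # 100 900 100; the eval cut matches tasting.held_out
```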
The Distillation Vocabulary
| Term | Means |
|---|---|
| Spirit | The trained, bottled model (your output) |
| Mash | Seed corpus the teacher generates training data from |
| Recipe | YAML config — teacher, mash, student arch, cuts, still, tasting, bottling |
| The Still | The training run |
| Cuts | Train / eval / test data splits |
| Heads / Hearts / Tails | Discarded noise / kept core / borderline cases |
| Proof | Held-out accuracy. The higher the proof, the more concentrated. |
| Tasting Notes | Auto-generated eval report with strengths, weaknesses, failure cases |
| Aging in Casks | Continued training, fine-tuning, RLHF refresh |
| Bottling | Export to ONNX / GGUF / browser-WASM |
| The Cellar | Library of Spirits (public or private) |
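To make the "Proof" entry concrete: proof is held-out accuracy expressed in degrees, so Needle's 0.78 tool-name accuracy reads as 78° proof. A one-line illustration, not a toolkit API:

```python
def proof(held_out_accuracy: float) -> int:
    """Held-out accuracy expressed in degrees, e.g. 0.78 -> 78 (degrees proof)."""
    return round(held_out_accuracy * 100)

assert proof(0.78) == 78  # Needle's tool-name accuracy
```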
Python API
```python
from distillarium import distill, taste, bottle, Recipe

# Load a recipe
recipe = Recipe.from_file("recipes/needle.tool-calling-v1.yaml")

# Distill
spirit = distill(recipe)

# Taste (eval against held-out cuts)
notes = taste(spirit, held_out=100)
print(notes.metrics)
# {'tool_name_accuracy': 0.78, 'arg_key_f1': 0.73, 'exact_call_accuracy': 0.03, ...}

# Bottle (export)
bottle(spirit, format="onnx", out="spirits/needle.onnx")
```
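Once bottled, the ONNX artifact can be loaded with any ONNX runtime. A minimal sketch, assuming the exported graph takes a single `input_ids` tensor of token IDs; the input/output names and the token IDs shown here are assumptions for illustration, so check the export metadata of your bottled Spirit:

```python
import numpy as np
import onnxruntime as ort

# Load the bottled Spirit on CPU: no API key, no GPU.
session = ort.InferenceSession("spirits/needle.onnx", providers=["CPUExecutionProvider"])

# Hypothetical token IDs; in practice these come from the Spirit's own tokenizer.
input_ids = np.array([[12, 874, 93, 405, 7, 2]], dtype=np.int64)

outputs = session.run(None, {"input_ids": input_ids})
print([o.shape for o in outputs])  # inspect the raw output tensors
```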
What's in the box
- `distillarium.engine` — attention-only transformer architecture + tokenizer + trainer + inference
- `distillarium.teacher` — pluggable teacher backends (Gemini, more coming)
- `distillarium.tasting` — held-out evaluation + Tasting Notes generation
- `distillarium.bottling` — exporters (ONNX in v0.1, GGUF in v0.2)
- `distillarium.cli` — the `distillery distill | taste | bottle` commands
How it compares
This sits in a specific gap in the existing distillation ecosystem:
| Project | Sweet spot | Where Distillarium differs |
|---|---|---|
| Arcee DistillKit | General LLM distillation pipelines, 7B-target | We target the 5–50M class, deployment-first (.onnx/.gguf/.wasm as the output, not an afterthought) |
| ModelScope EasyDistill (incl. AgentKD) | Agent distillation, multi-modal | We're CPU/edge-deployment focused, single CLI, Tasting Notes as default eval rigor |
| Berkeley TinyAgent | 1.1B–7B function-calling SLMs | We go smaller (20–50M) at the cost of generality; gain CPU inference and zero vendor lock |
| LoRA fine-tuning | Capability extension on a generalist | Doesn't shrink the model. Distillarium produces a small student model, not a fine-tuned base |
Honest framing: for breadth and SOTA function-calling scores, TinyAgent is the right pick. For replacing one narrow LLM step in a production pipeline with a CPU-runnable artifact you can audit, fork, and ship — that's what Distillarium is for.
We plan to publish BFCL (Berkeley Function Calling Leaderboard) numbers for Needle in v0.2 so the comparison is apples-to-apples.
Status
Current release: v0.1.1 (alpha) — single-task: tool calling via Gemini Flash. The reference Needle Spirit is published; other Spirits listed on the site are roadmap items.
| Version | Scope | Status |
|---|---|---|
| v0.1.1 | Tool-calling Spirits via Gemini · Needle published · pixel-art metaphor UX · ONNX bottling stub | Shipped ✅ |
| v0.2 | Claude + OpenAI teacher backends · byte-level BPE tokenizer (fixes argument-value accuracy) · GGUF export · BFCL benchmark numbers · `taste` shows teacher-vs-student baseline + version regression | In progress |
| v0.3 | Classification Spirits · NER Spirits · RAG-routing Spirits · iterative re-distillation on failed cuts | Planned |
| v0.4 | Quantization-aware training · WebAssembly bottling | Planned |
What's NOT solved in v0.1.1
- Argument-value accuracy. Exact-call accuracy sits at 3% on Needle because the WordPiece tokenizer splits JSON values awkwardly; v0.2's byte-level BPE is the fix (see the tokenizer sketch after this list).
- Only Gemini teacher is wired up. Claude and OpenAI providers are stubs.
- No BFCL score yet. v0.2 will publish one.
- Tasting Notes are statistical. They don't catch semantic failures like predicting "Twitter" instead of "Instagram." LLM-as-judge eval is on the roadmap.
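To see why argument values suffer under WordPiece, here is a hedged sketch using the Hugging Face `tokenizers` library (not Distillarium's internal tokenizer): train a small WordPiece vocabulary and a byte-level BPE vocabulary on the same toy corpus, then encode a JSON tool call. Characters the WordPiece vocabulary never saw fall to `[UNK]`, while byte-level BPE always has the full byte alphabet available. The corpus, vocab sizes, and example call are illustrative only.

```python
from tokenizers import Tokenizer, decoders, models, pre_tokenizers, trainers

corpus = ["call get_weather with city San Francisco", "set units to celsius"] * 50

# Small WordPiece vocab, in the spirit of the recipe's wordpiece-4096 setting.
wp = Tokenizer(models.WordPiece(unk_token="[UNK]"))
wp.pre_tokenizer = pre_tokenizers.Whitespace()
wp.train_from_iterator(corpus, trainers.WordPieceTrainer(vocab_size=512, special_tokens=["[UNK]"]))

# Byte-level BPE: every byte is in the initial alphabet, so nothing is unrepresentable.
bpe = Tokenizer(models.BPE())
bpe.pre_tokenizer = pre_tokenizers.ByteLevel(add_prefix_space=False)
bpe.decoder = decoders.ByteLevel()
bpe.train_from_iterator(
    corpus,
    trainers.BpeTrainer(vocab_size=512, initial_alphabet=pre_tokenizers.ByteLevel.alphabet()),
)

call = '{"tool": "get_weather", "args": {"city": "Reykjavík", "units": "celsius"}}'
print(wp.encode(call).tokens)   # braces, quotes, and the accented value degrade to [UNK]
print(bpe.encode(call).tokens)  # every byte is representable, so the value survives intact
```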
License
MIT.
Built on top of the Research Radar — Automate Capture's autonomous research-to-product pipeline. The reference Needle Spirit was distilled from a Show HN paper the Radar surfaced in May 2026.