
Distill any task into a pocket-sized Spirit. Pure model. No API.


⚗ Distillarium

Replace narrow LLM pipeline steps with tiny CPU-runnable Spirits. Distill a teacher API into a deployable 20–50M-param model. Same outputs, $0 at inference.

distillarium is a Python toolkit for distilling a teacher LLM (Gemini, Claude, GPT) into a tiny, task-specific model — a Spirit — that runs on CPU, edge, or browser with zero API dependency at inference.

v0.1.1 — alpha. The reference Spirit (Needle, tool calling) is published; the toolkit itself is single-task at the moment with more recipes coming. Pipeline-replacement claims in the docs are based on Needle's measured numbers (78% tool-name accuracy on held-out cuts). See Status below.

Pick this if: you're doing the same narrow LLM call thousands of times a day (intent, routing, NER, classification, tool calling) and want to stop paying frontier-API prices for it. Don't pick this if: you need open-ended generation, multi-step reasoning, or a generalist assistant — keep the frontier model for that.
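
To make the economics concrete, here is a back-of-envelope comparison. The traffic, token counts, and price below are illustrative assumptions, not measured numbers; substitute your own.

# Rough monthly cost of one narrow pipeline step. All numbers are placeholders:
# substitute your own traffic and your provider's current pricing.
calls_per_day = 50_000
tokens_per_call = 400                     # prompt + completion, assumed
price_per_million_tokens = 0.50           # blended $/1M tokens, assumed

api_monthly = calls_per_day * 30 * tokens_per_call / 1e6 * price_per_million_tokens
print(f"API: ${api_monthly:,.0f}/month")  # $300/month under these assumptions
print("Spirit: $0 marginal")              # one ~$0.30 distill run, then CPU inference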

Why distillation, not fine-tuning

                              LoRA / fine-tuning                     Distillation (The Distillery)
Final size                    7B+ (same as base)                     5M–50M
Inference target              GPU                                    CPU, edge, browser
API dependency at inference   Sometimes (for hosted base)            Zero
Cost at inference             $$$ per call                           $0
Cold start                    Seconds                                Milliseconds
Best for                      Capability extension on a generalist   Single-task production
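
For a sense of what the 5M–50M row means on disk, simple arithmetic on the reference Spirit (assuming plain unquantized or naively quantized weights):

params = 20.7e6                                # the Needle reference Spirit
for dtype, bytes_per in [("fp32", 4), ("fp16", 2), ("int8", 1)]:
    print(f"{dtype}: {params * bytes_per / 1e6:.0f} MB")
# fp32: 83 MB, fp16: 41 MB, int8: 21 MB; small enough to ship in a container or a browser page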

Install

pip install distillarium[gemini]

Hello, Spirit

Three commands, one Spirit:

# 1. Distill a teacher into a Spirit using a recipe
distillery distill recipes/needle.tool-calling-v1.yaml

# 2. Taste the Spirit — held-out eval against the teacher
distillery taste spirits/needle.pt

# 3. Bottle the Spirit — export to ONNX / GGUF for deployment
distillery bottle spirits/needle.pt --format onnx

A real example — Needle (tool calling, 20.7M params)

Reproduces the reference Spirit at 78° proof on tool-name accuracy (held-out cuts of 100 examples):

# recipes/needle.tool-calling-v1.yaml
name: needle.tool-calling
version: 1

teacher:
  provider: gemini
  model: gemini-2.5-flash
  temperature: 0.9

mash:
  total_examples: 1000
  examples_per_call: 10
  tools_per_call: { min: 3, max: 6 }

student:
  arch: attention-only-glu
  d_model: 384
  n_heads: 6
  n_layers: 8
  max_seq_len: 256
  tokenizer: wordpiece-4096

cuts: { train: 0.9, eval: 0.1 }

still:
  epochs: 8
  batch_size: 16
  lr: 3.0e-4

tasting:
  metrics: [tool_name_accuracy, arg_key_f1, exact_call_accuracy]
  held_out: 100

Run it:

distillery distill recipes/needle.tool-calling-v1.yaml --out spirits/

Expected result on a single RTX 5090: ~30 minutes, ~$0.30 in Gemini Flash API, 78% tool-name accuracy on held-out, 0.73 arg-key F1, and 3% exact-call accuracy (the value-prediction weak spot we're working on — see Status).
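
As a sanity check on the 20.7M figure, the recipe's dimensions roughly add up. This sketch assumes a GLU feed-forward with hidden width 4×d_model, learned positional embeddings, and a tied output head; none of that is spelled out in the recipe, so treat it as an estimate:

# Back-of-envelope parameter count for the Needle recipe's student config.
d, n_layers, vocab, seq = 384, 8, 4096, 256

attn_per_layer = 4 * d * d              # Q, K, V, O projections
glu_per_layer = 3 * d * (4 * d)         # gate, up, down; hidden = 4*d is an assumption
embeddings = vocab * d + seq * d        # token + learned positional (assumed); output head tied

total = n_layers * (attn_per_layer + glu_per_layer) + embeddings
print(f"{total / 1e6:.1f}M")            # ~20.5M; norms and biases close the gap to 20.7M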

The Distillation Vocabulary

Term                     Means
Spirit                   The trained, bottled model (your output)
Mash                     Seed corpus the teacher generates training data from
Recipe                   YAML config — teacher, mash, student arch, cuts, still, tasting, bottling
The Still                The training run
Cuts                     Train / eval / test data splits
Heads / Hearts / Tails   Discarded noise / kept core / borderline cases
Proof                    Held-out accuracy. The higher the proof, the more concentrated.
Tasting Notes            Auto-generated eval report with strengths, weaknesses, failure cases
Aging in Casks           Continued training, fine-tuning, RLHF refresh
Bottling                 Export to ONNX / GGUF / browser-WASM
The Cellar               Library of Spirits (public or private)
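
The Heads / Hearts / Tails cut is ordinary data curation under a friendlier name. A generic sketch of the idea (not the library's actual filter): bucket each teacher sample by whether its emitted tool call parses and names a known tool.

import json

def cut(sample: dict, known_tools: set[str]) -> str:
    """Bucket one teacher sample: 'heads' (discard), 'hearts' (keep), 'tails' (review)."""
    try:
        call = json.loads(sample["teacher_output"])
    except json.JSONDecodeError:
        return "heads"    # unparseable output: discarded noise
    if isinstance(call, dict) and call.get("tool") in known_tools:
        return "hearts"   # valid call against a known tool: kept core
    return "tails"        # parses, but names an unknown tool: borderline

print(cut({"teacher_output": '{"tool": "search", "args": {}}'}, {"search"}))  # hearts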

Python API

from distillarium import distill, taste, bottle, Recipe

# Load a recipe
recipe = Recipe.from_file("recipes/needle.tool-calling-v1.yaml")

# Distill
spirit = distill(recipe)

# Taste (eval against held-out cuts)
notes = taste(spirit, held_out=100)
print(notes.metrics)
# {'tool_name_accuracy': 0.78, 'arg_key_f1': 0.73, 'exact_call_accuracy': 0.03, ...}

# Bottle (export)
bottle(spirit, format="onnx", out="spirits/needle.onnx")
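
After bottling, inference needs neither distillarium nor an API key. A minimal sketch with onnxruntime; the tensor name, tokenizer filename, and output layout below are assumptions, so check the export log for the real ones:

import numpy as np
import onnxruntime as ort
from tokenizers import Tokenizer   # assumes a tokenizer file is exported next to the model

tok = Tokenizer.from_file("spirits/needle.tokenizer.json")   # hypothetical filename
sess = ort.InferenceSession("spirits/needle.onnx",
                            providers=["CPUExecutionProvider"])

ids = tok.encode("Book a table for two at 7pm").ids
logits = sess.run(None, {"input_ids": np.array([ids], dtype=np.int64)})[0]  # input name assumed
print(logits.shape)   # decode from here; the exact head layout depends on the export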

What's in the box

  • distillarium.engine — attention-only transformer architecture + tokenizer + trainer + inference
  • distillarium.teacher — pluggable teacher backends (Gemini, more coming)
  • distillarium.tasting — held-out evaluation + Tasting Notes generation
  • distillarium.bottling — exporters (ONNX in v0.1, GGUF in v0.2)
  • distillarium.cli — distillery distill | taste | bottle commands
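
The teacher layer is the only networked piece. Its real interface isn't documented here, but a pluggable backend usually reduces to one method; a hypothetical sketch of the shape such a provider takes:

from typing import Protocol

class Teacher(Protocol):
    """Hypothetical interface; not distillarium's documented API."""
    def generate(self, prompt: str, temperature: float) -> str:
        """Return one raw completion for one mash prompt."""
        ...

class GeminiTeacher:
    """Stub showing where a concrete backend would call the google-genai SDK."""
    def __init__(self, model: str = "gemini-2.5-flash") -> None:
        self.model = model

    def generate(self, prompt: str, temperature: float) -> str:
        raise NotImplementedError("wire the google-genai client call in here")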

How it compares

This sits in a specific gap in the existing distillation ecosystem:

  • Arcee DistillKit — general LLM distillation pipelines, 7B-target. We target the 5–50M class, deployment-first (.onnx / .gguf / .wasm as the output, not an afterthought).
  • ModelScope EasyDistill (incl. AgentKD) — agent distillation, multi-modal. We're CPU/edge-deployment focused: a single CLI, with Tasting Notes as the default eval rigor.
  • Berkeley TinyAgent — 1.1B–7B function-calling SLMs. We go smaller (20–50M) at the cost of generality, gaining CPU inference and zero vendor lock-in.
  • LoRA fine-tuning — capability extension on a generalist. It doesn't shrink the model; Distillarium produces a small student model, not a fine-tuned base.

Honest framing: for breadth and SOTA function-calling scores, TinyAgent is the right pick. For replacing one narrow LLM step in a production pipeline with a CPU-runnable artifact you can audit, fork, and ship — that's what Distillarium is for.

We plan to publish BFCL (Berkeley Function Calling Leaderboard) numbers for Needle in v0.2 so the comparison is apples-to-apples.

Status

Current release: v0.1.1 (alpha) — single-task: tool calling via Gemini Flash. The reference Needle Spirit is published; other Spirits listed on the site are roadmap items.

  • v0.1.1 — Shipped: tool-calling Spirits via Gemini · Needle published · pixel-art metaphor UX · ONNX bottling stub
  • v0.2 — In progress: Claude + OpenAI teacher backends · byte-level BPE tokenizer (fixes argument-value accuracy) · GGUF export · BFCL benchmark numbers · taste shows teacher-vs-student baseline + version regression
  • v0.3 — Planned: classification Spirits · NER Spirits · RAG-routing Spirits · iterative re-distillation on failed cuts
  • v0.4 — Planned: quantization-aware training · WebAssembly bottling

What's NOT solved in v0.1.1

  • Argument-value accuracy. Exact-call accuracy sits at 3% on Needle. The WordPiece tokenizer splits JSON values awkwardly (see the sketch after this list); v0.2's byte-level BPE is the fix.
  • Only Gemini teacher is wired up. Claude and OpenAI providers are stubs.
  • No BFCL score yet; v0.2 will publish one.
  • Tasting Notes are statistical. They don't catch semantic failures like predicting "Twitter" instead of "Instagram." LLM-as-judge eval is on the roadmap.
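
To see the tokenizer failure mode yourself, run any off-the-shelf WordPiece vocabulary over a JSON argument string and watch the values fragment at quotes, underscores, and digits. This uses bert-base-uncased as a stand-in; Needle's 4096-piece vocabulary is smaller, which makes fragmentation worse, not better:

from transformers import AutoTokenizer   # pip install transformers

tok = AutoTokenizer.from_pretrained("bert-base-uncased")   # stand-in WordPiece vocab
print(tok.tokenize('{"user_id": "abc_123", "time": "19:00"}'))
# The values shatter into many sub-word pieces; one wrong piece sinks exact-call accuracy.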

License

MIT.


Built on top of the Research Radar — Automate Capture's autonomous research-to-product pipeline. The reference Needle Spirit was distilled from a Show HN paper the Radar surfaced in May 2026.

