⚗ The Distillery
Distill any task into a pocket-sized Spirit. Pure model. No API.
distillery is a Python package that turns a teacher LLM (Gemini, Claude, GPT) into a tiny, deployable, task-specific model — a Spirit — that runs on CPU, edge, or browser with zero API dependency at inference.
Why distillation, not fine-tuning
| | LoRA / fine-tuning | Distillation (The Distillery) |
|---|---|---|
| Final size | 7B+ (same as base) | 5M–50M |
| Inference target | GPU | CPU, edge, browser |
| API dependency at inference | Sometimes (for hosted base) | Zero |
| Cost at inference | $$$ per call | $0 |
| Cold start | Seconds | Milliseconds |
| Best for | Capability extension on a generalist | Single-task production |
Install
pip install distillery[gemini]
Hello, Spirit
Three commands, one Spirit:
# 1. Distill a teacher into a Spirit using a recipe
distillery distill recipes/needle.tool-calling-v1.yaml
# 2. Taste the Spirit — held-out eval against the teacher
distillery taste spirits/needle.pt
# 3. Bottle the Spirit — export to ONNX / GGUF for deployment
distillery bottle spirits/needle.pt --format onnx
A real example — Needle (tool calling, 20.7M params)
Reproduces the current reference Spirit at 67° proof on tool calling:
# recipes/needle.tool-calling-v1.yaml
name: needle.tool-calling
version: 1
teacher:
  provider: gemini
  model: gemini-2.5-flash
  temperature: 0.9
mash:
  total_examples: 1000
  examples_per_call: 10
  tools_per_call: { min: 3, max: 6 }
student:
  arch: attention-only-glu
  d_model: 384
  n_heads: 6
  n_layers: 8
  max_seq_len: 256
  tokenizer: wordpiece-4096
cuts: { train: 0.9, eval: 0.1 }
still:
  epochs: 8
  batch_size: 16
  lr: 3.0e-4
tasting:
  metrics: [tool_name_accuracy, arg_key_f1, exact_call_accuracy]
  held_out: 100
Run it:
distillery distill recipes/needle.tool-calling-v1.yaml --out spirits/
Expected result on a single RTX 5090: ~30 minutes, ~$0.30 in Gemini Flash API spend, and 67% tool-name accuracy on the held-out cut.
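Where the 20.7M figure comes from: a back-of-envelope count for the student block in the recipe above, assuming a SwiGLU-style gated feed-forward with hidden size 4 × d_model, learned position embeddings, and a tied output head. The recipe does not spell these out, so treat them as assumptions rather than the actual architecture.

# Rough parameter count for the Needle student config (assumptions as noted above)
d_model, n_layers, vocab, seq = 384, 8, 4096, 256
embed = vocab * d_model                  # token embeddings (assumed tied with the output head)
pos = seq * d_model                      # learned position embeddings (assumed)
attn = 4 * d_model * d_model             # Q, K, V, O projections, per layer
glu = 3 * d_model * (4 * d_model)        # gate + up + down projections, per layer (assumed 4x hidden)
total = embed + pos + n_layers * (attn + glu)
print(f"{total / 1e6:.1f}M")             # ~20.5M; norms and biases account for the rest of the 20.7M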
The Distillation Vocabulary
| Term | Means |
|---|---|
| Spirit | The trained, bottled model (your output) |
| Mash | Seed corpus the teacher generates training data from |
| Recipe | YAML config — teacher, mash, student arch, cuts, still, tasting, bottling |
| The Still | The training run |
| Cuts | Train / eval / test data splits |
| Heads / Hearts / Tails | Discarded noise / kept core / borderline cases |
| Proof | Held-out accuracy. The higher the proof, the more concentrated. |
| Tasting Notes | Auto-generated eval report with strengths, weaknesses, failure cases |
| Aging in Casks | Continued training, fine-tuning, RLHF refresh |
| Bottling | Export to ONNX / GGUF / browser-WASM |
| The Cellar | Library of Spirits (public or private) |
Python API
from distillery import distill, taste, bottle, Recipe
# Load a recipe
recipe = Recipe.from_file("recipes/needle.tool-calling-v1.yaml")
# Distill
spirit = distill(recipe)
# Taste (eval against held-out cuts)
notes = taste(spirit, held_out=100)
print(notes.metrics)
# {'tool_name_accuracy': 0.67, 'arg_key_f1': 0.69, ...}
# Bottle (export)
bottle(spirit, format="onnx", out="spirits/needle.onnx")
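A bottled Spirit is a self-contained ONNX graph, so inference needs nothing from distillery or the teacher API. A minimal sketch using onnxruntime: the input tensor name is read from the session rather than assumed, while the int64 dtype and the zero token IDs are placeholders for real wordpiece tokenizer output.

# Run a bottled Spirit with onnxruntime alone (no distillery, no teacher API)
import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession("spirits/needle.onnx", providers=["CPUExecutionProvider"])
inp = sess.get_inputs()[0]                        # inspect the graph instead of guessing names
print(inp.name, inp.shape, inp.type)

token_ids = np.zeros((1, 256), dtype=np.int64)    # placeholder IDs; max_seq_len is 256
outputs = sess.run(None, {inp.name: token_ids})
print([o.shape for o in outputs])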
What's in the box
- distillery.engine — attention-only transformer architecture + tokenizer + trainer + inference
- distillery.teacher — pluggable teacher backends (Gemini, more coming)
- distillery.tasting — held-out evaluation + Tasting Notes generation
- distillery.bottling — exporters (ONNX in v0.1, GGUF in v0.2)
- distillery.cli — distillery distill | taste | bottle commands
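Every one of these stages is driven by the recipe. Because recipes are plain YAML, one way to experiment is to edit a copy with PyYAML and hand it back to the Recipe.from_file and distill calls shown above; a sketch, with arbitrary override values:

# Tweak a recipe programmatically, then distill; only YAML editing plus the
# Recipe.from_file / distill calls documented above are used here
import yaml
from distillery import Recipe, distill

with open("recipes/needle.tool-calling-v1.yaml") as f:
    recipe_dict = yaml.safe_load(f)

recipe_dict["still"]["epochs"] = 12            # longer run in the still
recipe_dict["mash"]["total_examples"] = 2000   # bigger mash

with open("recipes/needle.local.yaml", "w") as f:
    yaml.safe_dump(recipe_dict, f)

spirit = distill(Recipe.from_file("recipes/needle.local.yaml"))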
Status
- ✅ v0.1 — Tool-calling Spirits via Gemini (this release)
- 🚧 v0.2 — Claude teacher, GGUF export, byte-level BPE tokenizer
- 🚧 v0.3 — Classification Spirits, RAG-routing Spirits
- 🚧 v0.4 — Quantization-aware distillation
License
MIT.
Built on top of the Research Radar pipeline. First reference Spirit (Needle) distilled 2026-05-13.
Download files
File details
Details for the file distillarium-0.1.0.tar.gz.
File metadata
- Download URL: distillarium-0.1.0.tar.gz
- Upload date:
- Size: 25.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 474818af973f4614a29eba489087c3ac31f749d3047fb8263d3dc3b226de8fcf |
| MD5 | 957f5a570853c540b27c11712a716ec8 |
| BLAKE2b-256 | 3206ed17019adcc20403a1ab8568b0b69c395f509e47b2faa54cb11087a9bfd9 |
File details
Details for the file distillarium-0.1.0-py3-none-any.whl.
File metadata
- Download URL: distillarium-0.1.0-py3-none-any.whl
- Upload date:
- Size: 29.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ea2e04a586545a635da70f99f18b591396c3e1b5fd79d6e252441948b8b48a47 |
| MD5 | b3bce6b4cdfe61c3d0bc515d7373b768 |
| BLAKE2b-256 | f1f0d512b87a60d5b74c424817b34bcc4e1d3c48c9e0a67285ac69fa44ec9d7a |
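To check a downloaded artifact against the digests above, a generic standard-library sketch (the local filename is whatever your downloader saved):

# Verify a downloaded release file against the published SHA256 digest
import hashlib

with open("distillarium-0.1.0-py3-none-any.whl", "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()

print(digest == "ea2e04a586545a635da70f99f18b591396c3e1b5fd79d6e252441948b8b48a47")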