SmallModel
Compress large embedding models into small, fast students via layer pruning, vocab pruning, hidden dim reduction, and knowledge distillation.
Features
- Layer Pruning - Select which transformer layers to keep
- Vocab Pruning - Remove unused tokens based on corpus frequency
- Hidden Dim Reduction - Shrink internal dimensions (slicing or PCA)
- Knowledge Distillation - MSE + Cosine loss alignment with the teacher (sketched after this list)
- Auto Compress - Find optimal config within size constraints
- 2-Stage Distillation - Progressive distillation for 10x+ compression
- Interactive Web UI - Visual layer editor with real-time size estimation
- MTEB Evaluation - Benchmark on Classification, Clustering, STS tasks
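
The MSE + Cosine objective above is simple enough to sketch. This is the generic technique, not SmallModel's internal code; `alpha` is a hypothetical weighting knob, and if the student's hidden size differs from the teacher's, a learned projection on the student side is needed first (omitted here):

```python
import torch
import torch.nn.functional as F

def distill_loss(student_emb: torch.Tensor, teacher_emb: torch.Tensor, alpha: float = 0.5):
    """MSE on the raw vectors plus (1 - cosine similarity), averaged over the batch."""
    mse = F.mse_loss(student_emb, teacher_emb)
    cos = 1.0 - F.cosine_similarity(student_emb, teacher_emb, dim=-1).mean()
    return alpha * mse + (1.0 - alpha) * cos

# Toy check: gradients flow back to the student embeddings.
student = torch.randn(8, 384, requires_grad=True)
teacher = torch.randn(8, 384)
distill_loss(student, teacher).backward()
```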
Installation
```bash
pip install smallmodel[all]
```
Or install specific extras:
```bash
pip install smallmodel          # core only
pip install smallmodel[web]     # + Flask web UI
pip install smallmodel[eval]    # + MTEB evaluation
pip install smallmodel[export]  # + ONNX export
pip install smallmodel[hub]     # + HuggingFace Hub upload
```
For development:
```bash
git clone https://github.com/gomyk/smallmodel.git
cd smallmodel
pip install -e ".[all]"
```
Quick Start
Python API
```python
from smallmodel import SmallModel

# Auto-compress within a 50 MB FP32 budget
sm = SmallModel.from_teacher("gte")
sm.compress(max_fp32_mb=50.0)
sm.distill(epochs=10)

# Manual layer selection
sm = SmallModel.from_teacher("gte", layer_indices=[0, 3, 6, 11])
sm.create()

# Register a custom teacher
from smallmodel import register_teacher

register_teacher(
    "my-bert",
    model_id="my-org/my-bert-base",
    short_name="MyBERT",
    hidden_dim=768,
    num_layers=12,
    intermediate_size=3072,
    vocab_size=30522,
)
```
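
Once distilled, the student should load like any encoder. A minimal sketch, assuming the student is saved as a standard Hugging Face checkpoint under the `output/students/...` layout the CLI section uses (an assumption about the output format, not documented behavior; adjust the path to your run):

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

path = "output/students/gte/gte_compressed"  # assumed layout; see the CLI section
tok = AutoTokenizer.from_pretrained(path)
model = AutoModel.from_pretrained(path)

batch = tok(["hello world", "compress me"], padding=True, return_tensors="pt")
with torch.no_grad():
    hidden = model(**batch).last_hidden_state  # (batch, seq, hidden)
mask = batch["attention_mask"].unsqueeze(-1)   # zero out padding positions
emb = (hidden * mask).sum(1) / mask.sum(1)     # mean pooling over tokens
emb = F.normalize(emb, dim=-1)                 # unit-length embeddings
```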
Web UI
```python
from smallmodel import SmallModel

sm = SmallModel.from_teacher("gte")
sm.serve()  # serves at http://127.0.0.1:7860
```
Or via CLI:
```bash
smallmodel serve --teacher gte --port 7860
```
The web UI lets you:
- Select a teacher model from 7+ pre-registered models
- Toggle layers on/off with preset configurations
- Adjust hidden dim, FFN size, and vocab size
- See real-time size estimation and compression ratio
- Select distillation datasets and evaluation tasks
- Analyze vocab coverage at different vocab sizes (see the sketch after this list)
- Create compressed models with one click
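
The vocab coverage analysis boils down to: tokenize a sample corpus, rank token ids by frequency, and ask what share of all token occurrences the top-k ids cover. A standalone sketch of the idea, not the UI's internals (`Alibaba-NLP/gte-multilingual-base` is the public model id behind the `gte` teacher in the table below):

```python
from collections import Counter
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Alibaba-NLP/gte-multilingual-base")
corpus = ["a few sample sentences", "stand in for your real corpus here"]

counts = Counter()
for text in corpus:
    counts.update(tok(text, add_special_tokens=False)["input_ids"])

total = sum(counts.values())
for k in (8_000, 16_000, 32_000):
    covered = sum(c for _, c in counts.most_common(k))
    print(f"top {k:>6} tokens cover {covered / total:.1%} of token occurrences")
```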
CLI
```bash
smallmodel list-teachers
smallmodel compress --teacher gte --max-mb 50
smallmodel create --teacher gte --layers 0,3,6,11
smallmodel distill --teacher gte --student output/students/gte/gte_compressed
smallmodel serve --teacher gte
```
Pre-registered Teachers
| Key | Model | Layers | Hidden dim | Vocab size | FP32 size (MB) |
|---|---|---|---|---|---|
| minilm | paraphrase-multilingual-MiniLM-L12-v2 | 12 | 384 | 250K | 448 |
| modernbert | ModernBERT-base | 22 | 768 | 50K | 496 |
| gte | gte-multilingual-base | 12 | 768 | 250K | 1058 |
| me5 | multilingual-e5-base | 12 | 768 | 250K | 1058 |
| me5s | multilingual-e5-small | 12 | 384 | 250K | 448 |
| gemma_emb | embeddinggemma-300m | 24 | 768 | 262K | 1155 |
| qwen3 | Qwen3-0.6B | 28 | 1024 | 152K | 2274 |
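
The FP32 size column is essentially parameter count × 4 bytes. A back-of-the-envelope reproduction from the other columns; the intermediate sizes here (3072 for gte, 1536 for minilm) are standard values assumed by us since the table doesn't list them, and biases, positional embeddings, and layer norms are ignored, so expect a few MB of slack:

```python
def fp32_mb(hidden: int, layers: int, vocab: int, intermediate: int) -> float:
    emb = vocab * hidden                 # token embedding matrix
    attn = 4 * hidden * hidden           # Q, K, V, O projections per layer
    ffn = 2 * hidden * intermediate      # FFN up + down projections per layer
    params = emb + layers * (attn + ffn)
    return params * 4 / 2**20            # FP32 bytes -> MB (binary)

print(round(fp32_mb(768, 12, 250_002, 3072)))  # ~1056, vs 1058 for gte
print(round(fp32_mb(384, 12, 250_002, 1536)))  # ~447, vs 448 for minilm
```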
How It Works
1. Layer Pruning - Copy selected layers from the teacher (uniform spacing recommended; sketched below)
2. Hidden Dim Reduction - Shrink dimensions, by slicing or PCA, if needed to meet the size target (also sketched below)
3. Vocab Pruning - Remove tokens not seen in the training corpus
4. Knowledge Distillation - Train the student to reproduce the teacher's embeddings
5. Evaluation - MTEB benchmark (Classification, Clustering, STS)
For compression ratios above 10x, a 2-stage pipeline distills progressively: Teacher → Intermediate (about 1/5 of the teacher's size) → Final Student.
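
Step 1's uniform spacing simply picks evenly spaced indices across the teacher's depth, keeping the first and last layer. A minimal sketch; the helper name is ours, not SmallModel's API:

```python
def uniform_layers(teacher_layers: int, student_layers: int) -> list[int]:
    """Evenly spaced layer indices that always keep the first and last layer."""
    n, k = teacher_layers, student_layers
    return [round(i * (n - 1) / (k - 1)) for i in range(k)]

print(uniform_layers(12, 4))  # [0, 4, 7, 11] for a 12-layer teacher such as gte
```

Step 2's PCA option amounts to projecting a weight matrix onto its top principal directions. Again a sketch of the idea only; a real pipeline must also adapt every layer that reads or writes the reduced dimensions:

```python
import numpy as np

def pca_reduce(weight: np.ndarray, new_dim: int) -> np.ndarray:
    """Shrink the hidden axis of a (rows, hidden) matrix to new_dim via PCA."""
    centered = weight - weight.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)  # rows of vt = principal directions
    return weight @ vt[:new_dim].T                           # (rows, new_dim)

emb = np.random.randn(1000, 768).astype(np.float32)  # stand-in embedding table
print(pca_reduce(emb, 256).shape)                     # (1000, 256)
```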
License
Apache-2.0
File details
Details for the file easysmallembeddingmodel-0.1.0.tar.gz.
File metadata
- Download URL: easysmallembeddingmodel-0.1.0.tar.gz
- Upload date:
- Size: 34.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 24be3fd5f3b5a92e09769a8c52f83bef7ce320a3fc6373fce3f4b4e1de9d86f6 |
| MD5 | 01e58b00240115f1015093ba39c50930 |
| BLAKE2b-256 | 99f34a107df18fd2bd69288861f4cb7ec4f5fec4013a41481ccd051483b62607 |
File details
Details for the file easysmallembeddingmodel-0.1.0-py3-none-any.whl.
File metadata
- Download URL: easysmallembeddingmodel-0.1.0-py3-none-any.whl
- Upload date:
- Size: 38.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 569f34fa5f2ecbb3a9af3074e0e250a313aa2851ab6b3e6cd722517bdd870dff |
| MD5 | 98838f5c56e4cb3cdf088dc2748a7228 |
| BLAKE2b-256 | 7b7fd2fb8cb93a9644593995dfd2f859c13fdd6c9c7025ca02f963f6d79ada49 |