SmallModel
Compress large embedding models into small, fast students via layer pruning, vocab pruning, hidden dim reduction, and knowledge distillation.
Features
- Layer Pruning - Select which transformer layers to keep
- Vocab Pruning - Remove unused tokens based on corpus frequency
- Hidden Dim Reduction - Shrink internal dimensions (slicing or PCA)
- Knowledge Distillation - MSE + Cosine loss alignment with the teacher (sketched after this list)
- Auto Compress - Find optimal config within size constraints
- 2-Stage Distillation - Progressive distillation for 10x+ compression
- Interactive Web UI - Visual layer editor with real-time size estimation
- MTEB Evaluation - Benchmark on Classification, Clustering, STS tasks
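
The MSE + Cosine objective above is simple enough to sketch. This is the generic technique, not SmallModel's internal code; `alpha` is a hypothetical weighting knob, and if the student's hidden size differs from the teacher's, a learned projection on the student side is needed first (omitted here):

```python
import torch
import torch.nn.functional as F

def distill_loss(student_emb: torch.Tensor, teacher_emb: torch.Tensor, alpha: float = 0.5):
    """MSE on the raw vectors plus (1 - cosine similarity), averaged over the batch."""
    mse = F.mse_loss(student_emb, teacher_emb)
    cos = 1.0 - F.cosine_similarity(student_emb, teacher_emb, dim=-1).mean()
    return alpha * mse + (1.0 - alpha) * cos

# Toy check: gradients flow back to the student embeddings.
student = torch.randn(8, 384, requires_grad=True)
teacher = torch.randn(8, 384)
distill_loss(student, teacher).backward()
```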
Installation
```bash
pip install smallmodel[all]
```
Or install specific extras:
```bash
pip install smallmodel          # core only
pip install smallmodel[web]     # + Flask web UI
pip install smallmodel[eval]    # + MTEB evaluation
pip install smallmodel[export]  # + ONNX export
pip install smallmodel[hub]     # + HuggingFace Hub upload
```
For development:
```bash
git clone https://github.com/gomyk/smallmodel.git
cd smallmodel
pip install -e ".[all]"
```
Quick Start
Python API
```python
from smallmodel import SmallModel

# Auto-compress within a 50 MB FP32 budget
sm = SmallModel.from_teacher("gte")
sm.compress(max_fp32_mb=50.0)
sm.distill(epochs=10)

# Manual layer selection
sm = SmallModel.from_teacher("gte", layer_indices=[0, 3, 6, 11])
sm.create()

# Register a custom teacher
from smallmodel import register_teacher

register_teacher(
    "my-bert",
    model_id="my-org/my-bert-base",
    short_name="MyBERT",
    hidden_dim=768,
    num_layers=12,
    intermediate_size=3072,
    vocab_size=30522,
)
```
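
Once distilled, the student should load like any encoder. A minimal sketch, assuming the student is saved as a standard Hugging Face checkpoint under the `output/students/...` layout the CLI section uses (an assumption about the output format, not documented behavior; adjust the path to your run):

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

path = "output/students/gte/gte_compressed"  # assumed layout; see the CLI section
tok = AutoTokenizer.from_pretrained(path)
model = AutoModel.from_pretrained(path)

batch = tok(["hello world", "compress me"], padding=True, return_tensors="pt")
with torch.no_grad():
    hidden = model(**batch).last_hidden_state  # (batch, seq, hidden)
mask = batch["attention_mask"].unsqueeze(-1)   # zero out padding positions
emb = (hidden * mask).sum(1) / mask.sum(1)     # mean pooling over tokens
emb = F.normalize(emb, dim=-1)                 # unit-length embeddings
```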
Web UI
```python
from smallmodel import SmallModel

sm = SmallModel.from_teacher("gte")
sm.serve()  # serves at http://127.0.0.1:7860
```
Or via CLI:
```bash
smallmodel serve --teacher gte --port 7860
```
The web UI lets you:
- Select a teacher model from 7+ pre-registered models
- Toggle layers on/off with preset configurations
- Adjust hidden dim, FFN size, and vocab size
- See real-time size estimation and compression ratio
- Select distillation datasets and evaluation tasks
- Analyze vocab coverage at different vocab sizes (see the sketch after this list)
- Create compressed models with one click
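
The vocab coverage analysis boils down to: tokenize a sample corpus, rank token ids by frequency, and ask what share of all token occurrences the top-k ids cover. A standalone sketch of the idea, not the UI's internals (`Alibaba-NLP/gte-multilingual-base` is the public model id behind the `gte` teacher in the table below):

```python
from collections import Counter
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Alibaba-NLP/gte-multilingual-base")
corpus = ["a few sample sentences", "stand in for your real corpus here"]

counts = Counter()
for text in corpus:
    counts.update(tok(text, add_special_tokens=False)["input_ids"])

total = sum(counts.values())
for k in (8_000, 16_000, 32_000):
    covered = sum(c for _, c in counts.most_common(k))
    print(f"top {k:>6} tokens cover {covered / total:.1%} of token occurrences")
```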
CLI
```bash
smallmodel list-teachers
smallmodel compress --teacher gte --max-mb 50
smallmodel create --teacher gte --layers 0,3,6,11
smallmodel distill --teacher gte --student output/students/gte/gte_compressed
smallmodel serve --teacher gte
```
Pre-registered Teachers
| Key | Model | Layers | Hidden dim | Vocab size | FP32 size (MB) |
|---|---|---|---|---|---|
| minilm | paraphrase-multilingual-MiniLM-L12-v2 | 12 | 384 | 250K | 448 |
| modernbert | ModernBERT-base | 22 | 768 | 50K | 496 |
| gte | gte-multilingual-base | 12 | 768 | 250K | 1058 |
| me5 | multilingual-e5-base | 12 | 768 | 250K | 1058 |
| me5s | multilingual-e5-small | 12 | 384 | 250K | 448 |
| gemma_emb | embeddinggemma-300m | 24 | 768 | 262K | 1155 |
| qwen3 | Qwen3-0.6B | 28 | 1024 | 152K | 2274 |
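
The FP32 size column is essentially parameter count × 4 bytes. A back-of-the-envelope reproduction from the other columns; the intermediate sizes here (3072 for gte, 1536 for minilm) are standard values assumed by us since the table doesn't list them, and biases, positional embeddings, and layer norms are ignored, so expect a few MB of slack:

```python
def fp32_mb(hidden: int, layers: int, vocab: int, intermediate: int) -> float:
    emb = vocab * hidden                 # token embedding matrix
    attn = 4 * hidden * hidden           # Q, K, V, O projections per layer
    ffn = 2 * hidden * intermediate      # FFN up + down projections per layer
    params = emb + layers * (attn + ffn)
    return params * 4 / 2**20            # FP32 bytes -> MB (binary)

print(round(fp32_mb(768, 12, 250_002, 3072)))  # ~1056, vs 1058 for gte
print(round(fp32_mb(384, 12, 250_002, 1536)))  # ~447, vs 448 for minilm
```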
How It Works
1. Layer Pruning - Copy selected layers from the teacher (uniform spacing recommended; sketched below)
2. Hidden Dim Reduction - Shrink dimensions, by slicing or PCA, if needed to meet the size target (also sketched below)
3. Vocab Pruning - Remove tokens not seen in the training corpus
4. Knowledge Distillation - Train the student to reproduce the teacher's embeddings
5. Evaluation - MTEB benchmark (Classification, Clustering, STS)
For compression ratios above 10x, a 2-stage pipeline distills progressively: Teacher → Intermediate (about 1/5 of the teacher's size) → Final Student.
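
Step 1's uniform spacing simply picks evenly spaced indices across the teacher's depth, keeping the first and last layer. A minimal sketch; the helper name is ours, not SmallModel's API:

```python
def uniform_layers(teacher_layers: int, student_layers: int) -> list[int]:
    """Evenly spaced layer indices that always keep the first and last layer."""
    n, k = teacher_layers, student_layers
    return [round(i * (n - 1) / (k - 1)) for i in range(k)]

print(uniform_layers(12, 4))  # [0, 4, 7, 11] for a 12-layer teacher such as gte
```

Step 2's PCA option amounts to projecting a weight matrix onto its top principal directions. Again a sketch of the idea only; a real pipeline must also adapt every layer that reads or writes the reduced dimensions:

```python
import numpy as np

def pca_reduce(weight: np.ndarray, new_dim: int) -> np.ndarray:
    """Shrink the hidden axis of a (rows, hidden) matrix to new_dim via PCA."""
    centered = weight - weight.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)  # rows of vt = principal directions
    return weight @ vt[:new_dim].T                           # (rows, new_dim)

emb = np.random.randn(1000, 768).astype(np.float32)  # stand-in embedding table
print(pca_reduce(emb, 256).shape)                     # (1000, 256)
```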
License
Apache-2.0
File details
Details for the file easysmallembeddingmodel-0.1.0.tar.gz.
File metadata
- Download URL: easysmallembeddingmodel-0.1.0.tar.gz
- Upload date:
- Size: 34.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 24be3fd5f3b5a92e09769a8c52f83bef7ce320a3fc6373fce3f4b4e1de9d86f6 |
| MD5 | 01e58b00240115f1015093ba39c50930 |
| BLAKE2b-256 | 99f34a107df18fd2bd69288861f4cb7ec4f5fec4013a41481ccd051483b62607 |
File details
Details for the file easysmallembeddingmodel-0.1.0-py3-none-any.whl.
File metadata
- Download URL: easysmallembeddingmodel-0.1.0-py3-none-any.whl
- Upload date:
- Size: 38.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 569f34fa5f2ecbb3a9af3074e0e250a313aa2851ab6b3e6cd722517bdd870dff |
| MD5 | 98838f5c56e4cb3cdf088dc2748a7228 |
| BLAKE2b-256 | 7b7fd2fb8cb93a9644593995dfd2f859c13fdd6c9c7025ca02f963f6d79ada49 |