
SmallModel

Compress large embedding models into small, fast students via layer pruning, vocab pruning, hidden dim reduction, and knowledge distillation.

Features

  • Layer Pruning - Select which transformer layers to keep
  • Vocab Pruning - Remove unused tokens based on corpus frequency
  • Hidden Dim Reduction - Shrink internal dimensions (slicing or PCA)
  • Knowledge Distillation - MSE + Cosine loss alignment with teacher
  • Auto Compress - Find optimal config within size constraints
  • 2-Stage Distillation - Progressive distillation for 10x+ compression
  • Interactive Web UI - Visual layer editor with real-time size estimation
  • MTEB Evaluation - Benchmark on Classification, Clustering, STS tasks
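The "MSE + Cosine loss alignment" above can be sketched as a blend of mean-squared error and cosine distance between student and teacher embeddings. This is an illustrative pure-Python sketch, not the library's actual loss code; the name `distill_loss` and the `alpha` weighting are assumptions:

```python
import math

def distill_loss(student, teacher, alpha=0.5):
    """Blend MSE with cosine distance (1 - cosine similarity).

    `student` and `teacher` are embedding vectors of equal length;
    `alpha` trades off the two terms (illustrative default).
    """
    # Mean-squared error over the embedding dimensions
    mse = sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)
    # Cosine distance: 1 - (s . t) / (|s| * |t|)
    dot = sum(s * t for s, t in zip(student, teacher))
    norm = math.sqrt(sum(s * s for s in student)) * math.sqrt(sum(t * t for t in teacher))
    return alpha * mse + (1 - alpha) * (1.0 - dot / norm)
```

Identical embeddings give zero loss; orthogonal unit vectors give the maximum of both terms.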

Installation

pip install smallmodel[all]

Or install specific extras:

pip install smallmodel          # core only
pip install smallmodel[web]     # + Flask web UI
pip install smallmodel[eval]    # + MTEB evaluation
pip install smallmodel[export]  # + ONNX export
pip install smallmodel[hub]     # + HuggingFace Hub upload

For development:

git clone https://github.com/gomyk/smallmodel.git
cd smallmodel
pip install -e ".[all]"

Quick Start

Python API

from smallmodel import SmallModel

# Auto-compress within 50MB
sm = SmallModel.from_teacher("gte")
sm.compress(max_fp32_mb=50.0)
sm.distill(epochs=10)

# Manual layer selection
sm = SmallModel.from_teacher("gte", layer_indices=[0, 3, 6, 11])
sm.create()

# Register custom teacher
from smallmodel import register_teacher
register_teacher(
    "my-bert",
    model_id="my-org/my-bert-base",
    short_name="MyBERT",
    hidden_dim=768, num_layers=12,
    intermediate_size=3072, vocab_size=30522,
)

Web UI

from smallmodel import SmallModel

sm = SmallModel.from_teacher("gte")
sm.serve()  # http://127.0.0.1:7860

Or via CLI:

smallmodel serve --teacher gte --port 7860

The web UI lets you:

  • Select teacher model from 7+ pre-registered models
  • Toggle layers on/off with preset configurations
  • Adjust hidden dim, FFN size, and vocab size
  • See real-time size estimation and compression ratio
  • Select distillation datasets and evaluation tasks
  • Analyze vocab coverage at different vocab sizes
  • Create compressed models with one click

CLI

smallmodel list-teachers
smallmodel compress --teacher gte --max-mb 50
smallmodel create --teacher gte --layers 0,3,6,11
smallmodel distill --teacher gte --student output/students/gte/gte_compressed
smallmodel serve --teacher gte

Pre-registered Teachers

Key        Model                                   Layers  Hidden  Vocab  FP32 MB
minilm     paraphrase-multilingual-MiniLM-L12-v2   12      384     250K   448
modernbert ModernBERT-base                         22      768     50K    496
gte        gte-multilingual-base                   12      768     250K   1058
me5        multilingual-e5-base                    12      768     250K   1058
me5s       multilingual-e5-small                   12      384     250K   448
gemma_emb  embeddinggemma-300m                     24      768     262K   1155
qwen3      Qwen3-0.6B                              28      1024    152K   2274
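The FP32 MB column can be roughly reproduced from the architecture numbers: token embeddings (vocab × hidden) plus, per layer, the attention (4 h²) and FFN (2 h × ffn) weight matrices, at 4 bytes per parameter. This is a back-of-envelope sketch that ignores biases, layer norms, and position embeddings, not the library's actual size estimator:

```python
def fp32_mb(vocab, hidden, num_layers, ffn):
    """Rough FP32 model size in MiB from embedding + per-layer weights."""
    params = vocab * hidden + num_layers * (4 * hidden * hidden + 2 * hidden * ffn)
    return params * 4 / 2**20  # 4 bytes per FP32 parameter

print(round(fp32_mb(250_000, 768, 12, 3072)))  # ≈ 1056, vs 1058 in the table for gte
```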

How It Works

  1. Layer Pruning - Copy selected layers from teacher (uniform spacing recommended)
  2. Hidden Dim Reduction - Shrink dimensions if needed to meet size target
  3. Vocab Pruning - Remove tokens not seen in training corpus
  4. Knowledge Distillation - Train student to reproduce teacher's embeddings
  5. Evaluation - MTEB benchmark (Classification, Clustering, STS)
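The "uniform spacing" recommended in step 1 can be sketched as evenly spaced indices spanning the teacher's layer stack. This is a hypothetical helper, not part of the SmallModel API:

```python
def uniform_layers(num_teacher_layers, num_student_layers):
    """Evenly spaced layer indices, always keeping the first and last layer."""
    if num_student_layers == 1:
        return [num_teacher_layers - 1]
    step = (num_teacher_layers - 1) / (num_student_layers - 1)
    return [round(i * step) for i in range(num_student_layers)]
```

For a 12-layer teacher pruned to 4 layers this yields `[0, 4, 7, 11]`; the quick-start example uses `[0, 3, 6, 11]`, so the library's actual selection may differ.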

For compression ratios above 10x, a 2-stage distillation pipeline is used: Teacher → Intermediate (~1/5 the teacher's size) → Final Student
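Step 3's frequency-based vocab pruning can be sketched as: keep the special tokens, then the most frequent token ids observed in the corpus, up to the target vocab size. The special-token ids and selection logic here are illustrative assumptions, not the library's implementation:

```python
from collections import Counter

def select_kept_tokens(tokenized_corpus, keep_size, special_ids=(0, 1, 2, 3)):
    """Return the sorted token ids to retain after vocab pruning."""
    # Count every token id across all tokenized documents
    counts = Counter(tok for doc in tokenized_corpus for tok in doc)
    kept = set(special_ids)  # special tokens are always retained
    for tok, _ in counts.most_common():
        if len(kept) >= keep_size:
            break
        kept.add(tok)
    return sorted(kept)
```

The student's embedding matrix is then sliced to these rows and the tokenizer remapped accordingly.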

License

Apache-2.0
