Enhanced static embeddings with a 5-layer matching cascade
eisenstein-embed
Enhanced static embeddings with a 5-layer matching cascade. Drop-in enhancement of Model2Vec.
What It Does
Replaces Model2Vec's single-path encoding with a progressive cascade:
- EXACT — string match (free, catches identical queries)
- BITVECTOR — 64-bit Hamming distance (<1μs, 93.8% typo accuracy, zero ML deps)
- SEMANTIC — Model2Vec dense vectors (catches paraphrases)
- DOMAIN — per-domain vocabulary weighting (SIF from corpus, not Zipf estimate)
- DEADBAND — cosine similarity cache (skip 30-60% redundant encoding)
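Each layer runs only when the previous layers fail to find a match, so cheap checks handle the easy queries and slower layers run only when needed. The result is the best combination of speed and accuracy.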
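The fall-through behaviour described above can be sketched in a few lines. This is a minimal illustration, not the package's cascade.py: the names `exact_layer`, `char_overlap_layer`, `run_cascade`, and the 0.8 threshold are hypothetical, and the fuzzy layer is a simple character-set overlap standing in for the real bitvector and semantic layers.

```python
from typing import Callable, Optional, Sequence, Tuple

# A layer takes (query, candidates) and returns (best_match, score), or None on failure.
Layer = Callable[[str, Sequence[str]], Optional[Tuple[str, float]]]


def exact_layer(query: str, candidates: Sequence[str]) -> Optional[Tuple[str, float]]:
    """Cheapest layer: succeeds only on an identical string."""
    return (query, 1.0) if query in candidates else None


def char_overlap_layer(
    query: str, candidates: Sequence[str], threshold: float = 0.8
) -> Optional[Tuple[str, float]]:
    """Hypothetical stand-in for a fuzzy layer: Jaccard overlap of character sets."""
    best, best_score = None, 0.0
    q = set(query)
    for cand in candidates:
        c = set(cand)
        union = q | c
        score = len(q & c) / len(union) if union else 0.0
        if score > best_score:
            best, best_score = cand, score
    if best is not None and best_score >= threshold:
        return best, best_score
    return None


def run_cascade(
    query: str,
    candidates: Sequence[str],
    layers: Sequence[Tuple[str, Layer]],
) -> Optional[Tuple[str, float, str]]:
    """Try each layer in order and stop at the first one that returns a match."""
    for name, layer in layers:
        hit = layer(query, candidates)
        if hit is not None:
            return hit[0], hit[1], name
    return None


LAYERS = [("exact", exact_layer), ("fuzzy", char_overlap_layer)]
```

Ordering the list from cheapest to most expensive is what makes the early exit pay off: a query that matches exactly never reaches the later layers.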
Install
# Core only (bitvector + exact, no ML deps):
pip install eisenstein-embed
# With Model2Vec semantic layer:
pip install "eisenstein-embed[model2vec]"
Quick Start
from eisenstein_embed import EisensteinModel
# Zero-config — works without any ML library:
model = EisensteinModel()
result = model.match("triangel", ["triangle", "square", "circle"])
# → MatchResult(best_match="triangle", score=0.94, method="bitvector")
# With Model2Vec for semantic matching:
model = EisensteinModel.from_model2vec("minishlab/potion-base-8M")
vectors = model.encode(["hello world", "greetings earth"])
# Domain-aware matching:
model.add_domain("fleet", texts=["deploy micro model", "plato room tile"])
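To make the domain-aware step more concrete, here is a rough sketch of corpus-driven SIF (smooth inverse frequency) weighting, where each word gets the weight a / (a + p(w)) and p(w) is the word's relative frequency in your own texts. The function name `sif_weights`, the whitespace tokenizer, and the default `a = 1e-3` are assumptions for illustration; the package's domain_sif.py may differ.

```python
from collections import Counter
from typing import Dict, Iterable


def sif_weights(texts: Iterable[str], a: float = 1e-3) -> Dict[str, float]:
    """Compute SIF weights a / (a + p(w)) from word frequencies in the corpus.

    Frequent words in the domain get small weights; rare words get large ones.
    """
    # Assumption: lowercase whitespace tokenization is good enough for a sketch.
    counts = Counter(tok for text in texts for tok in text.lower().split())
    total = sum(counts.values())
    return {tok: a / (a + n / total) for tok, n in counts.items()}


# Hypothetical three-line corpus, reusing the phrases from the Quick Start.
corpus = ["deploy micro model", "plato room tile", "deploy room"]
weights = sif_weights(corpus)
```

In this toy corpus "deploy" and "room" each appear twice, so they receive lower weights than the words that appear once. These weights would then scale each token's vector before averaging.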
Why Fork Model2Vec?
Model2Vec is brilliant: it produces static embeddings that run up to 500x faster than transformer models. We enhance it with techniques from our PLATO system:
- TUTOR bitvector matching (1965) catches typos that embeddings miss, at 1/50 of the cost
- Domain-aware SIF learns word importance from your corpus, not Zipf's law
- Deadband caching skips redundant encoding in conversational systems
- SplineLinear quantization compresses embedding tables 20x vs Model2Vec's 4x int8
- BMA drift detection knows when your embeddings have gone stale
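As an illustration of the bitvector idea in the first bullet, the sketch below hashes character bigrams into a 64-bit integer and compares fingerprints by Hamming distance. The feature choice (bigrams), the use of `zlib.crc32` as the hash, and the helper names are assumptions made for this example; they are not taken from the package's bitvector.py or from TUTOR itself.

```python
import zlib
from typing import Sequence


def fingerprint(text: str) -> int:
    """Fold the character bigrams of `text` into a 64-bit fingerprint."""
    padded = f" {text.lower()} "  # pad so the first and last letters form bigrams too
    bits = 0
    for i in range(len(padded) - 1):
        bigram = padded[i:i + 2]
        # crc32 is deterministic across runs, unlike Python's built-in str hash.
        bits |= 1 << (zlib.crc32(bigram.encode("utf-8")) % 64)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bit positions where the two fingerprints differ."""
    return bin(a ^ b).count("1")


def closest(query: str, candidates: Sequence[str]) -> str:
    """Return the candidate whose fingerprint is nearest to the query's."""
    fq = fingerprint(query)
    return min(candidates, key=lambda c: hamming(fq, fingerprint(c)))
```

A transposition such as "triangel" changes only a few bigrams, so most bits of its fingerprint still agree with "triangle". The comparison itself is one XOR and a bit count, which is why this layer needs no ML dependencies.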
Standalone vs PLATO
Standalone: Works without any PLATO infrastructure. Just numpy.
With PLATO: Domain profiles auto-populate from room history. BMA monitors embedding quality. Collective inference compares embeddings across agents.
Architecture
eisenstein_embed/
├── cascade.py # 5-layer cascade matcher
├── bitvector.py # TUTOR-style 64-bit fingerprints + Hamming distance
├── domain_sif.py # Corpus-driven SIF weighting
├── deadband_cache.py # Cosine similarity cache
├── bma_monitor.py # BMA drift detection
├── eisenstein_quantize.py # SplineLinear embedding compression
├── static_model.py # Drop-in StaticModel replacement
└── utils.py # Shared utilities
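For the cosine similarity cache listed above as deadband_cache.py, one possible reading is: before running the expensive encoder, compare a cheap representation of the new text with those of recent inputs, and reuse the cached result when the two are within a "deadband". The sketch below follows that reading. The class name `DeadbandCache`, the bigram-count cheap vector, and the 0.95 threshold are assumptions, not the package's actual design.

```python
import math
from collections import Counter
from typing import Any, Callable, List, Tuple


def cheap_vector(text: str) -> Counter:
    """Character-bigram counts: a cheap proxy computed without any encoder."""
    t = text.lower()
    return Counter(t[i:i + 2] for i in range(len(t) - 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b[k] for k, v in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class DeadbandCache:
    """Reuse a cached encoding when a new input is nearly identical to an old one."""

    def __init__(self, encode: Callable[[str], Any], threshold: float = 0.95):
        self.encode = encode
        self.threshold = threshold
        self.entries: List[Tuple[Counter, Any]] = []
        self.hits = 0
        self.misses = 0

    def get(self, text: str) -> Any:
        probe = cheap_vector(text)
        for vec, encoded in self.entries:
            if cosine(probe, vec) >= self.threshold:
                self.hits += 1
                return encoded  # inside the deadband: skip the encoder
        self.misses += 1
        encoded = self.encode(text)
        self.entries.append((probe, encoded))
        return encoded
```

The linear scan keeps the sketch short; a real cache would also bound the number of entries and evict old ones.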
Benchmarks
| Method | Typo Accuracy | Hit Rate | Latency | Size | Dependencies |
|---|---|---|---|---|---|
| Exact string match | 43.8% | 43.8% | ~0μs | 0KB | None |
| Bitvector (this) | 93.8% | 86.2% | ~21μs | ~0KB | None |
| Model2Vec semantic | 68.8% | 86.2% | ~33μs | ~400MB | model2vec |
| Eisenstein V3 (ours) | — | 71.2% | ~3.4ms | 627KB | torch |
| Full cascade | 95%+ | 90%+ | varies | varies | optional |
Benchmarks were run on 80 fleet-domain queries drawn from 68 SuperInstance repos.
Mesh Protocol
eisenstein-embed works standalone. To enable ecosystem meshing, install it together with plato-core through the mesh extra:
pip install "eisenstein-embed[mesh]"
When co-installed, eisenstein-embed auto-registers:
- matchers.eisenstein-cascade — 5-layer cascade matcher
- matchers.eisenstein-bitvector — bitvector fingerprinting
- encoders.eisenstein — EisensteinModel as encoder
Other packages can discover and use these capabilities automatically.
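The registration and discovery pattern can be pictured as a name-to-factory registry with dotted names. The sketch below is a self-contained illustration only: `register`, `discover`, and `create` are hypothetical helpers, not the plato-core API, and the factories return placeholder strings instead of real matchers.

```python
from typing import Any, Callable, Dict, List

# Hypothetical in-memory registry keyed by dotted capability names.
_REGISTRY: Dict[str, Callable[..., Any]] = {}


def register(name: str, factory: Callable[..., Any]) -> None:
    """Make a capability available under a name such as 'matchers.foo'."""
    _REGISTRY[name] = factory


def discover(group: str) -> List[str]:
    """List every registered capability in a group, e.g. 'matchers'."""
    return sorted(n for n in _REGISTRY if n.startswith(group + "."))


def create(name: str, *args: Any, **kwargs: Any) -> Any:
    """Instantiate a capability by name."""
    return _REGISTRY[name](*args, **kwargs)


# Placeholder factories using the capability names listed above.
register("matchers.eisenstein-cascade", lambda: "cascade matcher")
register("matchers.eisenstein-bitvector", lambda: "bitvector matcher")
register("encoders.eisenstein", lambda: "EisensteinModel encoder")
```

With this shape, a consumer only needs the group name to find what is installed, for example `discover("matchers")`, and never imports eisenstein-embed directly.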
License
MIT
Credits
Built by SuperInstance / Cocapn Fleet. Bitvector technique inspired by TUTOR (1965, PLATO system, University of Illinois).