eisenstein-embed

Enhanced static embeddings with a 5-layer matching cascade. Drop-in enhancement of Model2Vec.

What It Does

Replaces Model2Vec's single-path encoding with a progressive cascade:

  1. EXACT: string match (free, catches identical queries)
  2. BITVECTOR: 64-bit Hamming distance (~21μs per query in the benchmarks below, 93.8% typo accuracy, zero ML deps)
  3. SEMANTIC: Model2Vec dense vectors (catches paraphrases)
  4. DOMAIN: per-domain vocabulary weighting (SIF weights learned from your corpus rather than estimated from Zipf's law)
  5. DEADBAND: cosine similarity cache (skips 30-60% of redundant encoding)

Each layer runs only when the previous layers fail to produce a match, so most queries are resolved by the cheapest layer that can handle them.
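The control flow described above can be sketched in a few lines. This is an illustration only, not the package's `cascade.py`: the names `run_cascade`, `exact_layer`, and `prefix_layer` are hypothetical, and the prefix-overlap scorer is just a stand-in for the costlier layers.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A layer takes (query, candidates) and returns (best_match, score), or None on failure.
Layer = Callable[[str, Sequence[str]], Optional[Tuple[str, float]]]


def exact_layer(query: str, candidates: Sequence[str]) -> Optional[Tuple[str, float]]:
    """Cheapest layer: succeeds only when the query is identical to a candidate."""
    return (query, 1.0) if query in candidates else None


def prefix_layer(query: str, candidates: Sequence[str]) -> Optional[Tuple[str, float]]:
    """Hypothetical stand-in for a costlier fuzzy layer: shared-prefix ratio."""

    def ratio(a: str, b: str) -> float:
        shared = 0
        for x, y in zip(a, b):
            if x != y:
                break
            shared += 1
        return shared / max(len(a), len(b), 1)

    best = max(candidates, key=lambda c: ratio(query, c), default=None)
    if best is None:
        return None
    score = ratio(query, best)
    # Report failure below the (assumed) 0.5 threshold so the next layer can try.
    return (best, score) if score >= 0.5 else None


def run_cascade(
    query: str,
    candidates: Sequence[str],
    layers: List[Tuple[str, Layer]],
) -> Optional[Tuple[str, float, str]]:
    """Try layers in order and return (match, score, layer_name) from the first that succeeds."""
    for name, layer in layers:
        hit = layer(query, candidates)
        if hit is not None:
            return hit[0], hit[1], name
    return None


LAYERS = [("exact", exact_layer), ("prefix", prefix_layer)]
shapes = ["triangle", "square", "circle"]
print(run_cascade("triangle", shapes, LAYERS))  # resolved by the exact layer
print(run_cascade("triangel", shapes, LAYERS))  # falls through to the prefix layer
```

Ordering the layers from cheapest to most expensive is what keeps the average cost low: the later layers are only paid for on queries the earlier ones cannot resolve.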

Install

# Core only (bitvector + exact, no ML deps):
pip install eisenstein-embed

# With Model2Vec semantic layer:
pip install "eisenstein-embed[model2vec]"

Quick Start

from eisenstein_embed import EisensteinModel

# Zero-config — works without any ML library:
model = EisensteinModel()
result = model.match("triangel", ["triangle", "square", "circle"])
# → MatchResult(best_match="triangle", score=0.94, method="bitvector")

# With Model2Vec for semantic matching:
model = EisensteinModel.from_model2vec("minishlab/potion-base-8M")
vectors = model.encode(["hello world", "greetings earth"])

# Domain-aware matching:
model.add_domain("fleet", texts=["deploy micro model", "plato room tile"])
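The typo match in the Quick Start (`triangel` to `triangle`) relies on comparing compact fingerprints rather than embeddings. The source does not spell out how the 64-bit fingerprints are built, so the sketch below assumes one plausible scheme: each character bigram sets one bit chosen by a CRC32 hash, and candidates are ranked by Hamming distance. The helper names `fingerprint`, `hamming`, and `closest` are hypothetical.

```python
import zlib
from typing import Sequence


def fingerprint(text: str, bits: int = 64) -> int:
    """Fold the character bigrams of `text` into a `bits`-wide integer bitmask."""
    padded = f"^{text.lower()}$"  # boundary markers so first/last letters count
    fp = 0
    for i in range(len(padded) - 1):
        bigram = padded[i : i + 2].encode("utf-8")
        # zlib.crc32 is deterministic across runs, unlike the built-in hash().
        fp |= 1 << (zlib.crc32(bigram) % bits)
    return fp


def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def closest(query: str, candidates: Sequence[str]) -> str:
    """Return the candidate whose fingerprint is nearest to the query's."""
    q = fingerprint(query)
    return min(candidates, key=lambda c: hamming(q, fingerprint(c)))


print(hamming(fingerprint("triangel"), fingerprint("triangle")))
print(closest("triangel", ["triangle", "square", "circle"]))
```

Under this scheme a transposition only disturbs the bigrams that touch the swapped letters. `triangel` and `triangle` share six of their nine padded bigrams, so their fingerprints can differ in at most six bits, while unrelated words typically differ in many more.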

Why Fork Model2Vec?

Model2Vec is brilliant: static embeddings that run up to 500x faster than transformer-based encoders. We enhance it with techniques from our PLATO system:

  • TUTOR bitvector matching (1965) catches typos that embeddings miss, at roughly 1/50 of the cost
  • Domain-aware SIF learns word importance from your corpus, not Zipf's law
  • Deadband caching skips redundant encoding in conversational systems
  • SplineLinear quantization compresses embedding tables 20x, versus 4x for Model2Vec's int8 quantization
  • BMA drift detection knows when your embeddings have gone stale
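SIF (smooth inverse frequency) weighting scales each word by a / (a + p(w)), where p(w) is the word's relative frequency, so common words contribute less to the pooled vector. A minimal sketch of deriving p(w) from a domain corpus instead of a generic frequency estimate follows. The function names, the whitespace tokenizer, and the default a = 1e-3 are assumptions for illustration, and the principal-component removal step of the full SIF method is omitted.

```python
from collections import Counter
from typing import Dict, Iterable, List, Sequence


def sif_weights(corpus: Iterable[str], a: float = 1e-3) -> Dict[str, float]:
    """Compute a / (a + p(w)) for every token, with p(w) measured on `corpus`."""
    counts = Counter(tok for text in corpus for tok in text.lower().split())
    total = sum(counts.values())
    return {tok: a / (a + n / total) for tok, n in counts.items()}


def pool(
    tokens: Sequence[str],
    vectors: Dict[str, List[float]],
    weights: Dict[str, float],
) -> List[float]:
    """Weighted average of token vectors; tokens unseen in the corpus get weight 1.0."""
    if not vectors:
        return []
    dim = len(next(iter(vectors.values())))
    acc = [0.0] * dim
    total = 0.0
    for tok in tokens:
        if tok not in vectors:
            continue  # no embedding for this token, skip it
        w = weights.get(tok, 1.0)  # p(w) = 0 gives a / (a + 0) = 1.0
        total += w
        for i, x in enumerate(vectors[tok]):
            acc[i] += w * x
    return [x / total for x in acc] if total else acc


corpus = ["deploy micro model", "deploy plato room tile", "deploy model"]
weights = sif_weights(corpus)
toy_vectors = {"deploy": [1.0, 0.0], "micro": [0.0, 1.0]}  # toy 2-d embeddings
print(pool(["deploy", "micro"], toy_vectors, weights))
```

In this toy corpus `deploy` appears in every text, so it receives the smallest weight and the pooled vector leans toward the rarer, more informative `micro`.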

Standalone vs PLATO

Standalone: Works without any PLATO infrastructure. NumPy is the only dependency.

With PLATO: Domain profiles auto-populate from room history. BMA monitors embedding quality. Collective inference compares embeddings across agents.

Architecture

eisenstein_embed/
├── cascade.py               # 5-layer cascade matcher
├── bitvector.py             # TUTOR-style 64-bit fingerprints + Hamming distance
├── domain_sif.py            # Corpus-driven SIF weighting
├── deadband_cache.py        # Cosine similarity cache
├── bma_monitor.py           # BMA drift detection
├── eisenstein_quantize.py   # SplineLinear embedding compression
├── static_model.py          # Drop-in StaticModel replacement
└── utils.py                 # Shared utilities
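The source does not describe how `deadband_cache.py` decides that an input is redundant. One plausible reading, sketched below under that assumption, is to compare a cheap bag-of-words key against earlier inputs and reuse the stored result when the cosine similarity clears a threshold. `DeadbandCache`, `cheap_key`, and the 0.9 threshold are hypothetical.

```python
import math
from collections import Counter
from typing import Any, Callable, List, Tuple


def cheap_key(text: str) -> Counter:
    """Bag-of-words counts: far cheaper to build than an embedding."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class DeadbandCache:
    """Reuse a stored result when a new input barely differs from an earlier one."""

    def __init__(self, encode: Callable[[str], Any], threshold: float = 0.9):
        self.encode = encode
        self.threshold = threshold
        self.entries: List[Tuple[Counter, Any]] = []
        self.hits = 0
        self.misses = 0

    def get(self, text: str) -> Any:
        key = cheap_key(text)
        for cached_key, value in self.entries:
            if cosine(key, cached_key) >= self.threshold:
                self.hits += 1  # inside the deadband: skip the encoder
                return value
        self.misses += 1
        value = self.encode(text)  # outside the deadband: pay for encoding
        self.entries.append((key, value))
        return value


calls = []
cache = DeadbandCache(lambda t: calls.append(t) or len(t))
cache.get("deploy the micro model")
cache.get("Deploy the micro model")  # same words, so the cached value is reused
cache.get("plato room tile")
print(cache.hits, cache.misses, len(calls))
```

The linear scan over entries keeps the sketch short; a real cache would likely bound its size and index the keys.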

Benchmarks

| Method | Typo Accuracy | Hit Rate | Speed | Size | Dependencies |
|---|---|---|---|---|---|
| Exact string match | 43.8% | 43.8% | ~0μs | 0KB | None |
| Bitvector (this) | 93.8% | 86.2% | ~21μs | ~0KB | None |
| Model2Vec semantic | 68.8% | 86.2% | ~33μs | ~400MB | model2vec |
| Eisenstein V3 (ours) | - | 71.2% | ~3.4ms | 627KB | torch |
| Full cascade | 95%+ | 90%+ | varies | varies | optional |

Benchmarks were run on 80 fleet-domain queries drawn from 68 SuperInstance repos.

Mesh Protocol

eisenstein-embed works standalone. To enable ecosystem meshing, install it together with plato-core through the mesh extra:

pip install "eisenstein-embed[mesh]"

When co-installed, eisenstein-embed auto-registers:

  • matchers.eisenstein-cascade — 5-layer cascade matcher
  • matchers.eisenstein-bitvector — bitvector fingerprinting
  • encoders.eisenstein — EisensteinModel as encoder

Other packages can discover and use these capabilities automatically.

License

MIT

Credits

Built by SuperInstance / Cocapn Fleet. Bitvector technique inspired by TUTOR (1965, PLATO system, University of Illinois).
