Delta-Compressed Embedding Engine — compressed approximate similarity search for correlated embeddings

DCEE — Delta-Compressed Embedding Engine

Compressed approximate similarity search for correlated embedding sequences (e.g. chunks from one document, adjacent logs). Uses k-means routing, delta coding inside clusters, optional Adaptive Margin Probing (AMP) at query time, and optional CuPy for GPU math (falls back to NumPy).
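To make "delta coding inside clusters" concrete, here is a minimal sketch of the general technique. It is not DCEE's actual storage format. It assumes a cluster's members are kept in their original order and that a full float32 "keyframe" is stored every `keyframe_every` entries, which is our reading of the config field of that name. It also assumes the entries in between are stored as int8-quantized differences with a fixed step size. `delta_encode`, `delta_decode`, and `step` are hypothetical names.

```python
import numpy as np


def delta_encode(vectors: np.ndarray, keyframe_every: int, step: float):
    """Encode an ordered run of vectors (e.g. one cluster's members) as
    float32 keyframes plus int8-quantized deltas.

    Hypothetical sketch: deltas are taken against the *reconstructed*
    previous vector, so quantization error does not accumulate.
    """
    keyframes = {}  # position -> exact float32 vector
    deltas = {}     # position -> int8 codes
    prev = None
    for i, v in enumerate(vectors.astype(np.float32)):
        if i % keyframe_every == 0:
            keyframes[i] = v.copy()
            prev = v.copy()
        else:
            codes = np.clip(np.round((v - prev) / step), -127, 127).astype(np.int8)
            deltas[i] = codes
            # Mirror exactly what the decoder will reconstruct
            prev = prev + codes.astype(np.float32) * step
    return keyframes, deltas


def delta_decode(n: int, keyframes, deltas, step: float) -> np.ndarray:
    """Rebuild all n vectors by summing deltas forward from each keyframe."""
    out = []
    prev = None
    for i in range(n):
        if i in keyframes:
            prev = keyframes[i].copy()
        else:
            prev = prev + deltas[i].astype(np.float32) * step
        out.append(prev)
    return np.stack(out)
```

Encoding each delta against the reconstructed previous vector keeps the per-element error within `step / 2` as long as no code is clipped. Encoding against the original would let the error grow between keyframes. A shorter keyframe interval limits how many deltas must be summed to decode one entry.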

Install

From PyPI (recommended):

pip install dcee

Pin a specific release:

pip install "dcee==1.0.1"

Dependencies (installed automatically): numpy, scikit-learn, tqdm. Requires Python 3.10+.

Optional GPU acceleration: install a CuPy wheel that matches your CUDA toolkit (e.g. cupy-cuda12x). If CuPy is not installed, DCEE runs on NumPy (CPU).
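A common way to get this kind of optional-GPU behavior is to import CuPy if it is present and otherwise alias NumPy under the same name. This works because CuPy mirrors much of the NumPy array API. The sketch below shows that pattern, but DCEE may implement its fallback differently. `inner_product_scores` is a hypothetical helper. Importing CuPy can succeed on a machine without a usable GPU, in which case errors surface on first use rather than at import.

```python
import numpy as np

try:
    import cupy as xp  # GPU backend; needs a CuPy wheel matching the CUDA toolkit
    to_numpy = xp.asnumpy  # copies a device array back to host memory
except ImportError:
    xp = np  # CPU fallback: same calls, backed by NumPy
    to_numpy = np.asarray  # already a host array; returned as-is


def inner_product_scores(db: np.ndarray, query: np.ndarray) -> np.ndarray:
    """Score every row of db against query on whichever backend is active
    (hypothetical helper), always returning a NumPy array."""
    db_x = xp.asarray(db, dtype=xp.float32)
    q_x = xp.asarray(query, dtype=xp.float32)
    return to_numpy(db_x @ q_x)
```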

Development (editable install from a clone):

git clone <repository-url>
cd DCEE
pip install -e ".[dev]"

Quick start

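import numpy as np
from dcee import DCEEConfig, DCEEEngine, is_gpu_available

# Whether GPU (CuPy) math is available; otherwise NumPy runs on the CPU
print("GPU:", is_gpu_available())

# 10,000 random 128-d float32 vectors, L2-normalized row by row
emb = np.random.randn(10_000, 128).astype(np.float32)
emb /= np.linalg.norm(emb, axis=1, keepdims=True)

# Heuristic, scale-aware defaults for this corpus size and dimension
cfg = DCEEConfig.tuned_for(len(emb), emb.shape[1])
engine = DCEEEngine(cfg)
engine.build(emb)

# search() yields (index, score) pairs for the top_k matches
q = emb[0]
for idx, score in engine.search(q, top_k=5):
    print(idx, score)

# Save the index to disk, then reload it into a new engine
engine.save("index.dce2")

loaded = DCEEEngine.from_file("index.dce2")
print(loaded.search(q, top_k=3))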
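Because the search is approximate, it can help to measure recall against an exact brute-force baseline on the same data. The sketch below assumes scores are inner products, which equal cosine similarity for the L2-normalized vectors above. It also assumes `engine.search` yields `(index, score)` pairs, as the loop above suggests. `exact_top_k` and `recall_at_k` are hypothetical helpers, not part of the dcee package.

```python
import numpy as np


def exact_top_k(emb: np.ndarray, q: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k rows with the highest inner product with q (exact)."""
    scores = emb @ q
    top = np.argpartition(-scores, k - 1)[:k]  # the k best, in no particular order
    return top[np.argsort(-scores[top])]       # order them by descending score


def recall_at_k(approx_ids, exact_ids) -> float:
    """Fraction of the exact top-k that the approximate search also returned."""
    return len(set(map(int, approx_ids)) & set(map(int, exact_ids))) / len(exact_ids)
```

For example, with `emb`, `q`, and `engine` from the quick start: `recall_at_k([i for i, _ in engine.search(q, top_k=10)], exact_top_k(emb, q, 10))`.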

Configuration

  • DCEEConfig: defaults for dim, n_clusters, keyframe_every, quantization, n_probe, n_probe_max, AMP (adaptive_probe, adaptive_probe_margin), top_k_refine, verbose.
  • DCEEConfig.tuned_for(n_vectors, dim): heuristic scale-aware defaults.
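The sketch below illustrates k-means routing with adaptive probing under one plausible reading of the config fields. Under that reading, `n_probe` clusters are always searched. Further clusters are then added while their centroid stays within `adaptive_probe_margin` of the best centroid, up to `n_probe_max`. This interpretation of AMP is an assumption, not DCEE's documented behavior, and the sketch skips the compression step entirely. `build_routing`, `clusters_to_probe`, and `search` are hypothetical names, and vectors are scored exactly within the probed clusters.

```python
import numpy as np
from sklearn.cluster import KMeans


def build_routing(emb: np.ndarray, n_clusters: int, seed: int = 0):
    """Route every vector to its nearest k-means centroid (Euclidean),
    producing one list of member indices per cluster."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(emb)
    lists = [np.flatnonzero(km.labels_ == c) for c in range(n_clusters)]
    return km.cluster_centers_, lists


def clusters_to_probe(centroids, q, n_probe, n_probe_max, margin):
    """Hypothetical adaptive probing: always visit n_probe clusters, then add
    clusters whose centroid score is within `margin` of the best, up to
    n_probe_max clusters in total."""
    # Higher is closer; matches how KMeans assigned vectors to centroids
    scores = -((centroids - q) ** 2).sum(axis=1)
    order = np.argsort(-scores)
    best = scores[order[0]]
    chosen = list(order[:n_probe])
    for c in order[n_probe:n_probe_max]:
        if best - scores[c] > margin:
            break  # order is sorted, so every later cluster is further away
        chosen.append(c)
    return [int(c) for c in chosen]


def search(emb, centroids, lists, q, top_k, n_probe=1, n_probe_max=8, margin=0.05):
    """Score only the members of the probed clusters, by inner product."""
    probe = clusters_to_probe(centroids, q, n_probe, n_probe_max, margin)
    cand = np.concatenate([lists[c] for c in probe])
    scores = emb[cand] @ q
    top = np.argsort(-scores)[:top_k]
    return [(int(cand[i]), float(scores[i])) for i in top]
```

Centroids are ranked by negative squared Euclidean distance so that the query-time ranking matches how scikit-learn's `KMeans` assigned vectors. For unit-normalized vectors and queries, ranking candidates by inner product gives the same order as ranking them by Euclidean distance.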

Set verbose=False for quiet builds and loads.

License

See LICENSE in the repository.

Download files

Source Distribution

dcee-1.0.1.tar.gz (8.1 kB)

Built Distribution

dcee-1.0.1-py3-none-any.whl (8.6 kB)

File details

Details for the file dcee-1.0.1.tar.gz.

File metadata

  • Download URL: dcee-1.0.1.tar.gz
  • Size: 8.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for dcee-1.0.1.tar.gz
Algorithm Hash digest
SHA256 3d05f4eea1a0d261c1f2fd404b6aa029e348f9319523b8aa879fe9963901b628
MD5 6ab3542d83e3b58fb4754442594f5a6c
BLAKE2b-256 edd3d5cf3a878134d4339d9ec0eb777acd05c375ed4596a1caa0c7eff7b71d66
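To check a downloaded file against the published SHA256 digest, Python's standard `hashlib` is enough. `sha256_of` is a hypothetical helper. pip can also enforce digests itself through its hash-checking mode (`--require-hashes`).

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA256 digest of a file, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


# Published SHA256 for dcee-1.0.1.tar.gz
EXPECTED = "3d05f4eea1a0d261c1f2fd404b6aa029e348f9319523b8aa879fe9963901b628"
# After downloading: assert sha256_of("dcee-1.0.1.tar.gz") == EXPECTED
```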

File details

Details for the file dcee-1.0.1-py3-none-any.whl.

File metadata

  • Download URL: dcee-1.0.1-py3-none-any.whl
  • Size: 8.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for dcee-1.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 6c24390ab665cff34b2ef51f6ccf6e203e6e6e72756c6602fa6bba8db8b6b35d
MD5 6e600b5c05cc5696d6d11c1bfa79cb35
BLAKE2b-256 bf4f78ac21f03ce66d41ab9cf9720de273245bdf9bd79bce4cc9576b6b8169f3