Delta-Compressed Embedding Engine — compressed approximate similarity search for correlated embeddings

DCEE — Delta-Compressed Embedding Engine

DCEE provides compressed approximate similarity search for correlated embedding sequences (e.g. chunks from one document, adjacent logs). It combines k-means routing, delta coding inside clusters, and optional Adaptive Margin Probing (AMP) at query time. GPU math through CuPy is optional; without CuPy, DCEE falls back to NumPy.
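To make the delta-coding idea concrete, here is a minimal sketch. It is not DCEE's actual implementation or file format. It assumes that vectors in a cluster are stored in order, that every keyframe_every-th vector is kept in full, and that the others are stored as int8-quantized differences from the previous reconstructed vector. The function names and the step parameter are hypothetical.

```python
import numpy as np

def delta_encode(vectors, keyframe_every=8, step=0.01):
    """Keep every keyframe_every-th vector in full; store the rest as int8 deltas."""
    vectors = np.asarray(vectors, dtype=np.float32)
    keyframes, deltas = {}, {}
    prev = None  # last reconstructed vector, so quantization error does not accumulate
    for i, v in enumerate(vectors):
        if i % keyframe_every == 0:
            keyframes[i] = v.copy()
            prev = keyframes[i]
        else:
            q = np.clip(np.round((v - prev) / step), -127, 127).astype(np.int8)
            deltas[i] = q
            prev = prev + q.astype(np.float32) * step
    return keyframes, deltas

def delta_decode(keyframes, deltas, step=0.01):
    """Rebuild the sequence by replaying deltas forward from each keyframe."""
    n = len(keyframes) + len(deltas)
    dim = len(keyframes[0])
    out = np.empty((n, dim), dtype=np.float32)
    for i in range(n):
        if i in keyframes:
            out[i] = keyframes[i]
        else:
            out[i] = out[i - 1] + deltas[i].astype(np.float32) * step
    return out

# A slowly drifting sequence stands in for correlated embeddings.
rng = np.random.default_rng(0)
seq = np.cumsum(rng.normal(0.0, 0.02, size=(32, 16)), axis=0).astype(np.float32)
keyframes, deltas = delta_encode(seq)
restored = delta_decode(keyframes, deltas)
max_err = float(np.abs(restored - seq).max())
print("max reconstruction error:", max_err)
```

In this sketch each delta costs one byte per dimension instead of four, and the error per component stays within about half a quantization step as long as consecutive vectors are close enough that no delta is clipped.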

Install

From PyPI (recommended):

pip install dcee

Require a minimum version:

pip install "dcee>=0.1.0"

Dependencies (installed automatically): numpy, scikit-learn, tqdm. Requires Python 3.10+.

Optional GPU acceleration: install a CuPy wheel that matches your CUDA toolkit (e.g. cupy-cuda12x). If CuPy is not installed, DCEE runs on NumPy (CPU).
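The fallback described above follows a common optional-import pattern, sketched here. This is an illustration, not DCEE's internal code; the names xp, GPU_OK and dot_scores are hypothetical. The sketch also treats "CuPy imports but no CUDA device is usable" as a reason to fall back, which is an assumption about how a library might behave.

```python
import numpy as np

try:
    import cupy as xp  # present only if a matching CuPy wheel is installed
    GPU_OK = xp.cuda.runtime.getDeviceCount() > 0
except Exception:
    # CuPy is missing, or it cannot reach a CUDA driver/device
    GPU_OK = False
if not GPU_OK:
    xp = np  # same array API for the operations used below

def dot_scores(query, matrix):
    """Inner-product scores of one query against all rows, returned as a NumPy array."""
    q = xp.asarray(query, dtype=xp.float32)
    m = xp.asarray(matrix, dtype=xp.float32)
    scores = m @ q
    return xp.asnumpy(scores) if GPU_OK else scores

rng = np.random.default_rng(0)
emb = rng.standard_normal((100, 8)).astype(np.float32)
scores = dot_scores(emb[0], emb)
print("backend:", "cupy" if GPU_OK else "numpy", "| scores shape:", scores.shape)
```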

Development (editable install from a clone):

git clone <repository-url>
cd DCEE
pip install -e ".[dev]"

Quick start

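import numpy as np
from dcee import DCEEConfig, DCEEEngine, is_gpu_available

# Report whether GPU acceleration is available
print("GPU:", is_gpu_available())

# Synthetic data: 10,000 unit-normalized float32 vectors of dimension 128
emb = np.random.randn(10_000, 128).astype(np.float32)
emb /= np.linalg.norm(emb, axis=1, keepdims=True)

# Pick scale-aware defaults for this dataset, then build the index
cfg = DCEEConfig.tuned_for(len(emb), emb.shape[1])
engine = DCEEEngine(cfg)
engine.build(emb)

# Query with a vector from the index; results are (index, score) pairs
q = emb[0]
for idx, score in engine.search(q, top_k=5):
    print(idx, score)

# Save the index to disk
engine.save("index.dce2")

# Reload it and query again
loaded = DCEEEngine.from_file("index.dce2")
print(loaded.search(q, top_k=3))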
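
Because results are approximate, it can help to compare them against exact search on a sample of queries. The sketch below is a plain NumPy brute-force baseline plus a recall measure. It does not use DCEE, and the helper names exact_top_k and recall_at_k are hypothetical. It assumes unit-normalized embeddings, as in the quick start, so that a dot product equals cosine similarity.

```python
import numpy as np

def exact_top_k(emb, q, k):
    """Exact top-k by inner product, returned as (index, score) pairs, best first."""
    scores = emb @ q
    top = np.argpartition(-scores, k - 1)[:k]   # unordered k best
    top = top[np.argsort(-scores[top])]         # order them by score
    return [(int(i), float(scores[i])) for i in top]

def recall_at_k(approx_ids, exact_ids):
    """Fraction of the exact neighbours that the approximate result recovered."""
    exact = set(exact_ids)
    return len(exact & set(approx_ids)) / len(exact)

rng = np.random.default_rng(0)
emb = rng.standard_normal((1000, 32)).astype(np.float32)
emb /= np.linalg.norm(emb, axis=1, keepdims=True)

exact = exact_top_k(emb, emb[0], k=5)
print(exact)
```

With an index built as in the quick start, and assuming engine.search yields (index, score) pairs ranked by cosine or inner-product similarity, passing the two index lists to recall_at_k gives the fraction of true neighbours recovered for that query.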

Configuration

  • DCEEConfig: holds defaults for dim, n_clusters, keyframe_every, quantization, n_probe, n_probe_max, AMP (adaptive_probe, adaptive_probe_margin), top_k_refine, and verbose.
  • DCEEConfig.tuned_for(n_vectors, dim): returns heuristic defaults scaled to the number of vectors and their dimensionality.

Set verbose=False for quiet builds and loads.
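
For intuition about the probing options, the sketch below shows cluster routing as it is commonly done in k-means-routed indexes: score the query against every centroid and scan only the best n_probe clusters. This assumes n_probe has that usual meaning in DCEE. The margin-based extension is only a guess at what an adaptive scheme could look like, based on the parameter names; it is not DCEE's documented AMP behaviour, and pick_clusters is a hypothetical helper.

```python
import numpy as np

def pick_clusters(query, centroids, n_probe=1, n_probe_max=None, margin=None):
    """Return the ids of the clusters to scan for this query, best first."""
    scores = centroids @ query
    order = np.argsort(-scores)          # centroid ids, most similar first
    chosen = order[:n_probe].tolist()
    if margin is not None and n_probe_max is not None:
        # Hypothetical adaptive rule: keep probing, up to n_probe_max clusters,
        # while a centroid scores within `margin` of the best one.
        best = scores[order[0]]
        for c in order[n_probe:n_probe_max]:
            if best - scores[c] > margin:
                break
            chosen.append(int(c))
    return chosen

centroids = np.eye(4, dtype=np.float32)
query = np.array([1.0, 0.9, 0.1, 0.0], dtype=np.float32)

fixed = pick_clusters(query, centroids, n_probe=1)
adaptive = pick_clusters(query, centroids, n_probe=1, n_probe_max=3, margin=0.2)
print(fixed, adaptive)
```

Under this reading, a larger n_probe or margin trades query speed for recall, since more clusters are scanned per query.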

License

See LICENSE in the repository.
