# dd-embed
Shared embedding model abstraction layer for Digital Duck projects.
Extracted from semanscope and maniscope. The core has zero heavy dependencies (only numpy); adapters lazy-import their SDKs only when used.
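The lazy-import design can be sketched roughly like this (a hypothetical adapter, not dd-embed's actual code): the provider SDK is imported inside the adapter on first use, so importing the package itself never pulls in heavy dependencies.

```python
import importlib

class LazyAdapter:
    """Sketch of an adapter that defers its SDK import until first use."""

    def __init__(self, module_name):
        self.module_name = module_name
        self._sdk = None

    def _load(self):
        # The heavy SDK is imported here, on the first embed call,
        # not at package import time.
        if self._sdk is None:
            self._sdk = importlib.import_module(self.module_name)
        return self._sdk

adapter = LazyAdapter("json")  # stand-in for a real provider SDK module
assert adapter._sdk is None    # nothing imported yet
adapter._load()
assert adapter._sdk is not None
```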
## Install

```bash
pip install dd-embed                          # numpy only
pip install "dd-embed[sentence-transformers]" # + sentence-transformers
pip install "dd-embed[openai]"                # + OpenAI SDK (also covers OpenRouter)
pip install "dd-embed[voyageai]"              # + Voyage AI SDK
pip install "dd-embed[gemini]"                # + Google GenAI SDK
pip install "dd-embed[all]"                   # all provider SDKs
```
## Quick Start

```python
from dd_embed import embed

# Using sentence-transformers (local, free)
embeddings = embed(["hello", "world"], provider="sentence_transformers",
                   model_name="all-MiniLM-L6-v2")
print(embeddings.shape)  # (2, 384)

# Using OpenAI
embeddings = embed(["hello"], provider="openai", api_key="sk-...")

# Using Ollama (local)
embeddings = embed(["hello"], provider="ollama", model_name="bge-m3")
```
## Built-in Adapters

| Name | Class | SDK | Notes |
|---|---|---|---|
| `sentence_transformers` | `SentenceTransformerAdapter` | `sentence-transformers` | Local, free, used by maniscope |
| `huggingface` | `HuggingFaceAdapter` | `transformers` + `torch` | AutoModel + mean pooling, E5/Qwen support |
| `ollama` | `OllamaEmbedAdapter` | `requests` | Local Ollama server |
| `openai` | `OpenAIEmbedAdapter` | `openai` | OpenAI embeddings API |
| `openrouter` | `OpenAIEmbedAdapter` (configured) | `openai` | OpenAI-compatible endpoint |
| `gemini` | `GeminiEmbedAdapter` | `google-generativeai` | Google Gemini embeddings |
| `voyage` | `VoyageEmbedAdapter` | `voyageai` | Voyage AI embeddings |
## Embedding Cache

Disk-persistent, per-word granular cache (ported from semanscope):

```python
from dd_embed import EmbeddingCache, get_adapter

cache = EmbeddingCache()  # default: ~/projects/embedding_cache/dd_embed/master.pkl
adapter = get_adapter("sentence_transformers", model_name="all-MiniLM-L6-v2")

embeddings, cached, computed = cache.get_embeddings(
    texts=["apple", "banana", "cherry"],
    model_name="all-MiniLM-L6-v2",
    scope="en",
    embed_fn=lambda texts: adapter.embed(texts).embeddings,
)
print(f"Cached: {cached}, Computed: {computed}")
cache.save()
```
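Per-word granularity means only the texts missing from the cache are sent to the model; everything already seen is served from disk. A minimal sketch of that pattern in plain Python with numpy (illustrative names, not dd-embed's actual implementation):

```python
import numpy as np

def get_embeddings(texts, model_name, scope, embed_fn, cache):
    """Return (vectors, num_cached, num_computed), embedding only cache misses.

    `cache` is a dict keyed by (model_name, scope, text) -> vector.
    """
    missing = [t for t in texts if (model_name, scope, t) not in cache]
    if missing:
        # Only the misses go through the (potentially expensive) embed_fn.
        for text, vec in zip(missing, embed_fn(missing)):
            cache[(model_name, scope, text)] = vec
    vectors = np.stack([cache[(model_name, scope, t)] for t in texts])
    return vectors, len(texts) - len(missing), len(missing)

cache = {}
fake_embed = lambda texts: np.ones((len(texts), 4))  # stand-in model
_, cached, computed = get_embeddings(["a", "b"], "m", "en", fake_embed, cache)
# first call: 0 cached, 2 computed; repeating the call gives 2 cached, 0 computed
```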
## Custom Adapters

```python
from dd_embed import EmbeddingAdapter, EmbeddingResult, register_adapter, embed
import numpy as np

class MyAdapter(EmbeddingAdapter):
    def embed(self, texts, **kwargs):
        vecs = np.random.randn(len(texts), 128)  # your logic here
        return EmbeddingResult(
            embeddings=vecs, success=True, provider="my_api",
            model="v1", dimensions=128, num_texts=len(texts),
        )

register_adapter("my_api", MyAdapter)
result = embed(["hello"], provider="my_api")
```
## Environment Variables

| Variable | Description | Default |
|---|---|---|
| `OPENAI_API_KEY` | OpenAI API key | -- |
| `OPENROUTER_API_KEY` | OpenRouter API key | -- |
| `GEMINI_API_KEY` | Google Gemini API key | -- |
| `VOYAGE_API_KEY` | Voyage AI API key | -- |
| `OLLAMA_HOST` | Ollama server URL | `http://localhost:11434` |
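A table of env vars with defaults suggests the usual resolution order: an explicitly passed value wins, then the environment, then the default. A sketch of that lookup (illustrative only; `resolve_setting` is not a dd-embed function):

```python
import os

def resolve_setting(explicit, env_var, default=None):
    """Use an explicit value if given, else the environment, else a default."""
    if explicit is not None:
        return explicit
    return os.environ.get(env_var, default)

os.environ.pop("OLLAMA_HOST", None)  # ensure the env var is unset for the demo
host = resolve_setting(None, "OLLAMA_HOST", "http://localhost:11434")
assert host == "http://localhost:11434"   # falls through to the default
assert resolve_setting("http://gpu-box:11434", "OLLAMA_HOST") == "http://gpu-box:11434"
```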
## License

MIT