
dd-embed

Shared embedding model abstraction layer for Digital Duck projects.

Extracted from semanscope and maniscope. The core has zero heavy dependencies (numpy only); adapters lazy-import their provider SDKs only when actually used.
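The lazy-import behavior can be sketched roughly like this (illustrative only; the real adapter internals may differ, and `json` stands in for a heavy provider SDK):

```python
import importlib

class LazyAdapter:
    """Illustrative adapter that defers its SDK import until first use."""

    def __init__(self, model_name):
        self.model_name = model_name
        self._sdk = None  # SDK module is NOT imported at construction time

    def _load_sdk(self):
        # The import happens here, on the first embed() call, so that
        # a bare `pip install dd-embed` never pulls in provider SDKs.
        if self._sdk is None:
            self._sdk = importlib.import_module("json")  # stand-in for a real SDK
        return self._sdk

    def embed(self, texts):
        sdk = self._load_sdk()
        return [sdk.dumps(t) for t in texts]  # placeholder for a real API call

adapter = LazyAdapter("demo-model")
print(adapter._sdk is None)      # True: nothing imported yet
adapter.embed(["hello"])
print(adapter._sdk is not None)  # True: SDK loaded on first use
```

This is why installing an extra (e.g. `dd-embed[openai]`) only matters at the moment you select that provider.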

Install

pip install dd-embed                          # numpy only
pip install "dd-embed[sentence-transformers]" # + sentence-transformers
pip install "dd-embed[openai]"                # + OpenAI SDK (also covers openrouter)
pip install "dd-embed[voyageai]"              # + Voyage AI SDK
pip install "dd-embed[gemini]"                # + Google GenAI SDK
pip install "dd-embed[all]"                   # all provider SDKs

Quick Start

from dd_embed import embed

# Using sentence-transformers (local, free)
embeddings = embed(["hello", "world"], provider="sentence_transformers",
                   model_name="all-MiniLM-L6-v2")
print(embeddings.shape)  # (2, 384)

# Using OpenAI
embeddings = embed(["hello"], provider="openai", api_key="sk-...")

# Using Ollama (local)
embeddings = embed(["hello"], provider="ollama", model_name="bge-m3")

Built-in Adapters

| Name | Class | SDK | Notes |
|------|-------|-----|-------|
| sentence_transformers | SentenceTransformerAdapter | sentence-transformers | Local, free, used by maniscope |
| huggingface | HuggingFaceAdapter | transformers + torch | AutoModel + mean pooling, E5/Qwen support |
| ollama | OllamaEmbedAdapter | requests | Local Ollama server |
| openai | OpenAIEmbedAdapter | openai | OpenAI embeddings API |
| openrouter | OpenAIEmbedAdapter (configured) | openai | OpenAI-compatible endpoint |
| gemini | GeminiEmbedAdapter | google-generativeai | Google Gemini embeddings |
| voyage | VoyageEmbedAdapter | voyageai | Voyage AI embeddings |

Embedding Cache

Disk-persistent, per-word granular cache (ported from semanscope):

from dd_embed import EmbeddingCache, get_adapter

cache = EmbeddingCache()  # default: ~/projects/embedding_cache/dd_embed/master.pkl
adapter = get_adapter("sentence_transformers", model_name="all-MiniLM-L6-v2")

embeddings, cached, computed = cache.get_embeddings(
    texts=["apple", "banana", "cherry"],
    model_name="all-MiniLM-L6-v2",
    scope="en",
    embed_fn=lambda texts: adapter.embed(texts).embeddings,
)
print(f"Cached: {cached}, Computed: {computed}")
cache.save()
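Per-word granularity means only texts missing from the cache reach the embed function. A minimal, self-contained sketch of that idea (a toy stand-in, not the actual `EmbeddingCache` implementation; the key layout and return shape are assumptions):

```python
import os
import pickle
import tempfile

class TinyCache:
    """Toy per-text cache keyed by (model, scope, text)."""

    def __init__(self, path):
        self.path = path
        self.store = {}
        if os.path.exists(path):
            with open(path, "rb") as f:
                self.store = pickle.load(f)

    def get_embeddings(self, texts, model_name, scope, embed_fn):
        # Only texts absent from the cache are sent to the embedder.
        missing = [t for t in texts if (model_name, scope, t) not in self.store]
        for t, vec in zip(missing, embed_fn(missing) if missing else []):
            self.store[(model_name, scope, t)] = vec
        cached, computed = len(texts) - len(missing), len(missing)
        return [self.store[(model_name, scope, t)] for t in texts], cached, computed

    def save(self):
        with open(self.path, "wb") as f:
            pickle.dump(self.store, f)

path = os.path.join(tempfile.mkdtemp(), "cache.pkl")
cache = TinyCache(path)
fake_embed = lambda texts: [[float(len(t))] for t in texts]  # stand-in embedder
_, cached, computed = cache.get_embeddings(["apple", "banana"], "m", "en", fake_embed)
print(cached, computed)  # 0 2 (cold cache)
_, cached, computed = cache.get_embeddings(["apple", "cherry"], "m", "en", fake_embed)
print(cached, computed)  # 1 1 (apple reused, cherry computed)
```

On a warm cache a repeated vocabulary costs nothing, which is the point of caching per word rather than per batch.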

Custom Adapters

from dd_embed import EmbeddingAdapter, EmbeddingResult, register_adapter, embed
import numpy as np

class MyAdapter(EmbeddingAdapter):
    def embed(self, texts, **kwargs):
        vecs = np.random.randn(len(texts), 128)  # your logic here
        return EmbeddingResult(
            embeddings=vecs, success=True, provider="my_api",
            model="v1", dimensions=128, num_texts=len(texts),
        )

register_adapter("my_api", MyAdapter)
result = embed(["hello"], provider="my_api")

Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| OPENAI_API_KEY | OpenAI API key | -- |
| OPENROUTER_API_KEY | OpenRouter API key | -- |
| GEMINI_API_KEY | Google Gemini API key | -- |
| VOYAGE_API_KEY | Voyage AI API key | -- |
| OLLAMA_HOST | Ollama server URL | http://localhost:11434 |
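Resolution presumably follows the usual precedence of explicit argument, then environment variable, then built-in default; a sketch of that pattern (the helper name is illustrative, not part of the dd-embed API):

```python
import os

def resolve_ollama_host(explicit=None):
    """Pick the Ollama host: explicit argument wins, then OLLAMA_HOST, then default."""
    return explicit or os.environ.get("OLLAMA_HOST", "http://localhost:11434")

os.environ.pop("OLLAMA_HOST", None)
print(resolve_ollama_host())                      # http://localhost:11434
os.environ["OLLAMA_HOST"] = "http://gpu-box:11434"
print(resolve_ollama_host())                      # http://gpu-box:11434
print(resolve_ollama_host("http://other:11434"))  # http://other:11434
```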

License

MIT
