A simple toolkit for generating vector embeddings across multiple providers and models

EmbedKit

A unified interface for text and image embeddings, supporting multiple providers.

Installation

pip install embedkit

Quick Start

from embedkit import EmbedKit
from embedkit.classes import Model, CohereInputType, SnowflakeInputType

# Initialize a provider
kit = EmbedKit.cohere(
    model=Model.Cohere.EMBED_V4_0,
    api_key="your-api-key",
)

# Get document embeddings
result = kit.embed_document("Hello world")
print(result.objects[0].embedding.shape)  # 1D array

# Get query embeddings (for providers that support it)
result = kit.embed_query("Hello world")
print(result.objects[0].embedding.shape)  # 1D array

# Get image embeddings
result = kit.embed_image("path/to/image.png")
print(result.objects[0].embedding.shape)  # 1D array
print(result.objects[0].source_b64)  # Base64 encoded image

Supported Providers

Cohere

kit = EmbedKit.cohere(
    model=Model.Cohere.EMBED_V4_0,  # or EMBED_ENGLISH_V3_0, EMBED_MULTILINGUAL_V3_0, etc.
    api_key="your-api-key",
)

# Different embeddings for queries vs documents
query_result = kit.embed_query("What is the capital of France?")
doc_result = kit.embed_document("Paris is the capital of France.")
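
Because queries and documents get different embeddings, a typical next step is to score documents against a query. The sketch below is not part of EmbedKit; it is a minimal, standard-library illustration of cosine-similarity ranking. The helper names `cosine_similarity` and `rank_documents` are hypothetical, and the toy vectors stand in for values such as `query_result.objects[0].embedding`.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two 1D vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_a * norm_b)


def rank_documents(
    query_vec: Sequence[float], doc_vecs: Sequence[Sequence[float]]
) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs, best match first."""
    scores = [(i, cosine_similarity(query_vec, d)) for i, d in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real embeddings (assumption: 1D, same length).
query_vec = [1.0, 0.0, 1.0]
doc_vecs = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]]
ranking = rank_documents(query_vec, doc_vecs)
print([index for index, _ in ranking])
```

The functions only iterate over their inputs, so 1D NumPy arrays should work as well as plain lists.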

Snowflake

kit = EmbedKit.snowflake(
    model=Model.Snowflake.ARCTIC_EMBED_L_V2_0,  # or ARCTIC_EMBED_M_V1_5
)

# Different embeddings for queries vs documents
query_result = kit.embed_query("What is the capital of France?")
doc_result = kit.embed_document("Paris is the capital of France.")

Qwen

# Lightweight model (0.6B parameters)
kit = EmbedKit.qwen(
    model=Model.Qwen.QWEN3_EMBEDDING_0_6B,
)

# Larger models (require more memory)
# kit = EmbedKit.qwen(
#     model=Model.Qwen.QWEN3_EMBEDDING_4B,
# )
# kit = EmbedKit.qwen(
#     model=Model.Qwen.QWEN3_EMBEDDING_8B,
# )

# Different embeddings for queries vs documents
query_result = kit.embed_query("What is the capital of France?")
doc_result = kit.embed_document("Paris is the capital of France.")

ColPali

import numpy as np

kit = EmbedKit.colpali(
    model=Model.ColPali.COLPALI_V1_3,  # or COLSMOL_256M, COLSMOL_500M
)

# Same embeddings for queries and documents
query_result = kit.embed_query("What is the capital of France?")
doc_result = kit.embed_document("Paris is the capital of France.")
assert np.array_equal(query_result.objects[0].embedding, doc_result.objects[0].embedding)
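
ColPali-style models produce several vectors per input rather than a single 1D vector, and they are usually compared with late-interaction (MaxSim) scoring. The sketch below is an illustration only, not EmbedKit's API: it assumes each embedding is a sequence of equal-length vectors, and the helper names `dot` and `maxsim_score` are hypothetical.

```python
from typing import Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: Sequence[Vector], doc_vecs: Sequence[Vector]) -> float:
    """Late-interaction score: for each query vector, take its best
    dot product against any document vector, then sum those maxima."""
    if len(query_vecs) == 0 or len(doc_vecs) == 0:
        raise ValueError("both inputs need at least one vector")
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Toy multi-vector embeddings (assumption: rows are per-token or per-patch vectors).
query_vecs = [[1.0, 0.0], [0.0, 1.0]]
doc_vecs = [[1.0, 0.0], [0.5, 0.5]]
score = maxsim_score(query_vecs, doc_vecs)
print(score)
```

A higher score means more query vectors found a close match somewhere in the document.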

Jina

import numpy as np

kit = EmbedKit.jina(
    model=Model.Jina.CLIP_V2,
    api_key="your-api-key",
)

# Same embeddings for queries and documents
query_result = kit.embed_query("What is the capital of France?")
doc_result = kit.embed_document("Paris is the capital of France.")
assert np.array_equal(query_result.objects[0].embedding, doc_result.objects[0].embedding)

Response Format

class EmbeddingResponse:
    model_name: str
    model_provider: str
    input_type: str  # "text", "search_query", "search_document", "query", "image"
    objects: List[EmbeddingObject]

class EmbeddingObject:
    embedding: np.ndarray  # 1D array for everything except ColPali
    source_b64: Optional[str]  # Base64 encoded source for images and PDFs
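
To show how a response of this shape might be consumed, here is a minimal sketch that decodes `source_b64` back into raw bytes. The `FakeEmbeddingObject` and `FakeEmbeddingResponse` classes are hypothetical stand-ins that mirror the fields listed above so the example runs without the library; `decode_sources` is likewise an illustrative helper, not part of EmbedKit.

```python
import base64
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class FakeEmbeddingObject:
    """Stand-in mirroring the documented EmbeddingObject fields."""
    embedding: Sequence[float]
    source_b64: Optional[str] = None


@dataclass
class FakeEmbeddingResponse:
    """Stand-in mirroring the documented EmbeddingResponse fields."""
    model_name: str
    model_provider: str
    input_type: str
    objects: List[FakeEmbeddingObject]


def decode_sources(response: FakeEmbeddingResponse) -> List[Optional[bytes]]:
    """Return the decoded source bytes per object, or None where absent."""
    return [
        base64.b64decode(obj.source_b64) if obj.source_b64 is not None else None
        for obj in response.objects
    ]


# Placeholder values only; real responses come from the embed_* calls.
raw = b"fake image bytes"
response = FakeEmbeddingResponse(
    model_name="example-model",
    model_provider="example-provider",
    input_type="image",
    objects=[
        FakeEmbeddingObject([0.1, 0.2], base64.b64encode(raw).decode("ascii")),
        FakeEmbeddingObject([0.3, 0.4]),  # e.g. a text input with no source
    ],
)
decoded = decode_sources(response)
print(decoded[0] == raw, decoded[1] is None)
```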

Development

Running Tests

# Run all tests
pytest

# Run tests for specific providers
pytest -m cohere    # Run only Cohere tests
pytest -m colpali   # Run only ColPali tests
pytest -m jina      # Run only Jina tests
pytest -m snowflake # Run only Snowflake tests
pytest -m qwen      # Run only Qwen tests

# Additional options
pytest -v           # Verbose output
pytest -s           # Show print statements
pytest -x           # Stop on first failure
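
The `-m` selections above rely on pytest markers. The project's actual test configuration is not shown here; as an assumption, one common way to register such markers is a `pytest_configure` hook in `conftest.py`, sketched below. The `PROVIDER_MARKERS` name and the marker descriptions are hypothetical.

```python
# conftest.py (hypothetical sketch; the project's real configuration may differ)

# Marker names taken from the `pytest -m ...` commands above.
PROVIDER_MARKERS = ("cohere", "colpali", "jina", "snowflake", "qwen")


def pytest_configure(config):
    """Register one marker per provider so `pytest -m <name>` runs without
    unknown-marker warnings."""
    for name in PROVIDER_MARKERS:
        config.addinivalue_line(
            "markers", f"{name}: tests that exercise the {name} provider"
        )
```

A test would then opt in with a decorator such as `@pytest.mark.cohere`.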

Requirements

  • Python 3.10+

License

MIT

GitHub

https://github.com/databyjp/embedkit
