A simple toolkit for generating vector embeddings across multiple providers and models

EmbedKit

A unified interface for text and image embeddings, supporting multiple providers.

Installation

pip install embedkit

Quick Start

from embedkit import EmbedKit
from embedkit.classes import Model, CohereInputType, SnowflakeInputType

# Initialize a provider
kit = EmbedKit.cohere(
    model=Model.Cohere.EMBED_V4_0,
    api_key="your-api-key",
    text_input_type=CohereInputType.SEARCH_QUERY,
)

# Get text embeddings
result = kit.embed_text("Hello world")
print(result.objects[0].embedding.shape)  # 1D array

# Get image embeddings
result = kit.embed_image("path/to/image.png")
print(result.objects[0].embedding.shape)  # 1D array
print(result.objects[0].source_b64)  # Base64 encoded image
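A common next step is comparing embeddings, for example ranking documents against a query by cosine similarity. The following is a minimal sketch that is not part of EmbedKit. `cosine_similarity` and `rank_documents` are hypothetical helpers, and the toy 3-dimensional vectors stand in for real 1D embeddings converted to lists with `.tolist()`. For retrieval you would typically embed the query with a query input type (e.g. `CohereInputType.SEARCH_QUERY`) and the documents with a document input type (e.g. `CohereInputType.SEARCH_DOCUMENT`).

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length 1D vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def rank_documents(query: list[float], docs: list[list[float]]) -> list[tuple[int, float]]:
    """Return (document index, score) pairs, best match first."""
    scores = [(i, cosine_similarity(query, d)) for i, d in enumerate(docs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real embeddings (e.g. obj.embedding.tolist())
query_vec = [1.0, 0.0, 1.0]
doc_vecs = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.9], [0.5, 0.5, 0.5]]
print(rank_documents(query_vec, doc_vecs))
```

With these toy vectors the document at index 1 ranks first, since it points in nearly the same direction as the query.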

Supported Providers

Cohere

kit = EmbedKit.cohere(
    model=Model.Cohere.EMBED_V4_0,  # or EMBED_ENGLISH_V3_0, EMBED_MULTILINGUAL_V3_0, etc.
    api_key="your-api-key",
    text_input_type=CohereInputType.SEARCH_QUERY,  # or SEARCH_DOCUMENT
)

Snowflake

kit = EmbedKit.snowflake(
    model=Model.Snowflake.ARCTIC_EMBED_L_V2_0,  # or ARCTIC_EMBED_M_V1_5
    text_input_type=SnowflakeInputType.QUERY,  # or DOCUMENT
)

ColPali

kit = EmbedKit.colpali(
    model=Model.ColPali.COLPALI_V1_3,  # or COLSMOL_256M, COLSMOL_500M
)
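Unlike the other providers, ColPali-style models produce a multi-vector embedding (one vector per image patch or query token) rather than a single 1D vector, and such embeddings are usually compared with late-interaction "MaxSim" scoring. A minimal stdlib sketch, assuming each ColPali embedding is a 2D array of shape (num_vectors, dim) converted to nested lists with `.tolist()`; `maxsim_score` is a hypothetical helper, not an EmbedKit function.

```python
def maxsim_score(query_vecs: list[list[float]], doc_vecs: list[list[float]]) -> float:
    """Late-interaction (MaxSim) score between two multi-vector embeddings.

    For each query vector, take its best dot product against any document
    vector, then sum those maxima over all query vectors.
    Assumes every vector has the same dimension.
    """
    if not query_vecs or not doc_vecs:
        raise ValueError("both embeddings need at least one vector")
    total = 0.0
    for q in query_vecs:
        total += max(sum(a * b for a, b in zip(q, d)) for d in doc_vecs)
    return total


# Toy multi-vector embeddings: 2 query vectors, 3 page vectors, dim 2
query = [[1.0, 0.0], [0.0, 1.0]]
page = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
print(maxsim_score(query, page))
```

Summing per-query-vector maxima lets each query token match whichever page region fits it best, which is why the embedding is kept as many vectors instead of being pooled into one.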

Jina

kit = EmbedKit.jina(
    model=Model.Jina.CLIP_V2,
    api_key="your-api-key",
)

Response Format

from typing import List, Optional

import numpy as np


class EmbeddingResponse:
    model_name: str
    model_provider: str
    input_type: str
    objects: List["EmbeddingObject"]


class EmbeddingObject:
    embedding: np.ndarray  # 1D array for every provider except ColPali, which returns a multi-vector (2D) array
    source_b64: Optional[str]  # Base64-encoded source for images and PDFs

Development

Running Tests

# Run all tests
pytest

# Run tests for specific providers
pytest -m cohere    # Run only Cohere tests
pytest -m colpali   # Run only ColPali tests
pytest -m jina      # Run only Jina tests
pytest -m snowflake # Run only Snowflake tests

# Additional options
pytest -v           # Verbose output
pytest -s           # Show print statements
pytest -x           # Stop on first failure
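The `-m` flags above select tests by pytest marker. One way to declare those markers, sketched here as an assumption rather than the repository's actual setup, is a `pytest_configure` hook in a `conftest.py`; individual tests would then be tagged with decorators such as `@pytest.mark.cohere`. Registering markers avoids `PytestUnknownMarkWarning` and keeps the suite usable under `--strict-markers`. The marker descriptions are illustrative.

```python
# conftest.py (hypothetical): register the provider markers used with `pytest -m`
PROVIDER_MARKERS = {
    "cohere": "Cohere provider tests",
    "colpali": "ColPali provider tests",
    "jina": "Jina provider tests",
    "snowflake": "Snowflake provider tests",
}


def pytest_configure(config):
    """pytest hook: declare each custom marker as a `markers` ini line."""
    for name, description in PROVIDER_MARKERS.items():
        config.addinivalue_line("markers", f"{name}: {description}")
```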

Requirements

  • Python 3.10+

License

MIT

GitHub

https://github.com/databyjp/embedkit

Download files

Download the file for your platform.

Source Distribution

embedkit-0.1.9.tar.gz (1.6 MB)

Uploaded Source

Built Distribution

embedkit-0.1.9-py3-none-any.whl (15.4 kB)

Uploaded Python 3

File details

Details for the file embedkit-0.1.9.tar.gz.

File metadata

  • Download URL: embedkit-0.1.9.tar.gz
  • Upload date:
  • Size: 1.6 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.6.4

File hashes

Hashes for embedkit-0.1.9.tar.gz
Algorithm Hash digest
SHA256 c191363817dce9e92c163f6d3092d6478719821c46c6e69a72611295703dc161
MD5 521e5e83d1c2b044efb6fd1183d8b155
BLAKE2b-256 de8e532a166be15e21badec4ceeb5d8b23c68b1d99f788e3ca307bd17d2cb55b

File details

Details for the file embedkit-0.1.9-py3-none-any.whl.

File metadata

  • Download URL: embedkit-0.1.9-py3-none-any.whl
  • Upload date:
  • Size: 15.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.6.4

File hashes

Hashes for embedkit-0.1.9-py3-none-any.whl
Algorithm Hash digest
SHA256 463785799dc2f7a949f518b9b86a1fd5183c064f0497f3b3d562c6c5784ca56a
MD5 d43995dbdc518a8a84ef51e7b998eb4f
BLAKE2b-256 2a96bf5e796cd81816f8b644d03efad433b7988cbf75ae01aa965e2a90005ee2
