A simple toolkit for generating vector embeddings across multiple providers and models

EmbedKit

A unified interface for text and image embeddings, supporting multiple providers.

Installation

pip install embedkit

Quick Start

from embedkit import EmbedKit
from embedkit.classes import Model, CohereInputType, SnowflakeInputType

# Initialize a provider
kit = EmbedKit.cohere(
    model=Model.Cohere.EMBED_V4_0,
    api_key="your-api-key",
    text_input_type=CohereInputType.SEARCH_QUERY,
)

# Get text embeddings
result = kit.embed_text("Hello world")
print(result.objects[0].embedding.shape)  # shape of a 1D array

# Get image embeddings
result = kit.embed_image("path/to/image.png")
print(result.objects[0].embedding.shape)  # shape of a 1D array
print(result.objects[0].source_b64)  # Base64-encoded image
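
Once two embeddings are available, a common next step is to compare them. The following is a minimal sketch, not part of EmbedKit: cosine_similarity is a hypothetical helper that accepts any two 1D sequences of numbers, which should include the 1D arrays returned above. The vectors shown are stand-in values rather than real model output.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two 1D embeddings of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(float(x) * float(y) for x, y in zip(a, b))
    norm_a = math.sqrt(sum(float(x) ** 2 for x in a))
    norm_b = math.sqrt(sum(float(y) ** 2 for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Stand-in vectors; in practice these would come from
# result.objects[0].embedding of two separate embed calls.
query_vec = [1.0, 0.0, 1.0]
doc_vec = [1.0, 0.0, 0.0]
score = cosine_similarity(query_vec, doc_vec)
```

Values close to 1.0 indicate vectors pointing in nearly the same direction. Scores are only meaningful between embeddings produced by the same model.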

Supported Providers

Cohere

kit = EmbedKit.cohere(
    model=Model.Cohere.EMBED_V4_0,  # or EMBED_ENGLISH_V3_0, EMBED_MULTILINGUAL_V3_0, etc.
    api_key="your-api-key",
    text_input_type=CohereInputType.SEARCH_QUERY,  # or SEARCH_DOCUMENT
)
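
The two input types suggest a retrieval pattern: embed stored text with SEARCH_DOCUMENT and incoming queries with SEARCH_QUERY. The sketch below assumes that split and scores by plain dot product, which is a simplification (swap in cosine similarity if the vectors are not normalized). rank_documents and FakeKit are hypothetical names, not part of EmbedKit; FakeKit only mimics the embed_text call shape from the Quick Start so the example runs without an API key.

```python
import heapq
from types import SimpleNamespace
from typing import Any, Dict, List, Sequence, Tuple


def rank_documents(
    query_kit: Any,
    document_kit: Any,
    query: str,
    documents: Sequence[str],
    top_k: int = 3,
) -> List[Tuple[float, str]]:
    """Return the top_k (score, document) pairs for a query, best first."""
    query_vec = query_kit.embed_text(query).objects[0].embedding
    scored = []
    for doc in documents:
        doc_vec = document_kit.embed_text(doc).objects[0].embedding
        # Dot product as the relevance score (an assumption of this sketch).
        score = sum(float(q) * float(d) for q, d in zip(query_vec, doc_vec))
        scored.append((score, doc))
    return heapq.nlargest(top_k, scored, key=lambda pair: pair[0])


class FakeKit:
    """Offline stand-in exposing the same embed_text call shape."""

    def __init__(self, vectors: Dict[str, List[float]]):
        self._vectors = vectors

    def embed_text(self, text: str) -> SimpleNamespace:
        obj = SimpleNamespace(embedding=self._vectors[text])
        return SimpleNamespace(objects=[obj])


vectors = {
    "pets": [1.0, 0.0],
    "cats purr": [0.9, 0.1],
    "stock prices": [0.0, 1.0],
}
kit = FakeKit(vectors)
ranked = rank_documents(kit, kit, "pets", ["stock prices", "cats purr"], top_k=1)
```

With real providers, query_kit would be a kit created with SEARCH_QUERY and document_kit one created with SEARCH_DOCUMENT.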

Snowflake

kit = EmbedKit.snowflake(
    model=Model.Snowflake.ARCTIC_EMBED_L_V2_0,  # or ARCTIC_EMBED_M_V1_5
    text_input_type=SnowflakeInputType.QUERY,  # or DOCUMENT
)

ColPali

kit = EmbedKit.colpali(
    model=Model.ColPali.COLPALI_V1_3,  # or COLSMOL_256M, COLSMOL_500M
)
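
As the Response Format section notes, ColPali embeddings are not 1D, so a single cosine score does not apply directly. The sketch below assumes each embedding is a 2D collection of vectors (one per token or image patch) and applies the late-interaction "MaxSim" scoring commonly used with ColPali-style models. maxsim_score is a hypothetical helper, not an EmbedKit function, and the numbers are stand-in values.

```python
from typing import Sequence

Vector = Sequence[float]


def maxsim_score(
    query_vectors: Sequence[Vector], page_vectors: Sequence[Vector]
) -> float:
    """For each query vector, take its best dot product against any
    page vector, then sum those maxima."""
    total = 0.0
    for q in query_vectors:
        total += max(
            sum(float(a) * float(b) for a, b in zip(q, p)) for p in page_vectors
        )
    return total


# Stand-in multi-vector embeddings (assumed layout: num_vectors x dim).
query = [[1.0, 0.0], [0.0, 1.0]]
page = [[0.5, 0.5], [1.0, 0.0], [0.0, 2.0]]
score = maxsim_score(query, page)
```

A higher score means more query vectors found a close match somewhere on the page.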

Jina

kit = EmbedKit.jina(
    model=Model.Jina.CLIP_V2,
    api_key="your-api-key",
)

Response Format

from typing import List, Optional

import numpy as np

class EmbeddingObject:
    embedding: np.ndarray  # 1D array for every provider except ColPali
    source_b64: Optional[str]  # Base64-encoded source for images and PDFs

class EmbeddingResponse:
    model_name: str
    model_provider: str
    input_type: str
    objects: List[EmbeddingObject]

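To recover the original image or PDF bytes from source_b64, the string can be decoded with the standard library. This is a sketch under the assumption that the field holds plain standard Base64 with no "data:" URI prefix; decode_source is a hypothetical helper, not part of EmbedKit.

```python
import base64
import binascii
from typing import Optional


def decode_source(source_b64: Optional[str]) -> Optional[bytes]:
    """Decode a source_b64 value back to raw bytes, or None if absent."""
    if source_b64 is None:
        return None
    try:
        return base64.b64decode(source_b64, validate=True)
    except binascii.Error as exc:
        raise ValueError("source_b64 is not valid Base64") from exc


# Round-trip demo using the 8-byte PNG file signature as stand-in data.
png_signature = b"\x89PNG\r\n\x1a\n"
encoded = base64.b64encode(png_signature).decode("ascii")
decoded = decode_source(encoded)
```

The decoded bytes can then be written to disk or passed to an image library.
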
Development

Running Tests

# Run all tests
pytest

# Run tests for specific providers
pytest -m cohere    # Run only Cohere tests
pytest -m colpali   # Run only ColPali tests
pytest -m jina      # Run only Jina tests
pytest -m snowflake # Run only Snowflake tests

# Additional options
pytest -v           # Verbose output
pytest -s           # Show print statements
pytest -x           # Stop on first failure
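
The -m flag selects tests by pytest marker. For context, the sketch below shows one way such markers can be registered in a conftest.py so that pytest does not warn about unknown marks. It is an illustration only: the project may register its markers differently (for example in its pytest configuration file), and PROVIDER_MARKERS is a hypothetical name.

```python
# conftest.py (hypothetical; the project may register its markers elsewhere)
PROVIDER_MARKERS = ("cohere", "colpali", "jina", "snowflake")


def pytest_configure(config):
    """Register one marker per provider so `pytest -m <name>` runs cleanly."""
    for name in PROVIDER_MARKERS:
        config.addinivalue_line(
            "markers", f"{name}: tests for the {name} provider"
        )
```

A test decorated with @pytest.mark.cohere would then be picked up by pytest -m cohere.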

Requirements

  • Python 3.10+

License

MIT

GitHub

https://github.com/databyjp/embedkit
