A simple toolkit for generating vector embeddings across multiple providers and models

EmbedKit

A unified interface for text, image, and PDF embeddings across multiple providers (currently ColPali and Cohere).

Installation

pip install embedkit

Usage

Text Embeddings

from embedkit import EmbedKit
from embedkit.classes import Model, CohereInputType

# Initialize with ColPali
kit = EmbedKit.colpali(
    model=Model.ColPali.COLPALI_V1_3,  # or COLSMOL_256M, COLSMOL_500M
    text_batch_size=16,  # Optional: process text in batches of 16
    image_batch_size=8,  # Optional: process images in batches of 8
)

# Get embeddings
result = kit.embed_text("Hello world")
print(result.model_provider)
print(result.input_type)
print(result.objects[0].embedding.shape)  # Returns 2D array for ColPali
print(result.objects[0].source_b64)

# Initialize with Cohere
kit = EmbedKit.cohere(
    model=Model.Cohere.EMBED_V4_0,
    api_key="your-api-key",
    text_input_type=CohereInputType.SEARCH_QUERY,  # or SEARCH_DOCUMENT
    text_batch_size=64,  # Optional: process text in batches of 64
    image_batch_size=8,  # Optional: process images in batches of 8
)

# Get embeddings
result = kit.embed_text("Hello world")
print(result.model_provider)
print(result.input_type)
print(result.objects[0].embedding.shape)  # Returns 1D array for Cohere
print(result.objects[0].source_b64)
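
The `text_batch_size` and `image_batch_size` options above cap how many inputs are processed at a time. How EmbedKit splits its inputs internally is not documented here, so the sketch below only illustrates the general idea of fixed-size batching. The helper name `iter_batches` is hypothetical and is not part of EmbedKit.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# 40 made-up inputs split with a batch size of 16.
texts = [f"document {i}" for i in range(40)]
batch_lengths = [len(batch) for batch in iter_batches(texts, batch_size=16)]
print(batch_lengths)  # [16, 16, 8]
```

The last batch is simply shorter when the input count is not a multiple of the batch size. A smaller batch size generally lowers peak memory use at the cost of more calls.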

Image Embeddings

from pathlib import Path

# Get embeddings for an image
image_path = Path("path/to/image.png")
result = kit.embed_image(image_path)

print(result.model_provider)
print(result.input_type)
print(result.objects[0].embedding.shape)  # 2D for ColPali, 1D for Cohere
print(result.objects[0].source_b64)  # Base64 encoded image

PDF Embeddings

from pathlib import Path

# Get embeddings for a PDF
pdf_path = Path("path/to/document.pdf")
result = kit.embed_pdf(pdf_path)

print(result.model_provider)
print(result.input_type)
print(result.objects[0].embedding.shape)  # 2D for ColPali, 1D for Cohere
print(result.objects[0].source_b64)  # Base64 encoded PDF page
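
For images and PDF pages, `source_b64` carries the source as a Base64 string. A minimal sketch of turning that string back into bytes with the standard library follows. It assumes the value is standard Base64 of the encoded file bytes; the image format is not stated here, so the helper (a hypothetical name, not part of EmbedKit) writes to whatever path the caller provides. The stand-in string is made up for the demonstration.

```python
import base64
import tempfile
from pathlib import Path
from typing import Optional


def save_source(source_b64: Optional[str], out_path: Path) -> bool:
    """Decode a Base64 string and write the bytes to `out_path`.

    Returns False without writing anything when no source is attached.
    """
    if source_b64 is None:
        return False
    out_path.write_bytes(base64.b64decode(source_b64))
    return True


# Stand-in for result.objects[0].source_b64; real values come from EmbedKit.
fake_source_b64 = base64.b64encode(b"not-a-real-image").decode("ascii")

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "page.bin"
    written = save_source(fake_source_b64, target)
    restored = target.read_bytes()

print(written, restored)
```

Handling the `None` case explicitly matters for text inputs, where there may be no encoded source to recover.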

Response Format

The embedding methods return an EmbeddingResponse object with the following structure:

from typing import List, Optional

import numpy as np

class EmbeddingObject:
    embedding: np.ndarray  # 1D array for Cohere, 2D array for ColPali
    source_b64: Optional[str]  # Base64 encoded source for images and PDFs

class EmbeddingResponse:
    model_name: str
    model_provider: str
    input_type: str
    objects: List[EmbeddingObject]
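
Because `embedding` is 1D for Cohere and 2D for ColPali, downstream scoring code has to handle both shapes. The sketch below is one way to do that and is not something EmbedKit provides: cosine similarity for single-vector embeddings, and the late-interaction (MaxSim) score commonly used with ColPali-style multi-vector embeddings. The function names and the random stand-in arrays are illustrative assumptions, and the dimensions are arbitrary.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def maxsim_score(query: np.ndarray, doc: np.ndarray) -> float:
    """Late-interaction score between two 2D arrays of shape (n_vectors, dim).

    For each query vector, take its best dot product over all document
    vectors, then sum those maxima.
    """
    sims = query @ doc.T  # shape: (n_query_vectors, n_doc_vectors)
    return float(sims.max(axis=1).sum())


def score(query: np.ndarray, doc: np.ndarray) -> float:
    """Dispatch on dimensionality: 1D uses cosine, 2D uses MaxSim."""
    if query.ndim == 1 and doc.ndim == 1:
        return cosine_similarity(query, doc)
    if query.ndim == 2 and doc.ndim == 2:
        return maxsim_score(query, doc)
    raise ValueError("query and doc must both be 1D or both be 2D")


rng = np.random.default_rng(0)

# Stand-ins for result.objects[0].embedding from each provider.
single_vec = rng.normal(size=8)      # 1D, like a Cohere embedding
multi_vec = rng.normal(size=(5, 8))  # 2D, like a ColPali embedding

print(score(single_vec, single_vec))
print(score(multi_vec, multi_vec))
```

The two scores are not on the same scale: cosine similarity is bounded by 1, while MaxSim grows with the number of query vectors, so results from different providers should not be compared directly.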

Supported Models

ColPali

  • Model.ColPali.COLPALI_V1_3
  • Model.ColPali.COLSMOL_256M
  • Model.ColPali.COLSMOL_500M

Cohere

  • Model.Cohere.EMBED_V4_0
  • Model.Cohere.EMBED_ENGLISH_V3_0
  • Model.Cohere.EMBED_ENGLISH_LIGHT_V3_0
  • Model.Cohere.EMBED_MULTILINGUAL_V3_0
  • Model.Cohere.EMBED_MULTILINGUAL_LIGHT_V3_0

Requirements

  • Python 3.10+

License

MIT
