A simple toolkit for generating vector embeddings across multiple providers and models

EmbedKit

A unified interface for text, image, and PDF embeddings across multiple providers (currently ColPali and Cohere).

Installation

pip install embedkit

Usage

Text Embeddings

from embedkit import EmbedKit
from embedkit.classes import Model, CohereInputType

# Initialize with ColPali
kit = EmbedKit.colpali(
    model=Model.ColPali.COLPALI_V1_3,  # or COLSMOL_256M, COLSMOL_500M
    text_batch_size=16,  # Optional: process text in batches of 16
    image_batch_size=8,  # Optional: process images in batches of 8
)

# Get embeddings
result = kit.embed_text("Hello world")
print(result.model_provider)
print(result.input_type)
print(result.objects[0].embedding.shape)  # Returns 2D array for ColPali
print(result.objects[0].source_b64)

# Initialize with Cohere
kit = EmbedKit.cohere(
    model=Model.Cohere.EMBED_V4_0,
    api_key="your-api-key",
    text_input_type=CohereInputType.SEARCH_QUERY,  # or SEARCH_DOCUMENT
    text_batch_size=64,  # Optional: process text in batches of 64
    image_batch_size=8,  # Optional: process images in batches of 8
)

# Get embeddings
result = kit.embed_text("Hello world")
print(result.model_provider)
print(result.input_type)
print(result.objects[0].embedding.shape)  # Returns 1D array for Cohere
print(result.objects[0].source_b64)
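
The text_batch_size and image_batch_size options above indicate that inputs are processed in fixed-size chunks. As an illustration only, and not EmbedKit's actual implementation, a minimal sketch of that chunking could look like the following. The helper name iter_batches is hypothetical and is not part of the embedkit package.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def iter_batches(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive slices of `items`, each at most `batch_size` long.

    Hypothetical helper for illustration; not part of embedkit.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# 40 inputs with a batch size of 16 give two full batches and one partial batch.
texts = [f"document {i}" for i in range(40)]
batch_lengths = [len(batch) for batch in iter_batches(texts, 16)]
print(batch_lengths)  # [16, 16, 8]
```

A smaller batch size generally lowers peak memory per call at the cost of more calls, which is presumably why the text and image batch sizes are configured separately.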

Image Embeddings

from pathlib import Path

# Get embeddings for an image
# (`kit` is an EmbedKit instance created as shown under Text Embeddings)
image_path = Path("path/to/image.png")
result = kit.embed_image(image_path)

print(result.model_provider)
print(result.input_type)
print(result.objects[0].embedding.shape)  # 2D for ColPali, 1D for Cohere
print(result.objects[0].source_b64)  # Base64 encoded image

PDF Embeddings

from pathlib import Path

# Get embeddings for a PDF
# (`kit` is an EmbedKit instance created as shown under Text Embeddings)
pdf_path = Path("path/to/document.pdf")
result = kit.embed_pdf(pdf_path)

print(result.model_provider)
print(result.input_type)
print(result.objects[0].embedding.shape)  # 2D for ColPali, 1D for Cohere
print(result.objects[0].source_b64)  # Base64 encoded PDF page
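
The source_b64 field carries the Base64-encoded image or PDF page, so recovering the original bytes is a matter of decoding it. The sketch below assumes the field holds plain standard Base64 with no data-URI prefix, which the source does not state explicitly. The helper name decode_source is hypothetical, and the example string is a stand-in rather than real EmbedKit output.

```python
import base64
from typing import Optional


def decode_source(source_b64: Optional[str]) -> Optional[bytes]:
    """Return the raw bytes behind a Base64 string, or None if there is no source.

    Hypothetical helper; assumes standard Base64 without a data-URI prefix.
    """
    if source_b64 is None:
        return None
    return base64.b64decode(source_b64)


# Stand-in for result.objects[0].source_b64: the 8-byte PNG signature, encoded.
png_signature = b"\x89PNG\r\n\x1a\n"
example_b64 = base64.b64encode(png_signature).decode("ascii")

raw = decode_source(example_b64)
print(raw == png_signature)  # True
print(decode_source(None))   # None
```

Because source_b64 is declared Optional, the None check keeps the helper safe to call on any EmbeddingObject.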

Response Format

The embedding methods (embed_text, embed_image, and embed_pdf) each return an EmbeddingResponse object with the following structure:

from typing import List, Optional

import numpy as np

class EmbeddingObject:
    embedding: np.ndarray  # 1D array for Cohere, 2D array for ColPali
    source_b64: Optional[str]  # Base64 encoded source for images and PDFs

class EmbeddingResponse:
    model_name: str
    model_provider: str
    input_type: str
    objects: List[EmbeddingObject]
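
Because the embedding is a 1D array for Cohere and a 2D array for ColPali, downstream code that compares a query against a document has to handle both shapes. The sketch below is one possible approach and is not part of EmbedKit: it uses cosine similarity for single-vector embeddings and a late-interaction (MaxSim) score for multi-vector ones, the scoring style usually associated with ColPali-type models. All function names are hypothetical, and the arrays are toy values rather than real embeddings.

```python
import numpy as np


def cosine_similarity(query: np.ndarray, doc: np.ndarray) -> float:
    """Cosine similarity between two 1D vectors."""
    return float(np.dot(query, doc) / (np.linalg.norm(query) * np.linalg.norm(doc)))


def maxsim_score(query: np.ndarray, doc: np.ndarray) -> float:
    """Late-interaction score: for each query vector, take its best dot
    product against any document vector, then sum over query vectors."""
    similarities = query @ doc.T  # shape: (num_query_vectors, num_doc_vectors)
    return float(similarities.max(axis=1).sum())


def score(query: np.ndarray, doc: np.ndarray) -> float:
    """Dispatch on dimensionality: 1D uses cosine, 2D uses MaxSim."""
    if query.ndim == 1 and doc.ndim == 1:
        return cosine_similarity(query, doc)
    if query.ndim == 2 and doc.ndim == 2:
        return maxsim_score(query, doc)
    raise ValueError("query and doc must both be 1D or both be 2D")


# Toy vectors standing in for result.objects[0].embedding.
print(score(np.array([1.0, 0.0]), np.array([1.0, 0.0])))  # 1.0

query_2d = np.array([[1.0, 0.0], [0.0, 1.0]])
doc_2d = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]])
print(score(query_2d, doc_2d))  # 3.0
```

Scores from the two branches are on different scales, so they should only be compared among embeddings produced by the same provider and model.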

Supported Models

ColPali

  • Model.ColPali.COLPALI_V1_3
  • Model.ColPali.COLSMOL_256M
  • Model.ColPali.COLSMOL_500M

Cohere

  • Model.Cohere.EMBED_V4_0
  • Model.Cohere.EMBED_ENGLISH_V3_0
  • Model.Cohere.EMBED_ENGLISH_LIGHT_V3_0
  • Model.Cohere.EMBED_MULTILINGUAL_V3_0
  • Model.Cohere.EMBED_MULTILINGUAL_LIGHT_V3_0

Requirements

  • Python 3.10+

License

MIT
