A simple toolkit for generating vector embeddings across multiple providers and models
EmbedKit
A unified interface for text and image embeddings, supporting multiple providers.
Installation
pip install embedkit
Usage
Text Embeddings
from embedkit import EmbedKit
from embedkit.classes import Model, CohereInputType, SnowflakeInputType
# Initialize with ColPali
kit = EmbedKit.colpali(
    model=Model.ColPali.COLPALI_V1_3,  # or COLSMOL_256M, COLSMOL_500M
    text_batch_size=16,  # Optional: process text in batches of 16
    image_batch_size=8,  # Optional: process images in batches of 8
)
# Get embeddings
result = kit.embed_text("Hello world")
print(result.model_provider)
print(result.input_type)
print(result.objects[0].embedding.shape) # Returns 2D array for ColPali
print(result.objects[0].source_b64)
# Initialize with Cohere
kit = EmbedKit.cohere(
    model=Model.Cohere.EMBED_V4_0,
    api_key="your-api-key",
    text_input_type=CohereInputType.SEARCH_QUERY,  # or SEARCH_DOCUMENT
    text_batch_size=64,  # Optional: process text in batches of 64
    image_batch_size=8,  # Optional: process images in batches of 8
)
# Get embeddings
result = kit.embed_text("Hello world")
print(result.model_provider)
print(result.input_type)
print(result.objects[0].embedding.shape) # Returns 1D array for Cohere
print(result.objects[0].source_b64)
# Initialize with Jina
kit = EmbedKit.jina(
    model=Model.Jina.CLIP_V2,
    api_key="your-api-key",
    text_batch_size=32,  # Optional: process text in batches of 32
    image_batch_size=8,  # Optional: process images in batches of 8
)
# Get embeddings
result = kit.embed_text("Hello world")
print(result.model_provider)
print(result.input_type)
print(result.objects[0].embedding.shape) # Returns 1D array for Jina
print(result.objects[0].source_b64)
# Initialize with Snowflake
kit = EmbedKit.snowflake(
    model=Model.Snowflake.ARCTIC_EMBED_L_V2_0,  # or ARCTIC_EMBED_M_V1_5
    text_input_type=SnowflakeInputType.QUERY,  # or DOCUMENT
    text_batch_size=32,  # Optional: process text in batches of 32
)
# Get embeddings
result = kit.embed_text("Hello world")
print(result.model_provider)
print(result.input_type)
print(result.objects[0].embedding.shape) # Returns 1D array for Snowflake
print(result.objects[0].source_b64)
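The `text_batch_size` and `image_batch_size` options above limit how many inputs are processed at once. How EmbedKit batches internally is not shown here; the following is only a minimal sketch of the general idea, using a hypothetical `batched` helper that is not part of EmbedKit.

```python
from typing import Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each at most `batch_size` long.

    Hypothetical helper for illustration; not an EmbedKit function.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# 40 inputs with a batch size of 16 produce two full batches and one partial batch.
texts = [f"document {i}" for i in range(40)]
batch_sizes = [len(batch) for batch in batched(texts, 16)]
print(batch_sizes)  # [16, 16, 8]
```

A smaller batch size generally lowers peak memory use at the cost of more model or API calls, which is presumably why the option is exposed per input type.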
Image Embeddings
from pathlib import Path
# Get embeddings for an image
image_path = Path("path/to/image.png")
result = kit.embed_image(image_path)
print(result.model_provider)
print(result.input_type)
print(result.objects[0].embedding.shape) # 2D for ColPali, 1D for Cohere/Jina
print(result.objects[0].source_b64) # Base64 encoded image
PDF Embeddings
from pathlib import Path
# Get embeddings for a PDF
pdf_path = Path("path/to/document.pdf")
result = kit.embed_pdf(pdf_path)
print(result.model_provider)
print(result.input_type)
print(result.objects[0].embedding.shape) # 2D for ColPali, 1D for Cohere/Jina
print(result.objects[0].source_b64) # Base64 encoded PDF page
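Since `source_b64` holds the Base64-encoded source image or PDF page, it can be turned back into raw bytes with the standard library. The sketch below assumes the field contains plain standard Base64 with no data-URI prefix, which is not confirmed here; `decode_source` is a hypothetical helper, and a stand-in string replaces a real `result.objects[0].source_b64` value.

```python
import base64
from typing import Optional


def decode_source(source_b64: Optional[str]) -> Optional[bytes]:
    """Decode a Base64 string to bytes; pass None through unchanged.

    Hypothetical helper. Assumes plain standard Base64 (no "data:" prefix).
    """
    if source_b64 is None:
        return None
    return base64.b64decode(source_b64)


# Stand-in for result.objects[0].source_b64: the 8-byte PNG signature plus filler.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
example_b64 = base64.b64encode(PNG_SIGNATURE + b"filler").decode("ascii")

raw = decode_source(example_b64)
print(raw is not None and raw.startswith(PNG_SIGNATURE))  # True
```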
Response Format
The embedding methods return an EmbeddingResponse object with the following structure:
class EmbeddingResponse:
    model_name: str
    model_provider: str
    input_type: str
    objects: List[EmbeddingObject]

class EmbeddingObject:
    embedding: np.ndarray  # 1D array for everything except ColPali
    source_b64: Optional[str]  # Base64 encoded source for images and PDFs
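Because the `embedding` array is 1D for most providers and 2D for ColPali, the two shapes are compared differently. The sketch below is not part of EmbedKit: it shows cosine similarity for 1D vectors and a late-interaction (MaxSim) score for 2D arrays, the scoring style commonly used with ColPali-type models. It assumes a 2D embedding is shaped `(num_vectors, dim)`; the function names and toy arrays are illustrative only.

```python
import numpy as np


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1D embeddings."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def maxsim_score(query: np.ndarray, doc: np.ndarray) -> float:
    """Late-interaction score between two 2D embeddings.

    Assumes both arrays are shaped (num_vectors, dim). For each query vector,
    take its best dot product against any document vector, then sum.
    """
    similarities = query @ doc.T  # shape: (num_query_vectors, num_doc_vectors)
    return float(similarities.max(axis=1).sum())


# 1D case (e.g. Cohere, Jina, Snowflake): toy vectors stand in for real embeddings.
a = np.array([1.0, 0.0])
b = np.array([1.0, 1.0])
print(round(cosine_similarity(a, b), 4))  # 0.7071

# 2D case (e.g. ColPali): toy multi-vector embeddings.
query = np.array([[1.0, 0.0], [0.0, 1.0]])
doc = np.array([[0.5, 0.5], [1.0, 0.0], [0.0, 2.0]])
print(maxsim_score(query, doc))  # 3.0
```

Checking `embedding.ndim` before scoring is one simple way to choose between the two functions when a pipeline mixes providers.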
Supported Models
ColPali
- Model.ColPali.COLPALI_V1_3
- Model.ColPali.COLSMOL_256M
- Model.ColPali.COLSMOL_500M
Cohere
- Model.Cohere.EMBED_V4_0
- Model.Cohere.EMBED_ENGLISH_V3_0
- Model.Cohere.EMBED_ENGLISH_LIGHT_V3_0
- Model.Cohere.EMBED_MULTILINGUAL_V3_0
- Model.Cohere.EMBED_MULTILINGUAL_LIGHT_V3_0
Jina
- Model.Jina.CLIP_V2
Snowflake
- Model.Snowflake.ARCTIC_EMBED_L_V2_0: Large model optimized for high accuracy
- Model.Snowflake.ARCTIC_EMBED_M_V1_5: Medium model balanced for speed and accuracy
Development
Running Tests
Tests are organized by provider and can be run selectively using pytest markers:
# Run all tests
pytest
# Run tests for specific providers
pytest -m cohere # Run only Cohere tests
pytest -m colpali # Run only ColPali tests
pytest -m jina # Run only Jina tests
pytest -m snowflake # Run only Snowflake tests
# Run tests for multiple providers
pytest -m "cohere or jina"
# Run all tests except a specific provider
pytest -m "not cohere"
# Additional pytest options
pytest -v # Verbose output
pytest -s # Show print statements
pytest -x # Stop on first failure
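For `-m` selection like the commands above to work without unknown-marker warnings, each marker has to be registered with pytest. How this project registers its markers is not shown here; one common approach is a `pytest_configure` hook in `conftest.py`, sketched below. The marker descriptions and the `PROVIDER_MARKERS` name are assumptions made for illustration.

```python
# conftest.py (hypothetical sketch; the project's real configuration may differ)

# Marker names taken from the commands above; descriptions are illustrative.
PROVIDER_MARKERS = {
    "cohere": "tests that exercise the Cohere provider",
    "colpali": "tests that exercise the ColPali provider",
    "jina": "tests that exercise the Jina provider",
    "snowflake": "tests that exercise the Snowflake provider",
}


def pytest_configure(config):
    """Register one marker per provider so `pytest -m <name>` can select it."""
    for name, description in PROVIDER_MARKERS.items():
        config.addinivalue_line("markers", f"{name}: {description}")


# In a test module, a test would then be tagged like this:
#
#     import pytest
#
#     @pytest.mark.cohere
#     def test_cohere_text_embedding():
#         ...
```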
Requirements
- Python 3.10+
License
MIT
File details
Details for the file embedkit-0.1.7.tar.gz.
File metadata
- Download URL: embedkit-0.1.7.tar.gz
- Upload date:
- Size: 1.6 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.6.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 804e824bb6992a3557e308dd3ab1bccb84fe72bb58c9a3840644cf1d4bf5bfb2 |
| MD5 | ecb08692f60511273d4f16bf68618e3b |
| BLAKE2b-256 | 51093f543e82fc5c26fa24abfc02d50f20e617875135a47608f6a5f4bacff53a |
File details
Details for the file embedkit-0.1.7-py3-none-any.whl.
File metadata
- Download URL: embedkit-0.1.7-py3-none-any.whl
- Upload date:
- Size: 15.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.6.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1e0a8cbd237dbe59d33cd1b05870f512a5a2ee69a647f69be01d1f5a6967ccb1 |
| MD5 | ef91a29a78790f3c2a2d8085dc1fc41f |
| BLAKE2b-256 | d0c8659726d7cc07215cfafcea9627514afcde52cba127505f44da93e55621cb |