Fast and efficient text embeddings using MLX on Apple Silicon
llamamlx-embeddings
High-performance embeddings with Apple MLX
Version 0.2.0
Overview
llamamlx-embeddings is a Python library that provides high-performance text embeddings using Apple's MLX framework, optimized for Apple Silicon. It offers a unified interface for generating embeddings with various models, efficient batch processing, quantization options, seamless integration with vector databases, and easy deployment as a FastAPI service.
Project Structure
This package follows a standardized structure for ease of use and maintainability:
llamamlx-embeddings/
├── src/                        # Source code directory
│   └── llamamlx_embeddings/    # Main package
│       ├── api/                # API interfaces and handlers
│       ├── benchmarks/         # Benchmarking tools
│       ├── core/               # Core functionality
│       ├── conversion/         # Model conversion utilities
│       ├── integrations/       # Vector DB integrations
│       ├── processing/         # Text processing utilities
│       ├── quantization/       # Model quantization tools
│       ├── utils/              # Common utility functions
│       ├── visualization/      # Visualization utilities
│       ├── __init__.py         # Package initialization
│       ├── cli.py              # Command-line interface
│       ├── client.py           # API client
│       ├── logging.py          # Logging configuration
│       └── version.py          # Version information
├── tests/                      # Test directory
├── docs/                       # Documentation
├── examples/                   # Example scripts
├── benchmarks/                 # Benchmark results
├── setup.py                    # Package setup script
├── pyproject.toml              # Project configuration
├── MANIFEST.in                 # Package manifest
├── README.md                   # Project README
└── LICENSE                     # License information
What's New in v0.2.0
- Fixed import and dependency issues
- Improved package structure and organization
- Added support for the renamed Pinecone package
- Enhanced GitHub Actions workflows for testing and publishing
- Updated build system with modern Python packaging tools
- Added comprehensive test suite
- Improved documentation
Features
- MLX Optimizations: Leverages Apple Silicon's full potential
- Multiple Model Types: Dense, sparse, and late interaction models
- Cross-Platform: ONNX fallback for non-Apple hardware
- Vector DB Integration: Easy integration with Qdrant and Pinecone
- FastAPI Server: Ready-to-use REST API
- Batch Processing: Efficient handling of large datasets
- Quantization: Reduce memory footprint and improve speed
Benchmarks
On Apple M2 Pro, using batch size 32:
| Model | Texts/sec | Dim | Type |
|---|---|---|---|
| BAAI/bge-small-en-v1.5 | ~245 | 384 | Dense |
| sentence-transformers/all-MiniLM-L6-v2 | ~285 | 384 | Dense |
| intfloat/e5-small-v2 | ~230 | 384 | Dense |
| prithivida/Splade_PP_en_v1 | ~80 | var | Sparse |
With INT8 quantization, throughput improves by ~30% and model size shrinks by ~69%.
Installation
From PyPI
# Basic installation
pip install llamamlx-embeddings
# With vector database integrations
# Quote the extras so shells such as zsh do not treat the brackets as a glob
pip install "llamamlx-embeddings[qdrant,pinecone]"
# Full installation with all features
pip install "llamamlx-embeddings[all]"
From source
git clone https://github.com/yourusername/llamamlx-embeddings.git
cd llamamlx-embeddings
pip install -e .
Quickstart
Basic Usage
from llamamlx_embeddings import TextEmbedding
import numpy as np
# Create an embedding model (will download if needed)
model = TextEmbedding(model_name="BAAI/bge-small-en-v1.5")
# Generate embeddings
query = "How to make a delicious pizza?"
query_embedding = model.embed_query(query)
documents = [
    "Pizza is a dish of Italian origin consisting of a usually round, flat base of leavened wheat-based dough.",
    "To make pizza, you need flour, water, yeast, salt, olive oil, tomato sauce, and cheese.",
]
doc_embeddings = model.embed_documents(documents)
# Calculate cosine similarity between the query and each document
for i, doc_emb in enumerate(doc_embeddings):
    similarity = np.dot(query_embedding, doc_emb) / (
        np.linalg.norm(query_embedding) * np.linalg.norm(doc_emb)
    )
    print(f"Document {i+1} similarity: {similarity:.4f}")
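The similarity loop above divides by the product of the two vector norms, which fails if either embedding is all zeros. A minimal, dependency-free sketch of the same cosine computation with that guard, plus a ranking step, might look like the following. The names `cosine_similarity` and `rank_documents` are hypothetical helpers for illustration and are not part of llamamlx-embeddings.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Convention chosen for this sketch: a zero vector matches nothing.
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(
    query_vec: Sequence[float], doc_vecs: Sequence[Sequence[float]]
) -> List[Tuple[int, float]]:
    """Return (document index, similarity) pairs, best match first."""
    scores = [(i, cosine_similarity(query_vec, d)) for i, d in enumerate(doc_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy 2-dimensional vectors standing in for real embeddings.
    for index, score in rank_documents([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]):
        print(f"Document {index + 1} similarity: {score:.4f}")
```

With real embeddings, `query_vec` and `doc_vecs` would be the values returned by `embed_query` and `embed_documents`, converted to plain lists if needed.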
Mock Embeddings for Testing
from llamamlx_embeddings import MockEmbedding
# Create a mock embedding model
model = MockEmbedding(dimensions=384)
# Use it like a regular embedding model
query_embedding = model.embed_query("How to make pizza?")
document_embeddings = model.embed_documents(["Document 1", "Document 2"])
# Perfect for testing applications without downloading large models
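For tests that compare vectors across runs, it helps if a mock returns the same embedding for the same text every time. The README does not say how `MockEmbedding` generates its vectors, so the class below is only a sketch of one way such a test double could work: `HashMockEmbedding` is a hypothetical name, and seeding from a SHA-256 hash is an assumption made for illustration.

```python
import hashlib
import random
from typing import List, Sequence


class HashMockEmbedding:
    """Hypothetical deterministic test double; not the library's MockEmbedding."""

    def __init__(self, dimensions: int = 384) -> None:
        if dimensions <= 0:
            raise ValueError("dimensions must be positive")
        self.dimensions = dimensions

    def _embed(self, text: str) -> List[float]:
        # Seed a private RNG from a stable hash of the text so the same input
        # always yields the same vector, across runs and processes.
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        rng = random.Random(int.from_bytes(digest[:8], "big"))
        vec = [rng.uniform(-1.0, 1.0) for _ in range(self.dimensions)]
        # Scale to unit length, as many embedding models do.
        norm = sum(v * v for v in vec) ** 0.5
        return [v / norm for v in vec] if norm else vec

    def embed_query(self, text: str) -> List[float]:
        return self._embed(text)

    def embed_documents(self, texts: Sequence[str]) -> List[List[float]]:
        return [self._embed(t) for t in texts]
```

The vectors carry no semantic meaning; they are only useful for exercising code paths such as indexing, serialization, and similarity plumbing.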
API Server
Start the server:
llamamlx-embeddings serve --host 0.0.0.0 --port 8000
Use the client:
from llamamlx_embeddings import LlamamlxEmbeddingsClient
# Create a client
client = LlamamlxEmbeddingsClient(base_url="http://localhost:8000")
# Generate embeddings
query = "How to make a delicious pizza?"
query_embedding = client.get_embeddings(query, is_query=True)[0]
documents = [
    "Pizza is a dish of Italian origin consisting of a usually round, flat base of leavened wheat-based dough.",
    "To make pizza, you need flour, water, yeast, salt, olive oil, tomato sauce, and cheese.",
]
doc_embeddings = client.get_embeddings(documents)
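When embedding a large corpus through the server, sending every document in one request can produce very large payloads. The README does not state whether the client batches internally, so the helper below is a sketch under that uncertainty: `chunked` and `embed_in_batches` are hypothetical names, and `embed_fn` stands for any callable that maps a list of texts to a list of vectors, for example `client.get_embeddings`.

```python
from typing import Callable, Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items`, each holding at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start : start + size]


def embed_in_batches(
    embed_fn: Callable[[List[str]], List[List[float]]],
    texts: Sequence[str],
    batch_size: int = 32,
) -> List[List[float]]:
    """Call `embed_fn` once per batch and return all vectors in input order."""
    vectors: List[List[float]] = []
    for batch in chunked(texts, batch_size):
        vectors.extend(embed_fn(list(batch)))
    return vectors
```

A call might then read `doc_embeddings = embed_in_batches(client.get_embeddings, documents, batch_size=32)`, where the batch size of 32 simply mirrors the benchmark setting above.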
Documentation
For comprehensive documentation, visit our documentation site.
Supported Models
- Dense models:
  - BAAI/bge-small-en-v1.5 (default)
  - intfloat/e5-small-v2
  - sentence-transformers/all-MiniLM-L6-v2
  - and more...
- Sparse models:
  - prithivida/Splade_PP_en_v1
- Late interaction models:
  - colbert-ir/colbertv2.0
- Cross-encoder models:
  - Xenova/ms-marco-MiniLM-L-6-v2
Vector Database Integration
Qdrant
from llamamlx_embeddings import TextEmbedding, QdrantClient
# Create embedding model
model = TextEmbedding(model_name="BAAI/bge-small-en-v1.5")
# Initialize Qdrant client
vector_db = QdrantClient(
    url="https://your-qdrant-instance.com",
    collection_name="my_collection",
    embedding_model=model,
)
# Add documents
vector_db.add(
    documents=["Document 1 text", "Document 2 text"],
    metadata=[{"source": "file1.txt"}, {"source": "file2.txt"}],
)
# Search with query
results = vector_db.query("My search query", limit=5)
Advanced Usage
Quantization
from llamamlx_embeddings import TextEmbedding
# Load a quantized model
model = TextEmbedding(model_name="BAAI/bge-small-en-v1.5", quantize=True)
# Query and document embeddings work the same way
query_embedding = model.embed_query("How to make pizza?")
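To give a feel for what INT8 quantization does to model weights, here is a minimal sketch of symmetric per-tensor quantization in plain Python. This is a generic textbook scheme chosen for illustration; the README does not describe the scheme behind `quantize=True`, and `quantize_int8` and `dequantize_int8` are hypothetical names that are not part of the library.

```python
from typing import List, Tuple


def quantize_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    # The largest magnitude maps to 127; an all-zero tensor keeps scale 1.0.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize_int8(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate float weights from the integer codes."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights = [0.5, -1.0, 0.25, 0.0]
    codes, scale = quantize_int8(weights)
    restored = dequantize_int8(codes, scale)
    worst = max(abs(w - r) for w, r in zip(weights, restored))
    print(f"scale={scale:.6f}, worst-case error={worst:.6f}")
```

Each weight then occupies one byte instead of the four bytes of a float32, a 75% reduction for the quantized tensors themselves. That is in the same range as the ~69% size reduction quoted in the benchmarks; the gap plausibly comes from stored scale factors and tensors left unquantized, though the README does not say.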
Custom Models
from llamamlx_embeddings import add_custom_model, TextEmbedding
# Add a custom model
add_custom_model(
    model_name="my-custom-model",
    model_path="/path/to/model/files",
    model_type="dense",
    dimensions=768,
    description="My custom embedding model",
)
# Use the custom model
model = TextEmbedding(model_name="my-custom-model")
Contributing
Contributions are welcome! Please check out our contributing guide to get started.
License
This project is licensed under the MIT License; see the LICENSE file for details.
Acknowledgments
- MLX by Apple
- Transformers by Hugging Face
File details
Details for the file llamamlx_embeddings_llamasearch-0.2.0.tar.gz.
File metadata
- Download URL: llamamlx_embeddings_llamasearch-0.2.0.tar.gz
- Upload date:
- Size: 313.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f1793a608a5394de1714f34d4a8aab4c5c8fc748e36f1e090a490a55cf65ad7d |
| MD5 | fef4baff41143645c1c224a41017e2b5 |
| BLAKE2b-256 | 542f65354a1fd9304c94a55764356cfa3e3969d1d5b33cba73325cef36417b8a |
File details
Details for the file llamamlx_embeddings_llamasearch-0.2.0-py3-none-any.whl.
File metadata
- Download URL: llamamlx_embeddings_llamasearch-0.2.0-py3-none-any.whl
- Upload date:
- Size: 83.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 17c26eb31bc41096c8ccfa04c248fac0cc7943608d1df75aef3c91d7d7298a8d |
| MD5 | b6ac8c6be9539a11b7db6c40bea05df7 |
| BLAKE2b-256 | 03cb219003599f1086ba9f4f9aa769b00c4ea60d769cfff617011cb134967599 |