Single Model Embedding & Reranker API with Apple Silicon acceleration

🔥 Single Model Embedding & Reranking API

Lightning-fast local embeddings & reranking for Apple Silicon (MLX-first, OpenAI & TEI compatible)


⚡ Why This Matters

Get 10x faster embeddings and reranking on Apple Silicon. Drop-in replacement for the OpenAI API and Hugging Face TEI: existing client code keeps working once it points at the local server.

🏆 Performance Comparison

Operation     | This API (MLX) | OpenAI API | Hugging Face TEI
Embeddings    | 0.78 ms        | 200 ms+    | 15 ms
Reranking     | 1.04 ms        | N/A        | 25 ms
Model Loading | 0.36 s         | N/A        | 3.2 s
Cost          | $0             | $0.02/1K   | $0

Tested on Apple M4 Max


🚀 Quick Start

Option 1: Install from PyPI (Recommended)

# Install the package
pip install embed-rerank

# Start the server
embed-rerank

Option 2: From Source

# 1. Clone and setup
git clone https://github.com/joonsoo-me/embed-rerank.git
cd embed-rerank
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

# 2. Start server (macOS/Linux)
./tools/server-run.sh

# 3. Test it works
curl http://localhost:9000/health/

🎉 Done! Visit http://localhost:9000/docs for interactive API documentation.
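
The first run downloads the model, so the server may take a while before the health check succeeds. A minimal sketch of a readiness poll against the same `/health/` endpoint, using only the Python standard library. The helper names `health_url` and `wait_until_healthy` are hypothetical and not part of the package, and the sketch assumes only that the endpoint answers with HTTP 200 once the server is up.

```python
import time
import urllib.error
import urllib.request


def health_url(host: str = "localhost", port: int = 9000) -> str:
    """Build the health endpoint URL used by the curl check above."""
    return f"http://{host}:{port}/health/"


def wait_until_healthy(url: str, attempts: int = 10, delay: float = 1.0) -> bool:
    """Poll the URL until it answers with HTTP 200 or the attempts run out.

    Hypothetical helper: assumes a 200 response means the server is ready.
    """
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=2) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not reachable yet; retry after a short pause
        time.sleep(delay)
    return False
```

Calling `wait_until_healthy(health_url())` after starting the server returns `True` as soon as the endpoint responds, or `False` if it never does within the given attempts.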


🛠 Server Management (macOS/Linux)

# Start server (background)
./tools/server-run.sh

# Start server (foreground/development)
./tools/server-run-foreground.sh

# Stop server
./tools/server-stop.sh

Windows Support: Coming soon! Currently optimized for macOS/Linux.


⚙️ Configuration

Create a .env file (optional):

# Server
PORT=9000
HOST=0.0.0.0

# Backend
BACKEND=auto                                   # auto | mlx | torch
MODEL_NAME=mlx-community/Qwen3-Embedding-4B-4bit-DWQ

# Model Cache (first run downloads ~2.3GB model)
MODEL_PATH=                               # Custom model directory
TRANSFORMERS_CACHE=                           # HF cache override
# Default: ~/.cache/huggingface/hub/

# Performance
BATCH_SIZE=32
MAX_TEXTS_PER_REQUEST=100
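
To illustrate how a file like this maps to settings, here is a rough sketch of a `.env` reader. It is not the package's own loader: `parse_env`, `load_settings`, and the `DEFAULTS` table are hypothetical, and the default values are assumed to match the sample values shown above.

```python
# Assumed defaults, copied from the sample .env above (not confirmed by the package).
DEFAULTS = {
    "PORT": "9000",
    "HOST": "0.0.0.0",
    "BACKEND": "auto",
    "BATCH_SIZE": "32",
    "MAX_TEXTS_PER_REQUEST": "100",
}
VALID_BACKENDS = {"auto", "mlx", "torch"}


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, ignoring blank lines and # comments.

    Naive on purpose: anything after '#' is treated as a comment,
    so values containing '#' are not supported.
    """
    values = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def load_settings(text: str) -> dict[str, str]:
    """Overlay non-empty values from the file on top of the assumed defaults."""
    settings = dict(DEFAULTS)
    settings.update({k: v for k, v in parse_env(text).items() if v})
    if settings["BACKEND"] not in VALID_BACKENDS:
        raise ValueError(f"BACKEND must be one of {sorted(VALID_BACKENDS)}")
    return settings
```

Empty entries such as `MODEL_PATH=` are skipped, so a variable left blank behaves the same as one that is not set.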

📂 Model Cache Management

The service automatically manages model downloads and caching:

Environment Variable | Purpose                    | Default
MODEL_PATH           | Custom model directory     | (uses HF cache)
TRANSFORMERS_CACHE   | Override HF cache location | ~/.cache/huggingface/transformers
HF_HOME              | HF home directory          | ~/.cache/huggingface
(auto)               | Default HF cache           | ~/.cache/huggingface/hub/

Cache Location Check

# Find where your model is cached
python3 -c "
import os
print('MODEL_PATH:', os.getenv('MODEL_PATH', '<not set>'))
print('TRANSFORMERS_CACHE:', os.getenv('TRANSFORMERS_CACHE', '<not set>'))
print('HF_HOME:', os.getenv('HF_HOME', '<not set>'))
print('Default cache:', os.path.expanduser('~/.cache/huggingface/hub'))
"

# List cached Qwen3 models
ls ~/.cache/huggingface/hub | grep -i qwen3 || echo "No Qwen3 models found in cache"
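
The `ls | grep` check can also be done from Python. The sketch below assumes the usual Hugging Face hub cache layout, where each model lives in a directory named `models--<org>--<name>`; the function names are hypothetical and the default path is the one listed in the table above.

```python
from pathlib import Path

# Default hub cache location from the table above.
DEFAULT_HUB_CACHE = Path.home() / ".cache" / "huggingface" / "hub"


def repo_id_from_dirname(name: str) -> str:
    """Turn 'models--org--name' into 'org/name' (hub cache naming convention)."""
    return name[len("models--"):].replace("--", "/")


def list_cached_models(cache_dir: Path = DEFAULT_HUB_CACHE, needle: str = "") -> list[str]:
    """Return sorted repo ids in the cache whose name contains `needle` (case-insensitive)."""
    if not cache_dir.is_dir():
        return []  # nothing downloaded yet, or a different cache location is in use
    found = []
    for entry in cache_dir.iterdir():
        if entry.is_dir() and entry.name.startswith("models--"):
            repo_id = repo_id_from_dirname(entry.name)
            if needle.lower() in repo_id.lower():
                found.append(repo_id)
    return sorted(found)
```

`list_cached_models(needle="qwen3")` mirrors the shell command: an empty list corresponds to the "No Qwen3 models found in cache" message.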

🌐 Three APIs, One Service

API    | Endpoint                      | Use Case
Native | /api/v1/embed, /api/v1/rerank | New projects
OpenAI | /v1/embeddings                | Existing OpenAI code
TEI    | /embed, /rerank               | Hugging Face TEI replacement

OpenAI Compatible (Drop-in)

import openai

client = openai.OpenAI(
    api_key="dummy-key",
    base_url="http://localhost:9000/v1"
)

response = client.embeddings.create(
    input=["Hello world", "Apple Silicon is fast!"],
    model="text-embedding-ada-002"
)
# 🚀 10x faster than OpenAI, same code!
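
Once the vectors come back, a typical next step is comparing them. The following sketch assumes the server returns the standard OpenAI response shape, where each item in `response.data` carries an `index` and an `embedding`; `vectors_in_input_order` and `cosine_similarity` are illustrative helpers, not part of the package or the OpenAI client.

```python
import math


def vectors_in_input_order(response) -> list[list[float]]:
    """Extract embeddings from an OpenAI-style response, sorted by input position."""
    return [item.embedding for item in sorted(response.data, key=lambda d: d.index)]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

With the `response` from the example above, `first, second = vectors_in_input_order(response)` followed by `cosine_similarity(first, second)` gives a score for how close the two inputs are.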

TEI Compatible

curl -X POST "http://localhost:9000/embed" \
  -H "Content-Type: application/json" \
  -d '{"inputs": ["Hello world"], "truncate": true}'

Native API

# Embeddings
curl -X POST "http://localhost:9000/api/v1/embed/" \
  -H "Content-Type: application/json" \
  -d '{"texts": ["Apple Silicon", "MLX acceleration"]}'

# Reranking
curl -X POST "http://localhost:9000/api/v1/rerank/" \
  -H "Content-Type: application/json" \
  -d '{"query": "machine learning", "passages": ["AI is cool", "Dogs are pets", "MLX is fast"]}'
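
The same native calls can be made from Python without extra dependencies. This is a sketch only: the helper names are hypothetical, the payload fields are copied from the curl examples above, and the response is returned as raw parsed JSON because its schema is not described here.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9000"  # default host and port from the configuration section


def build_request(path: str, payload: dict, base_url: str = BASE_URL) -> urllib.request.Request:
    """Create a JSON POST request equivalent to the curl calls above."""
    return urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed_request(texts: list[str]) -> urllib.request.Request:
    """Request for the native embedding endpoint."""
    return build_request("/api/v1/embed/", {"texts": list(texts)})


def rerank_request(query: str, passages: list[str]) -> urllib.request.Request:
    """Request for the native reranking endpoint."""
    return build_request("/api/v1/rerank/", {"query": query, "passages": list(passages)})


def send(request: urllib.request.Request, timeout: float = 30.0):
    """Send the request and return the parsed JSON body (requires a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

For example, `send(rerank_request("machine learning", ["AI is cool", "Dogs are pets", "MLX is fast"]))` performs the reranking call. Keeping request construction separate from `send` makes the payloads easy to inspect or unit test without a server.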

🧪 Testing

# Comprehensive test suite
./tools/server-tests.sh

# Quick health check (also reports loaded model info)
curl http://localhost:9000/health/

# Run pytest
pytest tests/ -v

🚀 What You Get

  • Zero Code Changes: Drop-in replacement for OpenAI API and TEI
  • 10x Performance: Apple MLX acceleration on Apple Silicon
  • 💰 Zero Costs: No API fees, runs locally
  • 🔒 Privacy: Your data never leaves your machine
  • 🎯 Three APIs: Native, OpenAI, and TEI compatibility
  • 📊 Production Ready: Health checks, monitoring, structured logging

📄 License

MIT License - build amazing things with this code!
