Single Model Embedding & Reranker API with Apple Silicon acceleration
🔥 Embeddings + Reranking on your Mac (MLX‑first)
Blazing-fast local embeddings and true cross-encoder reranking on Apple Silicon. The server exposes its own Native API plus OpenAI-, TEI-, and Cohere-compatible endpoints.
This page is a beginner‑friendly quick start. Detailed guides live in docs/.
🚀 Start here (60 seconds)
- Install and run (embeddings only)
pip install embed-rerank
# Minimal .env
cat > .env <<'ENV'
BACKEND=auto
MODEL_NAME=mlx-community/Qwen3-Embedding-4B-4bit-DWQ
PORT=9000
HOST=0.0.0.0
ENV
embed-rerank # http://localhost:9000
Want 2560‑D vectors by default? Add this to .env and restart:
cat >> .env <<'ENV'
# Use the model hidden_size (2560 for Qwen3-Embedding-4B) as output dimension
DIMENSION_STRATEGY=hidden_size
# Or enforce a fixed size (pads/truncates as needed):
# OUTPUT_EMBEDDING_DIMENSION=2560
# DIMENSION_STRATEGY=pad_or_truncate
ENV
# Verify
curl -s http://localhost:9000/api/v1/embed/ \
-H 'Content-Type: application/json' \
-d '{"texts":["hello"],"normalize":true}' | jq '.vectors[0] | length'
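The fixed-size option above says vectors are padded or truncated as needed. As a rough illustration of what that could mean for a single vector, here is a minimal Python sketch. The function name `pad_or_truncate` and the choice to zero-pad on the right are assumptions made for illustration, not the package's confirmed implementation.

```python
from typing import List


def pad_or_truncate(vector: List[float], target_dim: int) -> List[float]:
    """Return a copy of `vector` with exactly `target_dim` entries.

    Hypothetical helper: zero-padding on the right is an assumption.
    """
    if target_dim <= 0:
        raise ValueError("target_dim must be positive")
    if len(vector) >= target_dim:
        # Long enough already: keep only the leading components.
        return list(vector[:target_dim])
    # Too short: append zeros until the target size is reached.
    return list(vector) + [0.0] * (target_dim - len(vector))


padded = pad_or_truncate([0.1, 0.2, 0.3], 5)
truncated = pad_or_truncate([0.1, 0.2, 0.3], 2)
print(len(padded), len(truncated))
```

Truncating a unit-length vector shortens it, so re-normalize afterwards if downstream code expects normalized embeddings.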
- Try it (embeddings + simple rerank)
# Embeddings (Native)
curl -s http://localhost:9000/api/v1/embed/ \
-H 'Content-Type: application/json' \
-d '{"texts":["Hello MLX","Apple Silicon rocks"]}' | jq '.embeddings | length'
# Rerank fallback (no dedicated reranker yet)
curl -s http://localhost:9000/api/v1/rerank/ \
-H 'Content-Type: application/json' \
-d '{"query":"capital of france","documents":["Paris is the capital of France","Berlin is in Germany"],"top_n":2}' | jq '.results[0]'
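The same native calls can be made from Python. The sketch below uses only the standard library and assumes the server from the steps above is listening on http://localhost:9000. The helper names `build_json_request` and `post_json` are hypothetical, and the payload simply mirrors the curl example.

```python
import json
import urllib.request
from typing import Any, Dict

BASE_URL = "http://localhost:9000"  # assumption: matches PORT=9000 in .env


def build_json_request(path: str, payload: Dict[str, Any]) -> urllib.request.Request:
    """Build a JSON POST request for a native endpoint (nothing is sent yet)."""
    return urllib.request.Request(
        url=BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    """Send the request and decode the JSON response body."""
    request = build_json_request(path, payload)
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Same payload as the curl rerank example; building it does not touch the network.
rerank_payload = {
    "query": "capital of france",
    "documents": ["Paris is the capital of France", "Berlin is in Germany"],
    "top_n": 2,
}
rerank_request = build_json_request("/api/v1/rerank/", rerank_payload)
```

With the server running, `post_json("/api/v1/rerank/", rerank_payload)["results"][0]` should correspond to what the curl command pipes through jq.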
- Add a dedicated reranker (better quality)
cat >> .env <<'ENV'
RERANKER_BACKEND=auto
RERANKER_MODEL_ID=cross-encoder/ms-marco-MiniLM-L-6-v2 # Torch (stable)
# MLX experimental v1 also available: vserifsaglam/Qwen3-Reranker-4B-4bit-MLX
ENV
# Restart server, then call Native or OpenAI-compatible rerank
curl -s http://localhost:9000/api/v1/rerank/ \
-H 'Content-Type: application/json' \
-d '{"query":"capital of france","documents":["Paris is the capital of France","Berlin is in Germany"],"top_n":2}' | jq '.results[0]'
- (Optional) Run as a macOS service
# Uses your .env to generate a LaunchAgent and start the service
./tools/setup-macos-service.sh
# Check status and health
launchctl list | grep com.embed-rerank.server
open http://localhost:9000/health/
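After the service starts, the model may need a moment to load before the health endpoint responds. A small polling loop can replace the manual check. This is a sketch only: it assumes /health/ answers with HTTP 200 once the server is ready, and the names `http_status` and `wait_until_healthy` are hypothetical.

```python
import time
import urllib.error
import urllib.request
from typing import Callable

HEALTH_URL = "http://localhost:9000/health/"  # URL from the step above


def http_status(url: str) -> int:
    """Return the HTTP status code of a GET request, or 0 if unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, and similar.
        return 0


def wait_until_healthy(
    url: str = HEALTH_URL,
    attempts: int = 10,
    delay_s: float = 1.0,
    fetch: Callable[[str], int] = http_status,
) -> bool:
    """Poll `url` until it answers 200 (assumed healthy) or attempts run out."""
    for attempt in range(attempts):
        if fetch(url) == 200:
            return True
        if attempt < attempts - 1:
            time.sleep(delay_s)
    return False
```

The `fetch` parameter exists so the loop can be exercised without a running server; in normal use, `wait_until_healthy()` with the defaults is enough.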
Notes
- OpenAI drop-in supported for both embeddings and rerank (/v1/embeddings, /v1/openai/rerank). See docs/ENHANCED_OPENAI_API.md for a tiny SDK example.
- Scores may be auto-sigmoid-normalized for OpenAI clients by default (disable via OPENAI_RERANK_AUTO_SIGMOID=false).
- The root endpoint / shows both embedding_dimension (served) and hidden_size (model config) for clarity.
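To make the sigmoid note concrete: cross-encoder rerankers typically emit unbounded raw scores, and a sigmoid squashes them into the 0 to 1 range while preserving their order. The sketch below shows that mapping in plain Python. It assumes the raw scores are logits; whether the server applies exactly this function is not stated on this page.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    """Numerically stable logistic function, mapping a real score into 0..1."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative inputs, avoid computing exp of a large positive number.
    z = math.exp(x)
    return z / (1.0 + z)


def normalize_scores(raw_scores: List[float]) -> List[float]:
    """Apply the sigmoid to each raw reranker score, keeping the input order."""
    return [sigmoid(score) for score in raw_scores]


raw = [7.5, 0.0, -3.0]  # illustrative logits, not real model output
normalized = normalize_scores(raw)
print(normalized)
```

Because the sigmoid is monotonic, the ranking is unchanged; only the score scale differs, which matters for clients that apply fixed thresholds.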
Quick endpoints reference
- Native: /api/v1/embed, /api/v1/rerank
- OpenAI: /v1/embeddings, /v1/openai/rerank (alias: /v1/rerank_openai)
- TEI: /embed, /rerank, /info
- Cohere: /v1/rerank, /v2/rerank
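For client code that switches between API flavors, the reference above can be kept as a small lookup table. This is a sketch: the paths are copied from the list, while the `ENDPOINTS` dictionary, the `endpoint_urls` helper, and the default base URL are assumptions for illustration.

```python
from typing import Dict, List

# Paths copied from the quick endpoints reference.
ENDPOINTS: Dict[str, List[str]] = {
    "native": ["/api/v1/embed", "/api/v1/rerank"],
    "openai": ["/v1/embeddings", "/v1/openai/rerank", "/v1/rerank_openai"],
    "tei": ["/embed", "/rerank", "/info"],
    "cohere": ["/v1/rerank", "/v2/rerank"],
}


def endpoint_urls(api: str, base_url: str = "http://localhost:9000") -> List[str]:
    """Return full URLs for one API flavor (hypothetical helper)."""
    key = api.lower()
    if key not in ENDPOINTS:
        raise KeyError(f"unknown API flavor: {api!r}")
    # Strip a trailing slash so the joined URL never contains '//'.
    return [base_url.rstrip("/") + path for path in ENDPOINTS[key]]


print(endpoint_urls("tei"))
```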
Run the full validation suite
./tools/server-tests.sh --full
🧭 Pick your path
- Deployment profiles (Embeddings‑only, Fallback rerank, Dedicated reranker): docs/DEPLOYMENT_PROFILES.md
- OpenAI usage (tiny example + options): docs/ENHANCED_OPENAI_API.md
- Quality benchmarks (JSONL/CSV judgments): docs/QUALITY_BENCHMARKS.md
- Troubleshooting: docs/TROUBLESHOOTING.md
- Backend specs and performance: docs/BACKEND_TECHNICAL_SPECS.md, docs/PERFORMANCE_COMPARISON_CHARTS.md
Try it with OpenAI SDK (tiny)
import httpx
import openai

# The local server does not check the API key, but the SDK requires one.
client = openai.OpenAI(base_url="http://localhost:9000/v1", api_key="dummy")

# Embeddings
res = client.embeddings.create(model="text-embedding-ada-002", input=["hello world"])
print(len(res.data[0].embedding))

# Rerank (OpenAI-compatible). The SDK has no rerank method, so use its generic
# post() helper. The path is relative to base_url, which already ends in /v1.
rr = client.post(
    "/openai/rerank",
    cast_to=httpx.Response,
    body={
        "query": "capital of france",
        "documents": [
            {"id": "a", "text": "Paris is the capital of France"},
            {"id": "b", "text": "Berlin is in Germany"},
        ],
        "top_n": 2,
    },
)
data = rr.json()
print(data.get("results", data))
📄 License
MIT License – build amazing things locally.
File details
Details for the file embed_rerank-1.5.0.tar.gz.
File metadata
- Download URL: embed_rerank-1.5.0.tar.gz
- Upload date:
- Size: 146.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 08a7aee321ee0b3509d7da4793d056a0b5fd92ce1844334bcd9b1a49cf8aa91f |
| MD5 | cc9199be3df7bbb9c0e844d084dd2072 |
| BLAKE2b-256 | 3713dfa86fdacf348491ef0ce548a3423aca6acab5238abf8fb362fcf6b8d6e6 |
File details
Details for the file embed_rerank-1.5.0-py3-none-any.whl.
File metadata
- Download URL: embed_rerank-1.5.0-py3-none-any.whl
- Upload date:
- Size: 89.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3bb01433f81076b777ec3cdb537a1033a27ad915a2d217ea78eeaafd295d0bce |
| MD5 | 8bf5e2c16161f7cadb5c676ee46e1691 |
| BLAKE2b-256 | 57f06e3a24e1120ae16b0c08779a8df6b3593d95a1fb2f747dc809771afc9e7c |