Single Model Embedding & Reranker API with Apple Silicon acceleration

🔥 Embeddings + Reranking on your Mac (MLX‑first)

Blazing‑fast local embeddings and true cross‑encoder reranking on Apple Silicon. Exposes Native, OpenAI‑compatible, TEI‑compatible, and Cohere‑compatible endpoints.

This page is a beginner‑friendly quick start. Detailed guides live in docs/.

🚀 Start here (60 seconds)

  1. Install and run (embeddings only)
pip install embed-rerank

# Minimal .env
cat > .env <<'ENV'
BACKEND=auto
MODEL_NAME=mlx-community/Qwen3-Embedding-4B-4bit-DWQ
PORT=9000
HOST=0.0.0.0
ENV

embed-rerank  # http://localhost:9000

Want 2560‑D vectors by default? Add this to .env and restart:

cat >> .env <<'ENV'
# Use the model hidden_size (2560 for Qwen3-Embedding-4B) as output dimension
DIMENSION_STRATEGY=hidden_size
# Or enforce a fixed size (pads/truncates as needed):
# OUTPUT_EMBEDDING_DIMENSION=2560
# DIMENSION_STRATEGY=pad_or_truncate
ENV

# Verify
curl -s http://localhost:9000/api/v1/embed/ \
  -H 'Content-Type: application/json' \
  -d '{"texts":["hello"],"normalize":true}' | jq '.vectors[0] | length'
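
To make the pad_or_truncate option concrete, here is a minimal Python sketch of resizing a vector to a fixed dimension. It assumes truncation from the end, zero-padding on the right, and L2 re-normalization afterwards; the server's actual behavior may differ, and `pad_or_truncate` below is a hypothetical helper, not part of embed-rerank.

```python
import math


def pad_or_truncate(vector, target_dim, normalize=True):
    """Resize `vector` to `target_dim` entries (hypothetical helper).

    Assumption: extra entries are dropped from the end, missing entries
    are filled with zeros, and the result is optionally L2-normalized.
    """
    if target_dim <= 0:
        raise ValueError("target_dim must be positive")
    resized = list(vector[:target_dim])
    resized.extend([0.0] * (target_dim - len(resized)))
    if normalize:
        norm = math.sqrt(sum(x * x for x in resized))
        if norm > 0.0:
            resized = [x / norm for x in resized]
    return resized


if __name__ == "__main__":
    padded = pad_or_truncate([3.0, 4.0], 4)           # grows to 4 entries
    truncated = pad_or_truncate([3.0, 4.0, 12.0], 2)  # shrinks to 2 entries
    print(len(padded), len(truncated))
```

Re-normalizing after resizing keeps cosine similarity and dot product interchangeable, which is why the sketch does it by default.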
  2. Try it (embeddings + simple rerank)
# Embeddings (Native)
curl -s http://localhost:9000/api/v1/embed/ \
  -H 'Content-Type: application/json' \
  -d '{"texts":["Hello MLX","Apple Silicon rocks"]}' | jq '.embeddings | length'

# Rerank fallback (no dedicated reranker yet)
curl -s http://localhost:9000/api/v1/rerank/ \
  -H 'Content-Type: application/json' \
  -d '{"query":"capital of france","documents":["Paris is the capital of France","Berlin is in Germany"],"top_n":2}' | jq '.results[0]'
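
The same rerank call can be made from Python using only the standard library. This is a sketch that mirrors the curl payload above; `build_request`, `post_json`, and `rerank` are hypothetical helper names, and the `results` key is assumed from the jq filter used in the curl example.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9000"  # matches PORT=9000 in the quick-start .env


def build_request(path, payload, base_url=BASE_URL):
    """Build a JSON POST request for one of the native endpoints."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path, payload, base_url=BASE_URL, timeout=30):
    """Send the request and decode the JSON response body."""
    request = build_request(path, payload, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def rerank(query, documents, top_n=None, base_url=BASE_URL):
    """Call /api/v1/rerank/ with the same payload as the curl example."""
    payload = {"query": query, "documents": documents}
    if top_n is not None:
        payload["top_n"] = top_n
    return post_json("/api/v1/rerank/", payload, base_url)


if __name__ == "__main__":
    out = rerank(
        "capital of france",
        ["Paris is the capital of France", "Berlin is in Germany"],
        top_n=2,
    )
    print(out["results"][0])  # assumes the response carries a "results" list
```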
  3. Add a dedicated reranker (better quality)
cat >> .env <<'ENV'
RERANKER_BACKEND=auto
RERANKER_MODEL_ID=cross-encoder/ms-marco-MiniLM-L-6-v2  # Torch (stable)
# MLX experimental v1 also available: vserifsaglam/Qwen3-Reranker-4B-4bit-MLX
ENV

# Restart server, then call Native or OpenAI-compatible rerank
curl -s http://localhost:9000/api/v1/rerank/ \
  -H 'Content-Type: application/json' \
  -d '{"query":"capital of france","documents":["Paris is the capital of France","Berlin is in Germany"],"top_n":2}' | jq '.results[0]'
  4. (Optional) Run as a macOS service
# Uses your .env to generate a LaunchAgent and start the service
./tools/setup-macos-service.sh

# Check status and health
launchctl list | grep com.embed-rerank.server
open http://localhost:9000/health/
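
If you want a script to wait until the service is up (model loading can take a while), a small poller against the health URL is one option. This is a sketch: `wait_for_health` is a hypothetical helper, and it assumes only that /health/ answers HTTP 200 once the server is ready.

```python
import time
import urllib.error
import urllib.request


def wait_for_health(url="http://localhost:9000/health/", timeout=60.0, interval=1.0):
    """Poll `url` until it answers HTTP 200 or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # server not reachable yet; retry after a short pause
        time.sleep(interval)
    return False


if __name__ == "__main__":
    print("ready" if wait_for_health() else "not ready")
```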

Notes

  • OpenAI drop-in supported for both embeddings and rerank (/v1/embeddings, /v1/openai/rerank). See docs/ENHANCED_OPENAI_API.md for a tiny SDK example.
  • For OpenAI‑compatible clients, rerank scores are sigmoid‑normalized by default (disable via OPENAI_RERANK_AUTO_SIGMOID=false).
  • The root endpoint / shows both embedding_dimension (served) and hidden_size (model config) for clarity.
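
For readers unfamiliar with sigmoid normalization, the sketch below shows the general idea: raw cross‑encoder scores (logits) are squashed into the 0 to 1 range without changing their order. It illustrates the technique only; the input scores are made up, and `sigmoid` and `normalize_scores` are hypothetical names, not the server's code.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function: maps a real score into [0, 1]."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # x < 0, so exp(x) cannot overflow
    return z / (1.0 + z)


def normalize_scores(raw_scores):
    """Apply sigmoid to each raw score; the ranking order is unchanged."""
    return [sigmoid(s) for s in raw_scores]


if __name__ == "__main__":
    raw = [7.2, -3.1, 0.0]  # made-up logits for illustration
    print(normalize_scores(raw))
```

Because sigmoid is monotonic, turning it off changes the score scale that clients see, not which document ranks first.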

Quick endpoints reference

  • Native: /api/v1/embed, /api/v1/rerank
  • OpenAI: /v1/embeddings, /v1/openai/rerank (alias: /v1/rerank_openai)
  • TEI: /embed, /rerank, /info
  • Cohere: /v1/rerank, /v2/rerank

Run the full validation suite

./tools/server-tests.sh --full

🧭 Pick your path

  • Deployment profiles (Embeddings‑only, Fallback rerank, Dedicated reranker): docs/DEPLOYMENT_PROFILES.md
  • OpenAI usage (tiny example + options): docs/ENHANCED_OPENAI_API.md
  • Quality benchmarks (JSONL/CSV judgments): docs/QUALITY_BENCHMARKS.md
  • Troubleshooting: docs/TROUBLESHOOTING.md
  • Backend specs and performance: docs/BACKEND_TECHNICAL_SPECS.md, docs/PERFORMANCE_COMPARISON_CHARTS.md

Try it with OpenAI SDK (tiny)

import httpx
import openai

client = openai.OpenAI(base_url="http://localhost:9000/v1", api_key="dummy")

# Embeddings
res = client.embeddings.create(model="text-embedding-ada-002", input=["hello world"])
print(len(res.data[0].embedding))

# Rerank (OpenAI-compatible)
# The SDK has no rerank method, so use the client's generic post().
# The path is relative to base_url, which already ends in /v1.
rr = client.post(
    "/openai/rerank",
    cast_to=httpx.Response,  # return the raw HTTP response
    body={
        "query": "capital of france",
        "documents": [
            {"id": "a", "text": "Paris is the capital of France"},
            {"id": "b", "text": "Berlin is in Germany"},
        ],
        "top_n": 2,
    },
).json()
print(rr.get("results", rr))

📄 License

MIT License – build amazing things locally.
