Single Model Embedding & Reranker API with Apple Silicon acceleration

🔥 Embeddings + Reranking on your Mac (MLX‑first)

Blazing‑fast local embeddings and true cross‑encoder reranking on Apple Silicon, exposed through Native, OpenAI‑compatible, TEI‑compatible, and Cohere‑compatible APIs.

This page is a beginner‑friendly quick start. Detailed guides live in docs/.

🚀 Start here (60 seconds)

Install and run (embeddings only)

pip install embed-rerank

# Minimal .env
cat > .env <<'ENV'
BACKEND=auto
MODEL_NAME=mlx-community/Qwen3-Embedding-4B-4bit-DWQ
PORT=9000
HOST=0.0.0.0
ENV

embed-rerank  # http://localhost:9000

Want 2560‑D vectors by default? Add this to .env and restart:

cat >> .env <<'ENV'
# Use the model hidden_size (2560 for Qwen3-Embedding-4B) as output dimension
DIMENSION_STRATEGY=hidden_size
# Or enforce a fixed size (pads/truncates as needed):
# OUTPUT_EMBEDDING_DIMENSION=2560
# DIMENSION_STRATEGY=pad_or_truncate
ENV

# Verify
curl -s http://localhost:9000/api/v1/embed/ \
  -H 'Content-Type: application/json' \
  -d '{"texts":["hello"],"normalize":true}' | jq '.vectors[0] | length'
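
To make the pad_or_truncate option concrete, here is a minimal Python sketch of the idea. It is an illustration, not the server's code: the name fit_dimension is hypothetical, and right-padding with zeros is an assumption about how the padding is done.

```python
from typing import List


def fit_dimension(vector: List[float], target_dim: int) -> List[float]:
    """Return a copy of `vector` with exactly `target_dim` entries.

    Hypothetical helper: extra entries are dropped from the end, and short
    vectors are right-padded with zeros (zero-padding is an assumption).
    """
    if target_dim <= 0:
        raise ValueError("target_dim must be positive")
    if len(vector) >= target_dim:
        return list(vector[:target_dim])
    return list(vector) + [0.0] * (target_dim - len(vector))
```

Truncating a unit-length vector shortens its norm, so re-normalize the result if you compare vectors with a plain dot product.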

Try it (embeddings + simple rerank)

# Embeddings (Native)
curl -s http://localhost:9000/api/v1/embed/ \
  -H 'Content-Type: application/json' \
  -d '{"texts":["Hello MLX","Apple Silicon rocks"]}' | jq '.embeddings | length'

# Rerank fallback (no dedicated reranker yet)
curl -s http://localhost:9000/api/v1/rerank/ \
  -H 'Content-Type: application/json' \
  -d '{"query":"capital of france","documents":["Paris is the capital of France","Berlin is in Germany"],"top_n":2}' | jq '.results[0]'
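
This page does not spell out how the fallback scores documents. A common approach when no cross-encoder is loaded is to embed the query and each document and rank by cosine similarity; the sketch below assumes that approach. The function names and the toy vectors in the usage note are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_by_similarity(
    query_vec: Sequence[float],
    doc_vecs: Sequence[Sequence[float]],
    top_n: int,
) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs, best match first, at most top_n."""
    scored = [(i, cosine(query_vec, vec)) for i, vec in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]
```

For example, rank_by_similarity([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0]], 1) picks the second document, since it points in the same direction as the query.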

Add a dedicated reranker (better quality)

cat >> .env <<'ENV'
RERANKER_BACKEND=auto
# Torch (stable)
RERANKER_MODEL_ID=cross-encoder/ms-marco-MiniLM-L-6-v2
# MLX experimental v1 also available: vserifsaglam/Qwen3-Reranker-4B-4bit-MLX
ENV

# Restart server, then call Native or OpenAI-compatible rerank
curl -s http://localhost:9000/api/v1/rerank/ \
  -H 'Content-Type: application/json' \
  -d '{"query":"capital of france","documents":["Paris is the capital of France","Berlin is in Germany"],"top_n":2}' | jq '.results[0]'
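
The same native rerank call can be made from Python with only the standard library. This is a sketch: the request body mirrors the curl example, the helper names are hypothetical, and the response is returned as parsed JSON without assuming anything about its fields.

```python
import json
import urllib.request
from typing import Any, Dict, List

BASE_URL = "http://localhost:9000"  # matches PORT=9000 from the quick start


def build_rerank_request(query: str, documents: List[str], top_n: int) -> urllib.request.Request:
    """Build a POST request for the native rerank endpoint (nothing is sent yet)."""
    payload = {"query": query, "documents": documents, "top_n": top_n}
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/rerank/",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def rerank(query: str, documents: List[str], top_n: int) -> Dict[str, Any]:
    """Send the request to a running server and return the decoded JSON body."""
    request = build_rerank_request(query, documents, top_n)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

With the server running, rerank("capital of france", ["Paris is the capital of France", "Berlin is in Germany"], 2) returns the same JSON that the curl command pipes into jq.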

(Optional) Run as a macOS service

# Uses your .env to generate a LaunchAgent and start the service
./tools/setup-macos-service.sh

# Check status and health
launchctl list | grep com.embed-rerank.server
open http://localhost:9000/health/

Notes

OpenAI drop-in supported for both embeddings and rerank (/v1/embeddings, /v1/openai/rerank). See docs/ENHANCED_OPENAI_API.md for a tiny SDK example.
Rerank scores for OpenAI clients are auto‑sigmoid‑normalized by default (disable via OPENAI_RERANK_AUTO_SIGMOID=false).
The root endpoint / shows both embedding_dimension (served) and hidden_size (model config) for clarity.
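
For context on the auto-sigmoid note: cross-encoders typically emit unbounded logits, and a sigmoid maps them into the range (0, 1) without changing their order. A minimal sketch of that mapping follows; the function names are illustrative and not taken from the server.

```python
import math
from typing import List


def sigmoid(x: float) -> float:
    """Logistic function, written to avoid overflow for large negative inputs."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def normalize_scores(logits: List[float]) -> List[float]:
    """Map raw reranker logits to values strictly between 0 and 1."""
    return [sigmoid(value) for value in logits]
```

Because the sigmoid is monotonic, the ranking is the same with or without normalization; only the scale of the scores changes.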

Quick endpoints reference

Native: /api/v1/embed, /api/v1/rerank
OpenAI: /v1/embeddings, /v1/openai/rerank (alias: /v1/rerank_openai)
TEI: /embed, /rerank, /info
Cohere: /v1/rerank, /v2/rerank
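
The reference above can be captured as a small lookup for client code. The paths are copied from the list; the dictionary layout, the endpoint_url name, the "rerank_v2" key, and the default base URL are assumptions made for illustration.

```python
from typing import Dict

# Paths copied from the quick reference; the grouping and keys are illustrative.
ENDPOINTS: Dict[str, Dict[str, str]] = {
    "native": {"embed": "/api/v1/embed", "rerank": "/api/v1/rerank"},
    "openai": {"embed": "/v1/embeddings", "rerank": "/v1/openai/rerank"},
    "tei": {"embed": "/embed", "rerank": "/rerank", "info": "/info"},
    "cohere": {"rerank": "/v1/rerank", "rerank_v2": "/v2/rerank"},
}


def endpoint_url(api: str, task: str, base_url: str = "http://localhost:9000") -> str:
    """Join the base URL with the path listed for the given API flavor and task."""
    try:
        path = ENDPOINTS[api][task]
    except KeyError:
        raise ValueError(f"no {task!r} endpoint listed for API {api!r}") from None
    return base_url.rstrip("/") + path
```

Keeping the paths in one place makes it easier to switch a client between API flavors without touching the call sites.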

Run the full validation suite

./tools/server-tests.sh --full

🧭 Pick your path

Deployment profiles (Embeddings‑only, Fallback rerank, Dedicated reranker): docs/DEPLOYMENT_PROFILES.md
OpenAI usage (tiny example + options): docs/ENHANCED_OPENAI_API.md
Quality benchmarks (JSONL/CSV judgments): docs/QUALITY_BENCHMARKS.md
Troubleshooting: docs/TROUBLESHOOTING.md
Backend specs and performance: docs/BACKEND_TECHNICAL_SPECS.md, docs/PERFORMANCE_COMPARISON_CHARTS.md

Try it with OpenAI SDK (tiny)

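import httpx  # installed as a dependency of the openai package
import openai

BASE_URL = "http://localhost:9000"

# api_key is a placeholder value for the local server
client = openai.OpenAI(base_url=f"{BASE_URL}/v1", api_key="dummy")

# Embeddings
res = client.embeddings.create(model="text-embedding-ada-002", input=["hello world"])
print(len(res.data[0].embedding))

# Rerank (OpenAI-compatible)
# The OpenAI SDK has no rerank method, so post the JSON directly.
# The full path is spelled out because base_url above already ends in /v1.
resp = httpx.post(
    f"{BASE_URL}/v1/openai/rerank",
    json={
        "query": "capital of france",
        "documents": [
            {"id": "a", "text": "Paris is the capital of France"},
            {"id": "b", "text": "Berlin is in Germany"},
        ],
        "top_n": 2,
    },
    timeout=30.0,
)
resp.raise_for_status()
rr = resp.json()
print(rr.get("results", rr))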

📄 License

MIT License – build amazing things locally.

Release history

1.5.1 (Nov 14, 2025)
1.5.0 (Nov 5, 2025), this version
1.3.0 (Nov 4, 2025)
1.2.3 (Oct 30, 2025)
1.2.2 (Sep 10, 2025)
1.2.1 (Sep 10, 2025)
1.2.0 (Sep 9, 2025)
1.1.3 (Sep 3, 2025)
1.1.1 (Sep 3, 2025)
1.1.0 (Aug 28, 2025)
1.0.2 (Aug 28, 2025)
1.0.1 (Aug 28, 2025)
1.0.0 (Aug 28, 2025)

Download files

Source Distribution

embed_rerank-1.5.0.tar.gz (146.3 kB)

Uploaded Nov 5, 2025 Source

Built Distribution

embed_rerank-1.5.0-py3-none-any.whl (89.4 kB)

Uploaded Nov 5, 2025 Python 3

File details

Details for the file embed_rerank-1.5.0.tar.gz.

File metadata

Download URL: embed_rerank-1.5.0.tar.gz
Upload date: Nov 5, 2025
Size: 146.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for embed_rerank-1.5.0.tar.gz
Algorithm	Hash digest
SHA256	`08a7aee321ee0b3509d7da4793d056a0b5fd92ce1844334bcd9b1a49cf8aa91f`
MD5	`cc9199be3df7bbb9c0e844d084dd2072`
BLAKE2b-256	`3713dfa86fdacf348491ef0ce548a3423aca6acab5238abf8fb362fcf6b8d6e6`

File details

Details for the file embed_rerank-1.5.0-py3-none-any.whl.

File metadata

Download URL: embed_rerank-1.5.0-py3-none-any.whl
Upload date: Nov 5, 2025
Size: 89.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for embed_rerank-1.5.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`3bb01433f81076b777ec3cdb537a1033a27ad915a2d217ea78eeaafd295d0bce`
MD5	`8bf5e2c16161f7cadb5c676ee46e1691`
BLAKE2b-256	`57f06e3a24e1120ae16b0c08779a8df6b3593d95a1fb2f747dc809771afc9e7c`
