
Local-first embedding server: vector generation + index/search over HTTP (ONNX on-device or API providers). The reference /embed server for CPersona.

Project description

CEmbedding

Local-first embedding server

Vector embeddings over a tiny HTTP contract. On-device ONNX or any OpenAI-compatible API. The reference /embed server for CPersona.

License: MIT | Python


Standalone repository — extracted from the (now private) clotohub-servers monorepo so it can be used on its own. ClotoCore users get this through the in-app marketplace (ClotoHub); everyone else can run it directly as described below.

What it is

A small server that turns text into vectors. It speaks a minimal HTTP contract so anything can call it — its primary consumer is CPersona, whose hybrid search uses it for the vector-similarity layer. It can run a model on-device via ONNX (no API key, no network) or proxy an OpenAI-compatible API.

It also exposes an MCP (stdio) surface and an optional persistent vector index (/index, /search), but the HTTP /embed endpoint is all CPersona needs.
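To make the vector-similarity layer concrete: a consumer typically compares a query's vector against stored vectors and ranks the stored texts by similarity. The sketch below assumes cosine similarity as the score. It is not taken from CPersona, whose actual scoring may differ, and the 3-dimensional vectors are toy values rather than real /embed output.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError(f"dimension mismatch: {len(a)} vs {len(b)}")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: Sequence[float], docs: dict[str, Sequence[float]]
) -> list[tuple[str, float]]:
    """Return (doc_id, score) pairs, most similar first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy vectors standing in for real /embed output (768d for jina-v5-nano).
    query = [1.0, 0.0, 0.0]
    docs = {
        "same": [2.0, 0.0, 0.0],
        "orthogonal": [0.0, 1.0, 0.0],
        "opposite": [-1.0, 0.0, 0.0],
    }
    print(rank_by_similarity(query, docs))
    # [('same', 1.0), ('orthogonal', 0.0), ('opposite', -1.0)]
```

Cosine similarity ignores vector length, so "same" scores 1.0 even though its magnitude is twice the query's.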

The /embed contract

POST /embed
Request:  { "texts": ["string", ...] }                 # non-empty array, max 100 per batch
Response: { "embeddings": [[float, ...], ...], "dimensions": <int> }
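A client can check both sides of this contract around the HTTP call. The sketch below is a hypothetical helper pair (build_embed_request and parse_embed_response are illustrative names, not part of cembedding). It enforces the documented request limits and assumes, as the response shape suggests, that the server returns one vector per input text, in input order.

```python
import json
from typing import Any

MAX_BATCH = 100  # documented contract: non-empty array, max 100 per batch


def build_embed_request(texts: list[str]) -> bytes:
    """Serialize a /embed request body, enforcing the documented batch limits."""
    if not texts:
        raise ValueError("texts must be a non-empty array")
    if len(texts) > MAX_BATCH:
        raise ValueError(f"at most {MAX_BATCH} texts per batch, got {len(texts)}")
    if not all(isinstance(t, str) for t in texts):
        raise TypeError("every element of texts must be a string")
    return json.dumps({"texts": texts}).encode("utf-8")


def parse_embed_response(body: bytes, expected_count: int) -> list[list[float]]:
    """Decode a /embed response and check it matches the documented shape."""
    payload: dict[str, Any] = json.loads(body)
    embeddings = payload["embeddings"]
    dimensions = payload["dimensions"]
    if len(embeddings) != expected_count:
        raise ValueError(f"expected {expected_count} vectors, got {len(embeddings)}")
    for i, vec in enumerate(embeddings):
        if len(vec) != dimensions:
            raise ValueError(
                f"vector {i} has {len(vec)} values but dimensions is {dimensions}"
            )
    return [[float(x) for x in vec] for vec in embeddings]
```

Checking every vector against `dimensions` catches a truncated or mismatched reply before it reaches a stored index.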

Point any client (e.g. CPersona's CPERSONA_EMBEDDING_URL / generic EMBEDDING_HTTP_URL) at http://127.0.0.1:8401/embed.
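Because a single request is capped at 100 texts, a client with a larger workload has to split it. The following is a minimal stdlib sketch, assuming the server runs on the default port and returns vectors in input order. embed_all, batches, and post_json are hypothetical names. The transport is injectable so the batching logic can be exercised without a running server.

```python
import json
import urllib.request
from typing import Callable, Iterator

EMBED_URL = "http://127.0.0.1:8401/embed"  # default port from the Configuration table
MAX_BATCH = 100  # documented per-request limit


def batches(texts: list[str], size: int = MAX_BATCH) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` texts."""
    for start in range(0, len(texts), size):
        yield texts[start:start + size]


def post_json(url: str, payload: dict, timeout: float = 30.0) -> dict:
    """POST a JSON body and decode the JSON reply using only the stdlib."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())


def embed_all(
    texts: list[str],
    url: str = EMBED_URL,
    post: Callable[[str, dict], dict] = post_json,
) -> list[list[float]]:
    """Embed any number of texts by sending contract-sized batches in order."""
    vectors: list[list[float]] = []
    for chunk in batches(texts):  # an empty input sends no request at all
        reply = post(url, {"texts": chunk})
        if len(reply["embeddings"]) != len(chunk):
            raise ValueError("server returned a different number of vectors than texts sent")
        vectors.extend(reply["embeddings"])
    return vectors
```

With the server from the Quick Start running, embed_all(["first note", "second note"]) should return two vectors, each of length `dimensions`.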

Quick Start (on-device ONNX)

Prerequisites: Python 3.10+

# Download a model into ./data/models (jina-v5-nano is what CPersona is tuned for)
uvx --from "cembedding[onnx]" cembedding-download-model --model jina-v5-nano

# Run the server (reads ./data/models from the current directory)
EMBEDDING_PROVIDER=onnx_jina_v5_nano uvx --from "cembedding[onnx]" cembedding

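Or install it onto your PATH with pip install "cembedding[onnx]", then run cembedding-download-model --model jina-v5-nano and EMBEDDING_PROVIDER=onnx_jina_v5_nano cembedding (without EMBEDDING_PROVIDER the server defaults to api_openai).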

From source (development):

git clone https://github.com/Cloto-dev/CEmbedding.git
cd CEmbedding
python -m venv .venv
source .venv/bin/activate          # Windows: .venv\Scripts\activate
pip install ".[onnx]"
python -m cembedding.download_model --model jina-v5-nano
EMBEDDING_PROVIDER=onnx_jina_v5_nano python -m cembedding   # or: python server.py

The server should log "HTTP embedding endpoint started on http://127.0.0.1:8401/embed". Verify it with:

curl -s http://127.0.0.1:8401/embed \
  -H 'content-type: application/json' \
  -d '{"texts":["hello world"]}' | head -c 200

Providers

Set EMBEDDING_PROVIDER:

| Value | Model | Notes |
|---|---|---|
| onnx_jina_v5_nano | jina-v5-nano (33M, 768d) | Local CPU; what CPersona is benchmarked against |
| onnx_bge_m3 | bge-m3 | Local CPU; larger, multilingual |
| onnx_miniml | all-MiniLM-L6-v2 (22M, 384d) | Local CPU; smallest |
| mlx_bge_m3 | bge-m3 (MLX) | Apple Silicon only; pip install ".[mlx]" |
| api_openai | provider's model | OpenAI-compatible API; needs EMBEDDING_API_KEY (optional: EMBEDDING_API_URL, EMBEDDING_MODEL) |

Download a local model with cembedding-download-model --model {miniml,jina-v5-nano,bge-m3} (or python -m cembedding.download_model ... from a source checkout; fetched from HuggingFace into ./data/models, not committed to this repo).
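Each provider emits vectors of a different size, and vectors from different models are not comparable, so a client that stores vectors may want to confirm the server's `dimensions` field before mixing results. This is a minimal sketch with a hypothetical check_dimensions helper. The sizes come from the table above, except the bge-m3 entries, which assume the model's published 1024-dimension dense output (the table does not list it).

```python
# Sizes from the provider table; the bge-m3 value (1024) is an assumption
# based on the model's published dense embedding size.
EXPECTED_DIMENSIONS = {
    "onnx_jina_v5_nano": 768,
    "onnx_miniml": 384,
    "onnx_bge_m3": 1024,
    "mlx_bge_m3": 1024,
}


def check_dimensions(provider: str, reported: int) -> None:
    """Raise if the server's reported `dimensions` disagrees with the provider's known size."""
    expected = EXPECTED_DIMENSIONS.get(provider)
    if expected is None:
        return  # unknown size (e.g. api_openai depends on the remote model)
    if reported != expected:
        raise ValueError(
            f"{provider} should produce {expected}-d vectors, server reported {reported}; "
            "vectors stored under a different model are not comparable"
        )
```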

Configuration

| Env var | Default | Description |
|---|---|---|
| EMBEDDING_PROVIDER | api_openai | Provider (see the Providers table). The default needs EMBEDDING_API_KEY; set an onnx_* value for on-device use |
| EMBEDDING_HTTP_PORT | 8401 | HTTP port for /embed |
| EMBEDDING_INDEX_ENABLED | true | Enable the persistent vector index endpoints (/index, /search, /remove, /purge) |
| ONNX_MODEL_DIR | (auto) | Override the model directory for ONNX providers |
| ONNX_EP_PREFERENCE | (auto) | ONNX execution providers, comma-separated. Empty = auto (CoreML on macOS, DirectML on Windows, otherwise CPU; CPU is always included) |
| ONNX_MAX_SEQ_LEN | 2048 | Maximum tokenization length (1–8192; MiniLM is clamped to 512 internally) |
| EMBEDDING_API_KEY | (none) | Required for api_openai |
| EMBEDDING_API_URL | https://api.openai.com/v1/embeddings | API endpoint for api_openai |
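The ONNX_EP_PREFERENCE and ONNX_MAX_SEQ_LEN rules in this table can be expressed as two small functions. This is a sketch of the documented behavior, not the server's code. It assumes the variable holds full onnxruntime provider names (the server may also accept short aliases), and it rejects out-of-range lengths where the server might clamp them instead. resolve_execution_providers and effective_max_seq_len are hypothetical names.

```python
import sys

CPU_EP = "CPUExecutionProvider"
# Documented auto defaults: CoreML on macOS, DirectML on Windows, otherwise CPU only.
AUTO_DEFAULTS = {
    "darwin": ["CoreMLExecutionProvider"],
    "win32": ["DmlExecutionProvider"],
}


def resolve_execution_providers(preference: str, platform: str = sys.platform) -> list[str]:
    """Explicit comma-separated list, or the per-OS default; CPU is always appended."""
    requested = [p.strip() for p in preference.split(",") if p.strip()]
    if not requested:
        requested = list(AUTO_DEFAULTS.get(platform, []))
    if CPU_EP not in requested:
        requested.append(CPU_EP)  # "CPU always ensured" as a fallback
    return requested


def effective_max_seq_len(raw: str, provider: str) -> int:
    """Validate ONNX_MAX_SEQ_LEN (1..8192, default 2048) and apply the MiniLM cap of 512."""
    value = int(raw) if raw else 2048
    if not 1 <= value <= 8192:
        raise ValueError(f"ONNX_MAX_SEQ_LEN must be in 1..8192, got {value}")
    if provider == "onnx_miniml":
        value = min(value, 512)
    return value
```

A provider list in this form is what onnxruntime.InferenceSession accepts through its providers argument, in priority order.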

Use with CPersona

Run this server, then tell CPersona to use it:

# CPersona MCP config env
CPERSONA_EMBEDDING_MODE=http
CPERSONA_EMBEDDING_URL=http://127.0.0.1:8401/embed

Without an embedding server CPersona still works (FTS5 + keyword search); adding one enables the vector-similarity layer.
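A client can reproduce this degrade-gracefully behavior by probing the endpoint once at startup. vector_layer_available is a hypothetical helper, not CPersona's implementation. It treats any connection failure, HTTP error, or malformed reply as "no vector layer" and falls back to keyword-only search.

```python
import json
import os
import urllib.request


def vector_layer_available(url: str, timeout: float = 2.0) -> bool:
    """Return True if `url` answers a one-text /embed request with at least one vector."""
    req = urllib.request.Request(
        url,
        data=json.dumps({"texts": ["ping"]}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            payload = json.loads(resp.read())
    except (OSError, ValueError):
        # OSError covers URLError/HTTPError and timeouts; ValueError covers bad JSON.
        return False
    return isinstance(payload, dict) and bool(payload.get("embeddings"))


if __name__ == "__main__":
    url = os.environ.get("CPERSONA_EMBEDDING_URL", "http://127.0.0.1:8401/embed")
    mode = "hybrid (keyword + vector)" if vector_layer_available(url) else "keyword-only (FTS5)"
    print(f"search mode: {mode}")
```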

License

MIT — see LICENSE.



Download files


Source Distribution

cembedding-0.5.0.tar.gz (18.3 kB)

Uploaded Source

Built Distribution


cembedding-0.5.0-py3-none-any.whl (20.8 kB)

Uploaded Python 3

File details

Details for the file cembedding-0.5.0.tar.gz.

File metadata

  • Download URL: cembedding-0.5.0.tar.gz
  • Size: 18.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.18

File hashes

Hashes for cembedding-0.5.0.tar.gz
Algorithm Hash digest
SHA256 c5393dddfbf36d1be9cc5e762477cc68658076c329251e0efcad39ec5836284b
MD5 6584b5a9ba0bc042c0f9b8e2c3542547
BLAKE2b-256 8c31e2b8d0df68013bf4abacfcea0190547cc4a2261907d0f93af26ecbc1c888


File details

Details for the file cembedding-0.5.0-py3-none-any.whl.

File metadata

  • Download URL: cembedding-0.5.0-py3-none-any.whl
  • Size: 20.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.11.18

File hashes

Hashes for cembedding-0.5.0-py3-none-any.whl
Algorithm Hash digest
SHA256 bef5fcb3efc196d437b7d56026439eb9add6acc4f8f22f85cf49d458c688ef3f
MD5 270a5491815503d59077325431f4d455
BLAKE2b-256 25161187bed6a927fe87976c914031dc4fc93a0b13545e4b5ac2358081f21ef4

