qwen3-embed
Lightweight Qwen3 text embedding & reranking via ONNX Runtime. Trimmed fork of fastembed, keeping only Qwen3 models.
Supported Models
ONNX (default)
| Model | Type | Dims | Max Tokens | Size |
|---|---|---|---|---|
| Qwen/Qwen3-Embedding-0.6B | Embedding | 32-1024 (MRL) | 32768 | 573 MB |
| Qwen/Qwen3-Embedding-0.6B-Q4F16 | Embedding | 32-1024 (MRL) | 32768 | 517 MB |
| Qwen/Qwen3-Reranker-0.6B | Reranker | - | 40960 | 573 MB |
| Qwen/Qwen3-Reranker-0.6B-Q4F16 | Reranker | - | 40960 | 518 MB |
GGUF (optional, requires llama-cpp-python)
| Model | Type | Dims | Max Tokens | Size |
|---|---|---|---|---|
| Qwen/Qwen3-Embedding-0.6B-GGUF | Embedding | 32-1024 (MRL) | 32768 | 378 MB |
| Qwen/Qwen3-Reranker-0.6B-GGUF | Reranker | - | 40960 | 378 MB |
HuggingFace Repos
| Format | Embedding | Reranker |
|---|---|---|
| ONNX | n24q02m/Qwen3-Embedding-0.6B-ONNX | n24q02m/Qwen3-Reranker-0.6B-ONNX |
| GGUF | n24q02m/Qwen3-Embedding-0.6B-GGUF | n24q02m/Qwen3-Reranker-0.6B-GGUF |
Installation
```bash
pip install qwen3-embed
# For GGUF support (quotes stop shells such as zsh from expanding the brackets)
pip install "qwen3-embed[gguf]"
```
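If you want code that uses the GGUF model only when the optional extra is installed, a check like the one below can help. It is a hypothetical helper, not part of qwen3-embed, and it assumes llama-cpp-python is importable as `llama_cpp` (that package's module name).

```python
import importlib.util


def pick_embedding_model() -> str:
    """Hypothetical helper: prefer GGUF if llama-cpp-python is installed, else ONNX."""
    # find_spec returns None when the module cannot be imported
    if importlib.util.find_spec("llama_cpp") is not None:
        return "Qwen/Qwen3-Embedding-0.6B-GGUF"
    return "Qwen/Qwen3-Embedding-0.6B"


print(pick_embedding_model())
```

The returned string can be passed straight to `TextEmbedding(model_name=...)`.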
Usage
Text Embedding
```python
from qwen3_embed import TextEmbedding

# Pick one backend:
# INT8 ONNX (default)
model = TextEmbedding(model_name="Qwen/Qwen3-Embedding-0.6B")
# Q4F16 ONNX (smaller, slightly less accurate)
# model = TextEmbedding(model_name="Qwen/Qwen3-Embedding-0.6B-Q4F16")
# GGUF (requires: pip install "qwen3-embed[gguf]")
# model = TextEmbedding(model_name="Qwen/Qwen3-Embedding-0.6B-GGUF")

documents = [
    "Qwen3 is a multilingual embedding model.",
    "ONNX Runtime enables fast CPU inference.",
]
embeddings = list(model.embed(documents))
# Each embedding: numpy array of shape (1024,), L2-normalized

# Matryoshka Representation Learning (MRL): truncate to fewer dims
embeddings_256 = list(model.embed(documents, dim=256))
# Each embedding: numpy array of shape (256,), L2-normalized

# Query with a task instruction (for retrieval tasks)
queries = list(model.query_embed(
    ["What is Qwen3?"],
    task="Given a question, retrieve relevant passages",
))
```
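The `dim` argument relies on the MRL property that a prefix of the full vector is itself a usable embedding. The NumPy sketch below illustrates that idea on random stand-in vectors: keep the first `dim` components, then rescale to unit length. Treating `dim=256` as equivalent to this slice-and-renormalize step is an assumption about the library's internals, and `truncate_mrl` and `cosine_rank` are hypothetical helpers. Because the vectors are L2-normalized, a plain dot product gives the cosine similarity.

```python
import numpy as np


def truncate_mrl(embedding: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` components and rescale to unit L2 norm."""
    full_dim = embedding.shape[-1]
    if not 32 <= dim <= full_dim:
        raise ValueError(f"dim must be in [32, {full_dim}], got {dim}")
    head = embedding[..., :dim]
    return head / np.linalg.norm(head, axis=-1, keepdims=True)


def cosine_rank(query: np.ndarray, docs: np.ndarray) -> np.ndarray:
    """Indices of `docs` sorted by cosine similarity to `query`, best first.

    Both inputs must already be L2-normalized, so the dot product is the cosine.
    """
    return np.argsort(-(docs @ query))


# Random unit vectors stand in for real 1024-d embeddings.
rng = np.random.default_rng(0)
full = rng.normal(size=(3, 1024))
full /= np.linalg.norm(full, axis=1, keepdims=True)

small = truncate_mrl(full, 256)
print(small.shape)  # (3, 256)
print(cosine_rank(small[0], small))  # index 0 comes first: a unit vector's cosine with itself is 1
```

With the objects from the usage example, `cosine_rank(queries[0], np.stack(embeddings))` would order `documents` by similarity to the query, assuming the query and document vectors share the same dimension.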
Reranking
```python
from qwen3_embed import TextCrossEncoder

reranker = TextCrossEncoder(model_name="Qwen/Qwen3-Reranker-0.6B")

query = "What is Qwen3?"
documents = [
    "Qwen3 is a series of large language models by Alibaba.",
    "The weather today is sunny.",
    "Qwen3-Embedding supports multilingual text embedding.",
]
scores = list(reranker.rerank(query, documents))
# scores: list of floats in [0, 1]; higher = more relevant

# Or rerank (query, document) pairs directly
pairs = [
    ("What is AI?", "Artificial intelligence is a branch of computer science."),
    ("What is ML?", "Machine learning is a subset of AI."),
]
pair_scores = list(reranker.rerank_pairs(pairs))
```
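The [0, 1] range of these scores follows from the yes/no logit scoring listed under Key Features, which can be illustrated without loading a model. This sketch assumes the score is the probability of "yes" from a two-way softmax over the "yes" and "no" logits, as described in the upstream Qwen3-Reranker model card; the logits are made-up numbers, and `yes_no_score` and `rank_documents` are hypothetical helpers. A two-way softmax reduces to a sigmoid of the logit margin, so every score lands in [0, 1].

```python
import numpy as np


def yes_no_score(yes_logit: np.ndarray, no_logit: np.ndarray) -> np.ndarray:
    """P("yes") from a softmax over the two logits [no, yes].

    softmax([no, yes])[1] == 1 / (1 + exp(no - yes)), a sigmoid of the margin.
    """
    return 1.0 / (1.0 + np.exp(no_logit - yes_logit))


def rank_documents(documents, scores):
    """Pair each document with its score, most relevant first."""
    return sorted(zip(documents, scores), key=lambda pair: pair[1], reverse=True)


# Made-up logits for three documents (not real model output).
yes = np.array([4.0, -3.0, 2.5])
no = np.array([-1.0, 2.0, 0.0])
scores = yes_no_score(yes, no)

docs = ["doc A", "doc B", "doc C"]
for doc, score in rank_documents(docs, scores):
    print(f"{score:.3f}  {doc}")
```

Assuming `rerank` yields one score per document in input order, `rank_documents(documents, scores)` would sort the reranking example's documents from most to least relevant.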
Key Features
- Last-token pooling: Uses the final token representation (with left-padding) instead of mean pooling.
- MRL support: Matryoshka Representation Learning allows truncating embeddings to any dimension from 32 to 1024 while retaining most of the full-size retrieval quality.
- Instruction-aware: Query embedding supports task instructions for better retrieval performance.
- Causal LM reranking: The reranker scores each query-document pair with a causal language model by comparing the logits of the "yes" and "no" tokens, producing a relevance probability in [0, 1].
- Multiple backends: ONNX Runtime (INT8, Q4F16) and GGUF (Q4_K_M via llama-cpp-python).
- CPU-only, no PyTorch: Runs on ONNX Runtime (or llama-cpp-python for GGUF); no GPU or heavy ML framework required.
- Multilingual: Both models support multi-language inputs.
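Last-token pooling is easy to state in array terms. The NumPy sketch below is modeled on the `last_token_pool` function published with the upstream Qwen3-Embedding model card: with left padding every sequence ends at the final position, otherwise the last real token is located from the attention mask. The function name and array shapes are illustrative assumptions; qwen3-embed's own ONNX post-processing may differ in detail.

```python
import numpy as np


def last_token_pool(hidden: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Return each sequence's final non-padding token state, L2-normalized.

    hidden: (batch, seq_len, dim) token states
    mask:   (batch, seq_len) attention mask, 1 for real tokens and 0 for padding
    """
    if np.all(mask[:, -1] == 1):
        # Left padding: every sequence ends at the last position.
        pooled = hidden[:, -1]
    else:
        # Right padding: the last real token is at index (real-token count - 1).
        last = mask.sum(axis=1) - 1
        pooled = hidden[np.arange(hidden.shape[0]), last]
    return pooled / np.linalg.norm(pooled, axis=-1, keepdims=True)


# Toy batch: 2 sequences, 3 positions, 2-d hidden states.
hidden = np.arange(12, dtype=float).reshape(2, 3, 2)
left_mask = np.array([[0, 1, 1], [1, 1, 1]])   # first sequence left-padded
right_mask = np.array([[1, 1, 0], [1, 1, 1]])  # first sequence right-padded
print(last_token_pool(hidden, left_mask))
print(last_token_pool(hidden, right_mask))
```

With left padding the common case is a single slice, which is one reason to prefer it for last-token pooling.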
Development
```bash
mise run setup  # Install deps + pre-commit hooks
mise run lint   # ruff check + format --check
mise run test   # pytest
mise run fix    # ruff auto-fix + format
```
License
Apache-2.0. Original fastembed by Qdrant.
Download files
Source Distribution
qwen3_embed-0.2.1.tar.gz (82.5 kB)
Built Distribution
qwen3_embed-0.2.1-py3-none-any.whl (52.2 kB)
File details
Details for the file qwen3_embed-0.2.1.tar.gz.
File metadata
- Download URL: qwen3_embed-0.2.1.tar.gz
- Upload date:
- Size: 82.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.10.2 (publish, CI, Ubuntu 24.04)
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 47e7c7c55131959d5b41a155a3367a95859e7d2913ff2f6d6282852036798553 |
| MD5 | b7f1a57fa7c1481c7cb4dbda73fb85e4 |
| BLAKE2b-256 | d84124f911df25a1a6c4d5491b7aca1a29796426ae76301a106a37d50f67fd30 |
File details
Details for the file qwen3_embed-0.2.1-py3-none-any.whl.
File metadata
- Download URL: qwen3_embed-0.2.1-py3-none-any.whl
- Upload date:
- Size: 52.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.10.2 (publish, CI, Ubuntu 24.04)
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | dc115bdd2eb0186bc9e1deabe56da351fe8fb311a43c6665d8a6028f6dea12a3 |
| MD5 | e7013f309ccf05f1a54d9286232b9ed8 |
| BLAKE2b-256 | a408cdac954516f61518e6bb516c42098854d33c3628d3446a91bfd924f2d71b |
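To check a downloaded file against the SHA256 digests listed above, Python's standard `hashlib` is enough. This is a minimal sketch: `sha256_of`, `verify`, and `EXPECTED` are illustrative names, and the files are assumed to sit in the current directory.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file details above
EXPECTED = {
    "qwen3_embed-0.2.1.tar.gz": "47e7c7c55131959d5b41a155a3367a95859e7d2913ff2f6d6282852036798553",
    "qwen3_embed-0.2.1-py3-none-any.whl": "dc115bdd2eb0186bc9e1deabe56da351fe8fb311a43c6665d8a6028f6dea12a3",
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str) -> bool:
    """True if the file's SHA256 matches the expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected.lower()


for name, digest in EXPECTED.items():
    path = Path(name)
    if path.exists():
        print(name, "OK" if verify(path, digest) else "MISMATCH")
    else:
        print(name, "not found in current directory")
```

pip can also enforce digests at install time through its hash-checking mode (`--hash=sha256:...` entries in a requirements file plus `--require-hashes`), though that mode requires hashes for every dependency as well.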