Lightweight Qwen3 text embedding & reranking via ONNX Runtime (fork of fastembed)

qwen3-embed

Lightweight Qwen3 text embedding & reranking via ONNX Runtime. A trimmed fork of fastembed that keeps only the Qwen3 models.

Supported Models

Model                       Type       Dims           Max tokens  Size
Qwen/Qwen3-Embedding-0.6B   Embedding  32-1024 (MRL)  32768       0.57 GB
Qwen/Qwen3-Reranker-0.6B    Reranker   -              40960       0.57 GB

ONNX weights: n24q02m/Qwen3-Embedding-0.6B-ONNX, n24q02m/Qwen3-Reranker-0.6B-ONNX

Installation

pip install qwen3-embed

Usage

Text Embedding

from qwen3_embed import TextEmbedding

model = TextEmbedding(model_name="Qwen/Qwen3-Embedding-0.6B")

documents = [
    "Qwen3 is a multilingual embedding model.",
    "ONNX Runtime enables fast CPU inference.",
]

embeddings = list(model.embed(documents))
# Each embedding: numpy array of shape (1024,), L2-normalized

# Matryoshka Representation Learning (MRL) -- truncate to smaller dims
embeddings_256 = list(model.embed(documents, dim=256))
# Each embedding: numpy array of shape (256,), L2-normalized

# Query with instruction (for retrieval tasks)
queries = list(model.query_embed(
    ["What is Qwen3?"],
    task="Given a question, retrieve relevant passages",
))
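Because the embeddings are returned L2-normalized, cosine similarity reduces to a plain dot product. The sketch below ranks documents against a query that way and also shows the `Instruct: ...\nQuery:...` prompt layout that the upstream Qwen3-Embedding model card recommends for queries. Whether `query_embed(task=...)` builds exactly this string is an assumption; `format_query` and `rank_by_dot` are hypothetical helpers, and the toy 3-dim vectors stand in for real model output.

```python
import math


def format_query(task: str, query: str) -> str:
    # Assumed prompt layout, taken from the upstream Qwen3-Embedding model card;
    # qwen3_embed may build its query string differently.
    return f"Instruct: {task}\nQuery:{query}"


def l2_normalize(vec):
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def rank_by_dot(query_vec, doc_vecs):
    # For unit-length vectors, the dot product equals cosine similarity.
    scores = [sum(q * d for q, d in zip(query_vec, doc)) for doc in doc_vecs]
    # Document indices, best match first.
    order = sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)
    return order, scores


# Toy vectors standing in for real 1024-dim embeddings.
query_vec = l2_normalize([0.9, 0.1, 0.0])
doc_vecs = [l2_normalize([0.0, 1.0, 0.0]), l2_normalize([1.0, 0.2, 0.0])]

order, scores = rank_by_dot(query_vec, doc_vecs)
print(format_query("Given a question, retrieve relevant passages", "What is Qwen3?"))
print(order)  # [1, 0]
```

With the real output above (NumPy arrays, per the comments) and NumPy imported as `np`, the same scoring is `np.stack(embeddings) @ queries[0]`.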

Reranking

from qwen3_embed import TextCrossEncoder

reranker = TextCrossEncoder(model_name="Qwen/Qwen3-Reranker-0.6B")

query = "What is Qwen3?"
documents = [
    "Qwen3 is a series of large language models by Alibaba.",
    "The weather today is sunny.",
    "Qwen3-Embedding supports multilingual text embedding.",
]

scores = list(reranker.rerank(query, documents))
# scores: list of floats in [0, 1]; higher means more relevant

# Or rerank pairs directly
pairs = [
    ("What is AI?", "Artificial intelligence is a branch of computer science."),
    ("What is ML?", "Machine learning is a subset of AI."),
]
pair_scores = list(reranker.rerank_pairs(pairs))
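The [0, 1] range comes from comparing two next-token logits. A minimal sketch, assuming the reranker reads the logits of the "yes" and "no" tokens at the last position (as the upstream Qwen3-Reranker reference code does) and returns the softmax probability of "yes". `yes_no_score` is a hypothetical helper and the logit values are made up; this is not qwen3_embed's internal code.

```python
import math


def yes_no_score(yes_logit: float, no_logit: float) -> float:
    # Two-way softmax over the "yes" and "no" logits, returning P("yes").
    # Subtracting the max first keeps exp() from overflowing on large logits.
    m = max(yes_logit, no_logit)
    e_yes = math.exp(yes_logit - m)
    e_no = math.exp(no_logit - m)
    return e_yes / (e_yes + e_no)


print(yes_no_score(4.0, -2.0))  # strongly favors "yes", close to 1
print(yes_no_score(0.0, 0.0))   # undecided, 0.5
```

A softmax over two logits equals `sigmoid(yes_logit - no_logit)`, so only the logit difference matters, and the result always lands in [0, 1].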

Key Features

  • Last-token pooling: Uses the hidden state of the final token instead of mean pooling. Inputs are left-padded, so the final position always holds a real token rather than padding.
  • MRL support: Matryoshka Representation Learning lets you truncate embeddings to any dimension from 32 to 1024 (re-normalized after truncation) while retaining most of the quality of the full vector.
  • Instruction-aware: Query embedding accepts a task instruction, which improves retrieval performance.
  • Causal LM reranking: The reranker is a causal language model that scores each query-document pair from its "yes" and "no" token logits, producing a probability-style relevance score in [0, 1].
  • CPU-only, no PyTorch: Runs on ONNX Runtime, so no GPU or heavy ML framework is required.
  • Multilingual: Both models accept input in many languages.
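Last-token pooling and MRL truncation combine into one post-processing step. A minimal sketch, assuming the model's per-token hidden states and attention mask are already available as plain nested lists; `embed_from_hidden_states` is a hypothetical helper, not part of qwen3_embed. With left padding the last position always holds a real token, so pooling is a fixed index; MRL then keeps the leading `dim` values and re-normalizes to unit length.

```python
import math


def embed_from_hidden_states(hidden_states, attention_mask, dim=None):
    """Last-token pooling, optional MRL truncation, then L2 normalization.

    hidden_states: per sequence, a [seq_len][hidden_size] nested list.
    attention_mask: per sequence, a [seq_len] list of 0/1 (left-padded).
    dim: keep only the first `dim` components (MRL); None keeps all.
    """
    embeddings = []
    for states, mask in zip(hidden_states, attention_mask):
        # Left padding puts pad tokens at the start, so the last position
        # holds the final real token; reject right-padded input.
        if mask[-1] != 1:
            raise ValueError("expected left-padded input")
        vec = list(states[-1])  # last-token pooling
        if dim is not None:
            vec = vec[:dim]  # MRL: keep the leading `dim` values
        norm = math.sqrt(sum(x * x for x in vec))
        if norm == 0:
            raise ValueError("cannot normalize a zero vector")
        embeddings.append([x / norm for x in vec])
    return embeddings


# Toy batch: 2 sequences, 3 positions, hidden size 4. The first is left-padded.
hidden = [
    [[0, 0, 0, 0], [1, 1, 1, 1], [3, 4, 0, 0]],
    [[1, 0, 0, 0], [2, 0, 0, 0], [0, 6, 0, 8]],
]
mask = [[0, 1, 1], [1, 1, 1]]

print(embed_from_hidden_states(hidden, mask))         # full width
print(embed_from_hidden_states(hidden, mask, dim=2))  # truncated, re-normalized
```

Re-normalizing after truncation matters: the leading components of a unit vector are generally shorter than 1, and dot-product scoring assumes unit length.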

Development

mise run setup   # Install deps + pre-commit hooks
mise run lint    # ruff check + format --check
mise run test    # pytest
mise run fix     # ruff auto-fix + format

License

Apache-2.0. Original fastembed by Qdrant.


Download files

Source Distribution

qwen3_embed-0.2.0b0.tar.gz (74.4 kB)

Built Distribution

qwen3_embed-0.2.0b0-py3-none-any.whl (46.3 kB)

File details

Details for the file qwen3_embed-0.2.0b0.tar.gz.

File metadata

  • Download URL: qwen3_embed-0.2.0b0.tar.gz
  • Size: 74.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv 0.10.2 (uv publish, CI on Ubuntu 24.04)

File hashes

Hashes for qwen3_embed-0.2.0b0.tar.gz
Algorithm    Hash digest
SHA256       cb13c35745bf5542eb73c01230471514319777e260c44f16692cabfd52eeb0f0
MD5          96c13c0869fdf9c029c85f6f12f3780b
BLAKE2b-256  3256ebbcd18a7e4a8d05a4b18dcf0e3525c95b74ecdca9d14bcb1bd00262c9b1

File details

Details for the file qwen3_embed-0.2.0b0-py3-none-any.whl.

File metadata

  • Download URL: qwen3_embed-0.2.0b0-py3-none-any.whl
  • Size: 46.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv 0.10.2 (uv publish, CI on Ubuntu 24.04)

File hashes

Hashes for qwen3_embed-0.2.0b0-py3-none-any.whl
Algorithm    Hash digest
SHA256       2eae7c51c3d85a6f4fb488be90b18d9b209ad6b9a018dfc220047287fb665a06
MD5          24cd0c4f051ba82d7a53c71012c6d4b0
BLAKE2b-256  bd014aeaee6c835b5e12ce5b45c5122aeaec0d5e5d6e8dba9228ba620e7ec4a3
