# pylate-onnx-export

A CLI tool to export HuggingFace ColBERT models to ONNX format for fast Rust inference with Next-Plaid.
## Installation

```shell
pip install "pylate-onnx-export @ git+https://github.com/lightonai/next-plaid.git#subdirectory=next-plaid-onnx/python"
```
## Quick Start

### Export a Model

```shell
# Export a ColBERT model to ONNX format
pylate-onnx-export lightonai/GTE-ModernColBERT-v1

# Export with INT8 quantization (recommended for production)
pylate-onnx-export lightonai/GTE-ModernColBERT-v1 --quantize

# Export to a custom directory
pylate-onnx-export lightonai/GTE-ModernColBERT-v1 -o ./my-models --quantize
```
### Push to HuggingFace Hub

```shell
# Export and push to HuggingFace Hub
pylate-onnx-export lightonai/GTE-ModernColBERT-v1 --quantize --push-to-hub myorg/my-onnx-model

# Push as a private repository
pylate-onnx-export lightonai/GTE-ModernColBERT-v1 -q --push-to-hub myorg/my-onnx-model --private
```
### Quantize an Existing Model

```shell
colbert-quantize ./models/GTE-ModernColBERT-v1
```
## CLI Options

### pylate-onnx-export

```
Usage: pylate-onnx-export [OPTIONS] MODEL_NAME

Arguments:
  MODEL_NAME             HuggingFace model name or local path

Options:
  -o, --output-dir DIR   Output directory (default: ./models)
  -q, --quantize         Create INT8 quantized model
  --push-to-hub REPO_ID  Push to HuggingFace Hub
  --private              Make Hub repository private
  --help                 Show help message
```
### colbert-quantize

```
Usage: colbert-quantize [OPTIONS] MODEL_DIR

Arguments:
  MODEL_DIR  Directory containing model.onnx

Options:
  --help     Show help message
```
## Output Structure

The tool creates a directory with the following files:

```
models/<model-name>/
├── model.onnx                          # FP32 ONNX model
├── model_int8.onnx                     # INT8 quantized (with --quantize)
├── tokenizer.json                      # Tokenizer configuration
└── config_sentence_transformers.json   # Model metadata (embedding_dim, etc.)
```
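The metadata file can be read back to configure downstream inference. A minimal sketch, assuming a JSON file with an `embedding_dim` key as described above (the `read_model_metadata` helper is hypothetical, not part of this package):

```python
import json
import tempfile
from pathlib import Path


def read_model_metadata(model_dir):
    """Read exported model metadata (e.g. embedding_dim) from the config file."""
    config_path = Path(model_dir) / "config_sentence_transformers.json"
    with config_path.open() as f:
        return json.load(f)


# Demo on a stand-in config file; a real export is produced by pylate-onnx-export
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "config_sentence_transformers.json").write_text(
    json.dumps({"embedding_dim": 128})
)
meta = read_model_metadata(demo_dir)
print(meta["embedding_dim"])  # 128
```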
## Supported Models

Any PyLate-compatible ColBERT model from HuggingFace can be exported:

| Model | Embedding Dim | Description |
|---|---|---|
| lightonai/GTE-ModernColBERT-v1 | 128 | High-quality ColBERT model (recommended) |
## Python API

```python
from colbert_export import export_model, quantize_model

# Export a model
output_dir = export_model(
    model_name="lightonai/GTE-ModernColBERT-v1",
    output_dir="./models",
    quantize=True,
)

# Or quantize an existing model
quantize_model("./models/GTE-ModernColBERT-v1")
```
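The exported model produces one embedding vector per token, and retrieval engines score documents with ColBERT's MaxSim late interaction: each query token takes its maximum similarity over the document's token vectors, and those maxima are summed. An illustrative NumPy sketch of that scoring (not Next-Plaid's actual Rust implementation; shapes and data here are synthetic):

```python
import numpy as np


def maxsim(query_emb, doc_emb):
    """ColBERT late-interaction score: for each query token vector, take its
    max cosine similarity over the document's token vectors, then sum."""
    # Normalize rows so dot products are cosine similarities
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    d = doc_emb / np.linalg.norm(doc_emb, axis=1, keepdims=True)
    sim = q @ d.T  # (num_query_tokens, num_doc_tokens)
    return sim.max(axis=1).sum()


rng = np.random.default_rng(0)
query = rng.normal(size=(32, 128))   # 32 query tokens, 128-dim embeddings
doc = rng.normal(size=(180, 128))    # 180 document tokens
score = maxsim(query, doc)
print(float(score))
```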
## Usage with Next-Plaid API

After exporting a model, you can use it with the Next-Plaid API:

```shell
# Start the API with the exported model
docker run -d \
  -p 8080:8080 \
  -v ~/.local/share/next-plaid:/data/indices \
  -v ./models:/models:ro \
  ghcr.io/lightonai/next-plaid-api:model \
  --model /models/GTE-ModernColBERT-v1
```

Or use a model from HuggingFace Hub (auto-downloaded):

```shell
docker run -d \
  -p 8080:8080 \
  -v ~/.local/share/next-plaid:/data/indices \
  -v next-plaid-models:/models \
  ghcr.io/lightonai/next-plaid-api:model \
  --model lightonai/GTE-ModernColBERT-v1-onnx
```
## License

Apache-2.0