
Export HuggingFace ColBERT models to ONNX format for Rust inference

Project description

pylate-onnx-export

A CLI tool to export HuggingFace ColBERT models to ONNX format for fast Rust inference with Next-Plaid.

Installation

pip install "pylate-onnx-export @ git+https://github.com/lightonai/next-plaid.git#subdirectory=next-plaid-onnx/python"

Quick Start

Export a Model

# Export a ColBERT model to ONNX format
pylate-onnx-export lightonai/GTE-ModernColBERT-v1

# Export with INT8 quantization (recommended for production)
pylate-onnx-export lightonai/GTE-ModernColBERT-v1 --quantize

# Export to a custom directory
pylate-onnx-export lightonai/GTE-ModernColBERT-v1 -o ./my-models --quantize

Push to HuggingFace Hub

# Export and push to HuggingFace Hub
pylate-onnx-export lightonai/GTE-ModernColBERT-v1 --quantize --push-to-hub myorg/my-onnx-model

# Push as a private repository
pylate-onnx-export lightonai/GTE-ModernColBERT-v1 -q --push-to-hub myorg/my-onnx-model --private

Quantize an Existing Model

colbert-quantize ./models/GTE-ModernColBERT-v1

CLI Options

pylate-onnx-export

Usage: pylate-onnx-export [OPTIONS] MODEL_NAME

Arguments:
  MODEL_NAME  HuggingFace model name or local path

Options:
  -o, --output-dir DIR     Output directory (default: ./models)
  -q, --quantize           Create INT8 quantized model
  --push-to-hub REPO_ID    Push to HuggingFace Hub
  --private                Make Hub repository private
  --help                   Show help message

colbert-quantize

Usage: colbert-quantize [OPTIONS] MODEL_DIR

Arguments:
  MODEL_DIR  Directory containing model.onnx

Options:
  --help     Show help message

Output Structure

The tool creates a directory with the following files:

models/<model-name>/
├── model.onnx                        # FP32 ONNX model
├── model_int8.onnx                   # INT8 quantized (with --quantize)
├── tokenizer.json                    # Tokenizer configuration
└── config_sentence_transformers.json # Model metadata (embedding_dim, etc.)
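
After an export it can be useful to confirm that the directory above is complete before pointing Next-Plaid at it. A minimal sketch (the file names come from the Output Structure listing; `missing_files` is a hypothetical helper, not part of the package):

```python
from pathlib import Path

# Files the exporter always writes; model_int8.onnx additionally
# appears when --quantize is passed.
REQUIRED = ["model.onnx", "tokenizer.json", "config_sentence_transformers.json"]

def missing_files(model_dir):
    """Return the expected files that are absent from an export directory."""
    root = Path(model_dir)
    return [name for name in REQUIRED if not (root / name).is_file()]
```

For example, `missing_files("./models/GTE-ModernColBERT-v1")` returns `[]` when the export succeeded.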

Supported Models

Any PyLate-compatible ColBERT model from HuggingFace can be exported:

| Model | Embedding Dim | Description |
|---|---|---|
| lightonai/GTE-ModernColBERT-v1 | 128 | High-quality ColBERT model (recommended) |
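
The embedding dimension is recorded in the exported `config_sentence_transformers.json`. A sketch for reading it back, assuming the key is literally named `embedding_dim` (as the Output Structure comment suggests; adjust if your config differs):

```python
import json
from pathlib import Path

def embedding_dim(model_dir):
    """Read the embedding dimension from an exported model's metadata.

    Assumes a top-level "embedding_dim" key in
    config_sentence_transformers.json.
    """
    config_path = Path(model_dir, "config_sentence_transformers.json")
    config = json.loads(config_path.read_text())
    return config["embedding_dim"]
```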

Python API

from colbert_export import export_model, quantize_model

# Export a model
output_dir = export_model(
    model_name="lightonai/GTE-ModernColBERT-v1",
    output_dir="./models",
    quantize=True,
)

# Or quantize an existing model
quantize_model("./models/GTE-ModernColBERT-v1")
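
To sanity-check an exported graph you can run it directly with `onnxruntime`. The sketch below pads a batch of token ids into the int64 arrays transformer ONNX graphs typically expect; note that the exact input names of the exported model (`input_ids`, `attention_mask`) are an assumption here, not something this package documents:

```python
import numpy as np

def pad_batch(token_ids, pad_id=0):
    """Pad variable-length token-id lists into (input_ids, attention_mask)."""
    max_len = max(len(ids) for ids in token_ids)
    input_ids = np.full((len(token_ids), max_len), pad_id, dtype=np.int64)
    attention_mask = np.zeros((len(token_ids), max_len), dtype=np.int64)
    for i, ids in enumerate(token_ids):
        input_ids[i, : len(ids)] = ids
        attention_mask[i, : len(ids)] = 1
    return input_ids, attention_mask

# With an exported model on disk you could then run (requires onnxruntime;
# the input names "input_ids"/"attention_mask" are assumptions):
#
#   import onnxruntime as ort
#   session = ort.InferenceSession("./models/GTE-ModernColBERT-v1/model.onnx")
#   ids, mask = pad_batch([[101, 2023, 102], [101, 102]])
#   (embeddings,) = session.run(None, {"input_ids": ids, "attention_mask": mask})
```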

Usage with Next-Plaid API

After exporting a model, you can use it with the Next-Plaid API:

# Start API with the exported model
docker run -d \
  -p 8080:8080 \
  -v ~/.local/share/next-plaid:/data/indices \
  -v ./models:/models:ro \
  ghcr.io/lightonai/next-plaid-api:model \
  --model /models/GTE-ModernColBERT-v1

Or use a model from HuggingFace Hub (auto-downloaded):

docker run -d \
  -p 8080:8080 \
  -v ~/.local/share/next-plaid:/data/indices \
  -v next-plaid-models:/models \
  ghcr.io/lightonai/next-plaid-api:model \
  --model lightonai/GTE-ModernColBERT-v1-onnx

License

Apache-2.0

Download files

Download the file for your platform.

Source Distribution

pylate_onnx_export-0.1.0.tar.gz (11.8 kB)

Uploaded Source

Built Distribution


pylate_onnx_export-0.1.0-py3-none-any.whl (10.4 kB)

Uploaded Python 3

File details

Details for the file pylate_onnx_export-0.1.0.tar.gz.

File metadata

  • Download URL: pylate_onnx_export-0.1.0.tar.gz
  • Upload date:
  • Size: 11.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for pylate_onnx_export-0.1.0.tar.gz

| Algorithm | Hash digest |
|---|---|
| SHA256 | 7ee29fda700bdf40fc7c95fea0840b509e8e6c689adf31a16eb391f599eba479 |
| MD5 | 1b1faee20a4615564a2f991ec259b611 |
| BLAKE2b-256 | a30020c11ad7b3a0ef60f69937a5f360b6492feedd442bab78e696ae699f88cc |
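
To verify a downloaded archive against the published SHA256 digest before installing, you can stream the file through `hashlib` (a generic sketch; `sha256_hex` is a hypothetical helper name):

```python
import hashlib

def sha256_hex(path, chunk_size=1 << 20):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Compare sha256_hex("pylate_onnx_export-0.1.0.tar.gz") against the
# SHA256 value published in the table above.
```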


File details

Details for the file pylate_onnx_export-0.1.0-py3-none-any.whl.

File metadata

File hashes

Hashes for pylate_onnx_export-0.1.0-py3-none-any.whl

| Algorithm | Hash digest |
|---|---|
| SHA256 | 36d02b4470d719df43afc0ea31baf28cc2e2935d4844b693555820940a0d0dc7 |
| MD5 | 53e0a0ea5f15a141882c608efd5a95dc |
| BLAKE2b-256 | 0745b9b5ea0feab39b67566813c467356360817bc086419cc4ffbd70b2f606ac |

