Flouds Model Exporter: ONNX export and optimization utilities
Project description
flouds_model_exporter
Production-grade ONNX model export toolkit for HuggingFace transformers.
Overview
flouds_model_exporter provides a unified pipeline for converting HuggingFace models to optimized ONNX format:
- Universal Export – Supports embedding models, seq2seq, classification, and large language models (LLMs)
- Smart Optimization – Automatic ONNX optimization with configurable levels and portability modes
- Robust Validation – Numeric verification ensuring export accuracy before deployment
- Large Model Support – External-data format, subprocess isolation, and memory management for multi-GB models
- Batch Orchestration – Python-native batch subcommand with YAML-driven presets for automated multi-model export workflows
- Fallback Strategies – Automatic opset retry, trust_remote_code handling, and error recovery
Quick Start
Installation
From PyPI (recommended)
pip install flouds-model-exporter
From source
# Clone the repository
git clone https://github.com/gmalakar/flouds_model_exporter.git
cd flouds_model_exporter
# Create a Python 3.11 or 3.12 virtual environment
py -3.11 -m venv .venv
.\.venv\Scripts\Activate.ps1
# Install the package and all dependencies
pip install -e .
# (Optional) Install developer tooling
pip install -e ".[dev]"
After installation the CLI entry point is available:
flouds-export export --help
Environment Variables
Use environment variables to control default output location and Hugging Face authentication.
ONNX_PATH
ONNX_PATH sets the default ONNX output root used by export workflows.
Windows PowerShell (current session):
$Env:ONNX_PATH = "C:\path\to\onnx\models"
Linux/macOS (current shell):
export ONNX_PATH="/path/to/onnx/models"
HUGGINGFACE_TOKEN
HUGGINGFACE_TOKEN provides an access token for private/gated Hugging Face model downloads.
Windows PowerShell (current session):
$Env:HUGGINGFACE_TOKEN = "hf_xxx_your_token"
Linux/macOS (current shell):
export HUGGINGFACE_TOKEN="hf_xxx_your_token"
You can also pass a token directly per command with --hf-token.
Persisting Variables
Windows (future terminals):
setx ONNX_PATH "C:\path\to\onnx\models"
setx HUGGINGFACE_TOKEN "hf_xxx_your_token"
Linux/macOS (bash/zsh profile):
echo 'export ONNX_PATH="/path/to/onnx/models"' >> ~/.bashrc
echo 'export HUGGINGFACE_TOKEN="hf_xxx_your_token"' >> ~/.bashrc
Verify Values
Windows PowerShell:
echo $Env:ONNX_PATH
echo $Env:HUGGINGFACE_TOKEN
Linux/macOS:
echo "$ONNX_PATH"
echo "$HUGGINGFACE_TOKEN"
Security note: never commit real tokens to source control. Rotate any exposed token immediately.
Export a Model
Embedding model (Feature Extraction):
flouds-export export `
--model-name sentence-transformers/all-MiniLM-L6-v2 `
--model-for fe `
--task feature-extraction `
--optimize
Seq2seq model (T5, BART):
flouds-export export `
--model-name t5-small `
--model-for s2s `
--task seq2seq-lm `
--optimize
Ranker model (Cross-Encoder):
flouds-export export `
--model-name cross-encoder/ms-marco-MiniLM-L-12-v2 `
--model-for ranker `
--task sequence-classification `
--optimize
Large Language Model (with KV-cache):
flouds-export export `
--model-name deepseek-ai/deepseek-coder-1.3b-instruct `
--model-for llm `
--task text-generation-with-past `
--use-external-data-format `
--use-sub-process `
--use-fallback-if-failed `
--optimize `
--merge
Batch Export
Export all configured models with optimizations:
flouds-export batch --preset recommended --optimize --cleanup --portable
Wrapper script reference: see docs/WRAPPER_SCRIPTS.md for complete parameter documentation for run_exports.ps1 and run_exports.sh.
Windows users can still use .\run_exports.ps1, which forwards to the Python CLI batch subcommand.
Batch presets are loaded from src/model_exporter/config/policy.yaml, and you can point to a custom YAML file with --config.
Linux/macOS users can use ./run_exports.sh with the same batch concepts:
chmod +x ./run_exports.sh
./run_exports.sh --config ./docs/batch_presets_example.yaml --preset text-import --fail-fast
Note: the --suppress-warning wrapper/CLI option has been removed. To control logging behavior use --log-to-file (or -LogToFile for the PowerShell wrapper) which requests per-export log files and tee'ing of stdout/stderr into the logfile. By default the exporter writes logs to file unless overridden.
Batch Examples (YAML and Text File)
YAML preset example file:
docs/batch_presets_example.yaml
Run using YAML preset:
.\run_exports.ps1 -Config .\docs\batch_presets_example.yaml -Preset text-import -FailFast
Text command list example file:
docs/batch_commands.txt
Run using text file import:
.\run_exports.ps1 -TextFile .\docs\batch_commands.txt -Preset text-import -FailFast
Note: text file entries must use the new hyphenated CLI flags (for example --opset-version, not --opset_version).
Validate An Export
Validate an exported ONNX model against its reference Hugging Face model:
flouds-export validate --model-dir onnx/models/fe/all-MiniLM-L6-v2 --reference-model sentence-transformers/all-MiniLM-L6-v2 --normalize-embeddings
Optimize Existing Exported Models
Run the shared optimizer service against an already-exported ONNX directory:
flouds-export optimize --model-dir onnx/models/fe/all-MiniLM-L6-v2 --model-for fe --optimization-level 2 --portable
Python API
After installing the package you can call the exporter directly from Python without using the CLI.
Basic usage
If ONNX_PATH is set, you can omit onnx_path and the exporter will use it automatically:
import os
os.environ["ONNX_PATH"] = "/path/to/onnx/models" # or set it before launching Python
from model_exporter.export.pipeline import export
output_dir = export(
model_name="sentence-transformers/all-MiniLM-L6-v2",
model_for="fe",
task="feature-extraction",
optimize=True,
# onnx_path not needed — picked up from ONNX_PATH env var
)
print(f"Exported to: {output_dir}")
Or pass onnx_path explicitly to override the environment variable:
output_dir = export(
model_name="sentence-transformers/all-MiniLM-L6-v2",
model_for="fe",
task="feature-extraction",
onnx_path="./custom/onnx", # overrides ONNX_PATH
optimize=True,
)
print(f"Exported to: {output_dir}")
Seq2seq (T5, BART)
export(
model_name="t5-small",
model_for="s2s",
task="seq2seq-lm",
optimize=True,
)
Large model with subprocess isolation
export(
model_name="meta-llama/Llama-2-7b-hf",
model_for="llm",
task="text-generation-with-past",
use_external_data_format=True,
use_subprocess=True,
use_fallback_if_failed=True,
merge=True,
hf_token="hf_xxx_your_token", # for gated models
)
API reference
| Parameter | Type | Default | Description |
|---|---|---|---|
model_name |
str |
required | HuggingFace model ID or local path |
model_for |
str |
"fe" |
fe, s2s, sc, llm, ranker |
task |
str |
None |
e.g. feature-extraction, seq2seq-lm, sequence-classification |
onnx_path |
str |
"onnx" |
Output directory |
optimize |
bool |
False |
Run ONNX optimizer after export |
optimization_level |
int |
99 |
ORT optimization level (0–99) |
opset_version |
int |
auto | ONNX opset version |
device |
str |
"cpu" |
cpu or cuda |
framework |
str |
None |
pt or tf |
trust_remote_code |
bool |
False |
Allow custom model code |
use_external_data_format |
bool |
False |
Split model for >2GB exports |
use_subprocess |
bool |
None |
Run export in isolated subprocess |
use_fallback_if_failed |
bool |
False |
Enable legacy fallback only if primary export fails |
merge |
bool |
False |
Merge decoder artifacts (LLMs) |
pack_single_file |
bool |
False |
Repack external-data into single file |
normalize_embeddings |
bool |
False |
L2-normalize before validation |
skip_validator |
bool |
False |
Skip numeric validation |
require_validator |
bool |
False |
Fail if validation cannot run |
quantize |
any | False |
Quantization configuration |
hf_token |
str |
None |
HuggingFace auth token (via **kwargs) |
CLI Reference
Core Parameters
| Parameter | Values | Description |
|---|---|---|
--model-name |
str |
HuggingFace model ID or local path |
--model-for |
fe, s2s, sc, ranker, llm |
Model type: embedding, seq2seq, classification, ranker (cross-encoder), or language model |
--task |
str |
Export task: feature-extraction, seq2seq-lm, sequence-classification, text-generation-with-past, etc. |
--framework |
pt, tf |
Framework: PyTorch or TensorFlow |
--device |
cpu, cuda |
Target device |
--opset-version |
11, 14, 17, 18 |
ONNX opset version (default: 17) |
--trust-remote-code |
flag | Allow custom model code execution |
Export Configuration
| Parameter | Default | Description |
|---|---|---|
--framework |
pt |
Framework: pt (PyTorch) or tf (TensorFlow) |
--device |
cpu |
Target device: cpu or cuda |
--opset-version |
17 |
ONNX opset version (11, 14, or 17) |
--trust-remote-code |
false |
Allow custom model code execution (⚠ security risk) |
--force |
false |
Overwrite existing exports |
Optimization & Validation
| Parameter | Description |
|---|---|
--optimize |
Enable post-export ONNX optimization |
--optimization-level |
Optimization level: 0-99 (default: 99) |
--portable |
Use conservative optimizations for cross-platform compatibility |
--skip-validator |
Skip numeric validation |
--require-validator |
Fail build if validation fails |
--normalize-embeddings |
L2-normalize embeddings during validation |
The standalone optimize subcommand accepts --model-dir, --model-for, --optimization-level, and --portable so you can re-run optimization without repeating export.
Large Model Options
| Parameter | Description |
|---|---|
--use-external-data-format |
Split model into .onnx + .onnx_data files (for >2GB models) |
--use-sub-process |
Run export in isolated subprocess (safer for large models) |
--use-fallback-if-failed |
Enable legacy fallback exporter only if primary export fails |
--no-post-process |
Skip ONNX post-processing (reduces memory usage) |
--pack-single-file |
Repack external-data model into single file during validation |
--pack-single-threshold-mb |
Size threshold for repacking (default: 1536 MB) |
Advanced Options
| Parameter | Description |
|---|---|
--merge |
Merge decoder artifacts for LLMs (with-past only) |
--no-local-prep |
Skip local model preparation for LLMs |
--cleanup |
Remove temporary/extraneous files post-export |
--prune-canonical |
Remove canonical models when merged version exists |
--hf-token |
HuggingFace API token for private models |
--onnx-path |
Custom output directory (default: ./onnx) |
Output Structure
Exported models are organized by type and name:
onnx/models/
├── fe/ # Feature extraction (embeddings)
│ ├── all-MiniLM-L6-v2/
│ │ └── model.onnx
│ └── bge-small-en-v1.5/
│ ├── model.onnx
│ └── model.onnx_data # External data (if >2GB)
├── s2s/ # Seq2seq models
│ ├── t5-small/
│ │ ├── encoder_model.onnx
│ │ ├── decoder_model.onnx
│ │ └── decoder_with_past_model.onnx
│ └── bart-large-cnn/
└── llm/ # Large language models
├── deepseek-coder-1.3b-instruct/
│ ├── model.onnx
│ ├── model.onnx_data
│ └── model_merged.onnx # Merged version (if --merge used)
└── phi-3-mini-4k-instruct/
Architecture
Directory Structure
src/model_exporter/ (also available via `src/model_exporter/` alias)
├── cli/ # CLI entrypoints and subcommands
├── config/ # Logging and batch policy
├── export/ # Export pipeline, helpers, optimizer, subprocess runner
├── utils/ # Diagnostics and helper utilities
└── validation/ # Structural and numeric validation
Export Pipeline
- Preparation – Token setup, model validation, output directory creation
- Export –
optimum.exporters.onnx.main_exportwith fallback strategies - Validation – Structural checks + numeric validation (input/output comparison)
- Optimization – ONNX Runtime optimization passes (optional)
- Cleanup – Remove temporary files, prune redundant artifacts
Memory Management
Subprocess Isolation
For large models, use subprocess isolation to prevent parent process crashes:
flouds-export export `
--model-name meta-llama/Llama-2-7b-hf `
--use-sub-process `
--use-fallback-if-failed `
--use-external-data-format
Batch Export Memory Monitoring
The batch subcommand monitors available RAM before each export:
# Require at least 4GB free RAM before each export
flouds-export batch --preset recommended --min-free-memory-gb 4
Config-Driven Batch Workflow
The batch runner loads presets from YAML:
flouds-export batch --config src/model_exporter/config/policy.yaml --preset recommended
Each preset entry maps directly to export CLI arguments, which makes export pipelines deterministic and versionable.
Large Model Best Practices
For models >2GB:
- Enable external data format – Splits model into .onnx + .onnx_data
- Use subprocess isolation – Prevents memory leaks affecting subsequent exports
- Skip post-processing – Reduces peak memory during export
- Lower opset version – Simplifies optimization (try opset 11)
flouds-export export `
--model-name gpt2-large `
--use-external-data-format `
--use-sub-process `
--use-fallback-if-failed `
--no-post-process `
--opset-version 11
Troubleshooting
| Issue | Solution |
|---|---|
ModuleNotFoundError: optimum |
Install runtime dependencies: pip install -r requirements-prod.txt |
MemoryError or OOM crashes |
Use --use-sub-process and --use-external-data-format; reduce --optimization-level |
| Primary export fails on edge models | Retry with --use-fallback-if-failed to enable legacy fallback path |
RuntimeError: > 2GiB protobuf |
Enable --use-external-data-format |
ValueError: Unsupported opset |
Lower --opset-version to 14 or 11 |
TracerWarning: Converting tensor |
Model tracing limitation (usually safe to ignore) |
| Validation failures | Check numeric precision; try --skip-validator for known issues |
trust_remote_code required |
Add --trust-remote-code flag (review model code first) |
Export Logs
Logs are saved to logs/onnx_exports/ with per-model timestamped files. Configure log directory via FLOUDS_LOG_DIR environment variable.
Requirements
- Python: 3.11 or 3.12
- System: 8GB+ RAM (16GB+ for large models)
- Dependencies: See
requirements-prod.txt(runtime) andrequirements-dev.txt(development)
Contributing
See CONTRIBUTING.md for contribution workflow and local development checks.
For expected behavior and standards, see CODE_OF_CONDUCT.md and SECURITY.md.
Maintainer release steps are documented in docs/RELEASE_PROCESS.md.
License
Licensed under the Apache License, Version 2.0 (Apache-2.0). See LICENSE for details.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file flouds_model_exporter-0.1.0.tar.gz.
File metadata
- Download URL: flouds_model_exporter-0.1.0.tar.gz
- Upload date:
- Size: 87.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
9cf0f54d901c0f61ba5fc028b1b48053a70f0d9038eb2e27a5b9b443b2d5fda4
|
|
| MD5 |
879bc028cee9b8ab2085664826c950ac
|
|
| BLAKE2b-256 |
0c91e4dcc18ba06892576e8a34f27a5bc0c6b79314e02b3140ca02584e3e8fa7
|
Provenance
The following attestation bundles were made for flouds_model_exporter-0.1.0.tar.gz:
Publisher:
publish-pypi.yml on gmalakar/flouds_model_exporter
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
flouds_model_exporter-0.1.0.tar.gz -
Subject digest:
9cf0f54d901c0f61ba5fc028b1b48053a70f0d9038eb2e27a5b9b443b2d5fda4 - Sigstore transparency entry: 1340068455
- Sigstore integration time:
-
Permalink:
gmalakar/flouds_model_exporter@df949ee08604066d156bae0ef1dc9ba759fe489c -
Branch / Tag:
refs/tags/v0.1.1 - Owner: https://github.com/gmalakar
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish-pypi.yml@df949ee08604066d156bae0ef1dc9ba759fe489c -
Trigger Event:
push
-
Statement type:
File details
Details for the file flouds_model_exporter-0.1.0-py3-none-any.whl.
File metadata
- Download URL: flouds_model_exporter-0.1.0-py3-none-any.whl
- Upload date:
- Size: 101.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
c7892a80e56c80163d354145f22c2ee60369fb5554bfef8677ee3d852978641e
|
|
| MD5 |
5dbc205046343c8532afec72b9c12a4f
|
|
| BLAKE2b-256 |
0494d01ea5c7341327d660c6f83acc38321730de48e32527456c272bebc87f17
|
Provenance
The following attestation bundles were made for flouds_model_exporter-0.1.0-py3-none-any.whl:
Publisher:
publish-pypi.yml on gmalakar/flouds_model_exporter
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
flouds_model_exporter-0.1.0-py3-none-any.whl -
Subject digest:
c7892a80e56c80163d354145f22c2ee60369fb5554bfef8677ee3d852978641e - Sigstore transparency entry: 1340068463
- Sigstore integration time:
-
Permalink:
gmalakar/flouds_model_exporter@df949ee08604066d156bae0ef1dc9ba759fe489c -
Branch / Tag:
refs/tags/v0.1.1 - Owner: https://github.com/gmalakar
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish-pypi.yml@df949ee08604066d156bae0ef1dc9ba759fe489c -
Trigger Event:
push
-
Statement type: