Accelerate Model Deployment on WinML

These details have not been verified by PyPI

WinML CLI

WinML CLI is a command-line toolkit for building portable, performant, and high-quality models for Windows ML. It covers the entire journey from pretrained model to on-device inference (export, optimization, quantization, compilation, and benchmarking) across execution providers, regardless of silicon.

:dart: WinML CLI Is Right for You If

You want to build models that run on any Windows device — Qualcomm, Intel, AMD, NVIDIA, or CPU
You want to benchmark a model with one command — latency, throughput, and live hardware utilization
You want to catch compatibility issues ahead of time — unsupported ops, shape mismatches, EP gaps
You want deep insights into your model — I/O shapes, task mapping, operator coverage per EP
You want a repeatable and traceable model building process — config-driven, inspectable at every stage
You want AI agents to build and profile models for you — agent-ready skills for coding assistants

:desktop_computer: Supported Hardware

Execution Provider	Hardware	Status	EP Flag	Device Flag
QNN	Qualcomm NPU (Snapdragon X Elite)	🟢 Ready	`--ep qnn`	`--device npu`
OpenVINO	Intel NPU (Meteor Lake / Lunar Lake)	🟢 Ready	`--ep openvino`	`--device npu`
VitisAI	AMD NPU (Ryzen AI)	🟢 Ready	`--ep vitisai`	`--device npu`
NvTensorRTRTX	NVIDIA discrete GPUs	🔶 Planned	`--ep nv_tensorrt_rtx`	`--device gpu`
MIGraphX	AMD discrete GPUs	🔶 Planned	`--ep migraphx`	`--device gpu`
Dml	Hardware-agnostic GPU backend	🔶 Planned	`--ep dml`	`--device gpu`
CPU	Cross-platform fallback	⚪ Always available	`--ep cpu`	`--device cpu`

Tip: Use --device auto and WinML CLI picks the best available device — NPU first, then GPU, then CPU.
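The preference order in the tip above can be sketched in a few lines of Python. This is only an illustration of the documented order (NPU, then GPU, then CPU); `pick_device` and `DEVICE_PREFERENCE` are hypothetical names, not part of WinML CLI, and the real selection logic may consider more than device presence.

```python
# Preference order described in the tip: NPU first, then GPU, then CPU.
DEVICE_PREFERENCE = ("npu", "gpu", "cpu")


def pick_device(available):
    """Return the most preferred device name found in `available`."""
    present = {name.lower() for name in available}
    for device in DEVICE_PREFERENCE:
        if device in present:
            return device
    # The hardware table lists CPU as always available, so fall back to it.
    return "cpu"


print(pick_device(["cpu", "gpu"]))  # gpu
```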

:clipboard: Prerequisites

Required Software

Component	How to Get It
Windows 11 (x64 or ARM64)	Windows 11 24H2+ required for NPU support
UV	Install UV
WinML CLI (Python wheel)	Releases

Required Hardware

WinML CLI primarily targets NPUs. We recommend testing on one of the following NPU devices:

Device	EP	Flag
Snapdragon X Elite (Qualcomm)	QNN	`--ep qnn --device npu`
Intel AI Boost (Meteor Lake / Lunar Lake)	OpenVINO	`--ep openvino --device npu`
AMD Ryzen AI (Phoenix / Hawk Point / Strix)	VitisAI	`--ep vitisai --device npu`

No NPU? Use --device auto, and WinML CLI falls back to the best available device (GPU, then CPU). Note that winml compile requires an NPU and cannot run without one.

Accepted Inputs

Hugging Face model ID (e.g., microsoft/resnet-50): weights are downloaded on first run
Local ONNX file (e.g., model.onnx) — from winml export, winml build, or any ONNX you already have

The Golden Rule: Inspect First

Before running any pipeline command, always verify the model is supported:

winml inspect -m <model-id>

If inspect prints an error or shows Unsupported, skip that model. Only models that pass inspect are valid inputs for export, analyze, build, perf, and eval.
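A script can enforce the inspect-first rule before it calls anything else. The sketch below wraps the `winml inspect -m` command shown above; it assumes that a failed inspect produces a non-zero exit code or the word "Unsupported" in its output, which this document does not specify, and the helper names are hypothetical.

```python
import subprocess


def inspect_command(model_id):
    """Build the inspect command exactly as shown in this README."""
    return ["winml", "inspect", "-m", model_id]


def inspect_passed(returncode, output):
    # Assumption: failure means a non-zero exit code or "Unsupported" in the output.
    return returncode == 0 and "unsupported" not in output.lower()


def is_supported(model_id, run=subprocess.run):
    """Run inspect and report whether the model looks usable downstream."""
    result = run(inspect_command(model_id), capture_output=True, text=True)
    return inspect_passed(result.returncode, result.stdout + result.stderr)
```

Calling `is_supported("microsoft/resnet-50")` requires WinML CLI on the PATH; the `run` parameter exists so the gate can be exercised without it.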

:package: Installation

WinML CLI requires Python 3.11 and is distributed as a Python wheel. We recommend uv for fast, reproducible environment setup.

1. Create a Python 3.11 environment

uv venv --python 3.11

Activate it:

# Windows (PowerShell)
.venv\Scripts\activate

# Windows (Git Bash)
source .venv/Scripts/activate

2. Install from wheel

uv pip install winml_cli-<version>-py3-none-any.whl

3. Verify your environment

winml sys --list-device --list-ep

Confirm that your target device and EP appear in the output:

Snapdragon X Elite — look for QNNExecutionProvider
Intel AI Boost — look for OpenVINOExecutionProvider
AMD Ryzen AI — look for VitisAIExecutionProvider
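The check above can be automated with a plain substring search over the text printed by `winml sys --list-device --list-ep`. The provider names come from the list above; the output layout is not documented here, so the sample string in the sketch is made up and `ep_detected` is a hypothetical helper.

```python
# EP flag -> provider name to look for, taken from the list above.
EXPECTED_EP = {
    "qnn": "QNNExecutionProvider",
    "openvino": "OpenVINOExecutionProvider",
    "vitisai": "VitisAIExecutionProvider",
}


def ep_detected(sys_output, ep_flag):
    """Return True if the provider for `ep_flag` appears in the captured output."""
    return EXPECTED_EP[ep_flag] in sys_output


# Placeholder text standing in for real `winml sys` output (format is an assumption).
sample_output = "Execution providers: QNNExecutionProvider, CPUExecutionProvider"
print(ep_detected(sample_output, "qnn"))       # True
print(ep_detected(sample_output, "openvino"))  # False
```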

If no NPU is detected, you can still use WinML CLI with --device auto for most commands. The only exception is winml compile, which requires an NPU device.

:wrench: Commands

Category	Commands	Purpose
Primitives	`inspect` `export` `optimize` `quantize` `compile`	Single-stage building blocks
Pipeline	`config` `build` `perf` `eval` `run`*	End-to-end orchestration
Insights	`analyze` `debug`*	Diagnostics and compatibility
Utilities	`hub` `cache`* `doctor`* `setting`* `sys`	Catalog, cache, and environment

* = coming soon

Primitives — one stage at a time

winml inspect — Discover model metadata. Prints the task, model class, input/output tensor names and shapes, and execution provider compatibility. No weights are loaded — this reads only the model configuration, making it fast and lightweight. Always run inspect first to verify a model is supported.

winml export — Convert a source model to ONNX. Takes a Hugging Face model ID (or local checkpoint) and produces a standards-compliant ONNX file with hierarchy-preserving metadata.

winml optimize — Fuse operators, simplify graphs, and prepare for target EPs. Takes an ONNX model and an optimization config (typically generated by winml analyze) and applies graph-level transformations: operator fusion, constant folding, shape inference, and EP-specific rewrites.

winml quantize — Compress to low-bit precision. Reduces model size and inference latency by converting weights and activations from FP32 to INT8 (or other low-bit formats). After quantization, the model is portable — it can run on any ONNX Runtime backend.

winml compile — Generate device-specific binaries. Takes a quantized ONNX model and produces EP-specific compiled artifacts (for example, QNN context binaries for Qualcomm NPU). This step locks the model to a specific device but delivers the lowest possible inference latency.

Pipeline — orchestrated workflows

winml config — Auto-detect optimal settings into a JSON config. Inspects the model and generates a complete build specification: task, I/O shapes, optimization flags, quantization parameters, and target EP settings. The config file is reviewable, editable, and version-controllable — the single source of truth for your build.

winml build — Orchestrate the full pipeline. Takes a config file and executes every stage in sequence: export, analyze, optimize, quantize, and compile. Two commands (config + build) replace eight manual steps.

winml perf — Benchmark latency, throughput, and hardware utilization. Runs inference on the target device and reports latency percentiles (p50, p90, p99), throughput (inferences per second), and optionally live hardware monitoring (CPU, RAM, NPU utilization) with the --monitor flag. Can accept a local ONNX file or a Hugging Face model ID.
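To make the reported metrics concrete, here is a minimal sketch of how latency percentiles and throughput can be derived from per-inference timings. It uses the nearest-rank percentile definition and assumes inferences run one after another; how winml perf actually computes its numbers is not stated in this document, and the function names are illustrative.

```python
def percentile(samples, pct):
    """Nearest-rank percentile for an integer `pct` between 1 and 100."""
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100, avoiding floating-point rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


def summarize(latencies_ms):
    """Summarize a list of per-inference latencies given in milliseconds."""
    total_seconds = sum(latencies_ms) / 1000.0
    return {
        "p50_ms": percentile(latencies_ms, 50),
        "p90_ms": percentile(latencies_ms, 90),
        "p99_ms": percentile(latencies_ms, 99),
        # Assumption: sequential execution, so throughput = count / total time.
        "throughput_ips": len(latencies_ms) / total_seconds,
    }


# Synthetic timings of 1 ms, 2 ms, ..., 100 ms, used only to exercise the math.
summary = summarize(list(range(1, 101)))
print(summary["p50_ms"], summary["p90_ms"], summary["p99_ms"])  # 50 90 99
```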

winml eval — Measure model accuracy against reference datasets. Compares the output of your optimized/quantized model against the original to quantify any accuracy loss introduced by the pipeline.

winml run — End-to-end inference with pre/post processing. (Coming soon.)

Insights — understand what is happening inside

winml analyze — Lint operators, check EP compatibility, and generate optimization config. The analyzer has two components: the Linter (like ESLint for ONNX) checks every operator against target EPs and classifies each as supported, partial, or unsupported. AutoConf detects suboptimal patterns and generates the optimization config that the optimizer consumes. Together they form the analyze-optimize loop.
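The three-way classification performed by the Linter can be pictured as a lookup against a per-EP support table. The sketch below is a toy model of that idea, not the analyzer's implementation: the support table entries are invented placeholders, and treating unknown operators as unsupported is an assumption.

```python
from collections import Counter


def lint_operators(op_types, support_table):
    """Classify each operator type as supported, partial, or unsupported."""
    # Assumption: operators missing from the table count as unsupported.
    report = {op: support_table.get(op, "unsupported") for op in op_types}
    return report, Counter(report.values())


# Placeholder support data for a hypothetical EP; not real compatibility results.
example_support = {"Conv": "supported", "Relu": "supported", "Gelu": "partial"}
report, totals = lint_operators(["Conv", "Gelu", "Einsum"], example_support)
print(report["Einsum"])  # unsupported
```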

winml debug — Interactive model debugging and layer-by-layer inspection. (Coming soon.)

Utilities — catalog, cache, and environment

winml catalog — Browse the curated built-in model catalog.

winml cache — Manage built model artifacts and pipeline outputs. View, clean, or selectively remove cached models and intermediate files.

winml doctor — Diagnose environment issues. Checks runtimes, execution providers, and dependencies to identify configuration problems.

winml setting — Configure WinML CLI preferences. Set default EPs, output directories, and other global options.

winml sys — System information and capability reporting. Prints detected hardware, available EPs, Python version, and installed package versions.

:rocket: Quick Start

Inspect a Model

The fastest way to get started is to inspect a model. Let's look at ResNet-50:

winml inspect -m microsoft/resnet-50

This prints the model's metadata without downloading weights:

Task: image-classification — what the model does
Model class: ResNetForImageClassification — the architecture
Input tensors: names, data types, and shapes (e.g., pixel_values: float32 [1, 3, 224, 224])
Output tensors: names, data types, and shapes (e.g., logits: float32 [1, 1000])

If inspect succeeds, the model is supported and you can proceed with the rest of the pipeline.

Golden rule: always inspect first. Before running export, build, perf, or any other pipeline command, verify the model is supported with winml inspect.

Build with Primitive Commands

This walkthrough builds ConvNeXT (facebook/convnext-base-224) step by step using primitive commands. ConvNeXT is a family of CNN models inspired by Vision Transformers, introduced by Meta in 2022 — it offers high accuracy while retaining the efficiency of CNNs.

Phase 1: Inspect

winml inspect -m facebook/convnext-base-224

Phase 2: Build a Portable Model

Export from PyTorch to ONNX:

winml export -m facebook/convnext-base-224 -o convnext/model.onnx -v

Analyze for EP compatibility:

winml analyze -m convnext/model.onnx --optim-config optim.json

Optimize the graph using the analyzer's config:

winml optimize -m convnext/model.onnx -c optim.json -o convnext/model_opt.onnx

Quantize to INT8:

winml quantize -m convnext/model_opt.onnx -o convnext/model_opt_int8.onnx

Phase 3: Benchmark on Device

Compile for NPU (generates device-specific binaries):

winml compile -m convnext/model_opt_int8.onnx --ep qnn -o convnext/model_compiled.onnx

Benchmark on NPU — note the latency:

winml perf -m convnext/model_compiled.onnx --ep qnn --iterations 100

Benchmark on CPU for comparison:

winml perf -m convnext/model_opt.onnx --ep cpu --iterations 100

Compare the two numbers to see the performance difference between NPU and CPU inference.
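For scripting, the walkthrough above can be captured as a list of argument vectors. The sketch only rebuilds the commands and flags already shown in this section, parameterized by model ID and working directory; `primitive_pipeline` is a hypothetical helper, and it prints the commands rather than running them.

```python
def primitive_pipeline(model_id, workdir, ep="qnn", iterations=100):
    """Return the walkthrough's commands, in order, as argument lists."""
    model = f"{workdir}/model.onnx"
    opt = f"{workdir}/model_opt.onnx"
    int8 = f"{workdir}/model_opt_int8.onnx"
    compiled = f"{workdir}/model_compiled.onnx"
    n = str(iterations)
    return [
        ["winml", "inspect", "-m", model_id],
        ["winml", "export", "-m", model_id, "-o", model, "-v"],
        ["winml", "analyze", "-m", model, "--optim-config", "optim.json"],
        ["winml", "optimize", "-m", model, "-c", "optim.json", "-o", opt],
        ["winml", "quantize", "-m", opt, "-o", int8],
        ["winml", "compile", "-m", int8, "--ep", ep, "-o", compiled],
        ["winml", "perf", "-m", compiled, "--ep", ep, "--iterations", n],
        ["winml", "perf", "-m", opt, "--ep", "cpu", "--iterations", n],
    ]


for command in primitive_pipeline("facebook/convnext-base-224", "convnext"):
    print(" ".join(command))
```

Each list can be passed to `subprocess.run(command, check=True)` on a machine with WinML CLI installed, so a failing stage stops the sequence.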

Build with Config + Build

Same model, different approach. Instead of running each command manually, use the config-driven pipeline. Think of it like CMake: config generates a build plan, build executes it.

Generate the build config:

winml config -m facebook/convnext-base-224 -o convnext_config.json

This creates a JSON file containing all settings for every pipeline step — task, I/O shapes, optimization flags, quantization parameters — all auto-detected from the model.

Build the model:

winml build -c convnext_config.json -m facebook/convnext-base-224 -o convnext_build/

This orchestrates the full pipeline — export, analyze, optimize, quantize, compile — all in one go. Same result as the manual steps above, but in two commands.

Benchmark the result:

winml perf -m convnext_build/model.onnx --ep qnn --iterations 100

The config file is the single source of truth for your build. Version-control it, share it with teammates, edit it to override settings, and replay builds deterministically on any machine.

Benchmark in One Command

The simplest way to evaluate a model — one command, zero setup:

winml perf -m facebook/convnext-base-224 --device npu --monitor

WinML CLI handles everything behind the scenes: it downloads the model from Hugging Face, exports it to ONNX, optimizes the graph, and runs the benchmark on your NPU. The --monitor flag enables live hardware monitoring: real-time CPU utilization, RAM usage, and NPU activity alongside the latency results.

This is ideal for quick smoke tests: does the model run on this device, and how fast is it?

:arrows_counterclockwise: The BYOM Workflow

The Build Your Own Model (BYOM) workflow is the philosophy behind WinML CLI. It defines how a source model becomes a production-ready, device-optimized artifact.

The Pipeline

Source Model --> Export --> Analyze --> Optimize --> Quantize --> Compile --> Benchmark

Each arrow is a WinML CLI command. You can enter the pipeline at any stage (for example, start with a local ONNX file and skip export), exit early (stop after optimization if you do not need quantization), or loop back to repeat a stage with different settings.
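Entering late and exiting early amounts to taking a contiguous slice of the stage list. The following sketch models that with the stage names from the diagram above; `plan` is a hypothetical helper for reasoning about the workflow, not a WinML CLI feature.

```python
# Stage order taken from the pipeline diagram above.
STAGES = ("export", "analyze", "optimize", "quantize", "compile", "benchmark")


def plan(start="export", stop="benchmark"):
    """Return the stages to run when entering at `start` and exiting after `stop`."""
    first, last = STAGES.index(start), STAGES.index(stop)
    if first > last:
        raise ValueError(f"{start!r} comes after {stop!r} in the pipeline")
    return list(STAGES[first:last + 1])


# Start from a local ONNX file (skip export) and stop after optimization.
print(plan("analyze", "optimize"))  # ['analyze', 'optimize']
```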

Primitive Commands vs. Config-Driven Pipeline

	Primitive Commands	Config-Driven Pipeline
Steps	One command per stage	Two steps: config + build
Control	Start from any stage; try different settings to fix errors or tweak performance	Repeatable, tweakable, version-controllable
Best for	Flexible workflow	Production-ready delivery
When to use	Exploring, debugging, prototyping	CI/CD, batch builds, team workflows
Lifecycle	"Coding" phase	Polish

:clipboard: Built-in Models

Run winml catalog to browse the full catalog interactively.

Model ID	Task	Architecture
`microsoft/resnet-50`	image-classification	ResNet
`google/vit-base-patch16-224`	image-classification	ViT
`microsoft/swin-large-patch4-window7-224`	image-classification	Swin
`facebook/convnext-tiny-224`	image-classification	ConvNeXT
`rizvandwiki/gender-classification`	image-classification	ViT
`ProsusAI/finbert`	text-classification	BERT
`Intel/bert-base-uncased-mrpc`	text-classification	BERT
`cardiffnlp/twitter-roberta-base-sentiment-latest`	text-classification	RoBERTa
`dslim/bert-base-NER`	token-classification	BERT
`dbmdz/bert-large-cased-finetuned-conll03-english`	token-classification	BERT
`Babelscape/wikineural-multilingual-ner`	token-classification	BERT
`w11wo/indonesian-roberta-base-posp-tagger`	token-classification	RoBERTa
`microsoft/table-transformer-detection`	object-detection	Table Transformer
`mattmdjaga/segformer_b2_clothes`	image-segmentation	SegFormer
`nvidia/segformer-b1-finetuned-ade-512-512`	image-segmentation	SegFormer
`nvidia/segformer-b2-finetuned-ade-512-512`	image-segmentation	SegFormer
`nvidia/segformer-b5-finetuned-ade-640-640`	image-segmentation	SegFormer

These models are verified against WinML CLI's full pipeline and serve as reliable starting points. You are not limited to this list — any Hugging Face model that passes winml inspect is a valid input.

For models not in this table, run winml inspect -m <model-id> to verify support before proceeding.

:warning: Scope & Limitations

What WinML CLI supports

WinML CLI targets classic deep learning models — CNNs, encoders, vision transformers, NLP classifiers, token classifiers, object detection models, and segmentation models.

Supported tasks include:

Image classification (ResNet, ViT, Swin, ConvNeXT)
Text classification (BERT, RoBERTa)
Token classification / NER (BERT, RoBERTa)
Object detection (Table Transformer)
Image segmentation (SegFormer)

What WinML CLI does not support

LLMs and generative models are not in scope. Do not use WinML CLI with GPT, LLaMA, Phi, Mistral, Stable Diffusion, or any model with a decoder-only or sequence-to-sequence generative architecture. LLM support (with LoRA) is planned for Q3-Q4 2026.

Known constraints

winml compile requires an NPU device. If no NPU is available, skip the compile step and use --device auto for benchmarking.
Some models may export successfully but fail during optimization or quantization due to unsupported operator patterns. The analyzer will flag these issues.
Performance numbers vary by device, driver version, and EP version. Always benchmark on your target hardware.

:world_map: Roadmap

Milestone	Target	Highlights
🟡 Kickoff	Q4 2025	Internal prototype, core primitive commands
🟢 Early Access	Q1 2026	First external testers, config + build pipeline, hub catalog
🔵 Public Beta	Q2 2026	Open source, agent skills, Foundry Toolkit integration
🟣 RC	Q3-Q4 2026	LLM support (with LoRA), broader device coverage, MLIR

Q4 2025 — Kickoff

Primitive commands: inspect, export, optimize, quantize, compile
QNN, OpenVINO, and VitisAI execution provider support
Internal validation with ResNet, BERT, ViT, SegFormer families

Q1 2026 — Early Access

Pipeline commands: config, build, perf, eval
Analyzer with auto-configuration loop
Built-in model catalog (winml catalog)
Live hardware monitoring (--monitor)

Q2 2026 — Public Beta

Open source release
Agent-ready skills for coding assistants (Claude Code, Cursor, Copilot)
Foundry Toolkit for VS Code integration

Q3-Q4 2026 — Release Candidate

LLM support (decoder-only architectures with LoRA adapters)
NvTensorRTRTX, MIGraphX, and Dml execution providers
MLIR-based optimization backend
Public SDK and framework APIs

:lock: Data / Telemetry

Official WinML CLI releases can collect anonymous usage telemetry to help improve the product. Telemetry is classified as Optional. A one-time prompt on your first run asks for consent (default: accept — press Enter to enable, type n to decline).

Dev installs (pip install -e . or running from a source checkout) never send telemetry.

Control — edit %USERPROFILE%\.winml\config.json:

Set telemetry.consent to "disabled" to opt out
Set telemetry.consent to "enabled" to opt in
Delete the file to re-show the first-run prompt on the next run

Telemetry is automatically disabled in CI / non-TTY environments regardless of the stored decision.
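The config edit described above can also be scripted. This sketch assumes `telemetry.consent` refers to a nested JSON object (`{"telemetry": {"consent": ...}}`); the actual file layout is documented in docs/Privacy.md and may differ, so treat the structure and the helper names as assumptions.

```python
import json
from pathlib import Path


def default_config_path():
    """%USERPROFILE%\\.winml\\config.json, resolved via the user's home directory."""
    return Path.home() / ".winml" / "config.json"


def set_telemetry_consent(config_path, consent):
    """Write the consent value, preserving any other settings already in the file."""
    if consent not in ("enabled", "disabled"):
        raise ValueError("consent must be 'enabled' or 'disabled'")
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    # Assumption: the dotted name maps to a nested "telemetry" object.
    config.setdefault("telemetry", {})["consent"] = consent
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2), encoding="utf-8")
    return config
```

To opt out, call `set_telemetry_consent(default_config_path(), "disabled")`.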

See docs/Privacy.md for the full list of what is and is not collected, event schemas, CI auto-disable behavior, and storage locations.

:handshake: Contributions and Feedback

We welcome contributions! Please see the contribution guidelines.

For feature requests or bug reports, please file a GitHub Issue.

:balance_scale: Code of Conduct

See CODE_OF_CONDUCT.md.

:page_facing_up: License

This project is licensed under the MIT License.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.

Project details

This version

0.1.0

May 27, 2026

Download files

Source Distribution

winml_cli-0.1.0.tar.gz (10.0 MB)

Uploaded May 27, 2026 Source

Built Distribution

winml_cli-0.1.0-py3-none-any.whl (13.6 MB)

Uploaded May 27, 2026 Python 3

File details

Details for the file winml_cli-0.1.0.tar.gz.

File metadata

Download URL: winml_cli-0.1.0.tar.gz
Upload date: May 27, 2026
Size: 10.0 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: RestSharp/106.13.0.0

File hashes

Hashes for winml_cli-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`270c0076d16d3ca7cc6027621f98401945e0e6f8b3f96aaf39ce0c5fca9c07b2`
MD5	`03558626aad17ab8f0c16a7464f649cd`
BLAKE2b-256	`5db87496315175edd500855a997c92ccea5423928eb2804a5c26fc1c7cfc8ae6`

File details

Details for the file winml_cli-0.1.0-py3-none-any.whl.

File metadata

Download URL: winml_cli-0.1.0-py3-none-any.whl
Upload date: May 27, 2026
Size: 13.6 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: RestSharp/106.13.0.0

File hashes

Hashes for winml_cli-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`0ba80ce3c6420929d08f70012062012ed9057f9b4d30ccce8f1569c33de8e273`
MD5	`1fec00c4762fb25d1d44abf0f34f41ea`
BLAKE2b-256	`815a545d92de94a1c53ee9c72494dbc85efda7f0090ad055d7ce942030e5c90f`
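A downloaded file can be checked against the SHA256 digests listed above before it is installed. The sketch below uses Python's standard `hashlib`; the digest constant is copied from the wheel's table, and the helper names are illustrative.

```python
import hashlib

# SHA256 digest for winml_cli-0.1.0-py3-none-any.whl, copied from the table above.
WHEEL_SHA256 = "0ba80ce3c6420929d08f70012062012ed9057f9b4d30ccce8f1569c33de8e273"


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large wheels are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare a file's SHA256 digest with the published hex string."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches("winml_cli-0.1.0-py3-none-any.whl", WHEEL_SHA256)` should return True for an intact download in the current directory.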
