LiteRT Toolkit Command Line Interface


LiteRT CLI (Preview)

A convenient command-line toolkit that streamlines LiteRT-related development workflows, including converting, quantizing, compiling, running, benchmarking, and visualizing LiteRT (TFLite) models on various hardware (CPU / GPU / NPU) across platforms (desktop, mobile, or cloud).

🚀 Installation | ⚡ Quick start | 💡 Common commands | 📓 Try Colab | 🌟 Quick demos | 🤖 Use in coding agent

LiteRT CLI is built on top of Google AI Edge stacks, including LiteRT, LiteRT-LM, LiteRT Torch, AI Edge Quantizer, AI Edge Portal, and Model Explorer.

[!NOTE] This is an early preview under active development, so platform and feature support is limited and bugs are possible. We appreciate your patience and feedback as we improve it. Issues and PRs are welcome!

🚀 Installation

Install litert-cli-nightly from PyPI or from a local clone. To speed up the initial installation, LiteRT CLI installs its remaining dependencies on demand, based on which commands you run.

We support installation using either uv (recommended for ultra-fast dependency resolution) or standard pip within a Python virtual environment.

Option 1: Use uv (recommended)

uv is an extremely fast Python package manager written in Rust.

# 1. Create a virtual environment with Python 3.13.
# TIP: Sometimes setting env var `UV_INDEX_URL=https://pypi.org/simple` helps
# resolve dependency resolution errors.
uv venv --clear --python=3.13 --seed
source .venv/bin/activate

# 2. Install the package into the active virtual environment
uv pip install litert-cli-nightly

# 3. Run help command
litert --help

Option 2: Use standard pip

python3 -m venv .venv
source .venv/bin/activate
pip install -q litert-cli-nightly
litert --help

Option 3: Install from local clone (for development)

uv venv --clear --python=3.13 --seed
source .venv/bin/activate
git clone git@github.com:google-ai-edge/LiteRT-CLI.git
cd LiteRT-CLI
uv pip install -e .
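
Whichever option you use, a short pre-flight check can catch the two most common setup problems: no active virtual environment, and a Python version older than the verified 3.13. The following is a minimal sketch; `venv_is_active` and `version_at_least` are illustrative helper names, not part of LiteRT CLI, and older Python versions may still work even though they are not listed as verified.

```shell
#!/bin/sh
# Pre-flight check sketch. venv_is_active and version_at_least are
# illustrative helpers, not part of LiteRT CLI.

# Succeeds when a virtual environment is active: the activate script
# exports VIRTUAL_ENV.
venv_is_active() {
  [ -n "${VIRTUAL_ENV:-}" ]
}

# Succeeds when version $1 (e.g. "3.13.2") is at least $2 (e.g. "3.13").
# Only the major and minor components are compared.
version_at_least() (
  have_major=${1%%.*}
  have_rest=${1#*.}
  have_minor=${have_rest%%.*}
  want_major=${2%%.*}
  want_rest=${2#*.}
  want_minor=${want_rest%%.*}
  [ "$have_major" -gt "$want_major" ] ||
    { [ "$have_major" -eq "$want_major" ] && [ "$have_minor" -ge "$want_minor" ]; }
)

if venv_is_active; then
  echo "virtual environment: $VIRTUAL_ENV"
else
  echo "no virtual environment active; run: source .venv/bin/activate"
fi

if command -v python3 >/dev/null 2>&1; then
  py_version=$(python3 -c 'import platform; print(platform.python_version())')
  if version_at_least "$py_version" 3.13; then
    echo "Python $py_version is at or above the verified 3.13"
  else
    echo "Python $py_version is older than the verified 3.13"
  fi
fi
```

The check only reports; it does not stop you from running litert on an unverified setup.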

⚡ Quick start

📓 Try Colab

Try the LiteRT CLI Colab to explore the different features quickly.

Follow command help

You can always run litert --help or litert {command} --help to learn how to use the CLI. Detailed instructions for each command are given below.

# Run help command
litert --help

# Download a LiteRT model
litert download --help
litert download litert-community/efficientnet_b1 --file "*.tflite" --output efficientnet

# Run and benchmark a LiteRT model on your devices
litert run --help
litert run efficientnet/efficientnet_b1.tflite --desktop --cpu
litert benchmark --help
litert benchmark efficientnet/efficientnet_b1.tflite --android --gpu
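
The quick-start commands above can be chained into one script that stops at the first failing step. This is a minimal sketch, assuming the download places efficientnet_b1.tflite directly under the --output directory as in the commands above. `DRY_RUN`, `run_step`, and `quick_start` are illustrative names, not LiteRT CLI features; by default the script only prints the commands it would run.

```shell
#!/bin/sh
# Quick-start pipeline sketch: download, run, then benchmark one model.
# DRY_RUN, run_step and quick_start are illustrative, not part of LiteRT CLI.
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only; set DRY_RUN=0 to execute

# Print the command in dry-run mode, otherwise execute it.
run_step() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "+ $*"
  else
    "$@"
  fi
}

# Each step runs only if the previous one succeeded.
quick_start() {
  repo="litert-community/efficientnet_b1"
  out_dir="efficientnet"
  model="$out_dir/efficientnet_b1.tflite"

  run_step litert download "$repo" --file "*.tflite" --output "$out_dir" &&
    run_step litert run "$model" --desktop --cpu &&
    run_step litert benchmark "$model" --android --gpu
}

quick_start
```

Run it with `DRY_RUN=0` once the printed commands look right for your setup; the benchmark step expects a connected Android device.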

🌟 Quick demos

Check comprehensive usage examples under the examples/ directory, which contains per-command demos and model-specific demos.

If you have cloned the repo, you can run the following commands to see the demos. Note: running all demos will take time and disk space.

# Run all command demos
./examples/run_commands.sh
# Run specific command demos
./examples/run_commands.sh download,benchmark

# Run all model demos
./examples/run_models.sh

# Run a specific model demo
./examples/run_models.sh efficientnet

🤖 Use in coding agent

Add the LiteRT CLI skill SKILL.md to your coding agent (for example, Google Antigravity) and try prompts such as:

- Download LiteRT model litert-community/efficientnet_b1 and run it on CPU
- Benchmark LiteRT model litert-community/efficientnet_b1 on my Android GPU
- Compile LiteRT model litert-community/efficientnet_b1 for NPU target sm8750
- Visualize LiteRT model litert-community/efficientnet_b1
- Download the FP32 model litert-community/efficientnet_b1, quantize it to INT8 dynamic range (--recipe dynamic_wi8_afp32), then benchmark both the original FP32 model and the newly quantized INT8 model on the GPU of my connected Android device. Compare the average latency and report the throughput speedup.
- Convert the model Qwen/Qwen1.5-0.5B-Chat from Hugging Face, and run it locally using the prompt 'Explain edge machine learning in one sentence'
- Download EfficientNet from the Hugging Face repo litert-community/efficientnet_b1, offline compile (AOT) the model for the sm8750 target NPU, and output the compiled model into ./models/compiled. Then, run an on-device inference and benchmark using this newly compiled AOT model on the connected Android device's NPU (--npu). Confirm that the graph loads directly without dynamic JIT compilation warmup latency.

The agent will automatically install the necessary tools, including Python virtual environments, litert-cli-nightly, and all required dependencies.

Verified platforms

Verified in Python 3.13.

Host machines:
- Linux (Ubuntu)
- macOS (Apple Silicon): litert compile is not supported yet.
- Windows: litert compile and litert convert are not supported yet.

Android:
- CPU, GPU
- NPU: Qualcomm, MediaTek (soon), Google Tensor (soon)
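
The host limitations listed above can be encoded as a small lookup that a wrapper script consults before calling litert. This is a sketch only: `host_supports` is a hypothetical helper, it knows nothing beyond the notes in this section, and it optimistically treats any combination not listed as unsupported here as supported.

```shell
#!/bin/sh
# Report whether a litert command is listed as supported on a host OS,
# based only on the "Verified platforms" notes in this README.
# host_supports is a hypothetical helper, not part of LiteRT CLI.
#
# $1: kernel name as printed by `uname -s` (Linux, Darwin, or a Windows
#     shell such as MINGW64_NT-..., MSYS_NT-..., CYGWIN_NT-...)
# $2: litert command name (compile, convert, run, ...)
host_supports() {
  case "$1:$2" in
    # macOS: compile is not supported yet.
    Darwin:compile) return 1 ;;
    # Windows: compile and convert are not supported yet.
    MINGW*:compile | MSYS*:compile | CYGWIN*:compile) return 1 ;;
    MINGW*:convert | MSYS*:convert | CYGWIN*:convert) return 1 ;;
    # Everything else is assumed to be supported (optimistic default).
    *) return 0 ;;
  esac
}

if host_supports "$(uname -s)" compile; then
  echo "litert compile is not listed as unsupported on this host"
else
  echo "litert compile is not supported on this host yet"
fi
```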

Troubleshooting and tips

- Always activate the Python virtual environment before running litert commands, to avoid conflicts.
- When uv fails to resolve dependencies, try setting this environment variable first: export UV_INDEX_URL=https://pypi.org/simple.
- When litert run fails on GPU with the --gpu flag, try passing both --cpu and --gpu so that the CLI can fall back to the CPU when the GPU fails.
- When litert run fails on an Android device because the device is not detected, try running adb kill-server first.
- To convert a traditional PyTorch model, write a script that wraps it with the required functions get_model and get_args. See examples/convert_model.py for the script format.
- LLM conversion currently supports only Hugging Face models of type AutoModelForCausalLM and the Gemma family.
- For large models such as LLMs, litert convert needs a lot of memory and disk space and can take several minutes. Make sure you have enough of both, and be patient.
- litert compile currently runs only on Linux and requires Clang 18.x.x or above. Try sudo apt install clang libc++-dev libc++abi-dev.
- To benchmark with the --gcp flag, you need to 1) join the EAP program of Google AI Edge Portal; 2) log in to GCP using gcloud auth login; 3) set your GCP project using --gcp=<Your-GCP-Project>.
- When litert visualize fails to launch Model Explorer, try running litert visualize --stop-all first.
- Exporting the environment variable LITERT_VERBOSE=1 enables verbose logging.
- litert clean removes all local caches, such as model files and binaries. This frees disk space and also helps with complicated issues, such as those caused by an NPU library version mismatch.
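
The Clang requirement for litert compile can be checked before a long compile run. The sketch below parses the first line of `clang --version` output (for example "Ubuntu clang version 18.1.3 (1ubuntu1)") and compares the major version with 18. `clang_major_version` and `clang_is_new_enough` are illustrative helper names, and the parsing assumes the usual "clang version X.Y.Z" wording.

```shell
#!/bin/sh
# Clang version check sketch for litert compile.
# clang_major_version and clang_is_new_enough are illustrative helpers.

# Print the major version found in `clang --version` output passed as $1,
# e.g. "Ubuntu clang version 18.1.3 (1ubuntu1)" prints 18.
# Prints nothing when no version string is found.
clang_major_version() {
  printf '%s\n' "$1" |
    sed -n 's/.*clang version \([0-9][0-9]*\)\..*/\1/p' |
    head -n 1
}

# Succeeds when the reported major version is 18 or above.
clang_is_new_enough() {
  major=$(clang_major_version "$1")
  [ -n "$major" ] && [ "$major" -ge 18 ]
}

if command -v clang >/dev/null 2>&1; then
  if clang_is_new_enough "$(clang --version)"; then
    echo "clang is new enough for litert compile"
  else
    echo "clang is older than 18; consider upgrading"
  fi
else
  echo "clang not found; try: sudo apt install clang libc++-dev libc++abi-dev"
fi
```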

💡 Common commands

1. Download a model from Hugging Face Hub

# Download only .tflite files
litert download litert-community/MobileNet-v3-large \
  --file "*.tflite" \
  --output mobilenet

# Download full repository
litert download litert-community/MobileNet-v3-large \
  --output mobilenet_full

# Download models using Hugging Face ID (uses HF ID as model reference too)
litert download litert-community/MobileNet-v3-large

# Download models with custom model reference
litert download litert-community/MobileNet-v3-large --model-ref my_model_ref

2. Convert a PyTorch model into a LiteRT model

# Automated HF Conversion
litert convert Qwen/Qwen1.5-0.5B-Chat --output /tmp/qwen

# Automated HF Conversion with INT4 Weight-Only Quantization
litert convert Qwen/Qwen1.5-0.5B-Chat --quantize-recipe weight_only_wi4_afp32 --output /tmp/qwen_w4

# Generic Script Injection with INT8 Dynamic Quantization
litert convert my_model.py --quantize-recipe dynamic_wi8_afp32 --output /tmp/mymodel

3. Quantize a LiteRT model

# Dynamic INT8 Quantization (Default)
litert quantize model.tflite \
  --recipe dynamic_wi8_afp32 \
  --output dynamic.tflite

# Weight-Only Quantization
litert quantize model.tflite \
  --recipe weight_only_wi8_afp32 \
  --output weight_only.tflite

# Static W8A8 Quantization (with calibration data)
litert quantize model.tflite \
  --recipe static_wi8_ai8 \
  --calibration-data calib_data.py \
  --output static.tflite

# Custom Recipe
litert quantize model.tflite \
  --custom-recipe quantize_recipe.json \
  --output custom_quant.tflite
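
To compare recipes side by side, the same model can be quantized once per recipe in a loop. A minimal sketch, using only the two recipes above that need no calibration data; `DRY_RUN`, `run_step`, and `quantize_all` are illustrative names, and the `<name>.<recipe>.tflite` output naming is this sketch's own convention, not something the CLI imposes.

```shell
#!/bin/sh
# Quantize one model with several recipes so the results can be compared.
# DRY_RUN, run_step and quantize_all are illustrative, not part of LiteRT CLI.
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only; set DRY_RUN=0 to execute

# Print the command in dry-run mode, otherwise execute it.
run_step() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "+ $*"
  else
    "$@"
  fi
}

# $1: input .tflite path; remaining arguments: recipe names.
# Writes <name>.<recipe>.tflite next to the input and stops at the
# first recipe that fails.
quantize_all() {
  input="$1"
  shift
  base="${input%.tflite}"
  for recipe in "$@"; do
    run_step litert quantize "$input" \
      --recipe "$recipe" \
      --output "$base.$recipe.tflite" || return 1
  done
}

quantize_all model.tflite dynamic_wi8_afp32 weight_only_wi8_afp32
```

The resulting files can then be passed to litert benchmark one by one to compare latency against the original model.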

4. AOT Compile a LiteRT model for NPU

[!NOTE] Currently supported only on Linux hosts and Qualcomm NPUs. Support for other NPUs is coming soon!

# Basic compilation for a specific Qualcomm NPU (e.g., sm8750)
litert compile model.tflite --target sm8750

# Compile for multiple targets and export an AI Pack for Android
litert compile model.tflite --target sm8750 --target mt6989 --export-aipack my_npu_models

5. Run a LiteRT model on desktop or Android

# Run locally on desktop (CPU)
litert run model.tflite --desktop --cpu
litert run my_model_ref --desktop --cpu

# Run with GPU acceleration and CPU fallback (multi-accelerator)
litert run model.tflite --gpu --cpu
litert run model.tflite --accelerator gpu,cpu

# Run on connected Android device
litert run model.tflite --android

# Run on connected Android device with NPU acceleration and CPU fallback
litert run model.tflite --android --npu --cpu
litert run model.tflite --android --accelerator npu,cpu

# Run on connected Android device with NPU AOT-compiled model
litert run model_sm8450.tflite --android --npu

# Run multiple iterations and print output tensors
litert run model.tflite \
  --iterations 5 \
  --print-tensors

# Run with custom input formats (supports image, raw binary, numpy array)
litert run model.tflite \
  --input "image.png" \
  --print-tensors

6. Benchmark a LiteRT model

# Benchmark on Android (CPU side)
litert benchmark my_model_ref --android --cpu
litert benchmark model.tflite --android --cpu

# Benchmark on Android NPU (JIT mode)
litert benchmark model.tflite --android --npu

# Benchmark AOT compiled model on Android NPU
litert benchmark model_sm8450.tflite --android --npu

# Benchmark on Android GPU
litert benchmark model.tflite --android --gpu

# Benchmark on macOS (CPU)
litert benchmark my_model_ref --desktop --cpu

# Benchmark on Google AI Edge Portal in Google Cloud. Prerequisites:
# - Set up your Google AI Edge Portal account by following the instructions at:
#   https://ai.google.dev/edge/ai-edge-portal
# - Set up authentication by running: gcloud auth login
# - Set the default GCP project through the LITERT_GCP_PROJECT environment
#   variable, or pass the --gcp-project option.
# - Specify your GCP bucket with --gcp-bucket; otherwise, a default one is
#   created.
litert benchmark model.tflite --gcp --device "pixel 7" --gcp-project "your-gcp-project-id" --gcp-bucket "your-gcp-bucket"
litert benchmark model.tflite --gcp --devices "pixel 7, sm-s931u1" --gpu
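
The Android benchmarks above differ only in the accelerator flag, so they can be driven from one loop. A minimal sketch, assuming a connected Android device; `DRY_RUN`, `run_step`, and `benchmark_matrix` are illustrative names, not LiteRT CLI features.

```shell
#!/bin/sh
# Benchmark one model on several Android accelerators in turn.
# DRY_RUN, run_step and benchmark_matrix are illustrative helpers.
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only; set DRY_RUN=0 to execute

# Print the command in dry-run mode, otherwise execute it.
run_step() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "+ $*"
  else
    "$@"
  fi
}

# $1: model path or model reference; remaining arguments: cpu, gpu, npu.
# An unknown accelerator name aborts the loop with an error. A failing
# benchmark does not: the loop moves on to the next accelerator.
benchmark_matrix() {
  model="$1"
  shift
  for accelerator in "$@"; do
    case "$accelerator" in
      cpu | gpu | npu)
        run_step litert benchmark "$model" --android "--$accelerator"
        ;;
      *)
        echo "unknown accelerator: $accelerator" >&2
        return 1
        ;;
    esac
  done
}

benchmark_matrix model.tflite cpu gpu npu
```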

7. Visualize a model's architecture

# Open in Model Explorer graph
litert visualize model.tflite

# Clean up and stop visualizer background servers
litert visualize --stop-all

8. Import a local model

# Import a local file into the centralized cache
litert import my_model.tflite --model-ref my_model

# Import a directory and associate with a Hugging Face ID
litert import ./my_model_dir --model-ref my_model --hf-id my_org_name/my_model

9. List managed models

# List all managed models
litert list

# Show detailed contents of a specific model using model reference.
litert list my_model

10. Delete a managed model

# Delete a model from cache
litert delete my_model
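
Sections 8 to 10, together with running by model reference, form a small life cycle for a managed model: import, inspect, run, delete. The sketch below chains those steps; `DRY_RUN`, `run_step`, and `model_round_trip` are illustrative names, and the file and reference names are the placeholders used above.

```shell
#!/bin/sh
# Managed-model life cycle sketch: import, list, run by reference, delete.
# DRY_RUN, run_step and model_round_trip are illustrative helpers.
DRY_RUN="${DRY_RUN:-1}"   # 1 = print commands only; set DRY_RUN=0 to execute

# Print the command in dry-run mode, otherwise execute it.
run_step() {
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "+ $*"
  else
    "$@"
  fi
}

# $1: local model file; $2: model reference to register it under.
# Each step runs only if the previous one succeeded.
model_round_trip() {
  model_file="$1"
  model_ref="$2"
  run_step litert import "$model_file" --model-ref "$model_ref" &&
    run_step litert list "$model_ref" &&
    run_step litert run "$model_ref" --desktop --cpu &&
    run_step litert delete "$model_ref"
}

model_round_trip my_model.tflite my_model
```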

11. Run and benchmark a generative LLM model using LiteRT-LM CLI

The litert lm command uses litert-lm, so the same subcommands work with either entry point. For example, litert lm run and litert-lm run give the same results, as do litert lm benchmark and litert-lm benchmark.

Please follow the LiteRT-LM CLI guide for detailed instructions.

# Run a generative LLM model loaded from Hugging Face
litert lm run \
  --from-huggingface-repo=litert-community/gemma-4-E2B-it-litert-lm \
  gemma-4-E2B-it.litertlm \
  --prompt="What is the capital of France?"

# Or load from a local LLM model file
litert lm run ./my_model.litertlm

# Example with a custom prompt
litert lm run ./my_model.litertlm --prompt "Hello, how are you?"

# Benchmark a generative LLM model
litert lm benchmark ./my_model.litertlm

12. Clean up all caches

# Clean up local cache, like model files and binaries.
litert clean

Project details

This version

0.2.0.dev20260520 pre-release

May 20, 2026


Download files

Source Distributions

No source distribution files are available for this release.

Built Distribution

litert_cli_nightly-0.2.0.dev20260520-py3-none-any.whl (126.1 kB)

Uploaded May 20, 2026 Python 3

File details

Details for the file litert_cli_nightly-0.2.0.dev20260520-py3-none-any.whl.

File metadata

Download URL: litert_cli_nightly-0.2.0.dev20260520-py3-none-any.whl
Upload date: May 20, 2026
Size: 126.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.11.15 (uv publish, run in CI on Ubuntu 24.04)

File hashes

Hashes for litert_cli_nightly-0.2.0.dev20260520-py3-none-any.whl
Algorithm	Hash digest
SHA256	`7c1a1e6af29043dca2964513073213145e76807b6eeba2a6559fb545e201d5dd`
MD5	`ef21bb63238669a7c93692941250695e`
BLAKE2b-256	`0384c709365244d27bc06798c7f973340a049868bd0e447b554415aaa4c55b1a`
