LiteRT Toolkit Command Line Interface
LiteRT CLI (Preview)
A convenient command-line toolkit to streamline LiteRT-related development workflows, including converting, quantizing, compiling, running, benchmarking, and visualizing LiteRT (TFLite) models on various hardware (CPU / GPU / NPU) across platforms (desktop, mobile, or cloud).
🚀 Installation | ⚡ Quick start | 💡 Common commands | 📓 Try Colab | 🌟 Quick demos | 🤖 Use in coding agent
LiteRT CLI is built on top of Google AI Edge stacks, including LiteRT, LiteRT-LM, LiteRT Torch, AI Edge Quantizer, AI Edge Portal, and Model Explorer.
[!NOTE] This is an early preview under active development, so platform and feature support is limited and bugs are possible. We appreciate your patience and feedback to help us improve it. Issues and PRs are welcome!
🚀 Installation
Install litert-cli-nightly from PyPI or from a local clone. To keep the
initial installation fast, LiteRT CLI installs dependencies on demand, based
on which commands you run.
We support installation using either uv (recommended for ultra-fast dependency resolution) or standard pip within a Python virtual environment.
Option 1: Use uv (recommended)
uv is an extremely fast Python package manager written in Rust.
# 1. Create a virtual environment with Python 3.13.
# TIP: Sometimes setting env var `UV_INDEX_URL=https://pypi.org/simple` helps
# resolve dependency resolution errors.
uv venv --clear --python=3.13 --seed
source .venv/bin/activate
# 2. Install the package into the active virtual environment
uv pip install litert-cli-nightly
# 3. Run help command
litert --help
Option 2: Use standard pip
python3 -m venv .venv
source .venv/bin/activate
pip install -q litert-cli-nightly
litert --help
Option 3: Install from local clone (for development)
uv venv --clear --python=3.13 --seed
source .venv/bin/activate
git clone git@github.com:google-ai-edge/LiteRT-CLI.git
cd LiteRT-CLI
uv pip install -e .
⚡ Quick start
📓 Try Colab
Try LiteRT CLI Colab to explore different features quickly.
Follow command help
Run litert --help or litert {command} --help at any time to see how to use
the CLI tool. Detailed instructions for each command are below.
# Run help command
litert --help
# Download a LiteRT model
litert download --help
litert download litert-community/efficientnet_b1 --file "*.tflite" --output efficientnet
# Run and benchmark a LiteRT model on your devices
litert run --help
litert run efficientnet/efficientnet_b1.tflite --desktop --cpu
litert benchmark --help
litert benchmark efficientnet/efficientnet_b1.tflite --android --gpu
🌟 Quick demos
Check comprehensive usage examples under the examples/ directory, which contains per-command demos and model-specific demos.
If you have cloned the repo, you can run the following commands to see the demos. Note: running all demos will take time and disk space.
# Run all command demos
./examples/run_commands.sh
# Run specific command demos
./examples/run_commands.sh download,benchmark
# Run all model demos
./examples/run_models.sh
# Run a specific model demo
./examples/run_models.sh efficientnet
🤖 Use in coding agent
Add the LiteRT CLI skill SKILL.md into your coding agent (like Google Antigravity) and try prompts such as:
- Download LiteRT model litert-community/efficientnet_b1 and run it on CPU
- Benchmark LiteRT model litert-community/efficientnet_b1 on my Android GPU
- Compile LiteRT model litert-community/efficientnet_b1 for NPU target sm8750
- Visualize LiteRT model litert-community/efficientnet_b1
- Download the FP32 model litert-community/efficientnet_b1, quantize it to INT8 dynamic range (--recipe dynamic_wi8_afp32), then benchmark both the original FP32 model and the newly quantized INT8 model on the GPU of my connected Android device. Compare the average latency and report the throughput speedup.
- Convert the model Qwen/Qwen1.5-0.5B-Chat from HuggingFace, and run it locally using the prompt 'Explain edge machine learning one sentence'
- Download EfficientNet from huggingface repo litert-community/efficientnet_b1, offline compile (AOT) the model for the sm8750 target NPU, and output the compiled model into ./models/compiled. Then, run an on-device inference and benchmark using this newly compiled AOT model on the connected Android device's NPU (--npu). Confirm that the graph loads directly without dynamic JIT compilation warmup latency.
The agent will automatically install the necessary tools, including Python
virtual environments, litert-cli-nightly, and all required dependencies.
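One of the prompts above asks for a latency comparison and a speedup figure. The arithmetic is simply the ratio of the two average latencies. Below is a minimal shell sketch, assuming you have already read the two averages from the benchmark output yourself. The `speedup` helper and the numbers in the usage line are hypothetical and are not part of LiteRT CLI.

```shell
# Hypothetical helper, not part of LiteRT CLI.
# Usage: speedup <baseline_latency_ms> <new_latency_ms>
# Prints baseline/new with two decimals; a value above 1.00 means the
# new model is faster than the baseline.
speedup() {
  awk -v b="$1" -v n="$2" 'BEGIN {
    if (n <= 0) { print "error: new latency must be positive"; exit 1 }
    printf "%.2f\n", b / n
  }'
}

# Placeholder numbers for illustration only, not measured results.
speedup 12.0 4.0
```

For single-stream inference, throughput is the inverse of latency, so the same ratio is also the throughput speedup.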
Verified platforms
Verified with Python 3.13.
- Host machines:
  - Linux (Ubuntu)
  - macOS (Apple Silicon): litert compile is not supported yet.
  - Windows: litert compile and litert convert are not supported yet.
- Android:
  - CPU, GPU
  - NPU: Qualcomm, MediaTek (soon), Google Tensor (soon)
Troubleshooting and tips
- Always activate your Python virtual environment before running the litert command, to avoid conflicts.
- When uv fails to resolve dependencies, try setting this environment variable first: export UV_INDEX_URL=https://pypi.org/simple.
- When a run fails on GPU with the --gpu flag, try passing both --gpu and --cpu. The CLI will then try GPU first and fall back to CPU if GPU fails.
- When litert run fails on an Android device because the device is not detected, try running adb kill-server first.
- When converting a traditional PyTorch model, you need to write a script that wraps it with the required functions get_model and get_args. Check the script format in examples/convert_model.py.
- LLM conversion currently supports only HuggingFace models of type AutoModelForCausalLM and the Gemma family.
- For large models like LLMs, litert convert needs a lot of memory and disk space and can take several minutes. Make sure you have enough of both, and be patient.
- litert compile currently runs only on Linux and requires Clang version 18.x.x or above. Try sudo apt install clang libc++-dev libc++abi-dev.
- When benchmarking with the --gcp flag, you need to 1) join the EAP program of Google AI Edge Portal; 2) log in to GCP using gcloud auth login; 3) set your GCP project using --gcp-project=<Your-GCP-Project>.
- When litert visualize fails to launch Model Explorer, try running litert visualize --stop-all first.
- Exporting the environment variable LITERT_VERBOSE=1 enables verbose logging.
- litert clean removes all local caches, such as model files and binaries. This frees disk space and also helps fix complicated issues, such as those caused by an NPU library version mismatch.
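Two of the tips above (an active virtual environment, and Clang 18.x.x or above for litert compile) can be checked before running anything. The following is a minimal sketch of such a check. The function names `clang_major`, `clang_is_supported`, and `preflight` are hypothetical and not part of LiteRT CLI, and the parsing assumes the usual "clang version X.Y.Z" line printed by clang --version.

```shell
# Hypothetical preflight helpers; not part of LiteRT CLI.

# Read `clang --version` output on stdin and print the major version number.
clang_major() {
  sed -n 's/.*clang version \([0-9][0-9]*\)\..*/\1/p' | head -n 1
}

# Succeed when the given major version meets the 18.x.x minimum.
clang_is_supported() {
  [ -n "$1" ] && [ "$1" -ge 18 ]
}

# Print a short report about the venv and the Clang version.
preflight() {
  if [ -z "${VIRTUAL_ENV:-}" ]; then
    echo "warning: no active Python virtual environment"
  fi
  if command -v clang >/dev/null 2>&1; then
    major="$(clang --version | clang_major)"
    if clang_is_supported "$major"; then
      echo "clang $major: OK for litert compile"
    else
      echo "clang ${major:-unknown}: litert compile needs 18.x.x or above"
    fi
  else
    echo "clang not found: try sudo apt install clang libc++-dev libc++abi-dev"
  fi
}
```

Source the snippet and call preflight before litert compile. It only reports what it finds and does not change the environment.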
💡 Common commands
1. Download a model from Hugging Face Hub
# Download only .tflite files
litert download litert-community/MobileNet-v3-large \
--file "*.tflite" \
--output mobilenet
# Download full repository
litert download litert-community/MobileNet-v3-large \
--output mobilenet_full
# Download models using Hugging Face ID (uses HF ID as model reference too)
litert download litert-community/MobileNet-v3-large
# Download models with custom model reference
litert download litert-community/MobileNet-v3-large --model-ref my_model_ref
2. Convert a PyTorch model into a LiteRT model
# Automated HF Conversion
litert convert Qwen/Qwen1.5-0.5B-Chat --output /tmp/qwen
# Automated HF Conversion with INT4 Weight-Only Quantization
litert convert Qwen/Qwen1.5-0.5B-Chat --quantize-recipe weight_only_wi4_afp32 --output /tmp/qwen_w4
# Generic Script Injection with INT8 Dynamic Quantization
litert convert my_model.py --quantize-recipe dynamic_wi8_afp32 --output /tmp/mymodel
3. Quantize a LiteRT model
# Dynamic INT8 Quantization (Default)
litert quantize model.tflite \
--recipe dynamic_wi8_afp32 \
--output dynamic.tflite
# Weight-Only Quantization
litert quantize model.tflite \
--recipe weight_only_wi8_afp32 \
--output weight_only.tflite
# Static W8A8 Quantization (with calibration data)
litert quantize model.tflite \
--recipe static_wi8_ai8 \
--calibration-data calib_data.py \
--output static.tflite
# Custom Recipe
litert quantize model.tflite \
--custom-recipe quantize_recipe.json \
--output custom_quant.tflite
4. AOT Compile a LiteRT model for NPU
[!NOTE] Currently supported only on Linux hosts and Qualcomm NPUs. Support for other NPUs is coming soon!
# Basic compilation for specific Qualcomm NPU (e.g., sm8750)
litert compile model.tflite --target sm8750
# Compile for multiple targets and export an AI Pack for Android
litert compile model.tflite --target sm8750 --target mt6989 --export-aipack my_npu_models
5. Run a LiteRT model on desktop or Android
# Run locally on desktop (CPU)
litert run model.tflite --desktop --cpu
litert run my_model_ref --desktop --cpu
# Run with GPU acceleration and CPU fallback (multi-accelerator)
litert run model.tflite --gpu --cpu
litert run model.tflite --accelerator gpu,cpu
# Run on connected Android device
litert run model.tflite --android
# Run on connected Android device with NPU acceleration and CPU fallback
litert run model.tflite --android --npu --cpu
litert run model.tflite --android --accelerator npu,cpu
# Run on connected Android device with NPU AOT-compiled model
litert run model_sm8450.tflite --android --npu
# Run multiple iterations and print output tensors
litert run model.tflite \
--iterations 5 \
--print-tensors
# Run with custom input formats (supports image, raw binary, numpy array)
litert run model.tflite \
--input "image.png" \
--print-tensors
6. Benchmark a LiteRT model
# Benchmark on Android (CPU side)
litert benchmark my_model_ref --android --cpu
litert benchmark model.tflite --android --cpu
# Benchmark on Android NPU (JIT mode)
litert benchmark model.tflite --android --npu
# Benchmark AOT compiled model on Android NPU
litert benchmark model_sm8450.tflite --android --npu
# Benchmark on Android GPU
litert benchmark model.tflite --android --gpu
# Benchmark on macOS (CPU)
litert benchmark my_model_ref --desktop --cpu
# Benchmark on Google AI Edge Portal in Google Cloud. Prerequisites:
# - Set up your Google AI Edge Portal account by following the instructions at:
# https://ai.google.dev/edge/ai-edge-portal
# - Set up authentication by running: gcloud auth login
# - You can set the default GCP project with the environment variable LITERT_GCP_PROJECT, or by providing the --gcp-project option.
# - You can specify your GCP bucket with --gcp-bucket; otherwise, a default one is created.
litert benchmark model.tflite --gcp --device "pixel 7" --gcp-project "your-gcp-project-id" --gcp-bucket "your-gcp-bucket"
litert benchmark model.tflite --gcp --devices "pixel 7, sm-s931u1" --gpu
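To compare accelerators on one connected device, the Android benchmark commands above can be run in a loop. Here is a small sketch that uses only the flags shown in this section. The `run` and `benchmark_all` helpers and the `DRY_RUN` variable are hypothetical conveniences, not part of LiteRT CLI.

```shell
# Hypothetical wrapper: with DRY_RUN=1, print the command instead of running it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Benchmark one model on each Android accelerator in turn.
# Keeps going to the next accelerator even if one run fails.
benchmark_all() {
  model="$1"
  for accel in cpu gpu npu; do
    run litert benchmark "$model" --android "--$accel" \
      || echo "benchmark on $accel failed"
  done
}

# Preview the three commands without touching a device.
DRY_RUN=1 benchmark_all model.tflite
```

Drop the DRY_RUN=1 prefix to execute the commands for real once a device is connected.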
7. Visualize a model's architecture
# Open in Model Explorer graph
litert visualize model.tflite
# Clean up and stop visualizer background servers
litert visualize --stop-all
8. Import a local model
# Import a local file into the centralized cache
litert import my_model.tflite --model-ref my_model
# Import a directory and associate with a Hugging Face ID
litert import ./my_model_dir --model-ref my_model --hf-id my_org_name/my_model
9. List managed models
# List all managed models
litert list
# Show detailed contents of a specific model using model reference.
litert list my_model
10. Delete a managed model
# Delete a model from cache
litert delete my_model
11. Run and benchmark a generative LLM model using LiteRT-LM CLI
The litert lm command utilizes litert-lm, so you can use the same commands
with either one. For example, litert lm run and litert-lm run, or
litert lm benchmark and litert-lm benchmark, achieve the same results.
Please follow the LiteRT-LM CLI guide for detailed instructions.
# Run a generative LLM model, loading it from Hugging Face
litert lm run \
--from-huggingface-repo=litert-community/gemma-4-E2B-it-litert-lm \
gemma-4-E2B-it.litertlm \
--prompt="What is the capital of France?"
# Or load from a local LLM model file
litert lm run ./my_model.litertlm
# Example with a custom prompt
litert lm run ./my_model.litertlm --prompt "Hello, how are you?"
# Benchmark a generative LLM model
litert lm benchmark ./my_model.litertlm
12. Clean up all caches
# Clean up local cache, like model files and binaries.
litert clean