asiai
Apple Silicon AI — Multi-engine LLM benchmark & monitoring CLI
asiai compares inference engines side-by-side on your Mac. Load the same model on Ollama and LM Studio, run asiai bench, get the numbers. No guessing, no vibes — just tok/s, TTFT, power efficiency, and stability per engine.
Born from the OpenClaw project, where we needed hard data to pick the fastest engine for multi-agent swarms on Mac Mini M4 Pro.
Quick start
brew tap druide67/tap
brew install asiai
Or from source:
git clone https://github.com/druide67/asiai.git
cd asiai
pip install -e .
Commands
asiai detect
Auto-detect inference engines running on the five default engine ports.
$ asiai detect
Detected engines:
● ollama 0.17.4
URL: http://localhost:11434
● lmstudio 0.4.5
URL: http://localhost:1234
Running: 1 model(s)
- qwen3.5-35b-a3b MLX
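Detection like this can be done with only the standard library, which is consistent with the zero-dependency design described below. The sketch here is illustrative, not asiai's actual implementation: `ENGINE_PORTS`, `probe`, and `detect_engines` are hypothetical names, and only a subset of engines is shown.

```python
import json
import urllib.error
import urllib.request

# Default ports from the supported-engines table below (illustrative subset).
ENGINE_PORTS = {"ollama": 11434, "lmstudio": 1234, "vllm-mlx": 8000}

def probe(url: str, timeout: float = 0.5):
    """Return the decoded JSON body if something answers at `url`, else None."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.loads(resp.read().decode())
    except (urllib.error.URLError, ValueError, OSError):
        return None

def detect_engines() -> dict:
    """Probe each known port; Ollama has a native /api/version endpoint,
    the OpenAI-compatible engines expose /v1/models."""
    found = {}
    for name, port in ENGINE_PORTS.items():
        path = "/api/version" if name == "ollama" else "/v1/models"
        body = probe(f"http://localhost:{port}{path}")
        if body is not None:
            found[name] = body
    return found
```

A closed port fails fast with a connection error, so scanning a handful of ports stays well under a second even when nothing is running.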
asiai bench
Cross-engine benchmark with standardized prompts. Runs 3 iterations per prompt by default, reports median tok/s (SPEC standard) with stability classification.
$ asiai bench -m qwen3.5 --runs 3 --power
Mac Mini M4 Pro — Apple M4 Pro RAM: 64.0 GB (42% used) Pressure: normal
Benchmark: qwen3.5
Engine tok/s (±stddev) Tokens Duration TTFT VRAM Thermal
────────── ───────────────── ───────── ────────── ──────── ────────── ──────────
lmstudio 72.6 ± 0.0 (stable) 435 6.20s 0.28s — nominal
ollama 30.4 ± 0.1 (stable) 448 15.28s 0.25s 26.0 GB nominal
Winner: lmstudio (2.4x faster)
Power: lmstudio 13.2W (5.52 tok/s/W) — ollama 16.0W (1.89 tok/s/W)
Options:
-m, --model MODEL Model to benchmark (default: auto-detect)
-e, --engines LIST Filter engines (e.g. ollama,lmstudio,mlxlm)
-p, --prompts LIST Prompt types: code, tool_call, reasoning, long_gen
-r, --runs N Runs per prompt (default: 3, for median + stddev)
--power Measure GPU power via powermetrics (sudo required)
--context-size SIZE Context fill prompt: 4k, 16k, 32k, 64k
-H, --history PERIOD Show past benchmarks (e.g. 7d, 24h)
The runner resolves model names across engines automatically — gemma2:9b (Ollama) and gemma-2-9b (LM Studio) are matched as the same model.
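A plausible way to do this matching is to reduce both names to a canonical key before comparing; the `normalize` helper below is hypothetical, not asiai's internal code, but reproduces the behavior described above.

```python
import re

def normalize(name: str) -> str:
    """Reduce an engine-specific model name to a canonical key, so that
    'gemma2:9b' (Ollama) and 'gemma-2-9b' (LM Studio) compare equal."""
    # Lowercase, then drop every separator (:, -, _, /, .) entirely.
    return re.sub(r"[^a-z0-9]", "", name.lower())

def same_model(a: str, b: str) -> bool:
    return normalize(a) == normalize(b)
```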
asiai models
List loaded models across all engines.
$ asiai models
ollama http://localhost:11434
● qwen3.5:35b-a3b 26.0 GB Q4_K_M
lmstudio http://localhost:1234
● qwen3.5-35b-a3b MLX
asiai monitor
System and inference metrics snapshot, stored in SQLite.
$ asiai monitor
System
Uptime: 3d 12h
CPU Load: 2.45 / 3.12 / 2.89 (1m / 5m / 15m)
Memory: 45.2 GB / 64.0 GB 71%
Pressure: normal
Thermal: nominal (100%)
Inference ollama 0.17.4
Models loaded: 1 VRAM total: 26.0 GB
Model VRAM Format Quant
──────────────────────────────────────── ────────── ──────── ──────
qwen3.5:35b-a3b 26.0 GB gguf Q4_K_M
Options:
-w, --watch SEC Refresh every SEC seconds
-q, --quiet Collect and store without output (for daemon use)
-H, --history PERIOD Show history (e.g. 24h, 1h)
-a, --analyze HOURS Comprehensive analysis with trends
-c, --compare TS TS Compare two timestamps
asiai doctor
Diagnose installation, engines, system health, and database.
$ asiai doctor
Doctor
System
✓ Apple Silicon Mac Mini M4 Pro — Apple M4 Pro
✓ RAM 64 GB total, 42% used
✓ Memory pressure normal
✓ Thermal nominal (100%)
Engine
✓ Ollama v0.17.4 — 1 model(s): qwen3.5:35b-a3b
✓ LM Studio v0.4.5 — 1 model(s): qwen3.5-35b-a3b
✗ mlx-lm not installed
✗ llama.cpp not installed
✗ vllm-mlx not installed
Database
✓ SQLite 2.4 MB, last entry: 1m ago
5 ok, 0 warning(s), 3 failed
asiai daemon
Background monitoring via macOS launchd. Collects metrics every minute.
asiai daemon start # Install and start the daemon
asiai daemon start --interval 30 # Custom interval (seconds)
asiai daemon status # Check if running
asiai daemon logs # View recent logs
asiai daemon stop # Stop and uninstall
asiai web
Web dashboard with real-time monitoring, benchmark controls, and interactive charts. Requires pip install asiai[web].
asiai web # Opens browser at http://127.0.0.1:8899
asiai web --port 9000 # Custom port
asiai web --host 0.0.0.0 # Listen on all interfaces
asiai web --no-open # Don't auto-open browser
Features: system overview, engine status, live benchmark with SSE progress, history charts, doctor checks, dark/light theme.
asiai tui
Interactive terminal dashboard with auto-refresh. Requires pip install asiai[tui].
asiai tui
Supported engines
| Engine | Port | Install | API |
|---|---|---|---|
| Ollama | 11434 | brew install ollama | Native |
| LM Studio | 1234 | brew install --cask lm-studio | OpenAI-compatible |
| mlx-lm | 8080 | brew install mlx-lm | OpenAI-compatible |
| llama.cpp | 8080 | brew install llama.cpp | OpenAI-compatible |
| vllm-mlx | 8000 | pip install vllm-mlx | OpenAI-compatible |
What it measures
| Metric | Description |
|---|---|
| tok/s | Generation speed (tokens/sec), excluding prompt processing (TTFT) |
| TTFT | Time to first token — prompt processing latency |
| Power | GPU power draw in watts (sudo powermetrics) |
| tok/s/W | Energy efficiency — tokens per second per watt |
| Stability | Run-to-run variance: stable (CV<5%), variable (<10%), unstable (>10%) |
| VRAM | GPU memory footprint (Ollama only) |
| Thermal | CPU throttling state and speed limit percentage |
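The first two rows combine into the headline number: generation speed is tokens divided by generation time with TTFT subtracted, and the stability label maps directly onto the coefficient of variation (CV) across runs. A sketch of both derivations, with illustrative function names:

```python
import statistics

def tokens_per_second(tokens: int, duration_s: float, ttft_s: float) -> float:
    """Generation speed, excluding prompt processing (TTFT)."""
    return tokens / (duration_s - ttft_s)

def classify_stability(samples: list[float]) -> str:
    """stable: CV < 5%, variable: CV < 10%, unstable: otherwise."""
    cv = statistics.stdev(samples) / statistics.mean(samples)
    if cv < 0.05:
        return "stable"
    if cv < 0.10:
        return "variable"
    return "unstable"
```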
All metrics stored in SQLite (~/.local/share/asiai/metrics.db) with 90-day retention and automatic regression detection.
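The exact regression heuristic is not documented here; one plausible sketch, assuming a simple threshold against the historical median (the 10% tolerance is an assumption, not asiai's documented behavior):

```python
import statistics

def is_regression(history: list[float], latest: float,
                  tolerance: float = 0.10) -> bool:
    """Flag `latest` tok/s as a regression when it falls more than
    `tolerance` below the median of past results for the same
    engine/model pair. Hypothetical logic for illustration only."""
    if not history:
        return False  # nothing to compare against yet
    return latest < statistics.median(history) * (1 - tolerance)
```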
Benchmark methodology
Following MLPerf, SPEC CPU 2017, and NVIDIA GenAI-Perf standards:
- Warmup: 1 non-timed generation per engine before measured runs
- Runs: 3 iterations per prompt (configurable), median as primary metric
- Sampling: temperature=0 (greedy decoding) for deterministic results
- Power: per-engine monitoring (not session-wide average)
- Variance: Pooled intra-prompt stddev (isolates run-to-run noise)
- Metadata: Engine version, model quantization, hardware chip, macOS version stored per result
See docs/benchmark-best-practices.md for the full conformance audit.
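The median-of-runs and pooled intra-prompt stddev from the list above can be sketched as follows, assuming the runner hands back a list of tok/s samples per prompt (the `summarize` name is illustrative):

```python
import math
import statistics

def summarize(runs_by_prompt: dict[str, list[float]]) -> tuple[float, float]:
    """Primary metric: median tok/s over all runs.
    Pooled stddev: sqrt of the mean per-prompt sample variance, which
    isolates run-to-run noise from the expected speed differences
    between prompt types."""
    all_runs = [r for runs in runs_by_prompt.values() for r in runs]
    median = statistics.median(all_runs)
    pooled = math.sqrt(statistics.mean(
        statistics.variance(runs) for runs in runs_by_prompt.values()
    ))
    return median, pooled
```

Pooling matters because the code and long_gen prompts can legitimately run at different speeds; a naive stddev over all runs would report that spread as instability.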
Benchmark prompts
Four standardized prompts test different generation patterns:
| Name | Tokens | Tests |
|---|---|---|
| code | 512 | Structured code generation (BST in Python) |
| tool_call | 256 | JSON function calling / instruction following |
| reasoning | 384 | Multi-step math problem |
| long_gen | 1024 | Sustained throughput (bash script) |
Use --context-size 4k|16k|32k|64k to test with large context fill prompts instead.
Requirements
- macOS on Apple Silicon (M1 / M2 / M3 / M4)
- Python 3.11+
- At least one inference engine running locally
Zero dependencies
The core uses only the Python standard library — urllib, sqlite3, subprocess, argparse. No requests, no psutil, no rich. Just stdlib.
Optional extras:
- asiai[web] — FastAPI web dashboard with charts
- asiai[tui] — Textual terminal dashboard
- asiai[all] — Web + TUI
- asiai[dev] — pytest, ruff
Roadmap
| Version | Scope | Status |
|---|---|---|
| v0.1 | detect + bench + monitor + models (CLI, stdlib) | Done |
| v0.2 | mlx-lm + doctor + daemon + TUI (Textual) | Done |
| v0.3 | 5 engines, power metrics, multi-run variance, regression detection | Done |
| v0.4 | CI, MkDocs, export JSON, thermal drift, web dashboard | Done |
| v1.0 | Multi-server, community export, Homebrew Core | Planned |
License
Apache 2.0
Download files
File details
Details for the file asiai-0.4.0.tar.gz.
File metadata
- Download URL: asiai-0.4.0.tar.gz
- Upload date:
- Size: 140.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 73f0381a57557d8646032005b522c0b84676efe28c1def31a1bcbfe2df901290 |
| MD5 | b00cbc2ce2a283660ca197f87608ca5b |
| BLAKE2b-256 | 060663148caccea95b966d76379f47bb610ae85571f7d26f50637cf9f069626d |
File details
Details for the file asiai-0.4.0-py3-none-any.whl.
File metadata
- Download URL: asiai-0.4.0-py3-none-any.whl
- Upload date:
- Size: 96.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 109ae4aa39d8c820ca8d9b5528443e76ea552cac86677b2db90209a808fde2b7 |
| MD5 | 56adf50f17f53a86d03ba97a0beea5e6 |
| BLAKE2b-256 | 319b3836a262d63e235616d7ad67769b2287611482e138178529c36a5c730a26 |