asiai

Apple Silicon AI — Multi-engine LLM benchmark & monitoring CLI

asiai compares inference engines side-by-side on your Mac. Load the same model on Ollama and LM Studio, run asiai bench, get the numbers. No guessing, no vibes — just tok/s, TTFT, power efficiency, and stability per engine.

Born from the OpenClaw project, where we needed hard data to pick the fastest engine for multi-agent swarms on Mac Mini M4 Pro.

Quick start

brew tap druide67/tap
brew install asiai

Or from source:

git clone https://github.com/druide67/asiai.git
cd asiai
pip install -e .

Commands

asiai detect

Auto-detect running inference engines across 5 ports.

$ asiai detect

Detected engines:

  ● ollama 0.17.4
    URL: http://localhost:11434

  ● lmstudio 0.4.5
    URL: http://localhost:1234
    Running: 1 model(s)
      - qwen3.5-35b-a3b  MLX
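
Under the hood, detection amounts to probing each engine's default port with a short timeout. A minimal stdlib-only sketch of the idea; the endpoint paths are assumptions based on each engine's public API, not asiai's actual source:

import json
import urllib.request

# Default base URLs per engine (see "Supported engines" below). mlx-lm and
# llama.cpp both default to 8080, so a real implementation would also have
# to disambiguate by response payload.
KNOWN_ENGINES = {
    "ollama":   ("http://localhost:11434", "/api/version"),  # Ollama native API
    "lmstudio": ("http://localhost:1234",  "/v1/models"),    # OpenAI-compatible
    "mlx-lm":   ("http://localhost:8080",  "/v1/models"),
    "vllm-mlx": ("http://localhost:8000",  "/v1/models"),
}

def detect_engines(timeout: float = 0.5) -> dict:
    """Return {engine_name: parsed JSON} for every engine that responds."""
    found = {}
    for name, (base, path) in KNOWN_ENGINES.items():
        try:
            with urllib.request.urlopen(base + path, timeout=timeout) as resp:
                found[name] = json.load(resp)
        except OSError:  # connection refused, timeout, bad HTTP status, ...
            continue
    return found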

asiai bench

Cross-engine benchmark with standardized prompts. Runs 3 iterations per prompt by default, reports median tok/s (SPEC standard) with stability classification.

$ asiai bench -m qwen3.5 --runs 3 --power

  Mac Mini M4 Pro — Apple M4 Pro  RAM: 64.0 GB (42% used)  Pressure: normal

Benchmark: qwen3.5

  Engine       tok/s (±stddev)    Tokens   Duration     TTFT       VRAM    Thermal
  ────────── ───────────────── ───────── ────────── ──────── ────────── ──────────
  lmstudio    72.6 ± 0.0 (stable)   435    6.20s    0.28s        —    nominal
  ollama      30.4 ± 0.1 (stable)   448   15.28s    0.25s   26.0 GB   nominal

  Winner: lmstudio (2.4x faster)
  Power: lmstudio 13.2W (5.52 tok/s/W) — ollama 16.0W (1.89 tok/s/W)

Options:

-m, --model MODEL          Model to benchmark (default: auto-detect)
-e, --engines LIST         Filter engines (e.g. ollama,lmstudio,mlxlm)
-p, --prompts LIST         Prompt types: code, tool_call, reasoning, long_gen
-r, --runs N               Runs per prompt (default: 3, for median + stddev)
    --power                Measure GPU power via powermetrics (sudo required)
    --context-size SIZE    Context fill prompt: 4k, 16k, 32k, 64k
-H, --history PERIOD       Show past benchmarks (e.g. 7d, 24h)

The runner resolves model names across engines automatically — gemma2:9b (Ollama) and gemma-2-9b (LM Studio) are matched as the same model.
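
Name resolution is essentially a normalization step. Illustrative only, not asiai's actual algorithm: lowercase the name and strip separator characters so variants compare equal.

import re

def normalize(name: str) -> str:
    # "gemma2:9b" -> "gemma29b", "gemma-2-9b" -> "gemma29b"
    return re.sub(r"[-_:./\s]", "", name.lower())

assert normalize("gemma2:9b") == normalize("gemma-2-9b")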

asiai models

List loaded models across all engines.

$ asiai models

ollama  http://localhost:11434
  ● qwen3.5:35b-a3b                             26.0 GB Q4_K_M

lmstudio  http://localhost:1234
  ● qwen3.5-35b-a3b                                 MLX

asiai monitor

System and inference metrics snapshot, stored in SQLite.

$ asiai monitor

System
  Uptime:    3d 12h
  CPU Load:  2.45 / 3.12 / 2.89  (1m / 5m / 15m)
  Memory:    45.2 GB / 64.0 GB  71%
  Pressure:  normal
  Thermal:   nominal  (100%)

Inference  ollama 0.17.4
  Models loaded: 1  VRAM total: 26.0 GB

  Model                                        VRAM   Format  Quant
  ──────────────────────────────────────── ────────── ──────── ──────
  qwen3.5:35b-a3b                            26.0 GB     gguf Q4_K_M

Options:

-w, --watch SEC            Refresh every SEC seconds
-q, --quiet                Collect and store without output (for daemon use)
-H, --history PERIOD       Show history (e.g. 24h, 1h)
-a, --analyze HOURS        Comprehensive analysis with trends
-c, --compare TS TS        Compare two timestamps

asiai doctor

Diagnose installation, engines, system health, and database.

$ asiai doctor

Doctor

  System
    ✓ Apple Silicon       Mac Mini M4 Pro — Apple M4 Pro
    ✓ RAM                 64 GB total, 42% used
    ✓ Memory pressure     normal
    ✓ Thermal             nominal (100%)

  Engine
    ✓ Ollama              v0.17.4 — 1 model(s): qwen3.5:35b-a3b
    ✓ LM Studio           v0.4.5 — 1 model(s): qwen3.5-35b-a3b
    ✗ mlx-lm              not installed
    ✗ llama.cpp           not installed
    ✗ vllm-mlx            not installed

  Database
    ✓ SQLite              2.4 MB, last entry: 1m ago

  5 ok, 0 warning(s), 3 failed

asiai daemon

Background monitoring via macOS launchd. Collects metrics every minute.

asiai daemon start              # Install and start the daemon
asiai daemon start --interval 30  # Custom interval (seconds)
asiai daemon status             # Check if running
asiai daemon logs               # View recent logs
asiai daemon stop               # Stop and uninstall
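
For reference, a per-user LaunchAgent is the standard launchd mechanism for this kind of periodic job. A hedged, stdlib-only sketch of what the install step could look like; the label and file name are hypothetical, not taken from asiai's source:

import plistlib
import subprocess
from pathlib import Path

def install_agent(interval: int = 60) -> Path:
    plist = {
        "Label": "com.example.asiai.daemon",              # hypothetical label
        "ProgramArguments": ["asiai", "monitor", "--quiet"],
        "StartInterval": interval,                        # seconds between runs
        "RunAtLoad": True,
    }
    path = Path.home() / "Library/LaunchAgents/com.example.asiai.daemon.plist"
    path.write_bytes(plistlib.dumps(plist))
    subprocess.run(["launchctl", "load", str(path)], check=True)
    return path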

asiai web

Web dashboard with real-time monitoring, benchmark controls, and interactive charts. Requires pip install asiai[web].

asiai web                    # Opens browser at http://127.0.0.1:8899
asiai web --port 9000        # Custom port
asiai web --host 0.0.0.0     # Listen on all interfaces
asiai web --no-open          # Don't auto-open browser

Features: system overview, engine status, live benchmark with SSE progress, history charts, doctor checks, dark/light theme.

asiai tui

Interactive terminal dashboard with auto-refresh. Requires pip install asiai[tui].

asiai tui

Supported engines

Engine      Port    Install                          API
──────────  ──────  ───────────────────────────────  ─────────────────
Ollama      11434   brew install ollama              Native
LM Studio   1234    brew install --cask lm-studio    OpenAI-compatible
mlx-lm      8080    brew install mlx-lm              OpenAI-compatible
llama.cpp   8080    brew install llama.cpp           OpenAI-compatible
vllm-mlx    8000    pip install vllm-mlx             OpenAI-compatible

What it measures

Metric      Description
──────────  ──────────────────────────────────────────────────────────────────
tok/s       Generation speed (tokens/sec), excluding prompt processing (TTFT)
TTFT        Time to first token: prompt-processing latency
Power       GPU power draw in watts (sudo powermetrics)
tok/s/W     Energy efficiency: tokens per second per watt
Stability   Run-to-run variance: stable (CV < 5%), variable (5-10%), unstable (> 10%)
VRAM        GPU memory footprint (Ollama only)
Thermal     CPU throttling state and speed-limit percentage
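
Power comes from macOS's powermetrics tool, which requires root. A rough sketch of one way to sample it; the "GPU Power" line format varies across macOS releases, so the regex here is an assumption:

import re
import subprocess

def sample_gpu_watts(interval_ms: int = 1000) -> float | None:
    out = subprocess.run(
        ["sudo", "powermetrics", "--samplers", "gpu_power",
         "-i", str(interval_ms), "-n", "1"],
        capture_output=True, text=True, check=True,
    ).stdout
    match = re.search(r"GPU Power:\s*([\d.]+)\s*mW", out)
    return float(match.group(1)) / 1000 if match else None  # mW -> W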

All metrics stored in SQLite (~/.local/share/asiai/metrics.db) with 90-day retention and automatic regression detection.
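
The retention policy is a simple age-based prune. A sketch of the idea, assuming a metrics table with an ISO-8601 timestamp column; the schema is a guess, not asiai's actual one:

import sqlite3
from pathlib import Path

DB = Path.home() / ".local/share/asiai/metrics.db"

def prune(days: int = 90) -> int:
    """Delete rows older than the retention window; return rows removed."""
    with sqlite3.connect(DB) as conn:
        cur = conn.execute(
            "DELETE FROM metrics WHERE timestamp < datetime('now', ?)",
            (f"-{days} days",),
        )
        return cur.rowcount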

Benchmark methodology

Following MLPerf, SPEC CPU 2017, and NVIDIA GenAI-Perf standards:

  • Warmup: 1 non-timed generation per engine before measured runs
  • Runs: 3 iterations per prompt (configurable), median as primary metric
  • Sampling: temperature=0 (greedy decoding) for deterministic results
  • Power: Per-engine monitoring (not session-wide average)
  • Variance: Pooled intra-prompt stddev (isolates run-to-run noise)
  • Metadata: Engine version, model quantization, hardware chip, macOS version stored per result

See docs/benchmark-best-practices.md for the full conformance audit.
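
As a rough illustration of the statistics above (a simplification of the pooled intra-prompt stddev, assuming each run reports tokens, total duration, and TTFT): generation tok/s excludes prompt processing, the median is the headline number, and the coefficient of variation drives the stability label.

from statistics import median, pstdev

def summarize(runs: list[tuple[int, float, float]]) -> dict:
    # Each run: (generated_tokens, total_duration_s, ttft_s).
    speeds = [tok / (dur - ttft) for tok, dur, ttft in runs]
    med = median(speeds)
    cv = pstdev(speeds) / med if med else 0.0
    label = "stable" if cv < 0.05 else "variable" if cv < 0.10 else "unstable"
    return {"median_tok_s": round(med, 1), "cv": round(cv, 4), "stability": label}

summarize([(435, 6.20, 0.28), (433, 6.18, 0.27), (436, 6.23, 0.28)])
# -> median around 73.3 tok/s, stability "stable"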

Benchmark prompts

Four standardized prompts test different generation patterns:

Name        Tokens   Tests
──────────  ───────  ─────────────────────────────────────────────
code        512      Structured code generation (BST in Python)
tool_call   256      JSON function calling / instruction following
reasoning   384      Multi-step math problem
long_gen    1024     Sustained throughput (bash script)

Use --context-size 4k|16k|32k|64k to test with large context fill prompts instead.

Requirements

  • macOS on Apple Silicon (M1 / M2 / M3 / M4)
  • Python 3.11+
  • At least one inference engine running locally

Zero dependencies

The core uses only the Python standard library — urllib, sqlite3, subprocess, argparse. No requests, no psutil, no rich. Just stdlib.

Optional extras:

  • asiai[web] — FastAPI web dashboard with charts
  • asiai[tui] — Textual terminal dashboard
  • asiai[all] — Web + TUI
  • asiai[dev] — pytest, ruff

Roadmap

Version   Scope                                                                Status
────────  ───────────────────────────────────────────────────────────────────  ───────
v0.1      detect + bench + monitor + models (CLI, stdlib)                      Done
v0.2      mlx-lm + doctor + daemon + TUI (Textual)                             Done
v0.3      5 engines, power metrics, multi-run variance, regression detection   Done
v0.4      CI, MkDocs, export JSON, thermal drift, web dashboard                Done
v1.0      Multi-server, community export, Homebrew Core                        Planned

License

Apache 2.0
