Fleet manager for local LLM inference engines on Apple Silicon
asiai-inference-server
Status: v0.0.1 pre-alpha — skeleton only. Not yet functional. See the roadmap below.
asiai-inference-server is the control plane companion to
asiai (the observability/benchmark CLI). Where asiai
observes what's running on your Mac, this project manages it: install,
start, stop, unload, and orchestrate inference engines (llama.cpp,
Ollama, LM Studio, oMLX, TurboQuant, mlx-lm, vMLX, …) across one or
several Apple Silicon machines.
It also addresses a long-standing macOS pain point: engine memory that is
never freed because of the unified-memory compressor. Killing a process
doesn't release the VRAM. This tool combines per-engine unload APIs, a full
LaunchDaemon restart, and `sudo purge` to reclaim memory deterministically,
and it reports the delta actually measured, not a marketing promise.
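One way to picture the "measured delta" is to take a `vm_stat` snapshot before and after a reclaim step and subtract the free-page counts. The sketch below is only an illustration under that assumption, not the project's implementation: `vm_stat_bytes`, `reclaimed_bytes`, and the two sample snapshots are hypothetical, and the samples are trimmed to the lines the parser reads.

```python
import re

# Hypothetical, trimmed snapshots in the layout printed by `vm_stat`.
SAMPLE_BEFORE = """Mach Virtual Memory Statistics: (page size of 16384 bytes)
Pages free:                                1000.
Pages occupied by compressor:               900.
"""

SAMPLE_AFTER = """Mach Virtual Memory Statistics: (page size of 16384 bytes)
Pages free:                                1500.
Pages occupied by compressor:               200.
"""


def vm_stat_bytes(text: str, counter: str = "Pages free") -> int:
    """Return one page counter from `vm_stat` output, converted to bytes."""
    page = re.search(r"page size of (\d+) bytes", text)
    value = re.search(rf"^{re.escape(counter)}:\s+(\d+)\.", text, re.MULTILINE)
    if page is None or value is None:
        raise ValueError(f"could not read {counter!r} from vm_stat output")
    return int(page.group(1)) * int(value.group(1))


def reclaimed_bytes(before: str, after: str) -> int:
    """Free-memory delta between two snapshots (positive means memory was freed)."""
    return vm_stat_bytes(after) - vm_stat_bytes(before)


if __name__ == "__main__":
    delta = reclaimed_bytes(SAMPLE_BEFORE, SAMPLE_AFTER)
    print(f"reclaimed {delta / 1e6:.1f} MB")
```

On a real Mac the two snapshots could come from `subprocess.run(["vm_stat"], capture_output=True, text=True).stdout`, taken immediately before and after the unload or purge step.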
Why
After a year of running multi-engine LLM inference on Apple Silicon (MacBook M1 Max, Mac Mini M4 Pro, MacBook M5 Max), the operational gap became obvious:
- Installing or uninstalling an engine should not require chasing brews, plists and firewall rules across READMEs.
- Switching profiles ("coding agent on Qwen-Coder 32B" → "70B chat on TurboQuant") should be a single command, not five.
- Memory unload should actually free the VRAM, not let the macOS compressor sit on it.
- A fleet of Macs should be a single dashboard, not three SSH sessions.
- AI agents (Claude Code, Cursor, Windsurf) should be able to manage the fleet via MCP, not just observe it.
asiai-inference-server ships these one at a time, building on the
asiai observability stack.
Roadmap
| Version | Scope | Status |
|---|---|---|
| v0.0 | Repo skeleton + packaging | in progress |
| v0.1 | Install/uninstall/start/stop + unload + purge memory | next |
| v0.2 | Profile switching (TOML profiles, apply/rollback) | planned |
| v0.3 | Fleet manager (multi-Mac inventory, SSH dispatch) | planned |
| v0.4 | Web cockpit + optional HTTP agent | planned |
| v1.0 | MCP write tools + PyPI/Homebrew release | planned |
Architecture (high level)
- CLI: `aisctl <command>` (standalone) or `asiai engine <command>` (auto-injected sub-CLI when `asiai-inference-server` is installed alongside `asiai`).
- Python stdlib only for the core (consistent with asiai). Optional extras: `mcp` (for v1.0 write tools).
- macOS Apple Silicon only. We rely on `launchctl`, `vm_stat`, `sudo purge`, `pfctl`, `iogpu.wired_limit_mb`.
- SSH-first for fleet operations. Optional HTTP agent in v0.4 for agent-to-agent orchestration.
- TOML for human-edited files (engine manifests, profiles, fleet inventory). JSON for runtime state.
The full design rationale (architecture diagram, sequencing, file map,
risk mitigations) lives in the validated plan at
~/.claude/plans/iterative-wiggling-crystal.md.
License
Apache-2.0 © 2026 Jean-Marc Nahlovsky
File details
Details for the file asiai_inference_server-0.2.0.tar.gz.
File metadata
- Download URL: asiai_inference_server-0.2.0.tar.gz
- Upload date:
- Size: 164.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5442234a2d9779a6a4a22d424c282e1516ae97e810aeb5a1543bf022b95b7f57 |
| MD5 | fe4afc50f402a3178f3f49df685712b3 |
| BLAKE2b-256 | 5994e2ac57b29335f7cefd41011ffbb98381baf47f3edbc1b965ca504d2896c9 |
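A downloaded archive can be checked against the SHA256 digest listed above with `hashlib`. The snippet is a sketch, not part of the project: `sha256_of` and `matches_expected` are hypothetical helper names, and the file path passed in is whatever location the archive was saved to.

```python
import hashlib
import hmac

# SHA256 digest published for asiai_inference_server-0.2.0.tar.gz.
EXPECTED_SHA256 = "5442234a2d9779a6a4a22d424c282e1516ae97e810aeb5a1543bf022b95b7f57"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Compare the computed digest with the published one."""
    return hmac.compare_digest(sha256_of(path), expected.lower())
```

For example, `matches_expected("asiai_inference_server-0.2.0.tar.gz")` should return `True` only if the local file is byte-for-byte identical to the published one.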
Provenance
The following attestation bundles were made for asiai_inference_server-0.2.0.tar.gz:
Publisher: release.yml on druide67/asiai-inference-server
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: asiai_inference_server-0.2.0.tar.gz
  - Subject digest: 5442234a2d9779a6a4a22d424c282e1516ae97e810aeb5a1543bf022b95b7f57
  - Sigstore transparency entry: 1673151711
  - Sigstore integration time:
  - Permalink: druide67/asiai-inference-server@9637256148d2239a7e8763d3d4b4e7f21b43a8ed
  - Branch / Tag: refs/tags/v0.2.0
  - Owner: https://github.com/druide67
  - Access: public
  - Token Issuer: https://token.actions.githubusercontent.com
  - Runner Environment: github-hosted
  - Publication workflow: release.yml@9637256148d2239a7e8763d3d4b4e7f21b43a8ed
  - Trigger Event: push
File details
Details for the file asiai_inference_server-0.2.0-py3-none-any.whl.
File metadata
- Download URL: asiai_inference_server-0.2.0-py3-none-any.whl
- Upload date:
- Size: 110.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e683fea9601328e367f230a9b247b4735fb0f31433a507a9f01c789a6387f1b7 |
| MD5 | 395a429c4a364a9a422d0be084b130ed |
| BLAKE2b-256 | df15cd97f468884968b90382e2ca618479938bdbcb8610830d7a4a316a5e6db2 |
Provenance
The following attestation bundles were made for asiai_inference_server-0.2.0-py3-none-any.whl:
Publisher: release.yml on druide67/asiai-inference-server
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: asiai_inference_server-0.2.0-py3-none-any.whl
  - Subject digest: e683fea9601328e367f230a9b247b4735fb0f31433a507a9f01c789a6387f1b7
  - Sigstore transparency entry: 1673151753
  - Sigstore integration time:
  - Permalink: druide67/asiai-inference-server@9637256148d2239a7e8763d3d4b4e7f21b43a8ed
  - Branch / Tag: refs/tags/v0.2.0
  - Owner: https://github.com/druide67
  - Access: public
  - Token Issuer: https://token.actions.githubusercontent.com
  - Runner Environment: github-hosted
  - Publication workflow: release.yml@9637256148d2239a7e8763d3d4b4e7f21b43a8ed
  - Trigger Event: push