

asiai-inference-server

Fleet manager for local LLM inference engines on Apple Silicon.

Status: v0.0.1 pre-alpha — skeleton only. Not yet functional. See the roadmap below.

asiai-inference-server is the control-plane companion to asiai (the observability/benchmark CLI). Where asiai observes what's running on your Mac, this project manages it: install, start, stop, unload, and orchestrate inference engines (llama.cpp, Ollama, LM Studio, oMLX, TurboQuant, mlx-lm, vMLX, …) across one or several Apple Silicon machines.
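These verbs map naturally onto a subcommand CLI. Below is a minimal argparse sketch. It assumes a hypothetical verb list taken from the v0.1 roadmap scope and a hypothetical engine positional argument. It is not the actual aisctl interface, which is not functional yet.

```python
import argparse

# Hypothetical verbs and help text, mirroring the v0.1 roadmap scope.
# This is an illustration, not the real aisctl command surface.
ENGINE_VERBS = {
    "install": "install an inference engine",
    "uninstall": "uninstall an inference engine",
    "start": "start an engine's service",
    "stop": "stop an engine's service",
    "unload": "unload the models an engine holds in memory",
}


def build_parser() -> argparse.ArgumentParser:
    """Build an aisctl-style parser with one subcommand per verb."""
    parser = argparse.ArgumentParser(prog="aisctl")
    sub = parser.add_subparsers(dest="verb", required=True)
    for verb, help_text in ENGINE_VERBS.items():
        cmd = sub.add_parser(verb, help=help_text)
        # Engine-scoped verbs take the engine name, e.g. "ollama" or "llama.cpp".
        cmd.add_argument("engine")
    # purge acts on the whole machine, so it takes no engine argument.
    sub.add_parser("purge", help="reclaim memory machine-wide")
    return parser
```

For example, build_parser().parse_args(["stop", "ollama"]) yields verb="stop" and engine="ollama". purge takes no engine because it acts on the whole machine.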

It also addresses a long-standing macOS pain point: engine memory that never gets freed because the unified-memory compressor holds on to it. Killing a process doesn't release the VRAM. This tool combines per-engine unload APIs, a full LaunchDaemon restart, and sudo purge to reclaim memory deterministically. It then reports the memory delta it actually measured, not a marketing promise.
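Reporting a measured delta means sampling memory before and after the reclaim steps. Below is a minimal sketch that parses macOS vm_stat output. The helper names are hypothetical. Treating free plus speculative pages as "reclaimed" memory is also an assumption, not necessarily the metric asiai-inference-server reports.

```python
import re
import subprocess
from typing import Callable

# Header looks like: "Mach Virtual Memory Statistics: (page size of 16384 bytes)"
PAGE_SIZE_RE = re.compile(r"page size of (\d+) bytes")
# Counter lines look like 'Pages free:   12345.'; a few keys are quoted.
COUNTER_RE = re.compile(r'^"?([^":]+)"?:\s+(\d+)\.?$')


def parse_vm_stat(text: str) -> tuple[int, dict[str, int]]:
    """Return (page size in bytes, {counter name: page count}) from vm_stat output."""
    lines = text.strip().splitlines()
    header = PAGE_SIZE_RE.search(lines[0]) if lines else None
    if header is None:
        raise ValueError("unexpected vm_stat header")
    counters: dict[str, int] = {}
    for line in lines[1:]:
        match = COUNTER_RE.match(line.strip())
        if match:
            counters[match.group(1).strip()] = int(match.group(2))
    return int(header.group(1)), counters


def free_bytes(page_size: int, counters: dict[str, int]) -> int:
    """Approximate immediately reusable memory (assumption: free + speculative pages)."""
    pages = counters.get("Pages free", 0) + counters.get("Pages speculative", 0)
    return pages * page_size


def snapshot() -> int:
    """Run vm_stat (macOS only) and return the approximate free bytes."""
    out = subprocess.run(["vm_stat"], capture_output=True, text=True, check=True).stdout
    return free_bytes(*parse_vm_stat(out))


def measure_reclaim(reclaim: Callable[[], None]) -> int:
    """Run one reclaim step (e.g. an unload call, a daemon restart, or sudo purge) and return the byte delta."""
    before = snapshot()
    reclaim()
    return snapshot() - before
```

On a Mac, measure_reclaim(lambda: subprocess.run(["sudo", "purge"], check=True)) returns the change in approximate free bytes. A positive value means memory was freed.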

Why

After a year of running multi-engine LLM inference on Apple Silicon (MacBook M1 Max, Mac mini M4 Pro, MacBook M5 Max), the operational gap became obvious:

  • Installing or uninstalling an engine should not require chasing Homebrew packages, plists, and firewall rules across READMEs.
  • Switching profiles ("coding agent on Qwen-Coder 32B" → "70B chat on TurboQuant") should be a single command, not five.
  • Unloading a model should actually free the VRAM, not leave it held by the macOS compressor.
  • A fleet of Macs should be managed from a single dashboard, not three SSH sessions.
  • AI agents (Claude Code, Cursor, Windsurf) should be able to manage the fleet via MCP, not just observe it.

asiai-inference-server ships these one at a time, building on the asiai observability stack.

Roadmap

Version  Scope                                                 Status
v0.0     Repo skeleton + packaging                             in progress
v0.1     Install/uninstall/start/stop + unload + purge memory  next
v0.2     Profile switching (TOML profiles, apply/rollback)     planned
v0.3     Fleet manager (multi-Mac inventory, SSH dispatch)     planned
v0.4     Web cockpit + optional HTTP agent                     planned
v1.0     MCP write tools + PyPI/Homebrew release               planned

Architecture (high level)

  • CLI: aisctl <command> (standalone) or asiai engine <command> (auto-injected sub-CLI when asiai-inference-server is installed alongside asiai).
  • Python stdlib only for the core (consistent with asiai). Optional extras: mcp (for the v1.0 write tools).
  • macOS on Apple Silicon only. We rely on launchctl, vm_stat, sudo purge, pfctl, and the iogpu.wired_limit_mb sysctl.
  • SSH-first for fleet operations. Optional HTTP agent in v0.4 for agent-to-agent orchestration.
  • TOML for human-edited files (engine manifests, profiles, fleet inventory). JSON for runtime state.

The full design rationale (architecture diagram, sequencing, file map, risk mitigations) lives in the validated plan at ~/.claude/plans/iterative-wiggling-crystal.md.

License

Apache-2.0 © 2026 Jean-Marc Nahlovsky



Download files


Source Distribution

asiai_inference_server-0.2.0.tar.gz (164.9 kB)

Uploaded Source

Built Distribution


asiai_inference_server-0.2.0-py3-none-any.whl (110.7 kB)

Uploaded Python 3

File details

Details for the file asiai_inference_server-0.2.0.tar.gz.

File metadata

  • Download URL: asiai_inference_server-0.2.0.tar.gz
  • Upload date:
  • Size: 164.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for asiai_inference_server-0.2.0.tar.gz
Algorithm    Hash digest
SHA256       5442234a2d9779a6a4a22d424c282e1516ae97e810aeb5a1543bf022b95b7f57
MD5          fe4afc50f402a3178f3f49df685712b3
BLAKE2b-256  5994e2ac57b29335f7cefd41011ffbb98381baf47f3edbc1b965ca504d2896c9


Provenance

The following attestation bundles were made for asiai_inference_server-0.2.0.tar.gz:

Publisher: release.yml on druide67/asiai-inference-server

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file asiai_inference_server-0.2.0-py3-none-any.whl.

File hashes

Hashes for asiai_inference_server-0.2.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       e683fea9601328e367f230a9b247b4735fb0f31433a507a9f01c789a6387f1b7
MD5          395a429c4a364a9a422d0be084b130ed
BLAKE2b-256  df15cd97f468884968b90382e2ca618479938bdbcb8610830d7a4a316a5e6db2


Provenance

The following attestation bundles were made for asiai_inference_server-0.2.0-py3-none-any.whl:

Publisher: release.yml on druide67/asiai-inference-server

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
