Fleet manager for local LLM inference engines on Apple Silicon

Maintainers

druide67

Project description

asiai-inference-server

Fleet manager for local LLM inference engines on Apple Silicon.

Status: v0.0.1 pre-alpha — skeleton only. Not yet functional. See the roadmap below.

asiai-inference-server is the control plane companion to asiai (the observability/benchmark CLI). Where asiai observes what's running on your Mac, this project manages it: install, start, stop, unload, and orchestrate inference engines (llama.cpp, Ollama, LM Studio, oMLX, TurboQuant, mlx-lm, vMLX, …) across one or several Apple Silicon machines.

It also aims to fix a long-standing macOS pain point: engine memory that is never freed, which the project attributes to the unified-memory compressor. Killing a process does not reliably release the memory the engine had claimed for the GPU (Apple Silicon has no separate VRAM; engines draw on unified memory). This tool combines per-engine unload APIs, a full LaunchDaemon restart, and sudo purge to reclaim memory deterministically, and it reports the delta actually measured rather than a marketing promise.
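One way to picture "the actual delta measured" is to sample `vm_stat` before and after a reclaim step and compare free memory. The sketch below is an illustration only, not the project's implementation: the function names are hypothetical, and treating the `Pages free` counter as the quantity to compare is an assumption.

```python
import re
import subprocess

# vm_stat prints the page size in its header line and one counter per line,
# e.g. "Pages free:      12345."
_PAGE_SIZE_RE = re.compile(r"page size of (\d+) bytes")
_COUNTER_RE = re.compile(r"^(?P<name>[^:]+):\s+(?P<value>\d+)\.?\s*$")


def parse_vm_stat(text: str) -> tuple[int, dict[str, int]]:
    """Return (page_size_in_bytes, {counter name: raw count})."""
    match = _PAGE_SIZE_RE.search(text)
    if match is None:
        raise ValueError("page size not found in vm_stat output")
    counters: dict[str, int] = {}
    for line in text.splitlines():
        m = _COUNTER_RE.match(line.strip())
        if m:
            # Some counter names are quoted in vm_stat output; drop the quotes.
            counters[m.group("name").strip().strip('"')] = int(m.group("value"))
    return int(match.group(1)), counters


def free_bytes(text: str) -> int:
    """Free memory in bytes according to one vm_stat sample."""
    page_size, counters = parse_vm_stat(text)
    return counters["Pages free"] * page_size


def reclaimed_bytes(before: str, after: str) -> int:
    """Positive when more memory is free in `after` than in `before`."""
    return free_bytes(after) - free_bytes(before)


def sample_vm_stat() -> str:
    """Run vm_stat on the local Mac and return its raw output."""
    return subprocess.run(
        ["vm_stat"], capture_output=True, text=True, check=True
    ).stdout
```

In use, a caller would take `sample_vm_stat()` once, run the unload, restart, or purge step, take a second sample, and report `reclaimed_bytes(before, after)`.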

Why

After a year of running multi-engine LLM inference on Apple Silicon (MacBook M1 Max, Mac Mini M4 Pro, MacBook M5 Max), the operational gap became obvious:

Installing or uninstalling an engine should not require chasing brews, plists and firewall rules across READMEs.
Switching profiles ("coding agent on Qwen-Coder 32B" → "70B chat on TurboQuant") should be a single command, not five.
Memory unload should actually free the VRAM, not let the macOS compressor sit on it.
A fleet of Macs should be a single dashboard, not three SSH sessions.
AI agents (Claude Code, Cursor, Windsurf) should be able to manage the fleet via MCP, not just observe it.

asiai-inference-server ships these one at a time, building on the asiai observability stack.

Roadmap

Version	Scope	Status
v0.0	Repo skeleton + packaging	in progress
v0.1	Install/uninstall/start/stop + unload + purge memory	next
v0.2	Profile switching (TOML profiles, apply/rollback)	planned
v0.3	Fleet manager (multi-Mac inventory, SSH dispatch)	planned
v0.4	Web cockpit + optional HTTP agent	planned
v1.0	MCP write tools + PyPI/Homebrew release	planned
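The v0.3 row (multi-Mac inventory, SSH dispatch) could look roughly like the sketch below. It is not taken from the project: the `Host` type, the function names, and the choice of `BatchMode` and `ConnectTimeout` options are all assumptions made for illustration. It only builds and runs plain `ssh` invocations, one host at a time.

```python
import shlex
import subprocess
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str    # label used in the (hypothetical) fleet inventory
    target: str  # anything ssh accepts as a destination, e.g. "user@host"


def ssh_argv(host: Host, remote_argv: list[str], timeout_s: int = 5) -> list[str]:
    """Build the local argv that runs `remote_argv` on `host` over ssh."""
    return [
        "ssh",
        "-o", "BatchMode=yes",                # fail instead of prompting for a password
        "-o", f"ConnectTimeout={timeout_s}",  # do not hang on an unreachable Mac
        host.target,
        shlex.join(remote_argv),              # quote arguments for the remote shell
    ]


def dispatch(hosts: list[Host], remote_argv: list[str]) -> dict[str, subprocess.CompletedProcess]:
    """Run the same command on every host sequentially and collect the results."""
    results = {}
    for host in hosts:
        results[host.name] = subprocess.run(
            ssh_argv(host, remote_argv), capture_output=True, text=True
        )
    return results
```

For example, `dispatch(hosts, ["vm_stat"])` would gather a memory sample from each Mac; the caller inspects `returncode` and `stdout` per host rather than assuming every machine answered.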

Architecture (high level)

CLI: aisctl <command> (standalone) or asiai engine <command> (auto-injected sub-CLI when asiai-inference-server is installed alongside asiai).
Python stdlib only for the core (consistent with asiai). Optional extras: mcp (for v1.0 write tools).
macOS on Apple Silicon only. We rely on launchctl, vm_stat, sudo purge, pfctl, and the iogpu.wired_limit_mb sysctl.
SSH-first for fleet operations. Optional HTTP agent in v0.4 for agent-to-agent orchestration.
TOML for human-edited files (engine manifests, profiles, fleet inventory). JSON for runtime state.

The full design rationale (architecture diagram, sequencing, file map, risk mitigations) lives in the validated plan at ~/.claude/plans/iterative-wiggling-crystal.md.

License

Release history

Version	Date
0.2.0	May 29, 2026
0.1.0	May 28, 2026 (this version)

Download files

Source Distribution

asiai_inference_server-0.1.0.tar.gz (161.0 kB)

Uploaded May 28, 2026 Source

Built Distribution

asiai_inference_server-0.1.0-py3-none-any.whl (107.5 kB)

Uploaded May 28, 2026 Python 3

File details

Details for the file asiai_inference_server-0.1.0.tar.gz.

File metadata

Download URL: asiai_inference_server-0.1.0.tar.gz
Upload date: May 28, 2026
Size: 161.0 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for asiai_inference_server-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`cc7be26410920f0ee8e444a9b7121fe588894ac2c265709684f09b34af2d1570`
MD5	`5b96a5cf6297bf6d865e041233f68f5d`
BLAKE2b-256	`599b6d827912053dae8c7459c23007e9e8b4d98a1cdeeb0ba2ed281f311dbecc`

Provenance

The following attestation bundles were made for asiai_inference_server-0.1.0.tar.gz:

Publisher: release.yml on druide67/asiai-inference-server

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: asiai_inference_server-0.1.0.tar.gz
- Subject digest: cc7be26410920f0ee8e444a9b7121fe588894ac2c265709684f09b34af2d1570
- Sigstore transparency entry: 1659940182
- Sigstore integration time: May 28, 2026
Source repository:
- Permalink: druide67/asiai-inference-server@0c0807fcd579565ff895929a8d33567ed940e7fb
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/druide67
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@0c0807fcd579565ff895929a8d33567ed940e7fb
- Trigger Event: push

File details

Details for the file asiai_inference_server-0.1.0-py3-none-any.whl.

File metadata

Download URL: asiai_inference_server-0.1.0-py3-none-any.whl
Upload date: May 28, 2026
Size: 107.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for asiai_inference_server-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c74eefa54b3fba2c1bc7824ec63cb0c768ac27eed3d5031c482531b31ad35fce`
MD5	`cb13be8d9719f643d4e01451820e88de`
BLAKE2b-256	`5f7b5bcf456490da1b663a370ae083e66f6e7adb3d07a6ebc59ea2a67efec3be`

Provenance

The following attestation bundles were made for asiai_inference_server-0.1.0-py3-none-any.whl:

Publisher: release.yml on druide67/asiai-inference-server

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: asiai_inference_server-0.1.0-py3-none-any.whl
- Subject digest: c74eefa54b3fba2c1bc7824ec63cb0c768ac27eed3d5031c482531b31ad35fce
- Sigstore transparency entry: 1659940224
- Sigstore integration time: May 28, 2026
Source repository:
- Permalink: druide67/asiai-inference-server@0c0807fcd579565ff895929a8d33567ed940e7fb
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/druide67
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@0c0807fcd579565ff895929a8d33567ed940e7fb
- Trigger Event: push
