Skip to main content

Unified local inference API server for LLM agents on Apple Silicon (MLX): chat, embedding, image, music, sound-effect, and video — one process, one global memory budget, one single-flight queue, unified jobs and files.

Project description

kiapi

License: MIT Python Platform API Docs

English | 日本語

Summary

kiapi is an API server for providing the following capabilities to LLM agents on a Mac Studio M4 Max 128GB.

  • Chat:
    • OpenAI Chat Completions API compatible
    • text + image + audio + video input support
    • tool call + tool choice (auto, any, specific) + parallel tool calls + streaming support
  • Embedding:
    • text + image input support
  • Image generation:
    • text2image, image2image, image editing, and LoRA training support
  • Music and sound-effect generation:
    • text2audio, cover, repaint, and extract support
  • Video generation:
    • text2video, image2video, and audio2video support
  • Web:
    • search + fetch support

To provide every capability stably from a single PC, kiapi has these properties.

  • GPU work is queued and executed one job at a time
  • Application memory is managed to avoid overcommit failures

kiapi is also designed so LLM agents can understand and operate its capabilities.

  • The API server can explain how to use itself to an LLM
  • Asynchronous task progress can be observed
  • Generation tasks can run in both sync / async modes

[!IMPORTANT]

kiapi itself is MIT-licensed OSS, but the packages and models it provides have various licenses. Before use, check the dependency packages and models on each capability page and confirm the license of the model you use.

Model and Dependency Licenses

The table below summarizes the upstream licenses for the default models and runtime resources that kiapi can activate. It is a convenience checklist, not legal advice. License labels and gating status can change upstream, so always check the linked source before commercial use, redistribution, or offering a hosted service.

Review date: 2026-06-23.

Domain Family Resource Kind Upstream license Notes
chat chat mlx-community/Qwen3-Omni-30B-A3B-Instruct-4bit model weights Apache-2.0 MLX-converted Qwen3 Omni model.
chat chat mlx-community/Qwen3.6-27B-4bit model weights Apache-2.0 MLX-converted Qwen3.6 model.
embedding embedding mlx-community/Qwen3-Embedding-8B-mxfp8 model weights Apache-2.0 Text embedding model.
embedding embedding mlx-community/Qwen3-VL-Embedding-2B-mxfp8 model weights Apache-2.0 Text + image embedding model.
image zimage filipstrand/Z-Image-Turbo-mflux-4bit model weights Tongyi Qianwen License Quantized MLX-compatible Z-Image Turbo; inherits the original Z-Image Turbo license.
image zimage Tongyi-MAI/Z-Image model weights Apache-2.0 Base Z-Image model.
image flux2 black-forest-labs/FLUX.2-klein-9B model weights FLUX Non-Commercial License Gated upstream model. Confirm terms before any commercial use.
image flux2 black-forest-labs/FLUX.2-klein-base-4B model weights Apache-2.0 Open-weight FLUX.2 Klein Base 4B variant.
image flux2 black-forest-labs/FLUX.2-klein-base-9B model weights FLUX Non-Commercial License Gated upstream model. Confirm terms before any commercial use.
image qwen Qwen/Qwen-Image model weights Apache-2.0 Text-to-image model.
image qwen Qwen/Qwen-Image-Edit-2509 model weights Apache-2.0 Image editing model.
image ideogram4 ideogram-ai/ideogram-4-fp8 model weights Ideogram Non-Commercial Model Agreement Gated upstream model. Confirm hosted-service and commercial-use terms.
image ernie baidu/ERNIE-Image-Turbo model weights Apache-2.0 Turbo ERNIE-Image variant.
image ernie baidu/ERNIE-Image model weights Apache-2.0 Base ERNIE-Image variant.
image seedvr2 numz/SeedVR2_comfyUI model weights Apache-2.0 SeedVR2 3B and 7B upscaling checkpoints.
image depthpro apple/ml-depth-pro / depth_pro.pt code + model file Apple custom license GitHub reports NOASSERTION; review Apple's license text before redistribution or commercial use.
audio acestep ace-step/ACE-Step-1.5 Python package MIT Installed into the ACE-Step dedicated venv.
audio acestep ACE-Step/Ace-Step1.5 shared checkpoints MIT Shared ACE-Step 1.5 checkpoint resources.
audio acestep ACE-Step/acestep-v15-xl-base model weights MIT Extra XL base checkpoint used by xl-base.
audio audiogen facebook/audiogen-medium model weights CC-BY-NC-4.0 Non-commercial license.
video ltx2 Blaizzy/mlx-video Python package MIT Installed from a pinned Git commit for LTX-2 inference.
video ltx2 prince-canuma/LTX-2-distilled model weights Not declared upstream The model card has no license metadata; verify rights before use.
web web searxng/searxng / searxng/searxng:latest Docker image AGPL-3.0 Web search backend. AGPL obligations can matter for network services.
web web unclecode/crawl4ai / unclecode/crawl4ai:latest Docker image Apache-2.0 Web fetch backend.

API

Domain Family Endpoint Description
chat POST /v1/chat Chat API details
embedding POST /v1/embedding Embedding API details
image zimage POST /v1/image/zimage Z-Image API details
flux2 POST /v1/image/flux2 FLUX.2 API details
qwen POST /v1/image/qwen Qwen Image API details
ideogram4 POST /v1/image/ideogram4 Ideogram 4 API details
ernie POST /v1/image/ernie ERNIE-Image API details
seedvr2 POST /v1/image/seedvr2 SeedVR2 API details
depthpro POST /v1/image/depthpro Depth Pro API details
audio acestep POST /v1/audio/acestep ACE-Step API details
audiogen POST /v1/audio/audiogen AudioGen API details
video ltx2 POST /v1/video/ltx2 LTX-2 API details
web POST /v1/web Web API details
core files POST /v1/files Upload input files, LoRA adapters, and other files, then issue a file_id.
GET /v1/files Return a list of stored files.
GET /v1/files/{file_id} Return file metadata.
GET /v1/files/{file_id}/download Download the file body.
DELETE /v1/files/{file_id} Delete a stored file.
jobs GET /v1/jobs Return a list of generation jobs.
GET /v1/jobs/{job_id} Return job status, progress, result, and artifact file_ids.
DELETE /v1/jobs/{job_id} Remove a job from the job store. Running jobs are not interrupted.
openapi GET /openapi.json Return the common API and each capability documentation URL.
GET /v1/{domain}/{family}/openapi.json Return detailed input/output specs, usage, tips, and examples for each family.
health GET /health Return server status, warmup status, queue length, and memory usage.

API Docs

Requirements

  • macOS / Apple Silicon
  • Python >=3.12,<3.13
  • uv (optional, recommended for isolated CLI installs and faster venv/package setup in kiapi activate)
  • mise (used for development)
  • Docker (when using the Web capability)
  • Enough disk capacity for model weights and Docker images

kiapi is developed mainly for personal use on a Mac Studio M4 Max 128GB. Some or all features may work on other Apple Silicon environments, but they are not the primary verification target.

The memory budget can be specified with KIAPI_MEMORY_LIMIT_GB. If omitted, kiapi automatically uses 80% of installed memory as the effective budget on startup. If a model's required memory does not fit in that budget, requests return 503 as an insufficient memory budget error.

kiapi activate --all uses a little under 600GB of disk capacity, including model weights and Docker images. At first, it is recommended to use kiapi activate to set up only the capabilities you need.

Quick Start

From installation to agent integration:

# Install kiapi itself
python3.12 -m pip install --upgrade kiapi  # when uv is unavailable
uv tool install --python 3.12 kiapi  # when uv is available

# Change default host, port, or memory budget if needed
kiapi config init
kiapi config edit

# Check setup status
kiapi status

# Explicit setup for model weights, Docker images, and dedicated venvs
kiapi activate  # select targets from the displayed list
kiapi activate --all  # set up everything (a little under 600GB)
kiapi activate --family acestep  # set up only one family

# Verify behavior
kiapi check  # select targets from the displayed list
kiapi check --all  # check everything

# Start the API server
kiapi run  # starts on 127.0.0.1:8000
kiapi run --host 0.0.0.0 --port 8500  # specify host and port

# Example agent integration
codex e "
Please understand http://localhost:8000/openapi.json.
Using the music generation API, generate a 20-second BGM themed 'a person walking in the rain' at ~/Downloads/bgm.wav.
"

# Check the generated file
open ~/Downloads/bgm.wav

Run as a background service:

# Register the service
kiapi service install

# Start the service
kiapi service start

# Check service status and log tail
kiapi service status

# Stop the service
kiapi service stop

# Remove the service
kiapi service uninstall

Architecture

[!NOTE]

For kiapi architecture details, see ARCHITECTURE.md.

Local Storage

kiapi mainly writes to these local paths at runtime.

Purpose Setting Default Notes
Files API uploads, generated artifacts, and URL/data URL inputs KIAPI_FILES_ROOT /tmp/kiapi/files Storage referenced by file_id. The default may disappear after OS reboot or tmp cleanup. Use ~/.kiapi/files or external storage for long-term retention.
Temporary working directories during request processing KIAPI_TMP_ROOT /tmp/kiapi/work Used for chat/embedding input expansion, generation intermediates, LoRA training work, and similar tasks.
Web backend subprocess logs KIAPI_WEB_BACKEND_LOG_DIR /tmp/kiapi/logs/web stdout/stderr for SearXNG / Crawl4AI Docker subprocesses.
ACE-Step dedicated venv / project / checkpoints KIAPI_ACESTEP_PYTHON_PATH, KIAPI_ACESTEP_PROJECT_ROOT, KIAPI_ACESTEP_CHECKPOINT_DIR KIAPI_USER_DATA_DIR or acestep/ under the platformdirs user data dir When python_path, project_root, and checkpoint_dir are omitted, kiapi places the ACE-Step venv and checkpoints under a persistent ACE-Step directory.

Other model weights and library caches are managed by Hugging Face, mflux, Docker, or each library/tool. kiapi generally does not move them into its own storage location.

Project Status

kiapi is OSS developed mainly for personal use. The API, supported models, and setup steps may change in the future.

Issues and Pull Requests are welcome, but this is a personal project and support is best-effort.

Security

By default, kiapi run starts on 127.0.0.1:8000. When --host 0.0.0.0 is specified, the server may be reachable from other machines, so use it only on trusted networks.

Development

# Install dependencies, download test data, and create the venv environment
make init

# Sync dependencies
make update

# Upgrade dependencies
make upgrade

# ... implement

# Format, type-check, and regenerate documentation under public/
make

# unit test
make test

# Start the development server (auto-reload supported)
make dev

# GPU feature tests / regression tests
make verify  # run all
make verify-fast  # run only light tests for all capabilities
make verify-one  # run one capability

Release

kiapi releases follow the same flow as pydantic-settings-manager: update the version, update the changelog, then push a tag that triggers GitHub Release and PyPI publishing.

# Update the version and release entry in CHANGELOG.md
make bump-version

# Or pass the version explicitly
mise run bump-version 0.2.0

# Local verification
make test
make
make build

# Release commit and tag
git add pyproject.toml CHANGELOG.md
git commit -m "chore(release): prepare v0.2.0"
git tag v0.2.0
git push origin main --tags

When a v*.*.* tag is pushed, the GitHub Actions release workflow builds the package, extracts release notes from CHANGELOG.md, creates a GitHub Release, and publishes to PyPI.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

kiapi-0.1.0.tar.gz (307.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

kiapi-0.1.0-py3-none-any.whl (464.5 kB view details)

Uploaded Python 3

File details

Details for the file kiapi-0.1.0.tar.gz.

File metadata

  • Download URL: kiapi-0.1.0.tar.gz
  • Upload date:
  • Size: 307.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for kiapi-0.1.0.tar.gz
Algorithm Hash digest
SHA256 2179d2e9a6d013251fdc8e4f7dcf791d47517b7f76fb31be93d08ad30761e73a
MD5 f441c657da14971aba4d1e26522cb2d8
BLAKE2b-256 ac07d1d6593b8ac1267b93900e2646156054568c70bc1299f8dd22c536214207

See more details on using hashes here.

Provenance

The following attestation bundles were made for kiapi-0.1.0.tar.gz:

Publisher: release-pypi.yml on kiarina/kiapi

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file kiapi-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: kiapi-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 464.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for kiapi-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 78a963cf38ddcb969ebf71372bc011565502381e427bb5db49ab98408b922fb4
MD5 cefa6445cb40d27b8d22f2524bcb47f3
BLAKE2b-256 9db50f08cfc761b6857b452978f0369aca7500d51253ced8993079fd7d5b0b73

See more details on using hashes here.

Provenance

The following attestation bundles were made for kiapi-0.1.0-py3-none-any.whl:

Publisher: release-pypi.yml on kiarina/kiapi

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page