Self-hostable, OpenAI-compatible multi-modality AI server: 15 modalities (chat, image, audio, video, embeddings, segmentation, rerank, and more) with plugin runtimes (PyTorch, diffusers, llama-cpp-python).
# Muse
Model-agnostic multi-modality generation server. OpenAI-compatible HTTP is the canonical interface:

- text-to-speech on `/v1/audio/speech`
- speech-to-text on `/v1/audio/transcriptions` and `/v1/audio/translations`
- text-to-music on `/v1/audio/music` and text-to-sound-effects on `/v1/audio/sfx`
- text-to-image on `/v1/images/generations`, image inpainting on `/v1/images/edits`, image variations on `/v1/images/variations`
- image-to-image super-resolution on `/v1/images/upscale`
- promptable segmentation on `/v1/images/segment`
- text-to-animation on `/v1/images/animations`
- text-to-video on `/v1/video/generations`
- image-to-vector on `/v1/images/embeddings`
- audio-to-vector on `/v1/audio/embeddings`
- text-to-vector on `/v1/embeddings`
- text-to-text (LLM, tool calls, streaming) on `/v1/chat/completions`
- text moderation/classification on `/v1/moderations`
- text rerank (Cohere-compat) on `/v1/rerank`
- text summarization (Cohere-compat) on `/v1/summarize`
Modality tags are MIME-style (`audio/embedding`, `audio/generation`, `audio/speech`, `audio/transcription`, `chat/completion`, `embedding/text`, `image/animation`, `image/embedding`, `image/generation`, `image/segmentation`, `image/upscale`, `text/classification`, `text/rerank`, `text/summarization`, `video/generation`).
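These tags are the filtering vocabulary across the CLI; for example, using the `--modality` flags documented under CLI below:

```bash
# Filter the local catalog and HuggingFace search by modality tag
muse models list --modality audio/speech
muse search qwen3 --modality chat/completion --max-size-gb 10
```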
Three ways to add a model, in order of how often you'll reach for them:

- Pull a GGUF or sentence-transformers model from HuggingFace by URI. No script, no edits:

  ```bash
  muse search qwen3 --modality chat/completion --max-size-gb 10
  muse pull hf://Qwen/Qwen3-8B-GGUF@q4_k_m
  ```

- Drop a `.py` script into `~/.muse/models/` for a one-off model with custom code (see `docs/MODEL_SCRIPTS.md` and the sketch after this list).
- Add a whole new modality (rare) by dropping a subpackage into `src/muse/modalities/` or `$MUSE_MODALITIES_DIR`. The subpackage exports `MODALITY` + `build_router` and discovery picks it up. Optional: drop a `hf.py` next to `__init__.py` exporting an `HF_PLUGIN` dict; muse's HF resolver picks it up the same way and `muse search` / `muse pull hf://...` work for the new modality.
All three surfaces are discovered at runtime; there is no hardcoded catalog, no allowlist, and no registration calls.
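For the second surface, the exact script contract lives in `docs/MODEL_SCRIPTS.md`; per the Architecture notes below, a script pairs a `MANIFEST` with a `Model` class. A purely illustrative sketch — the field and method names here are hypothetical, not the documented contract:

```python
# ~/.muse/models/my_tts.py -- illustrative sketch only; field and method
# names are hypothetical, the real contract is in docs/MODEL_SCRIPTS.md.
MANIFEST = {
    "id": "my-tts",              # hypothetical catalog id
    "modality": "audio/speech",  # one of the MIME-style tags above
    "requirements": ["torch"],   # hypothetical: deps for this model's venv
}

class Model:
    """Loaded once per worker process; would serve /v1/audio/speech."""

    def __init__(self) -> None:
        self.engine = None  # load weights here, once per worker

    def infer(self, text: str) -> bytes:
        raise NotImplementedError("return WAV bytes for the given text")
```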
The CLI is deliberately admin-only (`serve`, `pull`, `search`, `models`). Generation is reached via the HTTP API, consumed by Python clients, curl, or future wrappers like `muse mcp`.
## Install

```bash
pip install -e ".[server,audio,images]"
```

Optional extras:

- `audio`: PyTorch + transformers for TTS backends
- `audio-kokoro`: Kokoro TTS (needs system `espeak-ng`)
- `images`: diffusers + Pillow for SD-Turbo and future image backends
- `server`: FastAPI + uvicorn + sse-starlette (only needed on the serving host)
- `dev`: pytest + coverage tools
## Quick start

```bash
# Pull bundled models by id (creates a dedicated venv + installs deps + downloads weights)
muse pull soprano-80m
muse pull sd-turbo

# Or pull anything resolvable from HuggingFace by URI
muse pull hf://Qwen/Qwen3-8B-GGUF@q4_k_m
muse pull hf://sentence-transformers/all-MiniLM-L6-v2

# Admin: list what's in the catalog
muse models list

# Start the server (instant boot; serves OpenAI-compatible endpoints).
# As of v0.40.0 muse lazy-loads: enabled models stay on disk until the
# first request that names them, then a worker is spawned on demand.
muse serve --host 0.0.0.0 --port 8000

# Optional: pre-warm a model so the first real request is hot
muse models warmup soprano-80m
```
From any client, generation is an HTTP call:
```bash
# Text-to-speech
curl -X POST http://localhost:8000/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input":"Hello world","model":"soprano-80m"}' \
  --output hello.wav

# Embeddings (accepts single string or list)
curl -X POST http://localhost:8000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"input":"hello world","model":"all-minilm-l6-v2"}'

# Image embeddings (input is data: URL or http(s):// URL; mirrors /v1/embeddings)
IMG_B64=$(base64 -w0 cat.png)
curl -X POST http://localhost:8000/v1/images/embeddings \
  -H "Content-Type: application/json" \
  -d "{\"input\":\"data:image/png;base64,${IMG_B64}\",\"model\":\"dinov2-small\"}"

# Audio embeddings (multipart upload; one or more "file" parts; mirrors /v1/embeddings envelope)
curl -X POST http://localhost:8000/v1/audio/embeddings \
  -F "file=@clip.wav" \
  -F "model=mert-v1-95m"

# Chat (OpenAI-compatible incl. tools and streaming)
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"qwen3-8b-gguf-q4-k-m","messages":[{"role":"user","content":"Capital of France?"}]}'

# Rerank (Cohere-compat); pulls bge-reranker-v2-m3 by default
curl -X POST http://localhost:8000/v1/rerank \
  -H "Content-Type: application/json" \
  -d '{
    "query": "what is muse?",
    "documents": [
      "muse is an audio server",
      "muse is a multi-modality generation server",
      "muse is the goddess of inspiration"
    ],
    "model": "bge-reranker-v2-m3",
    "top_n": 2,
    "return_documents": true
  }'

# Summarize (Cohere-compat); pulls bart-large-cnn by default
curl -X POST http://localhost:8000/v1/summarize \
  -H "Content-Type: application/json" \
  -d '{
    "text": "muse is a model-agnostic multi-modality generation server. It hosts text, image, audio, and video models behind a unified HTTP API that mirrors OpenAI where possible.",
    "length": "short",
    "format": "paragraph",
    "model": "bart-large-cnn"
  }'

# Music generation (capability-gated; default model: stable-audio-open-1.0)
curl -X POST http://localhost:8000/v1/audio/music \
  -H "Content-Type: application/json" \
  -d '{"prompt":"ambient piano with light rain","model":"stable-audio-open-1.0","duration":10.0}' \
  --output music.wav

# Sound effects generation (same model, different intent)
curl -X POST http://localhost:8000/v1/audio/sfx \
  -H "Content-Type: application/json" \
  -d '{"prompt":"footsteps on gravel","model":"stable-audio-open-1.0","duration":3.0}' \
  --output footsteps.wav
```
```bash
# Image inpainting (multipart: image + mask + prompt)
# White mask pixels are regenerated; black pixels are kept.
curl -X POST http://localhost:8000/v1/images/edits \
  -F "image=@scene.png" \
  -F "mask=@mask.png" \
  -F "prompt=add a moon to the sky" \
  -F "model=sd-turbo" \
  -F "size=512x512" \
  -F "n=1"
```
```bash
# Image variations (multipart: image only, no prompt)
curl -X POST http://localhost:8000/v1/images/variations \
  -F "image=@scene.png" \
  -F "model=sd-turbo" \
  -F "size=512x512" \
  -F "n=2"

# Image upscale (multipart: 4x super-resolution; SD x4 supports scale=4 only)
curl -s -X POST http://localhost:8000/v1/images/upscale \
  -F "image=@source.png" \
  -F "model=stable-diffusion-x4-upscaler" \
  -F "scale=4" \
  -F "prompt=high detail" \
  | jq -r '.data[0].b64_json' \
  | base64 -d > upscaled.png

# Image segmentation (multipart: SAM-2 promptable masks)
# Mode 1: automatic (sweeps a grid of point prompts internally)
curl -s -X POST http://localhost:8000/v1/images/segment \
  -F "image=@scene.png" \
  -F "model=sam2-hiera-tiny" \
  -F "mode=auto" \
  -F "max_masks=8"

# Mode 2: foreground click points
curl -s -X POST http://localhost:8000/v1/images/segment \
  -F "image=@scene.png" \
  -F "model=sam2-hiera-tiny" \
  -F "mode=points" \
  -F 'points=[[150, 200]]'

# Mode 3: bounding boxes
curl -s -X POST http://localhost:8000/v1/images/segment \
  -F "image=@scene.png" \
  -F "model=sam2-hiera-tiny" \
  -F "mode=boxes" \
  -F 'boxes=[[50, 60, 250, 240]]' \
  -F "mask_format=rle"

# Video generation (since v0.27.0; GPU-required, tight on 8GB VRAM)
# Default response_format=mp4; "webm" and "frames_b64" also supported.
curl -s -X POST http://localhost:8000/v1/video/generations \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "a flag waving in the wind",
    "model": "wan2-1-t2v-1-3b",
    "duration_seconds": 5.0,
    "fps": 5,
    "size": "832x480",
    "steps": 30
  }' \
  | jq -r '.data[0].b64_json' \
  | base64 -d > flag.mp4
```
The same endpoints are reachable from Python via the bundled per-modality clients:

```python
from muse.modalities.audio_speech import SpeechClient
from muse.modalities.image_generation import (
    GenerationsClient, ImageEditsClient, ImageVariationsClient,
)
from muse.modalities.embedding_text import EmbeddingsClient
from muse.modalities.chat_completion import ChatClient

# MUSE_SERVER env var sets the base URL for remote use; default http://localhost:8000
wav_bytes = SpeechClient().infer("Hello world")
pngs = GenerationsClient().generate("a cat on mars, cinematic", n=1)
vectors = EmbeddingsClient().embed(["alpha", "beta"])  # list[list[float]]
chat = ChatClient().chat(
    model="qwen3-8b-gguf-q4-k-m",
    messages=[{"role": "user", "content": "Capital of France?"}],
)
```
```python
# Image inpainting and variations (since v0.21.0)
src = open("scene.png", "rb").read()
msk = open("mask.png", "rb").read()
edited = ImageEditsClient().edit(
    "add a moon to the sky", image=src, mask=msk, model="sd-turbo",
)
variants = ImageVariationsClient().vary(image=src, model="sd-turbo", n=2)

# Image upscale (since v0.25.0): 4x super-resolution
from muse.modalities.image_upscale import ImageUpscaleClient
from pathlib import Path

upscaled = ImageUpscaleClient().upscale(
    image=Path("source.png").read_bytes(),
    model="stable-diffusion-x4-upscaler",
    scale=4,
    prompt="razor sharp detail",
)
Path("upscaled.png").write_bytes(upscaled[0])
```
```python
# Image segmentation (since v0.26.0): SAM-2 promptable masks
from pathlib import Path
from muse.modalities.image_segmentation import ImageSegmentationClient

seg = ImageSegmentationClient()
src_bytes = Path("scene.png").read_bytes()

result_auto = seg.segment(
    image=src_bytes, model="sam2-hiera-tiny", mode="auto", max_masks=8,
)
result_points = seg.segment(
    image=src_bytes, model="sam2-hiera-tiny", mode="points",
    points=[[150, 200]],
)
result_boxes = seg.segment(
    image=src_bytes, model="sam2-hiera-tiny", mode="boxes",
    boxes=[[50, 60, 250, 240]], mask_format="rle",
)

# Each result is a dict {id, model, mode, image_size, masks: [...]}
# masks[i]["mask"] is a base64 PNG (mask_format=png_b64) or
# a {"size": [H, W], "counts": str} dict (mask_format=rle)
```
```python
# Video generation (since v0.27.0): GPU-required, tight on 8GB VRAM.
# Wan2.1 T2V 1.3B (~3GB at fp16) is the default low-VRAM bundle;
# CogVideoX-2b (~9GB) and LTX-Video (~16GB) are curated additions.
from pathlib import Path
from muse.modalities.video_generation import VideoGenerationClient

vid = VideoGenerationClient()
mp4_bytes = vid.generate(
    "a flag waving in the wind",
    model="wan2-1-t2v-1-3b",
    duration_seconds=5.0,
    fps=5,
    size="832x480",
    steps=30,
)
Path("flag.mp4").write_bytes(mp4_bytes)
```
VRAM caveats for video/generation: even Wan 1.3B at fp16 is tight on 8GB cards; 12GB+ recommended for headroom. CogVideoX-2b realistically wants 16GB. LTX-Video needs 16GB+. Mochi-1 (24GB+) and HunyuanVideo (60GB+) are documented but not curated; their dedicated runtimes ship in v1.next.
The OpenAI Python SDK works against muse with no modifications:
```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")
client.chat.completions.create(model="qwen3-8b-gguf-q4-k-m", messages=[...])
```
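Streaming works the same way it does against OpenAI; for example:

```python
# Stream tokens from a muse-hosted model via the stock OpenAI SDK.
stream = client.chat.completions.create(
    model="qwen3-8b-gguf-q4-k-m",
    messages=[{"role": "user", "content": "Capital of France?"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```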
Vision (v0.42.0+):
```python
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")

with open("photo.png", "rb") as f:
    data_url = "data:image/png;base64," + base64.b64encode(f.read()).decode()

r = client.chat.completions.create(
    model="smolvlm-256m-instruct",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }],
)
print(r.choices[0].message.content)
```
`muse serve` auto-restarts crashed worker processes with exponential backoff. Individual model failures don't take down the server or other modalities.
As of v0.40.0 muse lazy-loads by default. `muse serve` brings the gateway up instantly with zero workers running. The first request to each model triggers a cold load (worker spawn + weights), so expect 5-30s of latency on that first hit; subsequent requests are hot. Memory pressure is handled by on-demand LRU eviction backed by live pynvml + psutil measurements: a 12GB GPU can have 30 models catalog-enabled and serve them all, just not simultaneously. Operators who want eager-boot semantics can put a warmup loop in their startup script:
```bash
muse serve &
sleep 1
for m in $(muse models list --json | jq -r '.[].id'); do
  muse models warmup "$m"
done
```
`muse models list` shows a five-state status indicator: `enabled_loaded` (filled circle) for resident workers, `enabled_unloaded` (half circle) for catalog-enabled-but-unloaded, plus the existing `disabled`, `recommended`, and `available` states. `/v1/models` gains `loaded`, `last_loaded_at`, and `unservable_reason` per entry. Headroom margins are tunable via `MUSE_GPU_HEADROOM_GB` (default 1.0) and `MUSE_CPU_HEADROOM_GB` (default 2.0); declared caps via `MUSE_GPU_BUDGET_GB` and `MUSE_CPU_BUDGET_GB` are optional and combined with live measurements as `min(declared, live)`.
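A sketch of that combination rule (illustrative only, not muse's code):

```python
# Effective per-device budget = min(declared cap, live measurement);
# the declared cap is optional, in which case the live value wins.
def effective_budget_gb(declared_gb: float | None, live_gb: float) -> float:
    return live_gb if declared_gb is None else min(declared_gb, live_gb)

assert effective_budget_gb(None, 11.2) == 11.2  # no declared cap: live wins
assert effective_budget_gb(8.0, 11.2) == 8.0    # declared cap binds
```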
## CLI (admin-only)
| Command | Description |
|---|---|
| `muse serve` | start the HTTP server (instant boot; lazy-load on first request) |
| `muse pull <model-id-or-uri>` | download weights + install deps + run probe (accepts bundled id OR resolver URI like `hf://org/repo@variant`; `--no-probe` opts out) |
| `muse search <query> [--modality M]` | search HuggingFace for pullable GGUF / sentence-transformers models |
| `muse models list [--modality X]` | list known/pulled models with five-state status (enabled_loaded / enabled_unloaded / disabled / recommended / available) |
| `muse models info <model-id>` | show catalog entry |
| `muse models remove <model-id>` | unregister from catalog |
| `muse models enable <model-id>` | mark a pulled model active in the catalog (allowed to lazy-load) |
| `muse models disable <model-id>` | mark a pulled model inactive in the catalog (refuses to lazy-load) |
| `muse models warmup <model-id>` | pre-load a model into a worker without serving traffic; first real request is hot |
| `muse models refresh <id> \| --all \| --enabled` | re-install `museq[server,extras]` into per-model venv(s) (after `pip install -U museq`) |
| `muse mcp [--http]` | run an MCP server bridging muse to LLM clients (29 tools) |
No per-modality subcommands (`muse speak`, `muse audio ...`). Those would be hardcoded modality-to-verb mappings that grow with every new modality. Keeping the CLI modality-agnostic means embeddings, transcriptions, and video land without CLI churn.
## HTTP endpoints
| Endpoint | Purpose |
|---|---|
| `GET /health` | liveness + enabled modalities |
| `GET /v1/models` | all registered models, aggregated |
| `POST /v1/audio/speech` | synthesize speech (OpenAI-compatible) |
| `GET /v1/audio/speech/voices` | list voices for a model |
| `POST /v1/audio/transcriptions` | transcribe audio to text (OpenAI-compatible) |
| `POST /v1/audio/translations` | transcribe + translate audio to English (OpenAI-compatible) |
| `POST /v1/images/generations` | generate images (OpenAI-compatible; supports img2img via `image` + `strength`) |
| `POST /v1/images/edits` | inpaint masked regions (OpenAI-compatible; multipart with image+mask+prompt) |
| `POST /v1/images/variations` | generate alternates of one image (OpenAI-compatible; multipart, no prompt) |
| `POST /v1/embeddings` | text embeddings (OpenAI-compatible) |
| `POST /v1/images/embeddings` | image embeddings (OpenAI-shape envelope mirroring `/v1/embeddings`) |
| `POST /v1/audio/embeddings` | audio embeddings (multipart upload + OpenAI-shape envelope mirroring `/v1/embeddings`) |
| `POST /v1/chat/completions` | chat (OpenAI-compatible incl. tools, structured output, streaming) |
| `POST /v1/moderations` | text moderation/classification (OpenAI-compatible) |
| `POST /v1/rerank` | text rerank (Cohere-compat) |
| `POST /v1/summarize` | text summarization (Cohere-compat) |
| `POST /v1/audio/music` | music generation (capability-gated; muse-native shape) |
| `POST /v1/audio/sfx` | sound-effect generation (capability-gated; muse-native shape) |
| `POST /v1/video/generations` | text-to-video generation (mp4/webm/frames_b64; GPU-required) |
Error shape is uniform: `{"error": {"code", "message", "type"}}` across 404 (model not found) and 422 (validation). Matches OpenAI's envelope, so clients written against their API work against muse.
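Client-side, one error handler therefore covers every endpoint; a minimal sketch using `requests`:

```python
# Uniform error handling across muse endpoints, per the envelope above.
import requests

r = requests.post(
    "http://localhost:8000/v1/embeddings",
    json={"input": "hello", "model": "no-such-model"},
)
if not r.ok:
    err = r.json()["error"]
    print(f"{r.status_code} {err['type']}/{err['code']}: {err['message']}")
```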
## Admin endpoints (v0.28.0+)
Eleven endpoints under `/v1/admin/*` let you enable, disable, probe, pull, and remove models on a running supervisor without restarting it. The admin surface is closed by default: set `MUSE_ADMIN_TOKEN` to any non-empty value to enable it, then send `Authorization: Bearer <token>` on every request.
| Endpoint | Purpose |
|---|---|
| `POST /v1/admin/models/{id}/enable` | spawn a worker (or restart-in-place) hosting `id`; returns 202 + `job_id` |
| `POST /v1/admin/models/{id}/disable` | unload `id` from its worker; sync |
| `POST /v1/admin/models/{id}/probe` | run `muse models probe` in the model's venv; returns 202 + `job_id` |
| `POST /v1/admin/models/_/pull` | pull from a curated alias or resolver URI in the body; returns 202 + `job_id` |
| `DELETE /v1/admin/models/{id}?purge=bool` | remove from catalog (refuses with 409 if loaded) |
| `GET /v1/admin/models/{id}/status` | merged catalog + live worker view |
| `GET /v1/admin/workers` | spawned workers + pid/uptime/restart-count |
| `POST /v1/admin/workers/{port}/restart` | SIGTERM by port; auto-restart monitor handles bringup |
| `GET /v1/admin/memory` | per-device aggregate + per-model breakdown |
| `GET /v1/admin/jobs/{job_id}` | one async-job record (404 if reaped) |
| `GET /v1/admin/jobs` | recent jobs, newest-first |
Auth setup:

```bash
export MUSE_ADMIN_TOKEN="$(openssl rand -hex 32)"  # or any non-empty value
muse serve  # admin endpoints now active under the same port
```
Five auth scenarios:

- env var unset, any header: `503 admin_disabled`
- env var set, no header: `401 missing_token`
- env var set, malformed header: `401 missing_token`
- env var set, wrong bearer: `403 invalid_token`
- env var set, correct bearer: route runs
curl examples:

```bash
TOKEN="$MUSE_ADMIN_TOKEN"
H="Authorization: Bearer $TOKEN"

# enable a pulled model (worker spawns or joins existing venv-group)
curl -s -X POST -H "$H" http://localhost:8000/v1/admin/models/kokoro-82m/enable

# poll the returned job
curl -s -H "$H" http://localhost:8000/v1/admin/jobs/<job_id>

# disable a loaded model (sync)
curl -s -X POST -H "$H" http://localhost:8000/v1/admin/models/kokoro-82m/disable

# merged status
curl -s -H "$H" http://localhost:8000/v1/admin/models/kokoro-82m/status

# memory aggregate (psutil + pynvml)
curl -s -H "$H" http://localhost:8000/v1/admin/memory
```
Python (use the `AdminClient`):

```python
from muse.admin.client import AdminClient

# Reads MUSE_SERVER and MUSE_ADMIN_TOKEN from env when unset.
admin = AdminClient()
job = admin.enable("kokoro-82m")
final = admin.wait(job["job_id"])
print(final["state"], final.get("result"))
print(admin.status("kokoro-82m"))
print(admin.workers())
print(admin.memory())
```
The `muse models enable`/`disable` CLI commands route through this admin API automatically when `MUSE_ADMIN_TOKEN` is set and the supervisor is reachable, falling back to a catalog-only mutation (effective on next `muse serve`) otherwise.
## MCP server (since v0.29.0)
`muse mcp` runs a Model Context Protocol server that exposes muse to LLM clients (Claude Desktop, Cursor, etc.) as 29 structured tools: 11 admin tools (gated by `MUSE_ADMIN_TOKEN`) plus 18 inference tools. Stdio mode is the default (for desktop apps); HTTP+SSE mode (`--http --port 8088`) is available for remote / web embedders.
```bash
muse mcp                             # stdio mode
muse mcp --http --port 8088          # HTTP+SSE
muse mcp --filter inference          # only inference tools (no admin)
muse mcp --filter admin              # only admin tools (control panel)
muse mcp --server http://other:8000  # connect to a remote muse server
```
Claude Desktop config (`~/Library/Application Support/Claude/claude_desktop_config.json` on macOS, `%APPDATA%\Claude\claude_desktop_config.json` on Windows):
```json
{
  "mcpServers": {
    "muse": {
      "command": "muse",
      "args": ["mcp"],
      "env": {
        "MUSE_SERVER": "http://localhost:8000",
        "MUSE_ADMIN_TOKEN": "your-admin-token-here"
      }
    }
  }
}
```
Tools split into two groups:

- Admin (11): `muse_list_models`, `muse_get_model_info`, `muse_search_models`, `muse_pull_model`, `muse_remove_model`, `muse_enable_model`, `muse_disable_model`, `muse_probe_model`, `muse_get_memory_status`, `muse_get_workers`, `muse_get_jobs`. Long-running ops (pull, probe, enable) return a `job_id` and the LLM polls `muse_get_jobs` to track progress.
- Inference (18): `muse_chat`, `muse_summarize`, `muse_rerank`, `muse_classify`, `muse_embed_text`, `muse_generate_image`, `muse_edit_image`, `muse_vary_image`, `muse_upscale_image`, `muse_segment_image`, `muse_generate_animation`, `muse_embed_image`, `muse_speak`, `muse_transcribe`, `muse_generate_music`, `muse_generate_sfx`, `muse_embed_audio`, `muse_generate_video`.
Binary inputs accept `<name>_b64` (base64), `<name>_url` (`data:` or http URL), or `<name>_path` (local file). Image and audio outputs return as MCP `ImageContent` / `AudioContent` blocks plus a JSON summary.
## Architecture
- `muse.core`: modality-agnostic discovery, registry, catalog, venv management, HF downloader, pip auto-install, FastAPI app factory.
- `muse.cli_impl`: `serve` (supervisor), `worker` (single-venv process), `gateway` (HTTP proxy routing by the request's `model` field).
- `muse.modalities/`: one subpackage per modality (wire contract: protocol + routes + codec + client).
  - `audio_embedding/` (MODALITY `"audio/embedding"`; multipart upload + OpenAI-shape envelope; includes `runtimes/transformers_audio.py`)
  - `audio_generation/` (MODALITY `"audio/generation"`; mounts both `/v1/audio/music` and `/v1/audio/sfx` on one MIME tag with per-route capability gates)
  - `audio_speech/` (MODALITY `"audio/speech"`)
  - `audio_transcription/` (MODALITY `"audio/transcription"`; multipart/form-data upload, OpenAI Whisper wire shape)
  - `chat_completion/` (MODALITY `"chat/completion"`; includes `runtimes/llama_cpp.py`)
  - `embedding_text/` (MODALITY `"embedding/text"`; includes `runtimes/sentence_transformers.py`)
  - `image_embedding/` (MODALITY `"image/embedding"`; includes `runtimes/transformers_image.py`)
  - `image_generation/` (MODALITY `"image/generation"`)
  - `text_classification/` (MODALITY `"text/classification"`; OpenAI `/v1/moderations` wire shape)
  - `text_rerank/` (MODALITY `"text/rerank"`; Cohere `/v1/rerank` wire shape)
  - `text_summarization/` (MODALITY `"text/summarization"`; Cohere `/v1/summarize` wire shape)
  - `video_generation/` (MODALITY `"video/generation"`; includes `runtimes/wan_runtime.py` and `runtimes/cogvideox_runtime.py`)
- `muse.models/`: flat directory of drop-in model scripts, one file per model (MANIFEST + Model class).
  - `soprano_80m.py`, `kokoro_82m.py`, `bark_small.py` (audio/speech)
  - `nv_embed_v2.py` (embedding/text; MiniLM and Qwen3-Embedding are now resolver-pulled via the generic runtime, see `curated.yaml`)
  - `sd_turbo.py` (image/generation)
  - `bge_reranker_v2_m3.py` (text/rerank)
  - `stable_audio_open_1_0.py` (audio/generation; Stable Audio Open 1.0, Apache 2.0)
  - `bart_large_cnn.py` (text/summarization; facebook/bart-large-cnn, Apache 2.0, ~400MB, CPU-friendly)
  - `dinov2_small.py` (image/embedding; facebook/dinov2-small, Apache 2.0, 88MB, 384-dim, CPU-friendly)
  - `mert_v1_95m.py` (audio/embedding; m-a-p/MERT-v1-95M, MIT, 95MB, 768-dim music understanding via mean-pool over time)
  - `wan2_1_t2v_1_3b.py` (video/generation; Wan-AI/Wan2.1-T2V-1.3B, Apache 2.0, ~3GB at fp16, 5s clips at 832x480, GPU-required)
- `muse.core.resolvers`: URI -> `ResolvedModel` dispatch for `muse pull hf://...`. `resolvers_hf` registers the `hf://` resolver for HuggingFace GGUF + sentence-transformers repos.
`muse serve` is a supervisor process. It spawns one worker subprocess per venv (each pulled model has its own venv with its own deps) and runs a gateway that proxies by the `model` field. Dep conflicts between models are structurally impossible.
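In miniature, the proxy-by-model idea looks like this (an illustrative sketch, not muse's actual gateway; the port registry and JSON-only routing are simplifications):

```python
# Toy gateway: route each request to the worker hosting the named model.
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse
import httpx

app = FastAPI()
MODEL_PORTS = {"soprano-80m": 8101, "sd-turbo": 8102}  # hypothetical registry

@app.post("/v1/{path:path}")
async def proxy(path: str, request: Request):
    body = await request.json()
    port = MODEL_PORTS.get(body.get("model", ""))
    if port is None:
        return JSONResponse(
            {"error": {"code": "model_not_found", "message": "unknown model",
                       "type": "invalid_request_error"}},
            status_code=404,
        )
    async with httpx.AsyncClient() as client:
        upstream = await client.post(f"http://127.0.0.1:{port}/v1/{path}", json=body)
    return JSONResponse(upstream.json(), status_code=upstream.status_code)
```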
Three ways to extend muse:

- Resolver URI: `muse pull hf://Qwen/Qwen3-8B-GGUF@q4_k_m` for any GGUF or sentence-transformers HF repo. See `docs/RESOLVERS.md`.
- Model script: drop a `.py` into `~/.muse/models/` for one-off models with custom code. See `docs/MODEL_SCRIPTS.md`.
- Modality subpackage: drop into `src/muse/modalities/` or `$MUSE_MODALITIES_DIR` for a whole new modality.
See `CLAUDE.md` for implementation details and the contribution guide, `docs/MODEL_SCRIPTS.md` for writing your own model scripts, `docs/RESOLVERS.md` for adding a new URI scheme, and `docs/CHAT_COMPLETION.md` for the chat endpoint specification.
## License
MIT