Self-hostable, OpenAI-compatible multi-modality AI server: 15 modalities (chat, image, audio, video, embeddings, segmentation, rerank, and more) with plugin runtimes (PyTorch, diffusers, llama-cpp-python).
# Muse
Model-agnostic multi-modality generation server. OpenAI-compatible HTTP is the canonical interface:

- text-to-speech on `/v1/audio/speech`
- speech-to-text on `/v1/audio/transcriptions` and `/v1/audio/translations`
- text-to-music on `/v1/audio/music` and text-to-sound-effects on `/v1/audio/sfx`
- text-to-image on `/v1/images/generations`, image inpainting on `/v1/images/edits`, image variations on `/v1/images/variations`
- image-to-image super-resolution on `/v1/images/upscale`
- promptable segmentation on `/v1/images/segment`
- text-to-animation on `/v1/images/animations`
- text-to-video on `/v1/video/generations`
- image-to-vector on `/v1/images/embeddings`
- audio-to-vector on `/v1/audio/embeddings`
- text-to-vector on `/v1/embeddings`
- text-to-text (LLM, tool calls, streaming) on `/v1/chat/completions`
- text moderation/classification on `/v1/moderations`
- text rerank (Cohere-compat) on `/v1/rerank`
- text summarization (Cohere-compat) on `/v1/summarize`
Modality tags are MIME-style (`audio/embedding`, `audio/generation`, `audio/speech`, `audio/transcription`, `chat/completion`, `embedding/text`, `image/animation`, `image/embedding`, `image/generation`, `image/segmentation`, `image/upscale`, `text/classification`, `text/rerank`, `text/summarization`, `video/generation`).
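These tags are the filtering vocabulary across the CLI; for example, using the `--modality` flags documented under CLI below:

```bash
# Filter the local catalog and HuggingFace search by modality tag
muse models list --modality audio/speech
muse search qwen3 --modality chat/completion --max-size-gb 10
```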
Three ways to add a model, in order of how often you'll reach for them:

- Pull a GGUF or sentence-transformers model from HuggingFace by URI. No script, no edits:

  ```bash
  muse search qwen3 --modality chat/completion --max-size-gb 10
  muse pull hf://Qwen/Qwen3-8B-GGUF@q4_k_m
  ```

- Drop a `.py` script into `~/.muse/models/` for a one-off model with custom code (see `docs/MODEL_SCRIPTS.md` and the sketch after this list).
- Add a whole new modality (rare) by dropping a subpackage into `src/muse/modalities/` or `$MUSE_MODALITIES_DIR`. The subpackage exports `MODALITY` + `build_router` and discovery picks it up. Optional: drop a `hf.py` next to `__init__.py` exporting an `HF_PLUGIN` dict; muse's HF resolver picks it up the same way and `muse search` / `muse pull hf://...` work for the new modality.
All three surfaces are discovered at runtime; there is no hardcoded catalog, no allowlist, and no registration calls.
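For the second surface, the exact script contract lives in `docs/MODEL_SCRIPTS.md`; per the Architecture notes below, a script pairs a `MANIFEST` with a `Model` class. A purely illustrative sketch — the field and method names here are hypothetical, not the documented contract:

```python
# ~/.muse/models/my_tts.py -- illustrative sketch only; field and method
# names are hypothetical, the real contract is in docs/MODEL_SCRIPTS.md.
MANIFEST = {
    "id": "my-tts",              # hypothetical catalog id
    "modality": "audio/speech",  # one of the MIME-style tags above
    "requirements": ["torch"],   # hypothetical: deps for this model's venv
}

class Model:
    """Loaded once per worker process; would serve /v1/audio/speech."""

    def __init__(self) -> None:
        self.engine = None  # load weights here, once per worker

    def infer(self, text: str) -> bytes:
        raise NotImplementedError("return WAV bytes for the given text")
```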
The CLI is deliberately admin-only (`serve`, `pull`, `search`, `models`). Generation is reached via the HTTP API, consumed by Python clients, curl, or future wrappers like `muse mcp`.
## Install

```bash
pip install -e ".[server,audio,images]"
```

Optional extras:

- `audio`: PyTorch + transformers for TTS backends
- `audio-kokoro`: Kokoro TTS (needs system `espeak-ng`)
- `images`: diffusers + Pillow for SD-Turbo and future image backends
- `server`: FastAPI + uvicorn + sse-starlette (only needed on the serving host)
- `dev`: pytest + coverage tools
## Quick start

```bash
# Pull bundled models by id (creates a dedicated venv + installs deps + downloads weights)
muse pull soprano-80m
muse pull sd-turbo

# Or pull anything resolvable from HuggingFace by URI
muse pull hf://Qwen/Qwen3-8B-GGUF@q4_k_m
muse pull hf://sentence-transformers/all-MiniLM-L6-v2

# Admin: list what's in the catalog
muse models list

# Start the server (instant boot; serves OpenAI-compatible endpoints).
# As of v0.40.0 muse lazy-loads: enabled models stay on disk until the
# first request that names them, then a worker is spawned on demand.
muse serve --host 0.0.0.0 --port 8000

# Optional: pre-warm a model so the first real request is hot
muse models warmup soprano-80m
```
From any client, generation is an HTTP call:
```bash
# Text-to-speech
curl -X POST http://localhost:8000/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{"input":"Hello world","model":"soprano-80m"}' \
  --output hello.wav

# Embeddings (accepts single string or list)
curl -X POST http://localhost:8000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"input":"hello world","model":"all-minilm-l6-v2"}'

# Image embeddings (input is data: URL or http(s):// URL; mirrors /v1/embeddings)
IMG_B64=$(base64 -w0 cat.png)
curl -X POST http://localhost:8000/v1/images/embeddings \
  -H "Content-Type: application/json" \
  -d "{\"input\":\"data:image/png;base64,${IMG_B64}\",\"model\":\"dinov2-small\"}"

# Audio embeddings (multipart upload; one or more "file" parts; mirrors /v1/embeddings envelope)
curl -X POST http://localhost:8000/v1/audio/embeddings \
  -F "file=@clip.wav" \
  -F "model=mert-v1-95m"

# Chat (OpenAI-compatible incl. tools and streaming)
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"qwen3-8b-gguf-q4-k-m","messages":[{"role":"user","content":"Capital of France?"}]}'

# Rerank (Cohere-compat); pulls bge-reranker-v2-m3 by default
curl -X POST http://localhost:8000/v1/rerank \
  -H "Content-Type: application/json" \
  -d '{
    "query": "what is muse?",
    "documents": [
      "muse is an audio server",
      "muse is a multi-modality generation server",
      "muse is the goddess of inspiration"
    ],
    "model": "bge-reranker-v2-m3",
    "top_n": 2,
    "return_documents": true
  }'

# Summarize (Cohere-compat); pulls bart-large-cnn by default
curl -X POST http://localhost:8000/v1/summarize \
  -H "Content-Type: application/json" \
  -d '{
    "text": "muse is a model-agnostic multi-modality generation server. It hosts text, image, audio, and video models behind a unified HTTP API that mirrors OpenAI where possible.",
    "length": "short",
    "format": "paragraph",
    "model": "bart-large-cnn"
  }'

# Music generation (capability-gated; default model: stable-audio-open-1.0)
curl -X POST http://localhost:8000/v1/audio/music \
  -H "Content-Type: application/json" \
  -d '{"prompt":"ambient piano with light rain","model":"stable-audio-open-1.0","duration":10.0}' \
  --output music.wav

# Sound effects generation (same model, different intent)
curl -X POST http://localhost:8000/v1/audio/sfx \
  -H "Content-Type: application/json" \
  -d '{"prompt":"footsteps on gravel","model":"stable-audio-open-1.0","duration":3.0}' \
  --output footsteps.wav
```
```bash
# Image inpainting (multipart: image + mask + prompt)
# White mask pixels are regenerated; black pixels are kept.
curl -X POST http://localhost:8000/v1/images/edits \
  -F "image=@scene.png" \
  -F "mask=@mask.png" \
  -F "prompt=add a moon to the sky" \
  -F "model=sd-turbo" \
  -F "size=512x512" \
  -F "n=1"
```
```bash
# Image variations (multipart: image only, no prompt)
curl -X POST http://localhost:8000/v1/images/variations \
  -F "image=@scene.png" \
  -F "model=sd-turbo" \
  -F "size=512x512" \
  -F "n=2"

# Image upscale (multipart: 4x super-resolution; SD x4 supports scale=4 only)
curl -s -X POST http://localhost:8000/v1/images/upscale \
  -F "image=@source.png" \
  -F "model=stable-diffusion-x4-upscaler" \
  -F "scale=4" \
  -F "prompt=high detail" \
  | jq -r '.data[0].b64_json' \
  | base64 -d > upscaled.png

# Image segmentation (multipart: SAM-2 promptable masks)
# Mode 1: automatic (sweeps a grid of point prompts internally)
curl -s -X POST http://localhost:8000/v1/images/segment \
  -F "image=@scene.png" \
  -F "model=sam2-hiera-tiny" \
  -F "mode=auto" \
  -F "max_masks=8"

# Mode 2: foreground click points
curl -s -X POST http://localhost:8000/v1/images/segment \
  -F "image=@scene.png" \
  -F "model=sam2-hiera-tiny" \
  -F "mode=points" \
  -F 'points=[[150, 200]]'

# Mode 3: bounding boxes
curl -s -X POST http://localhost:8000/v1/images/segment \
  -F "image=@scene.png" \
  -F "model=sam2-hiera-tiny" \
  -F "mode=boxes" \
  -F 'boxes=[[50, 60, 250, 240]]' \
  -F "mask_format=rle"

# Video generation (since v0.27.0; GPU-required, tight on 8GB VRAM)
# Default response_format=mp4; "webm" and "frames_b64" also supported.
curl -s -X POST http://localhost:8000/v1/video/generations \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "a flag waving in the wind",
    "model": "wan2-1-t2v-1-3b",
    "duration_seconds": 5.0,
    "fps": 5,
    "size": "832x480",
    "steps": 30
  }' \
  | jq -r '.data[0].b64_json' \
  | base64 -d > flag.mp4
```
The same endpoints are reachable from Python via the bundled per-modality clients:

```python
from muse.modalities.audio_speech import SpeechClient
from muse.modalities.image_generation import (
    GenerationsClient, ImageEditsClient, ImageVariationsClient,
)
from muse.modalities.embedding_text import EmbeddingsClient
from muse.modalities.chat_completion import ChatClient

# MUSE_SERVER env var sets the base URL for remote use; default http://localhost:8000
wav_bytes = SpeechClient().infer("Hello world")
pngs = GenerationsClient().generate("a cat on mars, cinematic", n=1)
vectors = EmbeddingsClient().embed(["alpha", "beta"])  # list[list[float]]
chat = ChatClient().chat(
    model="qwen3-8b-gguf-q4-k-m",
    messages=[{"role": "user", "content": "Capital of France?"}],
)
```
```python
# Image inpainting and variations (since v0.21.0)
src = open("scene.png", "rb").read()
msk = open("mask.png", "rb").read()
edited = ImageEditsClient().edit(
    "add a moon to the sky", image=src, mask=msk, model="sd-turbo",
)
variants = ImageVariationsClient().vary(image=src, model="sd-turbo", n=2)

# Image upscale (since v0.25.0): 4x super-resolution
from muse.modalities.image_upscale import ImageUpscaleClient
from pathlib import Path

upscaled = ImageUpscaleClient().upscale(
    image=Path("source.png").read_bytes(),
    model="stable-diffusion-x4-upscaler",
    scale=4,
    prompt="razor sharp detail",
)
Path("upscaled.png").write_bytes(upscaled[0])
```
```python
# Image segmentation (since v0.26.0): SAM-2 promptable masks
from pathlib import Path
from muse.modalities.image_segmentation import ImageSegmentationClient

seg = ImageSegmentationClient()
src_bytes = Path("scene.png").read_bytes()

result_auto = seg.segment(
    image=src_bytes, model="sam2-hiera-tiny", mode="auto", max_masks=8,
)
result_points = seg.segment(
    image=src_bytes, model="sam2-hiera-tiny", mode="points",
    points=[[150, 200]],
)
result_boxes = seg.segment(
    image=src_bytes, model="sam2-hiera-tiny", mode="boxes",
    boxes=[[50, 60, 250, 240]], mask_format="rle",
)

# Each result is a dict {id, model, mode, image_size, masks: [...]}
# masks[i]["mask"] is a base64 PNG (mask_format=png_b64) or
# a {"size": [H, W], "counts": str} dict (mask_format=rle)
```
```python
# Video generation (since v0.27.0): GPU-required, tight on 8GB VRAM.
# Wan2.1 T2V 1.3B (~3GB at fp16) is the default low-VRAM bundle;
# CogVideoX-2b (~9GB) and LTX-Video (~16GB) are curated additions.
from pathlib import Path
from muse.modalities.video_generation import VideoGenerationClient

vid = VideoGenerationClient()
mp4_bytes = vid.generate(
    "a flag waving in the wind",
    model="wan2-1-t2v-1-3b",
    duration_seconds=5.0,
    fps=5,
    size="832x480",
    steps=30,
)
Path("flag.mp4").write_bytes(mp4_bytes)
```
VRAM caveats for video/generation: even Wan 1.3B at fp16 is tight on 8GB cards; 12GB+ recommended for headroom. CogVideoX-2b realistically wants 16GB. LTX-Video needs 16GB+. Mochi-1 (24GB+) and HunyuanVideo (60GB+) are documented but not curated; their dedicated runtimes ship in v1.next.
The OpenAI Python SDK works against muse with no modifications:
```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")
client.chat.completions.create(model="qwen3-8b-gguf-q4-k-m", messages=[...])
```
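Streaming works the same way it does against OpenAI; for example:

```python
# Stream tokens from a muse-hosted model via the stock OpenAI SDK.
stream = client.chat.completions.create(
    model="qwen3-8b-gguf-q4-k-m",
    messages=[{"role": "user", "content": "Capital of France?"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```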
Vision (v0.42.0+):
```python
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")

with open("photo.png", "rb") as f:
    data_url = "data:image/png;base64," + base64.b64encode(f.read()).decode()

r = client.chat.completions.create(
    model="smolvlm-256m-instruct",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }],
)
print(r.choices[0].message.content)
```
`muse serve` auto-restarts crashed worker processes with exponential backoff. Individual model failures don't take down the server or other modalities.
As of v0.40.0 muse lazy-loads by default. `muse serve` brings the gateway up instantly with zero workers running. The first request to each model triggers a cold load (worker spawn + weights), so expect 5-30s of latency on that first hit; subsequent requests are hot. Memory pressure is handled by on-demand LRU eviction backed by live pynvml + psutil measurements: a 12GB GPU can have 30 models catalog-enabled and serve them all, just not simultaneously. Operators who want eager-boot semantics can put a warmup loop in their startup script:
```bash
muse serve &
sleep 1
for m in $(muse models list --json | jq -r '.[].id'); do
  muse models warmup "$m"
done
```
`muse models list` shows a five-state status indicator: `enabled_loaded` (filled circle) for resident workers, `enabled_unloaded` (half circle) for catalog-enabled-but-unloaded, plus the existing `disabled`, `recommended`, and `available` states. `/v1/models` gains `loaded`, `last_loaded_at`, and `unservable_reason` per entry. Headroom margins are tunable via `MUSE_GPU_HEADROOM_GB` (default 1.0) and `MUSE_CPU_HEADROOM_GB` (default 2.0); declared caps via `MUSE_GPU_BUDGET_GB` and `MUSE_CPU_BUDGET_GB` are optional and combined with live measurements as `min(declared, live)`.
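A sketch of that combination rule (illustrative only, not muse's code):

```python
# Effective per-device budget = min(declared cap, live measurement);
# the declared cap is optional, in which case the live value wins.
def effective_budget_gb(declared_gb: float | None, live_gb: float) -> float:
    return live_gb if declared_gb is None else min(declared_gb, live_gb)

assert effective_budget_gb(None, 11.2) == 11.2  # no declared cap: live wins
assert effective_budget_gb(8.0, 11.2) == 8.0    # declared cap binds
```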
## CLI (admin-only)
| Command | Description |
|---|---|
| `muse serve` | start the HTTP server (instant boot; lazy-load on first request) |
| `muse pull <model-id-or-uri>` | download weights + install deps + run probe (accepts bundled id OR resolver URI like `hf://org/repo@variant`; `--no-probe` opts out) |
| `muse search <query> [--modality M]` | search HuggingFace for pullable GGUF / sentence-transformers models |
| `muse models list [--modality X]` | list known/pulled models with five-state status (enabled_loaded / enabled_unloaded / disabled / recommended / available) |
| `muse models info <model-id>` | show catalog entry |
| `muse models remove <model-id>` | unregister from catalog |
| `muse models enable <model-id>` | mark a pulled model active in the catalog (allowed to lazy-load) |
| `muse models disable <model-id>` | mark a pulled model inactive in the catalog (refuses to lazy-load) |
| `muse models warmup <model-id>` | pre-load a model into a worker without serving traffic; first real request is hot |
| `muse models refresh <id> \| --all \| --enabled` | re-install `museq[server,extras]` into per-model venv(s) (after `pip install -U museq`) |
| `muse mcp [--http]` | run an MCP server bridging muse to LLM clients (29 tools) |
No per-modality subcommands (`muse speak`, `muse audio ...`). Those would be hardcoded modality-to-verb mappings that grow with every new modality. Keeping the CLI modality-agnostic means embeddings, transcriptions, and video land without CLI churn.
## HTTP endpoints
| Endpoint | Purpose |
|---|---|
| `GET /health` | liveness + enabled modalities |
| `GET /v1/models` | all registered models, aggregated |
| `POST /v1/audio/speech` | synthesize speech (OpenAI-compatible) |
| `GET /v1/audio/speech/voices` | list voices for a model |
| `POST /v1/audio/transcriptions` | transcribe audio to text (OpenAI-compatible) |
| `POST /v1/audio/translations` | transcribe + translate audio to English (OpenAI-compatible) |
| `POST /v1/images/generations` | generate images (OpenAI-compatible; supports img2img via `image` + `strength`) |
| `POST /v1/images/edits` | inpaint masked regions (OpenAI-compatible; multipart with image+mask+prompt) |
| `POST /v1/images/variations` | generate alternates of one image (OpenAI-compatible; multipart, no prompt) |
| `POST /v1/embeddings` | text embeddings (OpenAI-compatible) |
| `POST /v1/images/embeddings` | image embeddings (OpenAI-shape envelope mirroring `/v1/embeddings`) |
| `POST /v1/audio/embeddings` | audio embeddings (multipart upload + OpenAI-shape envelope mirroring `/v1/embeddings`) |
| `POST /v1/chat/completions` | chat (OpenAI-compatible incl. tools, structured output, streaming) |
| `POST /v1/moderations` | text moderation/classification (OpenAI-compatible) |
| `POST /v1/rerank` | text rerank (Cohere-compat) |
| `POST /v1/summarize` | text summarization (Cohere-compat) |
| `POST /v1/audio/music` | music generation (capability-gated; muse-native shape) |
| `POST /v1/audio/sfx` | sound-effect generation (capability-gated; muse-native shape) |
| `POST /v1/video/generations` | text-to-video generation (mp4/webm/frames_b64; GPU-required) |
Error shape is uniform: `{"error": {"code", "message", "type"}}` across 404 (model not found) and 422 (validation). Matches OpenAI's envelope, so clients written against their API work against muse.
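Client-side, one error handler therefore covers every endpoint; a minimal sketch using `requests`:

```python
# Uniform error handling across muse endpoints, per the envelope above.
import requests

r = requests.post(
    "http://localhost:8000/v1/embeddings",
    json={"input": "hello", "model": "no-such-model"},
)
if not r.ok:
    err = r.json()["error"]
    print(f"{r.status_code} {err['type']}/{err['code']}: {err['message']}")
```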
## Admin endpoints (v0.28.0+)
Eleven endpoints under `/v1/admin/*` let you enable, disable, probe, pull, and remove models on a running supervisor without restarting it. The admin surface is closed by default: set `MUSE_ADMIN_TOKEN` to any non-empty value to enable it, then send `Authorization: Bearer <token>` on every request.
| Endpoint | Purpose |
|---|---|
| `POST /v1/admin/models/{id}/enable` | spawn a worker (or restart-in-place) hosting `id`; returns 202 + `job_id` |
| `POST /v1/admin/models/{id}/disable` | unload `id` from its worker; sync |
| `POST /v1/admin/models/{id}/probe` | run `muse models probe` in the model's venv; returns 202 + `job_id` |
| `POST /v1/admin/models/_/pull` | pull from a curated alias or resolver URI in the body; returns 202 + `job_id` |
| `DELETE /v1/admin/models/{id}?purge=bool` | remove from catalog (refuses with 409 if loaded) |
| `GET /v1/admin/models/{id}/status` | merged catalog + live worker view |
| `GET /v1/admin/workers` | spawned workers + pid/uptime/restart-count |
| `POST /v1/admin/workers/{port}/restart` | SIGTERM by port; auto-restart monitor handles bringup |
| `GET /v1/admin/memory` | per-device aggregate + per-model breakdown |
| `GET /v1/admin/jobs/{job_id}` | one async-job record (404 if reaped) |
| `GET /v1/admin/jobs` | recent jobs, newest-first |
Auth setup:

```bash
export MUSE_ADMIN_TOKEN="$(openssl rand -hex 32)"  # or any non-empty value
muse serve  # admin endpoints now active under the same port
```
Five auth scenarios:

- env var unset, any header: `503 admin_disabled`
- env var set, no header: `401 missing_token`
- env var set, malformed header: `401 missing_token`
- env var set, wrong bearer: `403 invalid_token`
- env var set, correct bearer: route runs
curl examples:

```bash
TOKEN="$MUSE_ADMIN_TOKEN"
H="Authorization: Bearer $TOKEN"

# enable a pulled model (worker spawns or joins existing venv-group)
curl -s -X POST -H "$H" http://localhost:8000/v1/admin/models/kokoro-82m/enable

# poll the returned job
curl -s -H "$H" http://localhost:8000/v1/admin/jobs/<job_id>

# disable a loaded model (sync)
curl -s -X POST -H "$H" http://localhost:8000/v1/admin/models/kokoro-82m/disable

# merged status
curl -s -H "$H" http://localhost:8000/v1/admin/models/kokoro-82m/status

# memory aggregate (psutil + pynvml)
curl -s -H "$H" http://localhost:8000/v1/admin/memory
```
Python (use the `AdminClient`):

```python
from muse.admin.client import AdminClient

# Reads MUSE_SERVER and MUSE_ADMIN_TOKEN from env when unset.
admin = AdminClient()
job = admin.enable("kokoro-82m")
final = admin.wait(job["job_id"])
print(final["state"], final.get("result"))
print(admin.status("kokoro-82m"))
print(admin.workers())
print(admin.memory())
```
The `muse models enable`/`disable` CLI commands route through this admin API automatically when `MUSE_ADMIN_TOKEN` is set and the supervisor is reachable, falling back to a catalog-only mutation (effective on next `muse serve`) otherwise.
## MCP server (since v0.29.0)
`muse mcp` runs a Model Context Protocol server that exposes muse to LLM clients (Claude Desktop, Cursor, etc.) as 29 structured tools: 11 admin tools (gated by `MUSE_ADMIN_TOKEN`) plus 18 inference tools. Stdio mode is the default (for desktop apps); HTTP+SSE mode (`--http --port 8088`) is available for remote / web embedders.
```bash
muse mcp                             # stdio mode
muse mcp --http --port 8088          # HTTP+SSE
muse mcp --filter inference          # only inference tools (no admin)
muse mcp --filter admin              # only admin tools (control panel)
muse mcp --server http://other:8000  # connect to a remote muse server
```
Claude Desktop config (`~/Library/Application Support/Claude/claude_desktop_config.json` on macOS, `%APPDATA%\Claude\claude_desktop_config.json` on Windows):
```json
{
  "mcpServers": {
    "muse": {
      "command": "muse",
      "args": ["mcp"],
      "env": {
        "MUSE_SERVER": "http://localhost:8000",
        "MUSE_ADMIN_TOKEN": "your-admin-token-here"
      }
    }
  }
}
```
Tools split into two groups:

- Admin (11): `muse_list_models`, `muse_get_model_info`, `muse_search_models`, `muse_pull_model`, `muse_remove_model`, `muse_enable_model`, `muse_disable_model`, `muse_probe_model`, `muse_get_memory_status`, `muse_get_workers`, `muse_get_jobs`. Long-running ops (pull, probe, enable) return a `job_id` and the LLM polls `muse_get_jobs` to track progress.
- Inference (18): `muse_chat`, `muse_summarize`, `muse_rerank`, `muse_classify`, `muse_embed_text`, `muse_generate_image`, `muse_edit_image`, `muse_vary_image`, `muse_upscale_image`, `muse_segment_image`, `muse_generate_animation`, `muse_embed_image`, `muse_speak`, `muse_transcribe`, `muse_generate_music`, `muse_generate_sfx`, `muse_embed_audio`, `muse_generate_video`.
Binary inputs accept `<name>_b64` (base64), `<name>_url` (`data:` or http URL), or `<name>_path` (local file). Image and audio outputs return as MCP `ImageContent` / `AudioContent` blocks plus a JSON summary.
## Architecture
- `muse.core`: modality-agnostic discovery, registry, catalog, venv management, HF downloader, pip auto-install, FastAPI app factory.
- `muse.cli_impl`: `serve` (supervisor), `worker` (single-venv process), `gateway` (HTTP proxy routing by the request's `model` field).
- `muse.modalities/`: one subpackage per modality (wire contract: protocol + routes + codec + client).
  - `audio_embedding/` (MODALITY `"audio/embedding"`; multipart upload + OpenAI-shape envelope; includes `runtimes/transformers_audio.py`)
  - `audio_generation/` (MODALITY `"audio/generation"`; mounts both `/v1/audio/music` and `/v1/audio/sfx` on one MIME tag with per-route capability gates)
  - `audio_speech/` (MODALITY `"audio/speech"`)
  - `audio_transcription/` (MODALITY `"audio/transcription"`; multipart/form-data upload, OpenAI Whisper wire shape)
  - `chat_completion/` (MODALITY `"chat/completion"`; includes `runtimes/llama_cpp.py`)
  - `embedding_text/` (MODALITY `"embedding/text"`; includes `runtimes/sentence_transformers.py`)
  - `image_embedding/` (MODALITY `"image/embedding"`; includes `runtimes/transformers_image.py`)
  - `image_generation/` (MODALITY `"image/generation"`)
  - `text_classification/` (MODALITY `"text/classification"`; OpenAI `/v1/moderations` wire shape)
  - `text_rerank/` (MODALITY `"text/rerank"`; Cohere `/v1/rerank` wire shape)
  - `text_summarization/` (MODALITY `"text/summarization"`; Cohere `/v1/summarize` wire shape)
  - `video_generation/` (MODALITY `"video/generation"`; includes `runtimes/wan_runtime.py` and `runtimes/cogvideox_runtime.py`)
- `muse.models/`: flat directory of drop-in model scripts, one file per model (MANIFEST + Model class).
  - `soprano_80m.py`, `kokoro_82m.py`, `bark_small.py` (audio/speech)
  - `nv_embed_v2.py` (embedding/text; MiniLM and Qwen3-Embedding are now resolver-pulled via the generic runtime, see `curated.yaml`)
  - `sd_turbo.py` (image/generation)
  - `bge_reranker_v2_m3.py` (text/rerank)
  - `stable_audio_open_1_0.py` (audio/generation; Stable Audio Open 1.0, Apache 2.0)
  - `bart_large_cnn.py` (text/summarization; facebook/bart-large-cnn, Apache 2.0, ~400MB, CPU-friendly)
  - `dinov2_small.py` (image/embedding; facebook/dinov2-small, Apache 2.0, 88MB, 384-dim, CPU-friendly)
  - `mert_v1_95m.py` (audio/embedding; m-a-p/MERT-v1-95M, MIT, 95MB, 768-dim music understanding via mean-pool over time)
  - `wan2_1_t2v_1_3b.py` (video/generation; Wan-AI/Wan2.1-T2V-1.3B, Apache 2.0, ~3GB at fp16, 5s clips at 832x480, GPU-required)
- `muse.core.resolvers`: URI -> `ResolvedModel` dispatch for `muse pull hf://...`. `resolvers_hf` registers the `hf://` resolver for HuggingFace GGUF + sentence-transformers repos.
`muse serve` is a supervisor process. It spawns one worker subprocess per venv (each pulled model has its own venv with its own deps) and runs a gateway that proxies by the `model` field. Dep conflicts between models are structurally impossible.
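In miniature, the proxy-by-model idea looks like this (an illustrative sketch, not muse's actual gateway; the port registry and JSON-only routing are simplifications):

```python
# Toy gateway: route each request to the worker hosting the named model.
from fastapi import FastAPI, Request
from fastapi.responses import JSONResponse
import httpx

app = FastAPI()
MODEL_PORTS = {"soprano-80m": 8101, "sd-turbo": 8102}  # hypothetical registry

@app.post("/v1/{path:path}")
async def proxy(path: str, request: Request):
    body = await request.json()
    port = MODEL_PORTS.get(body.get("model", ""))
    if port is None:
        return JSONResponse(
            {"error": {"code": "model_not_found", "message": "unknown model",
                       "type": "invalid_request_error"}},
            status_code=404,
        )
    async with httpx.AsyncClient() as client:
        upstream = await client.post(f"http://127.0.0.1:{port}/v1/{path}", json=body)
    return JSONResponse(upstream.json(), status_code=upstream.status_code)
```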
Three ways to extend muse:

- Resolver URI: `muse pull hf://Qwen/Qwen3-8B-GGUF@q4_k_m` for any GGUF or sentence-transformers HF repo. See `docs/RESOLVERS.md`.
- Model script: drop a `.py` into `~/.muse/models/` for one-off models with custom code. See `docs/MODEL_SCRIPTS.md`.
- Modality subpackage: drop into `src/muse/modalities/` or `$MUSE_MODALITIES_DIR` for a whole new modality.
See `CLAUDE.md` for implementation details and the contribution guide, `docs/MODEL_SCRIPTS.md` for writing your own model scripts, `docs/RESOLVERS.md` for adding a new URI scheme, and `docs/CHAT_COMPLETION.md` for the chat endpoint specification.
## License
MIT