woollama

Web Over Ollama (and Llamas). An MCP + OpenAI router for AI desktops.

📖 Documentation: woollama.readthedocs.io

woollama sits between AI clients (Cursor, the OpenAI SDK, Claude Desktop, cosmic-fabric, anything that speaks OpenAI or MCP) and AI backends (Ollama, Anthropic, fabric, lackpy, filesystem MCPs, anything that speaks OpenAI or MCP). It composes them into orchestrated calls without inventing a new protocol.

                          ┌─────────────────────┐
                          │   AI clients        │
                          │   (any OpenAI or    │
                          │    MCP client)      │
                          └──────────┬──────────┘
                                     │
                  ┌──────────────────┴───────────────────┐
                  │            woollama                  │
                  │  OpenAI server  +  MCP server        │
                  │  ───────────────────────────────     │
                  │  routes models, tools, executors     │
                  │  composes patterns + tools + models  │
                  │  into named recipes                  │
                  └──────────────────┬───────────────────┘
                                     │
                  ┌──────────────────┴───────────────────┐
                  │                                      │
              ┌───┴────┐                            ┌────┴────┐
              │ MCP    │  tools, prompts, resources │ OpenAI  │  inference
              │ tool   │                            │ compat  │
              │ servers│                            │ backends│
              └────────┘                            └─────────┘
              fabric-mcp, lackpy,                   Ollama, Anthropic,
              filesystem, git, …                    vLLM, llama.cpp, …

Status

The Rust daemon woollamad is a multi-backend router with both surfaces live, published to crates.io and PyPI. woollama works end-to-end as:

  • an OpenAI-compatible server: /v1/chat/completions (pass-through and hidden chat-loop orchestration of recipes, both with stream:true → OpenAI SSE), /v1/models, /v1/tools, and a stateful surface, /v1/responses + /v1/conversations (OpenAI Responses/Conversations shape; see below);
  • an MCP server to its own clients, over stdio (woollamad mcp) and over Streamable HTTP at /mcp, mounted on the same port as /v1/*. It re-exports every discovered downstream tool (namespaced, with output_schema) plus a chat verb that emits live tool-progress notifications; in other words, it is an MCP aggregator.

It routes inference across multiple backends by <provider>/<model>: ollama (local), anthropic, openai, groq, together, openrouter, and any OpenAI-compatible endpoint you add in inferencers.toml (e.g. self-hosted vLLM), plus claude-code/<model>, a keyless path to Claude via the local CLI (tool-less, or as an executor that runs a recipe's allow-listed MCP tools itself, i.e. tool delegation). Config is file-driven (mcp.json, recipes.toml, inferencers.toml).
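To make the <provider>/<model> addressing concrete, here is a minimal sketch of how such an id could be split. It is an illustration, not woollamad's actual parser: the names ModelRef and split_model_id are hypothetical, and splitting on the first slash only is an assumption (it keeps any later slashes inside the model part).

```python
from typing import NamedTuple


class ModelRef(NamedTuple):
    provider: str  # e.g. "ollama", "anthropic", "claude-code", "woollama"
    model: str     # everything after the first "/", kept unchanged


def split_model_id(model_id: str) -> ModelRef:
    """Split '<provider>/<model>' on the first slash only (an assumption)."""
    provider, sep, model = model_id.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"expected '<provider>/<model>', got {model_id!r}")
    return ModelRef(provider, model)


ref = split_model_id("ollama/qwen3:14b-iq4xs")
print(ref.provider, ref.model)  # prints: ollama qwen3:14b-iq4xs
```

Because only the first slash separates the two parts, a recipe id such as woollama/streamer and a raw inferencer id go through the same code path.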

For stateful conversations, woollama routes handles and the backends own the state; woollama never stores transcripts itself. There are two state-owning backends: claude-resume (claude --resume, for claude-code models; keyless, the Claude session owns the bytes) and managed-agents (Anthropic's Managed Agents, for claude-agent models; needs ANTHROPIC_API_KEY, and Anthropic hosts the session and exposes the transcript, so /v1/conversations/{id}/items works). Models with no state-owning backend (ollama/cloud/recipe) are stateless: the caller owns history (store:false).

MCP connections are long-lived. woollama is served on both a Unix socket ($XDG_RUNTIME_DIR/woollama.sock, mode 0600, the default for local MCP clients) and an ephemeral loopback TCP port; it never binds 0.0.0.0 without explicit opt-in.
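The state-ownership rule described above can be summarised as a small lookup. This is a sketch of the rule as stated, not woollamad's code: the function name state_backend_for is hypothetical, and treating "claude-agent models" as ids of the form claude-agent/<model> is an assumption.

```python
from typing import Optional

# Provider prefix -> state-owning backend, as named in the text above.
# ASSUMPTION: "claude-agent models" are addressed as "claude-agent/<model>".
STATE_BACKENDS = {
    "claude-code": "claude-resume",
    "claude-agent": "managed-agents",
}


def state_backend_for(model_id: str) -> Optional[str]:
    """Return the backend that owns conversation state, or None if stateless."""
    provider = model_id.split("/", 1)[0]
    return STATE_BACKENDS.get(provider)


for mid in ("claude-code/haiku", "ollama/qwen3:14b-iq4xs", "woollama/streamer"):
    backend = state_backend_for(mid)
    print(mid, "->", backend or "stateless (caller owns history)")
```

A None result corresponds to the stateless case, where the caller sends the full history on each request.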

Current status and what's next live in docs/roadmap.md.

The Rust port is done (v0.5.x). woollamad is the canonical router, published to crates.io (cargo install woollama-server) and PyPI (pip install woollama). The Python in src/woollama/ is kept, not deleted, as the reference server and differential-test oracle. See docs/rust-transition.md for the (completed) transition criteria.

See docs/architecture.md for the full target design and docs/build-log.md for the slice-by-slice history.

Quick taste

The router is OpenAI-compatible, so any OpenAI client can drive it:

import openai
c = openai.OpenAI(base_url="http://127.0.0.1:<port>/v1", api_key="x")

# Pass-through to Ollama
r = c.chat.completions.create(
    model="ollama/qwen3:14b-iq4xs",
    messages=[{"role": "user", "content": "Hi"}],
)

# Orchestrated: a recipe (system prompt + tools + model), transparent to the
# client. The chat-loop happens inside woollama; client sees only the final answer.
r = c.chat.completions.create(
    model="woollama/streamer",
    messages=[{"role": "user", "content": "Please count to 4."}],
)

woollama serves on two transports at once: a Unix socket at $XDG_RUNTIME_DIR/woollama.sock (mode 0600; the default for local MCP clients, since a connectable socket can spend the router's API keys) and an ephemeral loopback TCP port written to $XDG_RUNTIME_DIR/woollama.addr for clients to discover. The <port> above is that ephemeral port. This is the same pattern as a local fabric --serve instance.
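Since both pass-through and recipe calls support stream:true, the same client can consume the answer incrementally. The sketch below wraps the OpenAI SDK's streaming iteration in a small generator; the helper name stream_text is hypothetical, and the client argument is expected to be an openai.OpenAI instance like c in the example above.

```python
from typing import Iterator


def stream_text(client, model: str, prompt: str) -> Iterator[str]:
    """Yield text deltas from a streamed chat completion."""
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,  # ask for incremental chunks instead of one final message
    )
    for chunk in stream:
        # Some chunks carry no choices or no text (role-only or final chunks).
        if chunk.choices and chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content


# Usage, assuming `c` is the client created in the example above:
#   for piece in stream_text(c, "woollama/streamer", "Please count to 4."):
#       print(piece, end="", flush=True)
```

Keeping the client as a parameter means the helper works unchanged whether it points at an ollama/ model or a woollama/ recipe.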

Install

The router is woollamad, a small Rust daemon. The Python implementation is kept as a reference server and the differential-test oracle (see below), but woollamad is the canonical router.

From crates.io (cargo install ships only the binary, so bring your own mcp.json):

cargo install woollama-server     # installs the `woollamad` binary
woollamad                         # starts the router; prints its address

From this checkout (works today; includes the bundled example MCP servers):

git clone https://github.com/teaguesterling/woollama
cd woollama
cargo build --release             # builds target/release/woollamad
./target/release/woollamad        # starts the router; prints its address

On startup woollamad prints its OpenAI base_url (e.g. http://127.0.0.1:<port>/v1); copy that into your OpenAI client. (It's also written to $XDG_RUNTIME_DIR/woollama.addr for programmatic discovery, and it serves the same surface over the woollama.sock unix socket.)
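Programmatic discovery could look roughly like the sketch below. The exact contents of woollama.addr are not specified here, so this is a guess that accepts either a bare host:port or a full URL; check the file on your machine before relying on it. The function name discover_base_url is hypothetical.

```python
import os
from pathlib import Path
from typing import Optional


def discover_base_url(runtime_dir: Optional[str] = None) -> str:
    """Build an OpenAI base_url from $XDG_RUNTIME_DIR/woollama.addr."""
    runtime_dir = runtime_dir or os.environ["XDG_RUNTIME_DIR"]
    raw = (Path(runtime_dir) / "woollama.addr").read_text().strip()
    # ASSUMPTION: the file holds "host:port" or a full "http://..." URL.
    if "://" not in raw:
        raw = f"http://{raw}"
    raw = raw.rstrip("/")
    # Make sure the result ends in /v1, which the OpenAI client expects.
    return raw if raw.endswith("/v1") else raw + "/v1"
```

With that in place, a client could be created as openai.OpenAI(base_url=discover_base_url(), api_key="x") instead of pasting the printed address by hand.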

The Python reference server

The original Python implementation still runs and is used as the live oracle that keeps woollamad honest:

uv sync                           # creates .venv and installs deps
uv run woollama                   # the Python reference server

Prerequisite for the examples below: they use ollama/qwen3:14b-iq4xs, so install Ollama, run ollama serve, and ollama pull qwen3:14b-iq4xs. No Ollama? Use the keyless Claude path instead: model="claude-code/haiku" (needs the claude CLI logged in), or any cloud model with its key set (see Configuration).

Tests & lint

# Rust (woollamad): the daemon's own suites
cargo test --tests --features test-fixtures
cargo build --release            # so the live oracle can spawn the binary

# Python: hermetic suite + lint
uv run --extra dev pytest        # hermetic suite (live tests are opt-in: -m integration)
uv run ruff check .              # lint (the CI gate)

# The live differential oracle (same tests, against woollamad by default):
uv run --extra dev pytest -m integration            # targets target/release/woollamad
WOOLLAMA_TEST_CMD="python -m woollama" \
  uv run --extra dev pytest -m integration          # opt in to the Python reference

CI (.github/workflows/ci.yml) runs the Rust + Python gates on every push to main and PR. For the same lint gate locally on commit, opt into the pre-commit hook:

uv tool install pre-commit && pre-commit install

Lint only: the project does not use ruff format (lines are hand-wrapped, E501 is ignored), so there is no formatter step in either gate.

Design principles

  1. Two standards, neither extended. MCP for tool/prompt/resource discovery and execution; OpenAI chat-completions for the inference primitive. woollama is a router between them.
  2. Local-only, ephemeral by default. Random loopback port, persisted address file for discovery, never 0.0.0.0 without explicit opt-in. The router holds API keys and routes to local resources, so it should not be LAN-reachable.
  3. The model namespace is the universal addressing scheme. Raw inferencers (<provider>/<model>, e.g. ollama/X, anthropic/X, claude-code/X) and full recipes (woollama/<recipe>) are all addressable through OpenAI's standard model field. No new wire format.
  4. woollama owns routing, not inference or tools. It uses other people's inference engines (Ollama, Anthropic, …) and other people's tool servers (any MCP server: filesystem, git, lackpy, …). It composes them.
  5. she talks to llamas.
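The "local-only by default" principle amounts to a guard on the bind address. The sketch below shows one way to express that policy with the standard ipaddress module; it is not woollamad's implementation (which is in Rust), and check_bind_host and allow_non_loopback are hypothetical names.

```python
import ipaddress


def check_bind_host(host: str, allow_non_loopback: bool = False) -> str:
    """Accept loopback IP literals; anything else needs explicit opt-in."""
    addr = ipaddress.ip_address(host)  # raises ValueError for non-IP strings
    if not addr.is_loopback and not allow_non_loopback:
        raise ValueError(
            f"refusing to bind {host}: not a loopback address "
            "(explicit opt-in required)"
        )
    return host


print(check_bind_host("127.0.0.1"))  # prints: 127.0.0.1
print(check_bind_host("0.0.0.0", allow_non_loopback=True))  # prints: 0.0.0.0
```

Making the opt-in a separate argument rather than a special address value keeps the safe path the default one.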

What works today

  • OpenAI surface: /v1/models, /v1/chat/completions (pass-through + recipe orchestration, both with stream:true → OpenAI SSE), /v1/tools introspection
  • Stateful surface: /v1/responses (stateless subset, incl. stream:true → OpenAI Responses SSE, + stateful) and /v1/conversations (create/list/get/delete, plus items where the backend exposes its transcript). woollama routes conversation handles; backends own state (woollama never stores transcripts itself): claude-resume for claude-code models, managed-agents (Anthropic Managed Agents) for claude-agent models, with an interactive requires_action pause/answer path; models with no state-owning backend are stateless (store:false)
  • Multi-backend routing by <provider>/<model>: ollama (incl. num_ctx honored via ollama's native /api/chat), anthropic, openai, groq, together, openrouter, claude-code, + any OpenAI-compatible endpoint via inferencers.toml
  • Tool delegation: a claude-code recipe with tools runs as an executor, where Claude owns the agentic loop and calls the recipe's allow-listed MCP tools itself (per-recipe --mcp-config + --allowedTools containment)
  • MCP server side: stdio (woollamad mcp) and Streamable HTTP at /mcp on the same port, exposing recipes as parameterized prompts (their {{var}} tokens → arguments), a chat verb (with live tool-progress notifications), and every downstream tool re-exported with its output_schema (aggregator)
  • Pattern templating on woollama's own /w1/ namespace (not OpenAI's /v1/): parameterized recipes/patterns with {{var}} substitution, via GET /w1/patterns (discovery), POST /w1/patterns/{name}/render (assemble), POST /w1/patterns/{name}/run (render + infer, streaming). Patterns also come from a fabric-style directory scan ([patterns]) and a fabric backend: woollama can run/own fabric --serve, surface its library on /w1/, and transparently proxy fabric's API at /fabric/*. Pattern backends are pluggable (the PatternBackend trait; see docs/extending.md)
  • File-driven config (mcp.json, recipes.toml, inferencers.toml), multi-MCP-server discovery + unified tool registry, long-lived MCP connections
  • Recipe allow-list enforced as a security boundary (in-loop AND in delegation); served on a Unix socket + loopback TCP, address discovery file; CI (ruff + hermetic suite, 3.11/3.12)
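The {{var}} substitution mentioned in the list above can be pictured with a short sketch. It only illustrates the idea of turning tokens into arguments and filling them in; woollama's real templating rules (allowed names, whitespace, handling of missing values) may differ, and template_vars and render are hypothetical names.

```python
import re
from typing import Dict, List

# ASSUMPTION: variable names are identifiers, optionally padded with spaces.
_VAR = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def template_vars(template: str) -> List[str]:
    """Names of {{var}} tokens in first-seen order, without duplicates."""
    return list(dict.fromkeys(_VAR.findall(template)))


def render(template: str, values: Dict[str, str]) -> str:
    """Replace every {{var}} token; fail loudly if a value is missing."""
    missing = [name for name in template_vars(template) if name not in values]
    if missing:
        raise KeyError(f"missing values for: {', '.join(missing)}")
    return _VAR.sub(lambda m: values[m.group(1)], template)


print(template_vars("Count to {{n}} in {{language}}."))  # prints: ['n', 'language']
print(render("Count to {{n}}.", {"n": "4"}))  # prints: Count to 4.
```

The list returned by template_vars is the kind of information a prompt's argument list would be built from, while render corresponds to the "assemble" step.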

Not yet (next on the roadmap)

  • The live, interactive Claude-in-tmux session backend (a separate Rust session driver), gated on spikes that need a real terminal. (The interactive requires_action path itself already works via the managed-agents backend.)
  • cosmic-fabric actually consuming the conversations surface (the last open integration milestone). The generic store-backed mechanism + two reference store providers (MCP + REST) already ship; what's pending is the cross-repo wiring. (Pattern templating + the fabric backend it needed have shipped; see docs/patterns.md.)
  • lackpy re-pinning to the now-published woollama-core wheel.

Full scorecard, ordering, and pending verifications: docs/roadmap.md.

Origin

woollama is the production-grade rewrite of an architecture co-designed in cosmic-fabric, which remains a frontend (and will use woollama as its router engine). The design docs that brought woollama here:

  • docs/architecture.md: the model/tool/executor router design
  • docs/naming.md: how we landed on this name

License

MIT (see LICENSE).
