Skip to main content

model-compose: Declarative AI Workflow Orchestrator

Project description


model-compose

Compose AI Systems, Deploy Anywhere.

Define workflows, agents, models, and comprehensive AI services in YAML. Run them locally, scale them in production, and deploy across any environment without rewriting your stack.

Inspired by docker-compose — one YAML file defines your entire AI system.


Highlights

  • Any model, anywhere — run models locally via HuggingFace, vLLM, and llama.cpp, or connect to OpenAI, Anthropic, Google, and more
  • 20+ components ready — models, agents, HTTP clients, vector/graph stores, shell commands, and more
  • Built-in data stores — Chroma, FAISS, Milvus, Qdrant, Neo4j, ArangoDB, Redis
  • Deploy as container — Docker, native containers, or standalone process with one config
  • Serve any protocol — HTTP REST, WebSocket, or MCP with one line change
  • Distributed execution — scale across machines with Redis-backed queue dispatch
  • Instant Web UI — add a Gradio-powered interface with 2 lines of YAML

Installation

pip install model-compose

Or install from source:

git clone https://github.com/hanyeol/model-compose.git
cd model-compose
pip install -e .

Requires: Python 3.10 or higher


Quick Start

Define your AI runtime in a model-compose.yml:

controller:
  adapter:
    type: http-server
    port: 8080
  webui:
    port: 8081

workflows:
  - id: chat
    default: true
    jobs:
      - component: chatgpt

components:
  - id: chatgpt
    type: http-client
    base_url: https://api.openai.com/v1
    path: /chat/completions
    method: POST
    headers:
      Authorization: Bearer ${env.OPENAI_API_KEY}
    body:
      model: gpt-4o
      messages:
        - role: user
          content: ${input.prompt}

Create a .env file:

OPENAI_API_KEY=your-key

Run it:

model-compose up

Your AI runtime is now serving at http://localhost:8080 with Web UI at http://localhost:8081.

Explore examples for more workflows or read the Documentation.


Core Capabilities

Declarative YAML Configuration

Define your entire AI system in a single YAML file. Workflows, agents, models, APIs, vector/graph stores, and runtimes — all composed and deployed together without custom code.

controller:
  adapter:
    type: http-server
    port: 8080

workflows:
  - id: chat
    default: true
    jobs:
      - component: chatgpt

components:
  - id: chatgpt
    type: http-client
    base_url: https://api.openai.com/v1
    action:
      path: /chat/completions
      method: POST

Flexible Component System

20+ reusable component types. Mix HTTP clients, local models, vector stores, shell commands, and workflows in any combination. Define once, use everywhere.

components:
  - id: chatgpt
    type: http-client

  - id: local-llm
    type: model

  - id: assistant
    type: agent

  - id: knowledge
    type: vector-store

  - id: cache
    type: key-value-store

  - id: runner
    type: shell

Advanced Workflow Composition

Chain jobs with conditional logic, parallel execution, and data transformation. Pass data between jobs with variable binding — ${input}, ${response}, ${env} — with type conversion and defaults.

workflows:
  - id: rag-pipeline
    jobs:
      - id: embed
        component: embedder
        input:
          text: ${input.query}

      - id: search
        component: vector-store
        action: search
        input:
          vector: ${jobs.embed.output}
        depends_on: [embed]

      - id: answer
        component: chatgpt
        input:
          context: ${jobs.search.output}
          question: ${input.query}
        depends_on: [search]

AI Agent Components

Build autonomous AI agents that use workflows as tools. Agents reason, plan, and execute multi-step tasks by dynamically invoking other workflows — all defined declaratively in YAML.

components:
  - id: research-agent
    type: agent
    tools:
      - search-web
      - fetch-page
    max_iteration_count: 10
    action:
      model:
        component: chatgpt
        input:
          messages: ${messages}
          tools: ${tools}
      system_prompt: You are a web research assistant.
      user_prompt: ${input.question}

Human-in-the-Loop

Add approval gates and user input steps to any workflow. Workflows pause, prompt for human input via CLI, Web UI, or API, and resume seamlessly.

workflows:
  - id: write-with-approval
    jobs:
      - id: write-file
        component: file-writer
        input:
          path: ${input.path}
          content: ${input.content}
        interrupt:
          before:
            message: "Approve file write to ${job.input.path}?"

Local Model Execution

Run models from HuggingFace and other sources locally with native support for transformers, vLLM, and PyTorch. Fine-tune models with LoRA/PEFT through YAML configuration.

components:
  - id: local-llm
    type: model
    task: chat-completion
    model: HuggingFaceTB/SmolLM3-3B
    action:
      messages:
        - role: user
          content: ${input.prompt}

Universal AI Service Integration

Connect to OpenAI, Anthropic, Google, xAI, ElevenLabs, and any custom HTTP API. Mix and match providers in a single workflow.

components:
  - id: claude
    type: http-client
    base_url: https://api.anthropic.com/v1
    action:
      path: /messages
      method: POST
      headers:
        x-api-key: ${env.ANTHROPIC_API_KEY}
        anthropic-version: "2023-06-01"
      body:
        model: claude-opus-4-20250514
        max_tokens: 1024
        messages:
          - role: user
            content: ${input.prompt}

Real-Time Streaming

Built-in SSE (Server-Sent Events) streaming for real-time AI responses. Stream from any provider or local model with automatic chunking and connection management.

workflows:
  - id: chat
    jobs:
      - component: chatgpt
        output: ${output as sse-text}

components:
  - id: chatgpt
    type: http-client
    base_url: https://api.openai.com/v1
    action:
      path: /chat/completions
      method: POST
      body:
        model: gpt-4o
        messages: ${input.messages}
        stream: true
      stream_format: json
      output: ${response[].choices[0].delta.content}

Built-in Data Store Integration

Native integration with Chroma, FAISS, Milvus, Qdrant for vector search. Neo4j and ArangoDB for graph stores. Redis for key-value storage. Build RAG systems with embedding search and semantic retrieval.

components:
  - id: knowledge
    type: vector-store
    driver: chroma
    actions:
      - id: insert
        collection: docs
        method: insert
        vector: ${input.vector}
        metadata:
          text: ${input.text}

      - id: search
        collection: docs
        method: search
        query: ${input.vector}

Deploy in Any Runtime

Run in native, process, Docker, or native container mode. The same configuration works across all runtimes — switch with one line.

controller:
  runtime:
    type: docker
    image: my-ai-service:latest
    ports:
      - "8080:8080"
  adapter:
    type: http-server
    port: 8080

Protocol Adapters

Serve over HTTP REST, WebSocket, or MCP (Model Context Protocol) by changing a single line. Includes concurrency control, health checks, and automatic API documentation.

# HTTP REST
controller:
  adapter:
    type: http-server
    port: 8080

# MCP (Model Context Protocol)
controller:
  adapter:
    type: mcp-server
    port: 8080

Distributed Workflow Execution

Scale AI workloads across multiple machines using Redis-backed queue dispatch. Add workers to scale horizontally without shared filesystem or code changes.

controller:
  adapter:
    type: http-server
    port: 8080
  queue:
    driver: redis
    host: localhost
    port: 6379
    name: my-queue

Webhook and Callback Listeners

HTTP callback listeners for async workflows and HTTP trigger listeners for webhooks. Build reactive AI systems that respond to real-world events.

listener:
  type: http-trigger
  port: 8091
  triggers:
    - path: /webhook
      method: POST
      workflow: handle-message
      input:
        text: ${body.message.text}

Gateway and Tunnel Support

Expose local services to the internet with ngrok, Cloudflare, or SSH tunnels. Integrate webhooks and deploy public APIs without complex networking.

gateway:
  type: http-tunnel
  driver: ngrok
  port:
    - 8090

Instant Web UI

Add a visual interface with 2 lines of YAML. Get a Gradio-powered chat UI or serve custom static frontends for testing and debugging.

controller:
  webui:
    driver: gradio
    port: 8081

Architecture

Protocol adapters → Composition engine → Runtime executors

Architecture Diagram


Contributing

We welcome all contributions! Whether it's fixing bugs, improving docs, or adding examples — every bit helps.

# Setup for development
git clone https://github.com/hanyeol/model-compose.git
cd model-compose
pip install -e .[dev]

License

MIT License © 2025-2026 Hanyeol Cho.


Contact

Have questions, ideas, or feedback? Open an issue or start a discussion on GitHub Discussions.

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

model_compose-0.4.62.tar.gz (229.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

model_compose-0.4.62-py3-none-any.whl (444.6 kB view details)

Uploaded Python 3

File details

Details for the file model_compose-0.4.62.tar.gz.

File metadata

  • Download URL: model_compose-0.4.62.tar.gz
  • Upload date:
  • Size: 229.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.4

File hashes

Hashes for model_compose-0.4.62.tar.gz
Algorithm Hash digest
SHA256 ca7a27ef0d78151a2e3ea0dd3a58ae6b46b62a8e59d0ab674df56158e2bebb20
MD5 787cd0eabfda19ad9bdad2649ea60b3e
BLAKE2b-256 8d378ec4e77207247533d377d5c9d77156a4c0f2bee8469e0b9efa1bc25aa8c4

See more details on using hashes here.

File details

Details for the file model_compose-0.4.62-py3-none-any.whl.

File metadata

  • Download URL: model_compose-0.4.62-py3-none-any.whl
  • Upload date:
  • Size: 444.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.4

File hashes

Hashes for model_compose-0.4.62-py3-none-any.whl
Algorithm Hash digest
SHA256 77b7f8b4f863d8380c54f3bb2173d416b719c996fb59553666bfdbe6e8f9634f
MD5 742419db2926b4e3cdabf28d0f3abbaa
BLAKE2b-256 df5c6b44596f3820708c9d8e140ca32438c486c627ac4e4f8e15d77a6d53ba19

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page