
LLAMPHouse

Serving Your LLM Apps, Scalable and Reliable.
Explore the docs »

Quickstart · Report Bug · Request Feature


What is LLAMPHouse?

LLAMPHouse is a self-hosted, production-ready server for LLM-powered applications. It exposes an OpenAI-compatible Assistants API and supports the A2A (Agent-to-Agent) protocol — so you can use the standard OpenAI Python SDK or any A2A client to talk to your agents.

Note: A2A protocol support requires LLAMPHouse v1.2.0 or later. Earlier versions only support the OpenAI Assistants API adapter.

Write your agent logic in plain Python, plug it into LLAMPHouse, and get:

  • 🔌 OpenAI-compatible API — drop-in replacement, use the openai Python SDK as the client
  • 🤝 A2A protocol — interoperable agent-to-agent communication out of the box
  • 🌊 Streaming — real-time token streaming with SSE (works with OpenAI, Gemini, Anthropic)
  • 🛠️ Tool calls — native support for function calling with automatic tool output handling
  • 🔀 Multi-agent — call_agent() and handover_to_agent() for orchestration and delegation
  • 📊 Compass dashboard — built-in dev UI for threads, messages, runs, traces, and agent flow visualization
  • 🔍 OpenTelemetry tracing — automatic distributed tracing with ClickHouse storage
  • ⚙️ Config store — runtime-tunable agent parameters via a dashboard UI
  • 🐘 Pluggable storage — in-memory (default) or Postgres, with Alembic migrations
  • 🐳 Docker-ready — single-command deployment with Postgres, Redis, and tracing

Why LLAMPHouse?

Most agent frameworks focus on building agents — LLAMPHouse focuses on serving them. Here's why that matters:

🚀 Scales from dev to production without rewrites

Start with a single Python file and an in-memory store. When you're ready for production, add Postgres, Redis, and distributed workers — same agent code, zero rewrites. LLAMPHouse grows with your project.
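
In practice, the switch is a constructor change rather than a rewrite. A minimal sketch, assuming an Agent instance like the one in the Quickstart below; the PostgresDataStore import path is a guess modeled on the in-memory store's path:

from llamphouse.core import LLAMPHouse
# Hypothetical import path, by analogy with data_stores.in_memory_store:
from llamphouse.core.data_stores.postgres_store import PostgresDataStore

app = LLAMPHouse(
    agents=[agent],                  # same agent code as in development
    data_store=PostgresDataStore(),  # persistent store; see DATABASE_URL below
)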

📦 Workload scaling, not agent scaling

Traditional setups tie one process to one agent. LLAMPHouse uses a shared agent pool with a run queue — multiple agents share the same infrastructure, and workers pull from a common queue. Scale by adding workers, not by duplicating services per agent.

🔧 Easily extensible and configurable

Swap out any component to fit your use case: data stores, queues, event queues, authentication, adapters, and workers are all pluggable interfaces. Need a custom auth layer? Implement BaseAuth. Want Redis queues? Drop in RedisQueue. The framework adapts to you, not the other way around.
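
For example, a custom auth layer might look like this. A minimal sketch: the BaseAuth import path and the authenticate() method name are assumptions, so check the source for the exact interface.

# Hypothetical import path and method name, for illustration only.
from llamphouse.core.auth import BaseAuth

class AllowListAuth(BaseAuth):
    def __init__(self, valid_keys: set[str]):
        self.valid_keys = valid_keys

    def authenticate(self, api_key: str) -> bool:
        # Accept only requests presenting a key from a fixed allow-list.
        return api_key in self.valid_keys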

🧩 LLM and framework agnostic

LLAMPHouse doesn't care what happens inside your run() method. Use OpenAI, Anthropic, Google Gemini, Azure AI, or any other provider. Build with LangChain, LangGraph, LlamaIndex, CrewAI, or plain API calls — LLAMPHouse serves the result, regardless of what generated it.
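
For instance, the same run() method could call Anthropic instead of OpenAI. A sketch using the public anthropic SDK; only the Agent and Context pieces come from LLAMPHouse:

from anthropic import AsyncAnthropic
from llamphouse.core import Agent
from llamphouse.core.context import Context

anthropic_client = AsyncAnthropic()  # reads ANTHROPIC_API_KEY from the environment

class ClaudeAgent(Agent):
    async def run(self, context: Context):
        result = await anthropic_client.messages.create(
            model="claude-3-5-sonnet-latest",  # any current Claude model id
            max_tokens=1024,
            messages=[
                {"role": m.role, "content": m.text}
                for m in context.messages
            ],
        )
        await context.insert_message(result.content[0].text)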

🌐 Standards-based interoperability

Expose your agents via the OpenAI Assistants API (works with the openai Python SDK out of the box) and the A2A protocol (Google's Agent-to-Agent standard). Your agents are instantly accessible to any compatible client or agent ecosystem.


Quickstart

Requirements: Python 3.10+

1. Install

pip install llamphouse

2. Create your agent

Create a file called server.py:

from llamphouse.core import LLAMPHouse, Agent
from llamphouse.core.context import Context
from llamphouse.core.data_stores.in_memory_store import InMemoryDataStore
from llamphouse.core.adapters.a2a import A2AAdapter


class HelloAgent(Agent):
    async def run(self, context: Context):
        await context.insert_message(
            "Hello! I'm a simple agent running on LLAMPHouse."
        )


agent = HelloAgent(
    id="hello-agent",
    name="Hello Agent",
    description="A friendly assistant that answers questions.",
    version="0.1.0",
)

app = LLAMPHouse(
    agents=[agent],
    data_store=InMemoryDataStore(),
    adapters=[A2AAdapter()],
)

app.ignite(host="127.0.0.1", port=8000)

3. Run it

python server.py

Your agent is now live at http://127.0.0.1:8000 with:

  • A2A protocol at /.well-known/agent.json (see the check below)
  • Compass dashboard at http://127.0.0.1:8000/compass
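
As a quick sanity check, fetch the agent card from Python (standard library only; the card's exact fields follow the A2A spec):

import json
import urllib.request

with urllib.request.urlopen("http://127.0.0.1:8000/.well-known/agent.json") as resp:
    card = json.load(resp)
print(card.get("name"))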

4. Talk to it

Use any A2A client, the OpenAI Python SDK, or just curl:

# Create a thread
curl -s -X POST http://127.0.0.1:8000/threads | python3 -m json.tool

# Send a message and create a run
THREAD_ID="<thread_id from above>"
curl -s -X POST "http://127.0.0.1:8000/threads/$THREAD_ID/messages" \
  -H "Content-Type: application/json" \
  -d '{"role": "user", "content": "Hi there!"}'

curl -s -X POST "http://127.0.0.1:8000/threads/$THREAD_ID/runs" \
  -H "Content-Type: application/json" \
  -d '{"assistant_id": "hello-agent"}'

Or use the OpenAI SDK as a client:

from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8000", api_key="any")

thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id, role="user", content="Hello!"
)
run = client.beta.threads.runs.create(
    thread_id=thread.id, assistant_id="hello-agent"
)
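
To read the agent's reply, poll the run until it finishes and then list the thread's messages. These are standard Assistants-API client calls, continuing the snippet above:

import time

while run.status in ("queued", "in_progress"):
    time.sleep(0.5)
    run = client.beta.threads.runs.retrieve(run_id=run.id, thread_id=thread.id)

for message in client.beta.threads.messages.list(thread_id=thread.id):
    print(message.role, message.content[0].text.value)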

Adding an LLM

Connect to any LLM provider. Here's an example with OpenAI:

from dotenv import load_dotenv
from openai import AsyncOpenAI
from llamphouse.core import LLAMPHouse, Agent
from llamphouse.core.context import Context
from llamphouse.core.data_stores.in_memory_store import InMemoryDataStore
from llamphouse.core.adapters.a2a import A2AAdapter

load_dotenv()
openai_client = AsyncOpenAI()


class ChatAgent(Agent):
    async def run(self, context: Context):
        messages = [
            {"role": m.role, "content": m.text}
            for m in context.messages
        ]
        result = await openai_client.chat.completions.create(
            messages=messages, model="gpt-4o-mini",
        )
        await context.insert_message(result.choices[0].message.content)


app = LLAMPHouse(
    agents=[ChatAgent(
        id="chat", name="Chat Agent",
        description="Chat with GPT", version="0.1.0",
    )],
    data_store=InMemoryDataStore(),
    adapters=[A2AAdapter()],
)
app.ignite(host="127.0.0.1", port=8000)
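
A streaming variant is a small change: pass the standard stream=True flag to the OpenAI SDK, forward each token with send_chunk, and persist the full reply at the end so it lands in the thread history. A sketch reusing the imports and openai_client from the example above:

class StreamingChatAgent(Agent):
    async def run(self, context: Context):
        messages = [
            {"role": m.role, "content": m.text}
            for m in context.messages
        ]
        stream = await openai_client.chat.completions.create(
            messages=messages, model="gpt-4o-mini", stream=True,
        )
        parts = []
        async for chunk in stream:
            if not chunk.choices:
                continue
            delta = chunk.choices[0].delta.content
            if delta:
                context.send_chunk(delta)  # stream each token to the client
                parts.append(delta)
        await context.insert_message("".join(parts))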

Key Concepts

Agent

An agent is a Python class that subclasses Agent and implements a run() method. This is where your logic lives — call LLMs, use tools, delegate to other agents, or do anything you need.

class MyAgent(Agent):
    async def run(self, context: Context):
        # context.messages — the conversation history
        # context.insert_message("...") — send a reply
        # context.send_chunk("...") — stream a token
        # context.call_agent("other-agent", "question") — call another agent
        # context.handover_to_agent("other-agent", "question") — hand off entirely
        # context.get_config("param_name") — read runtime config
        pass

Context

The Context object is passed to every run() call and provides the full toolkit:

Method                                               Description
context.messages                                     Conversation history for the current thread
context.insert_message(text)                         Insert an assistant reply
context.send_chunk(text)                             Stream a text chunk to the client
await context.call_agent(agent_id, message)          Call another agent; returns an async generator of chunks
await context.handover_to_agent(agent_id, message)   Hand off the conversation to another agent
context.get_config(key)                              Read a runtime config parameter
context.submit_tool_outputs(run_id, outputs)         Submit tool call results back to a run
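
Put together, a minimal agent that streams and then persists a reply looks like this (no LLM involved; message objects expose .role and .text as in the earlier examples):

from llamphouse.core import Agent
from llamphouse.core.context import Context

class EchoAgent(Agent):
    async def run(self, context: Context):
        last = context.messages[-1].text if context.messages else ""
        for word in last.split():
            context.send_chunk(word + " ")  # stream word by word
        await context.insert_message(f"You said: {last}")  # persist the reply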

Adapters

Adapters control how clients communicate with your agents:

Adapter               Protocol                Use case
A2AAdapter            A2A (Agent-to-Agent)    Interoperable agent communication
AssistantAPIAdapter   OpenAI Assistants API   OpenAI SDK compatibility

Both can be used simultaneously. If no adapters are specified, AssistantAPIAdapter is used by default.
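
Enabling both looks like this. A sketch, assuming an agent instance as in the Quickstart; the AssistantAPIAdapter import path is a guess, by analogy with the a2a module used earlier:

from llamphouse.core import LLAMPHouse
from llamphouse.core.adapters.a2a import A2AAdapter
from llamphouse.core.adapters.assistant_api import AssistantAPIAdapter  # hypothetical path

app = LLAMPHouse(
    agents=[agent],
    adapters=[A2AAdapter(), AssistantAPIAdapter()],  # serve both protocols at once
)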

Multi-Agent

LLAMPHouse supports multiple agents in a single server. Agents can call each other directly — no HTTP overhead:

# In an orchestrator agent's run() method:

# Option 1: Call another agent and forward its response chunks
async for chunk in await context.call_agent("researcher", "Find info about X"):
    context.send_chunk(chunk)

# Option 2: Hand off the entire conversation
await context.handover_to_agent("specialist", "Handle this request")
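
Wrapped in a full agent, the orchestration pattern reads like this. A sketch: "researcher" is assumed to be another agent registered on the same server.

from llamphouse.core import Agent
from llamphouse.core.context import Context

class OrchestratorAgent(Agent):
    async def run(self, context: Context):
        question = context.messages[-1].text
        # Delegate the question and stream the sub-agent's answer through.
        async for chunk in await context.call_agent("researcher", question):
            context.send_chunk(chunk)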

Configuration

LLAMPHouse constructor

LLAMPHouse(
    agents=[...],                  # List of Agent instances
    adapters=[A2AAdapter()],       # Protocol adapters (default: AssistantAPIAdapter)
    data_store=InMemoryDataStore(),# In-memory (default) or PostgresDataStore
    authenticator=KeyAuth("key"),  # Optional API key auth
    config_store=None,             # Optional runtime config store
    retention_policy=None,         # Optional data retention/purge policy
    exclude_spans=["pattern.*"],   # Optional tracing span exclusions
    compass=True,                  # Enable/disable Compass dashboard (default: True)
)

Environment variables

Variable                      Description                              Default
DATABASE_URL                  Postgres connection string               (in-memory if unset)
REDIS_URL                     Redis URL for queues                     (in-memory if unset)
LLAMPHOUSE_TRACING_ENABLED    Enable OpenTelemetry tracing             true
OTEL_EXPORTER_OTLP_ENDPOINT   OTLP collector endpoint                  (none)
OTEL_SERVICE_NAME             Service name for traces                  llamphouse
CLICKHOUSE_URL                ClickHouse URL for Compass traces view   (none)

Examples

The examples/ directory contains runnable samples for every feature:

Example                  Description
01_HelloWorld            Minimal agent — no LLM needed
02_Chat                  OpenAI-powered conversational agent
03_Streaming             Real-time token streaming with SSE
04_ToolCall              Function calling with tool schemas
06_GeminiStreaming       Streaming with Google Gemini
08_Tracing               OpenTelemetry distributed tracing
09_A2A                   A2A protocol agent
10_A2A_ToolCall          A2A with tool calls
11_AgentHandover         Multi-agent handover
12_CentralOrchestrator   Central orchestrator pattern
13_ConfigStore           Runtime-tunable agent config
14_DistributedWorker     Separate API and worker processes
15_A2A_AIFoundry         A2A with Azure AI Foundry
LangGraph                LangGraph integration

Each example includes a server.py, client.py, and README.md with instructions.


Docker Deployment

A Docker Compose setup is included for production deployments with Postgres, Redis, OpenTelemetry, and ClickHouse:

cd docker
docker compose up -d

This starts:

Service          Port   Purpose
Runtime          8080   Your agent server
Postgres         5432   Persistent data store
Redis            6379   Run queue and event queue
OTel Collector   4318   Trace collection
ClickHouse       8123   Trace storage for Compass

For split-mode deployments (separate API and worker processes), see docker-compose.prod.yml.


Development

Setup

git clone https://github.com/llamp-ai/llamphouse.git
cd llamphouse
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"

Testing

# Run all tests (unit + contract + integration)
python -m pytest tests/ -v

# Postgres-only tests (requires DATABASE_URL)
python -m pytest -m postgres

Database Migrations (Postgres only)

LLAMPHouse uses Alembic for schema migrations:

# Start a local Postgres
docker run --rm -d --name postgres \
  -e POSTGRES_USER=postgres -e POSTGRES_PASSWORD=password \
  -p 5432:5432 postgres
docker exec -it postgres psql -U postgres -c 'CREATE DATABASE llamphouse;'

# Apply migrations
alembic upgrade head

# Create a new migration
alembic revision --autogenerate -m "description"

# Roll back all migrations
alembic downgrade base

Building

python -m build

API Compatibility

LLAMPHouse implements the OpenAI Assistants API v2:

Endpoint                                                                                  Status
Assistants — List, Retrieve                                                               Supported
Assistants — Create, Modify, Delete                                                       By design: agents are defined in code
Threads — Create, Retrieve, Modify, Delete                                                Supported
Messages — Create, List, Retrieve, Modify, Delete                                         Supported
Runs — Create, Create thread & run, List, Retrieve, Modify, Cancel, Submit tool outputs   Supported
Run Steps — List, Retrieve                                                                Supported
Streaming — Message delta, Run step, Assistant stream                                     Supported
Vector Stores                                                                             Not yet implemented

Contributing

Contributions are welcome! If you have a suggestion, please fork the repo and create a pull request, or open an issue with the tag "enhancement".

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request


License

See LICENSE for more information.

Contact

Project Admin: Pieter van der Deen — pieter@stack-wise.co.uk
