LLAMPHouse — Serving Your LLM Apps, Scalable and Reliable
What is LLAMPHouse?
LLAMPHouse is a self-hosted, production-ready server for LLM-powered applications. It exposes an OpenAI-compatible Assistants API and supports the A2A (Agent-to-Agent) protocol — so you can use the standard OpenAI Python SDK or any A2A client to talk to your agents.
> [!NOTE]
> A2A protocol support requires LLAMPHouse v1.2.0 or later. Earlier versions only support the OpenAI Assistants API adapter.
Write your agent logic in plain Python, plug it into LLAMPHouse, and get:
- 🔌 OpenAI-compatible API — drop-in replacement, use the `openai` Python SDK as the client
- 🤝 A2A protocol — interoperable agent-to-agent communication out of the box
- 🌊 Streaming — real-time token streaming with SSE (works with OpenAI, Gemini, Anthropic)
- 🛠️ Tool calls — native support for function calling with automatic tool output handling
- 🔀 Multi-agent — `call_agent()` and `handover_to_agent()` for orchestration and delegation
- 📊 Compass dashboard — built-in dev UI for threads, messages, runs, traces, and agent flow visualization
- 🔍 OpenTelemetry tracing — automatic distributed tracing with ClickHouse storage
- ⚙️ Config store — runtime-tunable agent parameters via a dashboard UI
- 🐘 Pluggable storage — in-memory (default) or Postgres, with Alembic migrations
- 🐳 Docker-ready — single-command deployment with Postgres, Redis, and tracing
Why LLAMPHouse?
Most agent frameworks focus on building agents — LLAMPHouse focuses on serving them. Here's why that matters:
🚀 Scales from dev to production without rewrites
Start with a single Python file and an in-memory store. When you're ready for production, add Postgres, Redis, and distributed workers — same agent code, zero rewrites. LLAMPHouse grows with your project.
📦 Workload scaling, not agent scaling
Traditional setups tie one process to one agent. LLAMPHouse uses a shared agent pool with a run queue — multiple agents share the same infrastructure, and workers pull from a common queue. Scale by adding workers, not by duplicating services per agent.
🔧 Easily extensible and configurable
Swap out any component to fit your use case: data stores, queues, event queues, authentication, adapters, and workers are all pluggable interfaces. Need a custom auth layer? Implement `BaseAuth`. Want Redis queues? Drop in `RedisQueue`. The framework adapts to you, not the other way around.
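As an illustration of the plug-in pattern, a custom authenticator is just a small subclass. This is a minimal sketch only: the `BaseAuth` import path and the `authenticate()` hook are assumptions about the interface, not the documented API.

```python
# Sketch of a custom authenticator. The import path and method name below are
# assumptions made for illustration; check the docs for the real interface.
from llamphouse.core.auth import BaseAuth  # assumed module path

class TokenAllowListAuth(BaseAuth):
    def __init__(self, valid_tokens: set[str]):
        self.valid_tokens = valid_tokens

    def authenticate(self, api_key: str) -> bool:
        # Accept the request only if the presented key is on the allow-list.
        return api_key in self.valid_tokens
```

An instance would then be passed as the `authenticator=` argument of the `LLAMPHouse` constructor (see Configuration below).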
🧩 LLM and framework agnostic
LLAMPHouse doesn't care what happens inside your `run()` method. Use OpenAI, Anthropic, Google Gemini, Azure AI, or any other provider. Build with LangChain, LangGraph, LlamaIndex, CrewAI, or plain API calls — LLAMPHouse serves the result, regardless of what generated it.
🌐 Standards-based interoperability
Expose your agents via the OpenAI Assistants API (works with the `openai` Python SDK out of the box) and the A2A protocol (Google's Agent-to-Agent standard). Your agents are instantly accessible to any compatible client or agent ecosystem.
Quickstart
Requirements: Python 3.10+
1. Install
```shell
pip install llamphouse
```
2. Create your agent
Create a file called `server.py`:

```python
from llamphouse.core import LLAMPHouse, Agent
from llamphouse.core.context import Context
from llamphouse.core.data_stores.in_memory_store import InMemoryDataStore
from llamphouse.core.adapters.a2a import A2AAdapter

class HelloAgent(Agent):
    async def run(self, context: Context):
        await context.insert_message(
            "Hello! I'm a simple agent running on LLAMPHouse."
        )

agent = HelloAgent(
    id="hello-agent",
    name="Hello Agent",
    description="A friendly assistant that answers questions.",
    version="0.1.0",
)

app = LLAMPHouse(
    agents=[agent],
    data_store=InMemoryDataStore(),
    adapters=[A2AAdapter()],
)

app.ignite(host="127.0.0.1", port=8000)
```
3. Run it
```shell
python server.py
```
Your agent is now live at `http://127.0.0.1:8000` with:
- A2A protocol at `/.well-known/agent.json`
- Compass dashboard at `http://127.0.0.1:8000/compass`
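You can sanity-check the A2A endpoint with plain HTTP, since the agent card is a standard JSON discovery document. A quick check using only the Python standard library:

```python
# Fetch the A2A agent card; no SDK required.
import json
import urllib.request

with urllib.request.urlopen("http://127.0.0.1:8000/.well-known/agent.json") as resp:
    card = json.load(resp)

# Field names follow the A2A agent card spec; "name" comes from the Agent above.
print(card["name"])
```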
4. Talk to it
Use any A2A client, the OpenAI Python SDK, or just curl:
```shell
# Create a thread
curl -s -X POST http://127.0.0.1:8000/threads | python3 -m json.tool

# Send a message and create a run
THREAD_ID="<thread_id from above>"
curl -s -X POST "http://127.0.0.1:8000/threads/$THREAD_ID/messages" \
  -H "Content-Type: application/json" \
  -d '{"role": "user", "content": "Hi there!"}'

curl -s -X POST "http://127.0.0.1:8000/threads/$THREAD_ID/runs" \
  -H "Content-Type: application/json" \
  -d '{"assistant_id": "hello-agent"}'
```
Or use the OpenAI SDK as a client:
```python
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8000", api_key="any")

thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id, role="user", content="Hello!"
)
run = client.beta.threads.runs.create(
    thread_id=thread.id, assistant_id="hello-agent"
)
```
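Runs execute asynchronously, so a client typically polls until the run finishes and then reads the newest message. This is standard `openai` Assistants SDK usage; that LLAMPHouse reports the same `queued`/`in_progress` statuses is an assumption. Continuing from the snippet above:

```python
import time

# Poll until the run reaches a terminal state (completed, failed, ...).
while run.status in ("queued", "in_progress"):
    time.sleep(0.5)
    run = client.beta.threads.runs.retrieve(run.id, thread_id=thread.id)

# Messages are listed newest-first; the first one is the agent's reply.
messages = client.beta.threads.messages.list(thread_id=thread.id)
print(messages.data[0].content[0].text.value)
```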
Adding an LLM
Connect to any LLM provider. Here's an example with OpenAI:
```python
from dotenv import load_dotenv
from openai import AsyncOpenAI

from llamphouse.core import LLAMPHouse, Agent
from llamphouse.core.context import Context
from llamphouse.core.data_stores.in_memory_store import InMemoryDataStore
from llamphouse.core.adapters.a2a import A2AAdapter

load_dotenv()
openai_client = AsyncOpenAI()

class ChatAgent(Agent):
    async def run(self, context: Context):
        messages = [
            {"role": m.role, "content": m.text}
            for m in context.messages
        ]
        result = await openai_client.chat.completions.create(
            messages=messages, model="gpt-4o-mini",
        )
        await context.insert_message(result.choices[0].message.content)

app = LLAMPHouse(
    agents=[ChatAgent(
        id="chat", name="Chat Agent",
        description="Chat with GPT", version="0.1.0",
    )],
    data_store=InMemoryDataStore(),
    adapters=[A2AAdapter()],
)

app.ignite(host="127.0.0.1", port=8000)
```
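For streaming, the same agent can forward provider deltas through `context.send_chunk()`. A minimal sketch: `send_chunk()` is the documented LLAMPHouse call, while persisting the full reply with `insert_message()` afterwards is an assumption about how streamed runs should finish.

```python
class StreamingChatAgent(Agent):
    async def run(self, context: Context):
        messages = [
            {"role": m.role, "content": m.text} for m in context.messages
        ]
        # Request a streamed completion and relay each token as it arrives.
        stream = await openai_client.chat.completions.create(
            messages=messages, model="gpt-4o-mini", stream=True,
        )
        reply = ""
        async for event in stream:
            delta = event.choices[0].delta.content if event.choices else None
            if delta:
                reply += delta
                context.send_chunk(delta)   # stream the token to the client
        # Assumption: the full reply is persisted once streaming ends.
        await context.insert_message(reply)
```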
Key Concepts
Agent
An agent is a Python class that subclasses `Agent` and implements a `run()` method. This is where your logic lives — call LLMs, use tools, delegate to other agents, or do anything you need.
```python
class MyAgent(Agent):
    async def run(self, context: Context):
        # context.messages — the conversation history
        # context.insert_message("...") — send a reply
        # context.send_chunk("...") — stream a token
        # context.call_agent("other-agent", "question") — call another agent
        # context.handover_to_agent("other-agent", "question") — hand off entirely
        # context.get_config("param_name") — read runtime config
        pass
```
Context
The `Context` object is passed to every `run()` call and provides the full toolkit:

| Method | Description |
|---|---|
| `context.messages` | Conversation history for the current thread |
| `context.insert_message(text)` | Insert an assistant reply |
| `context.send_chunk(text)` | Stream a text chunk to the client |
| `await context.call_agent(agent_id, message)` | Call another agent, returns an async generator of chunks |
| `await context.handover_to_agent(agent_id, message)` | Hand off the conversation to another agent |
| `context.get_config(key)` | Read a runtime config parameter |
| `context.submit_tool_outputs(run_id, outputs)` | Submit tool call results back to a run |
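As an example, `get_config()` lets a dashboard-tunable parameter steer an agent without a redeploy. A sketch, assuming a hypothetical `"model"` key in the config store and that `get_config()` returns `None` for unset keys:

```python
from llamphouse.core import Agent
from llamphouse.core.context import Context

class TunableAgent(Agent):
    async def run(self, context: Context):
        # "model" is a hypothetical key managed via the Compass dashboard;
        # fall back to a default when it has not been set.
        model = context.get_config("model") or "gpt-4o-mini"
        await context.insert_message(f"Configured to use model: {model}")
```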
Adapters
Adapters control how clients communicate with your agents:
| Adapter | Protocol | Use case |
|---|---|---|
| `A2AAdapter` | A2A (Agent-to-Agent) | Interoperable agent communication |
| `AssistantAPIAdapter` | OpenAI Assistants API | OpenAI SDK compatibility |
Both can be used simultaneously. If no adapters are specified, `AssistantAPIAdapter` is used by default.
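To serve both protocols at once, list both adapters. In this sketch, the `AssistantAPIAdapter` import path is an assumption mirroring the A2A adapter's, and `agent` is the one defined in the Quickstart:

```python
from llamphouse.core import LLAMPHouse
from llamphouse.core.adapters.a2a import A2AAdapter
from llamphouse.core.adapters.assistant_api import AssistantAPIAdapter  # assumed path

app = LLAMPHouse(
    agents=[agent],  # "agent" as defined in the Quickstart above
    adapters=[A2AAdapter(), AssistantAPIAdapter()],  # both protocols served at once
)
```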
Multi-Agent
LLAMPHouse supports multiple agents in a single server. Agents can call each other directly — no HTTP overhead:
```python
# In an orchestrator agent's run() method:

# Option 1: Call another agent and forward its response chunks
async for chunk in await context.call_agent("researcher", "Find info about X"):
    context.send_chunk(chunk)

# Option 2: Hand off the entire conversation
await context.handover_to_agent("specialist", "Handle this request")
```
Configuration
LLAMPHouse constructor
```python
LLAMPHouse(
    agents=[...],                   # List of Agent instances
    adapters=[A2AAdapter()],        # Protocol adapters (default: AssistantAPIAdapter)
    data_store=InMemoryDataStore(), # In-memory (default) or PostgresDataStore
    authenticator=KeyAuth("key"),   # Optional API key auth
    config_store=None,              # Optional runtime config store
    retention_policy=None,          # Optional data retention/purge policy
    exclude_spans=["pattern.*"],    # Optional tracing span exclusions
    compass=True,                   # Enable/disable Compass dashboard (default: True)
)
```
Environment variables
| Variable | Description | Default |
|---|---|---|
| `DATABASE_URL` | Postgres connection string | (in-memory if unset) |
| `REDIS_URL` | Redis URL for queues | (in-memory if unset) |
| `LLAMPHOUSE_TRACING_ENABLED` | Enable OpenTelemetry tracing | `true` |
| `OTEL_EXPORTER_OTLP_ENDPOINT` | OTLP collector endpoint | (none) |
| `OTEL_SERVICE_NAME` | Service name for traces | `llamphouse` |
| `CLICKHOUSE_URL` | ClickHouse URL for Compass traces view | (none) |
Examples
The `examples/` directory contains runnable samples for every feature:
| Example | Description |
|---|---|
| 01_HelloWorld | Minimal agent — no LLM needed |
| 02_Chat | OpenAI-powered conversational agent |
| 03_Streaming | Real-time token streaming with SSE |
| 04_ToolCall | Function calling with tool schemas |
| 06_GeminiStreaming | Streaming with Google Gemini |
| 08_Tracing | OpenTelemetry distributed tracing |
| 09_A2A | A2A protocol agent |
| 10_A2A_ToolCall | A2A with tool calls |
| 11_AgentHandover | Multi-agent handover |
| 12_CentralOrchestrator | Central orchestrator pattern |
| 13_ConfigStore | Runtime-tunable agent config |
| 14_DistributedWorker | Separate API and worker processes |
| 15_A2A_AIFoundry | A2A with Azure AI Foundry |
| LangGraph | LangGraph integration |
Each example includes a `server.py`, `client.py`, and `README.md` with instructions.
Docker Deployment
A Docker Compose setup is included for production deployments with Postgres, Redis, OpenTelemetry, and ClickHouse:
```shell
cd docker
docker compose up -d
```
This starts:
| Service | Port | Purpose |
|---|---|---|
| Runtime | 8080 | Your agent server |
| Postgres | 5432 | Persistent data store |
| Redis | 6379 | Run queue and event queue |
| OTel Collector | 4318 | Trace collection |
| ClickHouse | 8123 | Trace storage for Compass |
For split-mode deployments (separate API and worker processes), see `docker-compose.prod.yml`.
Development
Setup
```shell
git clone https://github.com/llamp-ai/llamphouse.git
cd llamphouse
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
```
Testing
```shell
# Run all tests (unit + contract + integration)
python -m pytest tests/ -v

# Postgres-only tests (requires DATABASE_URL)
python -m pytest -m postgres
```
Database Migrations (Postgres only)
LLAMPHouse uses Alembic for schema migrations:
```shell
# Start a local Postgres
docker run --rm -d --name postgres \
  -e POSTGRES_USER=postgres -e POSTGRES_PASSWORD=password \
  -p 5432:5432 postgres
docker exec -it postgres psql -U postgres -c 'CREATE DATABASE llamphouse;'

# Apply migrations
alembic upgrade head

# Create a new migration
alembic revision --autogenerate -m "description"

# Roll back all migrations
alembic downgrade base
```
Building
```shell
python -m build
```
API Compatibility
LLAMPHouse implements the OpenAI Assistants API v2:
| Endpoint | Status |
|---|---|
| Assistants — List, Retrieve | ✅ |
| Assistants — Create, Modify, Delete | Not supported by design: agents are defined in code |
| Threads — Create, Retrieve, Modify, Delete | ✅ |
| Messages — Create, List, Retrieve, Modify, Delete | ✅ |
| Runs — Create, Create thread & run, List, Retrieve, Modify, Cancel, Submit tool outputs | ✅ |
| Run Steps — List, Retrieve | ✅ |
| Streaming — Message delta, Run step, Assistant stream | ✅ |
| Vector Stores | Not yet implemented |
Contributing
Contributions are welcome! If you have a suggestion, please fork the repo and create a pull request, or open an issue with the tag "enhancement".
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
License
See LICENSE for more information.
Contact
Project Admin: Pieter van der Deen — pieter@stack-wise.co.uk