An agentic concierge that spawns ephemeral specialist teams to tackle tasks — capability-based routing, multi-pack task forces, and native tool-calling on local or cloud LLM endpoints.
agentic-concierge
A quality-first agent orchestration framework for local LLM inference.
- Router + Supervisor decomposes tasks into capabilities, recruits the right specialist packs, and runs them.
- Specialist packs are modular and composable: engineering, research, enterprise research — or add your own.
- Local-first: Ollama is the default and primary backend; any OpenAI-compatible server works via config.
- Extensible via MCP: connect GitHub, Confluence, Jira, filesystem, and other tool servers with a single config entry — no custom Python required.
- Observable: structured runlogs, persistent cross-run index, real-time SSE streaming, OpenTelemetry traces.
Key features
| Feature | Details |
|---|---|
| Specialist packs | Engineering (shell, file I/O, test, deploy-propose-only), Research (web search, fetch, citations), Enterprise Research (GitHub/Confluence/Jira via MCP, cross-run memory search) |
| Task decomposition | Prompt → capability IDs → recruit the right pack(s) automatically |
| Task forces | Multiple packs run sequentially (with context handoff) or in parallel (asyncio.gather) for a single task |
| MCP tool servers | stdio or SSE MCP servers attached per specialist via config; tools merged transparently |
| Cloud fallback | Local model tried first; cloud model used when local fails a quality bar (no tool calls, malformed args) |
| Podman isolation | Optional: wrap any pack with ContainerisedSpecialistPack by setting container_image in config |
| Semantic run index | Every run is indexed; past runs are searchable by keyword or embedding similarity (concierge logs search) |
| Real-time streaming | POST /run/stream streams all run events as Server-Sent Events |
| Run status | GET /runs/{run_id}/status returns running / completed without reading the full runlog |
| OpenTelemetry | Optional [otel] dep; fabric.execute_task, fabric.llm_call, fabric.tool_call spans |
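The cloud-fallback quality bar mentioned above can be sketched as a simple check over the local model's response. This is an illustrative sketch of the two documented triggers ("no_tool_calls" and "malformed_args"), assuming an OpenAI-style response shape with `tool_calls[].function.arguments`; it is not the library's actual implementation.

```python
import json

def passes_quality_bar(response: dict) -> bool:
    """Return True if a local-model response clears the quality bar.

    Illustrative sketch of the two documented fallback triggers:
    the model produced no tool calls, or a tool call's arguments
    are not valid JSON. The real policy is set via cloud_fallback.policy.
    """
    tool_calls = response.get("tool_calls") or []
    if not tool_calls:
        return False  # "no_tool_calls" trigger
    for call in tool_calls:
        try:
            json.loads(call["function"]["arguments"])
        except (KeyError, TypeError, json.JSONDecodeError):
            return False  # "malformed_args" trigger
    return True
```

When the check fails, the configured cloud model is tried instead.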
Installation
Quick install — Linux binary (recommended for end users)
curl -fsSL https://raw.githubusercontent.com/ausmarton/agentic-concierge/main/install.sh | sh
Downloads a static musl binary (~5 MB) to ~/.local/bin/concierge.
Supports x86_64 and aarch64 Linux. No Python, pip, or package manager required.
On first run the launcher:
- Detects or downloads Python 3.12 via uv
- Creates a managed venv at ~/.local/share/agentic-concierge/venv/
- Installs agentic-concierge from PyPI
- Exec-replaces itself with the Python binary (correct PID, transparent signal forwarding)
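The exec-replace step is what keeps the PID and signal handling transparent: `os.execv` swaps the process image in place rather than spawning a child. A minimal sketch, assuming a hypothetical `-m agentic_concierge` entry point (the real launcher's argv may differ):

```python
import os

def build_exec_argv(python_bin: str, args: list[str]) -> list[str]:
    # Hypothetical argv: invoke the package as a module, passing
    # the launcher's arguments through unchanged.
    return [python_bin, "-m", "agentic_concierge"] + args

def exec_replace(python_bin: str, args: list[str]) -> None:
    # os.execv replaces the current process image in place, so the PID
    # is preserved and signals sent to the launcher reach Python directly.
    os.execv(python_bin, build_exec_argv(python_bin, args))
```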
Keep the launcher up to date:
concierge --self-update
Install to a custom directory (e.g. for system-wide install):
CONCIERGE_INSTALL_DIR=/usr/local/bin \
curl -fsSL https://raw.githubusercontent.com/ausmarton/agentic-concierge/main/install.sh | sh
From PyPI (developers / non-Linux)
pip install agentic-concierge
Install optional extras:
pip install "agentic-concierge[otel]" # OpenTelemetry tracing
pip install "agentic-concierge[mcp]" # MCP tool server support
Docker (batteries-included: Ollama + agentic-concierge)
# Clone the repo for the config and docker-compose file
git clone https://github.com/ausmarton/agentic-concierge.git
cd agentic-concierge
# Start Ollama + agentic-concierge (pulls qwen2.5:7b on first run)
docker compose up -d
# Run a task
curl -X POST http://localhost:8080/run \
-H "Content-Type: application/json" \
-d '{"prompt": "Create a file hello.txt with content Hello World", "pack": "engineering"}'
The docker-compose.yml includes an Ollama service with a health check, an agentic-concierge service, and a one-shot model-pull service that exits after pulling qwen2.5:7b.
To use a different model, edit examples/ollama.json and re-mount it via CONCIERGE_CONFIG_PATH.
From source
git clone https://github.com/ausmarton/agentic-concierge.git
cd agentic-concierge
python3 -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
Quick start (local Ollama)
1. System dependencies
sudo dnf install -y python3 python3-devel gcc gcc-c++ make cmake git ripgrep jq
2. Install and start Ollama
# Install (pick one)
curl -fsSL https://ollama.com/install.sh | sh # official script
# OR: sudo dnf install -y ollama # Fedora package
# Start (if not already running as a service)
ollama serve
Pull a model (agentic-concierge auto-pulls qwen2.5:7b if no chat model is found, but pre-pulling is faster):
ollama pull qwen2.5:7b # fast model (default)
ollama pull qwen2.5:14b # quality model (optional)
Any other model works — set CONCIERGE_CONFIG_PATH to point at a config with your preferred model name.
3. Install agentic-concierge
pip install agentic-concierge
# or from source:
# cd /path/to/agentic-concierge && pip install -e .
4. Run
# Quick smoke test — creates a file and lists the workspace
concierge run "Create a file hello.txt with content Hello World, then list the workspace." --pack engineering
Stream events as they happen with --stream (shows tool calls, LLM steps, results in real-time):
concierge run "Build a Flask /health endpoint with a test" --pack engineering --stream
You should see a run directory path and JSON with "action": "final". Check:
- .concierge/runs/<run_id>/workspace/hello.txt — artifact
- .concierge/runs/<run_id>/runlog.jsonl — structured event log (tool calls, LLM responses, etc.)
CLI reference
concierge run PROMPT [OPTIONS]
Run a task using a specialist pack.
Options:
--pack TEXT Specialist ID (e.g. engineering, research).
Omit to let the router pick based on capabilities.
--model-key TEXT Which model entry to use from config [default: quality]
--network-allowed / --no-network-allowed
Allow web tools (web_search, fetch_url) [default: enabled]
--stream / -s Stream run events to the terminal as they happen.
--verbose Enable DEBUG logging
concierge serve [OPTIONS]
Start the HTTP API server.
Options:
--host TEXT [default: 127.0.0.1]
--port INT [default: 8787]
concierge logs list [OPTIONS]
List past runs (most recent first).
Options:
--workspace PATH [default: .concierge]
--limit N [default: 20]
concierge logs show RUN_ID [OPTIONS]
Pretty-print runlog events for a run.
Options:
--workspace PATH
--kinds TEXT Comma-separated event kinds to filter
(e.g. tool_call,tool_result)
concierge logs search QUERY [OPTIONS]
Search the cross-run index.
Uses semantic similarity when embedding_model is configured;
falls back to keyword/substring matching otherwise.
Options:
--workspace PATH
--limit N [default: 10]
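The keyword fallback behaves like a case-insensitive substring filter over indexed run summaries. A minimal sketch (the index's actual record shape is an assumption; a `summary` field is used here for illustration):

```python
def keyword_search(runs: list[dict], query: str, limit: int = 10) -> list[dict]:
    # Case-insensitive substring match over run summaries -- the
    # fallback used when no embedding_model is configured.
    q = query.lower()
    hits = [r for r in runs if q in r.get("summary", "").lower()]
    return hits[:limit]
```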
HTTP API
Start the server:
concierge serve
# or: uvicorn agentic_concierge.interfaces.http_api:app --host 0.0.0.0 --port 8787
GET /health
{"ok": true}
POST /run — blocking run
curl -X POST http://127.0.0.1:8787/run \
-H "Content-Type: application/json" \
-d '{"prompt": "Create ok.txt with content OK", "pack": "engineering"}'
Request body:
{
"prompt": "your task",
"pack": "engineering", // optional; omit to auto-route
"model_key": "quality", // optional; default "quality"
"network_allowed": true // optional; default true
}
Response: the finish_task payload merged with a _meta field containing run_id, specialist_ids, workspace, model, etc.
POST /run/stream — Server-Sent Events
Streams run events in real-time as they happen:
curl -N -X POST http://127.0.0.1:8787/run/stream \
-H "Content-Type: application/json" \
-d '{"prompt": "Create ok.txt with content OK", "pack": "engineering"}'
Each event is a data: <json>\n\n SSE line. Event kinds:
| Kind | When |
|---|---|
| recruitment | Specialist(s) selected |
| llm_request | Before each LLM call |
| llm_response | After each LLM call |
| tool_call | Before each tool execution |
| tool_result | Successful tool result |
| tool_error | Tool raised an exception |
| security_event | Sandbox violation (path escape, disallowed command) |
| cloud_fallback | Local model fell back to cloud |
| pack_start | A specialist pack started (task forces) |
| run_complete | Run finished successfully |
| _run_done_ | Terminal sentinel — stream ends |
| _run_error_ | Terminal sentinel — run failed |
Rate limiting
When CONCIERGE_RATE_LIMIT is set to a positive integer, the API enforces a per-IP sliding-window rate limit (requests per minute). GET /health is always exempt. Excess requests receive 429 Too Many Requests with a Retry-After header:
export CONCIERGE_RATE_LIMIT=60 # 60 requests per minute per IP (default: no limit)
concierge serve
API key authentication
When CONCIERGE_API_KEY is set, every endpoint except GET /health requires an Authorization: Bearer <key> header:
export CONCIERGE_API_KEY="your-strong-secret"
concierge serve
# Include the header in every request:
curl -X POST http://127.0.0.1:8787/run \
-H "Content-Type: application/json" \
-H "Authorization: Bearer your-strong-secret" \
-d '{"prompt": "hello"}'
Leave CONCIERGE_API_KEY unset (default) to disable authentication — suitable for local use. Uses constant-time comparison (hmac.compare_digest) to prevent timing attacks.
GET /runs/{run_id}/status
curl http://127.0.0.1:8787/runs/abc123.../status
{"status": "completed", "run_id": "abc123...", "specialist_ids": ["engineering"], "task_force_mode": "sequential"}
Status values: running, completed. Returns HTTP 404 if the run ID is not found.
Configuration
Set CONCIERGE_CONFIG_PATH to a JSON or YAML file to override the defaults.
export CONCIERGE_CONFIG_PATH=/path/to/your/config.json
The default config uses Ollama at localhost:11434 with qwen2.5:7b (fast) and qwen2.5:14b (quality). Copy examples/ollama.json as a starting point.
Key config fields
{
"models": {
"fast": {"base_url": "http://localhost:11434/v1", "model": "qwen2.5:7b", "temperature": 0.1, "max_tokens": 1200},
"quality": {"base_url": "http://localhost:11434/v1", "model": "qwen2.5:14b", "temperature": 0.1, "max_tokens": 2400}
},
"specialists": {
"engineering": {
"description": "Plan → implement → test → review → iterate.",
"keywords": ["build", "implement", "code", "python"],
"workflow": "engineering",
"capabilities": ["code_execution", "file_io", "software_testing"]
}
},
"routing_model_key": "fast", // model used for LLM-based routing
"task_force_mode": "sequential", // "sequential" (default) or "parallel"
"local_llm_ensure_available": true, // start Ollama if unreachable
"local_llm_start_cmd": ["ollama", "serve"],
"auto_pull_if_missing": true, // pull qwen2.5:7b when no model exists
"auto_pull_model": "qwen2.5:7b",
"run_index": {
"embedding_model": "nomic-embed-text" // enables semantic search; omit for keyword-only
},
"cloud_fallback": {
"model_key": "cloud_quality", // must exist in "models"
"policy": "no_tool_calls" // trigger: "no_tool_calls" | "malformed_args" | "always"
},
"telemetry": {
"enabled": true,
"exporter": "otlp",
"otlp_endpoint": "http://localhost:4317"
}
}
Using a non-Ollama backend
Any OpenAI-compatible endpoint works. Set backend: "generic" for cloud/vLLM/LiteLLM servers (skips Ollama-specific 400 retry logic):
"models": {
"quality": {
"base_url": "https://api.openai.com/v1",
"model": "gpt-4o",
"api_key": "sk-...",
"backend": "generic"
}
}
Set local_llm_ensure_available: false when you manage the server yourself (CI, cloud deployments, etc.).
MCP tool servers
Attach any MCP server to a specialist pack — no Python code required:
"specialists": {
"engineering": {
"description": "Engineering with GitHub access.",
"workflow": "engineering",
"capabilities": ["code_execution", "file_io", "github_search"],
"mcp_servers": [
{
"name": "github",
"transport": "stdio",
"command": "npx",
"args": ["--yes", "--", "@modelcontextprotocol/server-github"],
"env": {"GITHUB_TOKEN": "${GITHUB_TOKEN}"}
}
]
}
}
Tools are auto-discovered at startup and prefixed mcp__github__<tool>. See docs/MCP_INTEGRATIONS.md for GitHub, Confluence, Jira, and filesystem examples.
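The namespacing scheme can be sketched as a dictionary merge; the function name here is illustrative, but the `mcp__<server>__<tool>` prefix matches the convention described above:

```python
def merge_mcp_tools(native: dict, server_name: str, mcp_tools: dict) -> dict:
    """Merge MCP-discovered tools into a pack's tool map, namespaced
    as mcp__<server>__<tool> so they never clash with native tools."""
    merged = dict(native)
    for name, fn in mcp_tools.items():
        merged[f"mcp__{server_name}__{name}"] = fn
    return merged
```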
Parallel task forces
Run multiple specialists concurrently for independent sub-tasks:
"task_force_mode": "parallel"
In sequential mode (default) each pack receives the previous pack's output as context. In parallel mode all packs run concurrently via asyncio.gather and results are merged.
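The two modes can be sketched with plain asyncio; pack callables and the context-handoff shape are simplified assumptions:

```python
import asyncio

async def run_sequential(packs, prompt):
    # Each pack receives the previous pack's output as context.
    context, results = "", []
    for pack in packs:
        out = await pack(prompt, context)
        results.append(out)
        context = out
    return results

async def run_parallel(packs, prompt):
    # Independent sub-tasks: all packs run concurrently and the
    # results are collected in pack order.
    return await asyncio.gather(*(pack(prompt, "") for pack in packs))
```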
Podman container isolation
"specialists": {
"engineering": {
"container_image": "python:3.12-slim"
}
}
All shell tool calls execute inside an isolated Podman container with the workspace mounted at /workspace. Requires Podman installed and the image available locally.
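The shape of such an invocation might look like the following sketch, which builds an ephemeral-container command with the workspace bind-mounted at /workspace. The exact flags are an assumption, not the library's implementation:

```python
def podman_shell_argv(image: str, workspace: str, cmd: str) -> list[str]:
    # --rm: ephemeral container; -v: bind-mount the run workspace;
    # -w: run the shell command from /workspace inside the container.
    return [
        "podman", "run", "--rm",
        "-v", f"{workspace}:/workspace",
        "-w", "/workspace",
        image, "sh", "-c", cmd,
    ]
```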
Specialist packs
Built-in packs
| ID | Description | Tools |
|---|---|---|
| engineering | Plan → implement → test → review | shell, read_file, write_file, list_files, finish_task |
| research | Scope → search → screen → extract → synthesize | web_search\*, fetch_url\*, read_file, write_file, list_files, finish_task |
| enterprise_research | GitHub/Confluence/Jira search + cross-run memory | All research tools + cross_run_search + any configured MCP tools |

\* Requires network_allowed: true (default).
Adding a custom pack
Option A — config-driven (no core change required):
# mypackage/packs.py
from agentic_concierge.infrastructure.specialists.base import BaseSpecialistPack
from agentic_concierge.infrastructure.specialists.tool_defs import make_tool_def, make_finish_tool_def
def build_my_pack(workspace_path: str, network_allowed: bool):
tools = {
"my_tool": lambda args: {"result": "..."},
}
tool_definitions = [
make_tool_def("my_tool", "Does something useful.", {"type": "object", "properties": {...}, "required": [...]}),
make_finish_tool_def(),
]
return BaseSpecialistPack(
specialist_id="my_specialist",
system_prompt="You are a ...",
tool_map=tools,
tool_definitions=tool_definitions,
workspace_path=workspace_path,
)
"specialists": {
"my_specialist": {
"description": "My custom specialist.",
"workflow": "my_specialist",
"builder": "mypackage.packs:build_my_pack",
"capabilities": ["my_capability"]
}
}
Option B — built-in: add your pack factory to infrastructure/specialists/, register in _DEFAULT_BUILDERS in registry.py, and add an entry to DEFAULT_CONFIG. See docs/ARCHITECTURE.md §5 for the full extension guide.
Runlog
Every run produces .concierge/runs/<run_id>/runlog.jsonl. Each line:
{"ts": 1708800000.123, "kind": "tool_call", "step": "step_0", "payload": {"tool": "shell", "args": {"cmd": "ls"}}}
Inspect with:
concierge logs show <run_id>
concierge logs show <run_id> --kinds tool_call,tool_result
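A kind filter over a runlog is a few lines of JSONL processing. This sketch is equivalent in spirit to `--kinds`, not the CLI's actual code:

```python
import json

def filter_runlog(lines: list[str], kinds: set[str]) -> list[dict]:
    # Parse each non-empty JSONL line and keep only events whose
    # "kind" is in the requested set.
    events = (json.loads(line) for line in lines if line.strip())
    return [e for e in events if e["kind"] in kinds]
```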
Testing
Fast CI (no LLM required, ~4 seconds, 1377+ tests):
pip install -e ".[dev]"
pytest tests/ -k "not real_llm and not real_mcp and not podman" -q
Full validation (requires Ollama + a pulled model):
python scripts/validate_full.py
Ensures the LLM is reachable (starts it if needed via config), then runs all tests including real-LLM E2E tests. Use ollama pull qwen2.5:7b or set CONCIERGE_CONFIG_PATH to a config with a model you have.
Single E2E check:
python scripts/verify_working_real.py
Runs one engineering task end-to-end and asserts that tool_call/tool_result events exist and workspace artifacts are created.
Test markers:
| Marker | Meaning |
|---|---|
| real_llm | Requires a live Ollama instance |
| real_mcp | Requires npx and an MCP server package |
| podman | Requires Podman and a pulled container image |
Development
See CONTRIBUTING.md for the full contributor guide.
# Install dev dependencies (includes mcp, pytest, pytest-asyncio)
pip install -e ".[dev]"
# Optional: OpenTelemetry
pip install -e ".[otel]"
# Run fast tests
pytest tests/ -k "not real_llm and not real_mcp and not podman" -q
# Lint
ruff check src/ tests/
Documentation
| Document | Purpose |
|---|---|
| docs/ARCHITECTURE.md | Layer design, component map, data flow, extension points |
| docs/DECISIONS.md | Architecture Decision Records (ADR-001 to ADR-011) |
| docs/VISION.md | Long-term vision, principles, use cases |
| docs/PLAN.md | Phases 1–8: deliverables and verification gates |
| docs/STATE.md | Current phase, CI status, resumability guide |
| docs/BACKLOG.md | Prioritised work items; what to do next |
| docs/CAPABILITIES.md | Capability model and routing rules |
| docs/MCP_INTEGRATIONS.md | MCP server setup (GitHub, Confluence, Jira, filesystem) |
| docs/BACKENDS.md | Using backends other than Ollama |
| REQUIREMENTS.md | MVP functional requirements and validation |
License