
An agentic concierge that spawns ephemeral specialist teams to tackle tasks — capability-based routing, multi-pack task forces, and native tool-calling on local or cloud LLM endpoints.


agentic-concierge

A quality-first agent orchestration framework for local LLM inference.

  • A Router + Supervisor pipeline decomposes tasks into capabilities, recruits the right specialist packs, and runs them.
  • Specialist packs are modular and composable: engineering, research, enterprise research — or add your own.
  • Local-first: Ollama is the default and primary backend; any OpenAI-compatible server works via config.
  • Extensible via MCP: connect GitHub, Confluence, Jira, filesystem, and other tool servers with a single config entry — no custom Python required.
  • Observable: structured runlogs, persistent cross-run index, real-time SSE streaming, OpenTelemetry traces.

Key features

  • Specialist packs: Engineering (shell, file I/O, test, deploy-propose-only), Research (web search, fetch, citations), Enterprise Research (GitHub/Confluence/Jira via MCP, cross-run memory search)
  • Task decomposition: Prompt → capability IDs → recruit the right pack(s) automatically
  • Task forces: Multiple packs run sequentially (with context handoff) or in parallel (asyncio.gather) for a single task
  • MCP tool servers: stdio or SSE MCP servers attached per specialist via config; tools merged transparently
  • Cloud fallback: Local model tried first; cloud model used when the local model fails a quality bar (no tool calls, malformed args)
  • Podman isolation: Optional; wrap any pack with ContainerisedSpecialistPack by setting container_image in config
  • Semantic run index: Every run is indexed; past runs are searchable by keyword or embedding similarity (concierge logs search)
  • Real-time streaming: POST /run/stream streams all run events as Server-Sent Events
  • Run status: GET /runs/{run_id}/status returns running / completed without reading the full runlog
  • OpenTelemetry: Optional [otel] dep; fabric.execute_task, fabric.llm_call, fabric.tool_call spans

Installation

Quick install — Linux binary (recommended for end users)

curl -fsSL https://raw.githubusercontent.com/ausmarton/agentic-concierge/main/install.sh | sh

Downloads a static musl binary (~5 MB) to ~/.local/bin/concierge. Supports x86_64 and aarch64 Linux. No Python, pip, or package manager required.

On first run the launcher:

  1. Detects or downloads Python 3.12 via uv
  2. Creates a managed venv at ~/.local/share/agentic-concierge/venv/
  3. Installs agentic-concierge from PyPI
  4. Exec-replaces itself with the Python binary (correct PID, transparent signal forwarding)

Keep the launcher up to date:

concierge --self-update

Install to a custom directory (e.g. for system-wide install):

CONCIERGE_INSTALL_DIR=/usr/local/bin \
  curl -fsSL https://raw.githubusercontent.com/ausmarton/agentic-concierge/main/install.sh | sh

From PyPI (developers / non-Linux)

pip install agentic-concierge

Install optional extras:

pip install "agentic-concierge[otel]"   # OpenTelemetry tracing
pip install "agentic-concierge[mcp]"    # MCP tool server support

Docker (batteries-included: Ollama + agentic-concierge)

# Clone the repo for the config and docker-compose file
git clone https://github.com/ausmarton/agentic-concierge.git
cd agentic-concierge

# Start Ollama + agentic-concierge (pulls qwen2.5:7b on first run)
docker compose up -d

# Run a task
curl -X POST http://localhost:8080/run \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Create a file hello.txt with content Hello World", "pack": "engineering"}'

The docker-compose.yml includes an Ollama service with a health check, an agentic-concierge service, and a one-shot model-pull service that exits after pulling qwen2.5:7b.

To use a different model, edit examples/ollama.json and re-mount it via CONCIERGE_CONFIG_PATH.

From source

git clone https://github.com/ausmarton/agentic-concierge.git
cd agentic-concierge
python3 -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"

Quick start (local Ollama)

1. System dependencies

sudo dnf install -y python3 python3-devel gcc gcc-c++ make cmake git ripgrep jq

2. Install and start Ollama

# Install (pick one)
curl -fsSL https://ollama.com/install.sh | sh          # official script
# OR: sudo dnf install -y ollama                       # Fedora package

# Start (if not already running as a service)
ollama serve

Pull a model (agentic-concierge auto-pulls qwen2.5:7b if no chat model is found, but pre-pulling is faster):

ollama pull qwen2.5:7b     # fast model (default)
ollama pull qwen2.5:14b    # quality model (optional)

Any other model works — set CONCIERGE_CONFIG_PATH to point at a config with your preferred model name.
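
For example, a minimal config that points both model entries at a model you already have pulled could be generated like this (a sketch; the model name llama3.1:8b and the config path are illustrative, and the field names follow the "Key config fields" section below):

# sketch: write a minimal override config and point CONCIERGE_CONFIG_PATH at it
import json
import pathlib

config = {
    "models": {
        "fast":    {"base_url": "http://localhost:11434/v1", "model": "llama3.1:8b", "temperature": 0.1, "max_tokens": 1200},
        "quality": {"base_url": "http://localhost:11434/v1", "model": "llama3.1:8b", "temperature": 0.1, "max_tokens": 2400},
    }
}

path = pathlib.Path.home() / ".config" / "agentic-concierge" / "config.json"
path.parent.mkdir(parents=True, exist_ok=True)
path.write_text(json.dumps(config, indent=2))

# then, in your shell: export CONCIERGE_CONFIG_PATH=~/.config/agentic-concierge/config.json
print(f"export CONCIERGE_CONFIG_PATH={path}")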

3. Install agentic-concierge

pip install agentic-concierge
# or from source:
# cd /path/to/agentic-concierge && pip install -e .

4. Run

# Quick smoke test — creates a file and lists the workspace
concierge run "Create a file hello.txt with content Hello World, then list the workspace." --pack engineering

Stream events as they happen with --stream (shows tool calls, LLM steps, and results in real time):

concierge run "Build a Flask /health endpoint with a test" --pack engineering --stream

You should see a run directory path and JSON with "action": "final". Check:

  • .concierge/runs/<run_id>/workspace/hello.txt — artifact
  • .concierge/runs/<run_id>/runlog.jsonl — structured event log (tool calls, LLM responses, etc.)

CLI reference

concierge run PROMPT [OPTIONS]

  Run a task using a specialist pack.

  Options:
    --pack TEXT              Specialist ID (e.g. engineering, research).
                             Omit to let the router pick based on capabilities.
    --model-key TEXT         Which model entry to use from config [default: quality]
    --network-allowed / --no-network-allowed
                             Allow web tools (web_search, fetch_url) [default: enabled]
    --stream / -s            Stream run events to the terminal as they happen.
    --verbose                Enable DEBUG logging

concierge serve [OPTIONS]

  Start the HTTP API server.

  Options:
    --host TEXT  [default: 127.0.0.1]
    --port INT   [default: 8787]

concierge logs list [OPTIONS]

  List past runs (most recent first).

  Options:
    --workspace PATH   [default: .concierge]
    --limit N          [default: 20]

concierge logs show RUN_ID [OPTIONS]

  Pretty-print runlog events for a run.

  Options:
    --workspace PATH
    --kinds TEXT   Comma-separated event kinds to filter
                   (e.g. tool_call,tool_result)

concierge logs search QUERY [OPTIONS]

  Search the cross-run index.
  Uses semantic similarity when embedding_model is configured;
  falls back to keyword/substring matching otherwise.

  Options:
    --workspace PATH
    --limit N          [default: 10]

HTTP API

Start the server:

concierge serve
# or: uvicorn agentic_concierge.interfaces.http_api:app --host 0.0.0.0 --port 8787

GET /health

{"ok": true}

POST /run — blocking run

curl -X POST http://127.0.0.1:8787/run \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Create ok.txt with content OK", "pack": "engineering"}'

Request body:

{
  "prompt": "your task",
  "pack": "engineering",       // optional; omit to auto-route
  "model_key": "quality",      // optional; default "quality"
  "network_allowed": true      // optional; default true
}

Response: the finish_task payload merged with a _meta field containing run_id, specialist_ids, workspace, model, etc.
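
The same call from Python, using only the standard library (a sketch; the body fields mirror the request shown above, and the host/port are the concierge serve defaults):

# sketch: call the blocking /run endpoint and read the merged result
import json
import urllib.request

body = {
    "prompt": "Create ok.txt with content OK",
    "pack": "engineering",        # omit to auto-route
    "model_key": "quality",
    "network_allowed": True,
}
request = urllib.request.Request(
    "http://127.0.0.1:8787/run",
    data=json.dumps(body).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(request) as response:
    result = json.load(response)

print(result["_meta"]["run_id"])   # run metadata is merged in under _meta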

POST /run/stream — Server-Sent Events

Streams run events as they happen:

curl -N -X POST http://127.0.0.1:8787/run/stream \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Create ok.txt with content OK", "pack": "engineering"}'

Each event is a data: <json>\n\n SSE line. Event kinds:

  • recruitment: Specialist(s) selected
  • llm_request: Before each LLM call
  • llm_response: After each LLM call
  • tool_call: Before each tool execution
  • tool_result: Successful tool result
  • tool_error: Tool raised an exception
  • security_event: Sandbox violation (path escape, disallowed command)
  • cloud_fallback: Local model fell back to cloud
  • pack_start: A specialist pack started (task forces)
  • run_complete: Run finished successfully
  • _run_done_: Terminal sentinel — stream ends
  • _run_error_: Terminal sentinel — run failed
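
A minimal Python consumer for the stream (a sketch using only the standard library; it assumes the default host/port, the data: <json> framing above, and that each event carries kind and payload fields as in the runlog format described later):

# sketch: stream run events and stop at the terminal sentinel
import json
import urllib.request

body = {"prompt": "Create ok.txt with content OK", "pack": "engineering"}
request = urllib.request.Request(
    "http://127.0.0.1:8787/run/stream",
    data=json.dumps(body).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(request) as response:
    for raw in response:                     # the response yields one line per iteration
        line = raw.decode().strip()
        if not line.startswith("data: "):
            continue                         # skip blank separator lines
        event = json.loads(line[len("data: "):])
        print(event.get("kind"), event.get("payload"))
        if event.get("kind") in ("_run_done_", "_run_error_"):
            break                            # terminal sentinels end the stream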

Rate limiting

When CONCIERGE_RATE_LIMIT is set to a positive integer, the API enforces a per-IP sliding-window rate limit (requests per minute). GET /health is always exempt. Excess requests receive 429 Too Many Requests with a Retry-After header:

export CONCIERGE_RATE_LIMIT=60   # 60 requests per minute per IP (default: no limit)
concierge serve
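
Clients can honour the Retry-After header when the limit is hit. A minimal retry loop (a sketch, standard library only):

# sketch: retry a /run request when the server answers 429 Too Many Requests
import json
import time
import urllib.error
import urllib.request

def post_run(body: dict) -> dict:
    request = urllib.request.Request(
        "http://127.0.0.1:8787/run",
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    while True:
        try:
            with urllib.request.urlopen(request) as response:
                return json.load(response)
        except urllib.error.HTTPError as err:
            if err.code != 429:
                raise
            time.sleep(int(err.headers.get("Retry-After", "1")))   # back off, then retry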

API key authentication

When CONCIERGE_API_KEY is set, every endpoint except GET /health requires an Authorization: Bearer <key> header:

export CONCIERGE_API_KEY="your-strong-secret"
concierge serve

# Include the header in every request:
curl -X POST http://127.0.0.1:8787/run \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-strong-secret" \
  -d '{"prompt": "hello"}'

Leave CONCIERGE_API_KEY unset (default) to disable authentication — suitable for local use. Uses constant-time comparison (hmac.compare_digest) to prevent timing attacks.

GET /runs/{run_id}/status

curl http://127.0.0.1:8787/runs/abc123.../status
{"status": "completed", "run_id": "abc123...", "specialist_ids": ["engineering"], "task_force_mode": "sequential"}

Status values: running, completed. Returns HTTP 404 if the run ID is not found.
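
A small polling helper (a sketch, standard library only; run_id comes from the _meta field of a /run response or from the run directory name):

# sketch: poll the status endpoint until the run leaves the "running" state
import json
import time
import urllib.request

def wait_for_run(run_id: str, base_url: str = "http://127.0.0.1:8787") -> dict:
    while True:
        with urllib.request.urlopen(f"{base_url}/runs/{run_id}/status") as response:
            status = json.load(response)
        if status["status"] != "running":
            return status                     # "completed", per the status values above
        time.sleep(1.0)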


Configuration

Set CONCIERGE_CONFIG_PATH to a JSON or YAML file to override the defaults.

export CONCIERGE_CONFIG_PATH=/path/to/your/config.json

The default config uses Ollama at localhost:11434 with qwen2.5:7b (fast) and qwen2.5:14b (quality). Copy examples/ollama.json as a starting point.

Key config fields

{
  "models": {
    "fast":    {"base_url": "http://localhost:11434/v1", "model": "qwen2.5:7b",  "temperature": 0.1, "max_tokens": 1200},
    "quality": {"base_url": "http://localhost:11434/v1", "model": "qwen2.5:14b", "temperature": 0.1, "max_tokens": 2400}
  },
  "specialists": {
    "engineering": {
      "description": "Plan → implement → test → review → iterate.",
      "keywords":    ["build", "implement", "code", "python"],
      "workflow":    "engineering",
      "capabilities": ["code_execution", "file_io", "software_testing"]
    }
  },

  "routing_model_key": "fast",         // model used for LLM-based routing
  "task_force_mode": "sequential",     // "sequential" (default) or "parallel"

  "local_llm_ensure_available": true,  // start Ollama if unreachable
  "local_llm_start_cmd": ["ollama", "serve"],
  "auto_pull_if_missing": true,        // pull qwen2.5:7b when no model exists
  "auto_pull_model": "qwen2.5:7b",

  "run_index": {
    "embedding_model": "nomic-embed-text"   // enables semantic search; omit for keyword-only
  },

  "cloud_fallback": {
    "model_key": "cloud_quality",           // must exist in "models"
    "policy": "no_tool_calls"               // trigger: "no_tool_calls" | "malformed_args" | "always"
  },

  "telemetry": {
    "enabled": true,
    "exporter": "otlp",
    "otlp_endpoint": "http://localhost:4317"
  }
}

Using a non-Ollama backend

Any OpenAI-compatible endpoint works. Set backend: "generic" for cloud/vLLM/LiteLLM servers (skips Ollama-specific 400 retry logic):

"models": {
  "quality": {
    "base_url": "https://api.openai.com/v1",
    "model": "gpt-4o",
    "api_key": "sk-...",
    "backend": "generic"
  }
}

Set local_llm_ensure_available: false when you manage the server yourself (CI, cloud deployments, etc.).

MCP tool servers

Attach any MCP server to a specialist pack — no Python code required:

"specialists": {
  "engineering": {
    "description": "Engineering with GitHub access.",
    "workflow": "engineering",
    "capabilities": ["code_execution", "file_io", "github_search"],
    "mcp_servers": [
      {
        "name": "github",
        "transport": "stdio",
        "command": "npx",
        "args": ["--yes", "--", "@modelcontextprotocol/server-github"],
        "env": {"GITHUB_TOKEN": "${GITHUB_TOKEN}"}
      }
    ]
  }
}

Tools are auto-discovered at startup and prefixed mcp__github__<tool>. See docs/MCP_INTEGRATIONS.md for GitHub, Confluence, Jira, and filesystem examples.

Parallel task forces

Run multiple specialists concurrently for independent sub-tasks:

"task_force_mode": "parallel"

In sequential mode (default) each pack receives the previous pack's output as context. In parallel mode all packs run concurrently via asyncio.gather and results are merged.
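
The general shape of the two modes, as an illustrative sketch only (run_pack stands in for whatever executes a single specialist; this is not the framework's actual code):

# illustrative sketch of sequential vs parallel task forces
import asyncio

async def run_pack(pack_id: str, prompt: str, context: str = "") -> str:
    # placeholder for executing one specialist pack
    return f"{pack_id} handled: {prompt}"

async def run_sequential(packs: list[str], prompt: str) -> list[str]:
    results: list[str] = []
    context = ""
    for pack in packs:
        context = await run_pack(pack, prompt, context)   # previous output becomes context
        results.append(context)
    return results

async def run_parallel(packs: list[str], prompt: str) -> list[str]:
    # all packs run concurrently; results are merged afterwards
    return list(await asyncio.gather(*(run_pack(p, prompt) for p in packs)))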

Podman container isolation

"specialists": {
  "engineering": {
    "container_image": "python:3.12-slim"
  }
}

All shell tool calls execute inside an isolated Podman container with the workspace mounted at /workspace. Requires Podman installed and the image available locally.


Specialist packs

Built-in packs

  • engineering: Plan → implement → test → review. Tools: shell, read_file, write_file, list_files, finish_task
  • research: Scope → search → screen → extract → synthesize. Tools: web_search*, fetch_url*, read_file, write_file, list_files, finish_task
  • enterprise_research: GitHub/Confluence/Jira search + cross-run memory. Tools: all research tools + cross_run_search + any configured MCP tools

* Requires network_allowed: true (default).

Adding a custom pack

Option A — config-driven (no core change required):

# mypackage/packs.py
from agentic_concierge.infrastructure.specialists.base import BaseSpecialistPack
from agentic_concierge.infrastructure.specialists.tool_defs import make_tool_def, make_finish_tool_def

def build_my_pack(workspace_path: str, network_allowed: bool):
    tools = {
        "my_tool": lambda args: {"result": "..."},
    }
    tool_definitions = [
        make_tool_def("my_tool", "Does something useful.", {"type": "object", "properties": {...}, "required": [...]}),
        make_finish_tool_def(),
    ]
    return BaseSpecialistPack(
        specialist_id="my_specialist",
        system_prompt="You are a ...",
        tool_map=tools,
        tool_definitions=tool_definitions,
        workspace_path=workspace_path,
    )
"specialists": {
  "my_specialist": {
    "description": "My custom specialist.",
    "workflow":    "my_specialist",
    "builder":     "mypackage.packs:build_my_pack",
    "capabilities": ["my_capability"]
  }
}

Option B — built-in: add your pack factory to infrastructure/specialists/, register in _DEFAULT_BUILDERS in registry.py, and add an entry to DEFAULT_CONFIG. See docs/ARCHITECTURE.md §5 for the full extension guide.


Runlog

Every run produces .concierge/runs/<run_id>/runlog.jsonl. Each line:

{"ts": 1708800000.123, "kind": "tool_call", "step": "step_0", "payload": {"tool": "shell", "args": {"cmd": "ls"}}}

Inspect with:

concierge logs show <run_id>
concierge logs show <run_id> --kinds tool_call,tool_result
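
The same filtering can be done directly in Python (a sketch; the path and event kinds come from the examples above, and <run_id> must be substituted with a real run ID):

# sketch: read a runlog and keep only tool_call / tool_result events
import json
from pathlib import Path

runlog = Path(".concierge/runs") / "<run_id>" / "runlog.jsonl"
wanted = {"tool_call", "tool_result"}

for line in runlog.read_text().splitlines():
    event = json.loads(line)
    if event["kind"] in wanted:
        print(event["ts"], event["step"], event["payload"])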

Testing

Fast CI (no LLM required, ~4 seconds, 1377+ tests):

pip install -e ".[dev]"
pytest tests/ -k "not real_llm and not real_mcp and not podman" -q

Full validation (requires Ollama + a pulled model):

python scripts/validate_full.py

Ensures the LLM is reachable (starts it if needed via config), then runs all tests including real-LLM E2E tests. Use ollama pull qwen2.5:7b or set CONCIERGE_CONFIG_PATH to a config with a model you have.

Single E2E check:

python scripts/verify_working_real.py

Runs one engineering task end-to-end and asserts that tool_call/tool_result events exist and workspace artifacts are created.

Test markers:

  • real_llm: Requires a live Ollama instance
  • real_mcp: Requires npx and an MCP server package
  • podman: Requires Podman and a pulled container image

Development

See CONTRIBUTING.md for the full contributor guide.

# Install dev dependencies (includes mcp, pytest, pytest-asyncio)
pip install -e ".[dev]"

# Optional: OpenTelemetry
pip install -e ".[otel]"

# Run fast tests
pytest tests/ -k "not real_llm and not real_mcp and not podman" -q

# Lint
ruff check src/ tests/

Documentation

  • docs/ARCHITECTURE.md: Layer design, component map, data flow, extension points
  • docs/DECISIONS.md: Architecture Decision Records (ADR-001 to ADR-011)
  • docs/VISION.md: Long-term vision, principles, use cases
  • docs/PLAN.md: Phases 1–8 deliverables and verification gates
  • docs/STATE.md: Current phase, CI status, resumability guide
  • docs/BACKLOG.md: Prioritised work items; what to do next
  • docs/CAPABILITIES.md: Capability model and routing rules
  • docs/MCP_INTEGRATIONS.md: MCP server setup (GitHub, Confluence, Jira, filesystem)
  • docs/BACKENDS.md: Using backends other than Ollama
  • REQUIREMENTS.md: MVP functional requirements and validation

License

MIT


Download files

Download the file for your platform.

Source Distribution

agentic_concierge-0.3.70.tar.gz (620.1 kB)

Uploaded Source

Built Distribution


agentic_concierge-0.3.70-py3-none-any.whl (234.3 kB)

Uploaded Python 3

File details

Details for the file agentic_concierge-0.3.70.tar.gz.

File metadata

  • Download URL: agentic_concierge-0.3.70.tar.gz
  • Size: 620.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for agentic_concierge-0.3.70.tar.gz
  • SHA256: 9a559c3baf417f7520e1f4e1917b70f0fb1344b83c499236faac27805bd68798
  • MD5: bd829b296504b6868cdfce0ceb471dec
  • BLAKE2b-256: 8c5b13bc43233ee1f5170e3096c63846f1af19968b4ab604932da105c4d0e6f6


Provenance

The following attestation bundles were made for agentic_concierge-0.3.70.tar.gz:

Publisher: release.yml on ausmarton/agentic-concierge

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file agentic_concierge-0.3.70-py3-none-any.whl.

File metadata

File hashes

Hashes for agentic_concierge-0.3.70-py3-none-any.whl
  • SHA256: 05d934dcadb3cff7086c46898090ed3410fe8ae11110c30cb4094838eb031374
  • MD5: 10bc5a5805ba9136569c4503b963d800
  • BLAKE2b-256: c7194643b6b40d2ecd58e443060c262b39234c3d1698d6bd1689d14f83be474a


Provenance

The following attestation bundles were made for agentic_concierge-0.3.70-py3-none-any.whl:

Publisher: release.yml on ausmarton/agentic-concierge

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
