
Claude Code runner and orchestrator — thin job lifecycle, repo management, and OTEL pipeline


Agenticore

Two modes, one binary. Run a fleet of Claude Code agents that clone repos and ship PRs — or expose any customized Claude Code agent as a real-time, OpenAI-compatible chat completion endpoint with token-by-token thinking and tool deltas. Flip between modes with one environment variable.

License: MIT · Tests · Docker · Helm · PyPI · Python 3.12+

                          ┌─── AGENT_MODE=false (default) ────────────┐
                          │  FLEET MODE — Orchestrator                 │
                          │  Submit a task, get a PR                   │
                          │                                            │
   MCP / REST / CLI ─────►│  clone repo ──► bespoke worktree           │
                          │       │              │                     │
                          │       └──► claude -p "<task>" ──► auto-PR  │
                          │                              └──► OTEL     │
                          │  KEDA-scaled fleet • work-stealing queue   │
   ┌─────────────┐        └────────────────────────────────────────────┘
   │ agenticore  │
   │   binary    │
   └─────────────┘        ┌─── AGENT_MODE=true ────────────────────────┐
                          │  AGENT MODE — Customized agent endpoint    │
                          │  Drop-in OpenAI chat completion server     │
                          │                                            │
   OpenAI-compatible ────►│  load agent package (system prompt, MCP    │
   chat clients           │    servers, hooks, skills, identity)       │
   (LibreChat,            │                                            │
    OpenWebUI,            │  POST /v1/chat/completions stream=true     │
    LiteLLM,              │       │                                    │
    custom UI,            │       └─► live SSE deltas:                 │
    raw curl -N)          │            thinking_delta (token-by-token) │
                          │            tool_use + tool_result          │
                          │            assistant text                  │
                          │                                            │
                          │  Sticky slash toggles per agent            │
                          │  Fully auditable — wire/disk/Redis layers  │
                          └────────────────────────────────────────────┘

Pick a mode

| | Fleet mode (default) | Agent mode (`AGENT_MODE=true`) |
|---|---|---|
| What it does | Accepts coding tasks, clones repos, runs Claude Code in bespoke worktrees, opens PRs | Loads a pre-configured Claude Code agent package and exposes it as a chat completion endpoint |
| API surface | `/jobs` REST · `run_task` MCP tool · `agenticore run` CLI | `/v1/chat/completions` — fully OpenAI-compatible, streaming and non-streaming |
| Lifecycle | Per-job clone + worktree, discarded after the PR | Long-lived agent identity loaded once at container startup |
| Scaling | KEDA on Redis queue depth — N pods steal jobs from one queue | One StatefulSet per agent identity; scale horizontally per agent |
| Output | A pull request, an OTEL trace, a job result in Redis | Live SSE deltas as `chat.completion.chunk` JSON, full transcript on disk |
| Drop-in for | CI/CD pipelines, MCP-aware editors, internal "fix this" bots | LibreChat, OpenWebUI, LiteLLM model routing, any OpenAI SDK client |
| Best for | "We use Claude Code to refactor / fix / generate PRs across many repos" | "We want our chat clients to talk to a customized Claude agent over the OpenAI protocol" |

Both modes share the same binary, the same Docker image, the same Helm chart, the same profile system, the same Redis+file fallback, and the same OTEL trace pipeline. You don't pick at install time. You pick at runtime with one environment variable.
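The mode switch can be pictured as a one-line dispatch on that variable. `select_mode` is a hypothetical helper for illustration, not agenticore's actual internals — the only documented contract is that `AGENT_MODE` flips the binary's behavior:

```python
def select_mode(env: dict) -> str:
    # Hypothetical sketch: the documented switch is the AGENT_MODE env var;
    # anything other than "true" falls back to the default fleet mode.
    if env.get("AGENT_MODE", "false").strip().lower() == "true":
        return "agent"
    return "fleet"

print(select_mode({}))                      # fleet (default)
print(select_mode({"AGENT_MODE": "true"}))  # agent
```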


Why agenticore

You have Claude Code. You want it to do work for you programmatically. That work tends to take one of two shapes:

  1. Headless coding tasks across repos — "fix the auth bug", "add tests for the parser", "refactor this module". You want a fleet that accepts these, clones the right repo, runs Claude in a clean worktree, and opens a PR. → Fleet mode.

  2. A customized Claude agent your other tools can talk to — a personal assistant, a domain expert, a finops bot, a docs writer — exposed as an OpenAI-compatible endpoint so LibreChat, OpenWebUI, your LiteLLM router, or any OpenAI SDK client can drop it in as a "model". With real-time streaming of the agent's thinking, tool calls, and answers — not buffered, not batched, not faked. → Agent mode.

Agenticore is one binary that does both. Profiles, hooks, MCP whitelists, Redis state, OTEL traces, Helm chart — all shared between the two modes. Your operations team learns one thing.


🟦 FLEET MODE

Submit a task, get a PR. The original positioning.

MCP Client / REST Client / CLI
            │
            ▼
    ┌── Agenticore (Fleet Mode) ─────────────────────────────────┐
    │   Auth · Router · Job Queue                                │
    │                                                            │
    │   Clone repo ──► Bespoke worktree ──► claude -p "task"     │
    │   (cached)       (locked branch)      (cwd = worktree)     │
    │                                         │                  │
    │                                         ▼                  │
    │                                   Auto-PR (gh)             │
    │                                   Job result → Redis       │
    └──────────────────────┬─────────────────────────────────────┘
                           │
                    OTEL Collector
                    → Langfuse / PostgreSQL
  • Accepts tasks from MCP clients, REST, or CLI — same API surface, one port
  • Clones and caches repos, serializes concurrent access with distributed locks
  • Creates bespoke worktrees — locked before Claude starts, deterministic branch names
  • Applies execution profiles — installed into ~/.claude/ at startup via agentihooks
  • Spawns claude -p "<task>" in the worktree and opens a PR when it succeeds
  • Ships full OTEL traces (prompts, tool calls, token counts) to Langfuse / PostgreSQL
  • KEDA autoscaling on Redis queue depth + graceful drain on pod shutdown
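"Deterministic branch names" means the same task and job ID always map to the same branch, so retries and concurrent pods never race on naming. A minimal sketch of what such a scheme could look like — `branch_name`, the slug rules, and the `agenticore/` prefix are all assumptions, not agenticore's real implementation:

```python
def branch_name(task: str, job_id: str) -> str:
    # Hypothetical sketch: slugify the task, cap its length, and suffix a
    # stable prefix of the job ID so the mapping is fully deterministic.
    slug = "-".join(task.lower().split())
    slug = "".join(c for c in slug if c.isalnum() or c == "-")[:40].strip("-")
    return f"agenticore/{slug}-{job_id[:8]}"

print(branch_name("Fix the null pointer in auth.py", "a1b2c3d4e5f6"))
```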

Quickstart

# Set credentials
export ANTHROPIC_AUTH_TOKEN=sk-ant-...
export GITHUB_TOKEN=ghp_...

# Start the server
agenticore serve

# Submit a task and wait for the PR URL
agenticore run "fix the null pointer in auth.py" \
  --repo https://github.com/org/repo \
  --wait

REST

# Submit a job (async — returns immediately with job ID)
curl -X POST http://localhost:8200/jobs \
  -H "Content-Type: application/json" \
  -d '{"task":"fix the auth bug","repo_url":"https://github.com/org/repo"}'

# Submit and wait
curl -X POST http://localhost:8200/jobs \
  -H "Content-Type: application/json" \
  -d '{"task":"fix the auth bug","repo_url":"https://github.com/org/repo","wait":true}'

# Inspect
curl http://localhost:8200/jobs/{job_id}
curl "http://localhost:8200/jobs?limit=10&status=running"
curl -X DELETE http://localhost:8200/jobs/{job_id}
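For async submissions, a client typically polls `GET /jobs/{job_id}` until the job reaches a terminal state. A small sketch of that loop — the `fetch` callable is injected (e.g. a wrapper around `requests.get(...).json()`) so the helper needs no live server, and the terminal status names here are assumptions based on the `status=running` filter shown above:

```python
import time

def wait_for_job(job_id, fetch, poll_interval=0.5, max_polls=120):
    # `fetch` is any callable returning the parsed JSON of GET /jobs/{id}.
    # Assumed terminal states: completed / failed / cancelled.
    for _ in range(max_polls):
        job = fetch(job_id)
        if job.get("status") in {"completed", "failed", "cancelled"}:
            return job
        time.sleep(poll_interval)
    raise TimeoutError(f"job {job_id} not terminal after {max_polls} polls")
```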

MCP tools (fleet mode)

| Tool | Description |
|---|---|
| `run_task` | Submit a task for Claude Code execution |
| `get_job` | Get status, output, and PR URL for a job |
| `list_jobs` | List recent jobs |
| `cancel_job` | Cancel a running or queued job |
| `list_profiles` | List available execution profiles |
| `plan_task` | Create a read-only implementation plan |
| `execute_plan` | Execute a ready plan as a coding job |
| `list_worktrees` | List all worktrees with age, size, branch, push status |
| `cleanup_worktrees` | Remove specific worktrees (unlock + delete) |

Connect any MCP client at http://localhost:8200/mcp (Streamable HTTP) or /sse (legacy SSE).


🟩 AGENT MODE

One environment variable. Now you have a customized Claude agent talking the OpenAI protocol with real-time thinking + tool streaming.

        AGENT_MODE=true + AGENT_MODE_PACKAGE_DIR=./my-agent-package
                                        │
                                        ▼
  ┌── Agenticore (Agent Mode) ──────────────────────────────────────┐
  │                                                                 │
  │   Load package once at startup:                                 │
  │     ├─ system.md (identity, instructions)                       │
  │     ├─ .claude/ (settings, hooks, skills, agents)               │
  │     └─ .mcp.json (tool servers this agent can call)             │
  │                                                                 │
  │   POST /v1/chat/completions stream=true                         │
  │     │                                                           │
  │     ├─ strip slash tokens (server-side, deterministic)          │
  │     ├─ load sticky visibility config from Redis                 │
  │     ├─ spawn claude --output-format stream-json                 │
  │     │                  --include-partial-messages               │
  │     ├─ read claude stdout line-by-line                          │
  │     │     thinking_delta  → delta.reasoning_content (live)      │
  │     │     text_delta      → delta.content (live)                │
  │     │     tool_use_block  → ```tool_use:NAME fenced block       │
  │     │     tool_result     → ```tool_result fenced block         │
  │     └─ flush each chunk to the open HTTP connection             │
  │                                                                 │
  └─────────────────────────────────────────────────────────────────┘

Drop-in for any OpenAI-compatible client. Because the endpoint speaks /v1/chat/completions and emits standard chat.completion.chunk JSON over SSE, you can register an agenticore-backed agent as an "OpenAI custom model" inside:

  • LibreChat — add as a custom OpenAI endpoint, pick from the model dropdown
  • OpenWebUI — same pattern
  • LiteLLM — register as openai/<agent> with api_base=http://<agent>:8200/v1, then route any LiteLLM client at it
  • OpenAI SDK (Python, JS, Go, Rust) — OpenAI(base_url="http://<agent>:8200/v1") and call chat.completions.create(...) exactly like you would against api.openai.com
  • curl -N — raw SSE works fine

Killer features

  • Real-time SSE streaming, fully auditable, fully traceable. Thinking blocks stream token-by-token as the model generates them. Tool calls and results stream live as the agent invokes them. Assistant text streams progressively. Nothing is buffered to the end of the turn. The streaming hot path reads claude's stdout directly via --output-format stream-json --verbose --include-partial-messages — no transcript polling, no Redis indirection, no JSONL flush race.
  • Thinking renders in delta.reasoning_content — separate reasoning panel in reasoning-aware clients (LibreChat, OpenWebUI), with x_agenticore_event_type="thinking" for custom clients that want explicit tagging.
  • Tool calls render as fenced markdown blocks: a ```tool_use:NAME block paired with a ```tool_result block below it. This is deliberately not OpenAI's delta.tool_calls schema, which would make chat clients try to execute the tool client-side and fail with "Tool not found".
  • Sticky per-agent visibility toggles intercepted server-side, before claude ever sees the prompt:
    • /show-thinking / /hide-thinking
    • /show-tools / /hide-tools
    • /show-all / /hide-all
    • /stream-status (returns the current config inline as a meta SSE event)
  • Multi-turn aware — toggle detection runs against the last user message, not the flattened history, so slash commands work on turn 2+. Toggle-only requests (e.g. just /show-all) return inline status without spawning claude — zero token cost.
  • Three-layer observation — every visible event reaches (1) the client over the wire, (2) claude's transcript JSONL on disk, and (3) optionally the Redis bus (non-streaming path) for cross-process subscribers. Cross-validate all three with tests/smoke/verify_streaming_pipeline.sh <agent>.
  • Async completion queue for fire-and-forget — wait=false pushes to Redis, a worker picks it up, poll GET /completions/{uuid}.
  • Session continuity — resume a conversation across requests via the external correlation UUID.
  • Redis+file fallback — works without Redis (inline execution, file-based state).
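The toggle handling above can be sketched as a pure function: scan only the last user message for toggle tokens, return what was found plus the cleaned prompt, and treat an empty remainder as a toggle-only request (no `claude` spawn). `extract_toggles` is an illustrative name and word-boundary splitting is an assumed detail; the real server-side logic may differ:

```python
TOGGLES = {
    "/show-thinking", "/hide-thinking",
    "/show-tools", "/hide-tools",
    "/show-all", "/hide-all", "/stream-status",
}

def extract_toggles(messages):
    # Per the docs, detection runs against the LAST user message, not the
    # flattened history, so slash commands keep working on turn 2+.
    last = next((m for m in reversed(messages) if m["role"] == "user"), None)
    if last is None:
        return set(), ""
    words = last["content"].split()
    found = {w for w in words if w in TOGGLES}
    remaining = " ".join(w for w in words if w not in TOGGLES)
    return found, remaining  # remaining == "" means toggle-only request
```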

Quickstart

# Start the server in agent mode pointing at your agent package
AGENT_MODE=true \
AGENT_MODE_PACKAGE_DIR=./my-agent-package \
AGENTICORE_TRANSPORT=sse \
agenticore serve

# Toggle visibility once (sticky per agent — persists in Redis)
curl -sN http://localhost:8200/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"sonnet","stream":true,"messages":[{"role":"user","content":"/show-all"}]}'

# Now have a real conversation — watch thinking tokens + tool calls stream live
curl -sN http://localhost:8200/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"sonnet","stream":true,"messages":[
        {"role":"user","content":"is 17077 prime? think hard, then list any files in /tmp"}
      ]}'

# Non-streaming JSON (no slash tokens needed)
curl -X POST http://localhost:8200/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"sonnet","messages":[{"role":"user","content":"hello"}]}'

Drop into LibreChat

# librechat.yaml
endpoints:
  custom:
    - name: "Agenticore Agents"
      apiKey: "${LITELLM_API_KEY}"
      baseURL: "http://litellm.your-cluster.svc:4000/v1"
      models:
        fetch: true
      titleConvo: true

Register the agent in LiteLLM as a model pointing at the agenticore pod:

# Via LiteLLM admin (or the litellm_tools MCP)
model_name: my-agent
litellm_params:
  model: openai/my-agent
  api_base: http://my-agent.namespace.svc:8200/v1

Now my-agent shows up in LibreChat's model picker. Token-by-token thinking renders in the reasoning panel. Tool calls stream live as fenced markdown blocks.

Drop into the OpenAI SDK

from openai import OpenAI

client = OpenAI(base_url="http://my-agent.namespace.svc:8200/v1", api_key="n/a")

stream = client.chat.completions.create(
    model="sonnet",
    stream=True,
    messages=[
        {"role": "user", "content": "/show-all explain how an OS scheduler works step by step"},
    ],
)
for chunk in stream:
    delta = chunk.choices[0].delta
    if reasoning := getattr(delta, "reasoning_content", None) or delta.model_dump().get("reasoning_content"):
        print(f"[think] {reasoning}", end="", flush=True)
    elif delta.content:
        print(delta.content, end="", flush=True)

Full reference: SSE Streaming docs · Self-test walkthrough · Agent Mode architecture


Shared infrastructure (both modes)

Everything below applies to both Fleet mode and Agent mode. Same Docker image, same Helm chart, same env vars, same Redis schema.

Install

pip install agenticore

Or from source:

git clone https://github.com/The-Cloud-Clock-Work/agenticore.git
cd agenticore
pip install -e .

Profiles

Profiles are directory packages that configure how Claude Code runs. Each profile is a self-contained .claude/ tree installed into ~/.claude/ at container startup by agentihooks global. Claude Code reads from ~/.claude/ by default.

<profiles-dir>/{name}/
├── profile.yml          ← Agenticore metadata (model, turns, auto_pr, timeout…)
├── .claude/
│   ├── settings.json    ← Hooks, tool permissions, env vars
│   ├── CLAUDE.md        ← System instructions for Claude
│   ├── agents/          ← Custom subagents
│   └── skills/          ← Custom slash-command skills
└── .mcp.json            ← MCP server config merged into the job

Profiles support inheritance via extends: and live in {AGENTICORE_AGENTIHOOKS_PATH}/profiles/ or ~/.agenticore/profiles/. Full reference: Profile System docs.
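Inheritance via `extends:` usually means child keys override parent keys after the whole chain is flattened. A sketch of that resolution under those assumed semantics — `resolve_profile` is a hypothetical name and shallow key-level merging is an assumption; see the Profile System docs for the real behavior:

```python
def resolve_profile(name: str, profiles: dict) -> dict:
    # Walk the extends: chain child -> root, then merge root-first so that
    # keys set closer to the requested profile win.
    chain, cur = [], name
    while cur is not None:
        entry = profiles[cur]
        chain.append(entry)
        cur = entry.get("extends")
    merged = {}
    for entry in reversed(chain):
        merged.update({k: v for k, v in entry.items() if k != "extends"})
    return merged
```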

Helm (Kubernetes)

Production-ready Helm chart published to GHCR. Deploys a StatefulSet with a shared RWX PVC (NFS / EFS / Azure Files / Ceph) so all pods share the same repo cache and job state, with KEDA autoscaling on Redis queue depth and graceful drain on pod shutdown.

Internet ──► LoadBalancer :8200
                    │
     ┌──────────────▼──────────────────────────┐
     │  Agenticore StatefulSet (0..N pods)     │
     │  Work-stealing from Redis queue         │
     └──────────┬──────────────────────────────┘
                │                │
         ┌──────▼───────┐  ┌─────▼───────────┐
         │  Redis       │  │  Shared RWX PVC │
         │  jobs · locks│  │  /shared/       │
         │  KEDA queue  │  │  ├─ repos/      │
         └──────────────┘  │  ├─ jobs/       │
                           │  └─ job-state/  │
         KEDA ScaledObject └─────────────────┘
          watches Redis queue

# Create the secret
kubectl create secret generic agenticore-secrets \
  --from-literal=redis-url="redis://:password@redis:6379" \
  --from-literal=anthropic-api-key="sk-ant-..." \
  --from-literal=github-token="ghp_..."

# Install (fleet mode)
helm install agenticore \
  oci://ghcr.io/the-cloud-clock-work/charts/agenticore \
  --set storage.className=your-rwx-storage-class

# Install (agent mode)
helm install my-agent \
  oci://ghcr.io/the-cloud-clock-work/charts/agenticore \
  --set storage.className=your-rwx-storage-class \
  --set agentMode.enabled=true \
  --set agentMode.agentName=my-agent

Full Kubernetes guide: Kubernetes Deployment.

Docker

# Local dev — full stack (Agenticore + Redis + PostgreSQL + OTEL Collector)
cp .env.example .env
docker compose up --build -d

# Production (fleet mode) — Agenticore only
docker run -d -p 8200:8200 \
  -e AGENTICORE_TRANSPORT=sse \
  -e ANTHROPIC_AUTH_TOKEN=sk-ant-... \
  -e REDIS_URL=redis://your-redis:6379/0 \
  -e GITHUB_TOKEN=ghp_... \
  tccw/agenticore

# Production (agent mode)
docker run -d -p 8200:8200 \
  -e AGENT_MODE=true \
  -e AGENTIHUB_AGENT=my-agent \
  -e AGENTICORE_TRANSPORT=sse \
  -e ANTHROPIC_AUTH_TOKEN=sk-ant-... \
  -e REDIS_URL=redis://your-redis:6379/0 \
  tccw/agenticore

Authentication

Authentication is optional. When disabled, all endpoints are public.

# API keys — comma-separated for multiple
AGENTICORE_API_KEYS="key-1,key-2" agenticore serve

Pass the key via X-Api-Key header, ?api_key=... query param, or Authorization: Bearer .... The /health endpoint is always public.

Claude credentials resolved in order: CLAUDE_CODE_OAUTH_TOKEN → ANTHROPIC_AUTH_TOKEN + ANTHROPIC_BASE_URL. GitHub credentials: GitHub App (GITHUB_APP_ID + key + installation ID) → static GITHUB_TOKEN → none (public repos only).
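That resolution order can be written down directly. A minimal sketch mirroring the documented precedence — the returned dict shape is illustrative, not agenticore's internal config format:

```python
def resolve_claude_auth(env: dict):
    # Documented order: CLAUDE_CODE_OAUTH_TOKEN wins outright, then
    # ANTHROPIC_AUTH_TOKEN (optionally paired with ANTHROPIC_BASE_URL),
    # else no Claude credentials at all.
    if env.get("CLAUDE_CODE_OAUTH_TOKEN"):
        return {"oauth_token": env["CLAUDE_CODE_OAUTH_TOKEN"]}
    if env.get("ANTHROPIC_AUTH_TOKEN"):
        auth = {"auth_token": env["ANTHROPIC_AUTH_TOKEN"]}
        if env.get("ANTHROPIC_BASE_URL"):
            auth["base_url"] = env["ANTHROPIC_BASE_URL"]
        return auth
    return None
```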

OTEL Observability

Every job (fleet mode) and every completion (agent mode) produces a Langfuse trace with spans for each Claude turn including prompts, tool calls, and token counts.

LANGFUSE_PUBLIC_KEY=pk-lf-...
LANGFUSE_SECRET_KEY=sk-lf-...
LANGFUSE_HOST=https://cloud.langfuse.com
AGENTICORE_OTEL_ENABLED=true
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317

The bundled docker-compose.yml includes an OTEL Collector pre-wired to push traces to Langfuse and PostgreSQL. Full setup: OTEL Pipeline docs.

Key environment variables

| Variable | Default | Description |
|---|---|---|
| `AGENT_MODE` | `false` | The mode switch. `true` enables agent mode |
| `AGENT_MODE_PACKAGE_DIR` | (empty) | Path to the agent package (agent mode only) |
| `AGENTIHUB_AGENT` | (empty) | Agent name to load from agentihub (agent mode) |
| `AGENTICORE_TRANSPORT` | `stdio` | `sse` for HTTP server, `stdio` for MCP pipe |
| `AGENTICORE_HOST` | `127.0.0.1` | Bind address |
| `AGENTICORE_PORT` | `8200` | Server port |
| `AGENTICORE_API_KEYS` | (empty) | Comma-separated API keys (optional) |
| `ANTHROPIC_AUTH_TOKEN` | (empty) | Anthropic API key (or use `CLAUDE_CODE_OAUTH_TOKEN`) |
| `REDIS_URL` | (empty) | Redis URL — omit for file-based fallback |
| `GITHUB_TOKEN` | (empty) | GitHub token for auto-PR (fleet mode) |
| `AGENTIHOOKS_PROFILE` | `coding` | Active profile (fleet mode) |
| `AGENTICORE_CLAUDE_TIMEOUT` | `3600` | Max claude runtime in seconds |
| `AGENTICORE_AGENTIHOOKS_URL` | (empty) | Git URL to clone agentihooks from |
| `AGENTICORE_AGENTIHOOKS_BUNDLE_URL` | (empty) | Git URL to clone the bundle |
| `AGENTICORE_AGENTIHUB_URL` | (empty) | Git URL for the agentihub repo (agent mode) |
| `AGENTICORE_SHARED_FS_ROOT` | (empty) | Shared FS root (Kubernetes mode) |

Full reference: Configuration docs.

CLI commands

| Command | Description |
|---|---|
| `agenticore serve` | Start the server (fleet or agent mode based on env) |
| `agenticore run "<task>" --repo <url> [--wait]` | Submit a task (fleet mode) |
| `agenticore jobs` / `agenticore job <id>` | List / inspect jobs |
| `agenticore cancel <id>` | Cancel a running job |
| `agenticore profiles` | List execution profiles |
| `agenticore agents` | Interactive TUI — K8s pods + local agent packages |
| `agenticore agents --headless <action>` | Headless: list, chat, job, sync, health, local |
| `agenticore hooks sync [--target T]` | Clone/fetch profile sources |
| `agenticore agent --compose-up` | Bring up the local dev stack |
| `agenticore drain` | Drain pod before shutdown (Kubernetes) |
| `agenticore status` / `version` / `update` | Server health, version, self-update |

Full CLI reference: CLI Commands.


Documentation

Get started

Architecture

Deployment

Reference


Development

pip install -e ".[dev]"

# Tests
pytest tests/unit -v -m unit --cov=agenticore

# Lint
ruff check agenticore/ tests/
ruff format --check agenticore/ tests/

PRs welcome. The feat/* branches in the repo show recent work — the most recent landed feature is the token-by-token SSE streaming layer (feat/stream-json-direct → dev → main at f440e3c, released as v1.3.0).
