
CodePilot — Developer Reference

CodePilot is a code-native agentic framework for Python. The LLM writes executable code to act — no JSON schemas, no function-calling APIs, no tool wrappers. This document covers every feature with working code examples.

Version: 0.8.0

Linux/macOS only. Both the shell tools (execute, read_output, send_input, send_signal, kill_shell) and semantic_search require a POSIX environment. They rely on pexpect and grepai — deploy your agent in a Linux container.

Docker tip: Pre-install grepai and ripgrep in your image:

RUN curl -sSL https://raw.githubusercontent.com/yoanbernabeu/grepai/main/install.sh | sh
RUN apt-get update && apt-get install -y ripgrep

Installation

pip install codepilot-ai

Set your LLM provider key before running anything:

# Pick one
export ANTHROPIC_API_KEY="sk-ant-..."
export OPENAI_API_KEY="sk-..."
export DASHSCOPE_API_KEY="..."

Table of Contents

  1. How it works
  2. AgentFile (YAML config)
  3. Basic usage
  4. Streaming
  5. Multi-turn execution
  6. Session persistence
  7. Context memory management
  8. Resuming a session
  9. Resetting a session
  10. Hooks — full observability
  11. Permission gating
  12. Mid-task message injection
  13. Multi-operation steps
  14. Shell tools
  15. Completion block
  16. Workspace change detection
  17. Chat mode
  18. Custom tools
  19. Aborting the agent
  20. Building a CLI tool
  21. Building a web server integration
  22. Full API surface

1. How It Works

CodePilot uses a code-as-interface paradigm. Instead of the LLM describing actions in JSON, it writes Python code that the runtime executes directly.

Each agent step:

  1. LLM receives the system prompt (refreshed every step) + full conversation history
  2. LLM writes a natural language reasoning paragraph (streamed to user in real time), then a ```codepilot block (Python code)
  3. Runtime executes the code block in a sandboxed environment with bound tool functions
  4. Execution result is appended to conversation history as [EXECUTION RESULT]
  5. Repeat until the agent emits a ```completion block, hits max_steps, or is aborted
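The loop above can be sketched in a few lines. This is a hypothetical illustration; the function names (`call_llm`, `execute_control_block`) are placeholders, not CodePilot's internal API:

```python
import re

def run_loop(call_llm, execute_control_block, history, max_steps=30):
    """Minimal sketch of the agentic loop: infer, execute, repeat."""
    for step in range(1, max_steps + 1):
        response = call_llm(history)  # system prompt + full history
        done = re.search(r"```completion\n(.*?)```", response, re.S)
        code = re.search(r"```codepilot\n(.*?)```", response, re.S)
        if code:
            # Run the control block, append the result to history
            result = execute_control_block(code.group(1))
            history.append(f"[EXECUTION RESULT]\n{result}")
        if done:
            # Completion block present: task is finished after this step
            return done.group(1).strip()
        if not code:
            # Chat-only response: no execution, exit cleanly
            return None
    return None  # max_steps reached
```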

The three block types

Control Block (```codepilot) — the only block the runtime executes. Regular ```python blocks are display-only markdown the agent uses freely in explanations.

Payload Blocks (```python, ```js, etc. after a codepilot block) — file content consumed by write_file() in order. Never executed.

Completion Block (```completion) — natural text that streams directly to the user in real time. Its presence marks the task complete — the agentic loop terminates after this step. Can be combined with the codepilot block and payload blocks in a single agentic step.

Response shapes

Action step (more work needed):

Alright, let me read the file first to get the line numbers.

```codepilot
# Reading before editing — exact line numbers required.
read_file("routes/profile.py", start_line=35, end_line=65)
```

Single-step task (action + completion in one step):

Got it — updating the timeout value.

```codepilot
# Simple single-line edit, no read needed — we know the line.
write_file("config.py", start_line=12, end_line=12, mode="edit")
```

```python
TIMEOUT = 30
```

```completion
Done. Updated TIMEOUT to 30s in config.py on line 12.
```

Chat/explanation (no execution, entire response streams):

Sure! Here's how the config loader handles missing files:

```python
# Display block — never executed
def load(path: str) -> dict:
    if not os.path.exists(path):
        return {}   # returns empty dict as default
    with open(path) as f:
        return json.load(f)
```

The fallback is an empty dict, so callers always get a valid dict — no None checks needed.

2. AgentFile

Every Runtime is driven by a YAML config. Paths are resolved relative to the YAML file's location — not the caller's CWD.

# agent.yaml
agent:
  name: "BackendEngineer"
  role: "Expert Python backend engineer specialising in FastAPI and PostgreSQL."

  # Either a raw string or a path to a .md file (resolved relative to this YAML)
  system_prompt: "./prompts/instructions.md"

  model:
    provider: "alibaba"             # "anthropic" | "openai" | "alibaba"
    name: "qwen-max"
    api_key_env: "DASHSCOPE_API_KEY"
    temperature: 0.2
    max_tokens: 8096
    thinking:                       # Anthropic only: extended reasoning
      enabled: false
      budget_tokens: 8000

  runtime:
    work_dir: "./workspace"         # where the agent reads/writes files
    max_steps: 30                   # hard cap on agentic steps per run()
    unsafe_mode: false              # true = allow writes outside work_dir
    allowed_imports:                # stdlib modules allowed in the control block
      - "re"
      - "json"
      - "math"
      - "datetime"
      - "pathlib"

  tools:
    - name: "write_file"
      enabled: true
      config:
        require_permission: false   # true = ask user before every file write

    - name: "read_file"
      enabled: true

    - name: "execute"
      enabled: true
      config:
        require_permission: true    # true = ask user before every shell command
        max_output_chars: 10000     # truncate long command output

    - name: "read_output"
      enabled: true

    - name: "send_input"
      enabled: true

    - name: "send_signal"
      enabled: true

    - name: "kill_shell"
      enabled: true

    - name: "ask_user"
      enabled: true

    - name: "find"
      enabled: true

    - name: "semantic_search"
      enabled: true
      config:
        # VoyageAI API key env var — REQUIRED for semantic search to work.
        # Get a free key at https://www.voyageai.com/
        api_key_env: "VOYAGE_API_KEY"

        # Embedding model — voyage-code-3 is purpose-built for code search
        model: "voyage-code-3"

        # VoyageAI uses an OpenAI-compatible API — this is the default endpoint
        base_url: "https://api.voyageai.com/v1"

        # Provider name passed to grepai internals (leave as "openai" —
        # it's the protocol name, not the vendor)
        provider: "openai"

        # Maximum results returned per search (default: 5)
        top_k: 5

        # Max seconds to wait for a grepai command (default: 60)
        timeout: 60

        # Truncate output to prevent context overflow (default: 8000 chars)
        max_output_chars: 8000

Supported providers:

provider    model name examples                  api_key_env
anthropic   claude-opus-4-5, claude-sonnet-4-5   ANTHROPIC_API_KEY
openai      gpt-4o, gpt-4-turbo                  OPENAI_API_KEY
alibaba     qwen-max, qwen-plus, qwen-turbo      DASHSCOPE_API_KEY

3. Basic Usage

from codepilot import Runtime

runtime = Runtime("agent.yaml")
summary = runtime.run("Create a FastAPI hello-world server in main.py")
print(summary)  # the text the agent put in the completion block, or None

run() is blocking — it returns when the agent emits a completion block, hits max_steps, or is aborted. The return value is the completion block text, or None if the loop ended for any other reason.


4. Streaming

Enable streaming to receive the agent's reasoning text token-by-token, in real time, before any code executes. This dramatically improves perceived responsiveness.

from codepilot import Runtime, on_stream

runtime = Runtime("agent.yaml", stream=True)


@on_stream(runtime)
def handle_stream(text: str, **_):
    """Fires with each chunk of streamed text."""
    print(text, end="", flush=True)


runtime.run("Refactor the database module to use async SQLAlchemy")

What gets streamed

The runtime streams in two windows per step:

  1. Pre-fence text — everything before the ```codepilot block. This is the agent's reasoning paragraph and any display ```python blocks used in explanations. Streams in real time as the LLM generates it.

  2. Completion block — the ```completion block content, when the task is done. Streams in real time directly to the user. The loop terminates after this.

Everything between the two windows (the codepilot block, payload blocks) is buffered silently while tools execute.

For chat/question responses (no codepilot block at all), the entire response streams token-by-token and the loop exits cleanly.

Non-streaming mode

Without stream=True, the full response is emitted as a single STREAM event when inference completes. The on_stream hook still fires — you see the complete text at once rather than token-by-token.

runtime = Runtime("agent.yaml")   # stream=False by default

@on_stream(runtime)
def show_reasoning(text: str, **_):
    print(f"\n{text}\n")

5. Multi-turn Execution

Call run() multiple times on the same Runtime instance. Each call appends to the shared conversation history. The LLM sees every prior task, every file it wrote, and every command it ran.

from codepilot import Runtime

runtime = Runtime("agent.yaml")

# Turn 1
runtime.run("Create a FastAPI app with a /items GET endpoint")

# Turn 2 — agent has full context of what it built in turn 1
runtime.run("Now add a POST /items endpoint with Pydantic validation")

# Turn 3 — agent knows the full codebase it has built
runtime.run("Add pytest tests for both endpoints")

6. Session Persistence

Session backends are chosen at construction time.

Backend              Storage                  Survives restart   Best for
"memory" (default)   RAM only                 No                 Scripts, one-off tasks
"file"               ~/.codepilot/sessions/   Yes                CLI tools, local dev
"db"                 Any SQL database         Yes                Web apps, containers, multi-user

In-memory (default)

runtime = Runtime("agent.yaml")                          # memory, id = agent name
runtime = Runtime("agent.yaml", session="memory")       # explicit, same thing
runtime = Runtime("agent.yaml", session="memory", session_id="my-session")

File-backed

History is serialised to ~/.codepilot/sessions/<session_id>.json after every run(). Directory is created automatically.

runtime = Runtime("agent.yaml", session="file")                     # id = agent name
runtime = Runtime("agent.yaml", session="file", session_id="ecommerce-api")

# Custom session directory
from pathlib import Path
runtime = Runtime(
    "agent.yaml",
    session="file",
    session_id="ecommerce-api",
    session_dir=Path("/data/codepilot-sessions"),
)

Session file format:

{
  "session_id": "ecommerce-api",
  "agent_name": "BackendEngineer",
  "created_at": 1712345678.0,
  "updated_at": 1712349999.0,
  "messages": [ ... ]
}

Database-backed

Persists history to any SQLAlchemy-compatible database. The codepilot_sessions table is created automatically — no migration scripts needed. This is the correct backend for web apps deployed in containers.

# Install the db extras
pip install codepilot-ai[db]          # SQLite or PostgreSQL
pip install psycopg2-binary           # PostgreSQL driver only
# SQLite — simple, zero-config, great for local persistence
runtime = Runtime(
    "agent.yaml",
    session="db",
    session_id="user-42",
    db_url="sqlite:///./codepilot.db",
)

# PostgreSQL — for containers, Cloud Run, multi-user apps
import os
runtime = Runtime(
    "agent.yaml",
    session="db",
    session_id=f"user-{user_id}",
    db_url=os.environ["DATABASE_URL"],
)

Persistence behaviour:

Moment                              What happens
Runtime(...) construction           One SELECT — loads prior messages for the session_id, or [] for new sessions
Each run() call                     All agentic steps run fully in-memory — zero DB I/O during inference
run() completes                     One atomic UPSERT — full messages list written to DB
New Runtime(...), same session_id   One SELECT — session fully restored
runtime.reset()                     DELETE row — clean slate
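The one-write-per-run behaviour can be illustrated with a plain sqlite3 UPSERT. This is a sketch of the shape, not CodePilot's actual schema; the table and column names here are assumptions:

```python
import json
import sqlite3
import time

def save_session(db, session_id, agent_name, messages):
    """Write the full messages list in one atomic UPSERT, keyed by session_id."""
    db.execute("""CREATE TABLE IF NOT EXISTS codepilot_sessions (
        session_id TEXT PRIMARY KEY,
        agent_name TEXT,
        updated_at REAL,
        messages   TEXT)""")
    # ON CONFLICT: same session_id overwrites the prior row in place
    db.execute("""INSERT INTO codepilot_sessions VALUES (?, ?, ?, ?)
        ON CONFLICT(session_id) DO UPDATE SET
            agent_name = excluded.agent_name,
            updated_at = excluded.updated_at,
            messages   = excluded.messages""",
        (session_id, agent_name, time.time(), json.dumps(messages)))
    db.commit()
```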

Listing all sessions:

from codepilot import DatabaseSession

ds = DatabaseSession(session_id="_", db_url="sqlite:///./codepilot.db")
for s in ds.list_sessions():
    print(f"{s['session_id']:30} {s['messages']:4} messages")

7. Context Memory Management

For long-running sessions, CodePilot automatically manages the LLM's context window using a three-zone progressive compression system. It requires zero configuration — the defaults are tuned for typical coding sessions.

How it works

At the start of every run() call, before the new task is appended:

  1. Task-level summarization — if the most recently completed task exceeds min_task_tokens, a summarizer LLM call compresses it to a single [TASK SUMMARY] message (~150 tokens). The new task prompt is passed to the summarizer so retention is biased toward what matters next.
  2. Global summarization — if total context exceeds global_summary_threshold × max_context_tokens, the oldest half of messages is collapsed into a single [GLOBAL SUMMARY] message.
  3. Active task — always kept 100% raw, never touched.

Small tasks (quick edits, short commands) stay raw permanently — the threshold prevents compressing tasks that don't need it.
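The trigger arithmetic is easy to reproduce. A minimal sketch using the default configuration values (the function names are illustrative, not part of the API):

```python
def estimate_tokens(text: str, chars_per_token: float = 3.8) -> int:
    """Cheap token estimate: character count divided by chars_per_token."""
    return int(len(text) / chars_per_token)

def should_summarize_task(task_text: str, min_task_tokens: int = 4000) -> bool:
    """Task-level summarization only fires for tasks above the threshold."""
    return estimate_tokens(task_text) > min_task_tokens

def should_global_summarize(total_tokens: int,
                            max_context_tokens: int = 120_000,
                            threshold: float = 0.7) -> bool:
    """Global summarization fires past threshold x window: 0.7 x 120k = 84k."""
    return total_tokens > threshold * max_context_tokens
```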

Configuration

Add a memory: block to your agent.yaml. All fields are optional — the defaults work well:

agent:
  name: "BackendEngineer"
  model:
    provider: "anthropic"
    name: "claude-opus-4-5"
    api_key_env: "ANTHROPIC_API_KEY"

  memory:
    # Token estimator (chars / chars_per_token = tokens)
    # Tune once by spot-checking against your tokenizer. ±15% error is fine.
    chars_per_token: 3.8

    # Your model's context window limit
    max_context_tokens: 120000

    # Task-level summarization: only summarize tasks larger than this.
    # Default 4000 means small edits (typically ~1400 tokens) are kept raw.
    # Raise to skip all task-level summarization; lower to compress aggressively.
    min_task_tokens: 4000

    # Target length for each task summary (in tokens)
    task_summary_max_tokens: 200

    # Global summarization triggers when total context exceeds this fraction
    # of max_context_tokens. 0.7 = trigger at 84k tokens for a 120k model.
    global_summary_threshold: 0.7

    # Target length for the single global summary message
    global_summary_max_tokens: 500

What the LLM sees in a long session

[GLOBAL SUMMARY]         ← oldest tasks, collapsed into one ~500-token overview
[TASK SUMMARY]           ← task N, ~150 tokens
[TASK SUMMARY]           ← task N+1, ~150 tokens (or raw if under threshold)
[USER INPUT] + steps     ← active task, 100% raw

The system prompt always includes a Global State Memory block — a live structured JSON snapshot of what the agent has done (files created/modified, commands run, open issues) updated after every summarization:

{
  "objective": "Building a FastAPI e-commerce backend",
  "files_created": ["main.py", "models/user.py", "routes/users.py"],
  "files_modified": ["routes/users.py (L31-52, POST handler)"],
  "commands_run": ["pytest tests/ — 12 passed"],
  "open_issues": ["Email verification not implemented"]
}

8. Resuming a Session

Pass the same session_id to a new file-backed Runtime and the prior conversation loads automatically.

# Process 1
runtime = Runtime("agent.yaml", session="file", session_id="ecommerce-api")
runtime.run("Create the products and orders FastAPI endpoints")
# Process exits — session saved

# -------- later, new process --------

# Process 2 — picks up exactly where process 1 left off
runtime = Runtime("agent.yaml", session="file", session_id="ecommerce-api")
runtime.run("Add database migrations using Alembic")

Listing saved sessions

from codepilot import FileSession

fs = FileSession(session_id="_", agent_name="_")
for s in fs.list_sessions():
    print(f"{s['session_id']:30} {s['messages']:4} messages  updated {s['updated_at']}")

Inspecting a session without loading messages

from codepilot import FileSession

fs = FileSession(session_id="ecommerce-api", agent_name="BackendEngineer")
meta = fs.metadata()
if meta:
    print(f"Last updated: {meta['updated_at']}")
    print(f"File path: {fs.path}")
else:
    print("No saved session — will start fresh")

9. Resetting a Session

Wipes all history and deletes the session file (if file-backed). The next run() starts completely fresh.

runtime = Runtime("agent.yaml", session="file", session_id="ecommerce-api")

# ... some runs ...

runtime.reset()
runtime.run("Start over — build a GraphQL API instead")

10. Hooks

Hooks are the observability system. Every significant runtime event fires a hook. Register handlers to receive them in your application.

All built-in decorators replace the default stdout handler. The defaults work out of the box with zero configuration.

from codepilot import (
    Runtime,
    on_stream,
    on_tool_call,
    on_tool_result,
    on_ask_user,
    on_finish,
    on_user_message_queued,
    on_user_message_injected,
    EventType,
)

runtime = Runtime("agent.yaml", stream=True)


@on_stream(runtime)
def handle_stream(text: str, **_):
    """Fires for each text chunk — both pre-fence reasoning and completion block content."""
    print(text, end="", flush=True)


@on_tool_call(runtime)
def handle_tool_call(tool: str, args: dict, label: str = "", **_):
    """Fires before every tool executes.
    `label` is a human-readable description (e.g. "Running `pytest tests/`").
    Falls back to args dump if label is not set.
    """
    display = label if label else str(args)
    print(f"\n⚙️  [{tool}] {display}")


@on_tool_result(runtime)
def handle_tool_result(tool: str, result: str, **_):
    """Fires after every tool returns."""
    print(f"   ↳ {result[:200]}")


@on_ask_user(runtime)
def handle_ask(question: str, **_):
    """Fires when the agent calls ask_user()."""
    print(f"\n{question}")


@on_finish(runtime)
def handle_finish(summary: str, **_):
    """Fires when the task completes (completion block detected)."""
    print(f"\n{summary}\n")


@on_user_message_queued(runtime)
def handle_queued(message: str, **_):
    """Fires immediately when send_message() is called (not yet in context)."""
    print(f"[Queued] {message}")


@on_user_message_injected(runtime)
def handle_injected(message: str, **_):
    """Fires when a queued message enters the LLM's context window."""
    print(f"[Injected] {message}")


runtime.run("Refactor the database module to use async SQLAlchemy")

Manual hook registration

from codepilot import EventType

runtime.hooks.register(EventType.STREAM,  lambda text, **_: print(text, end="", flush=True))
runtime.hooks.register(EventType.FINISH,  lambda summary, **_: save_to_db(summary))

Full event reference

Event                   Keyword args        When it fires
START                   task                run() is called
STEP                    step, max_steps     Each agentic step begins
STREAM                  text                Chunk of streamed text (pre-fence reasoning or completion block content)
TOOL_CALL               tool, args, label   Before any tool executes
TOOL_RESULT             tool, result        After any tool returns
ASK_USER                question            Agent calls ask_user()
PERMISSION_REQUEST      tool, description   Tool with require_permission: true fires
SECURITY_ERROR          error               AST validation rejects the control block
RUNTIME_ERROR           error               exec() throws an exception
FINISH                  summary             Task complete — completion block detected
MAX_STEPS               (none)              Loop exits because max_steps was reached
USER_MESSAGE_QUEUED     message             send_message() called
USER_MESSAGE_INJECTED   message             Queued message enters LLM context
SESSION_RESET           (none)              reset() called

11. Permission Gating

The execute tool (and optionally write_file) supports require_permission: true in the AgentFile. When enabled, a PERMISSION_REQUEST hook fires before the tool runs. Return True to approve, False to deny. Falls back to a CLI y/N prompt if no handler is registered.

from codepilot import Runtime, on_permission_request

runtime = Runtime("agent.yaml")


@on_permission_request(runtime)
def gate(tool: str, description: str, **_) -> bool:
    """
    tool        — "write_file" | "execute"
    description — human-readable description of the specific operation
    Return True to approve, False to deny.
    """
    print(f"\n⚠️  [{tool}] {description}")
    return input("Approve? [y/N]: ").strip().lower() in ("y", "yes")


runtime.run("Deploy the application")

Programmatic approval (e.g. in a web app):

@on_permission_request(runtime)
def auto_gate(tool: str, description: str, **_) -> bool:
    if tool == "read_file":
        return True
    if tool == "execute" and "pytest" in description:
        return True
    return False   # deny everything else

12. Mid-task Message Injection

runtime.run() is blocking and runs on the calling thread. From any other thread, call runtime.send_message() to inject a message into the running agent.

  1. Queued immediately (non-blocking, thread-safe)
  2. Tagged [USER MESSAGE] — distinct from [USER INPUT] (the original task)
  3. Injected into the LLM context at the next step boundary — never mid-step

import threading
import time
from codepilot import Runtime, on_stream, on_user_message_injected

runtime = Runtime("agent.yaml", stream=True)


@on_stream(runtime)
def show(text: str, **_):
    print(text, end="", flush=True)


@on_user_message_injected(runtime)
def confirmed(message: str, **_):
    print(f"\n[Your message is now in context]: {message}")


def run_agent():
    runtime.run("Create a utility module with five string helper functions")


agent_thread = threading.Thread(target=run_agent)
agent_thread.start()

time.sleep(5)
runtime.send_message("Also add type hints to every function")

agent_thread.join()

13. Multi-operation Steps

The agent can perform multiple file operations in a single step, reducing round-trips and improving efficiency.

Multiple file writes

Up to 5 write_file() calls with mode='w' or mode='a' per step. Each call consumes the next payload block in order.

LLM output (writes two files in one step):

Alright, both files are independent so I'll write them together.

```codepilot
# Two new files — order of write_file() matches order of payload blocks below.
write_file("config.py")
write_file("utils.py")
```

```python
import json, os

def load(path: str) -> dict:
    if not os.path.exists(path):
        return {}
    with open(path) as f:
        return json.load(f)
```

```python
def slugify(text: str) -> str:
    return text.lower().replace(" ", "-")
```

Multi-edit (multiple non-contiguous edits in one file)

Use mode='multi_edit' with edits=[(start1, end1), (start2, end2)] to fix multiple ranges in one file without line-number drift. The runtime applies edits bottom-to-top automatically. One Payload Block per tuple, in order.

```codepilot
# Fix L42-48 (error handling) and L55 (regex) in one step — no drift
write_file("routes/profile.py", mode="multi_edit", edits=[(42, 48), (55, 55)])
```

```python
# ... replacement for L42-48 ...
```

```python
# ... replacement for L55 ...
```
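Why bottom-to-top matters: applying the lower range first would shift the line numbers of every range after it. A minimal sketch of the idea (illustrative, not CodePilot's implementation):

```python
def apply_multi_edit(lines, edits, payloads):
    """Apply (start, end) line-range edits from the bottom of the file upward,
    so the line numbers of earlier ranges stay valid as we go."""
    for (start, end), payload in sorted(zip(edits, payloads),
                                        key=lambda t: -t[0][0]):
        lines[start - 1:end] = payload  # 1-indexed, inclusive ranges
    return lines
```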

Multiple file reads

Any number of read_file() calls per step — no limit.

# LLM control block:
read_file("config.py")
read_file("utils.py")
read_file("tests/test_config.py")

14. Shell Tools

The agent has a persistent, non-blocking shell session system powered by pexpect. Commands never hang the agent — output is captured up to a timeout and returned immediately.

Linux/macOS only. pexpect requires POSIX. Deploy in a Linux container.

A default shell session ("main") starts automatically when the Runtime is created. Its PID and status are shown in the agent's system prompt every step.

execute — run a command

Runs a command, waits up to timeout seconds, returns whatever output is available.

# LLM control block:

# status: completed → command finished within timeout (includes return_code)
execute("main", "pytest tests/ -v", 30)

# status: running → timeout hit, process still alive
execute("main", "pip install -r requirements.txt", 10)

# Spin up a server on its own shell, in one step
execute("server", "uvicorn app.main:app --host 0.0.0.0 --port 8000", 4, new_shell=True)

read_output — wait for more output

Called after execute returned status: running. Waits up to timeout seconds for new output.

  • New output available: returns only the new delta (non-overlapping with previous output).
  • No new output (command already done): returns the complete accumulated output and collapses previous outputs in the context to save tokens.

# LLM control block:
read_output("main", 30)   # wait up to 30 more seconds

send_input — interact with prompts

Sends text to an interactive command waiting for user input.

# LLM control block:
send_input("main", "yes\n", 5)    # confirm a CLI prompt
send_input("main", "admin\n", 5)  # enter a username

send_signal — interrupt or stop

# Interrupt foreground process (Ctrl+C) — shell survives
send_signal("server", "SIGINT")

# Terminate or kill the shell process entirely
send_signal("server", "SIGTERM")
send_signal("server", "SIGKILL")

kill_shell — destroy a session

kill_shell("server")   # terminates the process, removes the session

Full example: server + test

# Step 1 — LLM control block:
# Start server on its own shell, verify startup logs within 4s
execute("server", "uvicorn app.main:app --port 8000", 4, new_shell=True)

# Step 2 — LLM control block (after seeing server startup logs):
# Run tests against the live server from main shell
execute("main", "pytest tests/test_api.py -v", 30)

# Step 3 — LLM control block (after tests pass):
# Shut server down cleanly — then use a completion block to finish
send_signal("server", "SIGINT")

Context deduplication

When read_output() returns in full-mode (the command is already done, no new data), it automatically removes the earlier outputs for that command from the conversation history and returns one complete, consolidated result. This keeps the agent context lean on long-running tasks.
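A sketch of the consolidation idea (an assumption about the mechanism, not CodePilot's internals):

```python
def consolidate(history, command_id, full_output):
    """Replace all partial-output messages for a command with one full result."""
    # Drop every earlier partial delta for this command...
    kept = [m for m in history
            if not (m.get("command_id") == command_id and m.get("partial"))]
    # ...and append the single complete, consolidated output
    kept.append({"command_id": command_id, "partial": False,
                 "text": full_output})
    return kept
```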


15. Completion Block

The ```completion block is how the agent signals a task is done. Its content is natural text that streams directly to the user in real time — token by token just like the pre-fence reasoning. When the runtime detects it, the agentic loop terminates after the current step.

Why it exists

  • No wasted step — a separate done() tool would have required a dedicated agentic step just to call it. The completion block can be combined with the action step, saving a full LLM inference call on simple tasks.
  • Real-time streaming — the completion text reaches the user as the LLM generates it, not after.
  • Natural — the agent just writes its closing message as plain text inside the fence, rather than constructing a Python string argument.

Separate final step (multi-step tasks)

After tests pass and all work is verified:

All green — both fixes are solid.

```completion
Fixed the 500 on profile email update: two bugs squashed.
(1) `routes/profile.py:L42` — bare DB write had no error handling; wrapped in try/except,
now returns a proper 400 on failure.
(2) `utils/validators.py:L18` — email regex was rejecting `+` aliases; pattern updated.
All tests pass. You're good to go.
```

Same-step completion (simple tasks)

For simple tasks, combine everything in one agentic step:

Updating the timeout value.

```codepilot
write_file("config.py", start_line=12, end_line=12, mode="edit")
```

```python
TIMEOUT = 30
```

```completion
Done — updated TIMEOUT from 10 to 30 seconds in config.py:L12.
```

Receiving it in your app

The completion block fires the FINISH hook with its text as summary:

@on_finish(runtime)
def handle_finish(summary: str, **_):
    print(f"\n{summary}\n")
    save_to_database(summary)   # or send a notification, etc.

summary = runtime.run("Fix the login bug")
# summary == the completion block text, or None if loop ended another way

16. Workspace Change Detection

The runtime automatically detects when you modify files in the workspace between agent steps. If you edit a file while the agent is working, it will be notified at the start of the next step with exact line numbers of what changed.

What the agent sees in its context:

[ENVIRONMENT CHANGE] 2026-02-21 16:30:12

📝 Modified: main.py
  Changed lines: 1-4, 47
📄 Created: .env (3 lines)
🗑️ Deleted: old_config.py

The agent is then instructed to re-read affected files before editing — because its cached line numbers may be wrong.

How it works:

  • Tracking is opt-in by file — only files the agent has touched (read or written) are watched
  • Detection is snapshot-based — no background daemon, no file watchers, zero overhead between steps
  • Snapshots are taken at the end of each step and compared at the start of the next
  • Diff limits: 30 changed lines reported per file, 100 total across all files

No configuration is required — this is always on.
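The snapshot-and-compare approach can be sketched as follows. This is a hypothetical illustration of the mechanism, not CodePilot's actual implementation (only files already in the tracked set are diffed):

```python
from pathlib import Path

def snapshot(tracked_paths):
    """Capture each tracked file's lines (None if the file is missing)."""
    return {p: Path(p).read_text().splitlines() if Path(p).exists() else None
            for p in tracked_paths}

def diff_snapshots(before, after):
    """Compare two snapshots and report created/deleted/modified files."""
    events = []
    for path, old in before.items():
        new = after.get(path)
        if old is None and new is not None:
            events.append(f"Created: {path} ({len(new)} lines)")
        elif old is not None and new is None:
            events.append(f"Deleted: {path}")
        elif old != new:
            # Line numbers that differ, plus any lines added or removed at the end
            changed = [i + 1 for i, (a, b) in enumerate(zip(old, new)) if a != b]
            changed += list(range(min(len(old), len(new)) + 1,
                                  max(len(old), len(new)) + 1))
            events.append(f"Modified: {path} (lines {changed})")
    return events
```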


17. Chat Mode

The agent can respond to questions and explanations without executing any code. If the LLM produces a response with no ```codepilot block, the runtime treats it as a conversational reply: the response is fully streamed to the user and the loop exits cleanly.

runtime = Runtime("agent.yaml", stream=True)

@on_stream(runtime)
def show(text: str, **_):
    print(text, end="", flush=True)


@on_finish(runtime)
def done(summary: str, **_):
    print(f"\n{summary}")


# Agent answers with natural markdown — no code executed, streams fully
runtime.run("How does the config loader handle missing files?")

# Agent takes action — executes code, ends with completion block
runtime.run("Add a fallback default value to the config loader")

The agent freely uses ```python blocks to display code examples in its explanations — they are never executed. Only ```codepilot blocks execute.

Step awareness

The agent's system prompt is refreshed every step with the current timestamp, OS, working directory, and a live step counter with progressive urgency:

# Steps 1-9 of 30 — neutral
Agentic step 3 / 30

# Steps 10-22 of 30 — mild signal
Agentic step 12 / 30 — 40% agentic steps consumed!

# Steps 23-26 of 30 — approaching
Agentic step 24 / 30 — 80% agentic steps consumed. Approaching step limit!

# Steps 27-30 of 30 — urgent
Agentic step 28 / 30 — 93% agentic steps consumed! Hard Limit Near!

This lets the agent reason about time and deadlines, and self-regulate its efficiency as it approaches the configured max_steps limit.
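The banner logic can be reconstructed from the examples above. A sketch (the percentage cutoffs are inferred from the sample output, not documented constants):

```python
def step_banner(step: int, max_steps: int) -> str:
    """Build the step-counter line with progressive urgency."""
    pct = round(step / max_steps * 100)
    base = f"Agentic step {step} / {max_steps}"
    if pct >= 90:    # final stretch: urgent warning
        return f"{base} — {pct}% agentic steps consumed! Hard Limit Near!"
    if pct >= 75:    # approaching the cap
        return f"{base} — {pct}% agentic steps consumed. Approaching step limit!"
    if pct >= 33:    # mild signal
        return f"{base} — {pct}% agentic steps consumed!"
    return base      # early steps: neutral
```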


18. Custom Tools

Register any callable as a tool. Its docstring is automatically pulled into the system prompt so the agent knows when and how to use it.

Important: exec() discards return values. If your tool produces output the agent should see, explicitly call runtime._append_execution(result).

from codepilot import Runtime

runtime = Runtime("agent.yaml")


def web_search(query: str):
    """
    Search the web for current information and return a summary.
    Use for library documentation, recent API changes, error lookups,
    or anything the codebase snapshot can't answer.
    """
    result = my_search_api(query)
    runtime._append_execution(f"[web_search] {result}")


def send_slack(channel: str, message: str):
    """
    Send a message to a Slack channel.
    Use after completing a task to notify the team.
    channel should be the channel name without #, e.g. 'deployments'.
    """
    slack_client.chat_postMessage(channel=f"#{channel}", text=message)
    runtime._append_execution(f"[send_slack] Message sent to #{channel}.")


runtime.register_tool("web_search", web_search)
runtime.register_tool("send_slack", send_slack)

runtime.run("Research the latest SQLAlchemy 2.0 async API and implement a connection pool")

Overriding a built-in tool

def safe_execute(session_id: str, command: str, timeout: int = 10, new_shell: bool = False):
    """
    Run a shell command. Restricted to read-only operations in this environment.
    Never import subprocess or os directly — always use this tool.
    """
    blocked = ["rm", "del", "format", ">", "sudo", "pip install"]
    if any(cmd in command for cmd in blocked):
        runtime._append_execution(f"[execute] Blocked: '{command}' is not permitted.")
        return
    runtime._shell_manager.execute(session_id, command, timeout, new_shell)


runtime.register_tool("execute", safe_execute, replace=True)

19. Aborting the Agent

import threading

runtime = Runtime("agent.yaml")

agent_thread = threading.Thread(
    target=runtime.run,
    args=("Build a complete e-commerce backend",)
)
agent_thread.start()

# From anywhere — stops after the current step completes (never mid-step)
runtime.abort()
agent_thread.join()

20. Building a CLI Tool

Simple conversational CLI

import sys
from codepilot import Runtime, on_stream, on_finish, on_ask_user

runtime = Runtime("agent.yaml", session="memory", stream=True)


@on_stream(runtime)
def show_stream(text: str, **_):
    print(text, end="", flush=True)


@on_finish(runtime)
def show_done(summary: str, **_):
    print(f"\n{summary}\n")


@on_ask_user(runtime)
def show_question(question: str, **_):
    print(f"\n{question}")


print("CodePilot CLI — type 'reset' to clear history, 'quit' to exit.\n")

while True:
    try:
        task = input("You: ").strip()
    except (KeyboardInterrupt, EOFError):
        print("\nGoodbye.")
        sys.exit(0)

    if not task:
        continue

    if task.lower() == "quit":
        sys.exit(0)

    if task.lower() == "reset":
        runtime.reset()
        print("History cleared. Starting fresh.\n")
        continue

    runtime.run(task)

File-backed CLI with named sessions

import sys
import argparse
from codepilot import Runtime, FileSession, on_stream, on_finish

parser = argparse.ArgumentParser()
parser.add_argument("--session", default=None, help="Session ID to resume")
parser.add_argument("--list", action="store_true", help="List saved sessions")
args = parser.parse_args()

if args.list:
    fs = FileSession(session_id="_", agent_name="_")
    sessions = fs.list_sessions()
    if not sessions:
        print("No saved sessions.")
    for s in sessions:
        print(f"  {s['session_id']:30} {s['messages']:4} messages")
    sys.exit(0)

session_id = args.session or "default"
runtime = Runtime("agent.yaml", session="file", session_id=session_id, stream=True)

fs = FileSession(session_id=session_id, agent_name="")
if fs.exists():
    print(f"Resuming session '{session_id}' ({len(runtime.messages)} messages)\n")
else:
    print(f"Starting new session '{session_id}'\n")


@on_stream(runtime)
def streaming(text: str, **_):
    print(text, end="", flush=True)


@on_finish(runtime)
def done(summary: str, **_):
    print(f"\n{summary}\n")


while True:
    try:
        task = input("You: ").strip()
    except (KeyboardInterrupt, EOFError):
        print("\nSession saved. Goodbye.")
        sys.exit(0)

    if not task:
        continue
    if task.lower() in ("reset", "clear"):
        runtime.reset()
        print("Session cleared.\n")
        continue
    if task.lower() in ("quit", "exit"):
        sys.exit(0)

    runtime.run(task)

python cli.py                              # new default session
python cli.py --session ecommerce-api      # resume named session
python cli.py --list                       # show all saved sessions

20. Building a Web Server Integration

FastAPI example with WebSocket streaming (token-by-token to the browser) and mid-task injection:

import asyncio
import threading
from fastapi import FastAPI, WebSocket, WebSocketDisconnect
from codepilot import Runtime, EventType

app = FastAPI()

runtime = Runtime("agent.yaml", session="file", session_id="web-session", stream=True)

# Bridge between sync hooks and the async WebSocket
_event_queue: asyncio.Queue = asyncio.Queue()
_loop: asyncio.AbstractEventLoop = None


@app.on_event("startup")
async def _capture_loop():
    global _loop
    _loop = asyncio.get_running_loop()  # hooks fire from a thread with no loop


def _push(event: dict):
    """Thread-safe push from a sync hook into the async queue."""
    _loop.call_soon_threadsafe(_event_queue.put_nowait, event)


# Stream reasoning text and completion block content token by token
runtime.hooks.register(EventType.STREAM,
    lambda text, **_: _push({"type": "stream", "text": text}))

# Tool activity — label gives a clean human-readable status string
runtime.hooks.register(EventType.TOOL_CALL,
    lambda tool, args, label="", **_: _push({
        "type": "tool_call", "tool": tool,
        "label": label or tool,           # e.g. "Running `pytest tests/`"
    }))

runtime.hooks.register(EventType.TOOL_RESULT,
    lambda tool, result, **_: _push({"type": "tool_result", "tool": tool, "result": result[:300]}))

runtime.hooks.register(EventType.FINISH,
    lambda summary, **_: _push({"type": "finish", "summary": summary}))

runtime.hooks.register(EventType.RUNTIME_ERROR,
    lambda error, **_: _push({"type": "error", "error": error}))


@app.post("/run")
def start_task(task: str):
    """Start a new task. Non-blocking — agent runs in background thread."""
    threading.Thread(target=runtime.run, args=(task,), daemon=True).start()
    return {"status": "started"}


@app.post("/message")
def inject_message(message: str):
    """Inject a mid-task message. Returns immediately."""
    runtime.send_message(message)
    return {"status": "queued"}


@app.post("/reset")
def reset_session():
    """Wipe conversation history and start fresh."""
    runtime.reset()
    return {"status": "reset"}


@app.websocket("/events")
async def stream_events(websocket: WebSocket):
    """Stream all hook events to the frontend as JSON."""
    await websocket.accept()
    try:
        while True:
            event = await _event_queue.get()
            await websocket.send_json(event)
    except WebSocketDisconnect:
        pass
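The same sync-thread-to-event-loop bridge, stripped of FastAPI, so the mechanism is visible in isolation (a minimal sketch; `producer` stands in for a hook callback firing in a worker thread):

```python
import asyncio
import threading


async def main():
    # A plain thread schedules queue puts onto the event loop
    # via call_soon_threadsafe — the only safe cross-thread entry point.
    loop = asyncio.get_running_loop()
    queue: asyncio.Queue = asyncio.Queue()

    def producer():  # stands in for a Runtime hook in a background thread
        for i in range(3):
            loop.call_soon_threadsafe(queue.put_nowait, {"n": i})

    threading.Thread(target=producer).start()
    return [await queue.get() for _ in range(3)]


result = asyncio.run(main())
# result == [{"n": 0}, {"n": 1}, {"n": 2}]
```

call_soon_threadsafe preserves FIFO order for calls made from the same thread, so events reach the WebSocket in the order the hooks fired.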

21. Full API Surface

Runtime

Runtime(
    agent_file: str,              # path to agent.yaml
    session: str = "memory",      # "memory" | "file"
    session_id: str = None,       # defaults to agent name, slugified
    session_dir: Path = None,     # override ~/.codepilot/sessions/
    stream: bool = False,         # True = token-by-token streaming
)

runtime.run(task: str) -> Optional[str]
    # Blocking. Appends to history. Returns completion block text or None.

runtime.send_message(message: str)
    # Thread-safe. Non-blocking. Tagged [USER MESSAGE] in context.

runtime.reset()
    # Wipes messages + session file. Next run() is a blank slate.

runtime.abort()
    # Sets abort flag. Loop stops after current step.

runtime.register_tool(name: str, func: callable, replace: bool = False)
    # Add custom tool. Docstring injected into system prompt automatically.

runtime.messages           # List[Dict] — full conversation history
runtime.session            # BaseSession — current session backend instance
runtime.hooks              # HookSystem — register/emit events manually
runtime.registry           # ToolRegistry — inspect registered tools

Hook decorators

from codepilot import (
    on_stream,                  # STREAM — pre-fence reasoning text or completion block content
    on_tool_call,               # TOOL_CALL — before any tool executes
    on_tool_result,             # TOOL_RESULT — after any tool returns
    on_ask_user,                # ASK_USER — agent called ask_user()
    on_finish,                  # FINISH — task complete (completion block detected)
    on_permission_request,      # PERMISSION_REQUEST — awaiting approval
    on_user_message_queued,     # USER_MESSAGE_QUEUED — send_message() called
    on_user_message_injected,   # USER_MESSAGE_INJECTED — message in context
)

Built-in tools

write_file(path, start_line=None, end_line=None, after_line=None, mode='w', edits=None)

mode          Behaviour                                                  Limit
'w'           Create or overwrite the whole file                         5 per step
'a'           Append to end of file                                      5 per step (shared with 'w')
'edit'        Replace lines start_line to end_line                       1 per file per step
'insert'      Insert after after_line (0 = top of file)                  1 per file per step
'multi_edit'  edits=[(s1,e1), (s2,e2)]; runtime applies bottom-to-top    1 per file per step

Content always comes from the next payload block — never pass it as a string argument.
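Why bottom-to-top for 'multi_edit': applying spans top-down would shift the line numbers of every span below the first replacement, while applying them from the bottom up leaves the remaining spans' coordinates untouched. A standalone sketch of the idea (illustrative, not CodePilot's implementation):

```python
def apply_edits(lines, edits, replacement):
    """Replace each (start, end) 1-indexed inclusive span with `replacement`,
    applying spans bottom-to-top so earlier spans' indices stay valid."""
    for start, end in sorted(edits, reverse=True):
        lines[start - 1:end] = [replacement]
    return lines


lines = ["a", "b", "c", "d", "e"]
apply_edits(lines, [(1, 2), (4, 5)], "X")
# lines is now ["X", "c", "X"]
```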

read_file(path, start_line=1, end_line=None)

Returns file content with 1-indexed line numbers. Multiple calls per step are allowed.

execute(session_id, command, timeout=10, new_shell=False)

Runs a command on a persistent shell session. Returns captured output up to timeout seconds.

Parameter   Description
session_id  Shell session to use. "main" always exists.
command     Shell command string.
timeout     Seconds to wait. Output captured on timeout.
new_shell   True = create and use a new shell in one step.

Result includes status: completed (done, has return_code) or status: running (timed out, process alive).

read_output(session_id, timeout=5)

Read new output from the latest command. Returns delta (new content only) or full accumulated output if the command is already done. Full-mode collapses previous outputs from context automatically.
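Delta reads can be pictured as a cursor into an append-only buffer: each read returns only what arrived since the previous read. A hypothetical sketch of the idea (`OutputBuffer` is illustrative, not a CodePilot class):

```python
class OutputBuffer:
    """Append-only buffer with a read cursor for delta reads."""

    def __init__(self):
        self._buf = ""
        self._cursor = 0

    def append(self, chunk: str):
        self._buf += chunk  # new command output arrives here

    def read_delta(self) -> str:
        delta = self._buf[self._cursor:]
        self._cursor = len(self._buf)
        return delta

    def read_full(self) -> str:
        self._cursor = len(self._buf)  # catch the cursor up
        return self._buf  # full accumulated output
```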

send_input(session_id, text, timeout=5)

Send text to an interactive command waiting for input. Returns new output after sending.

send_signal(session_id, signal='SIGINT')

Send SIGINT (Ctrl+C, shell survives), SIGTERM, or SIGKILL to the shell session.

kill_shell(session_id)

Terminate and remove a shell session entirely.

ask_user(question)

Pauses execution and prompts the user for input. Fires the ASK_USER hook.

find(pattern, scope='codebase', target=None, include=None, max_results=50)

Text / regex search across a file, multiple files, or the entire workspace. Results are returned as file:line:matched_line — one match per line.

Uses ripgrep (rg) when available — fast and honours .gitignore automatically (ignores node_modules, build artifacts, lock files). Falls back to a pure-Python implementation when rg is not installed.

Parameter    Description
pattern      Regex pattern. Escape special chars: r'validate_email\('
scope        'file' / 'files' / 'codebase'
target       File path (str) or list of paths — required for scope='file'/'files'
include      Glob filter for scope='codebase', e.g. '*.py', 'tests/**'
max_results  Cap on returned matches (default 50)
# LLM control block examples:
find(pattern=r'validate_email\(', scope='file', target='routes/profile.py')
find(pattern='TODO:', scope='files', target=['routes/profile.py', 'utils/validators.py'])
find(pattern=r'class \w+Handler', scope='codebase', include='*.py')
find(pattern='import torch', scope='codebase', include='tests/**')

Install ripgrep for best performance (optional — Python fallback is always available):

apt-get install ripgrep      # Debian/Ubuntu
brew install ripgrep          # macOS
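When rg is absent, the fallback boils down to a recursive walk plus a per-line re.search. A simplified sketch of such a fallback (illustrative only; the real implementation also honours ignore rules, which this does not):

```python
import re
from pathlib import Path


def find_fallback(pattern, root=".", include="*.py", max_results=50):
    """Walk `root` and report matches as file:line:matched_line,
    mirroring the find() output format. Hypothetical helper."""
    rx = re.compile(pattern)
    hits = []
    for path in sorted(Path(root).rglob(include)):
        try:
            for lineno, line in enumerate(path.read_text().splitlines(), 1):
                if rx.search(line):
                    hits.append(f"{path}:{lineno}:{line.strip()}")
                    if len(hits) >= max_results:
                        return hits
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
    return hits
```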

semantic_search(query, mode='search', depth=2, top_k=5)

Semantically searches the codebase using the voyage-code-3 embedding model via grepai. Finds code by concept — not text match. Use when you don't know which file or function to look at. Use find() when you know the exact symbol or string.

Requires VOYAGE_API_KEY set in environment and api_key_env: "VOYAGE_API_KEY" in the AgentFile config.

First call is slow (~30-120s): grepai auto-installs if missing, indexes the entire work_dir, then searches. Subsequent calls are fast.

mode             What it does
'search'         Find files/functions matching a natural-language concept
'trace_callers'  Find every place that calls a given function/method
'trace_callees'  Find everything a function calls internally
'trace_graph'    Full dependency tree up to depth levels; use before modifying code with wide blast radius

Environment setup:

export VOYAGE_API_KEY="pa-..."

How the API key flows: grepai internally reads OPENAI_API_KEY. The runtime automatically aliases your VOYAGE_API_KEY to OPENAI_API_KEY at subprocess launch — you never need to rename your env var.
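That aliasing amounts to a plain environment copy at subprocess launch. A sketch of the trick (illustrative; `run_with_aliased_key` is a hypothetical helper, not a CodePilot API):

```python
import os
import subprocess


def run_with_aliased_key(cmd):
    """Launch a subprocess whose OPENAI_API_KEY carries the VOYAGE_API_KEY
    value, without touching the parent process environment."""
    env = dict(os.environ)
    if "VOYAGE_API_KEY" in env:
        env["OPENAI_API_KEY"] = env["VOYAGE_API_KEY"]
    return subprocess.run(cmd, env=env, capture_output=True, text=True)
```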

grepai index location: ~/.codepilot/grepai/<hash>/ — entirely outside your project. No .grepai/ directory is created in your codebase.

FileSession

FileSession(session_id, agent_name, session_dir=None)

.load() -> List[Dict]          # load messages from disk
.save(messages)                # persist messages to disk (atomic write)
.reset()                       # delete session file
.exists() -> bool              # True if file exists on disk
.metadata() -> Optional[Dict]  # session metadata without messages
.list_sessions() -> List[Dict] # all sessions in the session directory
.path -> Path                  # full path to the session file
.session_id -> str

InMemorySession

InMemorySession(session_id="default")

.load() -> List[Dict]
.save(messages)
.reset()
.session_id -> str

create_session

create_session(
    backend: str = "memory",     # "memory" | "file"
    session_id: str = "default",
    agent_name: str = "agent",
    session_dir: Path = None,
) -> BaseSession

CodePilot v0.8.0 — code-native agents, zero JSON, full context.
