Dynamic skill-based LangGraph agent
Birdie
A LangGraph-based agent that discovers capabilities at runtime from SKILL.MD files. Skills, tools, and their execution entrypoints are all declared in plain Markdown - no code changes required to add new capabilities.
Birdie is a minimal yet fully functional implementation. The design goal is simplicity and transparency: the codebase is intended to be readable, hackable, and easy to extend. It ships with a small set of built-in skills (shell access, filesystem, SSH, todo, weather, DuckDuckGo search, MCP demo) that can run arbitrary local commands. All skills are disabled by default and must be explicitly enabled.
Security notice: Birdie has no guardrails against local tool misuse. Skills such as `Shell` and `Filesystem` can read, write, and execute anything the running user is permitted to do. Only enable skills you trust, and run Birdie under an account with appropriate restrictions.
Installation
From PyPI (recommended)
```bash
pip install birdie-agent
```
From source
```bash
git clone https://github.com/gkvas/birdie.git
cd birdie
pip install -e .
```
Quick start
Configure your LLM provider via environment variables and run birdie:
```bash
# Mistral
export LLM_VENDOR=mistral
export LLM_MODEL=mistral-large-latest
export MISTRAL_API_KEY=your-key-here
birdie

# OpenAI
export LLM_VENDOR=openai
export LLM_MODEL=gpt-4o
export OPENAI_API_KEY=your-key-here
birdie

# Anthropic
export LLM_VENDOR=anthropic
export LLM_MODEL=claude-sonnet-4-6
export ANTHROPIC_API_KEY=your-key-here
birdie
```
Or pass a JSON config file:
```bash
birdie --config provider.json
```

```json
{
  "vendor": "mistral",
  "model": "mistral-large-latest",
  "api_key": "your-key-here",
  "temperature": 0.3
}
```
On Windows (PowerShell):
```powershell
$env:LLM_VENDOR = "openai"
$env:LLM_MODEL = "gpt-4o"
$env:OPENAI_API_KEY = "your-key-here"
birdie
```
Project layout
```
birdie/
├── agent/
│   ├── graph.py          # LangGraph state machine (agent loop)
│   └── run.py            # DynamicAgent - public API
├── cli.py                # Interactive REPL
├── core/
│   ├── models.py         # Skill, SkillTool, MCPServerConfig data models
│   ├── loader.py         # SKILL.MD parser
│   ├── registry.py       # In-memory skill/tool index
│   ├── policy.py         # Per-user/session access control
│   ├── session.py        # Session persistence (history, LTM, skill grants)
│   ├── adapter.py        # SkillTool → LangChain StructuredTool
│   ├── entrypoints.py    # bash / http / python / grpc resolvers
│   ├── mcp_client.py     # MCP client manager (wraps langchain-mcp-adapters)
│   └── llm_provider.py   # Vendor-agnostic LLM abstraction
└── skills/
    ├── weather/SKILL.MD
    ├── filesystem/SKILL.MD
    ├── shell/SKILL.MD
    ├── ssh/SKILL.MD
    ├── todo/SKILL.MD
    ├── duckduckgo/SKILL.MD
    └── mcp_demo/
        ├── SKILL.MD      # MCP skill declaration
        └── server.py     # Example stdio MCP server
```
Skill system
The skill system is built in three layers, each sitting on top of the one below:
```
┌──────────────────────────────────────────────────────┐
│ Knowledge skills (freetext SKILL.MD)                 │
│   Domain know-how injected on trigger.               │
│   The LLM uses whatever tool skills are enabled.     │
├──────────────────────────────────────────────────────┤
│ Tool skills (structured SKILL.MD)                    │
│   Named, schema-typed tools the LLM can call.        │
│   Each tool is wired to exactly one entrypoint.      │
├──────────────────────────────────────────────────────┤
│ Entrypoints (hardcoded in core/entrypoints.py)       │
│   Fixed execution primitives: bash, http, python ... │
│   Not declared in SKILL.MD - part of the framework.  │
└──────────────────────────────────────────────────────┘
```
Entrypoints answer how to run something. Tool skills answer what can be run. Knowledge skills answer when and why - and delegate the actual execution back down to a tool skill.
The frontmatter / body boundary
Every SKILL.MD file is split into two distinct parts by the parser in core/loader.py. Understanding this boundary is fundamental to understanding how the skill system works.
Frontmatter is the YAML block between the opening and closing --- delimiters. It is parsed at startup into typed Python fields on the Skill Pydantic model. Frontmatter fields are structural metadata consumed by application code - the registry, the policy engine, the adapter, and the system prompt builder all read from these fields. Nothing in the frontmatter is ever sent verbatim to the LLM.
Body is everything in the file after the closing ---. It is stored as a single raw Markdown string in Skill.body. The body is not parsed, indexed, or processed in any way at load time. It is kept in memory and may be appended verbatim to the system prompt at turn time - but only under specific conditions (see below). The body is the text the LLM reads, not the text the application reads.
```
birdie/skills/ssh/SKILL.MD
─────────────────────────────────────────────────────────
---                              ← frontmatter start
name: ssh                        ← parsed into Skill.name (used by code)
description: SSH connections     ← parsed into Skill.description (used by code)
triggers:                        ← parsed into Skill.triggers (used by code)
  - ssh
  - remote server
---                              ← frontmatter end / body start
# SSH Skill                      ←
                                 ← stored verbatim in Skill.body
## Capabilities                  ← appended to system prompt
- Establish SSH connections      ← only when triggered
...                              ←
─────────────────────────────────────────────────────────
```
This boundary is what allows the skill system to be both efficient (only send what is needed to the LLM) and dynamic (add new skills without changing code - the parser handles any valid SKILL.MD).
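The boundary logic itself is only a few lines. The following is an illustrative sketch, not the actual loader: the real parser in core/loader.py also runs the frontmatter through a YAML parser and validates it against the `Skill` model, and `split_skill_md` is a hypothetical name.

```python
def split_skill_md(text: str) -> tuple[str, str]:
    """Split a SKILL.MD document into (frontmatter_text, body_text)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.MD must open with a '---' delimiter")
    # Find the closing '---'; everything after it is the raw, unparsed body.
    close = next(i for i, ln in enumerate(lines[1:], start=1) if ln.strip() == "---")
    frontmatter = "\n".join(lines[1:close])   # parsed into typed Skill fields
    body = "\n".join(lines[close + 1:])       # stored verbatim in Skill.body
    return frontmatter, body
```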
Layer 1 - Entrypoints
Entrypoints are the fixed execution mechanisms built into core/entrypoints.py. They are not declared in SKILL.MD - they are the substrate that tool skills are built on. A tool skill picks one entrypoint scheme per tool; the framework resolves it at call time.
| Scheme | Format | Behaviour |
|---|---|---|
| `bash:` | `bash:{command template}` | Shell command via subprocess. `{arg}` placeholders are substituted from tool-call arguments. Non-zero exit raises `RuntimeError`. |
| `http:get` | `http:get https://host/path` | HTTP GET; kwargs become query parameters. |
| `http:post` | `http:post https://host/path` | HTTP POST; kwargs become the JSON body. |
| `python:` | `python:module.path.function` | Imports the module and calls the function with kwargs. |
| `grpc:` | `grpc:package.Service/Method` | Stub - wire up a real gRPC channel. |
| `container:` | `container:image_name` | Stub - wire up Docker/Podman. |
MCP tools do not use an entrypoint scheme. They are declared via `mcp_server` in the SKILL.MD frontmatter and loaded as native LangChain tools by `MCPClientManager`, bypassing the entrypoint resolver entirely. See the MCP integration section.
`bash:` is the workhorse for local skills. Arguments are injected via `str.format()`, so `bash:cat {path}` called with `path="/etc/hosts"` becomes `cat /etc/hosts`.
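A minimal sketch of how such a `bash:` entrypoint might be resolved (the function name is hypothetical; the real resolver in core/entrypoints.py dispatches on the scheme prefix and covers all schemes):

```python
import subprocess

def run_bash_entrypoint(template: str, **kwargs: str) -> str:
    """Substitute tool-call arguments into the template and run it in a shell."""
    command = template.format(**kwargs)  # "cat {path}" -> "cat /etc/hosts"
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    if result.returncode != 0:
        # Mirrors the documented behaviour: non-zero exit raises RuntimeError.
        raise RuntimeError(result.stderr.strip() or f"command failed: {command}")
    return result.stdout
```

Note that the substitution is plain string formatting - argument values flow into the shell unescaped, which is exactly why the security notice above applies.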
Layer 2 - Tool skills
Tool skills expose callable tools to the LLM. Each tool declares an entrypoint that the framework executes when the LLM calls it. Every skill lives in its own subdirectory as a SKILL.MD file.
```markdown
---
name: Shell
version: 1.0.0
description: Execute arbitrary shell commands on the local machine.
tags: [shell, local, system]
enabled_by_default: false
---

## Tools

### run_bash
description: Run a shell command and return its output.
entrypoint: bash:{command}
schema:
  type: object
  properties:
    command:
      type: string
  required: [command]
```
For a structured tool skill, the parser does the following:
- Frontmatter is parsed into `Skill` model fields. `name`, `description`, `tags`, `enabled_by_default`, and `always_inject` all become typed attributes.
- The `## Tools` section is extracted from the body and parsed into a list of `SkillTool` objects, each holding `name`, `description`, `entrypoint`, and `schema`. These are stored in `Skill.tools`.
- `Skill.body` is set to `None`. A plain structured skill has no prose to inject - its tools alone are its contribution. No body is stored or ever sent to the LLM.
After parsing a structured skill:
```python
Skill.name  = "Shell"
Skill.description = "Execute arbitrary shell commands..."
Skill.tags  = ["shell", "local", "system"]
Skill.tools = [SkillTool(name="run_bash", entrypoint="bash:{command}", schema={...})]
Skill.body  = None    # ← no prose body; nothing to inject
```
The ## Tools section is deliberately excluded from Skill.body. If it were included, it would be sent as raw Markdown to the LLM on injection, which would be confusing and wasteful - the LLM already receives structured tool schemas through the function-calling API.
Tags declared on the skill (tags: [shell, local, system]) are propagated to every tool in the skill at registration time. This is what allows knowledge skills to find executor tools without knowing their names - they ask for tools by tag, not by name.
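The tag propagation can be pictured as a small index. The class and method names below are illustrative, not the actual registry API in core/registry.py:

```python
from collections import defaultdict

class TagIndex:
    """Toy tag -> tool-name index mirroring the propagation described above."""

    def __init__(self) -> None:
        self._by_tag: defaultdict = defaultdict(set)

    def register_skill(self, skill_tags: list, tool_names: list) -> None:
        # Every tool in the skill inherits every tag declared on the skill.
        for tag in skill_tags:
            self._by_tag[tag].update(tool_names)

    def tools_for_tag(self, tag: str) -> list:
        # Lookup by tag, not by name - how knowledge skills find executors.
        return sorted(self._by_tag.get(tag, set()))
```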
Frontmatter fields
| Field | Required | Description |
|---|---|---|
| `name` | yes | Unique skill identifier |
| `version` | no | Semver string (default `1.0.0`) |
| `description` | yes | One-line summary - appears in the Tier 1 skill catalog sent to the LLM every turn |
| `tags` | no | Propagated to all tools at registration time; used for tag-based lookup |
| `enabled_by_default` | no | If `true`, all users get this skill without an explicit grant (default `true`) |
| `always_inject` | no | See below - the exception that allows a structured skill to also have a prose body |
Tool block fields
| Field | Required | Description |
|---|---|---|
| `description` | yes | Shown to the LLM as the tool's purpose; the LLM uses this to decide when to call the tool |
| `entrypoint` | yes | `scheme:target` - see Layer 1 |
| `schema` | yes | JSON Schema object describing the tool's arguments; used to build the Pydantic `args_schema` |
The always_inject exception
A structured skill can optionally carry a prose body alongside its tools by setting always_inject: true. In this case the parser stores the prose that appears before the ## Tools section in Skill.body, and that prose is sent to the LLM on every single turn, regardless of what the user said. This is for skills whose instructions are permanently relevant - for example a planning skill that tells the agent how to reason step-by-step, or a persona skill that defines communication style. Such instructions need to be present on every turn, not just when a trigger keyword fires.
always_inject structured skill - what gets stored:
```
Skill.tools = [SkillTool(...), ...]    ← from ## Tools section
Skill.body  = "Always reason step by   ← prose before ## Tools,
               step before answering"    injected every turn
```
The ## Tools section itself is never included in Skill.body in any case.
Layer 3 - Knowledge skills
Knowledge skills carry no tools of their own. Their SKILL.MD consists of frontmatter plus free-form Markdown prose - no ## Tools section.
```markdown
---
name: ssh
description: Establish and manage SSH connections to remote machines.
triggers:
  - ssh
  - remote server
  - remote connection
  - secure shell
---

# SSH Skill

## Capabilities
- Establish SSH connections using password or key-based authentication
- Execute remote commands over SSH
...
```
For a knowledge skill, the parser does the following:
- Frontmatter is parsed into `Skill` model fields as usual.
- The `## Tools` section does not exist, so `Skill.tools` is an empty list. No `SkillTool` objects are created. Nothing is registered in the tool index.
- `Skill.body` is set to the entire Markdown body - every character after the closing `---`. This is the full prose text, stored as-is, ready to be appended to the system prompt.
After parsing a knowledge skill:
```python
Skill.name     = "ssh"
Skill.triggers = ["ssh", "remote server", "remote connection", "secure shell"]
Skill.tools    = []    # ← no tools; nothing registered in the tool index
Skill.body     = "# SSH Skill\n\n## Capabilities\n- Establish SSH..."
                 # ← full prose body stored verbatim
```
Why the body is not always sent
At startup, every enabled knowledge skill's body is sitting in memory. If all of them were appended to the system prompt on every turn, the prompt would grow with every skill added - potentially thousands of tokens of context that has nothing to do with what the user asked. This is wasteful and quickly becomes the dominant cost driver for API usage.
The trigger mechanism is the solution: Skill.triggers is a list of keyword phrases. At the start of every call_model invocation, the agent checks whether any trigger phrase appears as a substring of the most recent HumanMessage (case-insensitive). If none match, the body is not sent. If any match, the full Skill.body is appended to the system prompt for that turn only.
This keeps the baseline system prompt small - just Tier 1 (the compact skill catalog, roughly 50-100 tokens) - while making the full knowledge available on-demand. A session with 10 knowledge skills enabled pays the prose cost only for the specific skills relevant to each turn.
The body injection happens entirely in _build_system_prompt in graph.py, inside the call_model node. It reads from in-memory Skill objects - there is no disk I/O:
```python
# graph.py - _build_system_prompt()
for skill in _triggered_freetext(state, allowed):  # trigger matching
    if skill.body:
        system += f"\n\n--- {skill.name} skill context ---\n{skill.body}"
```
How the knowledge reaches the tools
The LLM is now responsible for the connection. When the ssh body is injected, the LLM reads it and understands what an SSH connection requires. It then looks at the available tools (the Shell skill's run_bash, for example) and constructs the appropriate command. No explicit wiring between the knowledge skill and the tool skill exists in the application code - the LLM makes the connection itself based on the context.
Note: Ensure the Shell skill (or another skill that provides execution tools) is enabled when using knowledge skills that require command execution.
Frontmatter fields
| Field | Required | Description |
|---|---|---|
| `name` | yes | Unique skill identifier |
| `description` | yes | One-line summary included in the Tier 1 skill catalog - always sent to the LLM |
| `triggers` | yes | Keyword phrases; any case-insensitive substring match in the user's message injects the body |
Skill loading
Skill directories
Birdie loads skills from two locations on every startup:
- Bundled skills - the `birdie/skills/` directory shipped inside the package. Always present after `pip install birdie-agent`.
- User skills - `~/.birdie/skills/` in your home directory, if it exists. Drop a new subdirectory with a `SKILL.MD` there and it will be picked up automatically on the next start - no reinstall, no flags.
To use a completely different directory instead of the bundled one, pass --skills-dir PATH. The user skills directory ~/.birdie/skills/ is always also loaded on top of whichever primary directory is used.
Default skill state
All built-in skills are disabled by default. This is a deliberate safety choice - skills like Shell and Filesystem can execute arbitrary local commands, and they should only be active when you explicitly choose to use them.
Enable skills at the start of a session:
```
/skill enable Shell
/skill enable DuckDuckGo
```
Enabled/disabled state is persisted per session. Use /skill list to see all available skills and their current status.
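Conceptually, the per-session grant model reduces to an allow-set seeded from the defaults. This is a toy sketch with hypothetical names; the real logic lives in core/policy.py and core/session.py:

```python
class SessionSkillPolicy:
    """Per-session allow-set seeded from the enabled_by_default skills."""

    def __init__(self, default_skills: list) -> None:
        self._default = set(default_skills)
        self._grants: dict = {}  # thread_id -> set of enabled skill names

    def _session(self, thread_id: str) -> set:
        # First touch copies the defaults, so sessions diverge independently.
        return self._grants.setdefault(thread_id, set(self._default))

    def enable(self, thread_id: str, skill: str) -> None:
        self._session(thread_id).add(skill)       # /skill enable <name>

    def disable(self, thread_id: str, skill: str) -> None:
        self._session(thread_id).discard(skill)   # /skill disable <name>

    def allowed(self, thread_id: str) -> list:
        return sorted(self._grants.get(thread_id, self._default))
```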
Built-in skills
| Skill | Description |
|---|---|
| `Shell` | Run arbitrary shell commands |
| `Filesystem` | Read and write local files |
| `ssh` | Connect to remote hosts and run commands |
| `ToDo` | Step-by-step planning and progress tracking |
| `Weather` | Weather lookup via external API |
| `DuckDuckGo` | Web search with no API key required |
| `mcp_demo` | Demo MCP server (`echo` and `reverse_string`) |
What is eager and what is lazy
Understanding which parts of the skill system are initialised at startup versus resolved at turn time is essential to understanding the design.
Eager - happens once at startup:
`discover_skills_from_directory` scans every subdirectory of the skills directory, finds SKILL.MD files, and calls `parse_skill_markdown` on each. This produces a fully populated `Skill` Pydantic object whose fields are derived entirely from the file:

- `Skill.name`, `Skill.description`, `Skill.tags`, `Skill.triggers`, `Skill.enabled_by_default`, `Skill.always_inject` - from YAML frontmatter
- `Skill.tools` - list of `SkillTool` objects parsed from the `## Tools` section (empty for knowledge skills)
- `Skill.body` - raw Markdown string of the prose body (empty for plain structured skills, full body for knowledge skills, pre-`## Tools` prose for `always_inject` skills)
- `Skill.mcp_server` - parsed from the `mcp_server` frontmatter key if present
After parsing, each Skill is registered in the SkillRegistry (which builds name, tag, and tool-ownership indexes) and its mcp_server config (if any) is registered with MCPClientManager. Finally, UserSkillPolicy.set_default_skills seeds which skills are on by default.
After startup, no SKILL.MD file is ever read again. All turn-time decisions are made from in-memory Skill objects.
Lazy - happens on every agent turn:
Three things are deliberately deferred to turn time:
1. Tool schema and execution wiring. `SkillTool` objects store only the entrypoint string and JSON Schema. The `StructuredTool` wrapper that LangChain's `ToolNode` can actually execute is created fresh in `execute_tools` on every turn. This is necessary because the set of allowed skills is resolved per-turn from the policy engine - a skill can be enabled or disabled between turns, so the tool list cannot be fixed at startup.
2. Trigger matching and body injection. The prose in `Skill.body` is never sent to the LLM automatically. On each `call_model` invocation, `_build_system_prompt` compares the most recent `HumanMessage` against the `triggers` list of every allowed knowledge skill. Only matching skills have their body appended to the system prompt for that turn. This is the primary token cost-control mechanism: skills whose knowledge is not relevant to the current message contribute nothing to the prompt.
3. MCP tool discovery. `MCPClientManager.get_tools()` is called on every turn. The first call connects to the MCP server and caches the tool list; subsequent calls return the cache. The deferral means the MCP subprocess is not spawned until it is actually needed.
The loading sequence end to end:
```
startup
└─ for each skills dir (bundled, then ~/.birdie/skills/ if present)
   └─ discover_skills_from_directory(dir)
      └─ for each subdir/SKILL.MD
         └─ parse_skill_markdown()
            ├─ YAML frontmatter → Skill model fields (name, tags, triggers, ...)
            ├─ ## Tools section → Skill.tools = [SkillTool(...), ...]
            │                     (empty list for knowledge skills)
            └─ Markdown body    → Skill.body = "raw prose string"
                                  (None for plain structured skills)
└─ SkillRegistry.register_skill(skill)
   ├─ _skills[name] = skill
   ├─ _tools[tool.name] = tool        (for each tool in skill.tools)
   ├─ _tags_index[tag] += tool.name   (tag→tool mapping)
   └─ _tool_to_skill[tool] = skill.name
└─ MCPClientManager.register_server(skill.name, skill.mcp_server)
   (only for skills where mcp_server is set)
└─ UserSkillPolicy.set_default_skills(skills)
   └─ seeds the enabled_by_default allow-set

per turn (inside LangGraph call_model and execute_tools nodes)
└─ _get_allowed(config)
   └─ UserSkillPolicy.get_allowed_skills(thread_id)  ← policy lookup, no disk I/O
└─ _build_system_prompt(state, config)
   ├─ Tier 1: iterate Skill.description for all allowed skills (always)
   ├─ Tier 2a: append Skill.body for always_inject skills (always)
   ├─ Tier 2b: match HumanMessage against Skill.triggers,
   │           append Skill.body for matches only     ← lazy injection
   └─ Tier 3: append long_term_memory from config["configurable"]
└─ execute_tools()
   ├─ skilltool_to_langchain_tool(t) for each allowed SkillTool
   │  └─ StructuredTool.from_function(...)  ← created fresh each turn
   ├─ MCPClientManager.get_tools()          ← lazy MCP connection
   └─ ToolNode(all_tools).ainvoke(state)
```
Why fresh StructuredTool wrappers each turn
A StructuredTool is a LangChain object that bundles a callable, a name, a description, and a Pydantic schema. It could in principle be created once at startup and reused. The reason it is not: the set of tools passed to ToolNode must exactly reflect the skills enabled for the current session at the current turn. If a user runs /skill disable Shell mid-session, the next turn's ToolNode must not include run_bash. Building the list fresh from the policy on every turn is the simplest way to guarantee this without any invalidation logic.
LangChain API: StructuredTool and Pydantic schema generation
LangChain's StructuredTool is the bridge between a declarative SkillTool (a name, description, and JSON Schema) and something a ToolNode can actually execute. The conversion happens in core/adapter.py:
```python
from langchain_core.tools import StructuredTool

tool = StructuredTool.from_function(
    func=_wrapped,                   # the callable that runs the entrypoint
    name=skill_tool.name,
    description=skill_tool.description,
    args_schema=create_args_schema(skill_tool.schema),  # Pydantic model
)
```
StructuredTool.from_function is the standard factory for tools whose arguments need schema validation. The three things it needs are:
- `func` - the Python callable to invoke when the tool is called. Birdie wraps the entrypoint resolver here.
- `description` - shown to the LLM as the tool's purpose. The LLM uses this to decide when to call the tool.
- `args_schema` - a Pydantic `BaseModel` subclass. LangChain uses this to validate incoming arguments and to generate the JSON Schema it sends to the LLM as the function signature.
The args_schema Pydantic class is built dynamically from the JSON Schema in the SKILL.MD using Pydantic's create_model:
```python
from typing import Any, Optional
from pydantic import BaseModel, create_model

# schema = {"type": "object", "properties": {"path": {"type": "string"}}, "required": ["path"]}
fields = {}
for field_name, field_schema in schema["properties"].items():
    python_type = _TYPE_MAP.get(field_schema.get("type", ""), Any)
    if field_name in schema.get("required", []):
        fields[field_name] = (python_type, ...)             # required: Ellipsis as default
    else:
        fields[field_name] = (Optional[python_type], None)  # optional: None default

DynamicModel = create_model("ToolArgs", **fields)
```
create_model is Pydantic's factory for building model classes at runtime without writing class definitions. The resulting class behaves exactly like a hand-written BaseModel - LangChain can call .model_json_schema() on it and validate tool call arguments against it before execution.
Agent loop
The agent is a LangGraph StateGraph with two nodes and a conditional edge.
LangGraph API: building the graph
The entire agent loop is defined using LangGraph's StateGraph builder. The full definition lives in agent/graph.py and agent/run.py:
```python
from langgraph.graph import StateGraph, START, END
from langgraph.prebuilt import ToolNode

# 1. Declare the graph with its state type
workflow = StateGraph(AgentState)

# 2. Register node functions (sync or async callables)
workflow.add_node("agent", call_model)
workflow.add_node("tools", execute_tools)

# 3. Wire the edges
workflow.add_edge(START, "agent")       # always start at "agent"
workflow.add_conditional_edges(         # after "agent", decide what's next
    "agent",
    should_continue,                    # routing function
    {"tools": "tools", END: END},       # return value -> destination map
)
workflow.add_edge("tools", "agent")     # after tools, always go back to "agent"

# 4. Compile into a runnable with optional persistence
app = workflow.compile(checkpointer=checkpointer)
```
Key concepts:
- `StateGraph(AgentState)` - the state type is declared once at construction. Every node receives the full current state and returns a dict of updates (not the full state). LangGraph merges the updates using the field reducers.
- `add_node(name, func)` - registers a function as a graph node. The function signature must be `(state, config) -> dict` (or the async equivalent). LangGraph calls it with the current state and run config automatically.
- `add_conditional_edges(source, router, map)` - after `source` runs, LangGraph calls `router(state)` and uses the return value as a key into `map` to determine the next node. Returning `END` from the router (or mapping a value to `END`) terminates the graph.
- `add_edge(a, b)` - unconditional transition: after `a` always go to `b`.
- `compile(checkpointer=...)` - produces a `CompiledGraph` (also a `Runnable`) that can be invoked via `.ainvoke()`, `.astream()`, etc. The checkpointer is wired in here; LangGraph calls it automatically before and after each node.
```
START
  │
  ▼
┌──────────────────────────────────────┐
│ agent node            call_model()   │
│                                      │
│ 1. repair dangling tool calls        │
│ 2. resolve skill tools               │
│    (registry + policy)               │
│ 3. fetch MCP tools                   │
│    (MCPClientManager)                │
│ 4. build system prompt               │
│    (Tier 1 + Tier 2 + Tier 3)        │
│ 5. call provider.achat()             │
│    with all tools merged             │
│ 6. append AIMessage to state         │
└──────────────────┬───────────────────┘
                   │
                   ▼  should_continue()
             last message
             has tool_calls?
                   │
         yes ──────┴────── no ──► END
          │
          ▼
┌──────────────────────────────────────┐
│ tools node          execute_tools()  │
│                                      │
│ 1. resolve skill tools               │
│ 2. fetch MCP tools                   │
│ 3. build fresh LangChain ToolNode    │
│    with all tools merged             │
│ 4. execute called tool               │
│ 5. on error: return error            │
│    ToolMessage so state stays        │
│    balanced and LLM can recover      │
│ 6. append ToolMessage(s) to state    │
└──────────────────┬───────────────────┘
                   │
                   └─────────────────► agent node (loop)
```
The loop continues until the model returns a message with no tool_calls. There is no hard cap on iterations - it is the model's decision to stop.
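The stopping rule is just a check on the last message. Over dict-shaped messages (the real router inspects `AIMessage.tool_calls`, and `"__end__"` stands in here for LangGraph's `END` sentinel), it reduces to:

```python
def should_continue(state: dict) -> str:
    """Route to the tools node while the model keeps requesting tool calls."""
    last = state["messages"][-1]
    return "tools" if last.get("tool_calls") else "__end__"
```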
State and the add_messages reducer
AgentState is a LangGraph TypedDict:
```python
from typing import Annotated, Sequence, TypedDict
from langchain_core.messages import BaseMessage
from langgraph.graph.message import add_messages

class AgentState(TypedDict):
    messages: Annotated[Sequence[BaseMessage], add_messages]
```
The Annotated[..., add_messages] syntax is LangGraph's reducer pattern. When a node returns {"messages": [new_msg]}, LangGraph does not replace the messages list - it calls add_messages(current, [new_msg]) to merge the update. add_messages appends new messages and deduplicates by message ID, which means re-delivering the same message (e.g. after a retry) does not create duplicates.
Without a reducer, every node would have to return the complete messages list. With it, each node only returns the delta - new messages to append - and LangGraph handles the merge.
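The merge semantics can be illustrated in plain Python. This is not LangGraph's implementation - just the append-or-replace-by-ID idea, sketched over dicts:

```python
def merge_messages(current: list, update: list) -> list:
    """Append new messages; replace in place when an ID is redelivered."""
    index = {m["id"]: i for i, m in enumerate(current)}
    merged = list(current)
    for msg in update:
        if msg["id"] in index:
            merged[index[msg["id"]]] = msg  # duplicate ID -> no duplicate entry
        else:
            merged.append(msg)              # new ID -> append
    return merged
```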
The state is intentionally minimal: only messages. The session ID, long-term memory, and policy key all flow through config["configurable"] rather than being stored in state, because they are per-invocation context, not persistent conversation data.
LangGraph API: RunnableConfig and per-turn context
Every node function receives a second argument, config: RunnableConfig, alongside state:
```python
from langchain_core.runnables import RunnableConfig

async def call_model(state: AgentState, config: RunnableConfig) -> dict:
    thread_id = config.get("configurable", {}).get("thread_id", "")
    ltm = config.get("configurable", {}).get("long_term_memory") or []
```
RunnableConfig is LangChain's standard carrier for execution-time metadata. The "configurable" key is the designated slot for application-defined values that need to flow into graph nodes without being stored in state. LangGraph propagates the same config dict to every node in the graph for a given invocation.
The caller sets these values when invoking the graph:
```python
run_config = {"configurable": {
    "thread_id": "2026-04-29_1",          # identifies the session / checkpoint
    "long_term_memory": ["user fact 1"],  # injected into system prompt, not stored
}}
await app.ainvoke(initial_state, run_config)
```
thread_id in config["configurable"] is the LangGraph convention for identifying which checkpoint to load and save. Every checkpointer implementation reads this key automatically - the application does not need to pass it separately to the checkpointer.
History persistence is handled by LangGraph's checkpointer keyed on config["configurable"]["thread_id"] (the session ID). When a turn starts, the checkpointer loads the full prior history automatically - the application only passes the new HumanMessage. After each node, the checkpointer writes the delta back to disk.
LangGraph API: ToolNode
ToolNode is a prebuilt LangGraph node that handles tool execution. Rather than writing the dispatch loop manually, the graph hands a list of LangChain tools to ToolNode and delegates the entire execution step to it:
```python
from langgraph.prebuilt import ToolNode

async def execute_tools(state: AgentState, config: RunnableConfig) -> dict:
    langchain_tools = (
        [skilltool_to_langchain_tool(t) for t in skill_tools] + mcp_tools
    )
    tool_node = ToolNode(langchain_tools)
    return await tool_node.ainvoke(state)
```
What ToolNode does internally:
1. Reads `state["messages"][-1]` and extracts its `tool_calls` list (each entry has `id`, `name`, and `args`).
2. Looks up each tool call by `name` in the provided tools list.
3. Calls the matched tools (in parallel when multiple tool calls exist in a single `AIMessage`).
4. Wraps each result in a `ToolMessage` with `tool_call_id` matching the original `tool_calls` entry.
5. Returns `{"messages": [ToolMessage, ...]}` - the delta that LangGraph appends to state via the `add_messages` reducer.
The tool_call_id pairing is how the LLM knows which result belongs to which call. Providers like Mistral and Anthropic validate that every tool_calls entry in an AIMessage has exactly one matching ToolMessage before they accept the message history - this is why the checkpoint repair step is necessary.
A fresh ToolNode is created on every invocation of execute_tools rather than once at startup. This is deliberate: the set of allowed tools can change between turns as the user enables or disables skills, so the tool list must be resolved at call time.
Checkpoint repair
If a tool execution is interrupted before its ToolMessage is written (e.g. process killed, exception before ToolNode finishes), the checkpoint ends up with an AIMessage whose tool_calls have no matching ToolMessage. Providers like Mistral reject this as a protocol violation.
At the start of every call_model invocation, _repair_dangling_tool_calls scans the loaded message list, finds any unanswered tool calls, and inserts placeholder ToolMessages immediately after the offending AIMessage. The repair messages are included in the state delta returned by call_model, so they are written to the checkpoint and the conversation history heals permanently on the next save.
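The repair pass boils down to a set difference between issued and answered tool-call IDs. A simplified sketch over dict-shaped messages (the real code works with LangChain message objects and inserts the placeholders directly after the offending AIMessage; the function name is illustrative):

```python
def find_dangling_repairs(messages: list) -> list:
    """Return a placeholder tool result for every unanswered tool call."""
    answered = {m["tool_call_id"] for m in messages if m.get("type") == "tool"}
    repairs = []
    for m in messages:
        if m.get("type") != "ai":
            continue
        for call in m.get("tool_calls", []):
            if call["id"] not in answered:
                # Placeholder keeps the tool_calls/ToolMessage pairing balanced.
                repairs.append({"type": "tool", "tool_call_id": call["id"],
                                "content": "Tool execution was interrupted."})
    return repairs
```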
MCP integration
Model Context Protocol (MCP) is an open standard that lets a server expose a set of tools over a well-defined wire protocol. Instead of writing a Python function and wiring it into an entrypoint, you run a separate process that speaks MCP - the agent connects to it, discovers the available tools, and calls them exactly as it would call any other tool.
This is useful for showcasing how agents can consume external capability providers: the agent does not need to know how a tool is implemented or where it runs. It only needs to know how to reach the server.
Birdie integrates MCP through langchain-mcp-adapters, which converts MCP tool definitions into native LangChain BaseTool objects. These are merged with the skill tools before each LLM call and each ToolNode invocation, so the model sees them alongside any bash: or python: tools without any special handling.
LangChain API: MultiServerMCPClient
MCP integration uses langchain-mcp-adapters, a first-party LangChain library that converts MCP tool definitions into native LangChain BaseTool objects. The central class is MultiServerMCPClient:
```python
from langchain_mcp_adapters.client import MultiServerMCPClient

client = MultiServerMCPClient({
    "mcp_demo": {
        "transport": "stdio",
        "command": "python",
        "args": ["birdie/skills/mcp_demo/server.py"],
    }
})

# Connects to each server, calls tools/list, returns List[BaseTool]
tools: list[BaseTool] = await client.get_tools()
```
Each call to get_tools() opens a fresh session to each server, fetches the tool list, and closes the session. The BaseTool objects it returns are fully callable - invoking them opens another fresh session, sends a tools/call request, and returns the result. There are no persistent connections to manage.
Because BaseTool is the same interface used by StructuredTool (and everything else in LangChain's tool ecosystem), the MCP tools slot directly into ToolNode alongside the skill tools:
langchain_tools = (
[skilltool_to_langchain_tool(t) for t in skill_tools] # StructuredTool
+ mcp_tools # BaseTool from MCP
)
tool_node = ToolNode(langchain_tools) # ToolNode doesn't care which type
For the LLM's tool schema (sent in the API request), Birdie converts BaseTool objects to NormalizedToolDef dicts using lc_tool_to_normalized_def, which reads tool.args_schema. MCP tools expose args_schema as a plain JSON Schema dict (not a Pydantic class), so the conversion handles both cases:
def lc_tool_to_normalized_def(tool: BaseTool) -> NormalizedToolDef:
args_schema = tool.args_schema
if isinstance(args_schema, dict):
schema = dict(args_schema) # MCP: already a JSON Schema dict
else:
schema = args_schema.model_json_schema() # StructuredTool: Pydantic model
return {"name": tool.name, "description": tool.description, "parameters": schema}
How it works end to end
Startup
└─ loader discovers SKILL.MD with mcp_server: frontmatter key
└─ MCPClientManager.register_server(name, MCPServerConfig)
First tool call (lazy connection)
└─ MCPClientManager.get_tools()
└─ MultiServerMCPClient.get_tools()
└─ spawns server process (stdio) or connects (SSE/HTTP)
└─ calls tools/list → gets tool names + schemas
└─ returns List[BaseTool] (cached for process lifetime)
Every call_model() invocation
└─ skill tools (SkillTool objects from registry) → NormalizedToolDef list
└─ MCP tools (BaseTool objects from manager) → NormalizedToolDef list
└─ merged list sent to provider.achat() so LLM sees all tools
Every execute_tools() invocation
└─ skill tools → LangChain StructuredTool list
└─ MCP tools → BaseTool list (already LangChain-compatible)
└─ merged list passed to ToolNode for execution
The MCP client opens a fresh session for each tool invocation. This is the design pattern recommended by langchain-mcp-adapters - it keeps the client stateless and avoids managing long-lived connections.
Declaring an MCP server in SKILL.MD
Add an mcp_server block to the frontmatter. No ## Tools section is needed - the tools are discovered dynamically from the server at runtime.
stdio transport (server runs as a subprocess):
---
name: my_tools
version: 1.0.0
description: Tools provided by my MCP server
enabled_by_default: false
mcp_server:
transport: stdio
command: python
args: ["path/to/server.py"]
---
SSE / HTTP transport (server is a running process you connect to):
---
name: remote_tools
version: 1.0.0
description: Tools from a remote MCP server
enabled_by_default: false
mcp_server:
transport: sse
url: http://localhost:8080/sse
---
mcp_server fields
| Field | Required | Description |
|---|---|---|
| transport | yes | stdio or sse |
| command | stdio only | Executable to launch (e.g. python, node) |
| args | stdio only | List of arguments passed to the command |
| env | no | Extra environment variables for the subprocess |
| cwd | no | Working directory for the subprocess |
| url | sse only | URL of the SSE endpoint |
| headers | sse only | HTTP headers to send with the connection |
Writing an MCP server
An MCP server can be written in any language that has an MCP SDK. The Python SDK makes it very compact using FastMCP:
from mcp.server.fastmcp import FastMCP
mcp = FastMCP("my_server")
@mcp.tool()
def add(a: int, b: int) -> int:
"""Add two numbers."""
return a + b
if __name__ == "__main__":
mcp.run(transport="stdio")
The function's name, docstring, and type annotations become the tool name, description, and argument schema automatically. The LLM sees exactly what is reflected from the function signature.
The demo server
birdie/skills/mcp_demo/ contains a minimal working example:
# birdie/skills/mcp_demo/server.py
from mcp.server.fastmcp import FastMCP
mcp = FastMCP("mcp_demo")
@mcp.tool()
def echo(message: str) -> str:
"""Return the message unchanged."""
return message
@mcp.tool()
def reverse_string(text: str) -> str:
"""Return the text with characters in reverse order."""
return text[::-1]
if __name__ == "__main__":
mcp.run(transport="stdio")
The matching SKILL.MD points the agent at this server:
---
name: mcp_demo
version: 1.0.0
description: Demo tools served via MCP (echo and reverse_string)
enabled_by_default: false
mcp_server:
transport: stdio
command: python
args: ["birdie/skills/mcp_demo/server.py"]
---
To try it, enable the skill for your session and call one of the tools:
you> /skill enable mcp_demo
you> reverse "hello world"
The agent will call reverse_string via MCP and show you dlrow olleh.
Install the MCP extra
MCP support is an optional dependency. Install it alongside Birdie:
pip install -e ".[mcp]"
This adds mcp (the official Python SDK) and langchain-mcp-adapters. If mcp_server is declared in a SKILL.MD but the extra is not installed, MCPClientManager.get_tools() raises an ImportError with a clear message on first use.
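The lazy ImportError behaviour can be sketched with a generic helper that defers the import to first use and raises a message naming the missing extra (the helper name is an illustration, not Birdie's actual code):

```python
import importlib

def require_optional(module_name: str, extra: str):
    """Import an optional dependency, or fail with a message naming the extra.

    Called lazily (e.g. on the first get_tools()), so a missing optional
    dependency does not break startup for users who never enable the skill.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name} is not installed; it is provided by the "
            f'optional "{extra}" extra: pip install "birdie-agent[{extra}]"'
        ) from exc
```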
Where MCP fits in the architecture
The entrypoint resolver (entrypoints.py) handles synchronous, stateless execution: it receives a string like bash:cat {path} and returns a callable. MCP does not fit this model because it requires an async connection lifecycle and returns pre-built LangChain tool objects rather than raw callables.
Instead, MCP tools are handled by MCPClientManager (core/mcp_client.py) and merged directly into the graph's tool pools at the point where they are used:
core/mcp_client.py MCPClientManager
register_server() ← called by DynamicAgent at startup
get_tools() ← called by graph.py on every turn
agent/graph.py call_model()
skill tools (from registry) → NormalizedToolDef
MCP tools (from manager) → NormalizedToolDef
all merged → provider.achat()
execute_tools()
skill tools → LangChain StructuredTool
MCP tools → LangChain BaseTool (already usable)
all merged → ToolNode
This keeps MCPClientManager as a thin, focused module and avoids coupling the entrypoint system to async connection management.
System prompt construction
Every call to call_model constructs a fresh system prompt from the in-memory skill objects. Nothing is read from disk.
Tier 1 - skill catalog (always present)
A compact bullet list of every skill currently allowed for the user/session:
You have access to the following skills:
- **Filesystem**: Local file operations using shell commands.
- **ssh**: Establish and manage SSH connections... triggers: ssh, remote server, ...
This is sent on every turn regardless of what the user said. It is intentionally small - only name, description, and triggers from the frontmatter, not the skill body.
Tier 2 - freetext skill body (on trigger only)
When the most recent HumanMessage contains any of a freetext skill's trigger keywords (case-insensitive substring match), the skill's full Markdown body is appended:
--- ssh skill context ---
# SSH Skill
This skill provides capabilities for establishing and managing SSH connections...
[full body]
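The trigger check described above amounts to a case-insensitive substring match of each trigger keyword against the latest user message; a minimal sketch (function name is illustrative):

```python
def skill_triggered(user_message: str, triggers: list[str]) -> bool:
    """True if any trigger keyword appears in the message (case-insensitive)."""
    text = user_message.lower()
    return any(trigger.lower() in text for trigger in triggers)
```

When this returns True for a freetext skill, its full Markdown body is appended to the system prompt for that turn.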
Tier 3 - long-term memory (on every turn)
Notes stored via /remember are injected as a third tier at the end of the system prompt:
--- Long-term memory ---
- prefers concise answers without bullet points
- working on a Python project called Birdie
These are stored in the user-scoped memory.json and survive both restarts and session switches. They are always present once added, regardless of what the user says.
LangChain API: the message types
The entire conversation history is a list of BaseMessage subclasses from langchain_core.messages. These are the same types used across all LangChain integrations and are what every provider converts to and from:
from langchain_core.messages import (
HumanMessage, # role: "user" - the human's input
AIMessage, # role: "assistant" - the model's reply or tool call request
ToolMessage, # role: "tool" - the result of a tool invocation
SystemMessage, # role: "system" - injected instructions (Birdie builds this each turn)
AIMessageChunk, # streaming delta - accumulated into AIMessage for history
)
Each message type maps directly to a role in the provider's wire format. Birdie's _lc_to_openai_messages and equivalent functions in each provider translate the LangChain message list to the vendor-specific JSON format before the API call.
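The role mapping behind that translation can be sketched without LangChain installed, using class names in place of isinstance checks (helper name and tuple representation are illustrative; the real converters also handle tool_calls and tool_call_id fields):

```python
def to_wire(messages: list[tuple[str, str]]) -> list[dict]:
    """Map (LangChain class name, content) pairs to OpenAI-style role dicts."""
    role_for = {
        "SystemMessage": "system",
        "HumanMessage": "user",
        "AIMessage": "assistant",
        "ToolMessage": "tool",
    }
    return [{"role": role_for[cls], "content": content} for cls, content in messages]
```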
AIMessage carries tool_calls when the model requests a tool. This is a list of dicts with id, name, and args:
AIMessage(
content="",
tool_calls=[
{"id": "call_abc123", "name": "run_bash", "args": {"command": "ls -la"}}
]
)
ToolMessage closes the loop by referencing that same id:
ToolMessage(
content="main.py\nREADME.md",
tool_call_id="call_abc123",
name="run_bash",
)
This paired id relationship is enforced by every provider. ToolNode creates ToolMessage objects with the correct tool_call_id automatically. The checkpoint repair code fills in placeholder ToolMessages when the process was interrupted before ToolNode could do so.
Full context structure sent to the LLM
[system prompt]
Tier 1: skill catalog
Tier 2: freetext skill body (if triggered)
Tier 3: long-term memory notes (if any)
[message history - rolling window, last 20 messages from checkpointer]
HumanMessage (turn N-4)
AIMessage (tool_calls)
ToolMessage (tool result)
AIMessage (text response)
HumanMessage (turn N)
...
HumanMessage (current turn)
The full history is stored in the checkpointer's SQLite database; the last 20 messages are forwarded to the LLM each turn. Tool results longer than 32,000 characters are truncated with a count of dropped bytes to keep payloads within model limits.
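The truncation rule can be sketched as follows (the 32,000-character cap comes from the text above; the helper name and the exact marker text are assumptions):

```python
MAX_TOOL_RESULT_CHARS = 32_000

def truncate_tool_result(text: str, limit: int = MAX_TOOL_RESULT_CHARS) -> str:
    """Cap a tool result, appending a count of what was dropped."""
    if len(text) <= limit:
        return text
    dropped = len(text) - limit
    return text[:limit] + f"\n... [{dropped} characters truncated]"
```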
The provider layer converts this LangChain message list into the wire format expected by each vendor (OpenAI, Mistral, Anthropic each have different conventions for tool messages and system prompts).
Access control
UserSkillPolicy enforces which skills a user may access. Resolution order (highest priority first):
- Explicit disable - always blocks the skill for that key
- Explicit enable - grants the skill for that key
- Global defaults - skills with enabled_by_default: true
In the CLI, the session ID is used as the policy key (not the filesystem --user value). This means each session has fully independent skill grants: /skill enable and /skill disable affect only the current session and are persisted to the session JSON, so they are restored on resume.
# CLI uses session.id as the key
agent.enable_skill_for_user(session.id, "Filesystem")
agent.disable_skill_for_user(session.id, "Weather")
The policy is consulted on every call_model and execute_tools invocation so changes take effect immediately on the next turn without restarting the agent.
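The resolution order above boils down to a three-step check; a minimal sketch (names are illustrative, not Birdie's actual API):

```python
def skill_allowed(skill: str, enabled: set[str], disabled: set[str],
                  default_enabled: bool) -> bool:
    """Resolve a skill grant: explicit disable > explicit enable > global default."""
    if skill in disabled:        # explicit disable always wins
        return False
    if skill in enabled:         # explicit enable beats the default
        return True
    return default_enabled       # fall back to enabled_by_default
```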
LangGraph API: invoking the graph and streaming
DynamicAgent exposes two invocation methods that map directly to LangGraph's compiled graph API:
# Run to completion - returns final AgentState
result = await agent.invoke("list my files", thread_id="session-1")
final_messages = result["messages"]
# Stream node-level updates - yields one dict per node execution
async for update in agent.astream("list my files", thread_id="session-1"):
# update is e.g. {"agent": {"messages": [AIMessage(...)]}}
# or {"tools": {"messages": [ToolMessage(...)]}}
node_name = list(update.keys())[0]
new_messages = update[node_name]["messages"]
Under the hood, astream calls app.astream(initial_state, run_config, stream_mode="updates").
stream_mode controls the granularity of what is yielded:
| Mode | Yields | Use case |
|---|---|---|
"updates" |
Dict of {node_name: state_delta} after each node |
CLI display - know which node ran and what it produced |
"values" |
Full AgentState after each node |
Inspection - always see the complete current state |
"messages" |
Individual LLM token chunks during streaming | Token-level streaming to the user |
Birdie uses "updates" so the CLI can display tool call names and results as they happen, without waiting for the full turn to complete.
The initial_state passed to astream or ainvoke contains only the new HumanMessage. LangGraph loads the existing thread history from the checkpointer and appends to it - the application never manages the full history:
initial_state = {"messages": [HumanMessage(content=message)]}
run_config = {"configurable": {"thread_id": thread_id, "long_term_memory": ltm}}
async for update in app.astream(initial_state, run_config, stream_mode="updates"):
...
Providers
Birdie wraps each vendor SDK behind a common LLMProvider interface:
class LLMProvider(ABC):
def achat(messages, tools, system_prompt, ...) -> AIMessage: ...
def supports_tools() -> bool: ...
def vendor_name -> str: ...
def model_name -> str: ...
| Vendor | Class | Install |
|---|---|---|
| OpenAI | OpenAIProvider | pip install openai |
| Azure OpenAI | AzureOpenAIProvider | no extra dep - set AZURE_OPENAI_API_KEY + AZURE_OPENAI_ENDPOINT |
| Anthropic | AnthropicProvider | pip install anthropic |
| Mistral | MistralProvider | pip install "mistralai>=1.0" |
| Google Gemini | GeminiProvider | no extra dep - set GEMINI_API_KEY |
| Ollama | OllamaProvider | local server required |
| Any LangChain model | LangChainProvider | existing BaseChatModel |
| ACP agent | ACPProvider | ACP adapter running locally (e.g. claude-agent-acp) |
Returned AIMessage objects carry usage_metadata with input_tokens, output_tokens, and total_tokens from the API response. The CLI status bar reads these to display live context and spend counters.
LangChain API: LangChainProvider and bind_tools
For callers that already have a LangChain BaseChatModel (e.g. ChatOpenAI, ChatAnthropic), LangChainProvider wraps it without any native SDK dependency:
from langchain_openai import ChatOpenAI
agent = DynamicAgent(ChatOpenAI(model="gpt-4o"), skills_dir="birdie/skills")
Internally, LangChainProvider uses two key LangChain patterns:
bind_tools - attaches a tool schema to the model so the LLM knows what tools are available. It returns a new runnable (the original model is not mutated):
def _with_tools(self, tools: list[NormalizedToolDef] | None):
if not tools:
return self._llm
lc_tools = [_normalized_tool_to_lc_schema(t) for t in tools]
return self._llm.bind_tools(lc_tools) # returns Runnable, not BaseChatModel
bind_tools is the LangChain standard for attaching tool definitions to any chat model. The tools are passed in the API request as the tools or functions parameter (depending on the provider). When the model decides to call a tool, the response comes back as an AIMessage with a populated tool_calls field.
ainvoke and astream - the standard Runnable interface:
async def achat(self, messages, tools, system_prompt, ...) -> BaseMessage:
msgs = self._inject_system(messages, system_prompt)
return await self._with_tools(tools).ainvoke(msgs)
async def astream_chat(self, messages, tools, system_prompt, ...):
msgs = self._inject_system(messages, system_prompt)
async for chunk in self._with_tools(tools).astream(msgs):
yield chunk # yields AIMessageChunk objects
ainvoke returns a complete AIMessage. astream yields AIMessageChunk objects - partial tokens that accumulate into the full message. Both are part of LangChain's Runnable interface, implemented by every BaseChatModel.
The native providers (e.g. MistralProvider, AnthropicProvider) call their vendor SDKs directly and convert to/from LangChain message types manually, rather than going through BaseChatModel. This gives finer control over vendor-specific features (e.g. Anthropic's native system prompt parameter, Mistral's tool_call content handling) while still producing the same AIMessage output the graph expects.
CLI
birdie [--user USER_ID] [--session-id SESSION_ID] [--skills-dir PATH] [--config FILE]
| Flag | Description |
|---|---|
| --user USER_ID | Filesystem namespace for sessions (defaults to $USER) |
| --session-id SESSION_ID | Resume a specific session (e.g. 2026-04-28_1) |
| --skills-dir PATH | Override the built-in skills directory. ~/.birdie/skills/ is always also loaded on top. |
| --config FILE | Path to a JSON provider config file |
Provider configuration
By default Birdie reads LLM_VENDOR, LLM_MODEL, and the vendor API key from environment variables. The --config flag lets you store all of that in a file instead:
birdie --config ~/.birdie/anthropic.json
Example config files
Anthropic:
{
"vendor": "anthropic",
"model": "claude-sonnet-4-6",
"api_key": "sk-ant-...",
"temperature": 0.3
}
OpenAI:
{
"vendor": "openai",
"model": "gpt-4o",
"api_key": "sk-...",
"max_tokens": 4096
}
Azure OpenAI:
{
"vendor": "azure",
"model": "my-gpt4o-deployment",
"api_key": "...",
"base_url": "https://my-resource.openai.azure.com/",
"api_version": "2024-02-01"
}
Google Gemini:
{
"vendor": "gemini",
"model": "gemini-2.5-pro",
"api_key": "AIza..."
}
Mistral:
{
"vendor": "mistral",
"model": "mistral-large-latest",
"api_key": "..."
}
Ollama (local, no key needed):
{
"vendor": "ollama",
"model": "llama3",
"base_url": "http://localhost:11434/v1"
}
ACP agent (e.g. Claude Code via claude-agent-acp):
{
"vendor": "acp",
"base_url": "http://localhost:8765",
"model": "default"
}
The model field selects the named agent on the ACP server (defaults to "default"). The underlying LLM model is configured on the adapter side. Birdie tools are not passed through - the inner agent runs its own tool loop.
Config fields
| Field | Type | Default | Description |
|---|---|---|---|
| vendor | string | openai | openai \| azure \| anthropic \| mistral \| gemini \| ollama \| langchain \| acp |
| model | string | provider default | Model identifier |
| api_key | string | from env var | API key (omit to use env var) |
| base_url | string | - | Override API endpoint (proxy, local server) |
| temperature | float | 0.0 | Sampling temperature (0.0 – 2.0) |
| max_tokens | int | - | Max completion tokens |
| api_version | string | 2024-02-01 | Azure OpenAI API version |
| timeout | float | 120.0 | Request timeout in seconds (Mistral) |
The LLM_PROVIDER_CONFIG environment variable accepts an inline JSON string and takes precedence over everything else:
export LLM_PROVIDER_CONFIG='{"vendor":"anthropic","model":"claude-sonnet-4-6","api_key":"sk-ant-..."}'
birdie
Key bindings
| Key | Action |
|---|---|
| Enter | Submit message |
| Ctrl+J | Insert newline (multi-line message) |
| Ctrl+C | Quit |
Slash commands
| Command | Description |
|---|---|
| /help | Show available commands |
| /remember <text> | Save a note to long-term memory |
| /info | Show user, session ID, turn count, and provider |
| /tool list | List callable tools for the current session |
| /tool output full | Show complete tool output |
| /tool output short | Show first 1000 characters and remaining count (default) |
| /tool output off | Show only tool name and line count, no content |
| /skill list | List all loaded skills with enabled/disabled status |
| /skill enable <name> | Enable a skill (persisted to session). Suggests closest match if name not found. |
| /skill disable <name> | Disable a skill (persisted to session). Suggests closest match if name not found. |
| /session new | Create a new session and switch to it |
| /session switch <id> | Resume an existing session |
| /session delete <id> | Delete a session (creates a new one if current) |
| /session list | List all sessions for this user |
| /session info | Show session metadata (created, turns, memory) |
| /new | Alias for /session new |
| /clear | Clear the screen |
| /quit | Exit |
Memory and sessions
How agents use memory
An LLM has no persistent state of its own - every API call is stateless. Building a useful assistant means giving it two distinct kinds of memory:
Short-term memory is the message history forwarded with each request. It contains the conversation so far: the user's messages, the model's replies, tool calls, and tool results. The model uses it to maintain continuity, reference earlier messages, and understand what tools it has already called in this turn. Short-term memory is bounded by the model's context window and must be actively managed for cost.
Long-term memory is facts that should survive across conversations and restarts. Rather than accumulating in the message list, they are stored in a separate layer and injected into the system prompt on every request. The model sees them as persistent background knowledge. Long-term memory requires an explicit write operation - it does not grow automatically from conversation content.
Memory in the agentic loop
Each call to call_model assembles everything the LLM will see:
┌──────────────────────────────────────────────────────────────────────┐
│ What the LLM receives on every call_model() invocation │
│ │
│ system prompt (rebuilt from in-memory skill objects, no disk I/O) │
│ Tier 1 skill catalog - what tools are available this turn │
│ Tier 2 knowledge context - freetext skill body (if triggered) │
│ Tier 3 long-term memory - user facts (always present) │
│ │
│ message window (last 20 from the full checkpointed history) │
│ HumanMessage "list files in current dir" │
│ AIMessage → calling list_files(path=".") │
│ ToolMessage "main.py\nREADME.md\n..." │
│ AIMessage "Here are the files: ..." │
│ HumanMessage "which is the largest?" ← current turn │
└──────────────────────────────────────────────────────────────────────┘
Short-term memory is the message window. Long-term memory is the Tier 3 block injected into the system prompt. Both are assembled per-turn at call time from their respective stores.
The agent loop (START → agent → tools → agent → … → END) can execute multiple call_model invocations per user turn - once for each iteration around the tool loop. Each one receives the same system prompt and the same growing message window, with each completed tool call appended as a new ToolMessage.
Birdie's implementation
Short-term memory - LangGraph's checkpointer
Birdie delegates all message persistence to LangGraph's checkpointer. After every graph node, the checkpointer writes the updated AgentState to a per-user SQLite database:
~/.birdie/sessions/<user_id>/checkpoints.db
LangGraph API: checkpointer setup
LangGraph ships two checkpointer implementations out of the box:
from langgraph.checkpoint.memory import MemorySaver # in-process, no persistence
from langgraph.checkpoint.sqlite.aio import AsyncSqliteSaver # SQLite, survives restarts
# In-memory - suitable for tests and one-shot scripts
app = workflow.compile(checkpointer=MemorySaver())
# SQLite - Birdie's production default
async with AsyncSqliteSaver.from_conn_string(db_path) as checkpointer:
app = workflow.compile(checkpointer=checkpointer)
The checkpointer is passed to compile(). From that point, LangGraph handles all reads and writes automatically:
- Before the first node: the checkpointer loads the latest snapshot for the current thread_id. The graph sees the restored AgentState as if nothing had happened.
- After each node: the checkpointer saves the state delta (not the full state). Deltas are merged on read, so the storage is append-only and efficient.
- No application code needed: the graph never calls the checkpointer directly. LangGraph's runtime wires it in.
Each session is a LangGraph thread, identified by the session ID (2026-04-29_1). The thread concept is LangGraph's unit of isolated history: two different thread_id values produce two completely independent conversation histories stored in the same database.
When a turn starts, the checkpointer loads the full prior history for that thread automatically - the application only passes the new HumanMessage. The graph receives a complete, accurately-typed message list without any application-level serialization or deserialization.
The context window trim happens inside call_model, not at the storage layer:
# graph.py - inside call_model()
all_messages = list(state["messages"]) # full history from checkpointer
context_msgs = all_messages[-MAX_CONTEXT_MESSAGES:] # last 20 forwarded to LLM
The checkpointer retains unlimited history; the 20-message cap controls only what is sent to the provider per turn.
Checkpoint repair. If the process dies after the LLM responds (its AIMessage is already written to the checkpoint) but before tool execution completes (no ToolMessage follows), the checkpoint is left in a state that providers reject as a protocol violation. On the next invocation, _repair_dangling_tool_calls detects orphaned tool calls within the context window, synthesises placeholder ToolMessages, and returns them alongside the real response so the checkpoint heals permanently on write.
Long-term memory - user-scoped memory.json
Notes added via /remember are written to a JSON file alongside the session files:
~/.birdie/sessions/<user_id>/memory.json
This store is user-scoped, not session-scoped. A fact added during session 2026-04-29_1 is still present when you run /session new or resume 2026-04-29_2 the next day. The file is a flat list of timestamped entries:
{
"user_id": "alice",
"entries": [
{"id": "a3f1c2b0", "timestamp": "2026-04-29T09:12:00+00:00", "content": "prefers concise answers"},
{"id": "b7e4d9f1", "timestamp": "2026-04-29T10:00:00+00:00", "content": "working on project Birdie"}
]
}
At the start of each turn the CLI reads memory.json and forwards the contents as long_term_memory through config["configurable"]. The graph reads it from config - not from state - so LTM is never written into the checkpoint and never mixed with message history:
# graph.py - inside _build_system_prompt()
ltm = config.get("configurable", {}).get("long_term_memory") or []
if ltm:
system += "\n\n--- Long-term memory ---\n"
system += "\n".join(f"- {entry}" for entry in ltm)
Birdie uses explicit-only long-term memory: nothing is extracted or summarised from the conversation automatically. Only /remember writes to the store, so nothing is recorded without your knowledge.
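A hypothetical sketch of that write path - appending a /remember note to the user-scoped memory.json in the shape shown above (the function name is an assumption):

```python
import json
import uuid
from datetime import datetime, timezone
from pathlib import Path

def remember(memory_path: Path, user_id: str, content: str) -> dict:
    """Append a timestamped note to the user's memory.json, creating it if needed."""
    data = {"user_id": user_id, "entries": []}
    if memory_path.exists():
        data = json.loads(memory_path.read_text())
    entry = {
        "id": uuid.uuid4().hex[:8],
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "content": content,
    }
    data["entries"].append(entry)
    memory_path.write_text(json.dumps(data, indent=2))
    return entry
```

Because the store is a flat list, removing a fact is just deleting its entry from the file by hand.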
Session files - lightweight metadata
The session JSON files store only what neither the checkpointer nor the memory file can represent: skill grants and administrative metadata. They are small and fully human-readable:
{
"id": "2026-04-29_1",
"user_id": "alice",
"created_at": "2026-04-29T08:00:00+00:00",
"updated_at": "2026-04-29T11:30:00+00:00",
"turns": 12,
"enabled_skills": ["Shell", "Filesystem"],
"disabled_skills": ["Weather"]
}
There is no messages field - history lives in checkpoints.db. There is no memory field - facts live in memory.json. The session file is the glue that ties a human-readable ID to these two backing stores.
File layout
~/.birdie/sessions/
alice/
checkpoints.db ← LangGraph SqliteSaver (all sessions as LG threads)
memory.json ← user-scoped long-term memory
2026-04-29_1.json ← session metadata (skill grants, turn count)
2026-04-29_2.json
bob/
checkpoints.db
memory.json
2026-04-29_1.json
The session ID as a shared key
The session ID (2026-04-29_1) plays three roles simultaneously:
| Role | Where used | Effect |
|---|---|---|
| LangGraph thread_id | SqliteSaver checkpointer | Loads and saves this session's message history |
| Skill policy key | UserSkillPolicy | Determines which skills are active this turn |
| Filename | <session_id>.json | Links to the metadata JSON on disk |
Switching sessions changes all three at once - history, skill grants, and metadata - by changing one string.
Session ID format and lifecycle
Session IDs use YYYY-MM-DD_N where N increments from 1 for each new session on that calendar day:
2026-04-29_1 ← first session on 29 April
2026-04-29_2 ← second session that day
2026-04-30_1 ← first session on 30 April
# Start a new session (default on first run)
birdie --user alice
# Resume a specific session
birdie --user alice --session-id 2026-04-29_1
Using long-term memory
you> /remember prefers concise answers without bullet points
you> /remember working on a Python project called Birdie
After these commands, every subsequent turn in every session includes:
--- Long-term memory ---
- prefers concise answers without bullet points
- working on a Python project called Birdie
in the system prompt, for the lifetime of the user (until explicitly removed from memory.json).
Status bar (bottom of terminal)
anthropic · claude-sonnet-4-6 │ session: 2026-04-29_1 │ ctx: 1,234 tok │ spent: ↑5,678 ↓1,234 tok
- session - the active session ID
- ctx - input tokens in the most recent API call
- ↑ / ↓ - cumulative input / output tokens this process run
Running tests
pip install -e ".[dev,mcp]"
pytest