An AI coding agent for the terminal. Built to study how coding models fail.

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

adipras407

These details have not been verified by PyPI

Project description

Codesm

An AI coding agent for the terminal. Built to study how coding models fail.

[!TIP]

Talks to Anthropic, OpenAI, OpenRouter, and local Ollama. Ships with 30 built in tools, speaks Model Context Protocol, runs parallel and pipelined subagents, integrates with Language Server Protocol for real code intelligence, compacts its own context, and logs every permission decision to an audit trail.
Built to answer one question: where exactly do coding models break down when you try to use them as real engineers?

Follow @Aditya-PS-05 on GitHub for more projects. Hacking on AI coding agents, agent infrastructure, and model evaluation tooling.

Run uv pip install -e . and launch codesm. You get a fully instrumented coding agent that logs every failure mode, not just the successes.

Codesm TUI

Codesm is deliberately verbose about what it is doing. Every tool call, permission prompt, compaction event, and subagent spawn shows up in the TUI tree, because you cannot build an eval for a failure mode you cannot see.

Overview

Codesm is a terminal first AI coding agent written in Python. It speaks to multiple providers (Anthropic Claude, OpenAI, OpenRouter routed models, and local Ollama), exposes a wide tool surface, and runs a ReAct loop that can fan out into parallel subagents or chain them into pipelines.

It is not trying to be the fastest or the most polished coding agent in the world. Tools like Claude Code, Cursor, Windsurf, Amp, and Aider already exist and are excellent. Codesm exists for a different reason.

Most coding agent failures happen in places you cannot see. Context windows silently overflow. Tool calls arrive out of order. Permission systems get bypassed. Subagents hallucinate tool names that do not exist in the registry. Providers disagree about edge case tool schemas. When you only use a closed source agent, you learn what works. You do not learn what does not.

I built Codesm to make every one of those failure modes visible, loggable, and reproducible. Every tool call is auditable. Every compaction is logged with token counts. Every permission denial becomes a structured event. Every subagent spawn is tree rendered in the TUI. The goal is not to hide the complexity of agent execution. The goal is to surface it.

This makes Codesm useful in three ways: as a real coding agent for day to day work, as a testbed for trying new orchestration patterns, and as a rig for studying how different models fail at the same task.

Why "Codesm"?

The name is code plus the same "ism" suffix you see in aphorism, mechanism, organism. It implies a system, a set of habits, a way of doing a thing. Codesm is the code writing system I built to figure out my own habits around working with coding models, and where those habits diverge from what the models actually do.

(It also reads nicely as "code ism": a philosophy, not a tool. That is on purpose.)

Overview
- Why "Codesm"?
Features
Failure Modes Observed
Installation
Usage
Configuration
- Environment Variables
MCP Integration
How It Works
Architecture
Development
- Prerequisites
- How to Run
Supported Platforms
CLI Reference
Contributing
Acknowledgments
License

Features

Many providers. Anthropic Claude, OpenAI, OpenRouter routed models, and local Ollama. Same ReAct loop, four backends. Route different subagents to different models based on task (Sonnet for coding, Flash for search, o1 for deep reasoning).
ReAct loop. Canonical reason then act agent loop with streaming, automatic iteration limits, and per iteration context budget checks. Implemented in codesm/agent/loop.py.
Thirty built in tools. bash, read, write, edit, multiedit, patch, grep, glob, ls, codesearch (embedding based), lsp (symbols, diagnostics, references), git, websearch, webfetch, oracle (deep reasoning), refactor, testgen, bug_localize, code_review, mermaid, and more. All registered through a central tool/registry.py.
MCP server integration. Speaks Model Context Protocol natively. Load external tools from any MCP server (mcp-servers.json), or expose Codesm's own tools over MCP to other agents. Full client, codegen, and sandbox implementation in codesm/mcp/.
Parallel subagents. parallel_tasks tool runs up to ten subagents concurrently via asyncio.gather, with auto routing, fail fast, and per task timing. Built for embarrassingly parallel work (find all API endpoints AND analyze auth flow AND locate tests).
Pipeline subagents. pipeline tool chains subagents sequentially, passing each step's output to the next. Up to five stages. Built for compositional tasks where later stages depend on earlier ones.
Staged orchestration. orchestrate tool: sequential stages, parallel tasks within each stage. The natural shape for "research then plan then implement then test" workflows.
Context compaction. ContextManager estimates tokens, triggers compaction at a configurable ratio of the max, and summarizes older messages via an LLM while preserving recent turns. Wired directly into the ReAct loop so compaction happens mid conversation, not just at session boundaries.
LSP backed code intelligence. Real Language Server Protocol integration (codesm/lsp/) for symbol lookup, diagnostics, hover, and references. Gives the agent ground truth about types and symbols instead of making it guess from context.
Embedding code search. codesearch tool uses sentence transformers for semantic code retrieval, not just string match. Handy when the agent has no idea what file to read next.
Permission system. Structured permission requests for file writes, edits, and shell commands via codesm/permission/. Every grant and deny goes to an append only audit log.
Audit log. codesm/audit/ records file operations, bash executions, permission decisions, and tool call traces. Designed so you can replay a session and reconstruct exactly what the agent did.
Session management. Each run is a session: title, topics, summary, message history, event stream. Sessions persist, so you can resume a conversation or inspect a past run.
Textual TUI. Collapsible tool call tree, streaming text, thinking display, oracle and subagent widgets, inline diffs for file edits, command palette, slash commands. Built on Textual.
Skills system. Skill suggestions aware of file context. The agent gets different prompts depending on whether it is editing Python, Rust, TypeScript, or SQL. Implemented in codesm/skills/.
Multiple memory layers. Session memory, project memory (CLAUDE.md and AGENTS.md style files), and topic indexed rolling summaries.

Failure Modes Observed

Why this section exists: Codesm is instrumented to surface failure modes that most agents hide. These are real things I hit while building and using it. Each one has the shape of a future benchmark or eval.

Silent context overflow. ReAct loops blow up context fast. Every tool call appends a tool_use and a tool_result block. By iteration twenty of a real coding task, you are often past the model's useful attention window even if you are still under its hard limit. Codesm's ContextManager monitors token estimates and triggers an LLM based compaction before the loop stalls. The summarizer is provider agnostic. See session/summarize.py for the three paths (_summarize_with_anthropic, _summarize_with_openai, _summarize_with_openrouter).
Out of order tool call streaming. Different providers interleave text and tool_use blocks differently when streaming. Claude emits text and tool_use in the order they were generated; OpenAI's Chat Completions stream has a different timing. A naive TUI that renders chunks as they arrive will show tool calls above the reasoning that justifies them. Fixed in commit f024ac2 (fix(tui): display text and tool calls in sequential order) by buffering chunks until a message block is complete, then rendering in emission order.
Tool name hallucination. Models occasionally emit a tool_use block with a tool name that does not exist in the registry, often a near miss like read_file instead of read, or a tool from a previous conversation that no longer exists. Codesm's registry returns a recoverable error ("Unknown tool: <name>. Available: ...") instead of crashing, so the model can self correct on the next turn. This turns a hard failure into a measurable self correction signal.
Permission bypass via composition. Giving an agent bash is effectively giving it everything. It can rm, curl, run python -c, or dump secrets with cat ~/.ssh/id_rsa. Codesm's permission system wraps bash, write, and edit through a Permission gate with configurable allow lists and an audit trail. The interesting failure mode is composition: a model denied rm will sometimes try to accomplish the same thing via bash -c "python -c \"import os; os.remove(...)\"". Logging denials with the full command string makes these attempts visible and evaluable.
Orchestration mode mismatch. Given a multi step task, models default to sequential execution even when subtasks are independent. Asking the same model the same question with parallel_tasks vs a plain prompt produces dramatically different wall clock times and token usage. This is an evaluation axis in its own right: how well does the model choose between parallel_tasks, pipeline, and plain sequential tool calls? Codesm exposes all three primitives so you can measure it.
Subagent result reintegration. When a parallel subagent returns a long result, the parent agent's next turn sometimes ignores it or summarizes it incorrectly. This is a context attention failure, not a capability failure. Codesm logs each subagent's full output to the event stream so you can diff what was produced against what was used.

Each of these is a real, reproducible phenomenon, not a theoretical concern. They are the raw material for the kind of eval suites coding model teams build.

Installation

Quick Start

# Clone and install with uv (recommended)
git clone https://github.com/Aditya-PS-05/codesm
cd codesm
uv pip install -e .

# Launch the TUI
codesm

# Or run directly with uv
uv run codesm

That is it. Set ANTHROPIC_API_KEY (or OPENAI_API_KEY, or point at a local Ollama) and start typing.

PyPI release: A proper PyPI package (pip install codesm) is on the roadmap. For now, install from source. It is a single uv pip install -e . away.

Prerequisites

Python 3.12 or newer
uv (recommended) or pip for dependency management
At least one LLM provider configured:
- Anthropic (default): ANTHROPIC_API_KEY, docs
- OpenAI: OPENAI_API_KEY, docs
- OpenRouter: OPENROUTER_API_KEY, docs
- Ollama (local): install Ollama, pull a model (ollama pull llama3.1), then point Codesm at it
Optional LSP servers for richer code intelligence: pylsp or pyright (Python), rust-analyzer (Rust), typescript-language-server (TS and JS)

From Source

# Clone and install in editable mode
git clone https://github.com/Aditya-PS-05/codesm
cd codesm
uv venv
source .venv/bin/activate
uv pip install -e ".[dev]"

# Run tests
pytest tests/ -v

# Launch
codesm

Usage

Basic Commands

# Launch the interactive TUI (default)
codesm

# Point at a specific provider and model
codesm --provider anthropic --model claude-sonnet-4-5
codesm --provider openai --model gpt-4o
codesm --provider ollama --model llama3.1

# Resume a previous session
codesm --resume <SESSION_ID>

# Run a one shot task without the TUI (scriptable)
codesm run "Add a docstring to the hello() function in /tmp/test.py"

Inside the TUI, slash commands control the session:

/help           Show all slash commands
/provider       Switch LLM provider mid session
/model          Switch model
/compact        Manually trigger context compaction
/tools          List available tools
/sessions       Browse past sessions
/clear          Clear the current conversation
/quit           Exit

Providers

Codesm supports four provider backends, each routed through a common interface in codesm/provider/. Different subagents can use different providers. A search subagent might use fast and cheap Gemini Flash while a reasoning subagent uses o1.

# Anthropic Claude (default)
export ANTHROPIC_API_KEY="sk-ant-..."
codesm --provider anthropic --model claude-sonnet-4-5

# OpenAI
export OPENAI_API_KEY="sk-..."
codesm --provider openai --model gpt-4o

# OpenRouter (routes to any model)
export OPENROUTER_API_KEY="sk-or-..."
codesm --provider openrouter --model anthropic/claude-3.5-sonnet

# Ollama (local, no API key needed)
ollama serve
ollama pull llama3.1
codesm --provider ollama --model llama3.1

Per subagent provider routing is configured in ~/.config/codesm/config.toml:

[providers.default]
provider = "anthropic"
model = "claude-sonnet-4-5"

[providers.finder]
provider = "openrouter"
model = "google/gemini-flash-1.5"

[providers.oracle]
provider = "openai"
model = "o1"

Tool System

Codesm ships with 30 built in tools registered through codesm/tool/registry.py. They fall into broad categories:

Category	Tools
File ops	`read`, `write`, `edit`, `multiedit`, `multifile_edit`, `patch`
Search	`grep`, `glob`, `ls`, `codesearch` (semantic), `finder`
Shell	`bash` (gated by permissions)
Code intelligence	`lsp` (symbols, hover, references), `diagnostics`
Git	`git` (status, diff, blame, log)
Web	`websearch`, `webfetch`, `web`
Subagents	`parallel_tasks`, `pipeline`, `orchestrate`, `oracle` (deep reasoning), `task`, `batch`
Code quality	`refactor`, `testgen`, `bug_localize`, `code_review`
Docs and diagrams	`mermaid`, `handoff`, `read_thread`, `find_thread`
MCP bridge	`mcp_execute` (call any tool from any connected MCP server)

Each tool's description is loaded from a .txt file next to its .py implementation, so prompt tuning does not require touching code. See codesm/tool/bash.txt for an example.

Parallel Subagents

The parallel_tasks tool runs up to ten subagents concurrently. Inspired by opencode's batch and task pattern.

{
  "tasks": [
    {
      "subagent_type": "researcher",
      "prompt": "Find all API endpoints in the codebase",
      "description": "Find API endpoints"
    },
    {
      "subagent_type": "researcher",
      "prompt": "Analyze the authentication flow",
      "description": "Analyze auth flow"
    },
    {
      "subagent_type": "finder",
      "prompt": "Find all test files",
      "description": "Find test files"
    }
  ],
  "fail_fast": false
}

Subagent types:

Type	Best For	Default Model
`coder`	Multi file edits, feature implementation	Claude Sonnet
`researcher`	Read only code analysis	Claude Sonnet
`reviewer`	Bug detection, security review	Claude Sonnet
`planner`	Implementation plans	Claude Sonnet
`finder`	Fast code search	Gemini Flash
`oracle`	Deep reasoning	o1
`librarian`	Multi repo research	Claude Sonnet
`auto`	Router picks the best agent for the task	Varies

Features:

Up to ten concurrent tasks (configurable cap to prevent resource exhaustion)
fail_fast: true cancels remaining tasks on first failure
Per task timing and success or failure indicators
Combined result aggregation with truncation for long outputs

Pipeline Subagents

For sequential workflows where each step reads the previous step's output:

{
  "steps": [
    {
      "subagent_type": "researcher",
      "prompt": "Find all usages of the legacy_auth() function"
    },
    {
      "subagent_type": "planner",
      "prompt": "Plan a migration from legacy_auth() to the new auth system using the findings above"
    },
    {
      "subagent_type": "coder",
      "prompt": "Execute the migration plan"
    }
  ]
}

Each stage gets the previous stage's result injected into its prompt. Up to five pipeline steps.

For staged workflows (sequential stages, parallel tasks within each stage), use orchestrate:

{
  "stages": [
    [
      {"subagent_type": "researcher", "prompt": "Analyze current auth system"},
      {"subagent_type": "finder",     "prompt": "Find all auth related files"}
    ],
    [
      {"subagent_type": "planner",    "prompt": "Plan auth improvements"}
    ],
    [
      {"subagent_type": "coder",      "prompt": "Implement planned changes"},
      {"subagent_type": "coder",      "prompt": "Add tests for new auth code"}
    ]
  ],
  "fail_fast": true
}

Context Management

The ContextManager tracks estimated token usage and triggers compaction before the context window fills up. Compaction preserves a configurable "recent budget" of turns and replaces the older history with an LLM generated summary.

Configuration (defaults):

max_tokens = 128000           # adjust per model
compact_trigger_ratio = 0.75  # start compacting at 75% full
recent_budget_ratio   = 0.30  # keep the last 30% of messages untouched

Compaction runs automatically inside the ReAct loop. See codesm/agent/loop.py line 44 to 49. You can also trigger it manually with /compact in the TUI.

Permissions and Audit

Codesm's permission system (codesm/permission/permission.py) gates every destructive operation. Each request carries:

Action (bash, write, edit, delete)
Resource (the file path, command string, or URL)
Session context (who is asking, what for)

The default policy prompts the user interactively (via the TUI); in non interactive mode it falls back to a config driven allow or deny list.

Every grant and denial is recorded in the audit log via codesm/audit/. The log captures:

File operations (create, update, delete, diff summary)
Bash executions (command, exit code, duration)
Permission decisions (granted, denied, user cancelled)
Tool call traces (tool name, arguments, result, timing)

Read it with:

codesm audit show <SESSION_ID>
codesm audit recent

Sessions

Every Codesm run is a session. Sessions have:

ID: deterministic, resumable
Title: auto generated from the first user message via session/title.py
Topics: indexed by session/topics.py for fast search
Summary: rolling summary from the compaction pipeline
Events: structured event stream (tool calls, permissions, errors)

codesm sessions list              # List recent sessions
codesm sessions show <ID>         # Print session details
codesm --resume <ID>              # Resume a session in the TUI

Configuration

Codesm reads its config from ~/.config/codesm/config.toml. Example:

[default]
provider = "anthropic"
model = "claude-sonnet-4-5"
max_iterations = 50

[context]
max_tokens = 128000
compact_trigger_ratio = 0.75
recent_budget_ratio = 0.30

[permissions]
# Default policy: "ask" or "allow" or "deny"
bash   = "ask"
write  = "ask"
edit   = "allow"
delete = "ask"

# Command level allow list for bash
bash_allow = ["ls", "cat", "grep", "rg", "pytest", "cargo test"]

[tui]
theme = "dark"
auto_compact_indicator = true

[providers.finder]
provider = "openrouter"
model = "google/gemini-flash-1.5"

[providers.oracle]
provider = "openai"
model = "o1"

Environment Variables

Variable	Purpose
`ANTHROPIC_API_KEY`	Required for the Anthropic provider
`OPENAI_API_KEY`	Required for the OpenAI provider
`OPENROUTER_API_KEY`	Required for OpenRouter routed models
`OLLAMA_HOST`	Ollama server URL (default `http://localhost:11434`)
`CODESM_CONFIG_DIR`	Override config directory (default `~/.config/codesm/`)
`CODESM_DATA_DIR`	Override data directory (default `~/.local/share/codesm/`)
`CODESM_MAX_ITERATIONS`	Cap on ReAct loop iterations per turn
`CODESM_LOG_LEVEL`	`DEBUG`, `INFO`, `WARNING`, `ERROR` (default `INFO`)

MCP Integration

Codesm is a full Model Context Protocol citizen. It can both consume tools from external MCP servers and expose its own tools as an MCP server.

Loading external MCP tools

Add servers to ~/.config/codesm/mcp-servers.json (a project level override at ./mcp-servers.json is also supported):

{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem", "/home/aditya"]
    },
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {
        "GITHUB_PERSONAL_ACCESS_TOKEN": "${GITHUB_TOKEN}"
      }
    }
  }
}

On startup, Codesm connects to each server, fetches the tool list, and registers them through codesm/mcp/manager.py. They appear alongside the built in tools in the registry.

Exposing Codesm's tools over MCP

codesm mcp-server --port 3456

Any MCP compatible agent (Claude Code, Cursor, Windsurf) can now call Codesm's 30 built in tools, including parallel_tasks, oracle, and codesearch, from its own context.

How It Works

┌──────────────┐     ┌──────────────────┐     ┌──────────────────┐
│     You      │────>│     Codesm       │────>│   LLM Provider   │
│  (TUI input  │     │   ReAct loop     │     │  Anthropic       │
│   + slash    │     │   Tool registry  │     │  OpenAI          │
│   commands)  │     │   Context mgr    │     │  OpenRouter      │
│              │<────│   Permissions    │<────│  Ollama          │
└──────────────┘     │   Audit log      │     └──────────────────┘
                     └────────┬─────────┘
                              │
                ┌─────────────┼─────────────┐
                │             │             │
                ▼             ▼             ▼
        ┌───────────┐  ┌───────────┐  ┌───────────┐
        │  Built in │  │    MCP    │  │ Subagents │
        │   tools   │  │  servers  │  │ (parallel │
        │  (bash,   │  │ (external │  │  pipeline │
        │   read,   │  │   tools)  │  │  orches   │
        │  write,   │  │           │  │   trate)  │
        │   LSP...) │  │           │  │           │
        └───────────┘  └───────────┘  └───────────┘

You type a task in the TUI (or pass it via codesm run)
The ReAct loop sends the conversation and tool schemas to the provider
The provider streams back text and/or tool calls
The TUI renders text chunks and buffers tool calls until the block is complete
Each tool call is dispatched through the registry:
- Built in tools execute inline
- MCP tools are proxied to the external server via stdio or HTTP
- Subagent tools (parallel_tasks, pipeline, orchestrate) spawn new ReAct loops with fresh tool registries
Before each provider call, the ContextManager checks token usage and compacts if needed
Destructive operations pass through the permission system; everything lands in the audit log
When the provider stops requesting tools, the session completes and the result streams to the TUI

Architecture

The full agent execution graph, including parallel and pipeline orchestration:

flowchart TD
    A[User Input] --> B[Agent.stream]
    B --> C[ReAct Loop Execute]
    C --> D[Provider API Call]
    D --> E{Response Type?}

    E -->|Text| F[Add Assistant Message]
    E -->|Tool Call| G[Extract Tool Call]
    E -->|Parallel Tasks| PA[Parallel Subagent Spawning]

    G --> H{Tool Type?}
    H -->|Built in| J[Direct Tool Execution]
    H -->|MCP Tool| I[MCP Execute]

    I --> K[Generate Python Code]
    K --> L[Execute in Subprocess]
    L --> M[MCP Client Call]
    M --> N[MCP Server Process]
    N --> O[Tool Implementation]
    O --> P[Tool Result]

    J --> Q[Tool.execute method]
    Q --> R{Tool Category?}
    R -->|File Ops| S[read, write, edit]
    R -->|Search| T[grep, glob, codesearch]
    R -->|External| U[bash, webfetch]
    R -->|Subagent| V[task, oracle]

    S --> W[File System Operations]
    T --> X[Search Operations]
    U --> Y[External Process or API]
    V --> Z[Spawn Subagent]

    W --> P
    X --> P
    Y --> P
    Z --> AA[Subagent Result]
    AA --> P

    PA --> PB{Orchestration Type?}
    PB -->|parallel_tasks| PC[Concurrent Execution]
    PB -->|orchestrate| PD[Staged Execution]
    PB -->|pipeline| PE[Sequential Chain]

    PC --> PF[asyncio.gather]
    PD --> PG[Stage 1 Parallel] --> PH[Stage 2 Parallel] --> PI[Stage N Parallel]
    PE --> PJ[Step 1] --> PK[Pass Result] --> PL[Step 2]

    PF --> PM[Subagent 1]
    PF --> PN[Subagent 2]
    PF --> PO[Subagent N]

    PM --> PQ[Aggregate Results]
    PN --> PQ
    PO --> PQ
    PI --> PQ
    PL --> PQ
    PQ --> P

    P --> BB[Add Tool Result Message]
    BB --> CC[Update Session State]
    CC --> DD{More Tool Calls?}

    DD -->|Yes| G
    DD -->|No| EE[Continue ReAct Loop]
    EE --> D

    F --> FF[Session Complete]

    CC --> GG[Context Manager]
    GG --> HH{Should Compact?}
    HH -->|Yes| II[LLM Summarize]
    HH -->|No| JJ[Continue]
    II --> JJ

Package layout:

codesm/agent/: ReAct loop, agent, subagent, router, orchestrator
codesm/provider/: Anthropic, OpenAI, OpenRouter, and Ollama clients
codesm/tool/: all built in tools, registry, descriptions
codesm/mcp/: MCP client, manager, sandbox, codegen, server
codesm/session/: session state, context manager, summarizer, topics
codesm/permission/: permission system and request types
codesm/audit/: append only audit log
codesm/lsp/: Language Server Protocol client
codesm/search/: embedding based code search
codesm/memory/: project, session, and topic memory layers
codesm/skills/: skill suggestions aware of file context
codesm/tui/: Textual app, chat, modals, command palette, autocomplete
codesm/config/: config schema and loader
codesm/snapshot/: file snapshots for atomic edits and rollback

Development

Quick setup: See the Quick Start. This section is for contributors.

Prerequisites

python --version   # 3.12+
uv --version       # latest

# Optional: a local Ollama server for offline testing
ollama --version

How to Run

# Clone and set up
git clone https://github.com/Aditya-PS-05/codesm
cd codesm
uv venv
source .venv/bin/activate
uv pip install -e ".[dev]"

# Run the TUI
codesm

# Run a one shot task
codesm run "Summarize the README"

# Run the test suite
pytest tests/ -v

# Lint (if ruff is installed)
ruff check codesm/

Advanced Development

Project Scripts

Command	Description
`uv pip install -e ".[dev]"`	Install with dev dependencies (pytest, pytest-asyncio)
`pytest tests/ -v`	Run the test suite
`pytest tests/test_mcp.py`	Run just the MCP integration tests
`python -m codesm.tui.app`	Launch the TUI directly (skip the CLI entry point)
`codesm run <prompt>`	Run a one shot task without entering the TUI

Repository Layout

codesm/: the Python package
tests/: unit and integration tests
examples/: runnable examples, including mcp_demo.py and mcp_code_execution_demo.py
prompts/: system prompts for agents and subagents
packages/: reserved for future split packages (e.g. standalone MCP server)
assets/: logo, screenshots, demo media
mcp-servers.json: MCP server registry

Testing

# Unit tests
pytest tests/ -v

# Single file
pytest tests/test_mcp.py -v

# Run with coverage
pytest tests/ --cov=codesm --cov-report=term-missing

Supported Platforms

Platform	Architecture	Status
Linux	x86_64	Primary development target
Linux	aarch64	Supported
macOS	aarch64 (Apple Silicon)	Supported
macOS	x86_64	Supported
Windows	x86_64	Experimental (Textual TUI works; some tools assume POSIX shells)

Codesm is pure Python. No native compilation beyond what pip resolves for its dependencies. If you have a working Python 3.12 and can install textual, you can run Codesm.

CLI Reference

codesm [OPTIONS]
  --provider <PROVIDER>     anthropic, openai, openrouter, ollama
  --model <MODEL>           Model name (e.g. claude-sonnet-4-5, gpt-4o, llama3.1)
  --resume <SESSION_ID>     Resume a past session
  --config <PATH>           Override config file path
  --log-level <LEVEL>       DEBUG, INFO, WARNING, ERROR

codesm run <PROMPT>         Run a one shot task without the TUI
codesm sessions list        List recent sessions
codesm sessions show <ID>   Print session details
codesm audit show <ID>      Print audit log for a session
codesm audit recent         Print recent audit entries
codesm mcp-server           Start Codesm as an MCP server for other agents
codesm --help               Show full help

Environment variables:

ANTHROPIC_API_KEY      Required for Anthropic provider
OPENAI_API_KEY         Required for OpenAI provider
OPENROUTER_API_KEY     Required for OpenRouter provider
OLLAMA_HOST            Ollama server URL (default http://localhost:11434)
CODESM_CONFIG_DIR      Override ~/.config/codesm/
CODESM_DATA_DIR        Override ~/.local/share/codesm/
CODESM_MAX_ITERATIONS  Cap the ReAct loop per turn
CODESM_LOG_LEVEL       Logging verbosity

Contributing

Contributions are welcome. I especially want new tools, new subagent types, new failure modes documented in Failure Modes Observed, and provider adapters.

TL;DR for a first PR:

Fork the repo and create a feature branch.
Make your change, add a test under tests/.

Run locally:

pytest tests/ -v
ruff check codesm/  # if you have ruff installed

Commit with a Conventional Commits message (feat:, fix:, docs:, refactor:...).
Open a PR describing the why, not just the what.

If you are adding a new tool, the convention is:

codesm/tool/<name>.py: the implementation (subclass BaseTool, implement execute)
codesm/tool/<name>.txt: the prompt description shown to the model
Register it in codesm/tool/registry.py
Add a test under tests/test_tools_<name>.py

Acknowledgments

Anthropic and OpenAI for the model APIs Codesm is built on top of
Ollama for making local model inference painless
OpenRouter for unified routing across providers
Textual and Rich for the TUI framework
Typer for the CLI ergonomics
The Model Context Protocol team at Anthropic for the MCP specification
The original ReAct paper (Yao et al., 2022). Still the most useful mental model for structuring agent loops.
Claude Code, Cursor, Amp, Aider, and opencode. Reference points for what a good coding agent feels like, and for specific design patterns (batch and task orchestration, staged execution) this project borrows from.
TryAudex for the README layout this project copied wholesale.
Every researcher who has written about coding agent failure modes. This tool exists to make more of them visible.

License

MIT, by Aditya Pratap Singh

If you find this project useful, please consider starring it or follow me on GitHub for more work on AI coding agents, agent infrastructure, and model evaluation tooling. Issues, PRs, and new failure modes all welcome.

Project details

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

adipras407

These details have not been verified by PyPI

Release history Release notifications | RSS feed

This version

0.1.0

Apr 12, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

codesm-0.1.0.tar.gz (300.3 kB view details)

Uploaded Apr 12, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

codesm-0.1.0-py3-none-any.whl (363.1 kB view details)

Uploaded Apr 12, 2026 Python 3

File details

Details for the file codesm-0.1.0.tar.gz.

File metadata

Download URL: codesm-0.1.0.tar.gz
Upload date: Apr 12, 2026
Size: 300.3 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for codesm-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`a17a0296c58201b353ac436e1e16523044efa58ee62f512fc5566756cc758fa3`
MD5	`60d22ff0de428b439cda1ce7434f36dd`
BLAKE2b-256	`6ae9554578858dffc2345d023701689a4bbef7e3a1d7c5e82ab8c027d6dfd0c5`

See more details on using hashes here.

Provenance

The following attestation bundles were made for codesm-0.1.0.tar.gz:

Publisher: publish.yml on Aditya-PS-05/codesm

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: codesm-0.1.0.tar.gz
- Subject digest: a17a0296c58201b353ac436e1e16523044efa58ee62f512fc5566756cc758fa3
- Sigstore transparency entry: 1281685858
- Sigstore integration time: Apr 12, 2026
Source repository:
- Permalink: Aditya-PS-05/codesm@87e5101352a6da8d7e55f8f7eab15858fd814208
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/Aditya-PS-05
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@87e5101352a6da8d7e55f8f7eab15858fd814208
- Trigger Event: push

File details

Details for the file codesm-0.1.0-py3-none-any.whl.

File metadata

Download URL: codesm-0.1.0-py3-none-any.whl
Upload date: Apr 12, 2026
Size: 363.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for codesm-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`baab5d759fcda7676a443c37025b8e96659d1b8e23023e2b3b07cb785944d68c`
MD5	`f478d1038f9cd060b00ea7bf9b9a8abe`
BLAKE2b-256	`59fd09f9a7b777626b926ca186a4f9a091678b6eb5893033b0ef1ef625c6c974`

See more details on using hashes here.

Provenance

The following attestation bundles were made for codesm-0.1.0-py3-none-any.whl:

Publisher: publish.yml on Aditya-PS-05/codesm

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: codesm-0.1.0-py3-none-any.whl
- Subject digest: baab5d759fcda7676a443c37025b8e96659d1b8e23023e2b3b07cb785944d68c
- Sigstore transparency entry: 1281686075
- Sigstore integration time: Apr 12, 2026
Source repository:
- Permalink: Aditya-PS-05/codesm@87e5101352a6da8d7e55f8f7eab15858fd814208
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/Aditya-PS-05
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@87e5101352a6da8d7e55f8f7eab15858fd814208
- Trigger Event: push

codesm 0.1.0

Navigation

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Project description

Codesm

Overview

Why "Codesm"?

Contents

Features

Failure Modes Observed

Installation

Quick Start

Prerequisites

From Source

Usage

Basic Commands

Providers

Tool System

Parallel Subagents

Pipeline Subagents

Context Management

Permissions and Audit

Sessions

Configuration

Environment Variables

MCP Integration

Loading external MCP tools

Exposing Codesm's tools over MCP

How It Works

Architecture

Development

Prerequisites

How to Run

Project Scripts

Repository Layout

Testing

Supported Platforms

CLI Reference

Contributing

Acknowledgments

License

Project details

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance