
Local-first CLI code intelligence tool with LangChain-powered RAG


CodeSage

Local-first code intelligence CLI. Search, analyze, and chat with your codebase using natural language — all running on your machine via Ollama.

Install

# macOS
brew install pipx && pipx ensurepath

# Linux
python3 -m pip install --user pipx && python3 -m pipx ensurepath

# Then, on either platform (open a new terminal first so the PATH change from ensurepath applies)
pipx install pycodesage

Optional features

# Tree-sitter AST analysis for JS, TS, Go, Rust (recommended)
pipx inject pycodesage "pycodesage[multi-language]"

# MCP server for Claude Desktop / Cursor / Windsurf
pipx inject pycodesage "pycodesage[mcp]"

# Both at once
pipx inject pycodesage "pycodesage[multi-language,mcp]"

Without the multi-language extra, review checks for non-Python files still run, but they use text/regex heuristics instead of real AST parsing.

Requirements

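CodeSage needs a running Ollama server with an LLM and an embedding model pulled:

ollama serve                   # start the server in a separate terminal (not needed if the Ollama app is already running)
ollama pull qwen2.5-coder:7b   # LLM
ollama pull nomic-embed-text   # embeddings (fast, lightweight)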

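Any other model available through Ollama works too; select it with codesage init --model or --embedding-model. For embeddings, qwen3-embedding gives significantly better semantic search than nomic-embed-text if you have the RAM.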
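Before indexing, it can help to confirm that the server is reachable and both models are present. The sketch below queries Ollama's GET /api/tags endpoint (which lists locally pulled models) at the default base_url. The helper names and the REQUIRED_MODELS list are illustrative assumptions, not part of CodeSage.

```python
import json
import urllib.request

# Illustrative only: models this sketch expects. Match them to llm.model and
# llm.embedding_model in .codesage/config.yaml.
REQUIRED_MODELS = ["qwen2.5-coder:7b", "nomic-embed-text"]


def missing_models(tags: dict, required: list[str]) -> list[str]:
    """Return the required models that do not appear in an /api/tags response."""
    available = set()
    for entry in tags.get("models", []):
        name = entry.get("name", "")
        available.add(name)
        # Ollama reports untagged pulls as "<name>:latest", so accept the bare name too.
        if name.endswith(":latest"):
            available.add(name[: -len(":latest")])
    return [m for m in required if m not in available]


def check_ollama(base_url: str = "http://localhost:11434") -> list[str]:
    """Query a running Ollama server and return any missing models.

    Raises OSError (urllib's URLError is a subclass) if the server is unreachable.
    """
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        tags = json.load(resp)
    return missing_models(tags, REQUIRED_MODELS)
```

An empty list from check_ollama() means everything is in place; otherwise run ollama pull for each name it returns.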

Quick start

cd your-project
codesage init       # detect languages, write .codesage/config.yaml
codesage index      # parse files, build vector + graph index
codesage chat       # ask questions about your code
codesage review     # review uncommitted changes

Commands

codesage init

Detects languages in the project and creates .codesage/config.yaml. Safe to re-run — it won't overwrite an existing config.

codesage init
codesage init --model llama3.1     # use a different LLM
codesage init --embedding-model qwen3-embedding
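Language detection of this kind usually comes down to counting file extensions while skipping excluded directories. The following is a hypothetical sketch of that idea, not CodeSage's actual detector: the EXTENSIONS map and the detect_languages name are assumptions, and the skipped directories mirror the exclude_dirs defaults shown under Configuration.

```python
import os
from collections import Counter
from pathlib import Path

# Hypothetical extension-to-language map for the languages CodeSage supports.
EXTENSIONS = {
    ".py": "python",
    ".rs": "rust",
    ".go": "go",
    ".ts": "typescript",
    ".tsx": "typescript",
    ".js": "javascript",
    ".jsx": "javascript",
}
# Mirrors the exclude_dirs example in config.yaml, plus the index directory itself.
EXCLUDE_DIRS = {"node_modules", "venv", ".git", ".codesage"}


def detect_languages(root: str) -> list[str]:
    """Return detected languages under root, most common first."""
    counts: Counter[str] = Counter()
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directories in place so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if d not in EXCLUDE_DIRS]
        for name in filenames:
            lang = EXTENSIONS.get(Path(name).suffix.lower())
            if lang:
                counts[lang] += 1
    return [lang for lang, _ in counts.most_common()]
```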

codesage index

Parses source files, generates embeddings, and builds the call graph. Only processes changed files by default.

codesage index              # incremental (changed files only)
codesage index --full       # reindex everything from scratch
codesage index --clear      # wipe the index first, then reindex
codesage index --no-learn   # skip pattern learning

Index data lives in .codesage/ — SQLite for metadata, LanceDB for vectors, KuzuDB for the call graph.
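Incremental indexing needs a way to tell which files changed since the last run. One common approach, sketched here under the assumption that content hashes are kept in SQLite alongside the other metadata, is to compare each file's SHA-256 digest with the stored value. The file_hashes table and both functions are hypothetical; CodeSage's real schema and change detection may differ.

```python
import hashlib
import sqlite3
from pathlib import Path


def open_manifest(db_path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a hypothetical table mapping file paths to content hashes."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS file_hashes (path TEXT PRIMARY KEY, sha256 TEXT NOT NULL)"
    )
    return conn


def changed_files(conn: sqlite3.Connection, paths: list[Path]) -> list[Path]:
    """Return files whose content differs from the stored hash and record the new hashes.

    Simplification: a real indexer would store the hash only after the file was
    re-indexed successfully, and would also drop rows for deleted files.
    """
    changed = []
    for path in paths:
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        row = conn.execute(
            "SELECT sha256 FROM file_hashes WHERE path = ?", (str(path),)
        ).fetchone()
        if row is None or row[0] != digest:
            changed.append(path)
            conn.execute(
                "INSERT OR REPLACE INTO file_hashes (path, sha256) VALUES (?, ?)",
                (str(path), digest),
            )
    conn.commit()
    return changed
```

Hashing content rather than comparing modification times means a touch or a branch switch that leaves a file byte-identical does not trigger re-embedding.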

codesage chat

Interactive session. Ask anything in plain English, or use slash commands:

/search <query>     semantic code search with RRF fusion
/deep <query>       multi-strategy deep analysis
/plan <task>        generate an implementation plan
/similar <name>     find similar functions/classes
/patterns [query]   show learned patterns from this codebase
/review [file]      review code changes with LLM
/security [path]    security vulnerability scan
/impact <name>      blast radius: who calls this, what breaks
/mode <mode>        switch to brainstorm / implement / review
/context            show or adjust context window settings
/stats              index statistics
/export [file]      save conversation to file
/clear              clear chat history
/help               show all commands

Natural language questions work too — you don't have to use slash commands.
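/search fuses rankings with RRF (Reciprocal Rank Fusion): each result scores the sum of 1 / (k + rank) over every ranked list it appears in, so items that several retrievers agree on rise to the top. A minimal sketch follows; k = 60 is the constant used in the original RRF paper, and which ranked lists CodeSage actually fuses (for example vector and keyword results) is an assumption here. The rrf_fuse name is hypothetical.

```python
from collections import defaultdict


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several best-first ranked lists of IDs with Reciprocal Rank Fusion.

    A document's fused score is the sum of 1 / (k + rank) over the lists it
    appears in, with 1-based ranks.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

For example, rrf_fuse([["load_config", "parse_args", "read_env"], ["load_config", "read_env"]]) ranks read_env above parse_args: appearing in both lists outweighs parse_args's single, higher position. RRF only uses ranks, so it needs no calibration between retrievers whose raw scores live on different scales.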

codesage review

Reviews uncommitted changes. Combines static analysis, pattern deviation detection, and (in full mode) semantic similarity search.

codesage review                    # all uncommitted changes, fast mode
codesage review --staged           # staged changes only (good for pre-commit)
codesage review --mode full        # add semantic similarity + LLM synthesis
codesage review --severity warning # block on warnings, not just high+critical
codesage review --format json      # JSON output for CI pipelines
codesage review --format sarif     # SARIF for GitHub Advanced Security
codesage review --verbose          # show timing and suppression details
codesage review path/to/subdir     # limit to a subdirectory
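In CI, the SARIF output can be post-processed by any SARIF 2.1.0 consumer. The sketch below counts results per level and prints one line per finding, relying only on fields defined by the SARIF 2.1.0 specification (runs, results, level, ruleId, message.text, locations). It assumes you have captured the SARIF output in a file; the rule IDs and levels CodeSage emits are not assumed, and the function names are hypothetical.

```python
import json
from collections import Counter


def summarize_sarif(sarif: dict) -> tuple[Counter, list[str]]:
    """Count SARIF results by level and format one line per finding."""
    levels: Counter = Counter()
    lines = []
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            # An absent level defaults to "warning" in SARIF 2.1.0
            # (rule metadata can override that; ignored in this sketch).
            level = result.get("level", "warning")
            levels[level] += 1
            loc = "?"
            locations = result.get("locations") or []
            if locations:
                phys = locations[0].get("physicalLocation", {})
                uri = phys.get("artifactLocation", {}).get("uri", "?")
                start = phys.get("region", {}).get("startLine", "?")
                loc = f"{uri}:{start}"
            rule = result.get("ruleId", "?")
            text = result.get("message", {}).get("text", "")
            lines.append(f"{level:8} {rule} {loc} {text}")
    return levels, lines


def load_sarif(path: str) -> dict:
    """Read a SARIF log from disk."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```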

What it checks:

| Category | Rules |
|---|---|
| Python static | Long functions, high complexity, deep nesting, too many params, god classes, missing return types, magic numbers |
| Rust/Go/JS/TS static | Cyclomatic complexity, long functions, deep nesting, param count, naming conventions (requires multi-language) |
| Security | Hardcoded secrets, SQL injection, eval/exec, unsafe deserialization, weak crypto, XSS sinks |
| Patterns | Deviations from your codebase's own learned patterns |

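To suppress a single finding inline, add # codesage:ignore GEN-LONG-LINE (with the ID of the rule to silence) to the flagged line, or put # codesage:ignore-next-line on the line above it. To exclude a whole file, add it to .codesageignore.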
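The two inline forms differ in scope. Assuming the first applies to the line it sits on and the second silences every rule on the following line (both assumptions about CodeSage's behavior), the hypothetical sketch below computes which (line, rule) pairs are suppressed. The regexes and function names are illustrative only.

```python
import re

# Hypothetical patterns modeled on the two comment forms described above.
IGNORE_RE = re.compile(r"#\s*codesage:ignore\s+([A-Z0-9-]+)")
IGNORE_NEXT_RE = re.compile(r"#\s*codesage:ignore-next-line\b")
ALL_RULES = "*"  # marker meaning "every rule on this line"


def suppressions(source: str) -> dict[int, set[str]]:
    """Map 1-based line numbers to the rule IDs suppressed on them."""
    result: dict[int, set[str]] = {}
    for lineno, line in enumerate(source.splitlines(), start=1):
        if IGNORE_NEXT_RE.search(line):
            result.setdefault(lineno + 1, set()).add(ALL_RULES)
        else:
            match = IGNORE_RE.search(line)
            if match:
                result.setdefault(lineno, set()).add(match.group(1))
    return result


def is_suppressed(supp: dict[int, set[str]], lineno: int, rule_id: str) -> bool:
    """True if rule_id is silenced on lineno, by name or by an ignore-next-line."""
    rules = supp.get(lineno, set())
    return ALL_RULES in rules or rule_id in rules
```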

codesage hook

Installs a git pre-commit hook that runs codesage review --staged before each commit.

codesage hook install    # install the hook
codesage hook uninstall  # remove it
codesage hook status     # check if installed

The hook blocks commits with findings at high severity or above. Bypass when needed: git commit --no-verify.

If you use the pre-commit framework instead, this repo ships a .pre-commit-hooks.yaml, so you can reference the hook from your project's .pre-commit-config.yaml:

repos:
  - repo: https://github.com/keshavashiya/codesage
    rev: v0.3.1
    hooks:
      - id: codesage-review

codesage mcp

MCP server for AI IDE integration. Always runs in global mode — all projects you've indexed with codesage index are available through a single server.

codesage mcp serve                    # stdio (default, for IDE use)
codesage mcp serve -t sse -p 8080     # HTTP/SSE for multi-client setups
codesage mcp setup                    # print IDE config
codesage mcp test                     # smoke-test all tools

MCP Setup

Run codesage mcp setup to get the config, or add this to your IDE:

{
  "mcpServers": {
    "codesage": {
      "command": "codesage",
      "args": ["mcp", "serve"]
    }
  }
}
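If your IDE's MCP config file already lists other servers, the codesage entry has to be merged into the existing mcpServers object rather than overwriting the file. A small standard-library sketch follows. The config file location varies by IDE and OS, so it is left as a parameter; the helper names are hypothetical, and the entry itself is the one shown above.

```python
import json
from pathlib import Path


def add_codesage_server(config: dict) -> dict:
    """Return a copy of an MCP client config with the codesage server added."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    # Same entry as the JSON snippet above; other servers are left untouched.
    servers["codesage"] = {"command": "codesage", "args": ["mcp", "serve"]}
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path) -> None:
    """Merge the codesage entry into a JSON config file, creating it if needed."""
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    path.write_text(json.dumps(add_codesage_server(config), indent=2) + "\n", encoding="utf-8")
```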

Available MCP tools (12)

| Tool | What it does |
|---|---|
| list_projects | List all indexed projects (global mode only) |
| get_developer_profile | Your coding style and learned patterns |
| search_code | Semantic search with confidence scoring |
| get_file_context | File content with definitions and security notes |
| get_stats | Index stats: files, elements, languages |
| review_code | Run a code review on a file or diff |
| analyze_security | Security vulnerability scan |
| explain_concept | How is X implemented in this codebase? |
| suggest_approach | Implementation guidance for a task |
| trace_flow | Callers and callees through the call graph |
| find_examples | Usage examples for a function or pattern |
| recommend_pattern | Patterns from your codebase's memory |
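trace_flow and the chat /impact command both walk the call graph. Conceptually, the blast radius of a function is its set of transitive callers: everything that might break if its behavior changes. The minimal sketch below runs a breadth-first search over an in-memory list of (caller, callee) edges. CodeSage keeps the graph in KuzuDB, so this illustrates the traversal rather than its actual query, and the blast_radius name is hypothetical.

```python
from collections import defaultdict, deque


def blast_radius(calls: list[tuple[str, str]], target: str) -> dict[str, int]:
    """Return every transitive caller of target with its distance in call hops.

    BFS over the reversed graph reaches direct callers at distance 1,
    their callers at distance 2, and so on.
    """
    callers: dict[str, set[str]] = defaultdict(set)
    for caller, callee in calls:
        callers[callee].add(caller)
    distance = {target: 0}
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for caller in sorted(callers[current]):
            if caller not in distance:  # skips cycles and recursive calls
                distance[caller] = distance[current] + 1
                queue.append(caller)
    del distance[target]
    return distance
```

For example, with edges main→run, run→load_config, cli→load_config and tests→cli, the blast radius of load_config is run and cli at one hop and main and tests at two.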

Configuration

Created by codesage init at .codesage/config.yaml. The most useful fields:

project_name: my-project
languages:
  - python      # auto-detected

llm:
  provider: ollama
  model: qwen2.5-coder:7b
  embedding_model: nomic-embed-text
  base_url: http://localhost:11434

exclude_dirs:
  - node_modules
  - venv
  - .git

Full configuration reference
llm:
  provider: ollama          # ollama | openai | anthropic
  model: qwen2.5-coder:7b
  embedding_model: nomic-embed-text
  base_url: http://localhost:11434
  temperature: 0.3
  max_tokens: 500
  request_timeout: 30.0

storage:
  vector_backend: lancedb
  use_graph: true           # enable call graph (KuzuDB)

security:
  enabled: true
  severity_threshold: medium
  block_on_critical: true

memory:
  enabled: true
  learn_on_index: true      # learn patterns during indexing
  min_pattern_confidence: 0.5

performance:
  embedding_batch_size: 200
  embedding_cache_size: 1000
  cache_enabled: true
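As an illustration of what embedding_batch_size and embedding_cache_size plausibly control (an assumption, not a description of CodeSage internals), the sketch below sends texts to Ollama's POST /api/embed endpoint in batches and keeps an LRU cache of recent vectors. The CachedEmbedder class and ollama_embed helper are hypothetical; the embed callable is injectable so the batching and caching logic can run without a server.

```python
import json
import urllib.request
from collections import OrderedDict
from typing import Callable

Vector = list[float]


def ollama_embed(texts: list[str], model: str = "nomic-embed-text",
                 base_url: str = "http://localhost:11434") -> list[Vector]:
    """Embed a batch of texts with one request to Ollama's /api/embed endpoint."""
    body = json.dumps({"model": model, "input": texts}).encode("utf-8")
    req = urllib.request.Request(f"{base_url}/api/embed", data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)["embeddings"]


class CachedEmbedder:
    """Hypothetical: batch embedding requests and keep an LRU cache of results."""

    def __init__(self, embed: Callable[[list[str]], list[Vector]],
                 batch_size: int = 200, cache_size: int = 1000):
        self.embed = embed
        self.batch_size = batch_size
        self.cache_size = cache_size
        self.cache: OrderedDict[str, Vector] = OrderedDict()

    def _remember(self, text: str, vector: Vector) -> None:
        self.cache[text] = vector
        self.cache.move_to_end(text)
        if len(self.cache) > self.cache_size:
            self.cache.popitem(last=False)  # evict the least recently used entry

    def embed_all(self, texts: list[str]) -> list[Vector]:
        unique = list(dict.fromkeys(texts))  # de-duplicate, keep order
        # Snapshot cache hits first so evictions later in this call cannot drop them.
        vectors = {t: self.cache[t] for t in unique if t in self.cache}
        missing = [t for t in unique if t not in vectors]
        for start in range(0, len(missing), self.batch_size):
            batch = missing[start:start + self.batch_size]
            embedded = self.embed(batch)
            if len(embedded) != len(batch):
                raise ValueError("embedding backend returned the wrong number of vectors")
            vectors.update(zip(batch, embedded))
        for text in unique:
            self._remember(text, vectors[text])
        return [vectors[t] for t in texts]
```

In use, CachedEmbedder(ollama_embed) would re-embed only texts it has not seen recently, sending at most batch_size of them per request.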

Language support

Language Indexing Static review Call graph
Python built-in
Rust multi-language ✓ AST-based
Go multi-language ✓ AST-based
TypeScript multi-language ✓ AST-based
JavaScript multi-language ✓ AST-based

Install pycodesage[multi-language] for Rust/Go/JS/TS. Without it, those files are still indexed and reviewed using text/regex heuristics.

Using OpenAI or Anthropic instead of Ollama

pipx inject pycodesage "pycodesage[openai]"
# or
pipx inject pycodesage "pycodesage[anthropic]"

Then set the provider and model in .codesage/config.yaml:

llm:
  provider: openai
  model: gpt-4o

Development

git clone https://github.com/keshavashiya/codesage.git
cd codesage
python3 -m venv venv && source venv/bin/activate
pip install -e ".[dev,multi-language,mcp]"
pytest tests/ -v

License

MIT


Download files

Download the file for your platform.

Source Distribution

pycodesage-0.3.2.tar.gz (230.3 kB)

Uploaded Source

Built Distribution


pycodesage-0.3.2-py3-none-any.whl (271.6 kB)

Uploaded Python 3

File details

Details for the file pycodesage-0.3.2.tar.gz.

File metadata

  • Download URL: pycodesage-0.3.2.tar.gz
  • Size: 230.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for pycodesage-0.3.2.tar.gz
Algorithm Hash digest
SHA256 2c467f7a86ea524f96a05362a031a74870a2dd5236baa2853e0684ddefec44c9
MD5 e19b68a30dc45d0764ecb1a5fede82c2
BLAKE2b-256 87a13885f6bd2037c83e2f651318fc0d7eb0d00d7935c1aa12e0a3adb24a8be9


File details

Details for the file pycodesage-0.3.2-py3-none-any.whl.

File metadata

  • Download URL: pycodesage-0.3.2-py3-none-any.whl
  • Size: 271.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for pycodesage-0.3.2-py3-none-any.whl
Algorithm Hash digest
SHA256 8bdebcfa12325173997be21a60f2c19d4d2655801b90ac3e30fd296103c47af9
MD5 dbede5bb8e82fe26bc99b2da84dc37c2
BLAKE2b-256 690beb4147dec5e63132551cfba32b5c34fdcdef4b0627196ef6d786a48e4836

