AI-era enterprise-grade code analysis tool with comprehensive HTML/CSS support, dynamic plugin architecture, and MCP integration

🌳 Tree-sitter Analyzer

English | 日本語 | 简体中文

Tree-sitter Analyzer is a local-first code context engine for AI-assisted development, combining fast repository retrieval, AST-based structural analysis, and secure MCP integration.

Its job is not just to parse code. Its job is to help humans and AI agents fetch only the code context they actually need, safely, quickly, and with structural precision.

find the right files → find the right matches → extract the right structure → send only the right context

Claude doesn't need to read your entire codebase. Neither do you.

17 languages · Project-boundary security · Claude Desktop / Cursor / Roo Code · CLI + Python API


✨ What's New in v1.11.0

  • Claude knows your project's skeleton before reading a single file: get_project_summary returns PageRank-ranked architecture nodes, the classes everything else extends. Validated on elasticsearch (40k files), spring-framework (11k), mybatis, and spring-petclinic.
  • Touch a critical class? Claude stops you first: modification_guard reads the architecture ranking. Rename Writeable in elasticsearch → verdict UNSAFE, rank #1, 4745 callers. No surprises.
  • New language = new file, not a rewrite: the plugin edge_extractors/ package ships Java, Python, and TypeScript today. Adding Kotlin is one file + one line.
  • 2x faster exploration on unfamiliar projects: end-to-end tested, 5 tool calls with summary vs 10+ without. Claude skips the blind search phase entirely.
  • Zero-config first-party filtering: Java reads groupId from pom.xml. Python uses sys.stdlib_module_names. No blacklists to maintain. Ever.
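
To make the ranking and guard ideas above concrete, here is a minimal sketch of PageRank over a toy dependency graph, followed by a guard that flags edits to the top-ranked node. It is not the project's implementation: the function names pagerank and guard_verdict, the class names, and the top_n threshold are all assumptions made for illustration.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Rank nodes of a directed graph given (source, target) dependency edges."""
    nodes = sorted({name for edge in edges for name in edge})
    out_links = {name: [] for name in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    count = len(nodes)
    rank = {name: 1.0 / count for name in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread over all nodes.
        dangling = sum(rank[name] for name in nodes if not out_links[name])
        base = (1.0 - damping) / count + damping * dangling / count
        new_rank = {name: base for name in nodes}
        for src, targets in out_links.items():
            if not targets:
                continue
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank


def guard_verdict(symbol, edges, rank, top_n=1):
    """Hypothetical guard: flag edits to symbols in the top-N of the ranking."""
    ordered = sorted(rank, key=rank.get, reverse=True)
    position = ordered.index(symbol) + 1
    callers = sum(1 for _, dst in edges if dst == symbol)
    verdict = "UNSAFE" if position <= top_n else "SAFE"
    return {"verdict": verdict, "rank": position, "callers": callers}


# Toy edges: "A depends on B" is written (A, B). All class names are invented.
EDGES = [
    ("OrderService", "BaseEntity"),
    ("UserService", "BaseEntity"),
    ("Invoice", "BaseEntity"),
    ("OrderService", "UserService"),
]
RANK = pagerank(EDGES)
print(guard_verdict("BaseEntity", EDGES, RANK))
```

In this toy graph BaseEntity is the class everything else depends on, so it collects the most rank and the guard reports it as unsafe to rename.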

📖 Full Changelog for complete version history.

🎬 See It In Action

Demo GIF coming soon - showcasing AI integration with SMART workflow


🎯 Why Tree-sitter Analyzer

Tree-sitter Analyzer is an open-source, local-first code context engine for helping AI assistants read only what matters in large codebases.

  • Minimal context, not whole-file stuffing: retrieve the smallest useful code regions before sending them to AI
  • Evidence-based analysis: combine tree-sitter structure with fd and ripgrep to surface relevant files, symbols, and paths
  • No heavy preprocessing required: useful on messy repositories where full indexing can be slow, stale, or difficult to maintain

Common Use Cases

  • Understand what a very large file or module is doing without loading the entire file into an AI prompt
  • Trace business logic, UI handlers, or bug-related code paths across a complex repository
  • Narrow AI context for Java and other large codebases before asking for analysis or changes

🚀 5-Minute Quick Start

Prerequisites

# Install uv (required)
# macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh
# Windows PowerShell
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"

# Install fd + ripgrep (required for search features)
brew install fd ripgrep          # macOS
winget install sharkdp.fd BurntSushi.ripgrep.MSVC  # Windows

📖 Detailed Installation Guide for all platforms.

Verify Installation

uv run tree-sitter-analyzer --show-supported-languages

🤖 AI Integration

Configure your AI assistant to use Tree-sitter Analyzer via the Model Context Protocol (MCP).

This works especially well when your assistant struggles with very large files, noisy repository-wide context, or legacy code that is too expensive to load all at once.

Claude Desktop / Cursor / Roo Code

Add to your MCP configuration:

{
  "mcpServers": {
    "tree-sitter-analyzer": {
      "command": "uvx",
      "args": [
        "--from", "tree-sitter-analyzer[mcp]",
        "tree-sitter-analyzer-mcp"
      ],
      "env": {
        "TREE_SITTER_PROJECT_ROOT": "/path/to/your/project",
        "TREE_SITTER_OUTPUT_PATH": "/path/to/output/directory"
      }
    }
  }
}

Configuration file locations:

  • Claude Desktop: %APPDATA%\Claude\claude_desktop_config.json (Windows) / ~/Library/Application Support/Claude/claude_desktop_config.json (macOS)
  • Cursor: Built-in MCP settings
  • Roo Code: MCP configuration
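
If the configuration file already lists other MCP servers, the entry has to be merged in rather than pasted over them. The sketch below shows one way to do that on an in-memory dict; the helper name add_analyzer_server and the "other-server" entry are assumptions for illustration, while the keys and values of the analyzer entry are copied from the configuration above.

```python
import json


def add_analyzer_server(config, project_root, output_path):
    """Return a copy of an MCP config dict with the analyzer entry added."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["tree-sitter-analyzer"] = {
        "command": "uvx",
        "args": ["--from", "tree-sitter-analyzer[mcp]", "tree-sitter-analyzer-mcp"],
        "env": {
            "TREE_SITTER_PROJECT_ROOT": project_root,
            "TREE_SITTER_OUTPUT_PATH": output_path,
        },
    }
    merged["mcpServers"] = servers
    return merged


# "other-server" stands in for whatever is already configured.
existing = {"mcpServers": {"other-server": {"command": "example"}}}
updated = add_analyzer_server(
    existing, "/path/to/your/project", "/path/to/output/directory"
)
print(json.dumps(updated, indent=2))
```

The input dict is left untouched, so the result can be inspected before anything is written back to disk.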

After restart, tell the AI: Please set the project root directory to: /path/to/your/project

📖 MCP Tools Reference for complete API documentation.


💻 Common CLI Commands

Installation

uv add "tree-sitter-analyzer[all,mcp]"  # Full installation

Top 5 Commands

# 1. Analyze file structure
uv run tree-sitter-analyzer examples/BigService.java --table full

# 2. Quick summary
uv run tree-sitter-analyzer examples/BigService.java --summary

# 3. Extract code section
uv run tree-sitter-analyzer examples/BigService.java --partial-read --start-line 93 --end-line 106

# 4. Find files and search content
uv run find-and-grep --roots . --query "class.*Service" --extensions java

# 5. Query specific elements
uv run tree-sitter-analyzer examples/BigService.java --query-key methods --filter "public=true"

📋 View Output Example

╭──────────────────────────────────────────────────────────────╮
│                   BigService.java Analysis                   │
├──────────────────────────────────────────────────────────────┤
│ Total Lines: 1419 | Code: 906 | Comments: 246 | Blank: 267   │
│ Classes: 1 | Methods: 66 | Fields: 9 | Complexity: 5.27 avg  │
╰──────────────────────────────────────────────────────────────╯
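
Command 3 above extracts a line range instead of a whole file. As a rough illustration of that idea only, the sketch below reads a range while streaming the file and stops as soon as the range ends. The function read_line_range is hypothetical, and treating the bounds as 1-based and inclusive is an assumption, not documented behavior of --partial-read.

```python
from pathlib import Path
import tempfile


def read_line_range(path, start_line, end_line):
    """Return lines start_line..end_line (1-based, inclusive) as one string."""
    if start_line < 1 or end_line < start_line:
        raise ValueError("expected 1 <= start_line <= end_line")
    selected = []
    with open(path, encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            if number > end_line:
                break  # stop early; the rest of the file is never read
            if number >= start_line:
                selected.append(line)
    return "".join(selected)


# Demo on a throwaway file rather than examples/BigService.java.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "Demo.java"
    demo.write_text("".join(f"// line {i}\n" for i in range(1, 201)), encoding="utf-8")
    snippet = read_line_range(demo, 93, 106)
    print(snippet.count("\n"), "lines extracted")
```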

📖 Complete CLI Reference for all commands and options.


🌍 Supported Languages

| Language | Support Level | Key Features |
|---|---|---|
| Java | ✅ Complete | Spring, JPA, enterprise features |
| Python | ✅ Complete | Type annotations, decorators |
| TypeScript | ✅ Complete | Interfaces, types, TSX/JSX |
| JavaScript | ✅ Complete | ES6+, React/Vue/Angular |
| C | ✅ Complete | Functions, structs, unions, enums, preprocessor |
| C++ | ✅ Complete | Classes, templates, namespaces, inheritance |
| C# | ✅ Complete | Records, async/await, attributes |
| SQL | ✅ Enhanced | Tables, views, procedures, triggers |
| HTML | ✅ Complete | DOM structure, element classification |
| CSS | ✅ Complete | Selectors, properties, categorization |
| Go | ✅ Complete | Structs, interfaces, goroutines |
| Rust | ✅ Complete | Traits, impl blocks, macros |
| Kotlin | ✅ Complete | Data classes, coroutines |
| PHP | ✅ Complete | PHP 8+, attributes, traits |
| Ruby | ✅ Complete | Rails patterns, metaprogramming |
| YAML | ✅ Complete | Anchors, aliases, multi-document |
| Markdown | ✅ Complete | Headers, code blocks, tables |

📖 Features Documentation for language-specific details.


📊 Features Overview

| Feature | Description | Learn More |
|---|---|---|
| SMART Workflow | Set-Map-Analyze-Retrieve-Trace methodology | Guide |
| Outline-First Navigation | get_code_outline: hierarchical structure map before content retrieval | MCP Tools |
| MCP Protocol | Native AI assistant integration | API Docs |
| Token Optimization | TOON format delivers 54-56% token reduction; token-aware controls for large AI workflows | Features |
| File Search | fd-based high-performance discovery | CLI Reference |
| Content Search | ripgrep regex search | CLI Reference |
| Security | Project boundary protection | Architecture |

🔬 Grammar Coverage (MECE Framework)

Tree-sitter Analyzer guarantees zero False Positives in grammar coverage validation across all 17 supported languages.

Phase 1: MECE Architecture (2026-03)

New Architecture:

  • Tracks syntactic paths (node_type, parent_path) instead of just node types
  • Uses exact node identity matching (type + byte range + parent chain + file path)
  • Eliminates nested node misclassification (wrapper nodes no longer cause False Positives)

Why It Matters:

# OLD method: Position overlap → False Positives
@decorator       # Plugin extracts this
def foo():       # Validator incorrectly marks this as "covered" (it's not!)
    pass

# NEW method: Exact identity matching → Zero False Positives
# Only nodes actually extracted by the plugin are marked as covered
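
The difference between the two methods can be sketched in a few lines. This is an illustration under stated assumptions, not the validator's code: the NodeId class, the overlaps helper, the byte offsets, and the choice to model the extracted node as a decorated_definition wrapper are all invented for the example. Only the four identity fields (type, byte range, parent chain, file path) come from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeId:
    """Identity key built from type, byte range, parent chain, and file path."""
    node_type: str
    start_byte: int
    end_byte: int
    parent_chain: tuple
    file_path: str


def overlaps(a, b):
    """Old-style check: nodes 'match' if their byte ranges intersect."""
    return (
        a.file_path == b.file_path
        and a.start_byte < b.end_byte
        and b.start_byte < a.end_byte
    )


# Illustrative offsets for a file holding "@decorator\ndef foo():\n    pass\n".
FILE = "example.py"
wrapper = NodeId("decorated_definition", 0, 30, ("module",), FILE)
function = NodeId(
    "function_definition", 11, 30, ("module", "decorated_definition"), FILE
)

extracted = {wrapper}  # assume the plugin returned only the wrapper node
covered_by_overlap = any(overlaps(function, node) for node in extracted)
covered_by_identity = function in extracted

print(covered_by_overlap, covered_by_identity)  # prints: True False
```

The overlap check reports the nested function as covered because it sits inside the wrapper's byte range; the identity check does not, because no extracted node has the same key.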

Validation Commands

# Validate single language
python -c "from tree_sitter_analyzer.grammar_coverage.validator import validate_plugin_coverage_sync; r = validate_plugin_coverage_sync('python'); print(f'{r.coverage_percentage:.1f}% coverage')"

# Validate all languages
python -c "
from tree_sitter_analyzer.grammar_coverage.validator import validate_plugin_coverage_sync
langs = ['python', 'javascript', 'java', 'go', 'typescript', 'c', 'cpp', 'rust', 'ruby', 'php', 'kotlin', 'swift', 'scala', 'bash', 'yaml', 'json', 'sql']
for lang in langs:
    r = validate_plugin_coverage_sync(lang)
    status = '✅' if r.coverage_percentage == 100.0 else '❌'
    print(f'{status} {lang}: {r.coverage_percentage:.1f}% ({r.covered_node_types}/{r.total_node_types} node types covered)')
"

Example Output (new format):

✅ python: 100.0% (57/57 node types covered)
✅ javascript: 100.0% (58/58 node types covered)
✅ typescript: 100.0% (114/114 node types covered)
...
✅ sql: 100.0% (155/155 node types covered)

MECE Guarantees

  • Mutually Exclusive: Each node has a unique (type, parent_path) → no double counting
  • Collectively Exhaustive: Full AST traversal → no missing nodes
  • Zero False Positives: Exact matching → only truly extracted nodes marked as covered
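
As a rough sketch of how a full traversal and (type, parent_path) keys turn into a coverage percentage, the example below uses Python's built-in ast module as a stand-in for tree-sitter. The helper names syntactic_paths and coverage_percentage, and the pretend plugin output in EXTRACTED, are assumptions for illustration and say nothing about the real validator's internals.

```python
import ast


def syntactic_paths(source):
    """Collect every (node_type, parent_path) pair from a full AST walk."""
    paths = set()

    def visit(node, parents):
        node_type = type(node).__name__
        paths.add((node_type, parents))
        for child in ast.iter_child_nodes(node):
            visit(child, parents + (node_type,))

    visit(ast.parse(source), ())
    return paths


def coverage_percentage(all_paths, extracted_paths):
    """Share of syntactic paths that the (hypothetical) plugin extracted."""
    covered = all_paths & extracted_paths
    return 100.0 * len(covered) / len(all_paths)


SOURCE = "def foo():\n    return 1\n"
ALL_PATHS = syntactic_paths(SOURCE)
# Pretend the plugin only reported the function definition itself.
EXTRACTED = {("FunctionDef", ("Module",))}
print(f"{coverage_percentage(ALL_PATHS, EXTRACTED):.1f}% coverage")
```

Because the paths are stored in a set and intersected exactly, a path is counted once and only when the plugin reported that same path.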

📖 Grammar Coverage Framework for technical details and architecture.


🏆 Quality & Testing

| Metric | Value |
|---|---|
| Tests | 8,942+ automated tests |
| Coverage | See the coverage badge |
| Type Safety | 100% mypy compliance |
| Platforms | Windows, macOS, Linux |

# Run tests
uv run pytest tests/ -v

# Generate coverage report
uv run pytest tests/ --cov=tree_sitter_analyzer --cov-report=html

🔒 Security & Architecture

Tree-sitter Analyzer is designed with security-by-default principles for AI-assisted development workflows.

Security Model

Project Boundary Enforcement

  • All MCP tools validate file paths against project root boundaries
  • No access to files outside the configured project directory
  • Symlink traversal prevention
  • Path normalization prevents ../ escape attempts
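
A boundary check of this kind can be sketched with pathlib. This is a minimal illustration, not the project's validator; the function name is_inside_root is an assumption. It relies on Path.resolve(), which collapses ../ segments and follows symlinks before the comparison.

```python
from pathlib import Path
import tempfile


def is_inside_root(project_root, requested):
    """Return True only if `requested` resolves to a path under project_root."""
    root = Path(project_root).resolve()
    # An absolute `requested` replaces root in the join, so it is checked too.
    candidate = (root / requested).resolve()
    return candidate == root or root in candidate.parents


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "project"
    (root / "src").mkdir(parents=True)
    print(is_inside_root(root, "src/Main.java"))       # inside the root
    print(is_inside_root(root, "../secrets.txt"))      # escapes via ../
    print(is_inside_root(root, "src/../../elsewhere")) # escapes after normalizing
```

Resolving first and comparing afterwards is the important ordering: comparing raw strings would accept a path that only looks like it starts with the project root.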

Input Validation

  • JSON Schema validation on all MCP tool parameters
  • Type-safe Python API with strict mypy compliance
  • Sanitized user inputs before shell command execution
  • Pattern validation for glob/regex searches
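
One common way to keep user input away from a shell is to validate it and then pass it as separate argv elements. The sketch below builds a ripgrep command that way without running it. The function build_ripgrep_argv and the extension allow-list are assumptions for illustration; --line-number, --glob, and --regexp are existing ripgrep options. The re.compile call is only a rough pre-check, since ripgrep's regex dialect is not identical to Python's.

```python
import re


def build_ripgrep_argv(pattern, root, extensions=()):
    """Validate inputs and return an argv list for subprocess.run (no shell)."""
    try:
        re.compile(pattern)  # rough pre-check; dialects differ slightly
    except re.error as error:
        raise ValueError(f"invalid regex: {error}") from None
    argv = ["rg", "--line-number"]
    for ext in extensions:
        if not re.fullmatch(r"[A-Za-z0-9_+-]+", ext):
            raise ValueError(f"invalid extension: {ext!r}")
        argv += ["--glob", f"*.{ext}"]
    # --regexp takes the pattern as its value, so a leading "-" is not read
    # as a flag; "--" ends option parsing before the search path.
    argv += ["--regexp", pattern, "--", str(root)]
    return argv


print(build_ripgrep_argv("class.*Service", ".", extensions=["java"]))
```

Because the result is a list, shell metacharacters in the pattern stay inside a single argument and are never interpreted by a shell.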

No Remote Execution

  • 100% local processing, no cloud dependencies
  • No telemetry or data collection
  • No network calls except optional PyPI version checks
  • Source code analysis stays on your machine

Secure Defaults

  • Read-only file operations by default
  • Explicit opt-in required for any file modifications
  • Sandboxed subprocess execution for external tools (fd, ripgrep)
  • Environment variable isolation
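
Environment isolation for an external tool can be approximated with the standard library alone. The sketch below covers only part of what the list above describes: no shell, a filtered environment, and a timeout. The helper run_isolated, the ALLOWED_ENV allow-list, and the DEMO_SECRET variable are assumptions for illustration; a child Python process stands in for fd or ripgrep so the example runs anywhere.

```python
import os
import subprocess
import sys

# Assumption: the minimum most tools need to start (SYSTEMROOT is for Windows).
ALLOWED_ENV = ("PATH", "SYSTEMROOT")


def run_isolated(argv, cwd=None, timeout=10):
    """Run argv without a shell, with a minimal environment and a timeout."""
    env = {name: os.environ[name] for name in ALLOWED_ENV if name in os.environ}
    return subprocess.run(
        argv,
        cwd=cwd,
        env=env,
        capture_output=True,
        text=True,
        timeout=timeout,
        check=False,
    )


# A variable set in the parent must not be visible in the child.
os.environ["DEMO_SECRET"] = "do-not-leak"
child = [sys.executable, "-c", "import os; print('DEMO_SECRET' in os.environ)"]
result = run_isolated(child)
print(result.stdout.strip())
```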

Architecture Principles

┌─────────────────────────────────────────────────────────┐
│  AI Assistant (Claude Desktop / Cursor / Roo Code)      │
└────────────────────┬────────────────────────────────────┘
                     │ MCP Protocol (JSON-RPC)
                     ▼
┌─────────────────────────────────────────────────────────┐
│  MCP Server Layer                                       │
│  • Input validation (JSON Schema)                       │
│  • Project boundary checks                              │
│  • Tool dispatch                                        │
└────────────────────┬────────────────────────────────────┘
                     │
                     ▼
┌─────────────────────────────────────────────────────────┐
│  Analysis Engine                                        │
│  • Tree-sitter AST parsing (17 languages)               │
│  • Fast file search (fd)                                │
│  • Content search (ripgrep)                             │
│  • Output formatting (JSON / TOON)                      │
└─────────────────────────────────────────────────────────┘

Key Security Boundaries:

  1. MCP Protocol: AI can only call explicitly defined tools with validated schemas
  2. Project Root: File operations confined to configured directory
  3. Read-Only: No destructive operations without explicit user consent
  4. Local-First: All processing happens on your machine
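
The first boundary, calling only explicitly defined tools with validated arguments, can be sketched as a small registry plus a dispatcher. This is an illustration only: the tool decorator, the count_lines tool, and the simple type check are assumptions standing in for real MCP tool definitions and JSON Schema validation.

```python
TOOLS = {}


def tool(name, required):
    """Register a function as a callable tool with declared parameter types."""
    def decorator(func):
        TOOLS[name] = (func, required)
        return func
    return decorator


@tool("count_lines", required={"text": str})
def count_lines(text):
    return {"lines": len(text.splitlines())}


def dispatch(request):
    """Reject unknown tools and badly typed arguments before calling anything."""
    name = request.get("name")
    if name not in TOOLS:
        return {"error": f"unknown tool: {name!r}"}
    func, required = TOOLS[name]
    arguments = request.get("arguments", {})
    if set(arguments) != set(required):
        return {"error": "arguments do not match the declared parameters"}
    for key, expected in required.items():
        if not isinstance(arguments[key], expected):
            return {"error": f"{key} must be {expected.__name__}"}
    return {"result": func(**arguments)}


print(dispatch({"name": "count_lines", "arguments": {"text": "a\nb"}}))
print(dispatch({"name": "delete_everything", "arguments": {}}))
```

Anything not in the registry is refused before any code runs, which is the property the first boundary describes.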

Security Testing

  • 8,942+ automated tests including security-focused edge cases
  • 100% mypy type safety prevents entire classes of bugs
  • CI/CD security scans: Bandit (Python security), safety (dependency vulnerabilities)
  • Manual security review of all MCP tool implementations

Reporting Security Issues

Found a security concern? Please email aimasteracc@gmail.com or open a private security advisory on GitHub.

We do NOT use automated security badge services; our security posture is documented through architecture, testing, and code review, not third-party scores.


🛠️ Development

Setup

git clone https://github.com/aimasteracc/tree-sitter-analyzer.git
cd tree-sitter-analyzer
uv sync --extra all --extra mcp

Quality Checks

uv run pytest tests/ -v                    # Run tests
uv run python check_quality.py --new-code-only  # Quality check
uv run python llm_code_checker.py --check-all   # AI code check

📖 Architecture Guide for system design details.


🤝 Contributing & License

We welcome contributions! See Contributing Guide for development guidelines.

⭐ Support

If this project helps you, please give us a ⭐ on GitHub!

💝 Sponsors

@o93 - Lead Sponsor supporting MCP tool enhancement, test infrastructure, and quality improvements.

💖 Sponsor this project

📄 License

MIT License - see LICENSE file.


🧪 Testing

Test Coverage

| Metric | Value |
|---|---|
| Test Suite | 8,942+ automated tests across unit, integration, regression, property, benchmark, and compatibility layers |
| Code Coverage | See the coverage badge |
| Type Safety | 100% mypy compliance |

Running Tests

# Run all tests
uv run pytest tests/ -v

# Run specific test category
uv run pytest tests/unit/ -v              # Unit tests
uv run pytest tests/integration/ -v         # Integration tests
uv run pytest tests/regression/ -m regression  # Regression tests
uv run pytest tests/benchmarks/ -v         # Benchmark tests

# Run with coverage
uv run pytest tests/ --cov=tree_sitter_analyzer --cov-report=html

# Run property-based tests
uv run pytest tests/property/

# Run performance benchmarks
uv run pytest tests/benchmarks/ --benchmark-only

Test Documentation

| Document | Description |
|---|---|
| Test Writing Guide | Comprehensive guide for writing tests |
| Regression Testing Guide | Golden Master methodology and regression testing |
| Testing Documentation | Project testing standards |

Test Categories

  • Unit Tests: Test individual components in isolation
  • Integration Tests: Test component interactions
  • Regression Tests: Ensure backward compatibility and format stability
  • Property Tests: Use Hypothesis-based invariant checking
  • Benchmark Tests: Track performance and regression signals
  • Compatibility Tests: Validate cross-version behavior

CI/CD Integration

  • Test Coverage Workflow: Automated coverage checks on PRs and pushes
  • Regression Tests Workflow: Golden Master validation and format stability checks
  • Performance Benchmarks: Daily benchmark runs with trend analysis
  • Quality Checks: Automated linting, type checking, and security scanning

Contributing Tests

When contributing new features:

  1. Write Tests: Follow the Test Writing Guide
  2. Ensure Coverage: Maintain >80% code coverage
  3. Run Locally: uv run pytest tests/ -v
  4. Check Quality: uv run ruff check . && uv run mypy tree_sitter_analyzer/
  5. Update Docs: Document new tests and features

📚 Documentation

| Document | Description |
|---|---|
| Installation Guide | Setup for all platforms |
| CLI Reference | Complete command reference |
| SMART Workflow | AI-assisted analysis guide |
| MCP Tools API | MCP integration details |
| Features | Language support details |
| Architecture | System design |
| Contributing | Development guidelines |
| Test Writing Guide | Comprehensive test writing guide |
| Regression Testing Guide | Golden Master methodology |
| Changelog | Version history |

🎯 Built for developers working with large codebases and AI assistants

Making every line of code understandable to AI, enabling every project to break through token limitations
