AI-era enterprise-grade code analysis tool with comprehensive HTML/CSS support, dynamic plugin architecture, and MCP integration
Tree-sitter Analyzer
English | 日本語 | 简体中文
Tree-sitter Analyzer is a local-first code context engine for AI-assisted development. It combines fast repository retrieval, AST-based structural analysis, and secure MCP integration.
Its job is not just to parse code. Its job is to help humans and AI agents fetch only the code context they actually need, safely, quickly, and with structural precision.
find the right files → find the right matches → extract the right structure → send only the right context
17 languages · Project-boundary security · Claude Desktop / Cursor / Roo Code · CLI + Python API
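The four-step funnel above can be sketched on an in-memory toy repository. This is only an illustration under stated assumptions: the function names (`find_files`, `find_matches`, `enclosing_function`, `build_context`) and the `REPO` data are hypothetical, Python's standard `ast` module stands in for tree-sitter, and a suffix filter plus `re` stand in for fd and ripgrep. It is not how Tree-sitter Analyzer itself is implemented.

```python
import ast
import re
from typing import Optional

# Toy "repository": path -> file contents (hypothetical data for illustration).
REPO = {
    "app/service.py": (
        "def load(x):\n"
        "    return x\n"
        "\n"
        "def save(x):\n"
        "    token = x + 1\n"
        "    return token\n"
    ),
    "app/notes.txt": "token rotation notes\n",
}


def find_files(files: dict, suffix: str) -> list:
    """Step 1: keep only the files worth searching (stand-in for fd)."""
    return sorted(path for path in files if path.endswith(suffix))


def find_matches(lines: list, pattern: str) -> list:
    """Step 2: 1-based line numbers matching the pattern (stand-in for ripgrep)."""
    rx = re.compile(pattern)
    return [no for no, line in enumerate(lines, start=1) if rx.search(line)]


def enclosing_function(source: str, line: int) -> Optional[tuple]:
    """Step 3: line span of the innermost function that contains the hit."""
    best = None
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if node.lineno <= line <= node.end_lineno:
                if best is None or node.lineno > best[0]:
                    best = (node.lineno, node.end_lineno)
    return best


def build_context(files: dict, suffix: str, pattern: str) -> dict:
    """Step 4: return only the matched structures, keyed by (path, start, end)."""
    context = {}
    for path in find_files(files, suffix):
        lines = files[path].splitlines()
        for hit in find_matches(lines, pattern):
            # Fall back to the single matching line when no function encloses it.
            start, end = enclosing_function(files[path], hit) or (hit, hit)
            context[(path, start, end)] = "\n".join(lines[start - 1:end])
    return context


context = build_context(REPO, ".py", r"token")
```

Both hits fall inside the same function, so the result holds a single entry for `save` and nothing from `load` or the notes file. Keying by span is one simple way to avoid sending the same region twice.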
What's New in v1.10.5
- get_code_outline MCP tool with TOON format: Outline-first navigation delivering 54-56% token reduction vs JSON. Retrieve the hierarchical structure first, then fetch only the bodies you need.
- trace_impact MCP tool: Lightweight call site finder using ripgrep, giving impact analysis without graph database overhead
- Intent-based tool aliases: AI-friendly tool naming (locate_usage, map_structure) makes tool discovery natural for agents
- Analysis session tracking: Audit multi-step SMART workflows with session IDs and operation history
- 23 critical bug fixes: TOON format return structure, default output format, test assertions - project fully operational
- Measured token savings: Real-world testing shows TOON format reduces output size by 54-56% across small/medium/large files
- Enhanced test coverage: 8,890 tests (100% pass)
- Cross-platform verified: All tests pass on Ubuntu, Windows, macOS × Python 3.10-3.13
Full Changelog for complete version history.
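To see why a tabular text layout can be smaller than JSON for outline data, the sketch below compares `json.dumps` output with a simplified header-plus-rows encoding. The encoder (`to_compact_table`) and the sample rows are assumptions made for illustration; this is not the project's TOON implementation, it does no escaping, and it counts characters rather than tokens, so it does not reproduce the 54-56% measurement quoted above.

```python
import json

# Hypothetical outline rows; field names are chosen only for this example.
OUTLINE = [
    {"name": "load", "start": 10, "end": 42},
    {"name": "save", "start": 44, "end": 93},
    {"name": "close", "start": 95, "end": 106},
]


def to_compact_table(name: str, rows: list) -> str:
    """Declare the keys once in a header, then emit one comma-separated row per item."""
    keys = list(rows[0])
    header = f"{name}[{len(rows)}]{{{','.join(keys)}}}:"
    body = ["  " + ",".join(str(row[key]) for key in keys) for row in rows]
    return "\n".join([header, *body])


def size_reduction(rows: list, name: str = "methods") -> float:
    """Fraction of characters saved relative to the default json.dumps output."""
    as_json = json.dumps(rows)
    compact = to_compact_table(name, rows)
    return 1 - len(compact) / len(as_json)


compact = to_compact_table("methods", OUTLINE)
saved = size_reduction(OUTLINE)
```

The saving comes from not repeating key names and punctuation for every row, which is why it grows with the number of uniform items.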
See It In Action
Demo GIF coming soon - showcasing AI integration with SMART workflow
Why Tree-sitter Analyzer
Tree-sitter Analyzer is an open-source, local-first code context engine for helping AI assistants read only what matters in large codebases.
- Minimal context, not whole-file stuffing: retrieve the smallest useful code regions before sending them to AI
- Evidence-based analysis: combine tree-sitter structure with fd and ripgrep to surface relevant files, symbols, and paths
- No heavy preprocessing required: useful on messy repositories where full indexing can be slow, stale, or difficult to maintain
Common Use Cases
- Understand what a very large file or module is doing without loading the entire file into an AI prompt
- Trace business logic, UI handlers, or bug-related code paths across a complex repository
- Narrow AI context for Java and other large codebases before asking for analysis or changes
5-Minute Quick Start
Prerequisites
# Install uv (required)
# macOS/Linux
curl -LsSf https://astral.sh/uv/install.sh | sh
# Windows PowerShell
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
# Install fd + ripgrep (required for search features)
brew install fd ripgrep # macOS
winget install sharkdp.fd BurntSushi.ripgrep.MSVC # Windows
Detailed Installation Guide for all platforms.
Verify Installation
uv run tree-sitter-analyzer --show-supported-languages
AI Integration
Configure your AI assistant to use Tree-sitter Analyzer via MCP protocol.
This works especially well when your assistant struggles with very large files, noisy repository-wide context, or legacy code that is too expensive to load all at once.
Claude Desktop / Cursor / Roo Code
Add to your MCP configuration:
{
  "mcpServers": {
    "tree-sitter-analyzer": {
      "command": "uvx",
      "args": [
        "--from", "tree-sitter-analyzer[mcp]",
        "tree-sitter-analyzer-mcp"
      ],
      "env": {
        "TREE_SITTER_PROJECT_ROOT": "/path/to/your/project",
        "TREE_SITTER_OUTPUT_PATH": "/path/to/output/directory"
      }
    }
  }
}
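If your client already has other MCP servers configured, the entry above has to be merged rather than pasted over the file. A minimal sketch of that merge follows; the helper names `server_entry` and `merge_server` are hypothetical, and the keys and values are copied from the configuration shown above.

```python
import json


def server_entry(project_root: str, output_path: str) -> dict:
    """Build the server entry exactly as shown in the configuration above."""
    return {
        "command": "uvx",
        "args": ["--from", "tree-sitter-analyzer[mcp]", "tree-sitter-analyzer-mcp"],
        "env": {
            "TREE_SITTER_PROJECT_ROOT": project_root,
            "TREE_SITTER_OUTPUT_PATH": output_path,
        },
    }


def merge_server(config: dict, name: str, entry: dict) -> dict:
    """Return a copy of config with the entry added; existing servers are kept."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = entry
    merged["mcpServers"] = servers
    return merged


# Hypothetical existing configuration with one unrelated server.
existing = {"mcpServers": {"other-server": {"command": "other"}}}
updated = merge_server(
    existing,
    "tree-sitter-analyzer",
    server_entry("/path/to/your/project", "/path/to/output/directory"),
)
text = json.dumps(updated, indent=2)  # ready to write back to the config file
```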
Configuration file locations:
- Claude Desktop: %APPDATA%\Claude\claude_desktop_config.json (Windows) / ~/Library/Application Support/Claude/claude_desktop_config.json (macOS)
- Cursor: Built-in MCP settings
- Roo Code: MCP configuration
After restart, tell the AI: Please set the project root directory to: /path/to/your/project
MCP Tools Reference for complete API documentation.
Common CLI Commands
Installation
uv add "tree-sitter-analyzer[all,mcp]" # Full installation
Top 5 Commands
# 1. Analyze file structure
uv run tree-sitter-analyzer examples/BigService.java --table full
# 2. Quick summary
uv run tree-sitter-analyzer examples/BigService.java --summary
# 3. Extract code section
uv run tree-sitter-analyzer examples/BigService.java --partial-read --start-line 93 --end-line 106
# 4. Find files and search content
uv run find-and-grep --roots . --query "class.*Service" --extensions java
# 5. Query specific elements
uv run tree-sitter-analyzer examples/BigService.java --query-key methods --filter "public=true"
View Output Example
╭─────────────────────────────────────────────────────────────╮
│                  BigService.java Analysis                   │
├─────────────────────────────────────────────────────────────┤
│ Total Lines: 1419 | Code: 906 | Comments: 246 | Blank: 267  │
│ Classes: 1 | Methods: 66 | Fields: 9 | Complexity: 5.27 avg │
╰─────────────────────────────────────────────────────────────╯
Complete CLI Reference for all commands and options.
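When the CLI is driven from a script, building the argument list explicitly avoids shell quoting problems. The sketch below wraps command 3 from the list above and uses only flags that appear in this README. The helper names and the range check are assumptions for illustration (in particular, that line numbers are 1-based and inclusive), not part of the package's Python API.

```python
import subprocess


def partial_read_argv(path: str, start_line: int, end_line: int) -> list:
    """Build the argv for the partial-read command shown above."""
    # Assumption: line numbers are 1-based and the range is inclusive.
    if start_line < 1 or end_line < start_line:
        raise ValueError("expected 1 <= start_line <= end_line")
    return [
        "uv", "run", "tree-sitter-analyzer", path,
        "--partial-read",
        "--start-line", str(start_line),
        "--end-line", str(end_line),
    ]


def run_partial_read(path: str, start_line: int, end_line: int) -> str:
    """Run the command without a shell and return its standard output."""
    result = subprocess.run(
        partial_read_argv(path, start_line, end_line),
        capture_output=True, text=True, check=True,
    )
    return result.stdout


argv = partial_read_argv("examples/BigService.java", 93, 106)
```

`run_partial_read` is defined but not called here, since it needs uv and the package installed.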
Supported Languages
| Language | Support Level | Key Features |
|---|---|---|
| Java | Complete | Spring, JPA, enterprise features |
| Python | Complete | Type annotations, decorators |
| TypeScript | Complete | Interfaces, types, TSX/JSX |
| JavaScript | Complete | ES6+, React/Vue/Angular |
| C | Complete | Functions, structs, unions, enums, preprocessor |
| C++ | Complete | Classes, templates, namespaces, inheritance |
| C# | Complete | Records, async/await, attributes |
| SQL | Enhanced | Tables, views, procedures, triggers |
| HTML | Complete | DOM structure, element classification |
| CSS | Complete | Selectors, properties, categorization |
| Go | Complete | Structs, interfaces, goroutines |
| Rust | Complete | Traits, impl blocks, macros |
| Kotlin | Complete | Data classes, coroutines |
| PHP | Complete | PHP 8+, attributes, traits |
| Ruby | Complete | Rails patterns, metaprogramming |
| YAML | Complete | Anchors, aliases, multi-document |
| Markdown | Complete | Headers, code blocks, tables |
Features Documentation for language-specific details.
Features Overview
| Feature | Description | Learn More |
|---|---|---|
| SMART Workflow | Set-Map-Analyze-Retrieve-Trace methodology | Guide |
| Outline-First Navigation | get_code_outline: hierarchical structure map before content retrieval | MCP Tools |
| MCP Protocol | Native AI assistant integration | API Docs |
| Token Optimization | TOON format delivers 54-56% token reduction; token-aware controls for large AI workflows | Features |
| File Search | fd-based high-performance discovery | CLI Reference |
| Content Search | ripgrep regex search | CLI Reference |
| Security | Project boundary protection | Architecture |
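The outline-first idea in the table above (get a structure map, then fetch only the bodies you need) can be illustrated with the standard library. In this sketch Python's `ast` module stands in for tree-sitter, the outline is flat rather than hierarchical, and `outline`, `fetch_body`, and the sample source are hypothetical names used only here; it does not show the output shape of get_code_outline.

```python
import ast

# Hypothetical file contents used only for this example.
SOURCE = (
    "import os\n"
    "\n"
    "def load(path):\n"
    "    return os.path.basename(path)\n"
    "\n"
    "class Store:\n"
    "    def save(self):\n"
    "        return True\n"
)


def outline(source: str) -> list:
    """First pass: names and line spans of top-level definitions, no bodies."""
    items = []
    for node in ast.parse(source).body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            items.append({
                "name": node.name,
                "kind": type(node).__name__,
                "start": node.lineno,
                "end": node.end_lineno,
            })
    return items


def fetch_body(source: str, start: int, end: int) -> str:
    """Second pass: return only the requested 1-based, inclusive line range."""
    return "\n".join(source.splitlines()[start - 1:end])


entries = outline(SOURCE)
wanted = next(item for item in entries if item["name"] == "load")
body = fetch_body(SOURCE, wanted["start"], wanted["end"])
```

The outline is small and cheap to send; the body request then costs only the lines of the one definition that matters.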
Grammar Coverage (MECE Framework)
Tree-sitter Analyzer guarantees zero False Positives in grammar coverage validation across all 17 supported languages.
Phase 1: MECE Architecture (2026-03)
New Architecture:
- Tracks syntactic paths (node_type, parent_path) instead of just node types
- Uses exact node identity matching (type + byte range + parent chain + file path)
- Eliminates nested node misclassification (wrapper nodes no longer cause False Positives)
Why It Matters:
# OLD method: Position overlap → False Positives
@decorator  # Plugin extracts this
def foo():  # Validator incorrectly marks this as "covered" (it's not!)
    pass
# NEW method: Exact identity matching → Zero False Positives
# Only nodes actually extracted by the plugin are marked as covered
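The difference between the two matching strategies can be made concrete with a small model. This sketch assumes one plausible failure mode, a wrapper node whose byte range contains a nested node; the `Node` class, both predicate functions, and the byte offsets are illustrative and are not taken from the validator's source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Identity = type + byte range + parent chain + file path (as listed above)."""
    type: str
    start: int
    end: int
    parent_chain: tuple
    path: str


def covered_by_overlap(node: Node, extracted: list) -> bool:
    """Old-style check: any byte-range overlap in the same file counts as covered."""
    return any(
        e.path == node.path and e.start < node.end and node.start < e.end
        for e in extracted
    )


def covered_by_identity(node: Node, extracted: list) -> bool:
    """New-style check: every identity field must match exactly."""
    return node in set(extracted)


# Offsets for "@decorator\ndef foo():\n    pass" (illustrative values).
wrapper = Node("decorated_definition", 0, 30, ("module",), "demo.py")
inner = Node("function_definition", 11, 30, ("module", "decorated_definition"), "demo.py")

extracted = [wrapper]  # suppose the plugin only extracted the wrapper node
false_positive = covered_by_overlap(inner, extracted)
exact = covered_by_identity(inner, extracted)
```

Under the overlap rule the inner function is reported as covered even though it was never extracted; under identity matching it is not.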
Validation Commands
# Validate single language
python -c "from tree_sitter_analyzer.grammar_coverage.validator import validate_plugin_coverage_sync; r = validate_plugin_coverage_sync('python'); print(f'{r.coverage_percentage:.1f}% coverage')"
# Validate all languages
python -c "
from tree_sitter_analyzer.grammar_coverage.validator import validate_plugin_coverage_sync
langs = ['python', 'javascript', 'java', 'go', 'typescript', 'c', 'cpp', 'rust', 'ruby', 'php', 'kotlin', 'swift', 'scala', 'bash', 'yaml', 'json', 'sql']
for lang in langs:
    r = validate_plugin_coverage_sync(lang)
    status = '✅' if r.coverage_percentage == 100.0 else '❌'
    print(f'{status} {lang}: {r.coverage_percentage:.1f}% ({r.covered_node_types}/{r.total_node_types})')
"
Example Output (new format):
✅ python: 100.0% (57/57 node types covered)
✅ javascript: 100.0% (58/58 node types covered)
✅ typescript: 100.0% (114/114 node types covered)
...
✅ sql: 100.0% (155/155 node types covered)
MECE Guarantees
- Mutually Exclusive: Each node has a unique (type, parent_path), so there is no double counting
- Collectively Exhaustive: Full AST traversal, so no nodes are missed
- Zero False Positives: Exact matching, so only truly extracted nodes are marked as covered
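As a rough illustration of tracking syntactic paths and computing a coverage percentage, the sketch below walks a tree and records (node_type, parent_path) pairs. Python's `ast` module stands in for a tree-sitter grammar, and the function names and the "extracted" set are assumptions for this example; the real validator may count nodes differently.

```python
import ast


def syntactic_paths(source: str) -> set:
    """Collect (node_type, parent_path) for every node via a full traversal."""
    paths = set()

    def visit(node, parents):
        name = type(node).__name__
        paths.add((name, parents))
        for child in ast.iter_child_nodes(node):
            visit(child, parents + (name,))

    visit(ast.parse(source), ())
    return paths


def coverage_percentage(all_paths: set, extracted: set) -> float:
    """Share of known paths that were actually extracted (exact match only)."""
    if not all_paths:
        return 100.0
    return 100.0 * len(all_paths & extracted) / len(all_paths)


paths = syntactic_paths("def foo():\n    pass\n")
# Suppose a hypothetical plugin extracted only the function definition itself.
extracted = {("FunctionDef", ("Module",))}
percent = coverage_percentage(paths, extracted)
```

Because membership is tested on the whole pair, a `Pass` under `FunctionDef` and a `Pass` under some other parent are different entries, which is the point of keying on the path rather than on the node type alone.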
Grammar Coverage Framework for technical details and architecture.
Quality & Testing
| Metric | Value |
|---|---|
| Tests | Multi-thousand automated tests |
| Coverage | |
| Type Safety | 100% mypy compliance |
| Platforms | Windows, macOS, Linux |
# Run tests
uv run pytest tests/ -v
# Generate coverage report
uv run pytest tests/ --cov=tree_sitter_analyzer --cov-report=html
Security & Architecture
Tree-sitter Analyzer is designed with security-by-default principles for AI-assisted development workflows.
Security Model
Project Boundary Enforcement
- All MCP tools validate file paths against project root boundaries
- No access to files outside the configured project directory
- Symlink traversal prevention
- Path normalization prevents ../ escape attempts
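A common way to enforce this kind of boundary is to resolve the requested path, which collapses `..` segments and follows symlinks, and then check that the result still sits under the project root. The sketch below shows that general technique with `pathlib` (Python 3.9+ for `is_relative_to`); the function name `resolve_inside` is hypothetical and this is not a copy of the project's validator.

```python
from pathlib import Path


def resolve_inside(project_root: str, requested: str) -> Path:
    """Return the resolved path, or raise if it leaves the project root."""
    root = Path(project_root).resolve()
    # resolve() normalizes ".." and follows symlinks, so a link that points
    # outside the root ends up outside the root after this call.
    candidate = (root / requested).resolve()
    if not candidate.is_relative_to(root):
        raise PermissionError(f"{requested!r} is outside the project root")
    return candidate


inside = resolve_inside("/tmp/demo_project", "src/app.py")
```

Checking after resolution, rather than scanning the string for `..`, also covers absolute paths and symlinked directories with the same rule.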
Input Validation
- JSON Schema validation on all MCP tool parameters
- Type-safe Python API with strict mypy compliance
- Sanitized user inputs before shell command execution
- Pattern validation for glob/regex searches
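One way to apply the input-validation points above is to validate the pattern and extensions first, then hand the external tool an argument list instead of a shell string. The sketch below is an assumption-laden illustration: `build_search_argv` is a hypothetical helper, and Python's `re` is used only as a rough pre-check because ripgrep's regex syntax is not identical to Python's. The ripgrep flags used (`--line-number`, `--glob`, `-e`, and the `--` separator) are standard.

```python
import re


def build_search_argv(pattern: str, root: str, extensions: list) -> list:
    """Validate inputs, then build an argv list that never passes through a shell."""
    re.compile(pattern)  # raises re.error for patterns that are clearly malformed
    for ext in extensions:
        # Allow only simple extension names so they cannot smuggle glob syntax.
        if not re.fullmatch(r"[A-Za-z0-9_+-]+", ext):
            raise ValueError(f"unsupported extension: {ext!r}")
    argv = ["rg", "--line-number"]
    for ext in extensions:
        argv += ["--glob", f"*.{ext}"]
    # "-e" marks the pattern explicitly and "--" ends option parsing, so a
    # pattern or path that starts with "-" is not read as a flag.
    argv += ["-e", pattern, "--", root]
    return argv


argv = build_search_argv("class.*Service", ".", ["java"])
```

The list can be passed to `subprocess.run(argv)` as is; no quoting step is needed because no shell is involved.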
No Remote Execution
- 100% local processing: no cloud dependencies
- No telemetry or data collection
- No network calls except optional PyPI version checks
- Source code analysis stays on your machine
Secure Defaults
- Read-only file operations by default
- Explicit opt-in required for any file modifications
- Sandboxed subprocess execution for external tools (fd, ripgrep)
- Environment variable isolation
Architecture Principles
┌───────────────────────────────────────────────────────┐
│  AI Assistant (Claude Desktop / Cursor / Roo Code)    │
└────────────────────┬──────────────────────────────────┘
                     │ MCP Protocol (JSON-RPC)
                     ▼
┌───────────────────────────────────────────────────────┐
│  MCP Server Layer                                     │
│  • Input validation (JSON Schema)                     │
│  • Project boundary checks                            │
│  • Tool dispatch                                      │
└────────────────────┬──────────────────────────────────┘
                     │
                     ▼
┌───────────────────────────────────────────────────────┐
│  Analysis Engine                                      │
│  • Tree-sitter AST parsing (17 languages)             │
│  • Fast file search (fd)                              │
│  • Content search (ripgrep)                           │
│  • Output formatting (JSON / TOON)                    │
└───────────────────────────────────────────────────────┘
Key Security Boundaries:
- MCP Protocol: AI can only call explicitly defined tools with validated schemas
- Project Root: File operations confined to configured directory
- Read-Only: No destructive operations without explicit user consent
- Local-First: All processing happens on your machine
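The order of checks in the server layer (known tool, validated parameters, boundary check, then dispatch) can be sketched as a single function. Everything here is hypothetical: the `TOOLS` registry, the `file_path` parameter name, the stub handler, and the hand-rolled type check that stands in for JSON Schema validation. Only the tool name get_code_outline comes from this README.

```python
from pathlib import Path


def _outline_stub(file_path: Path) -> dict:
    """Placeholder handler; a real one would call into the analysis engine."""
    return {"file": file_path.name, "outline": []}


# tool name -> (expected parameter types, handler)
TOOLS = {"get_code_outline": ({"file_path": str}, _outline_stub)}


def handle_call(project_root: str, tool: str, params: dict) -> dict:
    # 1. Only explicitly registered tools can be called.
    if tool not in TOOLS:
        raise KeyError(f"unknown tool: {tool}")
    schema, handler = TOOLS[tool]
    # 2. Parameters must match the declared names and types exactly.
    if set(params) != set(schema):
        raise ValueError("unexpected or missing parameters")
    for key, expected in schema.items():
        if not isinstance(params[key], expected):
            raise TypeError(f"{key} must be {expected.__name__}")
    # 3. The target must stay inside the project root after resolution.
    root = Path(project_root).resolve()
    target = (root / params["file_path"]).resolve()
    if not target.is_relative_to(root):
        raise PermissionError("path escapes the project root")
    # 4. Only now is the request handed to the engine.
    return handler(target)


result = handle_call("/tmp/demo_project", "get_code_outline", {"file_path": "src/app.py"})
```

Putting validation and the boundary check before dispatch means individual tool handlers never see an unchecked path.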
Security Testing
- 8,890+ automated tests including security-focused edge cases
- 100% mypy type safety prevents entire classes of bugs
- CI/CD security scans: Bandit (Python security), safety (dependency vulnerabilities)
- Manual security review of all MCP tool implementations
Reporting Security Issues
Found a security concern? Please email aimasteracc@gmail.com or open a private security advisory on GitHub.
We do NOT use automated security badge services; our security posture is documented through architecture, testing, and code review, not third-party scores.
Development
Setup
git clone https://github.com/aimasteracc/tree-sitter-analyzer.git
cd tree-sitter-analyzer
uv sync --extra all --extra mcp
Quality Checks
uv run pytest tests/ -v # Run tests
uv run python check_quality.py --new-code-only # Quality check
uv run python llm_code_checker.py --check-all # AI code check
Architecture Guide for system design details.
Contributing & License
We welcome contributions! See Contributing Guide for development guidelines.
Support
If this project helps you, please give us a ⭐ on GitHub!
Sponsors
@o93 - Lead Sponsor supporting MCP tool enhancement, test infrastructure, and quality improvements.
License
MIT License - see LICENSE file.
Testing
Test Coverage
| Metric | Value |
|---|---|
| Test Suite | Multi-thousand automated tests across unit, integration, regression, property, benchmark, and compatibility layers |
| Code Coverage | |
| Type Safety | 100% mypy compliance |
Running Tests
# Run all tests
uv run pytest tests/ -v
# Run specific test category
uv run pytest tests/unit/ -v # Unit tests
uv run pytest tests/integration/ -v # Integration tests
uv run pytest tests/regression/ -m regression # Regression tests
uv run pytest tests/benchmarks/ -v # Benchmark tests
# Run with coverage
uv run pytest tests/ --cov=tree_sitter_analyzer --cov-report=html
# Run property-based tests
uv run pytest tests/property/
# Run performance benchmarks
uv run pytest tests/benchmarks/ --benchmark-only
Test Documentation
| Document | Description |
|---|---|
| Test Writing Guide | Comprehensive guide for writing tests |
| Regression Testing Guide | Golden Master methodology and regression testing |
| Testing Documentation | Project testing standards |
Test Categories
- Unit Tests: Test individual components in isolation
- Integration Tests: Test component interactions
- Regression Tests: Ensure backward compatibility and format stability
- Property Tests: Use Hypothesis-based invariant checking
- Benchmark Tests: Track performance and regression signals
- Compatibility Tests: Validate cross-version behavior
CI/CD Integration
- Test Coverage Workflow: Automated coverage checks on PRs and pushes
- Regression Tests Workflow: Golden Master validation and format stability checks
- Performance Benchmarks: Daily benchmark runs with trend analysis
- Quality Checks: Automated linting, type checking, and security scanning
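The Golden Master idea behind the regression workflow is to compare current output against a previously approved snapshot and fail on any difference. A minimal in-memory sketch follows; `render_summary` is a hypothetical formatter, and in a real suite the golden text would live in a file under version control rather than in a variable.

```python
import difflib
import json


def render_summary(stats: dict) -> str:
    """Hypothetical formatter under test; sorted keys keep snapshots stable."""
    return json.dumps(stats, indent=2, sort_keys=True)


def diff_against_golden(actual: str, golden: str) -> list:
    """Return unified-diff lines; an empty list means the format is unchanged."""
    return list(difflib.unified_diff(
        golden.splitlines(), actual.splitlines(),
        fromfile="golden", tofile="actual", lineterm="",
    ))


# Approved snapshot (values borrowed from the BigService.java example above).
GOLDEN = render_summary({"classes": 1, "methods": 66, "fields": 9})

unchanged = diff_against_golden(
    render_summary({"methods": 66, "fields": 9, "classes": 1}), GOLDEN
)
changed = diff_against_golden(
    render_summary({"classes": 1, "methods": 67, "fields": 9}), GOLDEN
)
```

Returning the diff lines rather than a boolean gives the failing test a readable explanation of what drifted.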
Contributing Tests
When contributing new features:
- Write Tests: Follow the Test Writing Guide
- Ensure Coverage: Maintain >80% code coverage
- Run Locally: uv run pytest tests/ -v
- Check Quality: uv run ruff check . && uv run mypy tree_sitter_analyzer/
- Update Docs: Document new tests and features
Documentation
| Document | Description |
|---|---|
| Installation Guide | Setup for all platforms |
| CLI Reference | Complete command reference |
| SMART Workflow | AI-assisted analysis guide |
| MCP Tools API | MCP integration details |
| Features | Language support details |
| Architecture | System design |
| Contributing | Development guidelines |
| Test Writing Guide | Comprehensive test writing guide |
| Regression Testing Guide | Golden Master methodology |
| Changelog | Version history |
Built for developers working with large codebases and AI assistants
Making every line of code understandable to AI, enabling every project to break through token limitations
File details
Details for the file tree_sitter_analyzer-1.10.8.tar.gz.
File metadata
- Download URL: tree_sitter_analyzer-1.10.8.tar.gz
- Upload date:
- Size: 1.8 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.15
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7a56e858ec15e00768f3687ab33db4cb8b932f4dcd2adb9eec11eda8eed9643d |
| MD5 | 6656dc58143ca17480fae11ff05bb41d |
| BLAKE2b-256 | ade54fd97dabe4adad8d7ddff5156dcfc60662f7e3d01805522ed701efa66e5a |
File details
Details for the file tree_sitter_analyzer-1.10.8-py3-none-any.whl.
File metadata
- Download URL: tree_sitter_analyzer-1.10.8-py3-none-any.whl
- Upload date:
- Size: 675.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.15
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8db7d950a81a43532ff1822fa0b5da04bfc715486030394fb1ac54f4ff4e0bf1 |
| MD5 | 5535d195b9bb0cd493b7c4bb6136c11e |
| BLAKE2b-256 | 3b9f8b3f8359f7cf38ee91d03b1530687e8f626bac4bb8db0e1986e4fc560e03 |