Loppers
Extract source file skeletons using tree-sitter queries.
Removes function implementations while preserving structure, signatures, and docstrings. Supports 17 programming languages with a clean, fully-typed Python API and comprehensive CLI.
Requires: tree-sitter >= 0.25
Features
- ✅ 17 Languages - Python, JS/TS, Java, Kotlin, Go, Rust, C/C++, C#, Ruby, PHP, Swift, Lua, Scala, Groovy, Objective-C
- ✅ Smart Extraction - Functions, methods, constructors, arrow functions, getters/setters
- ✅ Preserved Elements - Signatures, class definitions, imports, docstrings, decorators
- ✅ All File Types - Process any non-binary text files (code, markdown, JSON, YAML, etc.)
- ✅ Binary Detection - Automatically skips binary files
- ✅ Ignore Patterns - Built-in + custom .gitignore support
- ✅ Fully Typed - Complete type hints throughout
- ✅ CLI & Library - Use as command-line tool or Python library
Quick Start
Installation
# With uv (recommended)
uv pip install loppers
# With pip
pip install loppers
Python API
The public API consists of 5 core functions:
1. extract_skeleton(source: str, language: str) -> str
Extract skeleton from source code by language identifier.
from loppers import extract_skeleton
code = """
def calculate(x: int, y: int) -> int:
    '''Calculate sum.'''
    result = x + y
    return result
"""
skeleton = extract_skeleton(code, "python")
print(skeleton)
Output:
def calculate(x: int, y: int) -> int:
    '''Calculate sum.'''
2. get_skeleton(file_path: Path | str, *, add_header: bool = False) -> str
Extract skeleton from a file by auto-detecting language from extension.
from loppers import get_skeleton
skeleton = get_skeleton("src/main.py")
print(skeleton)
# With header showing file path
skeleton = get_skeleton("src/main.py", add_header=True)
# Output: "--- /path/to/src/main.py\n..."
Raises:
- FileNotFoundError - If file doesn't exist
- ValueError - If file language is unsupported
3. find_files(root: str | Path, *, recursive: bool = True, ignore_patterns: Sequence[str] | None = None, use_default_ignore: bool = True, respect_gitignore: bool = True) -> list[str]
Collect all non-binary text files from a root directory.
from loppers import find_files
# Find all text files in src/ recursively (default)
files = find_files("src/")
# Returns file paths relative to root:
# ['main.py', 'utils.py', 'config.yaml', 'README.md']
# Non-recursive
files = find_files("src/", recursive=False)
# Custom ignore patterns (gitignore syntax)
files = find_files(
    "src/",
    ignore_patterns=["*.test.py", "venv/"],
    use_default_ignore=True,  # Still applies built-in patterns
    respect_gitignore=True,  # Still respects .gitignore
)
Features:
- Takes single root directory (not multiple paths)
- Returns file paths relative to root
- Automatically excludes binary files (images, archives, etc.)
- Respects .gitignore by default
- Supports custom gitignore-style ignore patterns
- Built-in patterns exclude node_modules, .git, __pycache__, build artifacts, etc.
- Works with ALL non-binary text files (code, markdown, JSON, YAML, etc.)
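As a rough illustration of the behaviour listed above, the following standard-library sketch walks a root directory, skips files that look binary, and filters names with glob patterns. It is not how loppers is implemented: the Dependencies section lists binaryornot and pathspec for these jobs, whereas `looks_binary` (a NUL-byte check) and the `fnmatch` globs used here are simplified stand-ins, and `find_text_files` is a hypothetical name.

```python
import fnmatch
import os
from pathlib import Path
from typing import List


def looks_binary(path: Path, sample_size: int = 1024) -> bool:
    """Crude heuristic (assumption): a NUL byte in the first bytes means binary."""
    with path.open("rb") as handle:
        return b"\x00" in handle.read(sample_size)


def find_text_files(root: str, ignore_globs: List[str], recursive: bool = True) -> List[str]:
    """Return sorted POSIX-style paths, relative to root, of non-binary files."""
    root_path = Path(root)
    found = []
    for current, dirs, files in os.walk(root_path):
        if not recursive:
            dirs.clear()  # stop os.walk from descending into subdirectories
        for name in files:
            full = Path(current) / name
            relative = full.relative_to(root_path).as_posix()
            # Match each glob against both the bare name and the relative path.
            if any(fnmatch.fnmatch(name, g) or fnmatch.fnmatch(relative, g) for g in ignore_globs):
                continue
            if looks_binary(full):
                continue
            found.append(relative)
    return sorted(found)
```

Plain globs do not cover full gitignore semantics (negation, anchoring, directory-only patterns), which is one reason to rely on a dedicated matcher in real code.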
4. get_tree(root: str | Path, *, recursive: bool = True, ignore_patterns: Sequence[str] | None = None, use_default_ignore: bool = True, respect_gitignore: bool = True, collapse_single_dirs: bool = False, show_sizes: bool = False) -> str
Display formatted directory tree from a root directory.
from loppers import get_tree
# Display tree of src/ directory recursively
tree = get_tree("src/")
print(tree)
# Non-recursive tree
tree = get_tree("src/", recursive=False)
# With custom ignore patterns
tree = get_tree("src/", ignore_patterns=["*.test.py"])
# Collapse deep single-child directories (useful for Java packages)
tree = get_tree("src/", collapse_single_dirs=True)
# Show file sizes in human-friendly format
tree = get_tree("src/", show_sizes=True)
# Combine multiple options
tree = get_tree("src/", collapse_single_dirs=True, show_sizes=True)
Output (with collapse_single_dirs=True and show_sizes=True):
.
├─ main/java/com/example
│  ├─ Source.java (2.3KB)
│  └─ Util.java (1.8KB)
└─ resources/config.yaml (512B)
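The size labels in the output above can be produced by a small formatter. This is a minimal sketch, assuming 1024-byte units and one decimal place; the README does not say which base or rounding loppers uses, and `human_size` is a hypothetical helper name.

```python
def human_size(num_bytes: int) -> str:
    """Format a byte count in the style of the tree output, e.g. 512B or 1.5KB."""
    if num_bytes < 1024:
        return f"{num_bytes}B"
    size = float(num_bytes)
    # Assumption: binary units (1KB = 1024B), one decimal place.
    for unit in ("KB", "MB", "GB"):
        size /= 1024
        if size < 1024:
            return f"{size:.1f}{unit}"
    return f"{size / 1024:.1f}TB"
```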
5. concatenate_files(root: str | Path, file_paths: Sequence[str | Path], *, extract: bool = True, ignore_not_found: bool = False) -> str
Concatenate files with optional skeleton extraction. Useful when you already have a list of file paths that you want to combine.
from loppers import concatenate_files, find_files
# Get list of files to concatenate
root = "src/"
files = find_files(root)
# Concatenate with skeleton extraction (default)
result = concatenate_files(root, files)
print(result)
# Concatenate without extraction (include original content)
result = concatenate_files(root, files, extract=False)
# Ignore files that don't exist or can't be processed
result = concatenate_files(root, files, ignore_not_found=True)
Output format:
--- path/to/file.py
def calculate(x: int) -> int:
    '''Calculate.'''
--- path/to/other.js
function process() {
}
Features:
- Files are concatenated with headers showing their relative paths
- Each file separated by newlines
- Automatically extracts skeletons from code files (unless extract=False)
- Falls back to original content for unsupported file types
- Can optionally ignore files that don't exist or can't be processed
Parameters:
- root - Root directory (file paths are relative to this)
- file_paths - List of file paths (relative to root) to concatenate
- extract - Extract skeletons from code files (default True)
- ignore_not_found - Ignore files that cannot be found or processed (default False)
Raises:
- FileNotFoundError - If root doesn't exist or a file is not found (when ignore_not_found=False)
- ValueError - If no file paths provided or no files could be processed
- NotADirectoryError - If root is not a directory
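To make the output format and error cases concrete, here is a minimal standard-library sketch of header-prefixed concatenation. It only mirrors the documented behaviour and is not the library's code; `concatenate_with_headers`, `identity` and the `transform` hook (standing in for skeleton extraction) are hypothetical names.

```python
from pathlib import Path
from typing import Callable, Sequence


def identity(relative_path: str, text: str) -> str:
    """Default transform: return the file content unchanged."""
    return text


def concatenate_with_headers(
    root: str,
    file_paths: Sequence[str],
    transform: Callable[[str, str], str] = identity,
) -> str:
    """Join files under root, each preceded by a '--- relative/path' header."""
    root_path = Path(root)
    if not root_path.exists():
        raise FileNotFoundError(root)
    if not root_path.is_dir():
        raise NotADirectoryError(root)
    if not file_paths:
        raise ValueError("no file paths provided")
    sections = []
    for relative in file_paths:
        # read_text raises FileNotFoundError if the file is missing.
        text = (root_path / relative).read_text(encoding="utf-8")
        sections.append(f"--- {relative}\n{transform(relative, text)}")
    return "\n".join(sections)
```

Passing a different `transform` is where a skeleton extractor could be plugged in, with the identity function playing the role of extract=False.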
Utility Function
get_language(extension: str) -> str | None - Get language identifier from file extension.
from loppers import get_language
get_language(".py") # "python"
get_language(".js") # "javascript"
get_language(".json") # None (no extraction for data files)
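The lookup above amounts to a table keyed by file extension. The sketch below shows that idea applied to a whole path; the two table entries are taken from the examples above, while `EXTENSIONS`, `language_for_path` and the case-insensitive match are assumptions made for illustration.

```python
from pathlib import Path
from typing import Optional

# Hypothetical two-entry subset of an extension table.
EXTENSIONS = {".py": "python", ".js": "javascript"}


def language_for_path(path: str) -> Optional[str]:
    """Return the language for a path's extension, or None if it is not mapped."""
    # Assumption: extensions are matched case-insensitively.
    return EXTENSIONS.get(Path(path).suffix.lower())
```

A None result is the signal to fall back to the file's original content rather than attempt extraction.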
Command-Line Interface
Loppers provides 4 subcommands for common tasks.
Basic Usage
loppers --version
loppers --help
1. extract - Extract skeleton from file or stdin
Extract a single file's skeleton:
# From file
loppers extract file.py
loppers extract file.py -o skeleton.py
# From stdin with explicit language
echo 'def foo(): pass' | loppers extract -l python
# Verbose output
loppers extract file.py -v
Options:
- FILE - File to extract (omit for stdin)
- -l, --language - Language identifier (auto-detected from extension if FILE provided, required for stdin)
- -o, --output - Output file (default: stdout)
- -v, --verbose - Print status to stderr
2. concatenate - Concatenate files with optional skeleton extraction
Process root directory with automatic skeleton extraction:
# Recursive (default)
loppers concatenate src/
# Non-recursive
loppers concatenate --no-recursive src/
# Save to file
loppers concatenate src/ -o combined.txt
# Verbose with progress
loppers concatenate -v src/
# Include original files without extraction
loppers concatenate --no-extract src/
# Custom ignore patterns
loppers concatenate -I "*.test.py" -I "venv/" src/
# Disable default ignores
loppers concatenate --no-default-ignore src/
# Don't respect .gitignore
loppers concatenate --no-gitignore src/
Features:
- Processes a single root directory (paths relative to root)
- Includes ALL non-binary text files (code, markdown, JSON, YAML, etc.)
- Automatically extracts skeletons for supported code files
- Includes original content for unsupported file types (graceful degradation)
- Each file prefixed with --- filepath header (relative path)
- Verbose mode shows extraction status for each file
Options:
- root - Root directory to process (required)
- -o, --output - Output file (default: stdout)
- --no-extract - Include original files without extraction
- -I, --ignore-pattern - Add custom ignore pattern (gitignore syntax, can be used multiple times)
- --no-default-ignore - Disable built-in ignore patterns
- --no-gitignore - Don't respect .gitignore
- --no-recursive - Don't recursively traverse directories
- -v, --verbose - Print status to stderr
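Flags like these map naturally onto a standard argument parser. The sketch below uses Python's argparse to show one way a repeatable -I option and --no-* switches could be declared. It is an illustration only: the program name and the dest names are hypothetical, and nothing here is taken from the loppers source.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with a repeatable -I flag and negative boolean switches."""
    parser = argparse.ArgumentParser(prog="concat-sketch")
    parser.add_argument("root", help="root directory to process")
    parser.add_argument("-o", "--output", default=None, help="output file (default: stdout)")
    # action="append" collects every occurrence of -I into one list.
    parser.add_argument("-I", "--ignore-pattern", action="append", default=[], dest="ignore_patterns")
    # store_false makes the option default to True and flip to False when given.
    parser.add_argument("--no-recursive", action="store_false", dest="recursive")
    parser.add_argument("--no-extract", action="store_false", dest="extract")
    return parser
```

Parsing `-I "*.test.py" -I "venv/" --no-recursive src/` with this parser yields both patterns in `ignore_patterns`, `recursive` set to False, and `extract` left at its default of True.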
3. tree - Show directory tree of discovered files
Display a formatted tree of all discovered files:
# Recursive tree (default)
loppers tree src/
# Non-recursive
loppers tree --no-recursive src/
# Save tree to file
loppers tree src/ -o tree.txt
# With ignore patterns
loppers tree -I "*.test.py" src/
# Collapse deep single-child directories (useful for Java packages)
loppers tree --collapse-single-dirs src/
# Show file sizes in human-friendly format
loppers tree --show-sizes src/
# Combine multiple options
loppers tree --collapse-single-dirs --show-sizes src/
Options:
- root - Root directory to process (required)
- -o, --output - Output file (default: stdout)
- -I, --ignore-pattern - Add custom ignore pattern
- --no-default-ignore - Disable built-in ignore patterns
- --no-gitignore - Don't respect .gitignore
- --no-recursive - Non-recursive tree
- --collapse-single-dirs - Collapse directories with single children (e.g., java/com/example becomes one line)
- --show-sizes - Show file sizes in human-friendly format (e.g., "1.2KB", "5.0MB")
- -v, --verbose - Print status to stderr
Collapse Example:
Without collapse:
.
└─ src
   └─ main
      └─ java
         └─ com
            └─ example
               ├─ Source.java
               └─ Util.java
With --collapse-single-dirs:
.
└─ src/main/java/com/example
   ├─ Source.java
   └─ Util.java
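The collapsing shown above can be sketched with a nested dictionary and a recursive renderer. This is an assumption-laden illustration, not the loppers implementation: `build_tree`, `render_lines` and `render_tree` are hypothetical names, and the rule used (merge a directory with its only child, whether that child is a directory or a file) is inferred from the example outputs in this README.

```python
from typing import Dict, Iterable, List, Optional


def build_tree(paths: Iterable[str]) -> Dict[str, Optional[dict]]:
    """Nest relative POSIX paths into dicts; files map to None."""
    tree: Dict[str, Optional[dict]] = {}
    for path in paths:
        node = tree
        *dirs, filename = path.split("/")
        for part in dirs:
            node = node.setdefault(part, {})
        node[filename] = None
    return tree


def render_lines(tree: dict, prefix: str = "", collapse: bool = False) -> List[str]:
    """Render one tree level, recursing into subdirectories."""
    lines = []
    entries = sorted(tree.items())
    for index, (name, child) in enumerate(entries):
        last = index == len(entries) - 1
        if collapse:
            # Merge a directory with its only child until the chain branches.
            while isinstance(child, dict) and len(child) == 1:
                only_name, only_child = next(iter(child.items()))
                name, child = f"{name}/{only_name}", only_child
        lines.append(f"{prefix}{'└─ ' if last else '├─ '}{name}")
        if isinstance(child, dict):
            lines.extend(render_lines(child, prefix + ("   " if last else "│  "), collapse))
    return lines


def render_tree(paths: Iterable[str], collapse: bool = False) -> str:
    """Return the full tree as text, starting with a '.' root line."""
    return "\n".join(["."] + render_lines(build_tree(paths), collapse=collapse))
```

Calling `render_tree` on the two Java paths from the example with `collapse=True` gives a single `src/main/java/com/example` line followed by the two files.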
4. files - List all discovered files
Print one discovered file per line (relative to root):
# List all files recursively (default)
loppers files src/
# Save list to file
loppers files src/ -o file_list.txt
# Non-recursive
loppers files --no-recursive src/
# With custom ignores
loppers files -I "*.md" src/
Options:
- root - Root directory to process (required)
- -o, --output - Output file (default: stdout)
- -I, --ignore-pattern - Add custom ignore pattern
- --no-default-ignore - Disable built-in ignore patterns
- --no-gitignore - Don't respect .gitignore
- --no-recursive - Non-recursive listing
- -v, --verbose - Print status to stderr
Examples: Before and After
Python Example
Before:
class Calculator:
    def __init__(self, name: str):
        """Initialize calculator."""
        self.name = name
        self._setup()
    def process(self, data):
        """Process data."""
        result = []
        for item in data:
            result.append(item * 2)
        return result
After:
class Calculator:
    def __init__(self, name: str):
        """Initialize calculator."""
    def process(self, data):
        """Process data."""
JavaScript/TypeScript Example
Before:
class UserService {
  constructor(baseUrl: string) {
    this.baseUrl = baseUrl;
    this.cache = {};
  }
  async getUser(id: string) {
    if (this.cache[id]) return this.cache[id];
    const user = await fetch(this.baseUrl + '/' + id);
    return user.json();
  }
}
After:
class UserService {
  constructor(baseUrl: string) {
  }
  async getUser(id: string) {
  }
}
Java Example
Before:
public class UserService {
    private String baseUrl;
    public UserService(String baseUrl) {
        this.baseUrl = baseUrl;
        this.validate();
    }
    public User getUserById(String id) {
        Database db = new Database();
        return db.query(id);
    }
    private void validate() {
        if (baseUrl == null) {
            throw new IllegalArgumentException("BaseUrl required");
        }
    }
}
After:
public class UserService {
    private String baseUrl;
    public UserService(String baseUrl) {
    }
    public User getUserById(String id) {
    }
    private void validate() {
    }
}
Supported Languages
| Language | Features |
|---|---|
| Python | Functions, methods, __init__, @property, docstrings |
| JavaScript/TypeScript | Functions, arrow functions, methods, async/await |
| Java | Methods, constructors, static methods, annotations |
| Kotlin | Functions, methods, properties (getters/setters) |
| Go | Functions, methods, closures |
| Rust | Functions, methods, closures |
| C/C++ | Functions, methods, constructors |
| C# | Methods, properties (get/set), async/await |
| Ruby | Methods, singleton methods, blocks |
| PHP | Functions, methods, closures |
| Swift | Functions, methods, closures |
| Lua | Functions, local functions |
| Scala | Functions, methods, closures |
| Groovy | Functions, methods, closures |
| Objective-C | Methods, instance/class methods |
What Gets Preserved
- ✅ Function/method signatures
- ✅ Parameter types and defaults
- ✅ Return types
- ✅ Class definitions
- ✅ Import statements
- ✅ Comments
- ✅ Python docstrings
- ✅ Decorators
- ✅ Access modifiers (public, private, protected)
What Gets Removed
- ❌ Function/method bodies
- ❌ Local variable assignments
- ❌ Logic and implementation details
- ❌ Nested function implementations
Known Limitations
- Concise arrow functions (const f = x => x * 2) - no body to remove
- Python lambdas - no body to remove
- Some edge cases with getters/setters in JavaScript/TypeScript
How It Works
Loppers uses tree-sitter to parse source code into a syntax tree, then runs per-language queries to locate function/method bodies and removes them while preserving:
- Function/method signatures
- Class and interface definitions
- Import statements
- Python docstrings
- Comments
- Decorators
- Type hints
Each language has custom tree-sitter query patterns that capture function/method body nodes, which are then removed line-by-line.
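The locate-then-remove-lines idea can be tried out without tree-sitter. The sketch below swaps in Python's standard `ast` module to find function bodies in Python source and drops their lines while keeping signatures and leading docstrings. It is a stand-in technique for illustration, not the query-based mechanism loppers uses, and `python_skeleton` is a hypothetical name.

```python
import ast


def python_skeleton(source: str) -> str:
    """Drop function body lines, keeping signatures and leading docstrings."""
    tree = ast.parse(source)
    drop = set()  # 1-based line numbers to remove
    for node in ast.walk(tree):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        body = node.body
        # Keep the leading docstring statement, if there is one.
        if ast.get_docstring(node, clean=False) is not None:
            body = body[1:]
        for stmt in body:
            # Assumption: bodies start on their own line; a one-line
            # "def f(): pass" is left untouched so the signature survives.
            if stmt.lineno > node.lineno:
                drop.update(range(stmt.lineno, stmt.end_lineno + 1))
    kept = [
        line
        for number, line in enumerate(source.splitlines(), start=1)
        if number not in drop
    ]
    return "\n".join(kept)
```

Because nested functions are statements inside the outer body, their lines are dropped along with it, which matches the "Nested function implementations" entry under What Gets Removed.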
Development
Setup
# Install with dev dependencies
uv sync
Running Tests
# All tests
uv run pytest
# Verbose output
uv run pytest -v
# With coverage
uv run pytest --cov=loppers --cov-report=html
# Specific test
uv run pytest tests/test_loppers.py::test_python_extraction
Code Quality
# Check and fix
uv run ruff check . --fix
# Format
uv run ruff format .
# All checks
uv run ruff check . --fix && uv run ruff format .
Adding New Languages
To add support for a new language:
- Find the tree-sitter query - Use the tree-sitter playground to develop a query that captures function bodies
- Add to LANGUAGE_CONFIGS in src/loppers/loppers.py:
    LANGUAGE_CONFIGS["mylang"] = LanguageConfig(
        name="mylang",
        body_query="(function_definition body: (block) @body)",
    )
- Add file extensions to src/loppers/extensions.py:
    EXTENSION_TO_LANGUAGE = {
        ".ml": "mylang",
        ".mli": "mylang",
    }
- Write a test in tests/test_loppers.py:
    def test_mylang_extraction(self):
        code = "fun hello() { print('hi') }"
        skeleton = extract_skeleton(code, "mylang")
        assert "fun hello()" in skeleton
        assert "print" not in skeleton
- Run tests to verify everything works
Project Structure
loppers/
├── src/loppers/
│ ├── __init__.py # Public API: extract_skeleton, get_skeleton, find_files, get_tree, concatenate_files
│ ├── loppers.py # Core extraction logic with SkeletonExtractor class
│ ├── source_utils.py # Convenience API and file operations
│ ├── extensions.py # Language extension mapping
│ ├── ignore_patterns.py # Default ignore patterns
│ ├── mapping.py # Backwards compatibility re-exports
│ └── cli.py # Command-line interface (4 subcommands)
├── tests/
│ └── test_loppers.py # Unit tests (38 tests)
├── pyproject.toml # Project configuration
├── README.md # This file
└── CLAUDE.md # Development guide for Claude Code
Dependencies
Runtime:
- tree-sitter>=0.25.0 - AST parsing library
- tree-sitter-language-pack>=0.10.0 - Language grammars
- binaryornot>=0.4.4 - Binary file detection
- pathspec>=0.9.0 - .gitignore pattern matching
Development:
- pytest>=7.0.0 - Testing framework
- pytest-cov>=4.0.0 - Coverage reporting
- ruff>=0.1.0 - Linting and formatting
References
- tree-sitter Documentation
- tree-sitter-language-pack
- binaryornot Library
- pathspec Library
- uv Package Manager
License
MIT - See LICENSE file for details