Extract source file skeletons using tree-sitter queries
Loppers
Loppers removes function implementations while preserving structure, signatures, and docstrings. It supports 17 programming languages and provides a fully typed Python API and a CLI.
Requires: tree-sitter >= 0.25
Features
- ✅ 17 Languages - Python, JS/TS, Java, Kotlin, Go, Rust, C/C++, C#, Ruby, PHP, Swift, Lua, Scala, Groovy, Objective-C
- ✅ Smart Extraction - Functions, methods, constructors, arrow functions, getters/setters
- ✅ Preserved Elements - Signatures, class definitions, imports, docstrings, decorators
- ✅ All File Types - Process any non-binary text files (code, markdown, JSON, YAML, etc.)
- ✅ Binary Detection - Automatically skips binary files
- ✅ Ignore Patterns - Built-in + custom .gitignore support
- ✅ Fully Typed - Complete type hints throughout
- ✅ CLI & Library - Use as command-line tool or Python library
Quick Start
Installation
# With uv (recommended)
uv pip install loppers
# With pip
pip install loppers
Python API
The public API consists of 4 core functions:
1. extract_skeleton(source: str, language: str) -> str
Extract skeleton from source code by language identifier.
from loppers import extract_skeleton
code = """
def calculate(x: int, y: int) -> int:
    '''Calculate sum.'''
    result = x + y
    return result
"""
skeleton = extract_skeleton(code, "python")
print(skeleton)
Output:
def calculate(x: int, y: int) -> int:
    '''Calculate sum.'''
2. get_skeleton(file_path: Path | str, *, add_header: bool = False) -> str
Extract skeleton from a file by auto-detecting language from extension.
from loppers import get_skeleton
skeleton = get_skeleton("src/main.py")
print(skeleton)
# With header showing file path
skeleton = get_skeleton("src/main.py", add_header=True)
# Output: "--- /path/to/src/main.py\n..."
Raises:
- FileNotFoundError - If the file doesn't exist
- ValueError - If the file's language is unsupported
3. find_files(paths: list[str], *, recursive: bool = True, ignore_patterns: Sequence[str] | None = None, use_default_ignore: bool = True, respect_gitignore: bool = True) -> list[Path]
Collect all non-binary text files from given paths/directories.
from loppers import find_files
# Find all text files in src/ and tests/ recursively
files = find_files(["src/", "tests/"])
# Non-recursive
files = find_files(["src/"], recursive=False)
# Custom ignore patterns (gitignore syntax)
files = find_files(
    ["src/"],
    ignore_patterns=["*.test.py", "venv/"],
    use_default_ignore=True,  # Still applies built-in patterns
    respect_gitignore=True,  # Still respects .gitignore
)
Features:
- Automatically excludes binary files (images, archives, etc.)
- Respects .gitignore by default
- Supports custom gitignore-style ignore patterns
- Built-in patterns exclude node_modules, .git, __pycache__, build artifacts, etc.
- Works with ALL non-binary text files (code, markdown, JSON, YAML, etc.)
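As a rough illustration of what binary detection involves, the sketch below uses a common simplified heuristic: a file is treated as binary if a sample of its bytes contains a NUL byte. This is an assumption for illustration only, not how Loppers or binaryornot decides; `looks_binary` and `is_text_file` are hypothetical names, and the heuristic misclassifies some text encodings such as UTF-16.

```python
def looks_binary(chunk: bytes) -> bool:
    """Return True if the byte sample contains a NUL byte (simplified heuristic)."""
    return b"\x00" in chunk


def is_text_file(path: str, sample_size: int = 1024) -> bool:
    """Read the first bytes of a file and apply the heuristic above."""
    with open(path, "rb") as handle:
        return not looks_binary(handle.read(sample_size))


# A PNG-like header with NUL bytes versus ordinary UTF-8 text.
print(looks_binary(b"\x89PNG\r\n\x1a\n\x00\x00"))
print(looks_binary("def foo(): pass".encode("utf-8")))
```

A real detector usually combines several signals (byte distribution, known encodings), which is why Loppers delegates this to a dedicated library.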
4. get_tree(paths: list[str | Path]) -> str
Build a formatted directory tree string from a list of file paths.
from loppers import get_tree, find_files
files = find_files(["src/", "tests/"])
tree = get_tree(files)
print(tree)
Output:
.
├─ src/
│ ├─ main.py
│ ├─ utils.py
│ └─ config.yaml
└─ tests/
  └─ test_main.py
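To show how a list of paths can be turned into such a tree, here is a minimal stdlib sketch. It is not the get_tree implementation; `render_tree` is a hypothetical helper, and its ordering (directories first, then files, alphabetically) and spacing are assumptions that may differ from what Loppers prints.

```python
from pathlib import PurePosixPath


def render_tree(paths: list) -> str:
    """Render relative file paths as an indented tree (hypothetical helper)."""
    # Nested dict: directory names map to sub-dicts, file names map to None.
    root: dict = {}
    for path in paths:
        parts = PurePosixPath(path).parts
        node = root
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = None

    lines = ["."]

    def walk(node: dict, prefix: str) -> None:
        # Directories first, then files, each group sorted by name.
        names = sorted(node, key=lambda name: (node[name] is None, name))
        for index, name in enumerate(names):
            last = index == len(names) - 1
            child = node[name]
            label = f"{name}/" if child is not None else name
            lines.append(f"{prefix}{'└─ ' if last else '├─ '}{label}")
            if child is not None:
                # Continue the vertical bar only while siblings remain.
                walk(child, prefix + ("   " if last else "│  "))

    walk(root, "")
    return "\n".join(lines)


print(render_tree(["src/main.py", "src/utils.py", "tests/test_main.py"]))
```

The nested dictionary keeps the sketch short; a path that is both a file and a directory prefix is not handled, which cannot occur for paths collected from a real file system.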
Utility Function
get_language(extension: str) -> str | None - Get language identifier from file extension.
from loppers import get_language
get_language(".py") # "python"
get_language(".js") # "javascript"
get_language(".json") # None (no extraction for data files)
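Extension-based detection amounts to a table lookup. The sketch below shows the idea with a hypothetical two-entry table built from the examples above; `EXTENSIONS` and `language_for` are illustrative names, and lowercasing the suffix is an assumption rather than documented Loppers behavior.

```python
from pathlib import Path
from typing import Optional

# Hypothetical subset of an extension table; Loppers ships its own mapping.
EXTENSIONS = {".py": "python", ".js": "javascript"}


def language_for(path: str) -> Optional[str]:
    """Return the language for a file path, or None if the suffix is unknown."""
    return EXTENSIONS.get(Path(path).suffix.lower())


print(language_for("src/main.py"))
print(language_for("data.json"))
```

Returning None instead of raising lets callers decide whether to skip the file or include it unchanged.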
Command-Line Interface
Loppers provides 4 subcommands for common tasks.
Basic Usage
loppers --version
loppers --help
1. extract - Extract skeleton from file or stdin
Extract a single file's skeleton:
# From file
loppers extract file.py
loppers extract file.py -o skeleton.py
# From stdin with explicit language
echo 'def foo(): pass' | loppers extract -l python
# Verbose output
loppers extract file.py -v
Options:
- FILE - File to extract (omit for stdin)
- -l, --language - Language identifier (auto-detected from extension if FILE provided, required for stdin)
- -o, --output - Output file (default: stdout)
- -v, --verbose - Print status to stderr
2. concatenate - Concatenate files with optional skeleton extraction
Process multiple files/directories with automatic skeleton extraction:
# Recursive (default)
loppers concatenate src/
# Non-recursive
loppers concatenate --no-recursive src/
# Save to file
loppers concatenate src/ tests/ -o combined.txt
# Verbose with progress
loppers concatenate -v src/
# Include original files without extraction
loppers concatenate --no-extract src/
# Custom ignore patterns
loppers concatenate -I "*.test.py" -I "venv/" src/
# Disable default ignores
loppers concatenate --no-default-ignore src/
# Don't respect .gitignore
loppers concatenate --no-gitignore src/
Features:
- Includes ALL non-binary text files (code, markdown, JSON, YAML, etc.)
- Automatically extracts skeletons for supported code files
- Includes original content for unsupported file types (graceful degradation)
- Each file prefixed with a --- filepath header
- Verbose mode shows extraction status for each file
Options:
- paths - Files/directories to process (required)
- -o, --output - Output file (default: stdout)
- --no-extract - Include original files without extraction
- -I, --ignore-pattern - Add custom ignore pattern (gitignore syntax, can be used multiple times)
- --no-default-ignore - Disable built-in ignore patterns
- --no-gitignore - Don't respect .gitignore
- --no-recursive - Don't recursively traverse directories
- -v, --verbose - Print status to stderr
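The header-plus-fallback behavior described above can be approximated in a few lines from the library side. This is a sketch under assumptions, not the CLI's code: `render_entry` is a hypothetical helper, the extractor is passed in as a callable (for example `loppers.get_skeleton`, which is documented to raise ValueError for unsupported languages), and the two stub extractors exist only so the example runs on its own.

```python
from typing import Callable


def render_entry(path: str, original: str, extract: Callable[[str], str]) -> str:
    """Return one '--- path' section, falling back to the original text."""
    try:
        body = extract(path)
    except ValueError:
        # Unsupported language: keep the file content unchanged.
        body = original
    return f"--- {path}\n{body}"


def unsupported(path: str) -> str:
    """Stub extractor that mimics an unsupported file type."""
    raise ValueError(f"unsupported language: {path}")


def fake_skeleton(path: str) -> str:
    """Stub extractor standing in for a real skeleton extractor."""
    return "def main(): ..."


print(render_entry("notes.md", "# Notes", unsupported))
print(render_entry("main.py", "def main():\n    run()", fake_skeleton))
```

With Loppers installed, a call might look like `render_entry(p, Path(p).read_text(), get_skeleton)` for each path returned by find_files.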
3. tree - Show directory tree of discovered files
Display a formatted tree of all discovered files:
# Recursive tree
loppers tree src/
# Non-recursive
loppers tree --no-recursive src/
# Save tree to file
loppers tree src/ -o tree.txt
# With ignore patterns
loppers tree -I "*.test.py" src/
Options:
- paths - Directories to process (required)
- -o, --output - Output file (default: stdout)
- -I, --ignore-pattern - Add custom ignore pattern
- --no-default-ignore - Disable built-in ignore patterns
- --no-gitignore - Don't respect .gitignore
- --no-recursive - Non-recursive tree
- -v, --verbose - Print status to stderr
4. files - List all discovered files
Print one discovered file per line:
# List all files recursively
loppers files src/
# Save list to file
loppers files src/ -o file_list.txt
# Multiple directories
loppers files src/ tests/ docs/
# Non-recursive
loppers files --no-recursive src/
# With custom ignores
loppers files -I "*.md" src/
Options:
- paths - Directories/files to process (required)
- -o, --output - Output file (default: stdout)
- -I, --ignore-pattern - Add custom ignore pattern
- --no-default-ignore - Disable built-in ignore patterns
- --no-gitignore - Don't respect .gitignore
- --no-recursive - Non-recursive listing
- -v, --verbose - Print status to stderr
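For intuition about what patterns such as "*.test.py" and "venv/" select, the sketch below approximates two common gitignore pattern shapes with the standard library. It is deliberately incomplete (no negation, no anchored or nested patterns) and is not how Loppers matches paths, which relies on pathspec; `is_ignored` is a hypothetical helper.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def is_ignored(path: str, patterns: list) -> bool:
    """Approximate gitignore matching for 'name/' and glob-style file patterns."""
    parts = PurePosixPath(path).parts
    for pattern in patterns:
        if pattern.endswith("/"):
            # Directory pattern: ignore files below any directory of that name.
            if pattern.rstrip("/") in parts[:-1]:
                return True
        elif fnmatchcase(parts[-1], pattern):
            # File pattern: compared against the final path component only.
            return True
    return False


patterns = ["*.test.py", "venv/"]
for candidate in ["src/app.test.py", "venv/lib/site.py", "src/main.py"]:
    print(candidate, is_ignored(candidate, patterns))
```

Full gitignore semantics are considerably richer, so treat this only as a mental model for the -I option and the ignore_patterns parameter.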
Examples: Before and After
Python Example
Before:
class Calculator:
    def __init__(self, name: str):
        """Initialize calculator."""
        self.name = name
        self._setup()
    def process(self, data):
        """Process data."""
        result = []
        for item in data:
            result.append(item * 2)
        return result
After:
class Calculator:
    def __init__(self, name: str):
        """Initialize calculator."""
    def process(self, data):
        """Process data."""
JavaScript/TypeScript Example
Before:
class UserService {
  constructor(baseUrl: string) {
    this.baseUrl = baseUrl;
    this.cache = {};
  }
  async getUser(id: string) {
    if (this.cache[id]) return this.cache[id];
    const user = await fetch(this.baseUrl + '/' + id);
    return user.json();
  }
}
After:
class UserService {
  constructor(baseUrl: string) {
  }
  async getUser(id: string) {
  }
}
Java Example
Before:
public class UserService {
    private String baseUrl;
    public UserService(String baseUrl) {
        this.baseUrl = baseUrl;
        this.validate();
    }
    public User getUserById(String id) {
        Database db = new Database();
        return db.query(id);
    }
    private void validate() {
        if (baseUrl == null) {
            throw new IllegalArgumentException("BaseUrl required");
        }
    }
}
After:
public class UserService {
    private String baseUrl;
    public UserService(String baseUrl) {
    }
    public User getUserById(String id) {
    }
    private void validate() {
    }
}
Supported Languages
| Language | Features |
|---|---|
| Python | Functions, methods, __init__, @property, docstrings |
| JavaScript/TypeScript | Functions, arrow functions, methods, async/await |
| Java | Methods, constructors, static methods, annotations |
| Kotlin | Functions, methods, properties (getters/setters) |
| Go | Functions, methods, closures |
| Rust | Functions, methods, closures |
| C/C++ | Functions, methods, constructors |
| C# | Methods, properties (get/set), async/await |
| Ruby | Methods, singleton methods, blocks |
| PHP | Functions, methods, closures |
| Swift | Functions, methods, closures |
| Lua | Functions, local functions |
| Scala | Functions, methods, closures |
| Groovy | Functions, methods, closures |
| Objective-C | Methods, instance/class methods |
What Gets Preserved
- ✅ Function/method signatures
- ✅ Parameter types and defaults
- ✅ Return types
- ✅ Class definitions
- ✅ Import statements
- ✅ Comments
- ✅ Python docstrings
- ✅ Decorators
- ✅ Access modifiers (public, private, protected)
What Gets Removed
- ❌ Function/method bodies
- ❌ Local variable assignments
- ❌ Logic and implementation details
- ❌ Nested function implementations
Known Limitations
- Concise arrow functions (const f = x => x * 2) - no body to remove
- Python lambdas - no body to remove
- Some edge cases with getters/setters in JavaScript/TypeScript
How It Works
Loppers parses source code into syntax trees with tree-sitter, then uses tree-sitter queries to find function/method bodies and removes them while preserving:
- Function/method signatures
- Class and interface definitions
- Import statements
- Python docstrings
- Comments
- Decorators
- Type hints
Each language has custom tree-sitter query patterns that capture function/method body nodes, which are then removed line-by-line.
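The line-by-line removal idea can be illustrated for Python alone with the standard ast module. This is a sketch of the general technique and not Loppers' implementation, which uses tree-sitter queries and works across languages; `strip_bodies` is a hypothetical helper that keeps signatures and docstrings and drops the remaining body lines.

```python
import ast


def strip_bodies(source: str) -> str:
    """Remove function bodies line by line, keeping signatures and docstrings."""
    drop = set()  # 1-based numbers of the lines to remove

    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        first = node.body[0]
        has_docstring = (
            isinstance(first, ast.Expr)
            and isinstance(first.value, ast.Constant)
            and isinstance(first.value.value, str)
        )
        removable = node.body[1:] if has_docstring else node.body
        for stmt in removable:
            # Skip one-line definitions such as "def f(): pass", where the
            # body shares a line with the signature.
            if stmt.lineno > node.lineno:
                drop.update(range(stmt.lineno, stmt.end_lineno + 1))

    kept = [
        line
        for number, line in enumerate(source.splitlines(), start=1)
        if number not in drop
    ]
    return "\n".join(kept)


code = '''def calculate(x: int, y: int) -> int:
    """Calculate sum."""
    result = x + y
    return result
'''
print(strip_bodies(code))
```

Working on line ranges rather than rebuilding the source keeps comments and formatting of the surviving lines untouched, which is the same reason a line-based removal is attractive for a multi-language tool.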
Development
Setup
# Install with dev dependencies
uv sync
Running Tests
# All tests
uv run pytest
# Verbose output
uv run pytest -v
# With coverage
uv run pytest --cov=loppers --cov-report=html
# Specific test
uv run pytest tests/test_loppers.py::test_python_extraction
Code Quality
# Check and fix
uv run ruff check . --fix
# Format
uv run ruff format .
# All checks
uv run ruff check . --fix && uv run ruff format .
Adding New Languages
To add support for a new language:
- Find the tree-sitter query - Use the tree-sitter playground to develop a query that captures function bodies
- Add to LANGUAGE_CONFIGS in src/loppers/loppers.py:
  LANGUAGE_CONFIGS["mylang"] = LanguageConfig(
      name="mylang",
      body_query="(function_definition body: (block) @body)",
  )
- Add file extensions to src/loppers/extensions.py:
  EXTENSION_TO_LANGUAGE = {
      ".ml": "mylang",
      ".mli": "mylang",
  }
- Write a test in tests/test_loppers.py:
  def test_mylang_extraction(self):
      code = "fun hello() { print('hi') }"
      skeleton = extract_skeleton(code, "mylang")
      assert "fun hello()" in skeleton
      assert "print" not in skeleton
- Run tests to verify everything works
Project Structure
loppers/
├── src/loppers/
│ ├── __init__.py # Public API: extract_skeleton, get_skeleton, find_files, get_tree
│ ├── loppers.py # Core extraction logic with SkeletonExtractor class
│ ├── source_utils.py # Convenience API and file operations
│ ├── extensions.py # Language extension mapping
│ ├── ignore_patterns.py # Default ignore patterns
│ ├── mapping.py # Backwards compatibility re-exports
│ └── cli.py # Command-line interface (4 subcommands)
├── tests/
│ └── test_loppers.py # Unit tests (31 tests)
├── pyproject.toml # Project configuration
├── README.md # This file
└── CLAUDE.md # Development guide for Claude Code
Dependencies
Runtime:
- tree-sitter>=0.25.0 - AST parsing library
- tree-sitter-language-pack>=0.10.0 - Language grammars
- binaryornot>=0.4.4 - Binary file detection
- pathspec>=0.9.0 - .gitignore pattern matching
Development:
- pytest>=7.0.0 - Testing framework
- pytest-cov>=4.0.0 - Coverage reporting
- ruff>=0.1.0 - Linting and formatting
References
- tree-sitter Documentation
- tree-sitter-language-pack
- binaryornot Library
- pathspec Library
- uv Package Manager
License
MIT - See LICENSE file for details