Loppers
Extract source file skeletons using tree-sitter queries.
Removes function implementations while preserving structure, signatures, and docstrings. Supports 17 programming languages with a simple, fully-typed Python API.
Requires: tree-sitter >= 0.25
Features
- ✅ 17 Languages - Python, JS/TS, Java, Kotlin, Go, Rust, C/C++, C#, Ruby, PHP, Swift, Lua, Scala, Groovy, Objective-C
- ✅ Smart Extraction - Functions, methods, constructors, arrow functions, getters/setters
- ✅ Preserved Elements - Signatures, class definitions, imports, docstrings, decorators
- ✅ File Operations - Concatenate files/directories with binary detection
- ✅ Fully Typed - Complete type hints throughout
- ✅ CLI & Library - Use as command-line tool or Python library
Quick Start
Installation
# With uv (recommended)
uv pip install loppers
# With pip
pip install loppers
Extract Code Skeleton (Python API)
from loppers import extract
source = '''
def calculate(x: int, y: int) -> int:
    """Calculate sum."""
    result = x + y
    return result
'''
skeleton = extract(source, "python")
print(skeleton)
Output:
def calculate(x: int, y: int) -> int:
    """Calculate sum."""
Concatenate Files (Python API)
from loppers import concatenate_files
# Combine all text files with skeleton extraction
result = concatenate_files(
    ["src/", "tests/"],
    recursive=True,
    extract_skeletons=True,
    verbose=True,
)
print(result)
Command Line Usage
# Extract single file
loppers myfile.py
# Process directory recursively
loppers -r src/ -o skeletons.txt
# Include original files (no extraction)
loppers -r src/ --no-extract
# Show progress
loppers -v -r .
Common CLI Examples:
# Multiple files
loppers file1.py file2.js file3.java
# Directory traversal
loppers src/ # Non-recursive
loppers -r src/ # Recursive
# Mix files and directories
loppers -r src/ tests/ docs/
# Save to file
loppers -r . -o combined.txt
# Verbose output
loppers -v -r src/
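For readers who want to see how flags like these fit together, here is a minimal `argparse` sketch. It is not the Loppers CLI source: only the flags shown above (`-r`, `-o`, `-v`, `--no-extract`) come from this README, while the `dest` names, help strings, and the `build_parser` function are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that accepts the flags used in the examples above."""
    parser = argparse.ArgumentParser(prog="loppers")
    parser.add_argument("paths", nargs="+", help="files and/or directories")
    parser.add_argument("-r", dest="recursive", action="store_true",
                        help="traverse directories recursively")
    parser.add_argument("-o", dest="output", default=None,
                        help="write the result to this file instead of stdout")
    parser.add_argument("-v", dest="verbose", action="store_true",
                        help="show progress")
    # Assumed mapping: --no-extract switches skeleton extraction off.
    parser.add_argument("--no-extract", dest="extract", action="store_false",
                        help="include original files unchanged")
    return parser


args = build_parser().parse_args(["-r", "src/", "tests/", "-o", "combined.txt"])
print(args.paths, args.recursive, args.output, args.extract)
```

Using `store_false` for `--no-extract` keeps extraction on by default, which matches the behaviour the examples above imply.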
Examples: Before and After
Python Example
Before:
class Calculator:
    def __init__(self, name: str):
        """Initialize calculator."""
        self.name = name
        self._setup()

    def process(self, data):
        """Process data."""
        result = []
        for item in data:
            result.append(item * 2)
        return result
After:
class Calculator:
    def __init__(self, name: str):
        """Initialize calculator."""

    def process(self, data):
        """Process data."""
JavaScript/TypeScript Example
Before:
class UserService {
  constructor(baseUrl: string) {
    this.baseUrl = baseUrl;
    this.cache = {};
  }
  async getUser(id: string) {
    if (this.cache[id]) return this.cache[id];
    const user = await fetch(this.baseUrl + '/' + id);
    return user.json();
  }
}
After:
class UserService {
  constructor(baseUrl: string) {
  }
  async getUser(id: string) {
  }
}
Java Example
Before:
public class UserService {
    private String baseUrl;
    public UserService(String baseUrl) {
        this.baseUrl = baseUrl;
        this.validate();
    }
    public User getUserById(String id) {
        Database db = new Database();
        return db.query(id);
    }
    private void validate() {
        if (baseUrl == null) {
            throw new IllegalArgumentException("BaseUrl required");
        }
    }
}
After:
public class UserService {
    private String baseUrl;
    public UserService(String baseUrl) {
    }
    public User getUserById(String id) {
    }
    private void validate() {
    }
}
Supported Languages
| Language | Features |
|---|---|
| Python | Functions, methods, __init__, @property, docstrings |
| JavaScript/TypeScript | Functions, arrow functions, methods, async/await |
| Java | Methods, constructors, static methods, annotations |
| Kotlin | Functions, methods, properties (getters/setters) |
| Go | Functions, methods, closures |
| Rust | Functions, methods, closures |
| C/C++ | Functions, methods, constructors |
| C# | Methods, properties (get/set), async/await |
| Ruby | Methods, singleton methods, blocks |
| PHP | Functions, methods, closures |
| Swift | Functions, methods, closures |
| Lua | Functions, local functions |
| Scala | Functions, methods, closures |
| Groovy | Functions, methods, closures |
| Objective-C | Methods, instance/class methods |
What Gets Preserved
- ✅ Function/method signatures
- ✅ Parameter types and defaults
- ✅ Return types
- ✅ Class definitions
- ✅ Import statements
- ✅ Comments
- ✅ Python docstrings
- ✅ Decorators
- ✅ Access modifiers (public, private, protected)
What Gets Removed
- ❌ Function/method bodies
- ❌ Local variable assignments
- ❌ Logic and implementation details
- ❌ Nested function implementations
Known Limitations
- Concise arrow functions (const f = x => x * 2) - no body to remove
- Python lambdas - no body to remove
- Some edge cases with getters/setters in JavaScript/TypeScript
API Reference
Skeleton Extraction
extract(source_code: str, language: str) -> str
Extract skeleton from source code.
Example:
from loppers import extract
code = "def hello(): print('hi')"
skeleton = extract(code, "python") # Returns: "def hello():"
SkeletonExtractor(language: str)
Create a language-specific extractor for reuse.
Example:
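from loppers import SkeletonExtractor
# Two sample inputs to show that one extractor can be reused
code1 = "def first(): return 1"
code2 = "def second(): return 2"
extractor = SkeletonExtractor("python")
skeleton1 = extractor.extract(code1)
skeleton2 = extractor.extract(code2)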
File Operations
concatenate_files(file_paths, recursive=False, verbose=False, extract_skeletons=True) -> str
Concatenate files with optional skeleton extraction.
Args:
- file_paths - List of file and/or directory paths
- recursive - Recursively traverse directories (default: False)
- verbose - Print progress to stderr (default: False)
- extract_skeletons - Extract code skeletons (default: True)
Returns: Concatenated content with --- headers separating files
Example:
from loppers import concatenate_files
result = concatenate_files(
    ["src/", "tests/"],
    recursive=True,
    extract_skeletons=True,
)
# Automatically skips binary files
# Extracts skeletons for supported languages
# Includes original content for unsupported types
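The output layout described above (file contents separated by `---` headers) can be sketched with plain string handling. This is an illustration under assumptions, not the library's code: the exact header text (`--- path ---`), the helper name `join_with_headers`, and the optional `transform` hook are all hypothetical.

```python
from typing import Callable, Iterable, Optional, Tuple


def join_with_headers(
    files: Iterable[Tuple[str, str]],
    transform: Optional[Callable[[str, str], str]] = None,
) -> str:
    """Join (path, content) pairs, prefixing each with a '--- path ---' header.

    `transform(path, content)` is an optional hook where a skeleton
    extractor could be plugged in; by default content is used unchanged.
    """
    parts = []
    for path, content in files:
        if transform is not None:
            content = transform(path, content)
        parts.append(f"--- {path} ---\n{content.rstrip()}\n")
    return "\n".join(parts)


print(join_with_headers([("a.py", "x = 1\n"), ("notes.txt", "hello")]))
```

Passing the per-file processing in as a callable keeps the joining logic independent of whether skeleton extraction is enabled.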
collect_files(paths, recursive=False, verbose=False, include_all_text_files=True) -> list[Path]
Collect text files from paths, excluding binary files.
Returns: Sorted list of file paths
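As a rough picture of what collecting files from mixed paths involves, the following sketch uses only `pathlib`. It is an assumption-based illustration, not the `collect_files` implementation: the name `gather_files` is hypothetical, and binary-file filtering is deliberately left out.

```python
from pathlib import Path
from typing import Iterable, List


def gather_files(paths: Iterable[str], recursive: bool = False) -> List[Path]:
    """Expand files and directories into a sorted, de-duplicated file list."""
    found = set()
    for raw in paths:
        path = Path(raw)
        if path.is_file():
            found.add(path)
        elif path.is_dir():
            # "*" lists direct children only; "**/*" also descends into subdirectories.
            pattern = "**/*" if recursive else "*"
            found.update(p for p in path.glob(pattern) if p.is_file())
    return sorted(found)
```

Collecting into a set before sorting means a file named both directly and through its parent directory appears only once.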
is_binary_file(file_path: Path) -> bool
Detect if a file is binary.
Uses the binaryornot library which checks:
- Known binary file extensions
- Null bytes in content
- UTF-8 decoding success
Example:
from loppers import is_binary_file
from pathlib import Path
if not is_binary_file(Path("image.jpg")):
    process_as_text()  # placeholder for your own text-handling function
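The three checks listed above can be approximated with the standard library alone. The sketch below is not the binaryornot implementation, which may weigh these signals differently; the function names `sample_is_binary` and `looks_binary`, the extension set, and the 1024-byte sample size are assumptions chosen for illustration.

```python
import codecs
from pathlib import Path

# Hypothetical, deliberately short list; real detectors know many more.
BINARY_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".pdf", ".zip", ".pyc"}


def sample_is_binary(sample: bytes) -> bool:
    """Return True if a chunk of bytes does not look like UTF-8 text."""
    if b"\x00" in sample:
        return True
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # final=False tolerates a multi-byte character cut off at the end.
        decoder.decode(sample, final=False)
    except UnicodeDecodeError:
        return True
    return False


def looks_binary(path: Path, sample_size: int = 1024) -> bool:
    """Check the extension first, then inspect the first bytes of the file."""
    if path.suffix.lower() in BINARY_EXTENSIONS:
        return True
    with path.open("rb") as handle:
        return sample_is_binary(handle.read(sample_size))
```

Checking the extension before opening the file avoids any I/O for the common case of images and archives.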
Utility Functions
get_language(extension: str) -> str | None
Get language identifier from file extension.
Example:
from loppers import get_language
get_language(".py") # Returns: "python"
get_language(".java") # Returns: "java"
get_language(".txt") # Returns: None
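A lookup like this is commonly a dictionary keyed by file extension. The sketch below is illustrative only: the table is a small subset in which only `.py` and `.java` are taken from the example above, the `.kt` entry is an assumption, and `lookup_language` and `language_for_path` are hypothetical names rather than library functions.

```python
from pathlib import Path
from typing import Optional

# Subset for illustration; ".kt" -> "kotlin" is an assumed entry.
EXTENSIONS = {".py": "python", ".java": "java", ".kt": "kotlin"}


def lookup_language(extension: str) -> Optional[str]:
    """Map a file extension (with leading dot) to a language identifier."""
    return EXTENSIONS.get(extension.lower())


def language_for_path(path: str) -> Optional[str]:
    """Convenience wrapper that takes a whole path instead of an extension."""
    return lookup_language(Path(path).suffix)
```

Returning `None` for unknown extensions lets a caller fall back to including the file unchanged.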
How It Works
Loppers uses tree-sitter to parse source code into a syntax tree, then runs language-specific tree-sitter queries to locate function and method bodies and remove them while preserving:
- Function/method signatures
- Class and interface definitions
- Import statements
- Python docstrings
- Comments
- Decorators
- Type hints
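To make the idea concrete for a single language, here is a minimal sketch that uses Python's standard `ast` module instead of tree-sitter. It is not how Loppers is implemented, and the function name `python_skeleton` is hypothetical; it only shows the general shape of the technique: find each function body in the tree, then cut its lines while keeping the signature and a leading docstring.

```python
import ast


def python_skeleton(source: str) -> str:
    """Drop function bodies, keeping signatures and leading docstrings.

    Simplification: assumes every body starts on its own line, so one-line
    definitions such as `def f(): return 1` are left untouched.
    """
    drop = set()  # 1-based line numbers to remove
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        first, last = node.body[0], node.body[-1]
        if first.lineno == node.lineno:
            continue  # one-line definition, nothing to cut safely
        # ast.get_docstring returns None when the body has no docstring.
        has_docstring = ast.get_docstring(node, clean=False) is not None
        start = first.end_lineno + 1 if has_docstring else first.lineno
        drop.update(range(start, last.end_lineno + 1))
    lines = source.splitlines()
    return "\n".join(
        line for number, line in enumerate(lines, start=1) if number not in drop
    )


example = '''class Calculator:
    def process(self, data):
        """Process data."""
        result = []
        return result
'''
print(python_skeleton(example))
```

For the `Calculator` example this prints only the class line, the method signature, and the docstring. A tree-sitter based approach follows the same pattern but works on byte ranges captured by a query, which is what lets one mechanism cover many languages.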
Language-Specific Queries:
Each language has a custom tree-sitter query pattern:
# Python: Remove function body but keep docstrings
"(function_definition body: (block) @body)"
# JavaScript: Handle multiple function types
"[(function_declaration body: ...) (arrow_function body: ...) ...]"
# Kotlin: Extract functions and property getters/setters
"[(function_declaration (function_body) @body) (getter ...) (setter ...)]"
Development
Setup
# Install with dev dependencies
uv sync --extra dev
Running Tests
# All tests
uv run pytest
# Verbose output
uv run pytest -v
# With coverage
uv run pytest --cov=loppers --cov-report=html
# Specific test
uv run pytest tests/test_loppers.py::TestSkeletonExtractor::test_python_extraction
Code Quality
# Check and fix
uv run ruff check . --fix
# Format
uv run ruff format .
# All checks at once
uv run ruff check . --fix && uv run ruff format .
Publishing to PyPI
This project uses Python Semantic Release with conventional commits.
Commit message format:
- feat: - New feature (bumps minor version)
- fix: - Bug fix (bumps patch version)
- BREAKING CHANGE: - Major version bump
- docs:, chore:, refactor: - No version bump
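The mapping from commit messages to version bumps can be sketched as a small function. This is a simplified illustration of the rules listed above, not Python Semantic Release's actual parser; `bump_for` and `next_version` are hypothetical names.

```python
import re
from typing import Iterable

_RANK = {"none": 0, "patch": 1, "minor": 2, "major": 3}


def bump_for(message: str) -> str:
    """Classify one commit message as 'major', 'minor', 'patch', or 'none'."""
    if "BREAKING CHANGE:" in message:
        return "major"
    # Accept an optional scope, e.g. "fix(cli): ...".
    match = re.match(r"(\w+)(?:\([^)]*\))?:", message)
    kind = match.group(1) if match else ""
    return {"feat": "minor", "fix": "patch"}.get(kind, "none")


def next_version(version: str, messages: Iterable[str]) -> str:
    """Apply the largest bump implied by the given commit messages."""
    level = max(
        (bump_for(m) for m in messages), key=_RANK.__getitem__, default="none"
    )
    major, minor, patch = (int(part) for part in version.split("."))
    if level == "major":
        return f"{major + 1}.0.0"
    if level == "minor":
        return f"{major}.{minor + 1}.0"
    if level == "patch":
        return f"{major}.{minor}.{patch + 1}"
    return version
```

Only the largest bump in a batch of commits matters, which is why the sketch takes the maximum over all messages rather than applying each bump in turn.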
Automated release:
# Push to main with conventional commit messages
# GitHub Actions will automatically:
# 1. Analyze commits
# 2. Bump version
# 3. Build distributions
# 4. Publish to PyPI
Manual publishing:
# Build locally
uv run python -m build
# View distributions
ls -lh dist/
Adding New Languages
To add support for a new language:
- Find the tree-sitter query - Use the tree-sitter playground to develop a query that captures function bodies
- Add to LANGUAGE_CONFIGS in src/loppers/loppers.py:
  LANGUAGE_CONFIGS["mylang"] = LanguageConfig(
      name="mylang",
      body_query="(function_definition body: (block) @body)",
  )
- Add file extensions to src/loppers/mapping.py:
  EXTENSION_TO_LANGUAGE = {
      ".ml": "mylang",
      ".mli": "mylang",
  }
- Write a test in tests/test_loppers.py:
  def test_mylang_extraction(self):
      code = "fun hello() { print('hi') }"
      skeleton = extract(code, "mylang")
      self.assertIn("fun hello()", skeleton)
      self.assertNotIn("print", skeleton)
- Add example file in examples/sample.ml showcasing language features
Project Structure
loppers/
├── src/loppers/
│ ├── __init__.py # Public API exports
│ ├── loppers.py # Core extraction logic
│ ├── concatenator.py # File concatenation
│ ├── mapping.py # Language mapping
│ └── cli.py # Command-line interface
├── tests/
│ └── test_loppers.py # Unit tests (24 tests)
├── examples/
│ ├── sample.py # Python examples
│ ├── sample.kt # Kotlin examples
│ └── ... # Other language samples
├── pyproject.toml # Project configuration
├── README.md # Main documentation (this file)
├── CHANGELOG.md # Release history
└── CLAUDE.md # Claude Code development guide
Dependencies
Runtime:
- tree-sitter>=0.25.0 - AST parsing library
- tree-sitter-language-pack>=0.10.0 - Language grammars
- binaryornot>=0.4.4 - Binary file detection
Development:
- pytest>=7.0.0 - Testing framework
- ruff>=0.1.0 - Linting and formatting
- python-semantic-release>=8.0.0 - Release automation
References
- tree-sitter Documentation
- tree-sitter-language-pack
- binaryornot Library
- uv Package Manager
- Conventional Commits
License
MIT - See LICENSE file for details