Extract source file skeletons using tree-sitter queries

Loppers

Removes function implementations while preserving structure, signatures, and docstrings. Supports 17 programming languages with a simple, fully-typed Python API.

Requires: tree-sitter >= 0.25

Features

  • 17 Languages - Python, JS/TS, Java, Kotlin, Go, Rust, C/C++, C#, Ruby, PHP, Swift, Lua, Scala, Groovy, Objective-C
  • Smart Extraction - Functions, methods, constructors, arrow functions, getters/setters
  • Preserved Elements - Signatures, class definitions, imports, docstrings, decorators
  • File Operations - Concatenate files/directories with binary detection
  • Fully Typed - Complete type hints throughout
  • CLI & Library - Use as command-line tool or Python library

Quick Start

Installation

# With uv (recommended)
uv pip install loppers

# With pip
pip install loppers

Extract Code Skeleton (Python API)

from loppers import extract

source = '''
def calculate(x: int, y: int) -> int:
    """Calculate sum."""
    result = x + y
    return result
'''

skeleton = extract(source, "python")
print(skeleton)

Output:

def calculate(x: int, y: int) -> int:
    """Calculate sum."""

Concatenate Files (Python API)

from loppers import concatenate_files

# Combine all text files with skeleton extraction
result = concatenate_files(
    ["src/", "tests/"],
    recursive=True,
    extract_skeletons=True,
    verbose=True,
)
print(result)

Command Line Usage

# Extract single file
loppers myfile.py

# Process directory recursively
loppers -r src/ -o skeletons.txt

# Include original files (no extraction)
loppers -r src/ --no-extract

# Show progress
loppers -v -r .

Common CLI Examples:

# Multiple files
loppers file1.py file2.js file3.java

# Directory traversal
loppers src/        # Non-recursive
loppers -r src/     # Recursive

# Mix files and directories
loppers -r src/ tests/ docs/

# Save to file
loppers -r . -o combined.txt

# Verbose output
loppers -v -r src/

Examples: Before and After

Python Example

Before:

class Calculator:
    def __init__(self, name: str):
        """Initialize calculator."""
        self.name = name
        self._setup()

    def process(self, data):
        """Process data."""
        result = []
        for item in data:
            result.append(item * 2)
        return result

After:

class Calculator:
    def __init__(self, name: str):
        """Initialize calculator."""

    def process(self, data):
        """Process data."""
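For Python code, a similar before/after transformation can be approximated with the standard-library ast module instead of tree-sitter. This is an illustrative sketch, not how loppers is implemented: python_skeleton is a hypothetical helper, and because it regenerates code with ast.unparse (Python 3.9+), comments and original formatting are lost. That loss is one reason to edit the source text directly, as a tree-sitter-based tool can.

```python
import ast


class BodyStripper(ast.NodeTransformer):
    """Empty every function body, keeping only a leading docstring."""

    def _strip(self, node: ast.AST) -> ast.AST:
        has_docstring = ast.get_docstring(node, clean=False) is not None
        # Keep the original docstring statement (if any) and drop the rest of
        # the body; nested functions disappear along with it.
        node.body = node.body[:1] if has_docstring else []
        return node

    # ClassDef nodes fall through to generic_visit, so methods are handled too.
    visit_FunctionDef = _strip
    visit_AsyncFunctionDef = _strip


def python_skeleton(source: str) -> str:
    """Hypothetical helper: strip bodies with ast, then regenerate source."""
    tree = BodyStripper().visit(ast.parse(source))
    return ast.unparse(tree)


before = '''
class Calculator:
    def __init__(self, name: str):
        """Initialize calculator."""
        self.name = name
        self._setup()

    def process(self, data):
        """Process data."""
        result = []
        for item in data:
            result.append(item * 2)
        return result
'''

print(python_skeleton(before))
```

Run on the Calculator example, it keeps the same signatures and docstrings as the After listing, though ast.unparse may insert extra blank lines.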

JavaScript/TypeScript Example

Before:

class UserService {
    constructor(baseUrl: string) {
        this.baseUrl = baseUrl;
        this.cache = {};
    }

    async getUser(id: string) {
        if (this.cache[id]) return this.cache[id];
        const user = await fetch(this.baseUrl + '/' + id);
        return user.json();
    }
}

After:

class UserService {
    constructor(baseUrl: string) {
    }

    async getUser(id: string) {
    }
}

Java Example

Before:

public class UserService {
    private String baseUrl;

    public UserService(String baseUrl) {
        this.baseUrl = baseUrl;
        this.validate();
    }

    public User getUserById(String id) {
        Database db = new Database();
        return db.query(id);
    }

    private void validate() {
        if (baseUrl == null) {
            throw new IllegalArgumentException("BaseUrl required");
        }
    }
}

After:

public class UserService {
    private String baseUrl;

    public UserService(String baseUrl) {
    }

    public User getUserById(String id) {
    }

    private void validate() {
    }
}

Supported Languages

| Language              | Features                                             |
|-----------------------|------------------------------------------------------|
| Python                | Functions, methods, __init__, @property, docstrings  |
| JavaScript/TypeScript | Functions, arrow functions, methods, async/await     |
| Java                  | Methods, constructors, static methods, annotations   |
| Kotlin                | Functions, methods, properties (getters/setters)     |
| Go                    | Functions, methods, closures                         |
| Rust                  | Functions, methods, closures                         |
| C/C++                 | Functions, methods, constructors                     |
| C#                    | Methods, properties (get/set), async/await           |
| Ruby                  | Methods, singleton methods, blocks                   |
| PHP                   | Functions, methods, closures                         |
| Swift                 | Functions, methods, closures                         |
| Lua                   | Functions, local functions                           |
| Scala                 | Functions, methods, closures                         |
| Groovy                | Functions, methods, closures                         |
| Objective-C           | Methods, instance/class methods                      |

What Gets Preserved

  • ✅ Function/method signatures
  • ✅ Parameter types and defaults
  • ✅ Return types
  • ✅ Class definitions
  • ✅ Import statements
  • ✅ Comments
  • ✅ Python docstrings
  • ✅ Decorators
  • ✅ Access modifiers (public, private, protected)

What Gets Removed

  • ❌ Function/method bodies
  • ❌ Local variable assignments
  • ❌ Logic and implementation details
  • ❌ Nested function implementations

Known Limitations

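  • Concise arrow functions (const f = x => x * 2) are left unchanged: their body is a single expression rather than a block, so there is nothing to remove
  • Python lambdas are left unchanged for the same reason
  • Some getter/setter edge cases in JavaScript/TypeScript may not be handled correctly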

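The lambda limitation follows from the grammar: a def statement owns a block of statements, while a lambda owns a single expression, so there is no block for a body query to capture. Python's own ast module shows the difference (tree-sitter's Python grammar makes the same distinction, with different node names):

```python
import ast

# A def statement: its body is a list of statements that can be dropped.
func = ast.parse("def double(x):\n    return x * 2").body[0]
print(type(func).__name__, [type(stmt).__name__ for stmt in func.body])
# FunctionDef ['Return']

# A lambda: its body is a single expression, so there is no block to remove.
lam = ast.parse("double = lambda x: x * 2").body[0].value
print(type(lam).__name__, type(lam.body).__name__)
# Lambda BinOp
```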
API Reference

Skeleton Extraction

extract(source_code: str, language: str) -> str

Extract skeleton from source code.

Example:

from loppers import extract

code = "def hello(): print('hi')"
skeleton = extract(code, "python")  # Returns: "def hello():"

SkeletonExtractor(language: str)

Create a language-specific extractor for reuse.

Example:

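from loppers import SkeletonExtractor

# Build the extractor once and reuse it for several Python sources
extractor = SkeletonExtractor("python")

code1 = "def add(a, b):\n    return a + b"
code2 = "def greet(name):\n    print(f'Hello, {name}')"

skeleton1 = extractor.extract(code1)
skeleton2 = extractor.extract(code2)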

File Operations

concatenate_files(file_paths, recursive=False, verbose=False, extract_skeletons=True) -> str

Concatenate files with optional skeleton extraction.

Args:

  • file_paths - List of file and/or directory paths
  • recursive - Recursively traverse directories (default: False)
  • verbose - Print progress to stderr (default: False)
  • extract_skeletons - Extract code skeletons (default: True)

Returns: Concatenated content with --- headers separating files

Example:

from loppers import concatenate_files

result = concatenate_files(
    ["src/", "tests/"],
    recursive=True,
    extract_skeletons=True,
)
# Automatically skips binary files
# Extracts skeletons for supported languages
# Includes original content for unsupported types
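The behaviour described above (skip binaries, extract skeletons where a language is known, pass other text through unchanged) can be sketched with the standard library. Everything here is hypothetical: the "--- <path>" header layout, the skeletonize callback standing in for extract, and the small extension table are assumptions, and the actual output format of concatenate_files may differ.

```python
from __future__ import annotations

from pathlib import Path
from typing import Callable

# Hypothetical excerpt of an extension-to-language table.
EXTENSIONS = {".py": "python", ".js": "javascript", ".java": "java"}


def concatenate(
    paths: list[Path],
    skeletonize: Callable[[str, str], str],
    extract_skeletons: bool = True,
) -> str:
    """Join files under '--- <path>' headers (an assumed layout)."""
    parts = []
    for path in paths:
        data = path.read_bytes()
        if b"\x00" in data[:1024]:  # crude binary check; loppers uses binaryornot
            continue
        text = data.decode("utf-8", errors="replace")
        language = EXTENSIONS.get(path.suffix)
        if extract_skeletons and language is not None:
            text = skeletonize(text, language)  # e.g. loppers.extract
        # Unsupported types (and every file when extraction is off) pass through.
        parts.append(f"--- {path}\n{text}")
    return "\n".join(parts)
```

Since extract(source_code, language) has the same shape as the callback, the real function could be passed directly as skeletonize.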

collect_files(paths, recursive=False, verbose=False, include_all_text_files=True) -> list[Path]

Collect text files from paths, excluding binary files.

Returns: Sorted list of file paths
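A rough standard-library sketch of this collection step follows. It assumes that a non-recursive call only inspects a directory's immediate children and that duplicates are removed before sorting. collect_text_files and its NUL-byte check are illustrative stand-ins (loppers uses binaryornot), and the include_all_text_files option is not modeled.

```python
from __future__ import annotations

from pathlib import Path


def _looks_binary(path: Path, chunk_size: int = 1024) -> bool:
    # Crude heuristic: a NUL byte in the first chunk marks the file as binary.
    with path.open("rb") as fh:
        return b"\x00" in fh.read(chunk_size)


def collect_text_files(paths: list[str | Path], recursive: bool = False) -> list[Path]:
    """Hypothetical helper: gather files from paths, drop binaries, sort."""
    found: set[Path] = set()
    for raw in paths:
        path = Path(raw)
        if path.is_dir():
            # rglob walks the whole tree; iterdir only lists direct children.
            candidates = path.rglob("*") if recursive else path.iterdir()
            found.update(p for p in candidates if p.is_file())
        elif path.is_file():
            found.add(path)
    return sorted(p for p in found if not _looks_binary(p))
```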

is_binary_file(file_path: Path) -> bool

Detect if a file is binary.

Uses the binaryornot library which checks:

  • Known binary file extensions
  • Null bytes in content
  • UTF-8 decoding success
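To make the idea concrete, here is a simplified heuristic in the same spirit: treat data as binary if its first kilobyte contains a NUL byte or is not valid UTF-8. looks_binary is a hypothetical helper, not binaryornot's algorithm, which applies its own additional heuristics.

```python
import codecs


def looks_binary(data: bytes, sample_size: int = 1024) -> bool:
    """Illustrative binary check on raw bytes (not binaryornot's logic)."""
    sample = data[:sample_size]
    if b"\x00" in sample:
        return True
    # An incremental decoder with final=False tolerates a multi-byte character
    # cut off at the sample boundary instead of reporting it as invalid.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(sample, final=False)
    except UnicodeDecodeError:
        return True
    return False
```

This strict check would also flag UTF-16 or Latin-1 text as binary, which is the kind of case a more forgiving library handles better.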

Example:

from pathlib import Path

from loppers import is_binary_file

path = Path("image.jpg")
if not is_binary_file(path):
    text = path.read_text(encoding="utf-8")  # safe to treat as text

Utility Functions

get_language(extension: str) -> str | None

Get language identifier from file extension.

Example:

from loppers import get_language

get_language(".py")   # Returns: "python"
get_language(".java") # Returns: "java"
get_language(".txt")  # Returns: None
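Conceptually the lookup is a dictionary keyed by extension. The sketch below adds a path-based convenience on top of such a table; both the table excerpt and the case-insensitive matching are assumptions and may not match EXTENSION_TO_LANGUAGE in mapping.py.

```python
from __future__ import annotations

from pathlib import Path

# Hypothetical excerpt of an extension-to-language table.
EXTENSION_TO_LANGUAGE = {
    ".py": "python",
    ".java": "java",
    ".kt": "kotlin",
    ".rs": "rust",
}


def language_for_path(path: str | Path) -> str | None:
    """Return the language for a file path, or None if unsupported."""
    # Path.suffix is "" for files without an extension, which maps to None.
    return EXTENSION_TO_LANGUAGE.get(Path(path).suffix.lower())
```

With the documented API, the equivalent chain would be get_language(Path(p).suffix) followed by extract(source, language) whenever the result is not None.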

How It Works

Loppers parses source code with tree-sitter into syntax trees, then runs a language-specific tree-sitter query to locate function and method bodies and removes them while preserving:

  • Function/method signatures
  • Class and interface definitions
  • Import statements
  • Python docstrings
  • Comments
  • Decorators
  • Type hints

Language-Specific Queries:

Each language has a custom tree-sitter query pattern:

# Python: Remove function body but keep docstrings
"(function_definition body: (block) @body)"

# JavaScript: Handle multiple function types
"[(function_declaration body: ...) (arrow_function body: ...) ...]"

# Kotlin: Extract functions and property getters/setters
"[(function_declaration (function_body) @body) (getter ...) (setter ...)]"
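Once a query has captured the body nodes, removing them amounts to splicing byte ranges out of the source. The sketch below assumes the captures have already been reduced to (start_byte, end_byte) pairs (tree-sitter nodes expose start_byte and end_byte); splice_out is a hypothetical helper, and how loppers itself treats docstrings, braces, and whitespace is not modeled. Ranges nested inside another captured range, such as a closure inside a method body, are skipped because removing the outer body already removes them.

```python
def splice_out(
    source: bytes,
    ranges: list[tuple[int, int]],
    replacement: bytes = b"",
) -> bytes:
    """Replace each non-nested [start, end) byte range with `replacement`."""
    # Sort by start; on ties the longer (outer) range comes first.
    ordered = sorted(ranges, key=lambda r: (r[0], -r[1]))
    kept: list[tuple[int, int]] = []
    for start, end in ordered:
        if kept and start < kept[-1][1]:
            continue  # nested in a range already being removed
        kept.append((start, end))
    out = bytearray()
    cursor = 0
    for start, end in kept:
        out += source[cursor:start]
        out += replacement
        cursor = end
    out += source[cursor:]
    return bytes(out)


src = b"void f() { a(); } void g() { b(); }"
f_body = (src.index(b"{ a"), src.index(b"}") + 1)
g_body = (src.index(b"{ b"), len(src))
nested = (src.index(b"a();"), src.index(b"a();") + 4)
print(splice_out(src, [g_body, nested, f_body], replacement=b"{}"))
# b'void f() {} void g() {}'
```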

Development

Setup

# Install with dev dependencies
uv sync --extra dev

Running Tests

# All tests
uv run pytest

# Verbose output
uv run pytest -v

# With coverage
uv run pytest --cov=loppers --cov-report=html

# Specific test
uv run pytest tests/test_loppers.py::TestSkeletonExtractor::test_python_extraction

Code Quality

# Check and fix
uv run ruff check . --fix

# Format
uv run ruff format .

# All checks at once
uv run ruff check . --fix && uv run ruff format .

Publishing to PyPI

This project uses Python Semantic Release with conventional commits.

Commit message format:

  • feat: - New feature (bumps minor version)
  • fix: - Bug fix (bumps patch version)
  • BREAKING CHANGE: - Major version bump
  • docs:, chore:, refactor: - No version bump
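The mapping from commit types to version bumps can be illustrated with a small function. This is a simplification for explanation only and not python-semantic-release's parser: the Conventional Commits specification also allows a "!" after the type to mark a breaking change, which this sketch ignores, and release-tool configuration can change how bumps behave (notably for 0.x versions). bump_level and next_version are hypothetical names.

```python
from __future__ import annotations

import re

# Matches "type:" or "type(scope):" at the start of a commit message.
_HEADER = re.compile(r"^(?P<type>\w+)(?:\([^)]*\))?:")


def bump_level(message: str) -> str | None:
    """Classify one commit message as 'major', 'minor', 'patch', or None."""
    if "BREAKING CHANGE:" in message:
        return "major"
    match = _HEADER.match(message)
    if match is None:
        return None
    return {"feat": "minor", "fix": "patch"}.get(match["type"])


def next_version(current: str, messages: list[str]) -> str:
    """Apply the highest bump found among `messages` to a MAJOR.MINOR.PATCH string."""
    major, minor, patch = (int(part) for part in current.split("."))
    levels = {bump_level(m) for m in messages}
    if "major" in levels:
        return f"{major + 1}.0.0"
    if "minor" in levels:
        return f"{major}.{minor + 1}.0"
    if "patch" in levels:
        return f"{major}.{minor}.{patch + 1}"
    return current  # docs:, chore:, refactor: and friends
```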

Automated release:

# Push to main with conventional commit messages
# GitHub Actions will automatically:
# 1. Analyze commits
# 2. Bump version
# 3. Build distributions
# 4. Publish to PyPI

Manual publishing:

# Build locally
uv run python -m build

# View distributions
ls -lh dist/

Adding New Languages

To add support for a new language:

  1. Find the tree-sitter query - Use the tree-sitter playground to develop a query that captures function bodies

  2. Add to LANGUAGE_CONFIGS in src/loppers/loppers.py:

    LANGUAGE_CONFIGS["mylang"] = LanguageConfig(
        name="mylang",
        body_query="(function_definition body: (block) @body)",
    )
    
  3. Add file extensions to src/loppers/mapping.py:

    EXTENSION_TO_LANGUAGE = {
        ".ml": "mylang",
        ".mli": "mylang",
    }
    
  4. Write a test in tests/test_loppers.py:

    def test_mylang_extraction(self):
        code = "fun hello() { print('hi') }"
        skeleton = extract(code, "mylang")
        self.assertIn("fun hello()", skeleton)
        self.assertNotIn("print", skeleton)
    
  5. Add example file in examples/sample.ml showcasing language features
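Because a new language touches two tables (steps 2 and 3), a consistency test that every mapped extension points at a configured language can catch typos early. The sketch below uses stand-in definitions so it runs on its own; in the project it would import LANGUAGE_CONFIGS and EXTENSION_TO_LANGUAGE instead, and the LanguageConfig stand-in has only the two fields that appear in step 2. The tests are written as plain pytest functions rather than in the unittest style of step 4.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LanguageConfig:  # stand-in with only the fields shown in step 2
    name: str
    body_query: str


LANGUAGE_CONFIGS = {
    "mylang": LanguageConfig(
        name="mylang",
        body_query="(function_definition body: (block) @body)",
    ),
}

EXTENSION_TO_LANGUAGE = {".ml": "mylang", ".mli": "mylang"}


def test_every_extension_has_a_config() -> None:
    missing = {
        ext: lang
        for ext, lang in EXTENSION_TO_LANGUAGE.items()
        if lang not in LANGUAGE_CONFIGS
    }
    assert not missing, f"extensions mapped to unknown languages: {missing}"


def test_config_names_match_keys() -> None:
    for key, config in LANGUAGE_CONFIGS.items():
        assert config.name == key, f"{key!r} has name {config.name!r}"
```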

Project Structure

loppers/
├── src/loppers/
│   ├── __init__.py              # Public API exports
│   ├── loppers.py               # Core extraction logic
│   ├── concatenator.py          # File concatenation
│   ├── mapping.py               # Language mapping
│   └── cli.py                   # Command-line interface
├── tests/
│   └── test_loppers.py          # Unit tests (24 tests)
├── examples/
│   ├── sample.py                # Python examples
│   ├── sample.kt                # Kotlin examples
│   └── ...                      # Other language samples
├── pyproject.toml               # Project configuration
├── README.md                    # Main documentation (this file)
├── CHANGELOG.md                 # Release history
└── CLAUDE.md                    # Claude Code development guide

Dependencies

Runtime:

  • tree-sitter>=0.25.0 - AST parsing library
  • tree-sitter-language-pack>=0.10.0 - Language grammars
  • binaryornot>=0.4.4 - Binary file detection

Development:

  • pytest>=7.0.0 - Testing framework
  • ruff>=0.1.0 - Linting and formatting
  • python-semantic-release>=8.0.0 - Release automation

License

MIT - See LICENSE file for details

Download files

Source Distribution

loppers-1.3.0.tar.gz (94.9 kB)

Uploaded Source

Built Distribution

loppers-1.3.0-py3-none-any.whl (11.8 kB)

Uploaded Python 3

File details

Details for the file loppers-1.3.0.tar.gz.

File metadata

  • Download URL: loppers-1.3.0.tar.gz
  • Upload date:
  • Size: 94.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for loppers-1.3.0.tar.gz
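| Algorithm   | Hash digest                                                      |
|-------------|------------------------------------------------------------------|
| SHA256      | 2c61557df26b7ec428d211745d899061da91d408b2bb955fe97c0fa6176f21c8 |
| MD5         | 67a968f78209f006a2a020a6ae7a071a                                 |
| BLAKE2b-256 | cbf0532bdeec74969a946f2fa320c6ee9a91063323042a3144ff2ab79f7c1ec7 |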

Provenance

The following attestation bundles were made for loppers-1.3.0.tar.gz:

Publisher: publish.yml on undo76/loppers

File details

Details for the file loppers-1.3.0-py3-none-any.whl.

File metadata

  • Download URL: loppers-1.3.0-py3-none-any.whl
  • Upload date:
  • Size: 11.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for loppers-1.3.0-py3-none-any.whl
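| Algorithm   | Hash digest                                                      |
|-------------|------------------------------------------------------------------|
| SHA256      | f8b3819014143adb205c71e4f529c03913f6a423853cc23babffa968f8452e9b |
| MD5         | 1cf119735225687fb7075a46e1f74cbb                                 |
| BLAKE2b-256 | 045ddcc625a49cf1a3677a25765f236f91d7427068956ba4a57ab7655f872c2e |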

Provenance

The following attestation bundles were made for loppers-1.3.0-py3-none-any.whl:

Publisher: publish.yml on undo76/loppers
