Loppers

Extract source file skeletons using tree-sitter queries.

Removes function implementations while preserving structure, signatures, and docstrings. Supports 17 programming languages with a simple, fully-typed Python API.

Requires: tree-sitter >= 0.25

Features

  • 17 Languages - Python, JS/TS, Java, Kotlin, Go, Rust, C/C++, C#, Ruby, PHP, Swift, Lua, Scala, Groovy, Objective-C
  • Smart Extraction - Functions, methods, constructors, arrow functions, getters/setters
  • Preserved Elements - Signatures, class definitions, imports, docstrings, decorators
  • File Operations - Concatenate files/directories with binary detection
  • Fully Typed - Complete type hints throughout
  • CLI & Library - Use as command-line tool or Python library

Quick Start

Installation

# With uv (recommended)
uv pip install loppers

# With pip
pip install loppers

Extract Code Skeleton (Python API)

from loppers import extract

source = '''
def calculate(x: int, y: int) -> int:
    """Calculate sum."""
    result = x + y
    return result
'''

skeleton = extract(source, "python")
print(skeleton)

Output:

def calculate(x: int, y: int) -> int:
    """Calculate sum."""

Concatenate Files (Python API)

from loppers import concatenate_files

# Combine all text files with skeleton extraction
result = concatenate_files(
    ["src/", "tests/"],
    recursive=True,
    extract_skeletons=True,
    verbose=True,
)
print(result)

Command Line Usage

# Extract single file
loppers myfile.py

# Process directory recursively
loppers -r src/ -o skeletons.txt

# Include original files (no extraction)
loppers -r src/ --no-extract

# Show progress
loppers -v -r .

Common CLI Examples:

# Multiple files
loppers file1.py file2.js file3.java

# Directory traversal
loppers src/                 # Non-recursive
loppers -r src/              # Recursive

# Mix files and directories
loppers -r src/ tests/ docs/

# Save to file
loppers -r . -o combined.txt

# Verbose output
loppers -v -r src/

Examples: Before and After

Python Example

Before:

class Calculator:
    def __init__(self, name: str):
        """Initialize calculator."""
        self.name = name
        self._setup()

    def process(self, data):
        """Process data."""
        result = []
        for item in data:
            result.append(item * 2)
        return result

After:

class Calculator:
    def __init__(self, name: str):
        """Initialize calculator."""

    def process(self, data):
        """Process data."""

JavaScript/TypeScript Example

Before:

class UserService {
    constructor(baseUrl: string) {
        this.baseUrl = baseUrl;
        this.cache = {};
    }

    async getUser(id: string) {
        if (this.cache[id]) return this.cache[id];
        const user = await fetch(this.baseUrl + '/' + id);
        return user.json();
    }
}

After:

class UserService {
    constructor(baseUrl: string) {
    }

    async getUser(id: string) {
    }
}

Java Example

Before:

public class UserService {
    private String baseUrl;

    public UserService(String baseUrl) {
        this.baseUrl = baseUrl;
        this.validate();
    }

    public User getUserById(String id) {
        Database db = new Database();
        return db.query(id);
    }

    private void validate() {
        if (baseUrl == null) {
            throw new IllegalArgumentException("BaseUrl required");
        }
    }
}

After:

public class UserService {
    private String baseUrl;

    public UserService(String baseUrl) {
    }

    public User getUserById(String id) {
    }

    private void validate() {
    }
}

Supported Languages

Language               Features
Python                 Functions, methods, __init__, @property, docstrings
JavaScript/TypeScript  Functions, arrow functions, methods, async/await
Java                   Methods, constructors, static methods, annotations
Kotlin                 Functions, methods, properties (getters/setters)
Go                     Functions, methods, closures
Rust                   Functions, methods, closures
C/C++                  Functions, methods, constructors
C#                     Methods, properties (get/set), async/await
Ruby                   Methods, singleton methods, blocks
PHP                    Functions, methods, closures
Swift                  Functions, methods, closures
Lua                    Functions, local functions
Scala                  Functions, methods, closures
Groovy                 Functions, methods, closures
Objective-C            Methods, instance/class methods

What Gets Preserved

  • ✅ Function/method signatures
  • ✅ Parameter types and defaults
  • ✅ Return types
  • ✅ Class definitions
  • ✅ Import statements
  • ✅ Comments
  • ✅ Python docstrings
  • ✅ Decorators
  • ✅ Access modifiers (public, private, protected)

What Gets Removed

  • ❌ Function/method bodies
  • ❌ Local variable assignments
  • ❌ Logic and implementation details
  • ❌ Nested function implementations

Known Limitations

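  • Concise arrow functions (const f = x => x * 2) - the body is a single expression rather than a block, so there is no body to remove
  • Python lambdas - likewise an expression body, so there is no body to remove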
  • Some edge cases with getters/setters in JavaScript/TypeScript

API Reference

Skeleton Extraction

extract(source_code: str, language: str) -> str

Extract skeleton from source code.

Example:

from loppers import extract

code = "def hello(): print('hi')"
skeleton = extract(code, "python")  # Returns: "def hello():"

SkeletonExtractor(language: str)

Create a language-specific extractor for reuse.

Example:

from loppers import SkeletonExtractor

code1 = "def first():\n    return 1\n"
code2 = "def second():\n    return 2\n"

extractor = SkeletonExtractor("python")
skeleton1 = extractor.extract(code1)
skeleton2 = extractor.extract(code2)

File Operations

concatenate_files(file_paths, recursive=False, verbose=False, extract_skeletons=True) -> str

Concatenate files with optional skeleton extraction.

Args:

  • file_paths - List of file and/or directory paths
  • recursive - Recursively traverse directories (default: False)
  • verbose - Print progress to stderr (default: False)
  • extract_skeletons - Extract code skeletons (default: True)

Returns: Concatenated content with --- headers separating files

Example:

from loppers import concatenate_files

result = concatenate_files(
    ["src/", "tests/"],
    recursive=True,
    extract_skeletons=True,
)
# Automatically skips binary files
# Extracts skeletons for supported languages
# Includes original content for unsupported types

collect_files(paths, recursive=False, verbose=False, include_all_text_files=True) -> list[Path]

Collect text files from paths, excluding binary files.

Returns: Sorted list of file paths
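
The traversal described above can be approximated with pathlib alone. This is a minimal sketch, not the library's implementation: the function name collect is hypothetical, and it leaves out the binary-file filtering and the verbose and include_all_text_files options.

```python
from pathlib import Path


def collect(paths: list[Path], recursive: bool = False) -> list[Path]:
    """Gather files from a mix of file and directory paths (hypothetical helper)."""
    found: set[Path] = set()
    for path in paths:
        if path.is_file():
            found.add(path)
        elif path.is_dir():
            # rglob descends into subdirectories; iterdir stays at the top level.
            children = path.rglob("*") if recursive else path.iterdir()
            found.update(child for child in children if child.is_file())
        # Paths that do not exist are silently skipped in this sketch.
    return sorted(found)
```

Using a set means a file passed both directly and through its parent directory appears only once in the result.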

is_binary_file(file_path: Path) -> bool

Detect if a file is binary.

Uses the binaryornot library, which checks:

  • Known binary file extensions
  • Null bytes in content
  • UTF-8 decoding success

Example:

from loppers import is_binary_file
from pathlib import Path

path = Path("image.jpg")
if not is_binary_file(path):
    text = path.read_text()  # Safe to treat as text
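
To make the three checks listed above concrete, here is a rough standard-library approximation. It is a sketch under assumptions, not binaryornot's actual algorithm: the name looks_binary, the extension set, and the 1024-byte sample size are all hypothetical.

```python
import codecs
from pathlib import Path

# Hypothetical extension list for illustration; binaryornot ships its own.
BINARY_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".pdf", ".zip", ".so", ".pyc"}


def looks_binary(path: Path, sample_size: int = 1024) -> bool:
    """Approximate the three checks: extension, null bytes, UTF-8 decoding."""
    # 1. Known binary file extension.
    if path.suffix.lower() in BINARY_EXTENSIONS:
        return True
    with path.open("rb") as handle:
        sample = handle.read(sample_size)
    # 2. Null bytes in the content.
    if b"\x00" in sample:
        return True
    # 3. UTF-8 decoding. The incremental decoder tolerates a multi-byte
    #    character that was cut off at the end of the sample.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(sample, final=False)
    except UnicodeDecodeError:
        return True
    return False
```

Reading only a fixed-size sample keeps the check cheap on large files, at the cost of missing binary content that appears later in the file.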

Utility Functions

get_language(extension: str) -> str | None

Get language identifier from file extension.

Example:

from loppers import get_language

get_language(".py")   # Returns: "python"
get_language(".java") # Returns: "java"
get_language(".txt")  # Returns: None
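
Because the lookup takes a dotted extension, a typical caller would pass Path.suffix. The sketch below groups paths by language using a stand-in lookup so that it runs without loppers installed. EXTENSION_TO_LANGUAGE here is a hypothetical three-entry subset, and lookup_language and group_by_language are illustrative names, not part of the library.

```python
from __future__ import annotations

from collections import defaultdict
from pathlib import Path

# Hypothetical subset of an extension table, for illustration only.
EXTENSION_TO_LANGUAGE = {".py": "python", ".java": "java", ".kt": "kotlin"}


def lookup_language(extension: str) -> str | None:
    """Stand-in for get_language: return None for unknown extensions."""
    return EXTENSION_TO_LANGUAGE.get(extension)


def group_by_language(paths: list[Path]) -> dict[str | None, list[Path]]:
    """Bucket paths by detected language; unsupported files land under None."""
    groups: dict[str | None, list[Path]] = defaultdict(list)
    for path in paths:
        groups[lookup_language(path.suffix)].append(path)
    return dict(groups)
```

Files in the None bucket are the ones a caller would pass through unchanged instead of extracting a skeleton.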

How It Works

Loppers parses source code into a syntax tree with tree-sitter, then runs a language-specific tree-sitter query to locate function and method bodies. Those bodies are removed while the following are preserved:

  • Function/method signatures
  • Class and interface definitions
  • Import statements
  • Python docstrings
  • Comments
  • Decorators
  • Type hints
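
The idea of locating bodies in a syntax tree and cutting them out of the source can be illustrated for Python alone with the standard ast module. This swaps in ast for tree-sitter purely so the sketch runs without extra dependencies. It is not how Loppers is implemented, and python_skeleton is a hypothetical name.

```python
import ast


def python_skeleton(source: str) -> str:
    """Remove function bodies by line range, keeping signatures and docstrings."""
    tree = ast.parse(source)
    drop = set()  # 1-based line numbers to remove
    for node in ast.walk(tree):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        body = node.body
        # One-line definitions share a line with the signature; leave them alone.
        if body[0].lineno == node.lineno:
            continue
        first = body[0]
        has_docstring = (
            isinstance(first, ast.Expr)
            and isinstance(first.value, ast.Constant)
            and isinstance(first.value.value, str)
        )
        removable = body[1:] if has_docstring else body
        if not removable:
            continue
        # Start at the first statement after the docstring, including any
        # decorators on a nested definition, and cut to the end of the function.
        stmt = removable[0]
        decorators = getattr(stmt, "decorator_list", [])
        start = min([stmt.lineno] + [d.lineno for d in decorators])
        drop.update(range(start, node.end_lineno + 1))
    lines = source.splitlines()
    return "\n".join(
        line for number, line in enumerate(lines, start=1) if number not in drop
    )


SAMPLE = '''\
def calculate(x: int, y: int) -> int:
    """Calculate sum."""
    result = x + y
    return result
'''
print(python_skeleton(SAMPLE))
```

On the quick-start sample this keeps only the signature line and the docstring line. Working on line ranges is a simplification; a query-based approach operates on the exact byte span of each captured body node.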

Language-Specific Queries:

Each language has a custom tree-sitter query pattern:

# Python: Remove function body but keep docstrings
"(function_definition body: (block) @body)"

# JavaScript: Handle multiple function types
"[(function_declaration body: ...) (arrow_function body: ...) ...]"

# Kotlin: Extract functions and property getters/setters
"[(function_declaration (function_body) @body) (getter ...) (setter ...)]"

Development

Setup

# Install with dev dependencies
uv sync --extra dev

Running Tests

# All tests
uv run pytest

# Verbose output
uv run pytest -v

# With coverage
uv run pytest --cov=loppers --cov-report=html

# Specific test
uv run pytest tests/test_loppers.py::TestSkeletonExtractor::test_python_extraction

Code Quality

# Check and fix
uv run ruff check . --fix

# Format
uv run ruff format .

# All checks at once
uv run ruff check . --fix && uv run ruff format .

Publishing to PyPI

This project uses Python Semantic Release with conventional commits.

Commit message format:

  • feat: - New feature (bumps minor version)
  • fix: - Bug fix (bumps patch version)
  • BREAKING CHANGE: - Major version bump
  • docs:, chore:, refactor: - No version bump
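
As an illustration of how these rules combine across several commits, the sketch below classifies each message and takes the largest bump. It covers only the four rules listed above plus an optional scope such as fix(cli):, and it is a simplification rather than Python Semantic Release's actual parser. The names bump_for and release_bump are hypothetical.

```python
import re

# Ordered from smallest to largest bump.
LEVELS = ["none", "patch", "minor", "major"]


def bump_for(message: str) -> str:
    """Classify a single commit message."""
    if "BREAKING CHANGE:" in message:
        return "major"
    # Match a leading type with an optional scope, e.g. "fix:" or "fix(cli):".
    match = re.match(r"(\w+)(\([^)]*\))?:", message)
    kind = match.group(1) if match else ""
    return {"feat": "minor", "fix": "patch"}.get(kind, "none")


def release_bump(messages: list[str]) -> str:
    """The largest bump requested by any commit wins."""
    return max((bump_for(m) for m in messages), key=LEVELS.index, default="none")
```

For example, a push containing one docs: commit, one fix: commit, and one feat: commit resolves to a minor bump.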

Automated release:

# Push to main with conventional commit messages
# GitHub Actions will automatically:
# 1. Analyze commits
# 2. Bump version
# 3. Build distributions
# 4. Publish to PyPI

Manual publishing:

# Build locally
uv run python -m build

# View distributions
ls -lh dist/

Adding New Languages

To add support for a new language:

  1. Find the tree-sitter query - Use the tree-sitter playground to develop a query that captures function bodies

  2. Add to LANGUAGE_CONFIGS in src/loppers/loppers.py:

    LANGUAGE_CONFIGS["mylang"] = LanguageConfig(
        name="mylang",
        body_query="(function_definition body: (block) @body)",
    )
    
  3. Add file extensions to src/loppers/mapping.py:

    EXTENSION_TO_LANGUAGE = {
        ".ml": "mylang",
        ".mli": "mylang",
    }
    
  4. Write a test in tests/test_loppers.py:

    def test_mylang_extraction(self):
        code = "fun hello() { print('hi') }"
        skeleton = extract(code, "mylang")
        self.assertIn("fun hello()", skeleton)
        self.assertNotIn("print", skeleton)
    
  5. Add example file in examples/sample.ml showcasing language features

Project Structure

loppers/
├── src/loppers/
│   ├── __init__.py              # Public API exports
│   ├── loppers.py               # Core extraction logic
│   ├── concatenator.py          # File concatenation
│   ├── mapping.py               # Language mapping
│   └── cli.py                   # Command-line interface
├── tests/
│   └── test_loppers.py          # Unit tests (24 tests)
├── examples/
│   ├── sample.py                # Python examples
│   ├── sample.kt                # Kotlin examples
│   └── ...                       # Other language samples
├── pyproject.toml               # Project configuration
├── README.md                    # Main documentation (this file)
├── CHANGELOG.md                 # Release history
└── CLAUDE.md                    # Claude Code development guide

Dependencies

Runtime:

  • tree-sitter>=0.25.0 - AST parsing library
  • tree-sitter-language-pack>=0.10.0 - Language grammars
  • binaryornot>=0.4.4 - Binary file detection

Development:

  • pytest>=7.0.0 - Testing framework
  • ruff>=0.1.0 - Linting and formatting
  • python-semantic-release>=8.0.0 - Release automation

License

MIT - See LICENSE file for details
