Loppers

Extract source file skeletons using tree-sitter queries.

Removes function implementations while preserving structure, signatures, and docstrings. Supports 17 programming languages through a fully typed Python API and a CLI.

Requires: tree-sitter >= 0.25

Features

  • 17 Languages - Python, JS/TS, Java, Kotlin, Go, Rust, C/C++, C#, Ruby, PHP, Swift, Lua, Scala, Groovy, Objective-C
  • Smart Extraction - Functions, methods, constructors, arrow functions, getters/setters
  • Preserved Elements - Signatures, class definitions, imports, docstrings, decorators
  • All File Types - Processes any non-binary text file (code, markdown, JSON, YAML, etc.)
  • Binary Detection - Automatically skips binary files
  • Ignore Patterns - Built-in + custom .gitignore support
  • Fully Typed - Complete type hints throughout
  • CLI & Library - Use as command-line tool or Python library

Quick Start

Installation

# With uv (recommended)
uv pip install loppers

# With pip
pip install loppers

Python API

The public API consists of 5 core functions:

1. extract_skeleton(source: str, language: str) -> str

Extract skeleton from source code by language identifier.

from loppers import extract_skeleton

code = """
def calculate(x: int, y: int) -> int:
    '''Calculate sum.'''
    result = x + y
    return result
"""

skeleton = extract_skeleton(code, "python")
print(skeleton)

Output:

def calculate(x: int, y: int) -> int:
    '''Calculate sum.'''

2. get_skeleton(file_path: Path | str, *, add_header: bool = False) -> str

Extract skeleton from a file by auto-detecting language from extension.

from loppers import get_skeleton

skeleton = get_skeleton("src/main.py")
print(skeleton)

# With header showing file path
skeleton = get_skeleton("src/main.py", add_header=True)
# Output: "--- /path/to/src/main.py\n..."

Raises:

  • FileNotFoundError - If file doesn't exist
  • ValueError - If file language is unsupported

3. find_files(root: str | Path, *, recursive: bool = True, ignore_patterns: Sequence[str] | None = None, use_default_ignore: bool = True, respect_gitignore: bool = True) -> list[str]

Collect all non-binary text files from a root directory.

from loppers import find_files

# Find all text files in src/ recursively (default)
files = find_files("src/")

# Returns file paths relative to root:
# ['main.py', 'utils.py', 'config.yaml', 'README.md']

# Non-recursive
files = find_files("src/", recursive=False)

# Custom ignore patterns (gitignore syntax)
files = find_files(
    "src/",
    ignore_patterns=["*.test.py", "venv/"],
    use_default_ignore=True,  # Still applies built-in patterns
    respect_gitignore=True,   # Still respects .gitignore
)

Features:

  • Takes single root directory (not multiple paths)
  • Returns file paths relative to root
  • Automatically excludes binary files (images, archives, etc.)
  • Respects .gitignore by default
  • Supports custom gitignore-style ignore patterns
  • Built-in patterns exclude node_modules, .git, __pycache__, build artifacts, etc.
  • Works with ALL non-binary text files (code, markdown, JSON, YAML, etc.)
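
The discovery rules listed above can be approximated with the standard library alone. The sketch below is not how Loppers implements them (its dependency list names binaryornot and pathspec for these jobs). The names looks_binary and list_text_files, the NUL-byte heuristic, and the fnmatch-based matching are all assumptions chosen for illustration, and fnmatch covers only a small part of gitignore syntax.

```python
import fnmatch
import os
from pathlib import Path
from typing import Sequence


def looks_binary(path: Path, sample_size: int = 1024) -> bool:
    """Heuristic: treat a file as binary if its first bytes contain a NUL."""
    with path.open("rb") as handle:
        return b"\x00" in handle.read(sample_size)


def list_text_files(root: str, ignore_globs: Sequence[str] = ()) -> list:
    """Return sorted paths of non-binary files under root, relative to root."""
    root_path = Path(root)
    found = []
    for directory, _subdirs, filenames in os.walk(root_path):
        for name in filenames:
            path = Path(directory) / name
            relative = path.relative_to(root_path).as_posix()
            # Simplified matching: test each glob against the file name
            # and against the path relative to root.
            if any(
                fnmatch.fnmatch(name, glob) or fnmatch.fnmatch(relative, glob)
                for glob in ignore_globs
            ):
                continue
            if looks_binary(path):
                continue
            found.append(relative)
    return sorted(found)
```

Returning paths relative to the root, as find_files does, keeps the result independent of where the project is checked out.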

4. get_tree(root: str | Path, *, recursive: bool = True, ignore_patterns: Sequence[str] | None = None, use_default_ignore: bool = True, respect_gitignore: bool = True, collapse_single_dirs: bool = False, show_sizes: bool = False) -> str

Display formatted directory tree from a root directory.

from loppers import get_tree

# Display tree of src/ directory recursively
tree = get_tree("src/")
print(tree)

# Non-recursive tree
tree = get_tree("src/", recursive=False)

# With custom ignore patterns
tree = get_tree("src/", ignore_patterns=["*.test.py"])

# Collapse deep single-child directories (useful for Java packages)
tree = get_tree("src/", collapse_single_dirs=True)

# Show file sizes in human-friendly format
tree = get_tree("src/", show_sizes=True)

# Combine multiple options
tree = get_tree("src/", collapse_single_dirs=True, show_sizes=True)

Output (with collapse_single_dirs=True and show_sizes=True):

.
├─ main/java/com/example
│  ├─ Source.java  (2.3KB)
│  └─ Util.java  (1.8KB)
└─ resources/config.yaml  (512B)
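
Size labels such as "512B" and "2.3KB" in the output above could be produced by a helper like the one below. This is a sketch only: human_size is a hypothetical name, and the base of 1024 and the single decimal place are assumptions inferred from the sample output, not confirmed Loppers behaviour.

```python
def human_size(num_bytes: int) -> str:
    """Format a byte count as a short label such as '512B' or '2.3KB'."""
    if num_bytes < 1024:
        return f"{num_bytes}B"
    size = float(num_bytes)
    for unit in ("KB", "MB", "GB"):
        size /= 1024  # assumption: binary (1024-based) units
        if size < 1024:
            return f"{size:.1f}{unit}"
    # Anything larger than the GB range is reported in TB.
    return f"{size / 1024:.1f}TB"
```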

5. concatenate_files(root: str | Path, file_paths: Sequence[str | Path], *, extract: bool = True, ignore_not_found: bool = False) -> str

Concatenate files with optional skeleton extraction. Useful when you already have a list of file paths that you want to combine.

from loppers import concatenate_files, find_files

# Get list of files to concatenate
root = "src/"
files = find_files(root)

# Concatenate with skeleton extraction (default)
result = concatenate_files(root, files)
print(result)

# Concatenate without extraction (include original content)
result = concatenate_files(root, files, extract=False)

# Ignore files that don't exist or can't be processed
result = concatenate_files(root, files, ignore_not_found=True)

Output format:

--- path/to/file.py
def calculate(x: int) -> int:
    '''Calculate.'''

--- path/to/other.js
function process() {
}

Features:

  • Files are concatenated with headers showing their relative paths
  • Each file separated by newlines
  • Automatically extracts skeletons from code files (unless extract=False)
  • Falls back to original content for unsupported file types
  • Can optionally ignore files that don't exist or can't be processed

Parameters:

  • root - Root directory (file paths are relative to this)
  • file_paths - List of file paths (relative to root) to concatenate
  • extract - Extract skeletons from code files (default True)
  • ignore_not_found - Ignore files that cannot be found or processed (default False)

Raises:

  • FileNotFoundError - If root doesn't exist or a file is not found (when ignore_not_found=False)
  • ValueError - If no file paths provided or no files could be processed
  • NotADirectoryError - If root is not a directory
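
To make the output format concrete, here is a minimal stdlib-only sketch of joining files under "--- path" headers. It does not call Loppers: join_with_headers, keep_original, and the transform parameter are hypothetical names, and the single blank line between sections is an assumption about the exact separator.

```python
from pathlib import Path
from typing import Callable, Sequence


def keep_original(relative_path: str, text: str) -> str:
    """Default transform: return the file content unchanged."""
    return text


def join_with_headers(
    root: str,
    relative_paths: Sequence[str],
    transform: Callable[[str, str], str] = keep_original,
) -> str:
    """Join files under root, each prefixed with a '--- <relative path>' header."""
    sections = []
    for relative in relative_paths:
        text = (Path(root) / relative).read_text(encoding="utf-8")
        body = transform(relative, text).rstrip("\n")
        sections.append(f"--- {relative}\n{body}")
    # One blank line between sections (an assumption about the separator).
    return "\n\n".join(sections)
```

The transform hook stands in for skeleton extraction: a caller could pass a function that strips bodies from supported files and returns other files unchanged.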

Utility Function

get_language(extension: str) -> str | None - Get language identifier from file extension.

from loppers import get_language

get_language(".py")    # "python"
get_language(".js")    # "javascript"
get_language(".json")  # None (no extraction for data files)
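
The lookup shown above amounts to a dictionary keyed by file extension, where a missing key means "no extraction". The sketch below illustrates that idea with a hypothetical two-entry table. EXAMPLE_EXTENSIONS, language_for_path, and plan_for are illustrative names, and lower-casing the suffix is an assumption rather than documented behaviour.

```python
from pathlib import Path
from typing import Optional

# Hypothetical two-entry table; the real mapping covers many more extensions.
EXAMPLE_EXTENSIONS = {".py": "python", ".js": "javascript"}


def language_for_path(path: str) -> Optional[str]:
    """Return a language identifier for path, or None if the extension is unknown."""
    # Lower-casing the suffix is an assumption, not documented behaviour.
    return EXAMPLE_EXTENSIONS.get(Path(path).suffix.lower())


def plan_for(path: str) -> str:
    """Describe what a caller might do with the file."""
    language = language_for_path(path)
    return f"extract as {language}" if language else "keep original content"
```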

Command-Line Interface

Loppers provides 4 subcommands for common tasks.

Basic Usage

loppers --version
loppers --help

1. extract - Extract skeleton from file or stdin

Extract a single file's skeleton:

# From file
loppers extract file.py
loppers extract file.py -o skeleton.py

# From stdin with explicit language
echo 'def foo(): pass' | loppers extract -l python

# Verbose output
loppers extract file.py -v

Options:

  • FILE - File to extract (omit for stdin)
  • -l, --language - Language identifier (auto-detected from extension if FILE provided, required for stdin)
  • -o, --output - Output file (default: stdout)
  • -v, --verbose - Print status to stderr

2. concatenate - Concatenate files with optional skeleton extraction

Process root directory with automatic skeleton extraction:

# Recursive (default)
loppers concatenate src/

# Non-recursive
loppers concatenate --no-recursive src/

# Save to file
loppers concatenate src/ -o combined.txt

# Verbose with progress
loppers concatenate -v src/

# Include original files without extraction
loppers concatenate --no-extract src/

# Custom ignore patterns
loppers concatenate -I "*.test.py" -I "venv/" src/

# Disable default ignores
loppers concatenate --no-default-ignore src/

# Don't respect .gitignore
loppers concatenate --no-gitignore src/

Features:

  • Processes a single root directory (paths relative to root)
  • Includes ALL non-binary text files (code, markdown, JSON, YAML, etc.)
  • Automatically extracts skeletons for supported code files
  • Includes original content for unsupported file types (graceful degradation)
  • Each file prefixed with --- filepath header (relative path)
  • Verbose mode shows extraction status for each file

Options:

  • root - Root directory to process (required)
  • -o, --output - Output file (default: stdout)
  • --no-extract - Include original files without extraction
  • -I, --ignore-pattern - Add custom ignore pattern (gitignore syntax, can be used multiple times)
  • --no-default-ignore - Disable built-in ignore patterns
  • --no-gitignore - Don't respect .gitignore
  • --no-recursive - Don't recursively traverse directories
  • -v, --verbose - Print status to stderr
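
For readers scripting around these options, the flag semantics (a repeatable -I and --no-* switches that turn defaults off) can be modelled with argparse. This is a sketch of the option shapes only, not the Loppers CLI source; build_parser and the program name are hypothetical, and only a subset of the options is included.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors a subset of the options listed above."""
    parser = argparse.ArgumentParser(prog="concatenate-sketch")
    parser.add_argument("root", help="Root directory to process")
    parser.add_argument("-o", "--output", default=None, help="Output file (default: stdout)")
    # action="append" collects every occurrence of -I into one list.
    parser.add_argument("-I", "--ignore-pattern", action="append", default=[])
    # Each --no-* flag switches a default-on behaviour off.
    parser.add_argument("--no-extract", dest="extract", action="store_false")
    parser.add_argument("--no-recursive", dest="recursive", action="store_false")
    parser.add_argument("-v", "--verbose", action="store_true")
    return parser


# Parse the same arguments as one of the command lines shown earlier.
args = build_parser().parse_args(
    ["-I", "*.test.py", "-I", "venv/", "--no-recursive", "src/"]
)
```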

3. tree - Show directory tree of discovered files

Display a formatted tree of all discovered files:

# Recursive tree (default)
loppers tree src/

# Non-recursive
loppers tree --no-recursive src/

# Save tree to file
loppers tree src/ -o tree.txt

# With ignore patterns
loppers tree -I "*.test.py" src/

# Collapse deep single-child directories (useful for Java packages)
loppers tree --collapse-single-dirs src/

# Show file sizes in human-friendly format
loppers tree --show-sizes src/

# Combine multiple options
loppers tree --collapse-single-dirs --show-sizes src/

Options:

  • root - Root directory to process (required)
  • -o, --output - Output file (default: stdout)
  • -I, --ignore-pattern - Add custom ignore pattern
  • --no-default-ignore - Disable built-in ignore patterns
  • --no-gitignore - Don't respect .gitignore
  • --no-recursive - Non-recursive tree
  • --collapse-single-dirs - Collapse directories with single children (e.g., java/com/example becomes one line)
  • --show-sizes - Show file sizes in human-friendly format (e.g., "1.2KB", "5.0MB")
  • -v, --verbose - Print status to stderr

Collapse Example:

Without collapse:

.
└─ src
   └─ main
      └─ java
         └─ com
            └─ example
               ├─ Source.java
               └─ Util.java

With --collapse-single-dirs:

.
└─ src/main/java/com/example
   ├─ Source.java
   └─ Util.java
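
The collapsing shown above can be sketched as merging any directory with its only child. The code below is an illustration under that assumption, not the Loppers implementation; build_tree, collapse, and render are hypothetical names, and files are represented as empty dictionaries.

```python
def build_tree(paths):
    """Turn 'a/b/c.py'-style relative paths into nested dicts (files are empty dicts)."""
    tree = {}
    for path in paths:
        node = tree
        for part in path.split("/"):
            node = node.setdefault(part, {})
    return tree


def collapse(tree):
    """Merge every entry that has exactly one child into a single 'a/b' entry."""
    collapsed = {}
    for name, children in tree.items():
        while len(children) == 1:
            child_name, grandchildren = next(iter(children.items()))
            name = f"{name}/{child_name}"
            children = grandchildren
        collapsed[name] = collapse(children)
    return collapsed


def render(tree, prefix=""):
    """Return the tree as a list of lines using box-drawing connectors."""
    lines = []
    items = list(tree.items())
    for index, (name, children) in enumerate(items):
        last = index == len(items) - 1
        lines.append(f"{prefix}{'└─ ' if last else '├─ '}{name}")
        lines.extend(render(children, prefix + ("   " if last else "│  ")))
    return lines


paths = [
    "src/main/java/com/example/Source.java",
    "src/main/java/com/example/Util.java",
]
print("\n".join(["."] + render(collapse(build_tree(paths)))))
```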

4. files - List all discovered files

Print one discovered file per line (relative to root):

# List all files recursively (default)
loppers files src/

# Save list to file
loppers files src/ -o file_list.txt

# Non-recursive
loppers files --no-recursive src/

# With custom ignores
loppers files -I "*.md" src/

Options:

  • root - Root directory to process (required)
  • -o, --output - Output file (default: stdout)
  • -I, --ignore-pattern - Add custom ignore pattern
  • --no-default-ignore - Disable built-in ignore patterns
  • --no-gitignore - Don't respect .gitignore
  • --no-recursive - Non-recursive listing
  • -v, --verbose - Print status to stderr

Examples: Before and After

Python Example

Before:

class Calculator:
    def __init__(self, name: str):
        """Initialize calculator."""
        self.name = name
        self._setup()

    def process(self, data):
        """Process data."""
        result = []
        for item in data:
            result.append(item * 2)
        return result

After:

class Calculator:
    def __init__(self, name: str):
        """Initialize calculator."""

    def process(self, data):
        """Process data."""

JavaScript/TypeScript Example

Before:

class UserService {
    constructor(baseUrl: string) {
        this.baseUrl = baseUrl;
        this.cache = {};
    }

    async getUser(id: string) {
        if (this.cache[id]) return this.cache[id];
        const user = await fetch(this.baseUrl + '/' + id);
        return user.json();
    }
}

After:

class UserService {
    constructor(baseUrl: string) {
    }

    async getUser(id: string) {
    }
}

Java Example

Before:

public class UserService {
    private String baseUrl;

    public UserService(String baseUrl) {
        this.baseUrl = baseUrl;
        this.validate();
    }

    public User getUserById(String id) {
        Database db = new Database();
        return db.query(id);
    }

    private void validate() {
        if (baseUrl == null) {
            throw new IllegalArgumentException("BaseUrl required");
        }
    }
}

After:

public class UserService {
    private String baseUrl;

    public UserService(String baseUrl) {
    }

    public User getUserById(String id) {
    }

    private void validate() {
    }
}

Supported Languages

Language               Features
Python                 Functions, methods, __init__, @property, docstrings
JavaScript/TypeScript  Functions, arrow functions, methods, async/await
Java                   Methods, constructors, static methods, annotations
Kotlin                 Functions, methods, properties (getters/setters)
Go                     Functions, methods, closures
Rust                   Functions, methods, closures
C/C++                  Functions, methods, constructors
C#                     Methods, properties (get/set), async/await
Ruby                   Methods, singleton methods, blocks
PHP                    Functions, methods, closures
Swift                  Functions, methods, closures
Lua                    Functions, local functions
Scala                  Functions, methods, closures
Groovy                 Functions, methods, closures
Objective-C            Methods, instance/class methods

What Gets Preserved

  • ✅ Function/method signatures
  • ✅ Parameter types and defaults
  • ✅ Return types
  • ✅ Class definitions
  • ✅ Import statements
  • ✅ Comments
  • ✅ Python docstrings
  • ✅ Decorators
  • ✅ Access modifiers (public, private, protected)

What Gets Removed

  • ❌ Function/method bodies
  • ❌ Local variable assignments
  • ❌ Logic and implementation details
  • ❌ Nested function implementations

Known Limitations

  • Concise arrow functions (const f = x => x * 2) are left as-is: the expression has no block body to remove
  • Python lambdas are left as-is for the same reason
  • Some edge cases with getters/setters in JavaScript/TypeScript

How It Works

Loppers parses source code with tree-sitter into a syntax tree, then runs language-specific tree-sitter queries to find function/method bodies and removes them while preserving:

  • Function/method signatures
  • Class and interface definitions
  • Import statements
  • Python docstrings
  • Comments
  • Decorators
  • Type hints

Each language has custom tree-sitter query patterns that capture function/method body nodes, which are then removed line-by-line.
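
The same idea can be tried out for Python alone with the standard library. The sketch below swaps tree-sitter queries for Python's ast module, so it is an analogy rather than the Loppers implementation: python_skeleton is a hypothetical name, and the handling of docstrings and one-line definitions reflects choices made for this example.

```python
import ast


def python_skeleton(source: str) -> str:
    """Drop function bodies from Python source, keeping signatures and docstrings."""
    tree = ast.parse(source)
    lines = source.splitlines()
    drop = set()  # 1-based line numbers to remove

    for node in ast.walk(tree):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        body = node.body
        # Keep a leading docstring statement, if there is one.
        if ast.get_docstring(node, clean=False) is not None:
            body = body[1:]
        if not body:
            continue
        first = body[0]
        # Leave the definition alone when the body shares a line with the
        # signature, for example "def f(): pass".
        if lines[first.lineno - 1][: first.col_offset].strip():
            continue
        drop.update(range(first.lineno, body[-1].end_lineno + 1))

    kept = [line for number, line in enumerate(lines, start=1) if number not in drop]
    return "\n".join(kept)


code = '''
def calculate(x: int, y: int) -> int:
    """Calculate sum."""
    result = x + y
    return result
'''
print(python_skeleton(code))
```

Like the skeletons shown earlier, the result is meant for reading, not execution: a function without a docstring is left with a bare signature.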

Development

Setup

# Install with dev dependencies
uv sync

Running Tests

# All tests
uv run pytest

# Verbose output
uv run pytest -v

# With coverage
uv run pytest --cov=loppers --cov-report=html

# Specific test
uv run pytest tests/test_loppers.py::test_python_extraction

Code Quality

# Check and fix
uv run ruff check . --fix

# Format
uv run ruff format .

# All checks
uv run ruff check . --fix && uv run ruff format .

Adding New Languages

To add support for a new language:

  1. Find the tree-sitter query - Use the tree-sitter playground to develop a query that captures function bodies

  2. Add to LANGUAGE_CONFIGS in src/loppers/loppers.py:

    LANGUAGE_CONFIGS["mylang"] = LanguageConfig(
        name="mylang",
        body_query="(function_definition body: (block) @body)",
    )
    
  3. Add file extensions to src/loppers/extensions.py:

    EXTENSION_TO_LANGUAGE = {
        # ... existing entries ...
        ".ml": "mylang",
        ".mli": "mylang",
    }
    
  4. Write a test in tests/test_loppers.py:

    def test_mylang_extraction(self):
        code = "fun hello() { print('hi') }"
        skeleton = extract_skeleton(code, "mylang")
        assert "fun hello()" in skeleton
        assert "print" not in skeleton
    
  5. Run tests to verify everything works

Project Structure

loppers/
├── src/loppers/
│   ├── __init__.py              # Public API: extract_skeleton, get_skeleton, find_files, get_tree, concatenate_files
│   ├── loppers.py               # Core extraction logic with SkeletonExtractor class
│   ├── source_utils.py          # Convenience API and file operations
│   ├── extensions.py            # Language extension mapping
│   ├── ignore_patterns.py       # Default ignore patterns
│   ├── mapping.py               # Backwards compatibility re-exports
│   └── cli.py                   # Command-line interface (4 subcommands)
├── tests/
│   └── test_loppers.py          # Unit tests (38 tests)
├── pyproject.toml               # Project configuration
├── README.md                    # This file
└── CLAUDE.md                    # Development guide for Claude Code

Dependencies

Runtime:

  • tree-sitter>=0.25.0 - AST parsing library
  • tree-sitter-language-pack>=0.10.0 - Language grammars
  • binaryornot>=0.4.4 - Binary file detection
  • pathspec>=0.9.0 - .gitignore pattern matching

Development:

  • pytest>=7.0.0 - Testing framework
  • pytest-cov>=4.0.0 - Coverage reporting
  • ruff>=0.1.0 - Linting and formatting

License

MIT - See LICENSE file for details
