A tool for preparing your codebase for use with LLMs

LMPrep

A lightning-fast utility for preparing and organizing your code for use with LLMs, for example in Claude Projects. LMPrep collects all of your project files into a flat directory and renames them so that the original directory structure is preserved in the filenames.

For example, a file at src/models/user.py will be renamed to src^models^user.py in the output directory.
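The renaming rule can be illustrated with a minimal Python sketch. This is not LMPrep's actual implementation (the tool is written in Rust); the function name `flatten_name` is hypothetical, and the sketch assumes forward-slash relative paths.

```python
from pathlib import PurePosixPath


def flatten_name(relative_path: str, delimiter: str = "^") -> str:
    """Join the components of a relative path with the delimiter.

    Hypothetical helper for illustration; not part of LMPrep.
    """
    parts = PurePosixPath(relative_path).parts
    return delimiter.join(parts)


if __name__ == "__main__":
    print(flatten_name("src/models/user.py"))       # src^models^user.py
    print(flatten_name("src/models/user.py", "+"))  # src+models+user.py
```

A file that already sits at the project root has a single path component, so its name is unchanged.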

Features

  • Smart File Organization: Automatically flattens complex directory structures while preserving path information in the filenames and in a file tree
  • Configurable Filtering: Specify which file extensions to include in your dataset to limit context size
  • Path Preservation: Uses customizable delimiters to maintain original path information in filenames
  • Git-Aware: Respects .gitignore patterns to exclude unwanted files or secrets
  • Flexible Output: Generate individual files or create a zip archive
  • Visual Tree View: Visualize your source and output file structure, or send the file tree to the LLM
  • Fast & Efficient: Written in Rust for maximum performance

Quick Start

Installation

  1. Download the latest release for your platform from Releases:

    • Windows: lm-x86_64-pc-windows-msvc.zip
    • Linux: lm-x86_64-unknown-linux-gnu.tar.gz
    • macOS: lm-x86_64-apple-darwin.tar.gz
  2. Install the binary:

Linux/macOS:

# Extract and copy binary
tar xzf lm-x86_64-*-*.tar.gz
sudo mv lm /usr/local/bin/

# Create config file
curl -O https://raw.githubusercontent.com/bcherb2/lmprep/main/src/config-example.yaml
mv config-example.yaml ~/.lmprep.yml

Windows (in PowerShell, run as Administrator):

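# Extract and copy binary
# Without -DestinationPath, Expand-Archive extracts into a new folder named after the archive
Expand-Archive lm-x86_64-pc-windows-msvc.zip -DestinationPath .
New-Item -ItemType Directory -Force -Path "C:\Program Files\lmprep"
Move-Item -Force lm.exe "C:\Program Files\lmprep\lm.exe"
# Adds lmprep to PATH for the current PowerShell session only
$env:Path += ";C:\Program Files\lmprep"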

# Create config file
Invoke-WebRequest -Uri "https://raw.githubusercontent.com/bcherb2/lmprep/main/src/config-example.yaml" -OutFile "$env:USERPROFILE\.lmprep.yml"
  3. Verify installation:
lm --help

Alternative: Build from Source

If you have Rust installed, you can build from source:

git clone https://github.com/bcherb2/lmprep
cd lmprep
cargo build --release

The binary will be in target/release/lm (or lm.exe on Windows). Follow step 2 above to set up the config file.

Basic Usage

# Organize files in current directory
lm .

# Organize files from a specific directory
lm /path/to/source

# Use a custom config file
lm . -c /path/to/.lmprep.yml

# Create a zip archive instead of individual files
lm . --zip

Configuration

Create a .lmprep.yml file in your home directory to customize behavior, or create one in your project root directory. Here's an example:

allowed_extensions:
  - py
  - rs
  - md
  - txt
delimiter: "^"
subfolder: context
zip: false
tree: true
respect_gitignore: true

NOTE: The install script will create a default config file at ~/.lmprep.yml

Configuration Options

Option              Description                                 Default
allowed_extensions  File extensions to include                  [] (all files)
delimiter           Character used to represent path hierarchy  ^
subfolder           Output directory name within project        context
zip                 Create zip archive instead of files         false
tree                Show file tree visualization                true
respect_gitignore   Honor .gitignore patterns                   true
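To make the allowed_extensions semantics concrete, here is a small Python sketch of the filtering rule as the table describes it: an empty list includes every file, otherwise only files with a listed extension are kept. The helper `is_allowed` is hypothetical, and whether LMPrep compares extensions case-sensitively is an assumption here.

```python
from pathlib import PurePosixPath


def is_allowed(path: str, allowed_extensions: list[str]) -> bool:
    """Return True if the file should be included.

    Hypothetical helper; assumes case-sensitive matching on the
    final extension, written without the leading dot as in the config.
    """
    # An empty list means "all files", per the options table.
    if not allowed_extensions:
        return True
    suffix = PurePosixPath(path).suffix.lstrip(".")
    return suffix in allowed_extensions


if __name__ == "__main__":
    allowed = ["py", "rs", "md", "txt"]
    files = ["src/models/user.py", "assets/logo.png", "README.md"]
    print([f for f in files if is_allowed(f, allowed)])
```

With the example config above, `assets/logo.png` would be skipped while the `.py` and `.md` files are kept.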

Command Line Options

lm [OPTIONS] [SOURCE]

Arguments:
  [SOURCE]  Source directory to organize files from [default: .]

Options:
  -c, --config <FILE>     Path to config file
  -s, --subfolder <NAME>  Override the subfolder name from config
  -z, --zip              Create a zip file instead of individual files
  -t, --tree             Show file tree of source and output
  -v, --verbose          Enable verbose logging
  -h, --help             Print help
  -V, --version          Print version
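If you want to drive lm from a script, the options above map naturally onto an argument list. The following Python sketch builds such a list using only the flags documented here; the function `build_lm_command` and its parameter names are hypothetical, and actually running the command assumes lm is on your PATH.

```python
import shlex
from typing import Optional


def build_lm_command(
    source: str = ".",
    config: Optional[str] = None,
    subfolder: Optional[str] = None,
    make_zip: bool = False,
    tree: bool = False,
    verbose: bool = False,
) -> list[str]:
    """Build an argument list for `lm [OPTIONS] [SOURCE]`.

    Hypothetical wrapper; it only assembles flags listed in the help text.
    """
    cmd = ["lm"]
    if config:
        cmd += ["--config", config]
    if subfolder:
        cmd += ["--subfolder", subfolder]
    if make_zip:
        cmd.append("--zip")
    if tree:
        cmd.append("--tree")
    if verbose:
        cmd.append("--verbose")
    cmd.append(source)
    return cmd


if __name__ == "__main__":
    cmd = build_lm_command("/path/to/source", subfolder="context", make_zip=True)
    print(shlex.join(cmd))
    # To execute it: subprocess.run(cmd, check=True)
```

Passing the arguments as a list (rather than a single shell string) avoids quoting problems when the source path contains spaces.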

Use Cases

  • ML Dataset Preparation: Organize your training data into a flat structure while preserving context (works especially well with Claude Projects)
  • Code Analysis: Prepare source code for analysis by LLMs
  • Document Processing: Organize and prepare document collections, logs, and similar files for processing
  • Version Control: Easily create clean snapshots of your codebase for archival in zip format

Building from Source

  1. Install Rust using rustup
  2. Clone the repository:
    git clone https://github.com/bcherb2/lmprep.git
    cd lmprep
    
  3. Build the project:
    cargo build --release
    
  4. The binary will be available at target/release/lm. Copy it to a directory on your PATH
  5. Create the .lmprep.yml file in your home directory or project root

NOTE: See install/BUILD.md for more in-depth build instructions.

FAQ

Q: Why use LMPrep instead of just copying files?
A: LMPrep preserves directory structure information in filenames, making it easier for LLMs to understand file relationships and context. Sure, you can do this manually, but it gets tedious.

Q: How does path flattening work?
A: A file at src/models/user.py becomes src^models^user.py in the output directory (using the default delimiter). Changing the delimiter to + would result in src+models+user.py.
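Because the delimiter stands in for the path separator, a flattened name can usually be mapped back to its original location. The Python sketch below shows that reverse mapping; `unflatten_name` is a hypothetical helper, not an LMPrep feature, and it assumes the delimiter does not occur in any original file or directory name.

```python
def unflatten_name(flat_name: str, delimiter: str = "^") -> str:
    """Recover a relative path from a flattened filename.

    Hypothetical helper; ambiguous if the delimiter appears in an
    original file or directory name.
    """
    return "/".join(flat_name.split(delimiter))


if __name__ == "__main__":
    print(unflatten_name("src^models^user.py"))       # src/models/user.py
    print(unflatten_name("src+models+user.py", "+"))  # src/models/user.py
```

This is one reason to pick a delimiter that is unlikely to appear in your filenames.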

Q: Can I exclude certain files or directories?
A: Yes! LMPrep respects .gitignore patterns and allows you to specify allowed file extensions.

Q: Is it safe to use on large directories?
A: Yes! LMPrep is written in Rust for performance and memory efficiency, making it suitable for large datasets.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.


