
concatenating files for tossing them into a language model


lmcat

A Python tool for concatenating files and directory structures into a single document, perfect for sharing code with language models. It respects .gitignore and .lmignore patterns and provides configurable output formatting.

Features

  • Tree view of directory structure with file statistics (lines, characters, tokens)
  • Includes file contents with clear delimiters
  • Respects .gitignore patterns (can be disabled)
  • Supports custom ignore patterns via .lmignore
  • Configurable via pyproject.toml, lmcat.toml, or lmcat.json
    • Specify glob_process or decider_process to run processors on matching files, for example to convert a Jupyter notebook to Markdown
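The core idea can be sketched with the standard library alone. This is a minimal illustration, not lmcat's implementation: the names is_ignored and concat_files and the per-file header format are hypothetical, ignore patterns are matched with fnmatch rather than full .gitignore semantics, and "tokens" are counted by whitespace splitting.

```python
import fnmatch
from pathlib import Path


def is_ignored(rel_path: str, patterns: list[str]) -> bool:
    # Simplified matching: test the relative path and the bare file name
    # against each glob. Real .gitignore rules (negation, anchoring,
    # directory-only patterns) are NOT handled here.
    name = rel_path.rsplit("/", 1)[-1]
    return any(
        fnmatch.fnmatch(rel_path, pat) or fnmatch.fnmatch(name, pat)
        for pat in patterns
    )


def concat_files(
    root: Path,
    ignore_patterns: list[str],
    content_divider: str = "``````",
) -> str:
    """Concatenate every non-ignored file under root into one string."""
    parts: list[str] = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        rel = path.relative_to(root).as_posix()
        if is_ignored(rel, ignore_patterns):
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        # Per-file statistics: lines, characters, whitespace-split tokens
        stats = (
            f"{len(text.splitlines())} lines, {len(text)} chars, "
            f"{len(text.split())} tokens"
        )
        body = text.rstrip("\n")
        # Hypothetical header, then the content wrapped in the divider
        parts.append(f"## {rel} ({stats})\n{content_divider}\n{body}\n{content_divider}")
    return "\n\n".join(parts)
```

Calling concat_files(Path("."), ["*.tmp", "*.log"]) would then return one string ready to paste into a chat.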

Installation

Install from PyPI:

pip install lmcat

Or, to install with support for counting tokens (quote the argument in shells such as zsh, which treat square brackets as glob characters):

pip install "lmcat[tokenizers]"
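A sketch of what optional token counting might look like, mirroring the two tokenizer values shown in the configuration section. It assumes the tokenizers extra installs the Hugging Face tokenizers package and that any tokenizer value other than "whitespace-split" names a pretrained tokenizer such as "gpt2"; both are assumptions, and count_tokens is a hypothetical helper.

```python
def count_tokens(text: str, tokenizer: str = "whitespace-split") -> int:
    """Count tokens in text.

    "whitespace-split" needs no extra dependencies. Any other value is
    treated as a Hugging Face tokenizer identifier (an assumption about how
    the setting is interpreted), which needs the optional tokenizers package
    and network access the first time the tokenizer is downloaded.
    """
    if tokenizer == "whitespace-split":
        return len(text.split())
    try:
        from tokenizers import Tokenizer
    except ImportError as err:
        raise RuntimeError(
            'token counting needs the optional extra: pip install "lmcat[tokenizers]"'
        ) from err
    return len(Tokenizer.from_pretrained(tokenizer).encode(text).ids)
```

Whitespace splitting is cheap but usually undercounts compared with a subword tokenizer like GPT-2's, so treat it as a rough lower bound.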

Usage

Basic usage - concatenate the current directory:

# Concatenate the current directory (directory tree plus file contents)
python -m lmcat

# Only show directory tree
python -m lmcat --tree-only

# Write output to file
python -m lmcat --output summary.md

# Print current configuration
python -m lmcat --print-cfg

The output will include a directory tree and the contents of each non-ignored file.
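A rough sketch of how such a tree can be rendered using the tree_divider and tree_file_divider strings from the configuration section. The render_tree function and the [nL] line-count suffix are hypothetical; lmcat's real tree also reports characters and tokens and may be laid out differently.

```python
from pathlib import Path


def render_tree(
    root: Path,
    tree_divider: str = "│   ",
    tree_file_divider: str = "├── ",
) -> str:
    """Render root as an indented tree, one entry per line."""
    lines = [root.name + "/"]

    def walk(directory: Path, depth: int) -> None:
        # Directories first, then files; each group sorted by name
        entries = sorted(directory.iterdir(), key=lambda p: (p.is_file(), p.name))
        for entry in entries:
            # One tree_divider per nesting level, then the entry marker
            prefix = tree_divider * depth + tree_file_divider
            if entry.is_dir():
                lines.append(f"{prefix}{entry.name}/")
                walk(entry, depth + 1)
            else:
                text = entry.read_text(encoding="utf-8", errors="replace")
                lines.append(f"{prefix}{entry.name} [{len(text.splitlines())}L]")

    walk(root, 0)
    return "\n".join(lines)
```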

Command Line Options

  • -t, --tree-only: Only print the directory tree, not file contents
  • -o, --output: Specify an output file (defaults to stdout)
  • --print-cfg: Print the current configuration
  • -h, --help: Show help message
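For reference, the flags documented in this README map onto an argparse parser roughly as follows. This is an illustrative mirror, not lmcat's actual parser; build_parser is a hypothetical name, and -h/--help is added by argparse automatically.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Only the flags documented in this README; lmcat's real parser may define more
    parser = argparse.ArgumentParser(prog="lmcat")
    parser.add_argument(
        "-t", "--tree-only", action="store_true",
        help="only print the directory tree, not file contents",
    )
    parser.add_argument(
        "-o", "--output", default=None,
        help="write output to this file instead of stdout",
    )
    parser.add_argument(
        "--print-cfg", action="store_true",
        help="print the current configuration",
    )
    return parser
```

With this parser, python -m lmcat -t -o summary.md would yield tree_only=True and output="summary.md".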

Configuration

lmcat is best configured via a tool.lmcat section in pyproject.toml:

[tool.lmcat]
# Tree formatting
tree_divider = "│   "    # Vertical lines in tree
tree_indent = " "        # Indentation
tree_file_divider = "├── "  # File/directory entries
content_divider = "``````"  # File content delimiters

# Processing pipeline
tokenizer = "gpt2"  # or "whitespace-split"
tree_only = false   # Only show tree structure
on_multiple_processors = "except"  # Behavior when multiple processors match

# File handling
ignore_patterns = ["*.tmp", "*.log"]  # Additional patterns to ignore
ignore_patterns_files = [".gitignore", ".lmignore"]

# processors
[tool.lmcat.glob_process]
"[mM]akefile" = "makefile_recipes"
"*.ipynb" = "ipynb_to_md"

Development

Setup

  1. Clone the repository:
git clone https://github.com/mivanit/lmcat
cd lmcat
  2. Set up the development environment:
make setup

Development Commands

The project uses make for common development tasks:

  • make dep: Install/update dependencies
  • make format: Format code using ruff and pycln
  • make test: Run tests
  • make typing: Run type checks
  • make check: Run all checks (format, test, typing)
  • make clean: Clean temporary files
  • make docs: Generate documentation
  • make build: Build the package
  • make publish: Publish to PyPI (maintainers only)

Run make help to see all available commands.

Running Tests

make test

For verbose output:

VERBOSE=1 make test

Roadmap

  • more processors and deciders, like:
    • only first n lines if file is too large
    • first few lines of a csv file
    • json schema of a big json/toml/yaml file
    • metadata extraction from images
  • better tests; I suspect the gitignore/lmignore interaction is broken
  • LLM summarization and caching of those summaries in .lmsummary/
  • reasonable defaults for file extensions to ignore
  • web interface
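As an illustration of the first roadmap item, a processor that keeps only the first n lines of a large file could be as small as this. It is a hypothetical sketch (head_processor and max_lines are made-up names), not a planned API.

```python
def head_processor(text: str, max_lines: int = 100) -> str:
    """Keep the first max_lines lines of text and note how many were dropped."""
    lines = text.splitlines(keepends=True)
    if len(lines) <= max_lines:
        return text
    omitted = len(lines) - max_lines
    # Every kept line is followed by another line, so it already ends in a newline
    kept = "".join(lines[:max_lines])
    return kept + f"... ({omitted} more lines omitted)\n"
```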



Download files


Source Distribution

lmcat-0.1.0.tar.gz (298.7 kB)

Uploaded Source

Built Distribution


lmcat-0.1.0-py3-none-any.whl (25.9 kB)

Uploaded Python 3

File details

Details for the file lmcat-0.1.0.tar.gz.

File metadata

  • Download URL: lmcat-0.1.0.tar.gz
  • Upload date:
  • Size: 298.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.12.4

File hashes

Hashes for lmcat-0.1.0.tar.gz
Algorithm Hash digest
SHA256 728d66cb4e43321332ddfd6baab20d6752b3882c68322c6d0a504b5d12eca1ac
MD5 5779ed613d14f06089649285972d7948
BLAKE2b-256 8a4147962d784bed7005895c5a78905ab206a645b6fadc7f392364a940f22543
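To check a downloaded file against the published SHA256 digest, a few lines of hashlib suffice; the helper names sha256_of and verify are hypothetical. pip can also enforce this itself in hash-checking mode, using a requirements line such as lmcat==0.1.0 --hash=sha256:<digest> together with pip install --require-hashes.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_sha256: str) -> bool:
    # Published digests are hex; compare case-insensitively
    return sha256_of(path) == expected_sha256.strip().lower()
```

For example, verify(Path("lmcat-0.1.0.tar.gz"), "728d66cb4e43321332ddfd6baab20d6752b3882c68322c6d0a504b5d12eca1ac") should return True for an untampered download of the source distribution.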


File details

Details for the file lmcat-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: lmcat-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 25.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.12.4

File hashes

Hashes for lmcat-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 f15ac4fbe1dac23cebad3577e8454376801a7d96b7018d13db55af9b9f3936cb
MD5 673a738e06c9420ad5297c42cfffad95
BLAKE2b-256 022fa45fd4dc81aa3eba446c4b0867e0a3d998bb80b6bab7edde5e6401392753

