concatenating files for tossing them into a language model

lmcat

A Python tool for concatenating files and directory structures into a single document, perfect for sharing code with language models. It respects .gitignore and .lmignore patterns and provides configurable output formatting.

Features

  • Tree view of directory structure with file statistics (lines, characters, tokens)
  • Includes file contents with clear delimiters
  • Respects .gitignore patterns (can be disabled)
  • Supports custom ignore patterns via .lmignore
  • Configurable via pyproject.toml, lmcat.toml, or lmcat.json
    • glob_process or decider_process entries select a processor to run on matching files, for example to convert a notebook to Markdown
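
The per-file statistics shown in the tree view (lines, characters, tokens) can be approximated with a few lines of Python. This is a minimal sketch, not lmcat's own code: `FileStats` and `file_stats` are hypothetical names, and tokens are counted by splitting on whitespace, which is only assumed to resemble the "whitespace-split" tokenizer option. Counts from a real tokenizer such as gpt2 will differ.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    """Hypothetical container for the three statistics named in the feature list."""

    lines: int
    chars: int
    tokens: int


def file_stats(text: str) -> FileStats:
    """Count lines, characters, and whitespace-separated tokens in a string."""
    return FileStats(
        lines=len(text.splitlines()),
        chars=len(text),
        tokens=len(text.split()),  # assumption: whitespace splitting stands in for a tokenizer
    )


if __name__ == "__main__":
    print(file_stats("def f():\n    return 1\n"))
```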

Installation

Install from PyPI:

pip install lmcat

or, install with support for counting tokens:

pip install lmcat[tokenizers]

Usage

Basic usage (run from the directory you want to concatenate):

# Only show directory tree
python -m lmcat --tree-only

# Write output to file
python -m lmcat --output summary.md

# Print current configuration
python -m lmcat --print-cfg

The output will include a directory tree and the contents of each non-ignored file.
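
If you want to drive lmcat from a script, one option is to call the same command line through `subprocess`. The sketch below assumes lmcat is installed in the current interpreter's environment and uses only the `--tree-only` and `--output` flags shown above; `build_lmcat_command` and `run_lmcat` are hypothetical helper names, not part of lmcat.

```python
import subprocess
import sys
from typing import List, Optional


def build_lmcat_command(tree_only: bool = False, output: Optional[str] = None) -> List[str]:
    """Build the argument list for `python -m lmcat` using flags from the usage section."""
    cmd = [sys.executable, "-m", "lmcat"]
    if tree_only:
        cmd.append("--tree-only")
    if output is not None:
        cmd += ["--output", output]
    return cmd


def run_lmcat(directory: str = ".", tree_only: bool = False) -> str:
    """Run lmcat in `directory` and return whatever it writes to stdout."""
    result = subprocess.run(
        build_lmcat_command(tree_only=tree_only),
        cwd=directory,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if lmcat exits with an error
    )
    return result.stdout


if __name__ == "__main__":
    # Requires lmcat to be installed; prints only the directory tree.
    print(run_lmcat(".", tree_only=True))
```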

Command Line Options

  • -t, --tree-only: Only print the directory tree, not file contents
  • -o, --output: Specify an output file (defaults to stdout)
  • --print-cfg: Print the current configuration
  • -h, --help: Show help message

Configuration

lmcat is best configured via a tool.lmcat section in pyproject.toml:

[tool.lmcat]
# Tree formatting
tree_divider = "│   "    # Vertical lines in tree
tree_indent = " "        # Indentation
tree_file_divider = "├── "  # File/directory entries
content_divider = "``````"  # File content delimiters

# Processing pipeline
tokenizer = "gpt2"  # or "whitespace-split"
tree_only = false   # Only show tree structure
on_multiple_processors = "except"  # Behavior when multiple processors match

# File handling
ignore_patterns = ["*.tmp", "*.log"]  # Additional patterns to ignore
ignore_patterns_files = [".gitignore", ".lmignore"]

# processors
[tool.lmcat.glob_process]
"[mM]akefile" = "makefile_recipes"
"*.ipynb" = "ipynb_to_md"

Development

Setup

  1. Clone the repository:
git clone https://github.com/mivanit/lmcat
cd lmcat
  2. Set up the development environment:
make setup

Development Commands

The project uses make for common development tasks:

  • make dep: Install/update dependencies
  • make format: Format code using ruff and pycln
  • make test: Run tests
  • make typing: Run type checks
  • make check: Run all checks (format, test, typing)
  • make clean: Clean temporary files
  • make docs: Generate documentation
  • make build: Build the package
  • make publish: Publish to PyPI (maintainers only)

Run make help to see all available commands.

Running Tests

make test

For verbose output:

VERBOSE=1 make test

Roadmap

  • more processors and deciders, like:
    • only first n lines if file is too large
    • first few lines of a csv file
    • json schema of a big json/toml/yaml file
    • metadata extraction from images
  • better tests, I feel like gitignore/lmignore interaction is broken
  • llm summarization and caching of those summaries in .lmsummary/
  • reasonable defaults for file extensions to ignore
  • web interface
