concatenating files for tossing them into a language model

lmcat

A Python tool for concatenating files and directory structures into a single document, perfect for sharing code with language models. It respects .gitignore and .lmignore patterns and provides configurable output formatting.

Features

  • Tree view of directory structure with file statistics (lines, characters, tokens)
  • Includes file contents with clear delimiters
  • Respects .gitignore patterns (can be disabled)
  • Supports custom ignore patterns via .lmignore
  • Configurable via pyproject.toml, lmcat.toml, or lmcat.json
    • You can specify glob_process or decider_process entries to run processors on matching files, for example to convert a notebook to Markdown
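
To make the per-file statistics concrete, here is a minimal sketch of counting lines, characters, and tokens for one file. It is not lmcat's own implementation: `FileStats`, `file_stats`, and the bracketed display format are hypothetical names chosen for illustration, and the token count assumes a plain whitespace split (in the spirit of the "whitespace-split" tokenizer option) rather than a real tokenizer such as gpt2.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    """Hypothetical container for the three per-file statistics."""
    lines: int
    chars: int
    tokens: int


def file_stats(text: str) -> FileStats:
    """Count lines, characters, and whitespace-separated tokens in text."""
    return FileStats(
        lines=len(text.splitlines()),
        chars=len(text),
        tokens=len(text.split()),  # assumption: tokens are whitespace-separated
    )


sample = "def add(a, b):\n    return a + b\n"
stats = file_stats(sample)

# The bracketed suffix is an illustrative format, not lmcat's actual tree output.
print(f"example.py [{stats.lines}L {stats.chars}C {stats.tokens}T]")
```

A whitespace split needs no extra dependencies, which is probably why it is useful as a fallback when the tokenizers extra is not installed.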

Installation

Install from PyPI:

pip install lmcat

Or install with support for counting tokens (quoted so the brackets survive shells such as zsh):

pip install "lmcat[tokenizers]"

Usage

Basic usage (run from the directory you want to concatenate):

# Only show directory tree
python -m lmcat --tree-only

# Write output to file
python -m lmcat --output summary.md

# Print current configuration
python -m lmcat --print-cfg

The output will include a directory tree and the contents of each non-ignored file.
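
If you want to drive lmcat from a script, one option is to assemble the same command lines and hand them to subprocess. The sketch below only uses the flags shown above (--tree-only, --output, --print-cfg); `build_lmcat_cmd` is a hypothetical helper, not part of lmcat, and the actual run is left commented out because it requires lmcat to be installed.

```python
import sys
from typing import List, Optional


def build_lmcat_cmd(
    tree_only: bool = False,
    output: Optional[str] = None,
    print_cfg: bool = False,
) -> List[str]:
    """Build an argv list for `python -m lmcat` from the documented flags."""
    cmd = [sys.executable, "-m", "lmcat"]
    if tree_only:
        cmd.append("--tree-only")
    if output is not None:
        cmd.extend(["--output", output])
    if print_cfg:
        cmd.append("--print-cfg")
    return cmd


cmd = build_lmcat_cmd(output="summary.md")
print(" ".join(cmd))

# To actually run it (requires lmcat to be installed in this interpreter):
# import subprocess
# subprocess.run(cmd, check=True)
```

Using sys.executable keeps the call in the same environment as the script, so the lmcat that runs is the one installed alongside it.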

Command Line Options

  • -t, --tree-only: Only print the directory tree, not file contents
  • -o, --output: Specify an output file (defaults to stdout)
  • --print-cfg: Print the current configuration
  • -h, --help: Show help message

Configuration

lmcat is best configured via a tool.lmcat section in pyproject.toml:

[tool.lmcat]
# Tree formatting
tree_divider = "│   "    # Vertical lines in tree
tree_indent = " "        # Indentation
tree_file_divider = "├── "  # File/directory entries
content_divider = "``````"  # File content delimiters

# Processing pipeline
tokenizer = "gpt2"  # or "whitespace-split"
tree_only = false   # Only show tree structure
on_multiple_processors = "except"  # Behavior when multiple processors match

# File handling
ignore_patterns = ["*.tmp", "*.log"]  # Additional patterns to ignore
ignore_patterns_files = [".gitignore", ".lmignore"]

# processors
[tool.lmcat.glob_process]
"[mM]akefile" = "makefile_recipes"
"*.ipynb" = "ipynb_to_md"

Development

Setup

  1. Clone the repository:
git clone https://github.com/mivanit/lmcat
cd lmcat
  2. Set up the development environment:
make setup

Development Commands

The project uses make for common development tasks:

  • make dep: Install/update dependencies
  • make format: Format code using ruff and pycln
  • make test: Run tests
  • make typing: Run type checks
  • make check: Run all checks (format, test, typing)
  • make clean: Clean temporary files
  • make docs: Generate documentation
  • make build: Build the package
  • make publish: Publish to PyPI (maintainers only)

Run make help to see all available commands.

Running Tests

make test

For verbose output:

VERBOSE=1 make test

Roadmap

  • more processors and deciders, like:
    • only first n lines if file is too large
    • first few lines of a csv file
    • json schema of a big json/toml/yaml file
    • metadata extraction from images
  • better tests; the gitignore/lmignore interaction in particular may be broken
  • llm summarization and caching of those summaries in .lmsummary/
  • reasonable defaults for file extensions to ignore
  • web interface
