concatenating files for tossing them into a language model
lmcat
A Python tool for concatenating files and directory structures into a single document, perfect for sharing code with language models. It respects .gitignore and .lmignore patterns and provides configurable output formatting.
Features
- Tree view of directory structure with file statistics (lines, characters, tokens)
- Includes file contents with clear delimiters
- Respects .gitignore patterns (can be disabled)
- Supports custom ignore patterns via .lmignore
- Configurable via pyproject.toml, lmcat.toml, or lmcat.json
  - you can specify glob_process or decider_process to run on files, for example to convert a notebook to a markdown file
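To give a sense of what a notebook-to-markdown processor might do, here is a minimal sketch using only the standard library. The function name `ipynb_to_markdown` is hypothetical and this is not lmcat's own implementation; it also assumes every code cell is Python.

```python
import json

FENCE = "`" * 3  # Markdown code fence, built here to avoid nesting literal fences


def ipynb_to_markdown(notebook_json: str) -> str:
    """Convert notebook JSON text to Markdown (hypothetical helper, not lmcat's code)."""
    notebook = json.loads(notebook_json)
    parts: list[str] = []
    for cell in notebook.get("cells", []):
        # The notebook format stores "source" as a string or a list of strings
        source = cell.get("source", "")
        if isinstance(source, list):
            source = "".join(source)
        if cell.get("cell_type") == "markdown":
            parts.append(source)
        elif cell.get("cell_type") == "code":
            # Assumption: code cells are Python
            parts.append(f"{FENCE}python\n{source}\n{FENCE}")
    return "\n\n".join(parts)


example = json.dumps(
    {
        "cells": [
            {"cell_type": "markdown", "source": ["# Title\n", "Some text"]},
            {"cell_type": "code", "source": "print(1 + 1)"},
        ]
    }
)
print(ipynb_to_markdown(example))
```

Cell outputs and other cell types are dropped in this sketch, which keeps the result short when the goal is only to share the code and prose with a model.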
Installation
Install from PyPI:
pip install lmcat
or, install with support for counting tokens:
pip install lmcat[tokenizers]
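For intuition about the per-file statistics (lines, characters, tokens), here is a rough sketch that counts tokens by splitting on whitespace. It assumes a whitespace-based count is simply `str.split()`; lmcat's actual counting, and what the tokenizers extra adds, may differ. The name `file_stats` is hypothetical.

```python
def file_stats(text: str) -> dict[str, int]:
    """Return line, character, and whitespace-token counts for a string."""
    return {
        "lines": len(text.splitlines()),
        "chars": len(text),
        # Assumption: a "token" here is any run of non-whitespace characters
        "tokens": len(text.split()),
    }


print(file_stats("a b\nc d e\n"))
```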
Usage
Basic usage - concatenate current directory:
python -m lmcat
# Only show directory tree
python -m lmcat --tree-only
# Write output to file
python -m lmcat --output summary.md
# Print current configuration
python -m lmcat --print-cfg
The output will include a directory tree and the contents of each non-ignored file.
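The shape of that output can be illustrated with a small sketch: list the non-ignored files, then emit each file's contents between delimiters. This is not lmcat's implementation. The function `concat_files`, the `-----` divider, and the name-only glob matching are all assumptions made for illustration (real .gitignore semantics are richer).

```python
import fnmatch
import tempfile
from pathlib import Path


def concat_files(root: Path, ignore_patterns: list[str], divider: str = "-----") -> str:
    """Return a file listing followed by each file's contents (illustrative only)."""
    files = [
        path
        for path in sorted(root.rglob("*"))
        if path.is_file()
        # Simplification: match patterns against the file name only
        and not any(fnmatch.fnmatch(path.name, pat) for pat in ignore_patterns)
    ]
    # Flat listing of relative paths; a real tree view would indent by depth
    listing = [path.relative_to(root).as_posix() for path in files]
    sections = []
    for path in files:
        rel = path.relative_to(root).as_posix()
        # Assumes UTF-8 text files; binary files would need separate handling
        body = path.read_text(encoding="utf-8")
        sections.append(f"{divider} {rel}\n{body}\n{divider}")
    return "\n".join(listing) + "\n\n" + "\n\n".join(sections)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "main.py").write_text("print('hi')\n", encoding="utf-8")
    (root / "debug.log").write_text("noise\n", encoding="utf-8")
    output = concat_files(root, ignore_patterns=["*.log"])
    print(output)
```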
Command Line Options
- -t, --tree-only: Only print the directory tree, not file contents
- -o, --output: Specify an output file (defaults to stdout)
- -h, --help: Show help message
Configuration
lmcat is best configured via a tool.lmcat section in pyproject.toml:
[tool.lmcat]
# Tree formatting
tree_divider = "│ " # Vertical lines in tree
tree_indent = " " # Indentation
tree_file_divider = "├── " # File/directory entries
content_divider = "``````" # File content delimiters
# Processing pipeline
tokenizer = "gpt2" # or "whitespace-split"
tree_only = false # Only show tree structure
on_multiple_processors = "except" # Behavior when multiple processors match
# File handling
ignore_patterns = ["*.tmp", "*.log"] # Additional patterns to ignore
ignore_patterns_files = [".gitignore", ".lmignore"]
# processors
[tool.lmcat.glob_process]
"[mM]akefile" = "makefile_recipes"
"*.ipynb" = "ipynb_to_md"
Development
Setup
- Clone the repository:
git clone https://github.com/mivanit/lmcat
cd lmcat
- Set up the development environment:
make setup
Development Commands
The project uses make for common development tasks:
- make dep: Install/update dependencies
- make format: Format code using ruff and pycln
- make test: Run tests
- make typing: Run type checks
- make check: Run all checks (format, test, typing)
- make clean: Clean temporary files
- make docs: Generate documentation
- make build: Build the package
- make publish: Publish to PyPI (maintainers only)
Run make help to see all available commands.
Running Tests
make test
For verbose output:
VERBOSE=1 make test
Roadmap
- more processors and deciders, like:
  - only first n lines if file is too large
  - first few lines of a csv file
  - json schema of a big json/toml/yaml file
  - metadata extraction from images
- better tests, I feel like gitignore/lmignore interaction is broken
- llm summarization and caching of those summaries in .lmsummary/
- reasonable defaults for file extensions to ignore
- web interface
File details
Details for the file lmcat-0.1.0.tar.gz.
File metadata
- Download URL: lmcat-0.1.0.tar.gz
- Upload date:
- Size: 298.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.12.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 728d66cb4e43321332ddfd6baab20d6752b3882c68322c6d0a504b5d12eca1ac |
| MD5 | 5779ed613d14f06089649285972d7948 |
| BLAKE2b-256 | 8a4147962d784bed7005895c5a78905ab206a645b6fadc7f392364a940f22543 |
File details
Details for the file lmcat-0.1.0-py3-none-any.whl.
File metadata
- Download URL: lmcat-0.1.0-py3-none-any.whl
- Upload date:
- Size: 25.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.12.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f15ac4fbe1dac23cebad3577e8454376801a7d96b7018d13db55af9b9f3936cb |
| MD5 | 673a738e06c9420ad5297c42cfffad95 |
| BLAKE2b-256 | 022fa45fd4dc81aa3eba446c4b0867e0a3d998bb80b6bab7edde5e6401392753 |
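A downloaded file can be checked against the SHA256 digests listed for each file. The sketch below uses the standard-library `hashlib`; the helper name `sha256_of_file` is hypothetical, and the demonstration hashes a throwaway file rather than the real archive.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demonstration on a temporary file. For a real check, pass the path of the
# downloaded lmcat-0.1.0.tar.gz (or the .whl) and compare the result with the
# SHA256 value listed for that file.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"hello")
    sample_digest = sha256_of_file(sample)
    print(sample_digest == hashlib.sha256(b"hello").hexdigest())
```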