folder2md4llms
Enhanced tool to concatenate folder contents into markdown format for LLM consumption, inspired by gpt-repository-loader with significant improvements.
Features
- Markdown-first output - Professional formatting with table of contents, syntax highlighting, and structured sections
- Folder structure visualization - ASCII tree representation of directory structure
- Repository statistics - File counts, sizes, and language breakdown
- Document conversion - PDF, DOCX, XLSX files converted to text/markdown
- Binary file analysis - Intelligent descriptions for images, archives, and executables
- Highly configurable - YAML configuration files and comprehensive CLI options
- Fast and efficient - Multi-threaded processing with progress tracking
- Smart filtering - Advanced ignore patterns with glob support and template generation
- Multiple output formats - Markdown, HTML, and plain text support
- Cross-platform compatibility - Works seamlessly on Windows, macOS, and Linux
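To make the statistics feature concrete, here is a minimal Python sketch of how file counts, total size, and a per-language breakdown can be derived from a directory scan. It is not folder2md4llms' implementation: the `repo_stats` and `scan` helpers and the extension table are hypothetical, and a real tool would apply its ignore rules before counting.

```python
from collections import Counter
from pathlib import Path

# Hypothetical extension-to-language table; folder2md4llms' own mapping may differ.
EXTENSION_LANGUAGES = {
    ".py": "Python",
    ".js": "JavaScript",
    ".ts": "TypeScript",
    ".md": "Markdown",
    ".yaml": "YAML",
    ".yml": "YAML",
}


def repo_stats(files):
    """Summarise (path, size_in_bytes) pairs into a file count, total size and language breakdown."""
    languages = Counter()
    file_count = 0
    total_size = 0
    for path, size in files:
        file_count += 1
        total_size += size
        # Unknown extensions are grouped under "Other".
        languages[EXTENSION_LANGUAGES.get(Path(path).suffix.lower(), "Other")] += 1
    return {"file_count": file_count, "total_size": total_size, "languages": dict(languages)}


def scan(root):
    """Collect (relative path, size) pairs for every regular file under root."""
    root = Path(root)
    return [(str(p.relative_to(root)), p.stat().st_size) for p in root.rglob("*") if p.is_file()]
```

For example, `repo_stats(scan("."))` summarises the current directory.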
Quick Start
Installation
```bash
# Install using uv (recommended; adds it as a dependency of the current uv project)
uv add folder2md4llms
# Or using pip
pip install folder2md4llms
```
Basic Usage
```bash
# Process current directory
folder2md .
# Process specific directory with custom output
folder2md /path/to/repo --output analysis.md
# Skip tree generation and copy to clipboard
folder2md /path/to/repo --no-tree --clipboard
# Verbose mode with a 2 MB (2097152-byte) file-size limit
folder2md /path/to/repo --verbose --max-file-size 2097152
# Generate ignore template file
folder2md --init-ignore
```
Documentation
- API Documentation - Complete API reference
- Configuration Guide - Configuration options and examples
- File Type Support - Supported file formats
Development
Setup
```bash
# Clone the repository
git clone https://github.com/AI-driven-Optical-Biology-Laboratory/folder2md4llms.git
cd folder2md4llms
# Create virtual environment and install dependencies
uv venv
uv sync --dev
# Install pre-commit hooks
make install-hooks
```
Development Commands
```bash
# Format code
make format
# Run linting
make lint
# Run tests
make test
# Run tests with coverage
make test-cov
# Run all checks
make check
# Run pre-commit on all files
make pre-commit
```
Testing
```bash
# Run all tests
uv run pytest
# Run specific test file
uv run pytest tests/test_cli.py
# Run with coverage
uv run pytest --cov=folder2md4llms --cov-report=term-missing
```
Use Cases
- AI/ML Projects - Prepare codebases for LLM analysis and code review
- Documentation - Generate comprehensive project overviews
- Code Analysis - Create structured summaries for large repositories
- Knowledge Management - Convert project structures into searchable markdown
- Team Onboarding - Provide new team members with project overviews
Configuration
Basic Configuration
Create a folder2md.yaml file in your repository:
```yaml
# Output settings
output_format: markdown
include_tree: true
include_stats: true
# Processing options
convert_docs: true
describe_binaries: true
max_file_size: 1048576  # 1 MB (value in bytes)
# Document conversion
pdf_max_pages: 50
xlsx_max_sheets: 10
```
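The YAML options above behave like a table of defaults that your file overrides. The sketch below assumes the file has already been parsed into a dictionary (for instance with PyYAML's `yaml.safe_load`); `DEFAULTS`, `merge_config`, and `mib` are hypothetical helpers, and the default values simply copy the example above rather than the tool's documented defaults. It also spells out the size arithmetic: `max_file_size` is a byte count, so `1048576` is 1 MB (1024 * 1024 bytes) and the `--max-file-size 2097152` CLI example is 2 MB.

```python
# Hypothetical defaults mirroring the keys in the example folder2md.yaml above.
DEFAULTS = {
    "output_format": "markdown",
    "include_tree": True,
    "include_stats": True,
    "convert_docs": True,
    "describe_binaries": True,
    "max_file_size": 1024 * 1024,  # 1 MB = 1048576 bytes
    "pdf_max_pages": 50,
    "xlsx_max_sheets": 10,
}


def merge_config(user_config: dict) -> dict:
    """Return DEFAULTS overridden by user_config, rejecting unknown keys."""
    unknown = set(user_config) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"Unknown config keys: {sorted(unknown)}")
    return {**DEFAULTS, **user_config}


def mib(n: float) -> int:
    """Convert megabytes (1024 * 1024 bytes each) to a byte count for max_file_size."""
    return int(n * 1024 * 1024)
```

Rejecting unknown keys is a design choice in this sketch: a typo such as `max_size` fails loudly instead of being silently ignored.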
Ignore Patterns
Quick Start with Template
Generate a comprehensive ignore template:
```bash
folder2md --init-ignore
```
This creates a .folder2md_ignore file with common patterns for:
- Version control systems (git, svn, etc.)
- Build artifacts and dependencies
- IDE and editor files
- OS-generated files
- Security-sensitive files
- Large media files
- Custom patterns section
Manual Creation
You can also create a .folder2md_ignore file manually:
```
# Version control
.git/
.svn/
# Build artifacts
__pycache__/
*.pyc
node_modules/
build/
dist/
# IDE files
.vscode/
.idea/
# Custom patterns
*.secret
temp/
```
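The patterns above follow familiar `.gitignore` conventions. As a rough illustration of how such rules can be evaluated, the sketch below treats a trailing `/` as "a directory with this name at any depth" and glob-matches every other pattern against each path component. This is a simplified, hypothetical model (no `!` negation, no anchored leading `/`, no per-directory hierarchy), not the exact semantics of `.folder2md_ignore`.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def parse_ignore(text: str) -> list[str]:
    """Keep non-empty, non-comment lines from an ignore file."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(line)
    return patterns


def is_ignored(rel_path: str, patterns: list[str]) -> bool:
    """Simplified gitignore-style check for a POSIX-style relative file path.

    - "name/" ignores any directory called name, at any depth, and its contents.
    - Any other pattern is glob-matched (case-sensitively) against each path component.
    """
    parts = PurePosixPath(rel_path).parts
    for pattern in patterns:
        if pattern.endswith("/"):
            name = pattern.rstrip("/")
            # Only directory components (everything but the final file name) can match.
            if any(fnmatchcase(part, name) for part in parts[:-1]):
                return True
        elif any(fnmatchcase(part, pattern) for part in parts):
            return True
    return False
```

Using `fnmatchcase` instead of `fnmatch` keeps matching case-sensitive on every OS, since `fnmatch` normalises case on Windows.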
Output Format
The generated markdown includes:
- Table of Contents - Navigation links to all sections
- Folder Structure - ASCII tree representation
- Repository Statistics - File counts, sizes, and language breakdown
- Source Code - Syntax-highlighted code blocks
- Documents - Converted document content
- Binary Files & Assets - Descriptions of non-text files
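The folder structure section is an ASCII tree of the repository. A minimal recursive sketch of that kind of rendering, in the style of the Unix `tree` command, might look like the following; the `ascii_tree` helper, the directories-first ordering, and the connector characters are assumptions rather than the tool's exact output.

```python
from pathlib import Path


def ascii_tree(root: Path, prefix: str = "") -> list[str]:
    """Return ASCII tree lines for the contents of root, directories first."""
    entries = sorted(root.iterdir(), key=lambda p: (not p.is_dir(), p.name.lower()))
    lines = []
    for index, entry in enumerate(entries):
        is_last = index == len(entries) - 1
        is_dir = entry.is_dir()
        connector = "`-- " if is_last else "|-- "
        lines.append(prefix + connector + entry.name + ("/" if is_dir else ""))
        # Do not follow symlinked directories, which could otherwise recurse forever.
        if is_dir and not entry.is_symlink():
            lines.extend(ascii_tree(entry, prefix + ("    " if is_last else "|   ")))
    return lines
```

Call it as `print("\n".join(ascii_tree(Path("."))))`; a real implementation would also skip ignored paths before recursing.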
Improvements over gptrepo
- Enhanced Output: Markdown formatting with table of contents and syntax highlighting
- Document Conversion: PDF, DOCX, XLSX files automatically converted
- Binary Analysis: Intelligent descriptions for images, archives, and executables
- Advanced Filtering: Glob patterns and hierarchical ignore rules with template generation
- Configuration: YAML configuration files and extensive CLI options
- Performance: Multi-threaded processing with progress tracking
- Cross-platform: Native support for Windows, macOS, and Linux
- Extensibility: Modular architecture for easy extension
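Multi-threaded processing pays off here because reading many files is dominated by I/O waits rather than CPU work. The sketch below uses the standard library's `ThreadPoolExecutor` with a hypothetical `on_progress(done, total)` callback standing in for a progress bar; the function name, worker count, and skip-on-error policy are illustrative assumptions, not folder2md4llms' code.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path


def read_text_files(paths, max_workers=8, on_progress=None):
    """Read files concurrently and return {path: text}.

    Files that cannot be read or decoded as UTF-8 are skipped (a hypothetical policy).
    on_progress, if given, is called as on_progress(done, total) after each file.
    """
    paths = list(paths)
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(Path(p).read_text, encoding="utf-8"): p for p in paths}
        for done, future in enumerate(as_completed(futures), start=1):
            path = futures[future]
            try:
                results[path] = future.result()
            except (UnicodeDecodeError, OSError):
                pass
            if on_progress:
                on_progress(done, len(paths))
    return results
```

Because `as_completed` yields files in completion order, results are keyed by path so the caller can restore a stable order (for example, sorted paths) before writing the markdown.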
Cross-Platform Support
folder2md4llms works seamlessly across different operating systems:
- Windows: Full support with automatic dependency management
- macOS: Optimized for Apple Silicon and Intel processors
- Linux: Compatible with all major distributions
Platform-Specific Features
- File Type Detection: Automatic fallback when python-magic is unavailable
- Path Handling: Consistent behavior across different file systems
- Dependencies: Platform-specific package management (python-magic vs python-magic-bin)
- Error Handling: Robust handling of platform-specific file system quirks
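When python-magic is unavailable, a common fallback is to sniff file contents and guess MIME types from extensions. The sketch below shows one such heuristic: a NUL byte or invalid UTF-8 in the first few kilobytes marks a file as binary, and the standard `mimetypes` table supplies a type when libmagic cannot. `looks_binary`, `detect_mime`, and the 8192-byte window are hypothetical choices, not the tool's documented behaviour.

```python
import codecs
import mimetypes
from pathlib import Path

try:
    import magic  # provided by python-magic or python-magic-bin
except ImportError:  # package missing, or libmagic could not be loaded
    magic = None

SNIFF_BYTES = 8192  # hypothetical: how much of each file to inspect


def looks_binary(path: Path) -> bool:
    """Heuristic: a NUL byte or invalid UTF-8 in the first SNIFF_BYTES means binary."""
    with open(path, "rb") as handle:
        chunk = handle.read(SNIFF_BYTES)
    if b"\x00" in chunk:
        return True
    # An incremental decoder tolerates a multi-byte character cut off at the chunk edge.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(chunk, final=False)
    except UnicodeDecodeError:
        return True
    return False


def detect_mime(path: Path) -> str:
    """Prefer libmagic's content sniffing; fall back to the extension-based table."""
    if magic is not None:
        return magic.from_file(str(path), mime=True)
    mime, _encoding = mimetypes.guess_type(path.name)
    return mime or "application/octet-stream"
```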
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add some amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Inspired by gpt-repository-loader by mpoon
- Built with modern Python tooling: uv, ruff, pytest
Author
Ricardo Henriques - @ricardohenriques
Email: ricardo@henriqueslab.org
Download files
- Source distribution: folder2md4llms-0.2.0.tar.gz
- Built distribution: folder2md4llms-0.2.0-py3-none-any.whl
File details
Details for the file folder2md4llms-0.2.0.tar.gz.
File metadata
- Download URL: folder2md4llms-0.2.0.tar.gz
- Upload date:
- Size: 138.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: python-httpx/0.28.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `14b593b149c5efcd1129dd7397d386d8a1684e72bb8af9c059394e1f83243c2c` |
| MD5 | `092fbe9a60813656b8f5b40229edb138` |
| BLAKE2b-256 | `21166c5f760a3b2ca4767a0bf847aecf4a09cd9ec6e0061675e8b9c3936646d0` |
File details
Details for the file folder2md4llms-0.2.0-py3-none-any.whl.
File metadata
- Download URL: folder2md4llms-0.2.0-py3-none-any.whl
- Upload date:
- Size: 31.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: python-httpx/0.28.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `2886069310b1629593e3f7086462813956e8e4af342aac6531d3dac9aceac657` |
| MD5 | `b2909acd14bfeef96acc196a77720972` |
| BLAKE2b-256 | `8d01cb88a18a093e3722f2a8a7ccb6db8e39e2723c5522606fe5f8a0d13c72ef` |
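To check a downloaded release file against the SHA256 digests listed above, you can hash it locally with Python's `hashlib`. The `sha256_of` and `verify` helpers below are illustrative; only the digests come from this page.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Stream a file through SHA-256 in chunks and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 digests published above for the 0.2.0 release files.
EXPECTED = {
    "folder2md4llms-0.2.0.tar.gz": "14b593b149c5efcd1129dd7397d386d8a1684e72bb8af9c059394e1f83243c2c",
    "folder2md4llms-0.2.0-py3-none-any.whl": "2886069310b1629593e3f7086462813956e8e4af342aac6531d3dac9aceac657",
}


def verify(path: Path) -> bool:
    """Return True if the downloaded file's SHA-256 matches the published digest."""
    return sha256_of(path) == EXPECTED[path.name]
```

pip can enforce the same check at install time through its hash-checking mode, by listing `folder2md4llms==0.2.0 --hash=sha256:<digest>` in a requirements file.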