folder2md4llms
Enhanced tool to concatenate folder contents into markdown format for LLM consumption, inspired by gpt-repository-loader with significant improvements.
✨ Features
- 📝 Markdown-first output - Professional formatting with table of contents, syntax highlighting, and structured sections
- 📁 Folder structure visualization - ASCII tree representation of directory structure
- 📊 Repository statistics - File counts, sizes, and language breakdown
- 📄 Document conversion - PDF, DOCX, XLSX files converted to text/markdown
- 🔧 Binary file analysis - Intelligent descriptions for images, archives, and executables
- ⚙️ Highly configurable - YAML configuration files and comprehensive CLI options
- 🚀 Fast and efficient - Multi-threaded processing with progress tracking
- 🔍 Smart filtering - Advanced ignore patterns with glob support
- 📋 Multiple output formats - Markdown, HTML, and plain text support
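The repository statistics feature reports file counts, sizes, and a language breakdown. The sketch below shows one plausible way to compute such a breakdown with the standard library. It is not the tool's own implementation: the extension-to-language table `EXTENSION_LANGUAGES` and the helpers `list_files`, `language_breakdown`, and `breakdown_table` are hypothetical.

```python
from collections import Counter
from pathlib import Path, PurePath
from typing import Iterable

# Hypothetical extension-to-language mapping; folder2md4llms may use a different one.
EXTENSION_LANGUAGES = {
    ".py": "Python",
    ".js": "JavaScript",
    ".ts": "TypeScript",
    ".md": "Markdown",
    ".yaml": "YAML",
    ".yml": "YAML",
    ".toml": "TOML",
}


def list_files(root: str) -> list[str]:
    """Return every regular file under root as a path relative to root."""
    base = Path(root)
    return [str(p.relative_to(base)) for p in base.rglob("*") if p.is_file()]


def language_breakdown(paths: Iterable[str]) -> Counter:
    """Count files per language, judging by file extension only."""
    counts: Counter = Counter()
    for path in paths:
        suffix = PurePath(path).suffix.lower()
        counts[EXTENSION_LANGUAGES.get(suffix, "Other")] += 1
    return counts


def breakdown_table(counts: Counter) -> str:
    """Render the counts as a markdown table, most common language first."""
    total = sum(counts.values())
    lines = ["| Language | Files | Share |", "|---|---|---|"]
    for language, n in counts.most_common():
        lines.append(f"| {language} | {n} | {100 * n / total:.1f}% |")
    return "\n".join(lines)
```

For example, `print(breakdown_table(language_breakdown(list_files("."))))` prints a table for the current directory. A real run would apply the ignore patterns before counting, which this sketch does not do.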
🚀 Quick Start
Installation
# Install using uv (recommended; adds it as a dependency of the current uv project)
uv add folder2md4llms
# Or using pip
pip install folder2md4llms
Basic Usage
# Process current directory
folder2md .
# Process specific directory with custom output
folder2md /path/to/repo --output analysis.md
# Skip the folder tree and copy the output to the clipboard
folder2md /path/to/repo --no-tree --clipboard
# Verbose mode with a custom maximum file size (2 MB, given in bytes)
folder2md /path/to/repo --verbose --max-file-size 2097152
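When the CLI is driven from a Python script (for example in a CI job), building the argument list explicitly keeps the flags readable. The sketch below uses only the flags shown above; the helper names `build_folder2md_command` and `run_folder2md` are hypothetical. It also assumes, based on the `2097152` example and the `# 1MB` comment in the configuration section, that `--max-file-size` takes a value in bytes.

```python
import subprocess
from typing import Optional


def build_folder2md_command(
    path: str,
    output: Optional[str] = None,
    no_tree: bool = False,
    clipboard: bool = False,
    verbose: bool = False,
    max_file_size_mb: Optional[float] = None,
) -> list[str]:
    """Assemble a folder2md argument list from the flags documented in Basic Usage."""
    cmd = ["folder2md", path]
    if output is not None:
        cmd += ["--output", output]
    if no_tree:
        cmd.append("--no-tree")
    if clipboard:
        cmd.append("--clipboard")
    if verbose:
        cmd.append("--verbose")
    if max_file_size_mb is not None:
        # Assumed to be bytes: 2 MB -> 2 * 1024 * 1024 = 2097152
        cmd += ["--max-file-size", str(int(max_file_size_mb * 1024 * 1024))]
    return cmd


def run_folder2md(path: str, **options) -> subprocess.CompletedProcess:
    """Run the CLI and raise CalledProcessError on a non-zero exit status."""
    return subprocess.run(build_folder2md_command(path, **options), check=True)
```

For instance, `run_folder2md("/path/to/repo", output="analysis.md", verbose=True)` runs `folder2md /path/to/repo --output analysis.md --verbose`. Passing a list (rather than a shell string) avoids quoting problems with paths that contain spaces.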
📖 Documentation
- API Documentation - Complete API reference
- Configuration Guide - Configuration options and examples
- File Type Support - Supported file formats
🛠️ Development
Setup
# Clone the repository
git clone https://github.com/AI-driven-Optical-Biology-Laboratory/folder2md4llms.git
cd folder2md4llms
# Create a virtual environment and install dependencies
uv venv
uv sync --dev
# Install pre-commit hooks
make install-hooks
Development Commands
# Format code
make format
# Run linting
make lint
# Run tests
make test
# Run tests with coverage
make test-cov
# Run all checks
make check
# Run pre-commit on all files
make pre-commit
Testing
# Run all tests
uv run pytest
# Run specific test file
uv run pytest tests/test_cli.py
# Run with coverage
uv run pytest --cov=folder2md4llms --cov-report=term-missing
🎯 Use Cases
- AI/ML Projects - Prepare codebases for LLM analysis and code review
- Documentation - Generate comprehensive project overviews
- Code Analysis - Create structured summaries for large repositories
- Knowledge Management - Convert project structures into searchable markdown
- Team Onboarding - Provide new team members with project overviews
🔧 Configuration
Basic Configuration
Create a folder2md.yaml file in your repository:
# Output settings
output_format: markdown
include_tree: true
include_stats: true
# Processing options
convert_docs: true
describe_binaries: true
max_file_size: 1048576 # 1MB
# Document conversion
pdf_max_pages: 50
xlsx_max_sheets: 10
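The example file uses only flat `key: value` pairs. To illustrate how those values map to typed settings, here is a small parser for exactly that flat subset. It is a hypothetical sketch, not the tool's loader (which presumably relies on a full YAML parser), and the defaults in the illustrative `Folder2MdConfig` class simply copy the example values above rather than the tool's documented defaults.

```python
from dataclasses import dataclass, fields


@dataclass
class Folder2MdConfig:
    # Defaults copied from the example folder2md.yaml; the tool's real defaults may differ.
    output_format: str = "markdown"
    include_tree: bool = True
    include_stats: bool = True
    convert_docs: bool = True
    describe_binaries: bool = True
    max_file_size: int = 1048576  # bytes (1 MB)
    pdf_max_pages: int = 50
    xlsx_max_sheets: int = 10


def _coerce(raw: str, target_type: type):
    """Convert a raw scalar string to the field's declared type."""
    if target_type is bool:
        return raw.lower() in ("true", "yes", "on")
    if target_type is int:
        return int(raw)
    return raw


def parse_flat_config(text: str) -> Folder2MdConfig:
    """Parse flat `key: value` lines; unknown keys are ignored."""
    field_types = {f.name: f.type for f in fields(Folder2MdConfig)}
    values = {}
    for line in text.splitlines():
        # Drop trailing comments (naive: a '#' inside a value would be cut too).
        line = line.split("#", 1)[0].strip()
        if ":" not in line:
            continue
        key, raw = (part.strip() for part in line.split(":", 1))
        if key in field_types:
            values[key] = _coerce(raw, field_types[key])
    return Folder2MdConfig(**values)
```

A full YAML parser would also accept quoted strings, lists, and nested mappings, none of which this sketch handles.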
Ignore Patterns
Create a .folder2md_ignore file:
# Version control
.git/
.svn/
# Build artifacts
__pycache__/
*.pyc
node_modules/
build/
dist/
# IDE files
.vscode/
.idea/
# Custom patterns
*.secret
temp/
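The patterns above follow gitignore-like conventions: a trailing `/` marks a directory, and `*` is a glob wildcard. A minimal sketch of how such rules can be evaluated with the standard library's `fnmatch` module follows. It assumes unanchored matching (a pattern applies at any depth), does not handle negation (`!`), anchored (`/build`), or `**` patterns, and the function names `load_patterns` and `is_ignored` are hypothetical rather than part of folder2md4llms.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def load_patterns(text: str) -> list[str]:
    """Keep non-empty lines that are not comments."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(line)
    return patterns


def is_ignored(relative_path: str, patterns: list[str]) -> bool:
    """Return True if any pattern matches the file or one of its parent directories."""
    parts = PurePosixPath(relative_path).parts
    parent_dirs, file_name = parts[:-1], parts[-1]
    for pattern in patterns:
        if pattern.endswith("/"):
            # Directory pattern: match against every parent directory name.
            dir_pattern = pattern.rstrip("/")
            if any(fnmatchcase(d, dir_pattern) for d in parent_dirs):
                return True
        elif fnmatchcase(file_name, pattern):
            # File pattern: match against the file name only.
            return True
    return False
```

`fnmatchcase` is used instead of `fnmatch` so that matching is case-sensitive on every platform. In a directory walk, the same directory check could be applied to prune ignored folders before descending into them.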
📊 Output Format
The generated markdown includes:
- 📑 Table of Contents - Navigation links to all sections
- 📁 Folder Structure - ASCII tree representation
- 📊 Repository Statistics - File counts, sizes, and language breakdown
- 📄 Source Code - Syntax-highlighted code blocks
- 📋 Documents - Converted document content
- 🔧 Binary Files & Assets - Descriptions of non-text files
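The folder structure section is an ASCII tree. One way to render such a tree from a list of relative paths is sketched below. The box-drawing characters, the alphabetical ordering, and the `render_tree` name are assumptions; the tool's actual output may be laid out differently.

```python
from pathlib import PurePosixPath


def render_tree(paths: list[str], root_name: str = ".") -> str:
    """Render relative file paths as an ASCII tree, sorted alphabetically per level."""
    # Build a nested dict: each key is a path component, each value its children.
    tree: dict = {}
    for path in paths:
        node = tree
        for part in PurePosixPath(path).parts:
            node = node.setdefault(part, {})

    lines = [root_name]

    def walk(node: dict, prefix: str) -> None:
        names = sorted(node)
        for index, name in enumerate(names):
            is_last = index == len(names) - 1
            lines.append(prefix + ("└── " if is_last else "├── ") + name)
            # Children of the last entry get blank padding instead of a vertical bar.
            walk(node[name], prefix + ("    " if is_last else "│   "))

    walk(tree, "")
    return "\n".join(lines)
```

For example, `print(render_tree(["src/app.py", "README.md", "src/utils/io.py"]))` draws `README.md` and `src` under the root, with `app.py` and `utils/io.py` nested beneath `src`.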
🔄 Improvements over gptrepo
- Enhanced Output: Markdown formatting with table of contents and syntax highlighting
- Document Conversion: PDF, DOCX, XLSX files automatically converted
- Binary Analysis: Intelligent descriptions for images, archives, and executables
- Advanced Filtering: Glob patterns and hierarchical ignore rules
- Configuration: YAML configuration files and extensive CLI options
- Performance: Multi-threaded processing with progress tracking
- Extensibility: Modular architecture for easy extension
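Binary analysis starts with recognising what kind of file is being processed. A common technique, shown here as a hypothetical sketch rather than the tool's own method, is to compare the first bytes of a file against well-known format signatures (magic numbers) and fall back to a NUL-byte heuristic to separate text from unknown binary data. The names `MAGIC_SIGNATURES`, `describe_header`, and `describe_file` are illustrative.

```python
# Illustrative subset of well-known file signatures, all located at offset 0.
MAGIC_SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"GIF87a", "GIF image"),
    (b"GIF89a", "GIF image"),
    (b"%PDF-", "PDF document"),
    (b"PK\x03\x04", "ZIP archive (DOCX and XLSX files are ZIP containers too)"),
    (b"\x1f\x8b", "gzip archive"),
    (b"\x7fELF", "ELF executable"),
    (b"MZ", "Windows (DOS/PE) executable"),
]


def describe_header(header: bytes) -> str:
    """Classify a file from its first bytes."""
    for magic, label in MAGIC_SIGNATURES:
        if header.startswith(magic):
            return label
    # Text files rarely contain NUL bytes; binary formats usually do.
    if b"\x00" in header:
        return "unknown binary data"
    return "probably text"


def describe_file(path: str, sample_size: int = 512) -> str:
    """Read only the first sample_size bytes of the file and classify them."""
    with open(path, "rb") as fh:
        return describe_header(fh.read(sample_size))
```

Reading a fixed-size sample keeps classification cheap even for very large assets; richer descriptions (image dimensions, archive listings) would then be produced only for the formats that were recognised.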
🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Commit your changes (git commit -m 'Add some amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
🙏 Acknowledgments
- Inspired by gpt-repository-loader by mpoon
- Built with modern Python tooling: uv, ruff, pytest
👤 Author
Ricardo Henriques - @ricardohenriques
Email: ricardo@henriqueslab.org
Download files
File details
Details for the file folder2md4llms-0.1.1.tar.gz.
File metadata
- Download URL: folder2md4llms-0.1.1.tar.gz
- Upload date:
- Size: 132.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: python-httpx/0.28.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c531e57d2801961ca30da9d3cb0c99561a93dfd4d989ee221a4226275fdf3c5c |
| MD5 | 24ab6ef9e487795f775cffcecee35635 |
| BLAKE2b-256 | 5727384c81a599aed38e371059edf3ade867b720fe5f7c83ce7ebf5ee8563bbf |
File details
Details for the file folder2md4llms-0.1.1-py3-none-any.whl.
File metadata
- Download URL: folder2md4llms-0.1.1-py3-none-any.whl
- Upload date:
- Size: 27.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: python-httpx/0.28.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 78ac5e9134a4ba6b0e372a662569a38a1d124104e9ddd834fc55d2cb013f659b |
| MD5 | 8a30663a6ec67546e0380f115200f5de |
| BLAKE2b-256 | f0cbbd69f2596cedeee44ea2080e5400f82302bd47a29f907e28625703f14fe4 |
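To check a downloaded distribution against the SHA256 digests listed above, the standard library's `hashlib` is enough. The helper names `sha256_of` and `verify` are illustrative; from the command line, `pip hash <file>` reports the same kind of digest.

```python
import hashlib

# SHA256 digests copied from the file details above.
EXPECTED_SHA256 = {
    "folder2md4llms-0.1.1.tar.gz": "c531e57d2801961ca30da9d3cb0c99561a93dfd4d989ee221a4226275fdf3c5c",
    "folder2md4llms-0.1.1-py3-none-any.whl": "78ac5e9134a4ba6b0e372a662569a38a1d124104e9ddd834fc55d2cb013f659b",
}


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Hash a file in fixed-size chunks so large files never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected_hex: str) -> bool:
    """Return True if the file's SHA256 digest matches the expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `verify("folder2md4llms-0.1.1-py3-none-any.whl", EXPECTED_SHA256["folder2md4llms-0.1.1-py3-none-any.whl"])` returns `True` only if the local wheel is byte-for-byte identical to the published one.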