folder2md4llms

Python 3.8+ | License: MIT

folder2md4llms concatenates the contents of a folder into Markdown for LLM consumption. It is inspired by gpt-repository-loader and adds document conversion, binary file descriptions, configurable filtering, and richer output formatting.

✨ Features

  • 📝 Markdown-first output - Professional formatting with table of contents, syntax highlighting, and structured sections
  • 📁 Folder structure visualization - ASCII tree representation of directory structure
  • 📊 Repository statistics - File counts, sizes, and language breakdown
  • 📄 Document conversion - PDF, DOCX, XLSX files converted to text/markdown
  • 🔧 Binary file analysis - Intelligent descriptions for images, archives, and executables
  • ⚙️ Highly configurable - YAML configuration files and comprehensive CLI options
  • 🚀 Fast and efficient - Multi-threaded processing with progress tracking
  • 🔍 Smart filtering - Advanced ignore patterns with glob support
  • 📋 Multiple output formats - Markdown, HTML, and plain text support

🚀 Quick Start

Installation

# Install using uv (recommended)
uv add folder2md4llms

# Or using pip
pip install folder2md4llms

Basic Usage

# Process current directory
folder2md .

# Process specific directory with custom output
folder2md /path/to/repo --output analysis.md

# Skip tree generation and copy to clipboard
folder2md /path/to/repo --no-tree --clipboard

# Verbose mode with a custom maximum file size (2097152 bytes = 2 MiB)
folder2md /path/to/repo --verbose --max-file-size 2097152
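
For scripted use, the invocations above can also be driven from Python. The following is a minimal sketch, not part of the package: `build_folder2md_command` and `run_folder2md` are hypothetical helper names, only the flags shown above are used, and the `folder2md` executable is assumed to be on the PATH.

```python
import subprocess
from typing import List, Optional


def build_folder2md_command(
    path: str,
    output: Optional[str] = None,
    no_tree: bool = False,
    clipboard: bool = False,
    verbose: bool = False,
    max_file_size: Optional[int] = None,
) -> List[str]:
    """Assemble a folder2md command line from the flags shown in Basic Usage."""
    cmd = ["folder2md", path]
    if output is not None:
        cmd += ["--output", output]
    if no_tree:
        cmd.append("--no-tree")
    if clipboard:
        cmd.append("--clipboard")
    if verbose:
        cmd.append("--verbose")
    if max_file_size is not None:
        # The size is passed as a plain byte count, as in the examples above.
        cmd += ["--max-file-size", str(max_file_size)]
    return cmd


def run_folder2md(path: str, **options) -> int:
    """Run the command and return its exit code (assumes folder2md is installed)."""
    return subprocess.run(build_folder2md_command(path, **options), check=False).returncode


if __name__ == "__main__":
    print(build_folder2md_command("/path/to/repo", output="analysis.md"))
```

Keeping command assembly separate from execution makes the argument list easy to inspect or log before anything is run.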

📖 Documentation

🛠️ Development

Setup

# Clone the repository
git clone https://github.com/AI-driven-Optical-Biology-Laboratory/folder2md4llms.git
cd folder2md4llms

# Create virtual environment and install dependencies
uv venv
uv sync --dev

# Install pre-commit hooks
make install-hooks

Development Commands

# Format code
make format

# Run linting
make lint

# Run tests
make test

# Run tests with coverage
make test-cov

# Run all checks
make check

# Run pre-commit on all files
make pre-commit

Testing

# Run all tests
uv run pytest

# Run specific test file
uv run pytest tests/test_cli.py

# Run with coverage
uv run pytest --cov=folder2md4llms --cov-report=term-missing

🎯 Use Cases

  • AI/ML Projects - Prepare codebases for LLM analysis and code review
  • Documentation - Generate comprehensive project overviews
  • Code Analysis - Create structured summaries for large repositories
  • Knowledge Management - Convert project structures into searchable markdown
  • Team Onboarding - Provide new team members with project overviews

🔧 Configuration

Basic Configuration

Create a folder2md.yaml file in your repository:

# Output settings
output_format: markdown
include_tree: true
include_stats: true

# Processing options
convert_docs: true
describe_binaries: true
max_file_size: 1048576  # 1 MiB (1024 * 1024 bytes)

# Document conversion
pdf_max_pages: 50
xlsx_max_sheets: 10
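
The sample `max_file_size` of 1048576 is 1024 × 1024 bytes, and the `--max-file-size 2097152` used in Basic Usage is twice that. A small sketch for deriving such byte counts, where `mib_to_bytes` and `bytes_to_mib` are hypothetical helper names rather than anything the tool provides:

```python
BYTES_PER_MIB = 1024 * 1024


def mib_to_bytes(mib: float) -> int:
    """Convert a size in MiB to a whole number of bytes."""
    return int(mib * BYTES_PER_MIB)


def bytes_to_mib(size: int) -> float:
    """Convert a byte count to MiB."""
    return size / BYTES_PER_MIB


if __name__ == "__main__":
    for mib in (1, 2, 5):
        print(f"{mib} MiB = {mib_to_bytes(mib)} bytes")
```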

Ignore Patterns

Create a .folder2md_ignore file:

# Version control
.git/
.svn/

# Build artifacts
__pycache__/
*.pyc
node_modules/
build/
dist/

# IDE files
.vscode/
.idea/

# Custom patterns
*.secret
temp/
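
To illustrate how patterns like these can be read, here is a minimal sketch built on Python's `fnmatch` module. It is an assumption about gitignore-style semantics (a trailing `/` means a directory, anything else is a glob on a path component), not the tool's actual matcher; `parse_ignore_lines` and `is_ignored` are hypothetical names.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath
from typing import Iterable, List


def parse_ignore_lines(text: str) -> List[str]:
    """Return the patterns in ignore-file text, skipping blanks and '#' comments."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(line)
    return patterns


def is_ignored(rel_path: str, patterns: Iterable[str]) -> bool:
    """Return True if a relative POSIX-style file path matches any pattern."""
    parts = PurePosixPath(rel_path).parts
    for pattern in patterns:
        if pattern.endswith("/"):
            # Directory pattern: match any parent directory of the file.
            name = pattern.rstrip("/")
            if any(fnmatchcase(part, name) for part in parts[:-1]):
                return True
        elif any(fnmatchcase(part, pattern) for part in parts):
            # Glob pattern: match against each path component.
            return True
    return False


if __name__ == "__main__":
    rules = parse_ignore_lines(".git/\n*.pyc\nnode_modules/\n*.secret\n")
    for path in ("src/app.py", ".git/config", "keys/api.secret"):
        print(path, is_ignored(path, rules))
```

The sketch uses `fnmatchcase` so that matching stays case-sensitive on every platform.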

📊 Output Format

The generated markdown includes:

  1. 📑 Table of Contents - Navigation links to all sections
  2. 📁 Folder Structure - ASCII tree representation
  3. 📊 Repository Statistics - File counts, sizes, and language breakdown
  4. 📄 Source Code - Syntax-highlighted code blocks
  5. 📋 Documents - Converted document content
  6. 🔧 Binary Files & Assets - Descriptions of non-text files
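
As an illustration of the folder structure section, the sketch below renders an ASCII tree from a list of relative paths. The exact layout folder2md4llms produces is not shown in this README, so the connector characters and the `build_tree` and `render_tree` names are assumptions made for the example.

```python
from typing import Dict, Iterable, List


def build_tree(paths: Iterable[str]) -> Dict[str, dict]:
    """Fold '/'-separated relative paths into a nested dict of names."""
    root: Dict[str, dict] = {}
    for path in paths:
        node = root
        for part in path.split("/"):
            node = node.setdefault(part, {})
    return root


def render_tree(tree: Dict[str, dict], prefix: str = "") -> List[str]:
    """Return one line per entry, using plain ASCII connectors."""
    lines: List[str] = []
    names = sorted(tree)
    for index, name in enumerate(names):
        last = index == len(names) - 1
        lines.append(prefix + ("`-- " if last else "|-- ") + name)
        # Children of the last entry no longer need a vertical guide line.
        child_prefix = prefix + ("    " if last else "|   ")
        lines.extend(render_tree(tree[name], child_prefix))
    return lines


if __name__ == "__main__":
    files = ["README.md", "src/cli.py", "src/utils/io.py"]
    print("\n".join(render_tree(build_tree(files))))
```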

🔄 Improvements over gptrepo

  • Enhanced Output: Markdown formatting with table of contents and syntax highlighting
  • Document Conversion: PDF, DOCX, XLSX files automatically converted
  • Binary Analysis: Intelligent descriptions for images, archives, and executables
  • Advanced Filtering: Glob patterns and hierarchical ignore rules
  • Configuration: YAML configuration files and extensive CLI options
  • Performance: Multi-threaded processing with progress tracking
  • Extensibility: Modular architecture for easy extension

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add some amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

👤 Author

Ricardo Henriques - @ricardohenriques

Email: ricardo@henriqueslab.org
