folder2md4llms

Python 3.8+ | License: MIT

folder2md4llms concatenates the contents of a folder into markdown for LLM consumption. It is inspired by gpt-repository-loader and extends it with document conversion, binary file descriptions, and configurable filtering.

Features

  • Markdown-first output - Professional formatting with table of contents, syntax highlighting, and structured sections
  • Folder structure visualization - ASCII tree representation of directory structure
  • Repository statistics - File counts, sizes, and language breakdown
  • Document conversion - PDF, DOCX, XLSX files converted to text/markdown
  • Binary file analysis - Intelligent descriptions for images, archives, and executables
  • Highly configurable - YAML configuration files and comprehensive CLI options
  • Fast and efficient - Multi-threaded processing with progress tracking
  • Smart filtering - Advanced ignore patterns with glob support and template generation
  • Multiple output formats - Markdown, HTML, and plain text support
  • Cross-platform compatibility - Works seamlessly on Windows, macOS, and Linux

Quick Start

Installation

# Install using uv (recommended)
uv add folder2md4llms

# Or using pip
pip install folder2md4llms

Basic Usage

# Process current directory
folder2md .

# Process specific directory with custom output
folder2md /path/to/repo --output analysis.md

# Skip tree generation and copy to clipboard
folder2md /path/to/repo --no-tree --clipboard

# Verbose mode with custom settings
folder2md /path/to/repo --verbose --max-file-size 2097152

# Generate ignore template file
folder2md --init-ignore
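The value passed to --max-file-size in the verbose example looks like a byte count: 2097152 is 2 × 1024 × 1024, and the configuration section annotates 1048576 as 1MB. Assuming the flag does take bytes, a small helper can compute the number from a readable size. The function name `to_bytes` and the unit table are hypothetical and not part of folder2md4llms.

```python
# Hypothetical helper, not part of folder2md4llms: converts a readable size
# into the byte count that the --max-file-size examples appear to use.
UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3}


def to_bytes(size: str) -> int:
    """Convert strings such as '2MB' or '512 KB' to a number of bytes."""
    text = size.strip().upper()
    # Check longer suffixes first so 'MB' is not mistaken for 'B'.
    for suffix in sorted(UNITS, key=len, reverse=True):
        if text.endswith(suffix):
            number = text[: -len(suffix)].strip()
            return int(float(number) * UNITS[suffix])
    return int(text)  # no suffix: treat the input as a byte count


print(to_bytes("2MB"))  # 2097152
print(to_bytes("1MB"))  # 1048576
```

The printed number can then be passed on the command line, for example `folder2md /path/to/repo --max-file-size 2097152`.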

Documentation

Development

Setup

# Clone the repository
git clone https://github.com/AI-driven-Optical-Biology-Laboratory/folder2md4llms.git
cd folder2md4llms

# Create virtual environment and install dependencies
uv venv
uv sync --dev

# Install pre-commit hooks
make install-hooks

Development Commands

# Format code
make format

# Run linting
make lint

# Run tests
make test

# Run tests with coverage
make test-cov

# Run all checks
make check

# Run pre-commit on all files
make pre-commit

Testing

# Run all tests
uv run pytest

# Run specific test file
uv run pytest tests/test_cli.py

# Run with coverage
uv run pytest --cov=folder2md4llms --cov-report=term-missing

Use Cases

  • AI/ML Projects - Prepare codebases for LLM analysis and code review
  • Documentation - Generate comprehensive project overviews
  • Code Analysis - Create structured summaries for large repositories
  • Knowledge Management - Convert project structures into searchable markdown
  • Team Onboarding - Provide new team members with project overviews

Configuration

Basic Configuration

Create a folder2md.yaml file in your repository:

# Output settings
output_format: markdown
include_tree: true
include_stats: true

# Processing options
convert_docs: true
describe_binaries: true
max_file_size: 1048576  # 1MB

# Document conversion
pdf_max_pages: 50
xlsx_max_sheets: 10
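To inspect such a file from a script without installing a YAML library, the flat `key: value` layout shown above can be read with a few lines of standard-library Python. This is only a sketch: `load_flat_config` is a hypothetical helper, it handles neither nesting, lists, nor quoted strings, and it says nothing about how folder2md4llms itself parses the file. A real YAML parser is the safer choice for anything beyond a quick check.

```python
from typing import Dict, Union

Value = Union[bool, int, str]


def load_flat_config(text: str) -> Dict[str, Value]:
    """Parse flat `key: value` lines, ignoring blank lines and # comments."""
    config: Dict[str, Value] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments, then whitespace
        if not line:
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if value.lower() in ("true", "false"):
            config[key] = value.lower() == "true"  # booleans
        elif value.isdigit():
            config[key] = int(value)  # non-negative integers
        else:
            config[key] = value  # everything else stays a string
    return config


SAMPLE = """
# Processing options
convert_docs: true
max_file_size: 1048576  # 1MB
output_format: markdown
"""

settings = load_flat_config(SAMPLE)
print(settings)
```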

Ignore Patterns

Quick Start with Template

Generate a comprehensive ignore template:

folder2md --init-ignore

This creates a .folder2md_ignore file with common patterns for:

  • Version control systems (git, svn, etc.)
  • Build artifacts and dependencies
  • IDE and editor files
  • OS-generated files
  • Security-sensitive files
  • Large media files
  • Custom patterns section

Manual Creation

You can also create a .folder2md_ignore file manually:

# Version control
.git/
.svn/

# Build artifacts
__pycache__/
*.pyc
node_modules/
build/
dist/

# IDE files
.vscode/
.idea/

# Custom patterns
*.secret
temp/
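To get a feel for which paths patterns like these would exclude, the following sketch applies gitignore-like rules using Python's `fnmatch` module. It rests on assumptions: a pattern ending in `/` is treated as a directory name that may appear anywhere in the path, and any other pattern is matched against the file name or the full relative path. The function `is_ignored` is hypothetical, and the tool's actual matching rules may differ.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath
from typing import Iterable


def is_ignored(path: str, patterns: Iterable[str]) -> bool:
    """Return True if a relative POSIX-style file path matches any pattern."""
    parts = PurePosixPath(path).parts
    for pattern in patterns:
        if pattern.endswith("/"):
            # Directory pattern: compare against every directory component.
            name = pattern.rstrip("/")
            if any(fnmatchcase(part, name) for part in parts[:-1]):
                return True
        elif fnmatchcase(parts[-1], pattern) or fnmatchcase(path, pattern):
            # File pattern: compare against the file name or the whole path.
            return True
    return False


PATTERNS = [".git/", "__pycache__/", "*.pyc", "node_modules/", "*.secret", "temp/"]

for candidate in ["src/main.py", "src/__pycache__/main.pyc", "config/api.secret"]:
    print(candidate, is_ignored(candidate, PATTERNS))
```

`fnmatchcase` is used instead of `fnmatch` so that matching stays case-sensitive on every platform.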

Output Format

The generated markdown includes:

  1. Table of Contents - Navigation links to all sections
  2. Folder Structure - ASCII tree representation
  3. Repository Statistics - File counts, sizes, and language breakdown
  4. Source Code - Syntax-highlighted code blocks
  5. Documents - Converted document content
  6. Binary Files & Assets - Descriptions of non-text files
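As an illustration of the folder structure section, the sketch below renders a list of relative paths as an ASCII tree. It is a generic approach written for this example, not the tool's own renderer; `build_tree` and `render_tree` are hypothetical names, and the exact characters folder2md4llms uses for branches may differ.

```python
from typing import Dict, Iterable, List


def build_tree(paths: Iterable[str]) -> Dict[str, dict]:
    """Turn 'a/b/c.py'-style paths into a nested dict keyed by name."""
    root: Dict[str, dict] = {}
    for path in paths:
        node = root
        for part in path.split("/"):
            node = node.setdefault(part, {})
    return root


def render_tree(node: Dict[str, dict], prefix: str = "") -> List[str]:
    """Render the nested dict as a list of ASCII tree lines."""
    lines: List[str] = []
    names = sorted(node)
    for index, name in enumerate(names):
        last = index == len(names) - 1
        lines.append(prefix + ("`-- " if last else "|-- ") + name)
        # Children of the last entry need no vertical guide line.
        child_prefix = prefix + ("    " if last else "|   ")
        lines.extend(render_tree(node[name], child_prefix))
    return lines


tree = build_tree(["README.md", "src/cli.py", "src/utils/tree.py"])
print("\n".join(render_tree(tree)))
```

Working from a list of paths rather than walking the disk keeps the sketch easy to test; in practice the list would come from the files that survive the ignore rules.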

Improvements over gptrepo

  • Enhanced Output: Markdown formatting with table of contents and syntax highlighting
  • Document Conversion: PDF, DOCX, XLSX files automatically converted
  • Binary Analysis: Intelligent descriptions for images, archives, and executables
  • Advanced Filtering: Glob patterns and hierarchical ignore rules with template generation
  • Configuration: YAML configuration files and extensive CLI options
  • Performance: Multi-threaded processing with progress tracking
  • Cross-platform: Native support for Windows, macOS, and Linux
  • Extensibility: Modular architecture for easy extension

Cross-Platform Support

folder2md4llms works seamlessly across different operating systems:

  • Windows: Full support with automatic dependency management
  • macOS: Optimized for Apple Silicon and Intel processors
  • Linux: Compatible with all major distributions

Platform-Specific Features

  • File Type Detection: Automatic fallback when python-magic is unavailable
  • Path Handling: Consistent behavior across different file systems
  • Dependencies: Platform-specific package management (python-magic vs python-magic-bin)
  • Error Handling: Robust handling of platform-specific file system quirks
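The file type fallback mentioned above can be pictured as an optional import. The sketch below tries python-magic first and falls back to the standard-library `mimetypes` module, which guesses from the file extension only. It shows the general pattern under that assumption; `detect_mime` is a hypothetical function and the project's real fallback logic may be different.

```python
import mimetypes

try:
    import magic  # provided by python-magic (or python-magic-bin on Windows)
except (ImportError, OSError):
    magic = None  # library or libmagic missing: use the fallback below


def detect_mime(path: str) -> str:
    """Best-effort MIME type: content-based if possible, else by extension."""
    if magic is not None:
        try:
            return magic.from_file(path, mime=True)
        except Exception:
            pass  # unreadable file or incompatible module: fall through
    guessed, _encoding = mimetypes.guess_type(path)
    return guessed or "application/octet-stream"


print(detect_mime("report.pdf"))
```

Catching a broad `Exception` around the content-based call is a deliberate simplification here, so that one unreadable file cannot abort a whole run.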

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add some amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Author

Ricardo Henriques - @ricardohenriques

Email: ricardo@henriqueslab.org
