folder2md4llms

Requires Python 3.8+. License: MIT.

folder2md4llms concatenates the contents of a folder into a single markdown document for LLM consumption. It is inspired by gpt-repository-loader and adds document conversion, binary file descriptions, and configurable filtering.

Features

  • Markdown-first output - Professional formatting with table of contents, syntax highlighting, and structured sections
  • Folder structure visualization - ASCII tree representation of directory structure
  • Repository statistics - File counts, sizes, and language breakdown
  • Document conversion - PDF, DOCX, XLSX files converted to text/markdown
  • Binary file analysis - Intelligent descriptions for images, archives, and executables
  • Highly configurable - YAML configuration files and comprehensive CLI options
  • Fast and efficient - Multi-threaded processing with progress tracking
  • Smart filtering - Advanced ignore patterns with glob support and template generation
  • Multiple output formats - Markdown, HTML, and plain text support
  • Cross-platform compatibility - Works on Windows, macOS, and Linux

Quick Start

Installation

# Install using uv (recommended)
uv add folder2md4llms

# Or using pip
pip install folder2md4llms

Basic Usage

# Process current directory
folder2md .

# Process specific directory with custom output
folder2md /path/to/repo --output analysis.md

# Skip tree generation and copy to clipboard
folder2md /path/to/repo --no-tree --clipboard

# Verbose mode with custom settings
folder2md /path/to/repo --verbose --max-file-size 2097152

# Generate ignore template file
folder2md --init-ignore
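
When the tool is driven from a script, the same invocations can be assembled programmatically. The sketch below uses only the flags shown above; `build_folder2md_command` is a hypothetical helper, and it assumes `--max-file-size` takes a byte count (2097152 in the example above equals 2 * 1024 * 1024).

```python
import shlex
from typing import List, Optional


def build_folder2md_command(
    path: str,
    output: Optional[str] = None,
    max_file_size_mib: Optional[int] = None,
    verbose: bool = False,
) -> List[str]:
    """Assemble an argument list from the flags shown in Basic Usage."""
    cmd = ["folder2md", path]
    if output is not None:
        cmd += ["--output", output]
    if verbose:
        cmd.append("--verbose")
    if max_file_size_mib is not None:
        # Assumption: the flag expects bytes, so convert MiB to bytes.
        cmd += ["--max-file-size", str(max_file_size_mib * 1024 * 1024)]
    return cmd


if __name__ == "__main__":
    command = build_folder2md_command(
        "/path/to/repo", output="analysis.md", max_file_size_mib=2, verbose=True
    )
    # Print a shell-safe version of the command for inspection.
    print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` once `folder2md` is installed and on the PATH.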

Documentation

Development

Setup

# Clone the repository
git clone https://github.com/AI-driven-Optical-Biology-Laboratory/folder2md4llms.git
cd folder2md4llms

# Create virtual environment and install dependencies
uv venv
uv sync --dev

# Install pre-commit hooks
make install-hooks

Development Commands

# Format code
make format

# Run linting
make lint

# Run tests
make test

# Run tests with coverage
make test-cov

# Run all checks
make check

# Run pre-commit on all files
make pre-commit

Testing

# Run all tests
uv run pytest

# Run specific test file
uv run pytest tests/test_cli.py

# Run with coverage
uv run pytest --cov=folder2md4llms --cov-report=term-missing

Use Cases

  • AI/ML Projects - Prepare codebases for LLM analysis and code review
  • Documentation - Generate comprehensive project overviews
  • Code Analysis - Create structured summaries for large repositories
  • Knowledge Management - Convert project structures into searchable markdown
  • Team Onboarding - Provide new team members with project overviews

Configuration

Basic Configuration

Create a folder2md.yaml file in your repository:

# Output settings
output_format: markdown
include_tree: true
include_stats: true

# Processing options
convert_docs: true
describe_binaries: true
max_file_size: 1048576  # 1 MiB (1024 * 1024)

# Document conversion
pdf_max_pages: 50
xlsx_max_sheets: 10
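
To sanity-check the values in such a file before a run, a flat config can be read with a few lines of Python. This is only a sketch: `parse_flat_config` is a hypothetical helper that handles flat `key: value` lines, not the tool's own loader, and a real YAML library is the better choice for anything nested.

```python
from typing import Dict, Union

Value = Union[bool, int, str]


def parse_flat_config(text: str) -> Dict[str, Value]:
    """Parse flat `key: value` lines. Not a general YAML parser."""
    config: Dict[str, Value] = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue
        key, _, raw_value = line.partition(":")
        raw_value = raw_value.strip()
        if raw_value in ("true", "false"):
            config[key.strip()] = raw_value == "true"
        elif raw_value.isdigit():
            config[key.strip()] = int(raw_value)
        else:
            config[key.strip()] = raw_value
    return config


if __name__ == "__main__":
    sample = """
    output_format: markdown
    include_tree: true
    max_file_size: 1048576  # 1MB
    pdf_max_pages: 50
    """
    settings = parse_flat_config(sample)
    # Report the size limit in MiB for readability.
    print(settings["output_format"], settings["max_file_size"] / (1024 * 1024))
```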

Ignore Patterns

Quick Start with Template

Generate a comprehensive ignore template:

folder2md --init-ignore

This creates a .folder2md_ignore file with common patterns for:

  • Version control systems (git, svn, etc.)
  • Build artifacts and dependencies
  • IDE and editor files
  • OS-generated files
  • Security-sensitive files
  • Large media files
  • Custom patterns section

Manual Creation

You can also create a .folder2md_ignore file manually:

# Version control
.git/
.svn/

# Build artifacts
__pycache__/
*.pyc
node_modules/
build/
dist/

# IDE files
.vscode/
.idea/

# Custom patterns
*.secret
temp/
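
To get a feel for how patterns like these select files, the sketch below matches paths with Python's standard `fnmatch` module. It is an approximation under stated assumptions: `load_patterns` and `is_ignored` are hypothetical helpers, a trailing `/` is treated as "any directory with this name", and other patterns are compared against the file name only. The tool's actual matching rules may differ.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath
from typing import Iterable, List


def load_patterns(text: str) -> List[str]:
    """Keep non-empty lines that are not comments."""
    lines = (line.strip() for line in text.splitlines())
    return [line for line in lines if line and not line.startswith("#")]


def is_ignored(path: str, patterns: Iterable[str]) -> bool:
    """Return True if a relative POSIX-style file path matches any pattern."""
    parts = PurePosixPath(path).parts
    if not parts:
        return False
    for pattern in patterns:
        if pattern.endswith("/"):
            # Directory pattern: ignore files below a directory with this name.
            if pattern.rstrip("/") in parts[:-1]:
                return True
        elif fnmatch(parts[-1], pattern):
            # File pattern: compare against the file name only.
            return True
    return False


if __name__ == "__main__":
    patterns = load_patterns("# Build artifacts\n__pycache__/\n*.pyc\n\n*.secret\ntemp/\n")
    for candidate in ["src/main.py", "src/__pycache__/main.pyc", "keys/api.secret"]:
        print(candidate, is_ignored(candidate, patterns))
```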

Output Format

The generated markdown includes:

  1. Table of Contents - Navigation links to all sections
  2. Folder Structure - ASCII tree representation
  3. Repository Statistics - File counts, sizes, and language breakdown
  4. Source Code - Syntax-highlighted code blocks
  5. Documents - Converted document content
  6. Binary Files & Assets - Descriptions of non-text files
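
As an illustration of the folder structure section, the sketch below renders an ASCII tree from a list of relative paths. `build_tree` and `render_tree` are hypothetical names, and the connector style is an assumption; the tool's real output may look different.

```python
from typing import Dict, Iterable, List


def build_tree(paths: Iterable[str]) -> Dict[str, dict]:
    """Turn 'a/b/c' style paths into a nested dict keyed by path component."""
    root: Dict[str, dict] = {}
    for path in paths:
        node = root
        for part in path.split("/"):
            node = node.setdefault(part, {})
    return root


def render_tree(node: Dict[str, dict], prefix: str = "") -> List[str]:
    """Render the nested dict as lines using plain ASCII connectors."""
    lines: List[str] = []
    names = sorted(node)
    for index, name in enumerate(names):
        last = index == len(names) - 1
        lines.append(prefix + ("`-- " if last else "|-- ") + name)
        # Children of the last entry need no vertical bar in their prefix.
        lines.extend(render_tree(node[name], prefix + ("    " if last else "|   ")))
    return lines


if __name__ == "__main__":
    files = ["README.md", "src/cli.py", "src/utils.py"]
    print("\n".join(render_tree(build_tree(files))))
```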

Improvements over gptrepo

  • Enhanced Output: Markdown formatting with table of contents and syntax highlighting
  • Document Conversion: PDF, DOCX, XLSX files automatically converted
  • Binary Analysis: Intelligent descriptions for images, archives, and executables
  • Advanced Filtering: Glob patterns and hierarchical ignore rules with template generation
  • Configuration: YAML configuration files and extensive CLI options
  • Performance: Multi-threaded processing with progress tracking
  • Cross-platform: Native support for Windows, macOS, and Linux
  • Extensibility: Modular architecture for easy extension

Cross-Platform Support

folder2md4llms runs on the following operating systems:

  • Windows: Full support with automatic dependency management
  • macOS: Optimized for Apple Silicon and Intel processors
  • Linux: Compatible with all major distributions

Platform-Specific Features

  • File Type Detection: Automatic fallback when python-magic is unavailable
  • Path Handling: Consistent behavior across different file systems
  • Dependencies: Platform-specific package management (python-magic vs python-magic-bin)
  • Error Handling: Robust handling of platform-specific file system quirks
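
The file type detection fallback can be pictured roughly as follows. This is a sketch, not the project's code: it assumes the `magic` module comes from python-magic (whose `from_file(path, mime=True)` returns a MIME string) and falls back to the standard library's extension-based `mimetypes` when that import or call fails. `guess_mime_from_name` and `detect_mime` are hypothetical names.

```python
import mimetypes

try:
    import magic  # python-magic, or python-magic-bin on Windows
except ImportError:
    magic = None  # fall back to extension-based guessing


def guess_mime_from_name(path: str) -> str:
    """Guess a MIME type from the file extension alone."""
    mime, _encoding = mimetypes.guess_type(path)
    return mime or "application/octet-stream"


def detect_mime(path: str) -> str:
    """Prefer content-based detection, fall back to the file name."""
    if magic is not None:
        try:
            return magic.from_file(path, mime=True)
        except Exception:
            # Missing file, missing libmagic, or a different `magic` package.
            pass
    return guess_mime_from_name(path)


if __name__ == "__main__":
    print(detect_mime("report.pdf"))
```

Content-based detection is more reliable for files with missing or misleading extensions, which is why it is tried first here.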

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add some amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Author

Ricardo Henriques - @ricardohenriques

Email: ricardo@henriqueslab.org
