Transform directories or Git repositories into comprehensive Markdown documentation with intelligent file filtering and structure preservation.

Project2MD

Overview

project2md is a command-line tool that creates a single Markdown file containing the complete structure and content of a Git repository. It's designed to prepare repository content for Large Language Model (LLM) analysis while maintaining project structure and context.

Features

Core Features (v1.0.0)

  • Clone Git repositories using SSH authentication
  • Process existing local repositories
  • Intelligent file filtering with glob patterns
  • Configurable directory depth limits
  • Text file content extraction
  • Repository structure visualization
  • Project statistics
  • Progress tracking
  • Branch information

Planned Features

  • Additional output formats (JSON)
  • Enhanced tree visualization
  • Extended Git metadata support

Installation

pip install project2md

Usage

Basic Usage

# Process a remote repository
project2md --repo=https://github.com/user/repo --output=summary.md

# Process current directory
project2md --output=summary.md

# Use specific configuration
project2md --repo=https://github.com/user/repo --config=.project2md.yml

Command Line Arguments

--repo        Repository URL (optional, defaults to current directory)
--target      Clone target directory (optional, defaults to current directory)
--output      Output file path (optional, defaults to project_summary.md)
--config      Configuration file path (optional, defaults to .project2md.yml)
--include     Include patterns (can be specified multiple times)
--exclude     Exclude patterns (can be specified multiple times)
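
As a rough illustration of how these options fit together, the sketch below declares them with Python's standard argparse module. It is not project2md's actual parser: the build_parser name and the help strings are assumptions, and only the flag names and defaults come from the list above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags documented above."""
    parser = argparse.ArgumentParser(prog="project2md")
    parser.add_argument("--repo", default=None, help="Repository URL")
    parser.add_argument("--target", default=".", help="Clone target directory")
    parser.add_argument("--output", default="project_summary.md", help="Output file path")
    parser.add_argument("--config", default=".project2md.yml", help="Configuration file path")
    # action="append" lets a flag be repeated, collecting the values in a list.
    parser.add_argument("--include", action="append", default=[], help="Include pattern")
    parser.add_argument("--exclude", action="append", default=[], help="Exclude pattern")
    return parser


# Parse an explicit argument list so the example runs without a real command line.
args = build_parser().parse_args(
    ["--include", "*.py", "--include", "*.md", "--exclude", "dist/*"]
)
print(args.include, args.exclude, args.output)
```

Repeating --include or --exclude adds one pattern per occurrence, which matches the "can be specified multiple times" note above.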

Configuration File (.project2md.yml)

general:
  max_depth: 10
  max_file_size: 1MB
  stats_in_output: true
  collapse_empty_dirs: true
output:
  format: markdown
  stats: true
include:
  files:
    - "*.md"
    - "*.py"
    - "src/**/*.js"
  dirs:
    - "src/"
    - "lib/"
exclude:
  files:
    - "*.test.js"
    - "*.spec.ts"
    - "**/node_modules/**"
  dirs:
    - "dist/"
    - "build/"
    - "coverage/"
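
To make the include and exclude lists more concrete, here is a minimal sketch of pattern filtering built on Python's standard fnmatch module. It makes two assumptions that this README does not confirm: exclude patterns take precedence over include patterns, and patterns follow fnmatch semantics (where `*` also matches `/`). The is_selected helper is hypothetical, and the tool's real matching rules may differ.

```python
from fnmatch import fnmatchcase

# File patterns copied from the example configuration above.
INCLUDE_FILES = ["*.md", "*.py", "src/**/*.js"]
EXCLUDE_FILES = ["*.test.js", "*.spec.ts", "**/node_modules/**"]


def is_selected(path: str) -> bool:
    """Return True if a POSIX-style relative path passes the filters.

    Assumption: a path matching any exclude pattern is dropped, even if
    it also matches an include pattern.
    """
    if any(fnmatchcase(path, pattern) for pattern in EXCLUDE_FILES):
        return False
    return any(fnmatchcase(path, pattern) for pattern in INCLUDE_FILES)


for candidate in [
    "README.md",
    "src/app/main.js",
    "src/app/main.test.js",
    "lib/node_modules/x/index.py",
]:
    print(candidate, is_selected(candidate))
```

Under these assumptions the first two paths are kept and the last two are dropped.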

Output Format

The generated Markdown file follows this structure:

# Project Overview

project README.md content:

```markdown
{README.md content}
```

project file- and folder tree:

{project tree}

Project Statistics

{statistics if enabled}

File Contents

filepath repoRoot/file1

{file1 content}

filepath repoRoot/dir/file2

{file2 content}

Development

Setting Up Development Environment

  1. Install Poetry (if not already installed):

    curl -sSL https://install.python-poetry.org | python3 -
    
  2. Clone the repository:

    git clone https://github.com/itsatony/project2md.git
    cd project2md
    
  3. Install dependencies with Poetry:

    poetry install
    

Running Tests

The project uses pytest for testing. To run the tests:

# Run all tests
poetry run pytest

# Run tests with coverage report
poetry run pytest --cov=project2md

# Run tests verbosely
poetry run pytest -v

# Run specific test file
poetry run pytest tests/test_config.py

# Run tests matching specific pattern
poetry run pytest -k "test_config"

Test Structure

Tests are organized in the tests/ directory:

  • test_config.py: Configuration system tests
  • test_git.py: Git operations tests
  • test_walker.py: File system traversal tests
  • test_formatter.py: Output formatting tests
  • test_stats.py: Statistics collection tests

Project Structure

project2md/
├── __init__.py    # Package initialization
├── cli.py         # Command-line interface
├── config.py      # Configuration handling
├── git.py         # Git operations
├── walker.py      # File system traversal
├── formatter.py   # Output formatting
├── stats.py       # Statistics collection
└── utils.py       # Shared utilities

Component Responsibilities

CLI (cli.py)

  • Parse command-line arguments
  • Initialize configuration
  • Orchestrate overall process flow
  • Handle user interaction (progress bar)

Configuration (config.py)

  • Parse YAML configuration
  • Merge CLI arguments with config file
  • Validate configuration
  • Provide unified config interface
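
One plausible way to merge CLI arguments with a config file is to layer the sources in order of precedence. The sketch below assumes defaults are overridden by the file and the file by explicitly given CLI values. It uses flat keys for brevity, whereas the example .project2md.yml is nested. The merge_config function and DEFAULTS table are hypothetical and are not taken from config.py.

```python
from typing import Any

# Hypothetical built-in defaults; only "output" mirrors a documented default.
DEFAULTS: dict[str, Any] = {
    "output": "project_summary.md",
    "max_depth": 10,
    "include": [],
    "exclude": [],
}


def merge_config(file_config: dict[str, Any], cli_args: dict[str, Any]) -> dict[str, Any]:
    """Layer settings: defaults, then config file, then CLI values actually given."""
    merged = dict(DEFAULTS)
    merged.update(file_config)
    # Skip None and empty lists so an omitted flag does not erase a file setting.
    given = {key: value for key, value in cli_args.items() if value not in (None, [])}
    merged.update(given)
    return merged


merged = merge_config(
    {"max_depth": 3, "exclude": ["dist/"]},
    {"output": "summary.md", "exclude": [], "max_depth": None},
)
print(merged)
```

Here the CLI supplies only the output path, so max_depth and exclude keep the values from the file.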

Git Operations (git.py)

  • Clone repositories
  • Validate repository status
  • Extract branch information
  • Handle SSH authentication

File System Walker (walker.py)

  • Traverse directory structure
  • Apply include/exclude patterns
  • Handle file size limits
  • Manage directory depth
  • Detect binary files
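
The walker's responsibilities can be sketched with the standard library alone. The version below is illustrative and is not the code in walker.py. It assumes that depth is counted in directory levels below the root, that the size limit is given in bytes, and that a NUL byte in the first 8 KiB marks a file as binary (a common heuristic). The names looks_binary and walk_text_files are hypothetical.

```python
import os
import tempfile


def looks_binary(path: str, sample_size: int = 8192) -> bool:
    """Heuristic: treat a file as binary if its first bytes contain a NUL."""
    with open(path, "rb") as handle:
        return b"\0" in handle.read(sample_size)


def walk_text_files(root: str, max_depth: int, max_file_size: int):
    """Yield relative paths of small text files at most max_depth levels below root."""
    for current, dirs, files in os.walk(root):
        rel_dir = os.path.relpath(current, root)
        depth = 0 if rel_dir == "." else rel_dir.count(os.sep) + 1
        if depth >= max_depth:
            dirs[:] = []  # Pruning in place stops os.walk from descending further.
        for name in sorted(files):
            full = os.path.join(current, name)
            if os.path.getsize(full) <= max_file_size and not looks_binary(full):
                yield os.path.relpath(full, root)


# Build a throwaway tree: one text file, one binary file, one deeply nested file.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "a", "b"))
    samples = [
        ("top.txt", b"hello"),
        ("blob.bin", b"\0\1\2"),
        (os.path.join("a", "b", "deep.txt"), b"deep"),
    ]
    for rel, data in samples:
        with open(os.path.join(root, rel), "wb") as handle:
            handle.write(data)
    found = sorted(walk_text_files(root, max_depth=1, max_file_size=1024))

print(found)
```

With max_depth=1 the nested a/b directory is never visited, and blob.bin is skipped by the binary check, so only top.txt is reported.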

Formatter (formatter.py)

  • Generate Markdown output
  • Create directory tree visualization
  • Format statistics
  • Handle alternative output formats
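
A directory tree in the style of the Project Structure listing above can be produced in two steps: fold the relative paths into nested dictionaries, then render them recursively with box-drawing prefixes. This is a minimal sketch of that idea, not the formatter's actual implementation. The build_tree and render_tree names are hypothetical.

```python
def build_tree(paths):
    """Fold POSIX-style relative paths into nested dicts keyed by path component."""
    tree = {}
    for path in paths:
        node = tree
        for part in path.split("/"):
            node = node.setdefault(part, {})
    return tree


def render_tree(tree, prefix=""):
    """Return the tree as a list of lines with box-drawing connectors."""
    lines = []
    names = sorted(tree)
    for index, name in enumerate(names):
        last = index == len(names) - 1
        lines.append(prefix + ("└── " if last else "├── ") + name)
        # Children of the last entry are indented with spaces, others with a bar.
        lines.extend(render_tree(tree[name], prefix + ("    " if last else "│   ")))
    return lines


tree_lines = render_tree(
    build_tree(["cli.py", "tests/test_config.py", "tests/test_git.py"])
)
print("\n".join(tree_lines))
```

Entries are sorted alphabetically here. A real formatter might instead list directories before files.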

Statistics (stats.py)

  • Collect file and directory statistics
  • Calculate size metrics
  • Track file types
  • Generate statistical summaries
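
The kind of summary described here can be computed from a simple mapping of file paths to sizes. The sketch below is an assumption about what such statistics might contain (file count, total size, largest file, and a count per extension). The summarize function and its result keys are hypothetical and are not the stats.py API.

```python
from collections import Counter
from pathlib import PurePosixPath


def summarize(files: dict[str, int]) -> dict:
    """Summarize a mapping of relative path -> size in bytes."""
    # Group files by extension; files without one are counted under "(none)".
    types = Counter(PurePosixPath(path).suffix or "(none)" for path in files)
    return {
        "file_count": len(files),
        "total_bytes": sum(files.values()),
        "largest": max(files, key=files.get) if files else None,
        "by_type": dict(types),
    }


summary = summarize(
    {"README.md": 1200, "src/cli.py": 3400, "src/utils.py": 800, "LICENSE": 1100}
)
print(summary)
```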

Utilities (utils.py)

  • Shared helper functions
  • Error handling utilities
  • Progress tracking
  • Logging

Error Handling

The tool implements comprehensive error handling:

  • Clear error messages for configuration issues
  • Graceful handling of inaccessible files
  • Recovery from non-critical errors
  • Detailed logging in verbose mode

Contributing

Contributions are welcome! Please read our contributing guidelines before submitting pull requests.

License

MIT License - see LICENSE file for details.
