

Project2MD

Transform directories or Git repositories into comprehensive Markdown documentation with intelligent file filtering and structure preservation.

Overview

project2md is a command-line tool that creates a single file (Markdown by default, or JSON/YAML) containing the complete structure and content of a Git repository or local directory. It is designed to prepare repository content for Large Language Model (LLM) analysis while preserving the project's structure and context.

Features

Core Features (as of v1.3.5)

  • Signature extraction mode - Extract only function signatures and headers for high-level code overview
  • Multiple output formats (Markdown, JSON, YAML)
  • Clone Git repositories using SSH authentication
  • Process existing local repositories
  • Configuration file support (.project2md.yml)
  • Project initialization with default config
  • Intelligent file filtering with glob patterns
  • Configurable directory depth limits
  • Text file content extraction
  • Repository structure visualization
  • Project statistics
  • Progress tracking
  • Branch information
  • Smart defaults for common file patterns
  • Draft file exclusion (__*.md)
  • Gitignore integration
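The filtering rules above (glob patterns, draft __*.md exclusion, excluded directories) can be approximated with the standard library. This is a minimal sketch, not project2md's own walker: the pattern lists are a hypothetical subset of the defaults, is_selected is an illustrative name, and it assumes that exclusions win over inclusions and that a leading **/ may also match files at the repository root.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Hypothetical subset of the default patterns; not the tool's full lists.
INCLUDE_FILES = ["**/*.py", "**/*.js", "**/*.md"]
EXCLUDE_FILES = ["project_summary.md", ".project2md.yml", "**/__*.md", "**/.git/**"]
EXCLUDE_DIRS = {".git", "node_modules", "venv"}


def matches(path: str, pattern: str) -> bool:
    """Glob match where a leading '**/' may also match zero directories."""
    if fnmatchcase(path, pattern):
        return True
    return pattern.startswith("**/") and fnmatchcase(path, pattern[3:])


def is_selected(path: str) -> bool:
    """Decide whether a repository-relative POSIX path ends up in the output."""
    parents = PurePosixPath(path).parts[:-1]
    if any(part in EXCLUDE_DIRS for part in parents):
        return False  # anything under an excluded directory is dropped
    if any(matches(path, pattern) for pattern in EXCLUDE_FILES):
        return False  # assumption: exclusions take precedence over inclusions
    return any(matches(path, pattern) for pattern in INCLUDE_FILES)
```

For example, src/app.py and README.md are selected, while docs/__draft.md, node_modules/pkg/index.js and project_summary.md are rejected. Note that fnmatch's * also matches /, so this is looser than a strict glob implementation.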

Signature Extraction

The --signatures flag changes how files are processed:

  • Code files: Extracts function signatures, class definitions, and method signatures with line counts
  • Markdown files: Keeps only headers with section line counts
  • Config files: Shows only line count for YAML, JSON, TOML, INI, and other configuration files
  • Text files: Shows only line count for .txt, .log, .csv, and similar files
  • Empty code files: Shows "empty" instead of empty code blocks
  • Supported languages: Python, JavaScript, TypeScript, Java, C/C++, C#, Go, Rust, PHP, Ruby
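For Python files, output of this kind can be produced with the standard ast module. This sketch (Python 3.9+ for ast.unparse) is an illustration only: project2md's signature_processor.py may work differently, and the other listed languages need their own parsers. The count runs from the def/class line to the last line of the body, so a three-line function yields [lines:3]. Returning an empty list on a SyntaxError is an assumption about what "handle syntax errors gracefully" means.

```python
import ast
from typing import List, Union

FuncOrClass = Union[ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef]


def _signature(node: FuncOrClass) -> str:
    """Rebuild a one-line header for a def/class node."""
    if isinstance(node, ast.ClassDef):
        bases = ", ".join(ast.unparse(b) for b in node.bases)
        return f"class {node.name}({bases}):" if bases else f"class {node.name}:"
    prefix = "async def" if isinstance(node, ast.AsyncFunctionDef) else "def"
    returns = f" -> {ast.unparse(node.returns)}" if node.returns else ""
    return f"{prefix} {node.name}({ast.unparse(node.args)}){returns}:"


def python_signatures(source: str) -> List[str]:
    """List functions, classes and methods as 'header [lines:N]' strings."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return []  # assumption: unparsable files simply yield no signatures
    out: List[str] = []

    def visit(body: list, indent: str) -> None:
        for node in body:
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
                # lineno is the 'def'/'class' line; decorator lines are not counted
                n_lines = node.end_lineno - node.lineno + 1
                out.append(f"{indent}{_signature(node)} [lines:{n_lines}]")
                if isinstance(node, ast.ClassDef):
                    visit(node.body, indent + "    ")  # methods, indented

    visit(tree.body, "")
    return out
```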

Example output with --signatures:

def add_numbers(a, b): [lines:3]
class Calculator: [lines:15]
async def process_async(items: List[str]) -> bool: [lines:8]

Config and text files in signatures mode:

[lines:25]

Empty code files:

empty
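The per-file decision in signatures mode can be summarised as a small dispatch on the file extension. This is a sketch under stated assumptions: the extension sets and the render_signature_block name are hypothetical, and the extract argument stands in for whatever language-specific extractor is used.

```python
from pathlib import PurePath
from typing import Callable, List

# Illustrative extension sets; the documentation says "and other" files, so real lists are longer.
CONFIG_EXTS = {".yml", ".yaml", ".json", ".toml", ".ini", ".cfg"}
TEXT_EXTS = {".txt", ".log", ".csv"}


def render_signature_block(path: str, content: str,
                           extract: Callable[[str], List[str]]) -> str:
    """Pick the signatures-mode representation of a single file."""
    suffix = PurePath(path).suffix.lower()
    if suffix in CONFIG_EXTS or suffix in TEXT_EXTS:
        return f"[lines:{len(content.splitlines())}]"  # line count only
    if not content.strip():
        return "empty"  # instead of an empty code block
    return "\n".join(extract(content))  # language-specific signature extractor
```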

Planned Features

  • Enhanced tree visualization
  • Extended Git metadata support

Installation

pip install project2md

Usage

Initialization

# Initialize project with default configuration
project2md init

# Initialize in specific directory
project2md init --root-dir /path/to/project

# Force overwrite existing config
project2md init --force

Processing Repositories

# Process a remote repository
project2md process --repo=https://github.com/user/repo --output=summary.md

# Process current directory
project2md process --output=summary.md

# Extract only signatures for a high-level overview
project2md process --signatures --output=signatures.md

# Use specific configuration
project2md process --repo=https://github.com/user/repo --config=.project2md.yml

Command Line Arguments

Commands

init        Initialize project with default configuration
process     Process a repository or directory
explicit    Generate explicit configuration file
version     Show version information

Init Command Options

--root-dir  Root directory for initialization (defaults to current directory)
--force     Overwrite existing config file

Process Command Options

--repo        Repository URL (optional, defaults to current directory)
--target      Clone target directory (optional, defaults to current directory)
--output      Output file path (optional, defaults to project_summary.md)
--config      Configuration file path (optional, defaults to .project2md.yml)
--include     Include patterns (can be specified multiple times)
--exclude     Exclude patterns (can be specified multiple times)
--branch      Specific branch to process (defaults to 'main')
--format      Output format: markdown, json, yaml (default: markdown)
--signatures  Extract only function signatures and headers

Explicit Command Options

--directory   Directory to analyze (defaults to current directory)
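To make the command surface above concrete, here it is expressed with Python's standard argparse module. This is an illustrative sketch, not project2md's actual cli.py (which may use a different CLI library); only the commands, flags and defaults listed above are modelled.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Model the documented commands and options with argparse."""
    parser = argparse.ArgumentParser(prog="project2md")
    sub = parser.add_subparsers(dest="command", required=True)

    init = sub.add_parser("init", help="Initialize project with default configuration")
    init.add_argument("--root-dir", default=".")
    init.add_argument("--force", action="store_true")

    proc = sub.add_parser("process", help="Process a repository or directory")
    proc.add_argument("--repo")  # None means: use the current directory
    proc.add_argument("--target", default=".")
    proc.add_argument("--output", default="project_summary.md")
    proc.add_argument("--config", default=".project2md.yml")
    proc.add_argument("--include", action="append")  # repeatable
    proc.add_argument("--exclude", action="append")  # repeatable
    proc.add_argument("--branch", default="main")
    proc.add_argument("--format", choices=["markdown", "json", "yaml"], default="markdown")
    proc.add_argument("--signatures", action="store_true")

    explicit = sub.add_parser("explicit", help="Generate explicit configuration file")
    explicit.add_argument("--directory", default=".")

    sub.add_parser("version", help="Show version information")
    return parser
```

Repeated --include or --exclude flags accumulate into a list, which matches "can be specified multiple times".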

Configuration File (.project2md.yml)

The tool automatically creates this file when you run project2md init. It includes:

general:
  max_depth: 10
  max_file_size: "1MB"
  stats_in_output: true
  collapse_empty_dirs: true

output:
  format: "markdown"
  stats: true

include:
  files:
    - "**/*.py"         # Python files
    - "**/*.js"         # JavaScript files
    - "**/*.md"         # Markdown files
    # ... many more defaults for common file types
  dirs:
    - "src/"
    - "lib/"
    - "app/"
    - "tests/"
    - "docs/"

exclude:
  files:
    - "project_summary.md"  # Default output file
    - ".project2md.yml"     # Config file
    - "**/__*.md"          # Draft markdown files
    - "**/.git/**"         # Git files
    # ... many more sensible defaults
  dirs:
    - ".git"
    - "node_modules"
    - "venv"
    # ... more excluded directories
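The general section combines a human-readable size limit with a depth limit. A minimal sketch of applying both, assuming binary units (1MB = 1,048,576 bytes) and that a file at depth d is allowed when d <= max_depth; neither detail is confirmed by the documentation, and parse_size and within_limits are hypothetical names.

```python
import re
from typing import Any, Dict

_UNITS = {"B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}  # assumed binary units


def parse_size(value: str) -> int:
    """Convert a size string such as "1MB" or "512 KB" to bytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMG]?B)\s*", value, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognised size: {value!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit.upper()])


def within_limits(size_bytes: int, depth: int, general: Dict[str, Any]) -> bool:
    """Apply max_file_size and max_depth from the 'general' section."""
    return (size_bytes <= parse_size(general["max_file_size"])
            and depth <= general["max_depth"])
```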

Output Format

The generated Markdown file follows this structure:

# Project Overview

{README.md content}

# Project Structure

```tree
{project tree}
```

# Statistics

{detailed statistics if enabled}

# File Contents

## filepath: repoRoot/file1

{file1 content}

## filepath: repoRoot/dir/file2

{file2 content}
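Assembling that layout is mostly string concatenation. A sketch, assuming the README text, the rendered tree and the per-file contents have already been collected; build_summary is a hypothetical helper, not part of project2md's API.

```python
from typing import Dict, Optional

FENCE = "`" * 3  # a literal triple backtick would end this code block early


def build_summary(readme: str, tree: str, files: Dict[str, str],
                  stats: Optional[str] = None, root: str = "repoRoot") -> str:
    """Join the sections in the order shown above into one Markdown string."""
    parts = ["# Project Overview", readme,
             "# Project Structure", f"{FENCE}tree\n{tree}\n{FENCE}"]
    if stats is not None:  # the Statistics section only appears when enabled
        parts += ["# Statistics", stats]
    parts.append("# File Contents")
    for rel_path, content in files.items():
        parts += [f"## filepath: {root}/{rel_path}", content]
    return "\n\n".join(parts) + "\n"
```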

Signature Mode Output

When using --signatures, the output focuses on structure:

# File Contents

## filepath: repoRoot/main.py
def main(args): [lines:15]
class Application: [lines:45]
async def startup(): [lines:8]

## filepath: repoRoot/README.md
# Main Title [lines:3]
## Installation [lines:12]
### Prerequisites [lines:5]
## Usage [lines:25]
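Judging from the example above (# Main Title [lines:3] followed directly by ## Installation), a header's line count appears to run up to the next header of any level. The sketch below implements that assumed rule and ignores '#' lines inside fenced code blocks; markdown_outline is a hypothetical name.

```python
import re
from typing import List

HEADER = re.compile(r"^#{1,6}\s+\S")  # ATX header: 1-6 '#', whitespace, then text
FENCE = "`" * 3  # spelled this way so it does not close this code block


def markdown_outline(text: str) -> List[str]:
    """Keep only headers, each with the number of lines up to the next header."""
    lines = text.splitlines()
    headers = []  # (line index, header text)
    in_fence = False
    for i, line in enumerate(lines):
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence  # '#' lines inside code fences are not headers
            continue
        if not in_fence and HEADER.match(line):
            headers.append((i, line.strip()))
    # A section ends where the next header of any level starts (or at end of file).
    ends = [start for start, _ in headers[1:]] + [len(lines)]
    return [f"{raw} [lines:{end - start}]" for (start, raw), end in zip(headers, ends)]
```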

Development

Setting Up Development Environment

  1. Install Poetry (if not already installed):

    curl -sSL https://install.python-poetry.org | python3 -
    
  2. Clone the repository:

    git clone https://github.com/itsatony/project2md.git
    cd project2md
    
  3. Install dependencies with Poetry:

    poetry install
    

Running Tests

The project uses pytest for testing. To run the tests:

# Run all tests
poetry run pytest

# Run tests with coverage report
poetry run pytest --cov=project2md

# Run tests verbosely
poetry run pytest -v

# Run specific test file
poetry run pytest tests/test_config.py

# Run tests matching specific pattern
poetry run pytest -k "test_config"

Test Structure

Tests are organized in the tests/ directory:

  • test_config.py: Configuration system tests
  • test_git.py: Git operations tests
  • test_walker.py: File system traversal tests
  • test_formatter.py: Output formatting tests
  • test_stats.py: Statistics collection tests
  • test_signature_processor.py: Signature extraction tests
  • test_cli_signatures.py: CLI integration tests for signatures

Project Structure

project2md/
├── __init__.py                   # Package initialization
├── cli.py                        # Command-line interface
├── config.py                     # Configuration handling
├── git.py                        # Git operations
├── walker.py                     # File system traversal
├── signature_processor.py        # Signature extraction (NEW)
├── formatters/                   # Output formatting
│   ├── base.py                   # Base formatter
│   ├── factory.py                # Formatter factory
│   ├── markdown.py               # Markdown formatter
│   ├── json.py                   # JSON formatter
│   └── yaml.py                   # YAML formatter
├── stats.py                      # Statistics collection
├── messages.py                   # User messaging
├── explicit_config_generator.py  # Explicit config generation
└── utils.py                      # Shared utilities
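The formatters/ package pairs a base class with a factory. A minimal sketch of that pattern, with hypothetical class and function names and only the Markdown and JSON formats (a YAML formatter would need the third-party PyYAML package):

```python
import json
from abc import ABC, abstractmethod
from typing import Any, Dict, Type


class BaseFormatter(ABC):
    """Common interface every output format implements."""

    @abstractmethod
    def format(self, summary: Dict[str, Any]) -> str:
        ...


class MarkdownFormatter(BaseFormatter):
    def format(self, summary: Dict[str, Any]) -> str:
        lines = ["# File Contents"]
        for path, content in summary["files"].items():
            lines += ["", f"## filepath: {path}", "", content]
        return "\n".join(lines) + "\n"


class JsonFormatter(BaseFormatter):
    def format(self, summary: Dict[str, Any]) -> str:
        return json.dumps(summary, indent=2)


# A YAML formatter would be registered here as well.
_FORMATTERS: Dict[str, Type[BaseFormatter]] = {
    "markdown": MarkdownFormatter,
    "json": JsonFormatter,
}


def create_formatter(name: str) -> BaseFormatter:
    """Look up a formatter by the --format value."""
    try:
        return _FORMATTERS[name]()
    except KeyError:
        raise ValueError(f"unsupported format: {name!r}") from None
```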

Component Responsibilities

CLI (cli.py)

  • Parse command-line arguments
  • Initialize configuration
  • Orchestrate overall process flow
  • Handle user interaction (progress bar)

Configuration (config.py)

  • Parse YAML configuration
  • Merge CLI arguments with config file
  • Validate configuration
  • Provide unified config interface
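Merging CLI arguments with the config file usually means overlaying the values given explicitly on the command line on top of the YAML data. A sketch, assuming both have already been loaded into dictionaries and that None marks an option that was not given; merge_config is a hypothetical name.

```python
import copy
from typing import Any, Dict


def merge_config(file_cfg: Dict[str, Any], cli: Dict[str, Any]) -> Dict[str, Any]:
    """Return file_cfg overlaid with CLI values; None means 'not given on the CLI'."""
    merged = copy.deepcopy(file_cfg)  # never mutate the loaded config
    for key, value in cli.items():
        if value is None:
            continue
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)  # merge nested sections
        else:
            merged[key] = value  # CLI wins for scalars and lists
    return merged
```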

Signature Processor (signature_processor.py)

  • Extract function signatures from code files
  • Process markdown headers with line counts
  • Support multiple programming languages
  • Handle syntax errors gracefully

Git Operations (git.py)

  • Clone repositories
  • Validate repository status
  • Extract branch information
  • Handle SSH authentication

File System Walker (walker.py)

  • Traverse directory structure
  • Apply include/exclude patterns
  • Handle file size limits
  • Manage directory depth
  • Detect binary files
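A sketch of these traversal duties using os.walk. The depth semantics (directories more than max_depth levels below the root are not entered) and the binary-file heuristic (a NUL byte in the first 8 KiB) are assumptions, not project2md's documented behaviour; unreadable files are skipped instead of aborting the run.

```python
import os
from pathlib import Path
from typing import Iterator


def looks_binary(path: Path, probe: int = 8192) -> bool:
    """Heuristic: treat a file as binary if its first bytes contain a NUL byte."""
    with path.open("rb") as fh:
        return b"\x00" in fh.read(probe)


def walk(root: str, max_depth: int, max_bytes: int) -> Iterator[str]:
    """Yield relative POSIX paths of small text files, in sorted order."""
    root_path = Path(root)
    for dirpath, dirnames, filenames in os.walk(root_path):
        rel_dir = Path(dirpath).relative_to(root_path)
        if len(rel_dir.parts) >= max_depth:
            dirnames.clear()  # prune: do not descend any deeper
        dirnames.sort()  # in place, so os.walk visits subdirectories in order
        for name in sorted(filenames):
            path = Path(dirpath) / name
            try:
                if path.stat().st_size > max_bytes or looks_binary(path):
                    continue
            except OSError:
                continue  # inaccessible file: skip it instead of aborting
            yield (rel_dir / name).as_posix()
```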

Formatters (formatters/)

  • Generate output in multiple formats
  • Create directory tree visualization
  • Format statistics
  • Handle file content rendering
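The directory tree can be drawn from a list of relative paths. This sketch lists directories before files and uses the same box-drawing characters as the project layout shown earlier; render_tree is a hypothetical name, and the real formatter's ordering may differ.

```python
from typing import List


def render_tree(paths: List[str], root: str = "repoRoot") -> str:
    """Draw relative POSIX paths as a box-drawing tree (directories first)."""
    tree: dict = {}
    for path in paths:
        node = tree
        *dirs, filename = path.split("/")
        for d in dirs:
            node = node.setdefault(d, {})  # directories are nested dicts
        node[filename] = None  # files are leaves

    lines = [f"{root}/"]

    def emit(node: dict, prefix: str) -> None:
        entries = sorted(node.items(), key=lambda kv: (kv[1] is None, kv[0]))
        for i, (name, child) in enumerate(entries):
            is_last = i == len(entries) - 1
            suffix = "/" if child is not None else ""
            lines.append(f"{prefix}{'└── ' if is_last else '├── '}{name}{suffix}")
            if child is not None:
                emit(child, prefix + ("    " if is_last else "│   "))

    emit(tree, "")
    return "\n".join(lines)
```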

Statistics (stats.py)

  • Collect file and directory statistics
  • Calculate size metrics
  • Track file types
  • Generate statistical summaries
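A minimal sketch of collecting such statistics from (path, size) pairs produced by a walker; collect_stats and the returned keys are illustrative, not project2md's actual statistics fields.

```python
from collections import Counter
from pathlib import PurePosixPath
from typing import Any, Dict, Iterable, Tuple


def collect_stats(files: Iterable[Tuple[str, int]]) -> Dict[str, Any]:
    """Summarise (relative path, size in bytes) pairs for included files."""
    by_ext: Counter = Counter()
    dirs = set()
    total_bytes = 0
    n_files = 0
    for path, size in files:
        p = PurePosixPath(path)
        by_ext[p.suffix.lower() or "(none)"] += 1
        # record every ancestor directory except the root itself (".")
        dirs.update(str(parent) for parent in p.parents if str(parent) != ".")
        total_bytes += size
        n_files += 1
    return {
        "files": n_files,
        "directories": len(dirs),
        "total_bytes": total_bytes,
        "by_extension": dict(by_ext.most_common()),
    }
```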

Utilities (utils.py)

  • Shared helper functions
  • Error handling utilities
  • Progress tracking
  • Logging

Error Handling

The tool implements comprehensive error handling:

  • Clear error messages for configuration issues
  • Graceful handling of inaccessible files
  • Recovery from non-critical errors
  • Detailed logging in verbose mode

Contributing

Contributions are welcome! Please read our contributing guidelines before submitting pull requests.

Version History

v1.3.5 (Latest)

  • FIX: Version reporting now works correctly for installed packages
  • FIX: Version command now shows correct version from package metadata
  • NEW: Version is displayed in all console output and generated documentation
  • Improved version detection using importlib.metadata for installed packages
  • Enhanced fallback to pyproject.toml for development environments
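The lookup order described for v1.3.5 (installed package metadata first, then pyproject.toml) can be sketched as follows. It assumes Python 3.11+ for tomllib and reads the version from [tool.poetry] (the project uses Poetry) or from the PEP 621 [project] table; detect_version is a hypothetical name, not the tool's own function.

```python
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path


def detect_version(dist_name: str = "project2md", pyproject: str = "pyproject.toml") -> str:
    """Installed-package metadata first, then a pyproject.toml fallback."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        pass  # not installed, e.g. running from a source checkout
    try:
        import tomllib  # standard library on Python 3.11+
        data = tomllib.loads(Path(pyproject).read_text(encoding="utf-8"))
    except (ImportError, OSError, ValueError):  # TOMLDecodeError subclasses ValueError
        return "unknown"
    # Poetry projects keep it under [tool.poetry]; PEP 621 projects under [project].
    return (data.get("tool", {}).get("poetry", {}).get("version")
            or data.get("project", {}).get("version")
            or "unknown")
```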

v1.3.4

  • NEW: Enhanced signature extraction mode with intelligent file handling
  • NEW: Config files (YAML, JSON, TOML, INI, etc.) show only line count in signatures mode
  • NEW: Text files (.txt, .log, .csv, etc.) show only line count in signatures mode
  • NEW: Empty code files display "empty" instead of empty code blocks
  • NEW: Comprehensive test suite for improved signature processing
  • Improved file type detection and processing logic
  • Better handling of various file extensions in signatures mode

v1.3.1

  • NEW: Signature extraction mode with --signatures flag
  • NEW: Support for extracting function signatures from code files
  • NEW: Markdown header extraction with line counts
  • NEW: Multi-language support (Python, JS, TS, Java, C/C++, C#, Go, Rust, PHP, Ruby)
  • NEW: Comprehensive test suite for signature processing
  • Enhanced CLI with signature processing integration
  • Improved configuration system for signature mode

v1.2.2

  • Added a dedicated version command
  • Updated CLI to show help upon parsing errors without the default "Try ..." message

v1.2.1

  • Added "explicit" CLI command that generates explicit.config.project2md.yml, listing all files/dirs with per-item size info and their default inclusion status

v1.1.0

  • Added init command for project initialization
  • Improved configuration file handling
  • Added draft markdown exclusion (__*.md)
  • Enhanced default file patterns
  • Added config file auto-detection
  • Improved documentation
  • Better error messages
  • Smarter default configurations

CLI Help

When no command or invalid arguments are provided, project2md now shows usage information by default. Use:

project2md --help

to see all available options.

License

MIT License - see LICENSE file for details.

Download files


Source Distribution

project2md-1.3.5.tar.gz (32.8 kB)

Uploaded Source

Built Distribution


project2md-1.3.5-py3-none-any.whl (39.0 kB)

Uploaded Python 3

File details

Details for the file project2md-1.3.5.tar.gz.

File metadata

  • Download URL: project2md-1.3.5.tar.gz
  • Upload date:
  • Size: 32.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.5

File hashes

Hashes for project2md-1.3.5.tar.gz
Algorithm    Hash digest
SHA256       b8d52c3e705a8f97357a196438b960fe2389e216d2664a36941a07ce5fd83a54
MD5          8e20be1f1bba1ef80e071dff608dad86
BLAKE2b-256  d199c49030b24b8d7a254c8235ae93e2b24fa69be57c9f2cd743406208ea60f1


File details

Details for the file project2md-1.3.5-py3-none-any.whl.

File metadata

  • Download URL: project2md-1.3.5-py3-none-any.whl
  • Upload date:
  • Size: 39.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.5

File hashes

Hashes for project2md-1.3.5-py3-none-any.whl
Algorithm    Hash digest
SHA256       f7b279052f56f7f23bd91a742237bd0f83808f0048a0b1e75d2f45f745dfa0ac
MD5          3f8a6297f87ceeaebdd5a9563f29e08c
BLAKE2b-256  80b6b21359ce66052c4e51a3e88a276cb3662f37c321ea063e4e0b5b1bd8b754

