Repository Formatter

A command-line tool to format repository content into a single Markdown file (repository.md). It includes options for filtering, anonymization, and different processing modes.

  • There are plenty of other options for this.

Features

  • Generates a Markdown file with repository structure and file contents.
  • Filters files/directories based on paths and extensions via a config file.
  • Anonymizes specified strings in file paths and content.
  • Supports different modes:
    • normal: Process the entire repository (respecting filters).
    • class: Include only files containing a specific class name.
    • patch: Include a git diff instead of file contents.
  • Estimates the token count of the generated Markdown using tiktoken.
  • Configurable via a .repo_formatter.yaml file.
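
The path and extension filtering listed above can be pictured with a small sketch. This is not the tool's actual implementation: the helper names `_is_under` and `should_include` are hypothetical, and the precedence shown (a `force_include` entry overrides path exclusion but not the extension filter) is an assumption based on the comments in the Configuration section.

```python
from pathlib import PurePosixPath


def _is_under(path: PurePosixPath, entries: list[str]) -> bool:
    """True if `path` equals an entry or lies inside it (entries are root-relative)."""
    return any(
        path == PurePosixPath(entry) or PurePosixPath(entry) in path.parents
        for entry in entries
    )


def should_include(rel_path: str, config: dict) -> bool:
    """Decide whether a root-relative file path passes the configured filters."""
    path = PurePosixPath(rel_path)
    excluded = _is_under(path, config.get("exclude_paths") or [])
    forced = _is_under(path, config.get("force_include") or [])
    if excluded and not forced:
        return False
    extensions = config.get("include_extensions") or []
    # An empty or missing extension list means "accept every extension".
    return not extensions or path.suffix.lower() in extensions


config = {
    "exclude_paths": ["docs", "app/content"],
    "force_include": ["docs/IMPORTANT.md"],
    "include_extensions": [".py", ".md"],
}
for candidate in ["src/main.py", "docs/guide.md", "docs/IMPORTANT.md", "app/content/page.md"]:
    print(candidate, should_include(candidate, config))
```

Matching on whole path components rather than raw string prefixes keeps an entry such as `app/content` from accidentally excluding a sibling like `app/content_old`.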

Installation

From PyPI (Recommended):

pip install repo-formatter

From Source (for Development):

  1. Clone the repository:
    git clone https://github.com/your-username/repo-formatter.git # <-- UPDATE URL
    cd repo-formatter
    
  2. Install in editable mode (includes development dependencies):
    pip install -e ".[dev]"
    

Usage

repo-formatter [OPTIONS] [DIRECTORY]

Arguments:

  • DIRECTORY: The path to the repository/directory to process (default: current directory).

Options:

  • -m MODE, --mode MODE: Processing mode (normal, class, patch). Default: normal.
  • --class-name NAME: Required for class mode. The class name to search for.
  • --diff-target TARGET: Required for patch mode. Use current for uncommitted changes, or a git ref/range (e.g., main, HEAD~2, v1.0..v1.1).
  • -c PATH, --config PATH: Path to the YAML configuration file. If not provided, searches for .repo_formatter.yaml in the target directory and its parents.
  • -a, --anonymize: Enable anonymization using rules from the config file.
  • -o FILENAME, --output FILENAME: Output Markdown filename (default: repository.md, or mode-specific names like diff_....md, class_....md).
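
The config lookup described for `-c` (search the target directory, then its parents) could look roughly like the sketch below. It only illustrates the upward search; the function name `find_config` is hypothetical and the tool's real lookup may differ in details such as symlink handling.

```python
from pathlib import Path
from typing import Optional

CONFIG_NAME = ".repo_formatter.yaml"


def find_config(start: Path) -> Optional[Path]:
    """Return the nearest config file in `start` or any of its parents, else None."""
    start = start.resolve()
    # Check the starting directory first, then walk up towards the filesystem root.
    for directory in (start, *start.parents):
        candidate = directory / CONFIG_NAME
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    print(find_config(Path.cwd()))
```

Because the nearest file wins, a config placed in a subdirectory would take precedence over one at the repository root under this sketch.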

Examples:

# Process the current directory with default settings
repo-formatter

# Process a specific directory
repo-formatter ../my-other-project

# Use a specific config file and enable anonymization
repo-formatter -c /path/to/my_config.yaml -a .

# Find all files containing "UserManager"
repo-formatter --mode class --class-name UserManager

# Get uncommitted changes as a patch file (diff_current.md)
repo-formatter --mode patch --diff-target current

# Get the diff between 'develop' branch and 'main' branch (diff_develop..main.md)
repo-formatter --mode patch --diff-target develop..main
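
To make the patch-mode examples more concrete, here is a hedged sketch of how a `--diff-target` value might be turned into a `git diff` invocation and a default output name. The function names are hypothetical, and treating `current` as "staged plus unstaged changes against HEAD" is an assumption; only the `diff_<target>.md` naming comes from the examples above.

```python
import subprocess
from pathlib import Path


def build_diff_command(target: str) -> list[str]:
    """Map a --diff-target value to a git command line (assumed mapping)."""
    if target == "current":
        # Assumption: "uncommitted changes" means staged + unstaged changes vs. HEAD.
        return ["git", "diff", "HEAD"]
    # Any other value is passed through as a git ref or range, e.g. "develop..main".
    return ["git", "diff", target]


def default_output_name(target: str) -> str:
    """Build the mode-specific file name, e.g. diff_current.md."""
    return f"diff_{target}.md"


def run_diff(repo: Path, target: str) -> str:
    """Run git inside `repo` and return the diff text; raises if git fails."""
    result = subprocess.run(
        build_diff_command(target),
        cwd=repo,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


print(build_diff_command("develop..main"), default_output_name("develop..main"))
```

A real implementation would likely also need to sanitize targets that contain path separators (such as `origin/main`) before using them in a file name.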

Configuration (.repo_formatter.yaml)

Create a .repo_formatter.yaml file in the root of your repository (or specify with -c).

# Paths to exclude, relative to the repository root.
# This matches the full path, so 'data' excludes the 'data' directory at the root,
# and 'app/logs' excludes 'logs' inside 'app'.
exclude_paths:
  - .git
  - .vscode
  - node_modules
  - build
  - dist
  - venv
  - __pycache__
  - specific_file_to_ignore.log
  - app/content # Excludes the 'content' directory inside 'app'

# Force the inclusion of specific files or directories, even if they are in an excluded path.
# This is useful for including a specific file from an otherwise excluded directory.
force_include:
  - docs/IMPORTANT.md # Include this file even if 'docs' is in exclude_paths.

# List of file extensions to include (lowercase, including the dot).
# If empty or omitted, all extensions (not excluded by path) are included.
include_extensions:
  - .py
  - .js
  - .html
  - .css
  - .md

# Dictionary of strings to anonymize (case-insensitive keys).
# The replacement value's case will try to mimic the original match.
anonymize:
  "CompanyName": "ClientProject"
  "internal_api_key": "REDACTED_KEY"
  "ProjectX": "CodenameZephyr"
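
As an illustration of the `anonymize` rules, the sketch below does a case-insensitive replacement and makes the replacement follow the case of the matched text. The exact case-mimicking rules of the tool are not documented here, so the behavior shown (all-upper and all-lower matches are mirrored, mixed case keeps the configured value) is an assumption, and both function names are hypothetical.

```python
import re


def match_case(original: str, replacement: str) -> str:
    """Shape `replacement` like `original` when the original is all upper or all lower case."""
    if original.isupper():
        return replacement.upper()
    if original.islower():
        return replacement.lower()
    return replacement  # mixed case: keep the replacement as configured


def anonymize(text: str, rules: dict[str, str]) -> str:
    """Apply each rule in order, matching its key case-insensitively."""
    for key, value in rules.items():
        pattern = re.compile(re.escape(key), re.IGNORECASE)
        # Bind `value` as a default argument so each lambda keeps its own replacement.
        text = pattern.sub(lambda m, v=value: match_case(m.group(0), v), text)
    return text


rules = {"CompanyName": "ClientProject", "ProjectX": "CodenameZephyr"}
print(anonymize("CompanyName ships PROJECTX via companyname.io", rules))
# prints: ClientProject ships CODENAMEZEPHYR via clientproject.io
```

Rules are applied one after another, so a replacement value that itself contains a later key would be rewritten again; ordering the rules carefully avoids that.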

Development (using Devcontainer)

  1. Make sure you have Docker and the VS Code "Dev Containers" extension installed.
  2. Open the repo-formatter folder in VS Code.
  3. When prompted, click "Reopen in Container".
  4. VS Code will build the container and install dependencies.
  5. You can now run/debug the tool within the isolated container environment. The terminal in VS Code will be inside the container.

# Inside the devcontainer terminal
repo-formatter --help
repo-formatter sample_project # Test on the sample project
repo-formatter sample_project -a # Test anonymization
# Initialize git in sample_project to test patch mode
cd sample_project
git init
git add .
git commit -m "Initial commit"
echo "// New comment" >> src/main.cpp
cd ..
repo-formatter sample_project --mode patch --diff-target current

Token Estimation

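The tool uses the tiktoken library, which implements the tokenizers used by OpenAI models, to estimate the token count of the generated Markdown. Counts for models from other providers may differ. The library is installed automatically as a dependency.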
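
A minimal sketch of counting tokens with tiktoken follows. The choice of the `cl100k_base` encoding is an assumption (the encoding the tool actually uses is not stated here), and the character-based fallback is added only so the example still runs when tiktoken or its vocabulary file is unavailable.

```python
def estimate_tokens(text: str, encoding_name: str = "cl100k_base") -> int:
    """Count tokens with tiktoken, or fall back to a rough character heuristic."""
    try:
        import tiktoken

        encoding = tiktoken.get_encoding(encoding_name)
        # disallowed_special=() treats strings like "<|endoftext|>" as plain text
        # instead of raising, which matters when encoding arbitrary repository files.
        return len(encoding.encode(text, disallowed_special=()))
    except Exception:
        # Fallback (tiktoken missing or its vocabulary could not be loaded):
        # roughly four characters per token for English text.
        return (len(text) + 3) // 4


print(estimate_tokens("def main():\n    print('hello world')\n"))
```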
