
A tool to convert code repositories into text format for LLM context


Repo to Single File

A command-line tool that converts code repositories into text format, making them suitable for use as context in Large Language Models (LLMs). It supports both local repositories and remote GitHub repositories.

Features

  • Convert local Git repositories to text format
  • Convert GitHub repositories to text format (public and private)
  • Process specific subfolders in monorepos
  • Respect .gitignore patterns for local repositories
  • Skip binary files automatically
  • Structured output with clear file demarcation
  • Token counting with the OpenAI tokenizer
  • Cost estimation for GPT-3.5 and GPT-4

Installation

pip install repo-to-singlefile

Usage

Basic Usage

  1. Convert a local repository:
repo-to-singlefile /path/to/local/repo output.txt
  2. Convert a public GitHub repository:
repo-to-singlefile https://github.com/owner/repo output.txt
  3. Convert a private GitHub repository:
repo-to-singlefile https://github.com/owner/repo output.txt --github-token YOUR_GITHUB_TOKEN

Monorepo Support

Process only specific subfolders in a repository:

  1. Local monorepo:
repo-to-singlefile /path/to/repo output.txt --subfolder packages/mylib
  2. GitHub monorepo:
repo-to-singlefile https://github.com/owner/repo output.txt --subfolder packages/mylib

Output Format

The generated text file contains the contents of all text files in the repository, with clear headers separating each file:

### File: src/main.py ###
[content of main.py]

### File: src/utils.py ###
[content of utils.py]

...
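The header layout above is simple enough to produce and to split back apart programmatically, for example when feeding selected files to a model. The sketch below is a hypothetical illustration, not the tool's own code: `format_sections` and `parse_sections` are invented names, and the blank line between files is inferred from the example. It assumes no file body itself contains a line of the form `### File: ... ###`.

```python
import re

HEADER = "### File: {path} ###"
HEADER_RE = re.compile(r"^### File: (.+?) ###$", re.MULTILINE)


def format_sections(files: dict[str, str]) -> str:
    """Render {relative_path: text} in the documented '### File: ... ###' layout."""
    parts = []
    for path, text in files.items():
        # Header line, then the file body with trailing whitespace trimmed.
        parts.append(f"{HEADER.format(path=path)}\n{text.rstrip()}\n")
    # A blank line separates consecutive files, as in the example output.
    return "\n".join(parts)


def parse_sections(output: str) -> dict[str, str]:
    """Split a combined output file back into {relative_path: text}."""
    matches = list(HEADER_RE.finditer(output))
    files = {}
    for i, m in enumerate(matches):
        start = m.end() + 1  # skip the newline that ends the header line
        end = matches[i + 1].start() if i + 1 < len(matches) else len(output)
        files[m.group(1)] = output[start:end].rstrip()
    return files


combined = format_sections({"src/main.py": "print('hi')\n", "src/utils.py": "x = 1\n"})
print(combined)
```

Parsing with an anchored, multiline regex keeps the split tied to whole header lines rather than to any occurrence of `###` inside code.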

After processing, you'll see a summary that includes:

  • Total token count
  • Total character count
  • Estimated costs for GPT-3.5 and GPT-4 usage

Example summary:

==================================================
CONVERSION SUMMARY
==================================================
Total tokens: 15,234
Total characters: 45,678

Estimated costs (based on current OpenAI pricing):
GPT-4:
  - Input cost: $0.46
  - Output cost: $0.91
GPT-3.5:
  - Input cost: $0.02
  - Output cost: $0.03
==================================================
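The figures in this example are consistent with simple per-token arithmetic. The sketch below assumes legacy OpenAI list prices of $0.03 (input) and $0.06 (output) per 1,000 tokens for GPT-4, and $0.0015 and $0.002 for GPT-3.5. These rates are an assumption chosen because they reproduce the numbers above, not values read from the tool, and the sketch also assumes the output cost is priced on the same token count as the input. `estimate_costs` is a hypothetical name. The token count itself would come from an OpenAI tokenizer such as the `tiktoken` package; which encoding the tool uses is not stated here.

```python
# Assumed per-1,000-token USD rates (NOT taken from the tool's source).
# They are legacy OpenAI list prices that reproduce the example summary.
RATES_PER_1K = {
    "GPT-4": {"input": 0.03, "output": 0.06},
    "GPT-3.5": {"input": 0.0015, "output": 0.002},
}


def estimate_costs(tokens: int) -> dict[str, dict[str, str]]:
    """Format input/output costs, pricing `tokens` as both prompt and completion size."""
    return {
        model: {kind: f"${tokens / 1000 * rate:.2f}" for kind, rate in rates.items()}
        for model, rates in RATES_PER_1K.items()
    }


# Reproduce the cost block of the example summary for 15,234 tokens.
for model, costs in estimate_costs(15_234).items():
    print(f"{model}:")
    print(f"  - Input cost: {costs['input']}")
    print(f"  - Output cost: {costs['output']}")
```

For 15,234 tokens this prints $0.46 / $0.91 for GPT-4 and $0.02 / $0.03 for GPT-3.5, matching the summary. Because prices change, any real estimate should use current rates.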

Configuration

The tool automatically:

  • Respects .gitignore patterns in local repositories
  • Skips binary files
  • Processes common text file extensions:
    • Python (.py)
    • JavaScript (.js)
    • Java (.java)
    • C++ (.cpp, .h)
    • Web (.html, .css)
    • Documentation (.md)
    • Config files (.yml, .yaml, .json)
    • Shell scripts (.sh)
    • Text files (.txt)
    • XML files (.xml)
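A filter like the one described above can be sketched as an extension allowlist plus a binary check. This is an assumption about how such filtering might work, not the tool's implementation: the NUL-byte heuristic, the 8 KiB sample size, and the names `looks_binary` and `should_include` are all hypothetical, and `.gitignore` matching is not shown.

```python
from pathlib import Path

# Extensions taken from the list above; the tool's actual list may differ.
TEXT_EXTENSIONS = {
    ".py", ".js", ".java", ".cpp", ".h", ".html", ".css",
    ".md", ".yml", ".yaml", ".json", ".sh", ".txt", ".xml",
}


def looks_binary(data: bytes) -> bool:
    """Heuristic: treat content containing a NUL byte as binary."""
    return b"\x00" in data


def should_include(path: Path, sample_size: int = 8192) -> bool:
    """Keep files with an allowed extension whose first bytes contain no NUL byte."""
    # Check the extension first so disallowed files are never opened.
    if path.suffix.lower() not in TEXT_EXTENSIONS:
        return False
    with path.open("rb") as fh:
        return not looks_binary(fh.read(sample_size))
```

Checking the extension before reading keeps the common case cheap; the byte sample then catches binary content hiding behind a text-like extension.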

GitHub Authentication

For private repositories, you'll need a GitHub personal access token:

  1. Generate a token at https://github.com/settings/tokens
  2. Use the token with the --github-token option:
repo-to-singlefile https://github.com/owner/private-repo output.txt --github-token YOUR_TOKEN
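For illustration, a personal access token is typically sent to the GitHub REST API in the `Authorization` header. The sketch below builds, but does not send, a request to the public "get repository content" endpoint (`/repos/{owner}/{repo}/contents/{path}`). Whether repo-to-singlefile uses this endpoint or another mechanism is an assumption, and `parse_github_url` and `contents_request` are hypothetical helpers.

```python
from typing import Optional
from urllib.parse import urlparse
from urllib.request import Request


def parse_github_url(url: str) -> tuple[str, str]:
    """Extract (owner, repo) from a URL like https://github.com/owner/repo."""
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) < 2 or not parts[0] or not parts[1]:
        raise ValueError(f"Not a GitHub repository URL: {url}")
    return parts[0], parts[1].removesuffix(".git")


def contents_request(url: str, path: str = "", token: Optional[str] = None) -> Request:
    """Build (without sending) a GitHub REST API repository-contents request."""
    owner, repo = parse_github_url(url)
    api_url = f"https://api.github.com/repos/{owner}/{repo}/contents"
    if path:
        api_url += "/" + path.strip("/")
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        # Personal access tokens are passed in the Authorization header.
        headers["Authorization"] = f"Bearer {token}"
    return Request(api_url, headers=headers)


req = contents_request("https://github.com/owner/private-repo", "packages/mylib", token="YOUR_TOKEN")
print(req.full_url)
```

Sending it with `urllib.request.urlopen(req)` would require network access and a valid token; public repositories can be queried without the header, subject to lower rate limits.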

Error Handling

The tool provides clear error messages for common issues:

  • Invalid repository paths or URLs
  • Missing subfolders
  • Permission denied errors
  • Notices for skipped binary files
  • Token counting errors

Development

Setup Development Environment

  1. Clone the repository:
git clone https://github.com/yourusername/repo-to-singlefile.git
cd repo-to-singlefile
  2. Install the package and its dependencies in editable mode:
pip install -e .

Running Tests

pytest

Common Issues

Permission Denied

When accessing private GitHub repositories, make sure your token has the necessary permissions:

  • For public repositories: No token needed
  • For private repositories: Token needs the repo scope

Subfolder Not Found

When specifying a subfolder:

  • Ensure the path is relative to the repository root
  • Use forward slashes (/) even on Windows
  • Check that the subfolder exists in the repository
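The three checks above can be expressed as a small validation step. This is a hypothetical sketch, not the tool's own logic: `normalize_subfolder` and `resolve_subfolder` are invented names, and converting stray backslashes to forward slashes is a convenience the tool may not offer.

```python
from pathlib import Path, PurePosixPath


def normalize_subfolder(subfolder: str) -> PurePosixPath:
    """Turn a --subfolder value into a clean POSIX-style path relative to the repo root."""
    # Accept accidental backslashes; trailing slashes vanish when parsed.
    rel = PurePosixPath(subfolder.replace("\\", "/"))
    if rel.is_absolute() or ".." in rel.parts or not rel.parts:
        raise ValueError(f"Subfolder must be a relative path inside the repository: {subfolder!r}")
    return rel


def resolve_subfolder(repo_root: Path, subfolder: str) -> Path:
    """Return the subfolder's directory under repo_root, or raise if it is missing."""
    rel = normalize_subfolder(subfolder)
    target = repo_root.joinpath(*rel.parts)
    if not target.is_dir():
        raise FileNotFoundError(f"Subfolder not found in repository: {rel}")
    return target
```

Rejecting `..` and absolute paths keeps the selection inside the repository, and doing the existence check separately lets the error message distinguish a malformed path from a missing one.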

Large Repositories

For very large repositories:

  • Consider processing specific subfolders
  • Be aware of rate limits when using the GitHub API
  • Monitor token costs for large codebases

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Commit your changes
  4. Push to the branch
  5. Create a pull request

License

This project is licensed under the MIT License.

Contact

  • Report bugs through GitHub issues
  • Submit feature requests through GitHub issues
  • For security issues, please see SECURITY.md



Download files


Source Distribution

repo_to_singlefile-0.1.0.tar.gz (5.9 kB)

Built Distribution

repo_to_singlefile-0.1.0-py3-none-any.whl (7.8 kB)

File details

Details for the file repo_to_singlefile-0.1.0.tar.gz.

File metadata

  • Download URL: repo_to_singlefile-0.1.0.tar.gz
  • Size: 5.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.4 CPython/3.11.7 Darwin/23.1.0

File hashes

Hashes for repo_to_singlefile-0.1.0.tar.gz
Algorithm Hash digest
SHA256 61c2166571d2fac1baf1bd250e03a6743ff4f72cb9739f1e587637477872eb7b
MD5 ed3168417e81e6df1a65a250d58af3aa
BLAKE2b-256 b3c518828f1f4e5dfe17d6b8d32bca7d4c782a102b78a363cb129f261794afcf

File details

Details for the file repo_to_singlefile-0.1.0-py3-none-any.whl.

File hashes

Hashes for repo_to_singlefile-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 63653afb18cc2eda7074f5a74032e02a0646e1487e1e751dad0b3013c42d0f5b
MD5 b9529e4388c13808a77743abbf6f595d
BLAKE2b-256 c7f6b18e75bfc5c7d5eea1ed2811d186dfa9b911ca2cd6edde0396142d5c6c8c
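A downloaded file can be checked against the published SHA256 digests with Python's standard `hashlib` module. The sketch below copies the two digests listed above; `sha256_of` and `verify` are hypothetical helper names. pip's hash-checking mode (`--require-hashes` with `--hash=sha256:...` entries in a requirements file) performs the same comparison during installation.

```python
import hashlib
from pathlib import Path

# SHA256 digests published above for release 0.1.0.
EXPECTED_SHA256 = {
    "repo_to_singlefile-0.1.0.tar.gz": "61c2166571d2fac1baf1bd250e03a6743ff4f72cb9739f1e587637477872eb7b",
    "repo_to_singlefile-0.1.0-py3-none-any.whl": "63653afb18cc2eda7074f5a74032e02a0646e1487e1e751dad0b3013c42d0f5b",
}


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """Compare a downloaded file against the published SHA256 digest for its name."""
    return sha256_of(path) == EXPECTED_SHA256[path.name]
```

After downloading, `verify(Path("repo_to_singlefile-0.1.0.tar.gz"))` returns `True` only if the file matches the published digest.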
