A tool to convert code repositories into text format for LLM context
Repo to Single File
A command-line tool that converts code repositories into text format, making them suitable for use as context in Large Language Models (LLMs). Supports both local repositories and GitHub remote repositories.
Features
- Convert local Git repositories to text format
- Convert GitHub repositories to text format (public and private)
- Process specific subfolders in monorepos
- Respect .gitignore patterns for local repositories
- Skip binary files automatically
- Structured output with clear file demarcation
- Token counting with OpenAI tokenizer
- Cost estimation for GPT-3.5 and GPT-4
Installation
pip install repo-to-singlefile
Usage
Basic Usage
- Convert a local repository:
repo-to-singlefile /path/to/local/repo output.txt
- Convert a public GitHub repository:
repo-to-singlefile https://github.com/owner/repo output.txt
- Convert a private GitHub repository:
repo-to-singlefile https://github.com/owner/repo output.txt --github-token YOUR_GITHUB_TOKEN
Monorepo Support
Process only specific subfolders in a repository:
- Local monorepo:
repo-to-singlefile /path/to/repo output.txt --subfolder packages/mylib
- GitHub monorepo:
repo-to-singlefile https://github.com/owner/repo output.txt --subfolder packages/mylib
Output Format
The generated text file contains the contents of all text files in the repository, with clear headers separating each file:
### File: src/main.py ###
[content of main.py]
### File: src/utils.py ###
[content of utils.py]
...
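The header layout above can be reproduced with a few lines of Python. This is a minimal sketch, not the tool's actual implementation: the function name `format_files` and the choice to sort paths alphabetically are assumptions made for illustration.

```python
def format_files(files: dict[str, str]) -> str:
    """Join file contents into one string, one '### File: ... ###' header per file.

    `files` maps a repository-relative path to that file's text content.
    Hypothetical helper; the real tool may order or separate files differently.
    """
    parts = []
    for path in sorted(files):
        # Normalize each body to end with exactly one newline so the
        # next header always starts on its own line.
        body = files[path].rstrip("\n")
        parts.append(f"### File: {path} ###\n{body}\n")
    return "".join(parts)


if __name__ == "__main__":
    print(format_files({"src/main.py": "a = 1\n", "src/utils.py": "b = 2\n"}))
```

Keeping the path in the header lets a model (or a reader) refer back to a specific file when answering questions about the combined text.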
After processing, you'll see a summary that includes:
- Total token count
- Total character count
- Estimated costs for GPT-3.5 and GPT-4 usage
Example summary:
==================================================
CONVERSION SUMMARY
==================================================
Total tokens: 15,234
Total characters: 45,678
Estimated costs (based on current OpenAI pricing):
GPT-4:
- Input cost: $0.46
- Output cost: $0.91
GPT-3.5:
- Input cost: $0.02
- Output cost: $0.03
==================================================
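The cost figures in a summary like this are a per-token price multiplied by the token count. The sketch below shows that arithmetic. The per-1,000-token prices are assumptions chosen because they reproduce the example figures above; they are not taken from the tool's source, and real OpenAI pricing changes over time.

```python
# Assumed USD prices per 1,000 tokens (illustrative only, not from the tool).
ASSUMED_PRICES_PER_1K = {
    "GPT-4": {"input": 0.03, "output": 0.06},
    "GPT-3.5": {"input": 0.0015, "output": 0.002},
}


def estimate_costs(total_tokens: int, prices=ASSUMED_PRICES_PER_1K) -> dict:
    """Return estimated cost in USD per model and direction, rounded to cents."""
    return {
        model: {
            kind: round(total_tokens / 1000 * rate, 2)
            for kind, rate in rates.items()
        }
        for model, rates in prices.items()
    }


if __name__ == "__main__":
    for model, costs in estimate_costs(15_234).items():
        print(f"{model}: input ${costs['input']:.2f}, output ${costs['output']:.2f}")
```

The output estimate here treats the same token count as generated output, which matches the shape of the example summary but will rarely match a real request.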
Configuration
The tool automatically:
- Respects .gitignore patterns in local repositories
- Skips binary files
- Processes common text file extensions:
- Python (.py)
- JavaScript (.js)
- Java (.java)
- C++ (.cpp, .h)
- Web (.html, .css)
- Documentation (.md)
- Config files (.yml, .yaml, .json)
- Shell scripts (.sh)
- Text files (.txt)
- XML files (.xml)
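A file filter along these lines can be sketched as an extension allow-list combined with a simple binary check. The extension set is copied from the list above; the NUL-byte heuristic and the names `looks_binary` and `should_include` are assumptions for illustration, since the source does not say how the tool detects binary files.

```python
from pathlib import Path

# Extensions taken from the list above.
TEXT_EXTENSIONS = {
    ".py", ".js", ".java", ".cpp", ".h", ".html", ".css",
    ".md", ".yml", ".yaml", ".json", ".sh", ".txt", ".xml",
}


def looks_binary(head: bytes) -> bool:
    """Heuristic (assumed): data containing a NUL byte is treated as binary."""
    return b"\x00" in head


def should_include(path: str, head: bytes) -> bool:
    """Include a file if its extension is allowed and its first bytes look like text."""
    return Path(path).suffix.lower() in TEXT_EXTENSIONS and not looks_binary(head)


if __name__ == "__main__":
    print(should_include("src/main.py", b"print('hi')\n"))
    print(should_include("logo.png", b"\x89PNG\r\n"))
```

Checking only the first few kilobytes of each file (passed here as `head`) keeps the filter fast on large repositories.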
GitHub Authentication
For private repositories, you'll need a GitHub personal access token:
- Generate a token at https://github.com/settings/tokens
- Use the token with the --github-token option:
repo-to-singlefile https://github.com/owner/private-repo output.txt --github-token YOUR_TOKEN
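For context, this is roughly what an authenticated call to the GitHub REST API looks like. The sketch only builds the request and sends nothing. It assumes the repository contents endpoint and a bearer-token header, both of which are standard GitHub REST API conventions; whether the tool uses this endpoint is not stated in the source, and `build_github_request` is a hypothetical name.

```python
import urllib.request


def build_github_request(owner, repo, token=None, path=""):
    """Build (but do not send) a GitHub contents API request.

    Hypothetical helper: the tool's real request logic may differ.
    """
    url = f"https://api.github.com/repos/{owner}/{repo}/contents/{path}"
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        # Private repositories require an authenticated request.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(url, headers=headers)


if __name__ == "__main__":
    req = build_github_request("owner", "private-repo", token="YOUR_TOKEN")
    print(req.full_url)
```

Without a token the same request still works for public repositories, subject to GitHub's lower unauthenticated rate limit.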
Error Handling
The tool provides clear error messages for common issues:
- Invalid repository paths or URLs
- Missing subfolders
- Permission denied errors
- Binary file skipping
- Token counting errors
Development
Setup Development Environment
- Clone the repository:
git clone https://github.com/yourusername/repo-to-singlefile.git
cd repo-to-singlefile
- Install dependencies:
pip install -e .
Running Tests
pytest
Common Issues
Permission Denied
When accessing GitHub repositories, make sure your token has the necessary permissions:
- For public repositories: No token needed
- For private repositories: Token needs the repo scope
Subfolder Not Found
When specifying a subfolder:
- Ensure the path is relative to the repository root
- Use forward slashes (/) even on Windows
- Check that the subfolder exists in the repository
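The checks above can be expressed as a small validation step for local repositories. This is a sketch under assumptions: `resolve_subfolder` is a hypothetical helper, and converting backslashes to forward slashes is a convenience added here, not documented behavior of the tool.

```python
from pathlib import Path, PurePosixPath


def resolve_subfolder(repo_root: str, subfolder: str) -> Path:
    """Return the subfolder's path inside repo_root, or raise if it is invalid."""
    # Accept Windows-style separators by converting them to forward slashes.
    rel = PurePosixPath(subfolder.replace("\\", "/"))
    if rel.is_absolute() or ".." in rel.parts:
        raise ValueError(f"Subfolder must be relative to the repository root: {subfolder}")
    target = Path(repo_root).joinpath(*rel.parts)
    if not target.is_dir():
        raise FileNotFoundError(f"Subfolder not found in repository: {subfolder}")
    return target


if __name__ == "__main__":
    try:
        print(resolve_subfolder(".", "packages/mylib"))
    except (ValueError, FileNotFoundError) as exc:
        print(f"Error: {exc}")
```

Rejecting absolute paths and `..` segments keeps the lookup inside the repository root.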
Large Repositories
For very large repositories:
- Consider processing specific subfolders
- Be aware of rate limits when using the GitHub API
- Monitor token costs for large codebases
Contributing
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a pull request
License
This project is licensed under the MIT License.
Contact
- Report bugs through GitHub issues
- Submit feature requests through GitHub issues
- For security issues, please see SECURITY.md