A tool to extract GitHub repositories into a single file
GitHub Repository Extractor
This Python script extracts the contents of a GitHub repository into a single text file. Packing an entire codebase into one file makes it easier to use with Large Language Models (LLMs) that have large context windows.
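To make the idea concrete, here is a minimal sketch of packing a directory tree into one text file. It is not this project's implementation: the function name `concat_tree`, the `===== path =====` header format, and the choice to skip anything that does not decode as UTF-8 are all assumptions made for illustration.

```python
import os
import tempfile


def concat_tree(root: str, out_path: str) -> int:
    """Write every UTF-8 text file under root into out_path; return the file count.

    Hypothetical helper: out_path should live outside root so the output
    file is not picked up by the walk.
    """
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for dirpath, dirnames, filenames in os.walk(root):
            # Skip Git metadata and visit folders in a stable order.
            dirnames[:] = sorted(d for d in dirnames if d != ".git")
            for name in sorted(filenames):
                full = os.path.join(dirpath, name)
                rel = os.path.relpath(full, root)
                try:
                    with open(full, "r", encoding="utf-8") as src:
                        text = src.read()
                except (UnicodeDecodeError, OSError):
                    continue  # unreadable or not UTF-8 text; skipped in this sketch
                # Header format is an assumption, chosen so each file stays identifiable.
                out.write(f"===== {rel} =====\n{text}\n")
                count += 1
    return count


if __name__ == "__main__":
    demo_root = tempfile.mkdtemp()
    with open(os.path.join(demo_root, "hello.py"), "w", encoding="utf-8") as f:
        f.write("print('hello')\n")
    target = os.path.join(tempfile.mkdtemp(), "output.txt")
    print(concat_tree(demo_root, target), "file(s) written to", target)
```

Writing a relative path before each file's contents is what lets a reader, or an LLM, tell where one file ends and the next begins.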
Features
- Support for both local and remote GitHub repositories
- Flexible ignore and include lists for files, folders, and extensions
- Progress bar to track extraction process
- Handles binary files
- Option to clone remote repositories temporarily
- Ideal for preparing codebases for analysis by LLMs
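The ignore and include lists can be pictured as a simple per-path decision. The sketch below is one plausible reading of the rules, not the library's actual logic: the function `should_extract` is hypothetical, and the precedence it uses (ignored folders always win, `exclusive=True` keeps only include matches, otherwise an include match overrides an ignore match) is an assumption.

```python
from pathlib import PurePosixPath


def should_extract(path, *, ignore_files=(), ignore_folders=(), ignore_exts=(),
                   include_files=(), include_exts=(), exclusive=False):
    """Decide whether a repository-relative path would be extracted (assumed semantics)."""
    p = PurePosixPath(path)
    # Assumption: a file inside any ignored folder is always skipped.
    if any(part in ignore_folders for part in p.parts[:-1]):
        return False
    included = p.name in include_files or p.suffix in include_exts
    if exclusive:
        # Assumption: exclusive mode keeps only paths matching the include list.
        return included
    ignored = p.name in ignore_files or p.suffix in ignore_exts
    return included or not ignored


if __name__ == "__main__":
    rules = dict(
        ignore_files=[".gitignore"],
        ignore_folders=["tests", ".github"],
        ignore_exts=[".log"],
        include_files=["README.md"],
        include_exts=[".py"],
        exclusive=True,
    )
    for candidate in ["src/main.py", "tests/test_main.py", "README.md", "docs/guide.rst"]:
        print(candidate, should_extract(candidate, **rules))
```

If the real precedence differs, only the order of the checks in this function would need to change.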
Dependencies
This project requires the following Python packages:
- pygithub: For interacting with the GitHub API
- tqdm: For displaying progress bars
- gitpython: For handling Git operations
You can install these dependencies using pip:
pip install pygithub tqdm gitpython
Installation
- Clone this repository:
git clone https://github.com/yourusername/github-repo-extractor.git
- Install the required dependencies:
pip install -r requirements.txt
Usage
- Import the GitHubRepoExtractor class from the script.
- Create an instance of GitHubRepoExtractor with your repository details.
- Set ignore and include lists as needed.
- Call the extract_to_file() method to start the extraction process.
Example:
from github_repo_extractor import GitHubRepoExtractor

extractor = GitHubRepoExtractor(
    repo_input='https://github.com/username/repo.git',
    access_token='your_github_token'
)

extractor.set_ignore_list(
    files=['.gitignore'],
    folders=['tests', '.github'],
    extensions=['.log']
)

extractor.set_include_list(
    files=['README.md'],
    extensions=['.py'],
    exclusive=True
)

extractor.extract_to_file('output.txt')
Authentication
To avoid being prompted for the GitHub authentication token on every run, you can store it in a centralized secret store that is easy to integrate, such as keyvault. We recommend the keyvault library available at https://github.com/ltoscano/keyvault.
This approach provides a more secure and centralized way to manage your GitHub token.
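The keyvault API is not shown in this README, so the sketch below uses a plain environment variable instead, as a simpler way to keep the token out of source code. The helper `load_github_token` and the variable name `GITHUB_TOKEN` are assumptions for illustration, not part of this package.

```python
import os


def load_github_token(var_name: str = "GITHUB_TOKEN"):
    """Return the token stored in the given environment variable, or None if unset or blank."""
    token = os.environ.get(var_name, "").strip()
    return token or None


if __name__ == "__main__":
    token = load_github_token()
    if token is None:
        print("No token found; set GITHUB_TOKEN before running the extractor.")
    else:
        print("Token loaded,", len(token), "characters long.")
```

The returned value could then be passed as the access_token argument shown in the usage example above.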
Use Case: Preparing Codebases for LLMs
This tool is particularly valuable when working with Large Language Models (LLMs) that have large context windows. By packing an entire codebase into a single file, you can:
- Easily feed the entire codebase into an LLM for analysis, code review, or understanding.
- Maintain context across multiple files and directories when discussing code with an LLM.
- Simplify the process of asking LLMs to perform tasks that require understanding of the entire project structure.
This approach allows for more comprehensive and context-aware interactions with LLMs when working with large software projects.
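Before sending the extracted file to a model, it can help to check roughly whether it fits in the context window. The sketch below uses a crude rule of thumb of about four characters per token; that ratio, the helper names, and the context size in the demo are all assumptions, and real counts depend on the model's tokenizer.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; the 4-characters-per-token ratio is only a heuristic."""
    return math.ceil(len(text) / chars_per_token)


def fits_in_context(text: str, context_tokens: int, reserve_tokens: int = 0) -> bool:
    """True if the estimate plus a reserve (for the prompt and reply) fits the window."""
    return estimate_tokens(text) + reserve_tokens <= context_tokens


if __name__ == "__main__":
    sample = "def add(a, b):\n    return a + b\n" * 1000
    # 128000 is a hypothetical context size used only for this demo.
    print(estimate_tokens(sample), fits_in_context(sample, 128000, reserve_tokens=4000))
```

If the estimate is close to the limit, tightening the ignore list or switching to an exclusive include list is the obvious way to shrink the output.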
Contributing
Contributions are welcome! Please see the CONTRIBUTING.md file for guidelines on how to contribute to this project.
License
This project is licensed under the MIT License. See the LICENSE file for details.