Extract and format GitHub repository content for LLMs
gitin - GitHub Repository Content Extractor
gitin is a command-line tool that extracts GitHub repository content and formats it as markdown for use with Large Language Models (LLMs). It is intended for supplying a codebase as context in an LLM conversation.
Features
- 🔍 Smart Extraction: Extract files based on patterns and content
- 📝 LLM-Optimized: Clean markdown output with token estimation
- 🚀 Progress Tracking: Visual progress bars for operations
- 🎯 Flexible Filtering: Include/exclude patterns and content search
- 📊 Size Control: File size limits and statistics
- 🔄 Version Control: Respects .gitignore rules
Installation
From PyPI (Recommended)
pip install gitin
From Source
git clone https://github.com/unclecode/gitin
cd gitin
pip install -e .
Quick Start
# Basic usage
gitin https://github.com/username/repo -o output.md
# Extract Python files only
gitin https://github.com/username/repo --include="*.py" -o python_files.md
# Search for specific content
gitin https://github.com/username/repo --search="TODO,FIXME" -o review.md
Advanced Usage
Code Review Workflow
# Extract implementation files, excluding tests
gitin https://github.com/username/repo \
  --include="src/*.py" \
  --exclude="test_*,*_test.py" \
  --search="TODO,FIXME,HACK" \
  -o code_review.md
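When the same workflow is run repeatedly, it can help to assemble the command from Python. The following is a minimal sketch that uses only the flags listed under Command-Line Options; the helper names `build_gitin_command` and `run_gitin` are hypothetical and are not part of gitin.

```python
import subprocess


def build_gitin_command(repo_url, output, include=None, exclude=None,
                        search=None, max_size=None):
    """Build the argument list for a gitin call (hypothetical helper)."""
    command = ["gitin", repo_url]
    # Each option takes one comma-separated value, as in the examples above.
    if include:
        command.append("--include=" + ",".join(include))
    if exclude:
        command.append("--exclude=" + ",".join(exclude))
    if search:
        command.append("--search=" + ",".join(search))
    if max_size is not None:
        command.append("--max-size=" + str(max_size))
    command += ["-o", output]
    return command


def run_gitin(*args, **kwargs):
    """Run gitin and raise CalledProcessError on a non-zero exit code."""
    subprocess.run(build_gitin_command(*args, **kwargs), check=True)


review_command = build_gitin_command(
    "https://github.com/username/repo",
    "code_review.md",
    include=["src/*.py"],
    exclude=["test_*", "*_test.py"],
    search=["TODO", "FIXME", "HACK"],
)
```

Passing the arguments as a list avoids shell quoting problems with characters such as `*` in the patterns.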
Feature Analysis
# Find async/await usage
gitin https://github.com/username/repo \
  --include="*.py,*.js" \
  --search="async def,await" \
  -o async_patterns.md
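To illustrate what a comma-separated search such as `async def,await` might select, here is a small sketch. It assumes the value is split on commas and that a line matches when it contains any one of the terms, case-sensitively. This README does not state gitin's exact matching rules, so treat the behavior below as an assumption; the function names are hypothetical.

```python
def parse_terms(value):
    """Split a comma-separated search value into non-empty terms."""
    return [term.strip() for term in value.split(",") if term.strip()]


def matching_lines(text, search):
    """Return (line_number, line) pairs that contain any search term."""
    terms = parse_terms(search)
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        # Assumption: plain substring match, any term is enough.
        if any(term in line for term in terms):
            hits.append((number, line))
    return hits


sample = (
    "import asyncio\n"
    "async def fetch():\n"
    "    return await get()\n"
    "print('done')\n"
)
hits = matching_lines(sample, "async def,await")
```

Note that a term containing a space, such as `async def`, stays intact because only commas separate terms.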
Documentation Extraction
# Get all documentation files
gitin https://github.com/username/repo \
  --include="docs/*,*.md,*.rst" \
  --exclude="**/tests/*" \
  -o documentation.md
Command-Line Options
Options:
  --version           Show the version and exit.
  --exclude TEXT      Comma-separated glob patterns to exclude
                      Example: --exclude="test_*,*.tmp,docs/*"
  --include TEXT      Comma-separated glob patterns to include
                      Example: --include="*.py,src/*.js,lib/*.rb"
  --search TEXT       Comma-separated strings to search in content
                      Example: --search="TODO,FIXME,HACK"
  --max-size INTEGER  Maximum file size in bytes (default: 1MB)
  -o, --output TEXT   Output markdown file path  [required]
  --help              Show this message and exit.
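The include and exclude options both take comma-separated glob patterns. The sketch below shows one way such filtering could work, using Python's standard `fnmatch` module. It assumes that exclude patterns take precedence and that a pattern may match either the full path or the bare file name; gitin's actual matching semantics are not documented here, and `is_selected` is a hypothetical name.

```python
from fnmatch import fnmatchcase


def split_patterns(value):
    """Split a comma-separated option value into glob patterns."""
    return [pattern.strip() for pattern in value.split(",") if pattern.strip()]


def is_selected(path, include="", exclude=""):
    """Decide whether a repository path passes the include/exclude filters."""
    name = path.rsplit("/", 1)[-1]

    def matches(pattern):
        # Assumption: a pattern can match the whole path or just the file name.
        return fnmatchcase(path, pattern) or fnmatchcase(name, pattern)

    # Assumption: exclusion wins over inclusion.
    if any(matches(pattern) for pattern in split_patterns(exclude)):
        return False
    includes = split_patterns(include)
    # With no include patterns, everything not excluded is kept.
    return not includes or any(matches(pattern) for pattern in includes)


paths = ["src/app.py", "src/test_app.py", "README.md", "docs/index.rst"]
selected = [p for p in paths if is_selected(p, include="*.py", exclude="test_*")]
```

In `fnmatch`, `*` also matches `/`, so `*.py` selects Python files at any depth in this sketch.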
LLM Integration Tips
- Context Window Management
  - Check the token count in the summary
  - Use --max-size to limit file sizes
  - Use --search to focus on relevant sections
- Effective Filtering
  - Use --include for specific file types
  - Use --exclude to remove noise
  - Combine with --search for precise results
- Best Practices
  # Extract core functionality
  gitin https://github.com/org/repo \
    --include="src/**/*.py" \
    --exclude="**/*test*" \
    --max-size=50000 \
    -o core_logic.md
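For context window management, a quick check outside gitin can confirm whether an output file is likely to fit a model's window. This sketch assumes roughly four characters per token, a common rule of thumb for English text; gitin's own token estimate may be computed differently. The function names and the `reserve` parameter are hypothetical.

```python
import math


def estimate_tokens(text, chars_per_token=4.0):
    """Rough token estimate: character count divided by an assumed ratio."""
    return math.ceil(len(text) / chars_per_token)


def fits_context(text, context_window, reserve=1000):
    """True if the estimate leaves `reserve` tokens for the prompt and reply."""
    return estimate_tokens(text) + reserve <= context_window


# Example with an in-memory string; for a real run, read the gitin output
# file (for instance core_logic.md) and pass its contents instead.
document = "a" * 4000
estimated = estimate_tokens(document)
```

If the check fails, tightening --include, lowering --max-size, or adding --search terms are the levers described above.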
Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Changelog
See CHANGELOG.md for release history.