Extract and format GitHub repository content for LLMs

gitin - GitHub Repository Content Extractor

A command-line tool that extracts and formats GitHub repository content for use with Large Language Models (LLMs). It is well suited to supplying a codebase as context in an LLM conversation.

Features

  • 🔍 Smart Extraction: Extract files based on patterns and content
  • 📝 LLM-Optimized: Clean markdown output with token estimation
  • 🚀 Progress Tracking: Visual progress bars for operations
  • 🎯 Flexible Filtering: Include/exclude patterns and content search
  • 📊 Size Control: File size limits and statistics
  • 🔄 Version Control: Respects .gitignore rules
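The token estimation mentioned above can be approximated with a simple heuristic. The sketch below is not gitin's actual implementation; the function name `estimate_tokens` and the ratio of roughly four characters per token are assumptions chosen for illustration, and real tokenizers will give different counts.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate: character count divided by an assumed
    average token length (hypothetical default of 4 characters)."""
    if not text:
        return 0
    return math.ceil(len(text) / chars_per_token)


sample = "def add(a, b):\n    return a + b\n"
print(estimate_tokens(sample))
```

An estimate like this is only useful for deciding whether an extract is likely to fit in a context window; for exact counts, use the tokenizer of the model you are targeting.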

Installation

From PyPI (Recommended)

pip install gitin

From Source

git clone https://github.com/unclecode/gitin
cd gitin
pip install -e .

Quick Start

# Basic usage
gitin https://github.com/username/repo -o output.md

# Extract Python files only
gitin https://github.com/username/repo --include="*.py" -o python_files.md

# Search for specific content
gitin https://github.com/username/repo --search="TODO,FIXME" -o review.md
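For scripted use, the commands above can be assembled from Python. This is a minimal sketch: the helper `build_gitin_command` is hypothetical and not part of gitin. It only uses flags that appear in this README, and it assumes the CLI accepts `--flag value` as well as `--flag=value`.

```python
import shlex


def build_gitin_command(url, output, include=None, exclude=None,
                        search=None, max_size=None):
    """Build an argument list for the gitin CLI (hypothetical helper).

    include, exclude, and search are lists that get joined with commas,
    matching the comma-separated format shown in this README.
    """
    cmd = ["gitin", url]
    if include:
        cmd += ["--include", ",".join(include)]
    if exclude:
        cmd += ["--exclude", ",".join(exclude)]
    if search:
        cmd += ["--search", ",".join(search)]
    if max_size is not None:
        cmd += ["--max-size", str(max_size)]
    cmd += ["-o", output]
    return cmd


cmd = build_gitin_command(
    "https://github.com/username/repo",
    "python_files.md",
    include=["*.py"],
)
# Print a shell-quoted version of the command for inspection.
print(shlex.join(cmd))
# To execute it (requires gitin to be installed):
#   import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with glob characters such as `*`.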

Advanced Usage

Code Review Workflow

# Extract implementation files, excluding tests
gitin https://github.com/username/repo \
  --include="src/*.py" \
  --exclude="test_*,*_test.py" \
  --search="TODO,FIXME,HACK" \
  -o code_review.md

Feature Analysis

# Find async/await usage
gitin https://github.com/username/repo \
  --include="*.py,*.js" \
  --search="async def,await" \
  -o async_patterns.md
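To make the effect of a comma-separated `--search` value concrete, here is a rough sketch of content matching. It is an illustration only, not gitin's code: the function `find_matches` is hypothetical, and it assumes plain case-sensitive substring matching where a hit on any one term counts.

```python
def find_matches(text: str, search: str):
    """Return (line_number, term) pairs for every search term found.

    Assumption: terms are split on commas and matched as plain,
    case-sensitive substrings; gitin itself may behave differently.
    """
    terms = [t.strip() for t in search.split(",") if t.strip()]
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for term in terms:
            if term in line:
                hits.append((lineno, term))
    return hits


source = "async def fetch(url):\n    data = await get(url)\n    return data\n"
for lineno, term in find_matches(source, "async def,await"):
    print(f"line {lineno}: {term}")
```

Note that terms may contain spaces, as in `async def`; only the whitespace around each comma-separated term is trimmed.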

Documentation Extraction

# Get all documentation files
gitin https://github.com/username/repo \
  --include="docs/*,*.md,*.rst" \
  --exclude="**/tests/*" \
  -o documentation.md

Command-Line Options

Options:
  --version           Show the version and exit.
  --exclude TEXT      Comma-separated glob patterns to exclude
                      Example: --exclude="test_*,*.tmp,docs/*"
  --include TEXT      Comma-separated glob patterns to include
                      Example: --include="*.py,src/*.js,lib/*.rb"
  --search TEXT       Comma-separated strings to search in content
                      Example: --search="TODO,FIXME,HACK"
  --max-size INTEGER  Maximum file size in bytes (default: 1MB)
  -o, --output TEXT   Output markdown file path [required]
  --help              Show this message and exit.
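The interaction between `--include` and `--exclude` can be pictured with a small filter built on the standard library. This is a sketch under stated assumptions, not gitin's implementation: the names `split_patterns` and `is_selected` are hypothetical, exclusion is assumed to take precedence over inclusion, and patterns are tested against both the full path and the file name using `fnmatch` semantics (where `*` also matches `/`).

```python
from fnmatch import fnmatchcase


def split_patterns(value: str) -> list:
    """Split a comma-separated option value into clean glob patterns."""
    return [p.strip() for p in value.split(",") if p.strip()]


def is_selected(path: str, include: str = "", exclude: str = "") -> bool:
    """Decide whether a repository path passes the include/exclude filters.

    Assumptions: exclude wins over include, and an empty include list
    means "include everything".
    """
    includes = split_patterns(include)
    excludes = split_patterns(exclude)
    name = path.rsplit("/", 1)[-1]

    def matches(patterns):
        return any(fnmatchcase(path, p) or fnmatchcase(name, p) for p in patterns)

    if excludes and matches(excludes):
        return False
    return not includes or matches(includes)


paths = ["src/app.py", "tests/test_app.py", "README.md"]
kept = [p for p in paths if is_selected(p, include="*.py", exclude="test_*,*.tmp")]
print(kept)
```

Checking the pattern against the bare file name as well as the full path is what lets a pattern like `test_*` catch files inside subdirectories in this sketch.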

LLM Integration Tips

  1. Context Window Management

    • Check the token count in the summary
    • Use --max-size to limit file sizes
    • Use --search to focus on relevant sections
  2. Effective Filtering

    • Use --include for specific file types
    • Use --exclude to remove noise
    • Combine with --search for precise results
  3. Best Practices

    # Extract core functionality
    gitin https://github.com/org/repo \
      --include="src/**/*.py" \
      --exclude="**/*test*" \
      --max-size=50000 \
      -o core_logic.md
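The context window tips above amount to two checks: skip files over a size limit, and stop adding files once an estimated token budget is used up. The sketch below illustrates that idea. It is not how gitin works internally; `select_within_budget`, the four-characters-per-token ratio, and the greedy in-order selection are all assumptions made for the example.

```python
def select_within_budget(files, max_size, token_budget, chars_per_token=4):
    """Greedily pick files that fit a size limit and a token budget.

    files: list of (path, text) pairs, considered in the order given.
    max_size: per-file limit in bytes, analogous to --max-size.
    token_budget: total estimated tokens allowed (hypothetical parameter).
    Returns (selected_paths, estimated_tokens_used).
    """
    selected = []
    used = 0
    for path, text in files:
        if len(text.encode("utf-8")) > max_size:
            continue  # file is larger than the per-file size limit
        cost = -(-len(text) // chars_per_token)  # ceiling division
        if used + cost > token_budget:
            continue  # adding this file would exceed the budget
        selected.append(path)
        used += cost
    return selected, used


files = [("a.py", "x" * 40), ("big.py", "y" * 200), ("b.py", "z" * 40)]
print(select_within_budget(files, max_size=100, token_budget=15))
```

In practice, the order in which files are considered matters with a greedy approach like this, so placing the most relevant files first is a reasonable choice.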
    

Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Changelog

See CHANGELOG.md for release history.
