Easily copy all relevant source files in a repository to the clipboard

Project description

repogather

repogather is a command-line tool that copies all relevant files (with their relative paths) in a repository to the clipboard. It is intended to be used in LLM code understanding or code generation workflows. It uses gpt-4o-mini (configurable) to decide file relevance, but can also be used without an LLM to return all files, with non-AI filters (such as excluding tests or config files).

Features

  • Filters and analyzes code files in a repository
  • Excludes test and configuration files by default (with options to include them)
  • Filters out common ecosystem-specific directories and files (e.g., node_modules, venv)
  • Respects .gitignore rules (with option to include ignored files)
  • Handles repositories of any size by splitting content into multiple requests when necessary
  • Estimates token count and API usage cost before processing
  • Uses OpenAI's GPT models to evaluate file relevance
  • Supports various methods of providing the OpenAI API key
  • Copies relevant files and their contents to the clipboard
  • Can return all files without LLM analysis
  • Allows custom exclusion of files or directories

Installation

Install repogather using pip:

pip install repogather

Setup

Set up your OpenAI API key using one of the following methods:

  • As an environment variable: export OPENAI_API_KEY=your_api_key_here
  • In a .env file in your current working directory:
    OPENAI_API_KEY=your_api_key_here
    
  • Provide it as a command-line argument when running the tool (see Usage section)
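The README does not state the precedence among these three methods. As a rough illustration only (the function name and ordering are assumptions, not repogather's actual internals), a lookup like this would prefer an explicit command-line key, then the environment, then a local `.env` file:

```python
import os


def resolve_api_key(cli_key=None, env_file=".env"):
    """Hypothetical sketch: return the first API key found, checking the
    CLI argument, then the environment, then a .env file in the CWD."""
    if cli_key:
        return cli_key
    key = os.environ.get("OPENAI_API_KEY")
    if key:
        return key
    if os.path.exists(env_file):
        with open(env_file) as f:
            for line in f:
                line = line.strip()
                if line.startswith("OPENAI_API_KEY="):
                    return line.split("=", 1)[1]
    return None
```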

Usage

After installation, you can run repogather from the command line:

repogather [QUERY] [OPTIONS]

Options

  • --include-test: Include test files in the analysis
  • --include-config: Include configuration files in the analysis
  • --include-ecosystem: Include ecosystem-specific files and directories (e.g., node_modules, venv)
  • --include-gitignored: Include files that are gitignored
  • --exclude PATTERN: Exclude files containing the specified path fragment (can be used multiple times)
  • --relevance-threshold THRESHOLD: Set the relevance threshold (0-100, default: 50)
  • --model MODEL: Specify the OpenAI model to use (default: gpt-4o-mini-2024-07-18)
  • --openai-key KEY: Provide the OpenAI API key directly
  • --all: Return all files without using LLM analysis

Examples

  1. Analyze files with a query:

    repogather "Find files related to user authentication" --include-config --relevance-threshold 70 --model gpt-4o-2024-08-06
    

    This command will:

    1. Search for files related to user authentication
    2. Include configuration files in the search
    3. Only return files with a relevance score of 70 or higher
    4. Use the GPT-4o model from August 2024 for analysis
  2. Return all files without LLM analysis, including ecosystem files but excluding a specific directory:

    repogather --all --include-test --include-config --include-ecosystem --include-gitignored --exclude "legacy_code"
    

    This command will:

    1. Gather all code files in the repository
    2. Include test, config, and ecosystem-specific files in the output
    3. Include files that would normally be ignored by .gitignore
    4. Exclude any files or directories containing "legacy_code" in their path
    5. Copy all gathered files to the clipboard without using LLM analysis
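The --exclude option matches a path fragment anywhere in a file's relative path, so "legacy_code" excludes both a legacy_code/ directory and any file whose name contains that string. A minimal sketch of such a substring filter (the helper name is hypothetical):

```python
def is_excluded(relative_path, exclude_patterns):
    """Return True if any exclusion fragment appears anywhere in the path."""
    return any(pattern in relative_path for pattern in exclude_patterns)


paths = ["src/auth.py", "legacy_code/old.py", "src/legacy_code_utils.py"]
kept = [p for p in paths if not is_excluded(p, ["legacy_code"])]
# kept == ["src/auth.py"]
```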

How It Works

repogather performs the following steps:

  1. Scans the current directory and its subdirectories for code files
  2. Filters out test, configuration, ecosystem-specific, and gitignored files (unless included via options)
  3. Applies any custom exclusion patterns
  4. If --all option is used, returns all filtered files
  5. Otherwise:
     a. Counts the tokens in the filtered files and estimates the API usage cost
     b. Displays information about large files (>30,000 tokens) and directories (>100,000 tokens)
     c. Asks for user confirmation before proceeding
     d. If the total tokens exceed the model's limit, splits the content into multiple requests
     e. Sends the file contents and the query to the specified OpenAI model
     f. Processes the model's response to rank files by relevance
     g. Filters the files by the specified relevance threshold
  6. Copies the relevant file paths and contents to the clipboard
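The splitting in step 5 (when total tokens exceed the model's limit) can be pictured as greedy bin-packing of files under a per-request token budget. The sketch below is illustrative, assuming a simple (path, token_count) representation, and is not repogather's actual implementation:

```python
def split_into_requests(file_tokens, budget):
    """Greedily pack (path, token_count) pairs into batches whose
    total token count stays within the per-request budget."""
    batches, current, used = [], [], 0
    for path, tokens in file_tokens:
        # Start a new batch when adding this file would exceed the budget
        # (a file larger than the budget still gets a batch of its own).
        if current and used + tokens > budget:
            batches.append(current)
            current, used = [], 0
        current.append(path)
        used += tokens
    if current:
        batches.append(current)
    return batches


# e.g. a 100-token budget over three files:
# split_into_requests([("a.py", 60), ("b.py", 50), ("c.py", 30)], 100)
# -> [["a.py"], ["b.py", "c.py"]]
```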

Note

repogather requires an active OpenAI API key when using LLM analysis. It estimates the query's cost from the input token count and prompts you to confirm before proceeding. When using the --all option, no API key is required.

repogather handles repositories of any size by splitting the content into multiple requests when necessary. This allows for analysis of large codebases without hitting API token limits.
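The cost estimate shown before confirmation is straightforward arithmetic on the input token count and the model's per-token input price. The function name and the price in the example are assumed placeholders, not quoted rates:

```python
def estimate_input_cost(total_tokens, price_per_million_tokens):
    """Estimate the USD cost of sending total_tokens as model input."""
    return total_tokens * price_per_million_tokens / 1_000_000


# e.g. 250,000 input tokens at an assumed $0.15 per million input tokens
# comes to roughly $0.0375:
print(estimate_input_cost(250_000, 0.15))
```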

Download files

Source Distribution

repogather-0.0.3.tar.gz (12.0 kB)

Built Distribution

repogather-0.0.3-py3-none-any.whl (13.9 kB, Python 3)

File details

Details for the file repogather-0.0.3.tar.gz.

File metadata

  • Size: 12.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.5

File hashes

Hashes for repogather-0.0.3.tar.gz
Algorithm Hash digest
SHA256 60c3932beeee5c0c756a73a513d881fab88b020defb2463436b52f73d8c6a319
MD5 c43c42de1ed2ba5e8317c582cca1f2e3
BLAKE2b-256 38aaeb024193f7ded17bbc361b8d87e08b766c5be6a067d4c7b1adde702d61fd

File details

Details for the file repogather-0.0.3-py3-none-any.whl.

File metadata

  • Size: 13.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.5

File hashes

Hashes for repogather-0.0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 59e14e925d3507a7d285f9d264feb9b0b9e6b7e97b27c8d8a6fa7e066c5393f3
MD5 e3f3aa6db69bcd4cc893b52d0ba71076
BLAKE2b-256 edcc1f9f0a32a1f15d407594b506bf97a1adc403952bbeec62219851b397503b
