Tool to summarize directories of code for prompting with LLMs
Code Summarization Tool
This tool generates a summary of a directory's contents, including a tree view of its subdirectories and files, and the contents of each file. It can optionally exclude files listed in a .gitignore file, exclude or include Docker files, or filter files based on their extensions.
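To make the description above concrete, here is a minimal sketch of how a tree view plus per-file contents could be assembled. It is not the tool's actual code: the function names build_tree and summarize are hypothetical, and the "## path" heading per file is only borrowed from the skeleton later in this README.

```python
import os


def build_tree(directory, prefix=""):
    """Return a list of lines drawing the contents of `directory` as a tree."""
    lines = []
    # Sort the entries so the listing is stable between runs.
    entries = sorted(os.listdir(directory))
    for index, name in enumerate(entries):
        is_last = index == len(entries) - 1
        connector = "└── " if is_last else "├── "
        lines.append(prefix + connector + name)
        path = os.path.join(directory, name)
        # Recurse into real subdirectories only (symbolic links are not followed).
        if os.path.isdir(path) and not os.path.islink(path):
            child_prefix = prefix + ("    " if is_last else "│   ")
            lines.extend(build_tree(path, child_prefix))
    return lines


def summarize(directory):
    """Combine the tree view with the raw contents of every file."""
    parts = ["# Tree", *build_tree(directory), ""]
    for root, dirs, files in os.walk(directory):
        dirs.sort()  # walk subdirectories in a predictable order
        for name in sorted(files):
            path = os.path.join(root, name)
            with open(path, encoding="utf-8", errors="replace") as handle:
                parts.append(f"## {os.path.relpath(path, directory)}")
                parts.append(handle.read())
    return "\n".join(parts)
```

Calling summarize(".") would return the text for the current directory; the real tool additionally applies the filters described in this README.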
Installation
This tool can be run directly from the command line without installation. Just ensure you have Python installed and can run Python scripts.
Alternatively, you can install this tool as a package using pip:
pip install summarizeGPT
Usage
To use this tool, run the following command:
python summarize_directory.py <directory_path> [--gitignore <gitignore_path>] [--include <file_extensions>] [--exclude <file_extensions>] [-d|--show_docker] [-o|--show_only_docker]
Or with the package installed:
SummarizeGPT <directory_path> [--gitignore <gitignore_path>] [--include <file_extensions>] [--exclude <file_extensions>] [-d|--show_docker] [-o|--show_only_docker]
In my experience, the command is case-insensitive on Windows, so you can type summarizegpt for convenience; on Linux it must be typed as SummarizeGPT. -MSH
Where:
- <directory_path> is the path to the directory to summarize.
- <gitignore_path> is the path to the .gitignore file to use.
- <file_extensions> is a comma-separated list of file extensions to include or exclude.
- --show_docker includes Docker files.
- --show_only_docker shows only Docker files.
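For readers who want to see how these options fit together, the following is a rough argparse sketch that mirrors the usage line above. It is an illustration only, not the tool's actual parser; the helper names split_extensions and build_parser are hypothetical.

```python
import argparse


def split_extensions(value):
    """Turn a comma-separated string such as "py,txt" into ["py", "txt"]."""
    return [ext.strip().lstrip(".").lower() for ext in value.split(",") if ext.strip()]


def build_parser():
    """Build a parser with the options listed in the usage line (assumed layout)."""
    parser = argparse.ArgumentParser(description="Code summarization tool.")
    parser.add_argument("directory_path", help="Path to the directory to summarize.")
    parser.add_argument("--gitignore", metavar="gitignore_path",
                        help="Path to the .gitignore file to use.")
    parser.add_argument("--include", type=split_extensions, metavar="file_extensions",
                        help="Comma-separated list of file extensions to include.")
    parser.add_argument("--exclude", type=split_extensions, metavar="file_extensions",
                        help="Comma-separated list of file extensions to exclude.")
    parser.add_argument("-d", "--show_docker", action="store_true",
                        help="Include Docker files.")
    parser.add_argument("-o", "--show_only_docker", action="store_true",
                        help="Show only Docker files.")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```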
Examples
Summarize the contents of the current directory:
python summarize_directory.py .
Summarize the contents of a directory and exclude files listed in a .gitignore file:
python summarize_directory.py /path/to/directory --gitignore /path/to/.gitignore
Summarize the contents of a directory and include only .py and .txt files:
python summarize_directory.py /path/to/directory --include py,txt
Exclude .xml and .js files from the summary:
python summarize_directory.py /path/to/directory --exclude xml,js
Summarize the contents of a directory and include Docker files:
python summarize_directory.py /path/to/directory -d
Summarize the contents of a directory and show only Docker files:
python summarize_directory.py /path/to/directory -o
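The filtering behind these examples could look roughly like the sketch below. Several things here are assumptions rather than facts about the tool: the function names, the rule for what counts as a Docker file (names starting with "Dockerfile" or "docker-compose", plus ".dockerignore"), and the idea that Docker files are left out unless -d or -o is given.

```python
import os


def is_docker_file(filename):
    """Assumed rule for recognizing Docker files; the real tool may differ."""
    lowered = filename.lower()
    return lowered.startswith(("dockerfile", "docker-compose")) or lowered == ".dockerignore"


def should_include(filename, include_exts=None, exclude_exts=None,
                   show_docker=False, show_only_docker=False):
    """Decide whether a file belongs in the summary."""
    if is_docker_file(filename):
        # Docker files appear only when one of the Docker flags is set (assumption).
        return show_docker or show_only_docker
    if show_only_docker:
        return False
    # Compare extensions without the leading dot, case-insensitively.
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    if include_exts and ext not in include_exts:
        return False
    if exclude_exts and ext in exclude_exts:
        return False
    return True
```

With this sketch, should_include("main.py", include_exts={"py", "txt"}) keeps the file, while should_include("app.js", exclude_exts={"xml", "js"}) drops it.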
Output
The tool generates a file called Context_for_ChatGPT.md in the specified directory with the summary of the directory's contents. The file includes a tree view of the directory's subdirectories and files, as well as the contents of each file. The tool also outputs a summary of the generated markdown, including the total number of lines, characters, and bytes.
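The line, character, and byte totals mentioned above could be computed along these lines. This is a sketch under the assumption that the file is written as UTF-8; summary_stats is a hypothetical name, not a function from the tool.

```python
def summary_stats(markdown_text):
    """Return line, character, and byte counts for the generated markdown."""
    return {
        "lines": len(markdown_text.splitlines()),
        "characters": len(markdown_text),
        # Bytes can exceed characters when the text contains non-ASCII symbols.
        "bytes": len(markdown_text.encode("utf-8")),
    }


if __name__ == "__main__":
    with open("Context_for_ChatGPT.md", encoding="utf-8") as handle:
        print(summary_stats(handle.read()))
```

Reporting bytes separately from characters is useful because upload limits are usually in bytes, while prompt limits are closer to character or token counts.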
More Examples with AI Assistance
ChatGPT Prompt for Command:
Given this:
SummarizeGPT /path/to/directory --gitignore /path/to/.gitignore
Can you write me a prompt that implements these:
C:\dev\edge\plugins\storage_local
C:\dev\edge\.gitignore
Sure! Here's a prompt that implements those arguments:
python summarize_directory.py C:/dev/edge/plugins/storage_local --gitignore C:/dev/edge/.gitignore
This prompt summarizes the contents of the C:/dev/edge/plugins/storage_local directory and excludes files listed in the C:/dev/edge/.gitignore file. The output will be saved in a file called Context_for_ChatGPT.md in the C:/dev/edge/plugins/storage_local directory. -ChatGPT
Copilot Prompt for Command:
The pattern below works well if you have Copilot. You can literally build on it: add your conditions, then leave an empty line, and Copilot will do the rest.
# Given This:
SummarizeGPT /path/to/directory --gitignore /path/to/.gitignore
# Your Conditions
C:\dev\edge\plugins\inference_resource_alert
C:\dev\edge\.gitignore
SummarizeGPT C:/dev/edge/Edge/plugins/storage_local --gitignore C:\dev\edge\Edge\.gitignore
Just add your conditions in the code block above and provide an empty line. Copilot will do the rest.
Limitations
This tool does not interpret the contents of the files. It simply displays the file structure and raw contents. If you have large files or numerous files in your directory, the summary file can become quite large.
The tool also does not handle symbolic links, so it will not follow links to directories or files outside of the specified directory.
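As an illustration of the symbolic link limitation, a directory walk that skips links entirely might look like the sketch below. It is an assumption about one reasonable way to get this behavior, not the tool's code, and iter_regular_files is a hypothetical name.

```python
import os


def iter_regular_files(directory):
    """Yield paths of files under `directory`, skipping symbolic links."""
    # followlinks=False (the default) keeps os.walk from descending into linked directories.
    for root, dirs, files in os.walk(directory, followlinks=False):
        dirs.sort()  # visit subdirectories in a predictable order
        for name in sorted(files):
            path = os.path.join(root, name)
            if os.path.islink(path):
                continue  # a link to a file is skipped rather than read
            yield path
```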
Contributions
Contributions are welcome! Feel free to submit a pull request if you've made an improvement or fixed a bug.
Future Enhancements
TODO Could you assist with implementing the following?
Maybe an argument that, when used, has an AI generate an AI-readable summary of each file instead of including its raw content. This could dramatically reduce the output file size for large directories or files and make the summary more useful for quickly understanding the purpose and function of each file in the directory.
Perhaps this is something that can be co-written with OpenAI or Copilot.
For example, running this tool on itself and pasting the output, together with the Future Enhancement above, into a ChatGPT 4 chat prompt yielded the section below. -MSH
Here's a conceptual approach for implementing your request:
- Include a new argument in your argument parser named --ai_summary (or something similar) which, when used, will trigger generating AI-readable summaries instead of including the raw content of each file.
- In your get_file_contents function, add a condition to check if the --ai_summary argument is enabled. If it is, instead of adding the raw content of the file to file_contents, you would call an AI model to generate a summary of the file content, and then append that summary to file_contents.
To implement AI code summarization, you can use OpenAI's GPT-3 or GPT-4 API. Keep in mind that there will be some limitations due to the length of the text you can input to the API (for GPT-3, the maximum token limit is 4096 as of my knowledge cutoff in September 2021), so you might need to extract meaningful parts of the code or truncate it to fit within this limit.
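The truncation mentioned above could be sketched as follows. This is only a rough illustration: the function name is hypothetical, and the figure of about four characters per token is a common rule of thumb rather than an exact tokenizer count, so a real implementation would want to count tokens properly.

```python
def truncate_for_model(code, max_tokens=4096, reserved_tokens=100, chars_per_token=4):
    """Trim `code` so the prompt plus the reply should fit within the token limit.

    chars_per_token is a rough heuristic (assumption), not a tokenizer measurement.
    reserved_tokens leaves room for the model's answer.
    """
    budget_chars = max(0, (max_tokens - reserved_tokens) * chars_per_token)
    if len(code) <= budget_chars:
        return code
    # Prefer cutting at the last newline inside the budget so no line is split.
    cut = code.rfind("\n", 0, budget_chars)
    if cut <= 0:
        cut = budget_chars
    return code[:cut]
```

Cutting at a line boundary keeps the last included statement intact, which tends to matter more for code than for prose.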
Here is a rough skeleton of how you could modify your script:
import argparse
import os
import openai

# Note: this skeleton uses the legacy Completion endpoint from openai versions before 1.0.

# Load the API key from an environment variable
api_key = os.environ.get('OPENAI_API_KEY')
if api_key is None:
    raise ValueError("OpenAI API key not found in environment variable.")
# Set the API key
openai.api_key = api_key

def summarize_code(code):
    # Ask the model for a summary; without an instruction it would simply continue the code
    response = openai.Completion.create(
        engine="text-davinci-003",  # Or whichever engine you want to use
        prompt=f"Summarize the following code:\n\n{code}",
        max_tokens=100,  # Adjust as needed
        temperature=0.3
    )
    return response.choices[0].text.strip()

def get_file_contents(directory, gitignore_file=None, include_exts=None, exclude_exts=None, show_docker=False, show_only_docker=False, ai_summary=False):
    # ... existing code ...
    for file in files:
        # ... existing code ...
        if ai_summary:
            # Generate AI summary instead of including raw content
            contents = summarize_code(contents)
        file_contents += f"## {file_path}\n\n```\n{remove_empty_lines(contents)}\n```\n\n"
    # ... existing code ...

def main():
    parser = argparse.ArgumentParser(description='Code summarization tool.')
    # ... existing arguments ...
    parser.add_argument('-ai', '--ai_summary', action='store_true', help='Generate AI summary of each file instead of including its raw content')
    # ... existing code ...
Pretty cool, huh? Now someone just needs to implement it on a new branch and submit a pull request! -MSH
License
This project is licensed under the terms of the GPLv3 license.