Tool to summarize directories of code for prompting with LLMs

Code Summarization Tool

This tool generates a summary of a directory's contents, including a tree view of its subdirectories and files, and the contents of each file. It can optionally exclude files listed in a .gitignore file, exclude or include Docker files, or filter files based on their extensions.
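
To make the idea of a tree view concrete, here is a minimal sketch of how such a listing could be rendered with the standard library. The function name `render_tree` and the four-space indentation are illustrative assumptions, not the tool's actual implementation or output format.

```python
import os


def render_tree(root: str) -> str:
    """Return an indented listing of root's subdirectories and files.

    Hypothetical helper for illustration; the real tool's format may differ.
    """
    lines = [os.path.basename(os.path.abspath(root)) + "/"]
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the traversal order deterministic
        rel = os.path.relpath(dirpath, root)
        depth = 0 if rel == "." else rel.count(os.sep) + 1
        if rel != ".":
            lines.append("    " * depth + os.path.basename(dirpath) + "/")
        for name in sorted(filenames):
            lines.append("    " * (depth + 1) + name)
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_tree("."))
```

In this sketch the files of a directory are listed before its subdirectories, which is a side effect of how `os.walk` reports each level.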

Installation

This tool can be run directly from the command line without installation. Just ensure you have Python installed and can run Python scripts.

Alternatively, you can install the tool as a package with pip:

pip install summarizeGPT

Usage

To use this tool, run the following command:

python summarize_directory.py <directory_path> [--gitignore <gitignore_path>] [--include <file_extensions>] [--exclude <file_extensions>] [-d|--show_docker] [-o|--show_only_docker]

Or with the package installed:

SummarizeGPT <directory_path> [--gitignore <gitignore_path>] [--include <file_extensions>] [--exclude <file_extensions>] [-d|--show_docker] [-o|--show_only_docker]

In my experience, the command is case-insensitive on Windows, so you can type summarizegpt for convenience; on Linux it must be typed as SummarizeGPT. -MSH

Where:

  • <directory_path> is the path to the directory to summarize.
  • <gitignore_path> is the path to the .gitignore file to use.
  • <file_extensions> is a comma-separated list of file extensions to include or exclude.
  • --show_docker includes Docker files.
  • --show_only_docker shows only Docker files.
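
The options listed above could be declared with `argparse` roughly as follows. This is a sketch that mirrors the documented flags; the names `build_parser` and `split_extensions` are hypothetical, and the tool's real parser may be organized differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="SummarizeGPT")
    parser.add_argument("directory", help="Path to the directory to summarize")
    parser.add_argument("--gitignore", help="Path to the .gitignore file to use")
    parser.add_argument("--include", help="Comma-separated extensions to include")
    parser.add_argument("--exclude", help="Comma-separated extensions to exclude")
    parser.add_argument("-d", "--show_docker", action="store_true",
                        help="Include Docker files")
    parser.add_argument("-o", "--show_only_docker", action="store_true",
                        help="Show only Docker files")
    return parser


def split_extensions(value):
    """Turn 'py,txt' into ['py', 'txt']; return None when the option is absent."""
    if not value:
        return None
    return [ext.strip().lstrip(".") for ext in value.split(",") if ext.strip()]


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.directory, split_extensions(args.include))
```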

Examples

Summarize the contents of the current directory:

python summarize_directory.py .

Summarize the contents of a directory and exclude files listed in a .gitignore file:

python summarize_directory.py /path/to/directory --gitignore /path/to/.gitignore
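
For readers curious how .gitignore-based exclusion can work, here is a deliberately simplified sketch built on `fnmatch`. The helpers `load_patterns` and `is_ignored` are hypothetical, and the matching covers only plain glob patterns: negation (`!`), anchored patterns, and `**` are not handled, so this is not a full implementation of gitignore semantics and may not match what the tool does.

```python
import fnmatch


def load_patterns(gitignore_text: str) -> list:
    """Keep non-empty, non-comment lines from a .gitignore file's text."""
    patterns = []
    for line in gitignore_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            patterns.append(line.rstrip("/"))  # treat "dir/" like "dir"
    return patterns


def is_ignored(rel_path: str, patterns: list) -> bool:
    """Return True if the path, or any component of it, matches a pattern."""
    parts = rel_path.replace("\\", "/").split("/")
    for pattern in patterns:
        if fnmatch.fnmatch(rel_path, pattern):
            return True
        if any(fnmatch.fnmatch(part, pattern) for part in parts):
            return True
    return False
```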

Summarize the contents of a directory and include only .py and .txt files:

python summarize_directory.py /path/to/directory --include py,txt

Exclude .xml and .js files from the summary:

python summarize_directory.py /path/to/directory --exclude xml,js
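
The include and exclude options can be thought of as a small predicate applied to each file name. The sketch below assumes that an include list acts as an allowlist and that an exclude list then removes matches from whatever remains; the function name `passes_extension_filter` is hypothetical and the tool may combine the two options differently.

```python
import os


def passes_extension_filter(filename, include_exts=None, exclude_exts=None):
    """Decide whether a file is kept, based only on its extension."""
    ext = os.path.splitext(filename)[1].lstrip(".").lower()
    if include_exts and ext not in include_exts:
        return False  # an include list was given and this extension is not on it
    if exclude_exts and ext in exclude_exts:
        return False  # explicitly excluded
    return True
```

Note that under this assumption a file without an extension, such as `Makefile`, is dropped whenever an include list is given.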

Summarize the contents of a directory and include Docker files:

python summarize_directory.py /path/to/directory -d

Summarize the contents of a directory and show only Docker files:

python summarize_directory.py /path/to/directory -o
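
The two Docker flags can be read as a three-way switch. The sketch below assumes that Docker files are left out by default, that `-d` adds them back, and that `-o` keeps nothing else. The naming rules in `is_docker_file` are guesses for illustration, and both function names are hypothetical rather than taken from the tool's source.

```python
def is_docker_file(filename: str) -> bool:
    """Guess whether a file name looks Docker-related (assumed naming rules)."""
    name = filename.lower()
    return (
        name.startswith("dockerfile")
        or name.startswith("docker-compose")
        or name == ".dockerignore"
    )


def keep_for_docker_flags(filename, show_docker=False, show_only_docker=False):
    """Apply the assumed -d / -o behavior to a single file name."""
    docker = is_docker_file(filename)
    if show_only_docker:
        return docker       # -o: keep Docker files and nothing else
    if docker:
        return show_docker  # Docker files appear only when -d is given
    return True             # ordinary files are always kept
```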

Output

The tool generates a file called Context_for_ChatGPT.md in the specified directory with the summary of the directory's contents. The file includes a tree view of the directory's subdirectories and files, as well as the contents of each file. The tool also outputs a summary of the generated markdown, including the total number of lines, characters, and bytes.
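
The line, character, and byte totals differ whenever the markdown contains non-ASCII text, since one character can occupy several bytes. A minimal sketch of how the three numbers could be computed, assuming UTF-8 output (the function name `markdown_stats` is hypothetical):

```python
def markdown_stats(markdown: str) -> dict:
    """Count lines, characters, and UTF-8 bytes of a markdown string."""
    return {
        "lines": len(markdown.splitlines()),
        "characters": len(markdown),
        "bytes": len(markdown.encode("utf-8")),
    }
```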

More Examples with AI Assistance

ChatGPT Prompt for Command:


Given this:
SummarizeGPT /path/to/directory --gitignore /path/to/.gitignore

Can you write me a prompt that implements these:
C:\dev\edge\plugins\storage_local
C:\dev\edge\.gitignore

Sure! Here's a prompt that implements those arguments:

python summarize_directory.py C:/dev/edge/plugins/storage_local --gitignore C:/dev/edge/.gitignore

This prompt summarizes the contents of the C:/dev/edge/plugins/storage_local directory and excludes files listed in the C:/dev/edge/.gitignore file. The output will be saved in a file called Context_for_ChatGPT.md in the C:/dev/edge/plugins/storage_local directory.

-ChatGPT

Copilot Prompt for Command:


The pattern below works well if you have Copilot: you can literally build on what is shown here.

# Given This:
SummarizeGPT /path/to/directory --gitignore /path/to/.gitignore

# Your Conditions
C:\dev\edge\plugins\inference_resource_alert
C:\dev\edge\.gitignore

SummarizeGPT C:/dev/edge/plugins/inference_resource_alert --gitignore C:/dev/edge/.gitignore

Just add your conditions in the code block above and provide an empty line. Copilot will do the rest.

Limitations

This tool does not interpret the contents of the files. It simply displays the file structure and raw contents. If you have large files or numerous files in your directory, the summary file can become quite large.

The tool also does not handle symbolic links, so it will not follow links to directories or files outside of the specified directory.
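
One way to get this behavior with the standard library is to walk the directory without following links and to skip symlinked files explicitly. The sketch below shows the general technique; `iter_regular_files` is a hypothetical name and is not taken from the tool's source.

```python
import os


def iter_regular_files(root: str):
    """Yield paths of regular files under root without following symlinks."""
    # followlinks=False (the default) keeps os.walk from descending into
    # directories that are symbolic links.
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symlinked files as well
            yield path
```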

Contributions

Contributions are welcome! Feel free to submit a pull request if you've made an improvement or fixed a bug.

Future Enhancements

TODO: Could you assist with implementing the following?

Add an argument that, when used, has an AI generate an AI-readable summary of each file instead of including its raw content. This could dramatically reduce the output file size for large directories or files, and make the summary more useful for quickly understanding the purpose and function of each file in the directory.

Perhaps this is something that can be co-written with OpenAI or Copilot.

For example, running this tool on its own source and pasting the output, together with the Future Enhancement above, into a ChatGPT 4 chat yielded the section below. -MSH


Here's a conceptual approach for implementing your request:

  1. Include a new argument in your argument parser named --ai_summary (or something similar) which, when used, will trigger generating AI-readable summaries instead of including the raw content of each file.

  2. In your get_file_contents function, add a condition to check if the --ai_summary argument is enabled. If it is, instead of adding the raw content of the file to file_contents, you would call an AI model to generate a summary of the file content, and then append that summary to file_contents.

To implement AI code summarization, you can use OpenAI's GPT-3 or GPT-4 API. Keep in mind that there will be some limitations due to the length of the text you can input to the API (for GPT-3, the maximum token limit is 4096 as of my knowledge cutoff in September 2021), so you might need to extract meaningful parts of the code or truncate it to fit within this limit.
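
The truncation step mentioned above could look something like the sketch below. It assumes a crude estimate of about four characters per token, which is only a rule of thumb; a real tokenizer would give exact counts, and the budget should also leave room for the model's response. The function name `truncate_for_model` and its parameters are hypothetical.

```python
def truncate_for_model(text: str, max_tokens: int = 4096,
                       chars_per_token: int = 4) -> str:
    """Cut text to roughly fit max_tokens using a characters-per-token estimate."""
    budget = max_tokens * chars_per_token
    if len(text) <= budget:
        return text
    cut = text.rfind("\n", 0, budget)  # prefer to cut at a line boundary
    if cut <= 0:
        cut = budget  # no usable newline: fall back to a hard cut
    return text[:cut]
```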

Here is a rough skeleton of how you could modify your script:

import argparse
import os
import openai

# Load API key from environment variable
api_key = os.environ.get('OPENAI_API_KEY')
if api_key is None:
    raise ValueError("OpenAI API key not found in environment variable.")

# Set the API key
openai.api_key = api_key

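def summarize_code(code):
    # Note: openai.Completion is the legacy interface (openai package < 1.0)
    response = openai.Completion.create(
        engine="text-davinci-003",  # Or whichever engine you want to use
        # Ask for a summary explicitly; raw code alone would just be continued
        prompt=f"Summarize the following code:\n\n{code}\n\nSummary:",
        max_tokens=100,  # Adjust as needed
        temperature=0.3
    )
    return response.choices[0].text.strip()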

def get_file_contents(directory, gitignore_file=None, include_exts=None, exclude_exts=None, show_docker=False, show_only_docker=False, ai_summary=False):
    # ... existing code ...

    for file in files:
        # ... existing code ...

        if ai_summary:
            # Generate AI summary instead of including raw content
            contents = summarize_code(contents)
        file_contents += f"## {file_path}\n\n```\n{remove_empty_lines(contents)}\n```\n\n"

    # ... existing code ...

def main():
    parser = argparse.ArgumentParser(description='Code summarization tool.')
    # ... existing arguments ...
    parser.add_argument('-ai', '--ai_summary', action='store_true', help='Generate AI summary of each file instead of its raw content')

Pretty cool, huh? Now someone just needs to implement it on a new branch and submit a pull request! -MSH

License

This project is licensed under the terms of the GPLv3 license.
