
Consolidates and analyzes codebases for insights.


Codebase Digest

Codebase Digest is a command-line tool written in Python that helps you analyze and understand your codebase. It reports your project's directory structure, file sizes, and token counts, and it can consolidate the content of all text-based files into a single output for analysis with Large Language Models (LLMs).

Features

  • Directory Tree Visualization: Generates a hierarchical tree view of your project's directory structure, optionally including file sizes and ignoring specified files/directories.
  • Codebase Statistics: Calculates total files, directories, code size, and token counts for your project.
  • File Content Consolidation: Consolidates the content of all text-based files into a single output file (useful for LLM analysis).
  • .gitignore Support: Respects your .gitignore file to exclude unwanted files and directories from the analysis.
  • Customizable Output: Choose between text or JSON output formats and customize the level of detail included.
  • Colored Console Output: Prints a color-coded summary of the analysis in the console.
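
To make the tree and statistics features concrete, here is a minimal sketch of how such a summary could be computed with the standard library alone. It is not Codebase Digest's actual implementation: the function name `summarize`, the `ignored_names` parameter, and the exact tree formatting are assumptions chosen for illustration, and it skips entries by name rather than parsing .gitignore.

```python
from pathlib import Path


def summarize(root, ignored_names=(".git",)):
    """Return (tree_lines, stats) for the directory at `root`.

    Hypothetical helper for illustration; not part of codebase-digest.
    """
    root = Path(root)
    lines = [root.name + "/"]
    stats = {"files": 0, "directories": 0, "bytes": 0}

    def walk(directory, prefix):
        # Directories first, then files, each group sorted by name.
        entries = sorted(
            (e for e in directory.iterdir() if e.name not in ignored_names),
            key=lambda e: (e.is_file(), e.name),
        )
        for index, entry in enumerate(entries):
            last = index == len(entries) - 1
            connector = "└── " if last else "├── "
            if entry.is_dir():
                stats["directories"] += 1
                lines.append(f"{prefix}{connector}{entry.name}/")
                # Continue the vertical guide only if siblings follow.
                walk(entry, prefix + ("    " if last else "│   "))
            else:
                size = entry.stat().st_size
                stats["files"] += 1
                stats["bytes"] += size
                lines.append(f"{prefix}{connector}{entry.name} ({size} B)")

    walk(root, "")
    return lines, stats
```

Calling `summarize("my_project")` returns the tree as a list of lines plus a dictionary of totals, which can then be printed or serialized to JSON.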

Installation

Option 1: Install via pip (Recommended)

pip install codebase-digest

Option 2: Clone the repository

git clone https://github.com/kamilstanuch/codebase-digest.git
cd codebase-digest
pip install -r requirements.txt

Usage

cdigest [path_to_directory] [options]

Options

path_to_directory: Path to the directory you want to analyze.
-d, --max-depth: Maximum depth for directory traversal.
-o, --output: Output format (text or json). Default: text.
-f, --file: Output file name (default: codebase_analysis.txt or codebase_analysis.json).
--show-tree: Show directory tree in console output (always included in text file output).
--show-size: Show file sizes in directory tree.
--show-ignored: Show ignored files and directories in tree.
--ignore-ext: Additional file extensions to ignore (e.g., .pyc .log).
--no-content: Exclude file contents from the output.
--include-git: Include .git directory in the analysis.
--max-size: Maximum allowed text content size in KB (default: 10240 KB).
--copy-to-clipboard: Copy the output to clipboard.

Example:

cdigest my_project -d 3 -o json --show-size --ignore-ext .pyc .log
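
If you want to drive the tool from a script, the sketch below assembles a command line from the options listed above and runs it with `subprocess`. The helper names `build_cdigest_command` and `run_cdigest` are hypothetical, only a few of the documented flags are covered, and the sketch assumes `cdigest` is installed and on your PATH.

```python
import subprocess


def build_cdigest_command(path, max_depth=None, output=None,
                          show_size=False, ignore_ext=()):
    """Build an argument list using flags documented in the Options list.

    Hypothetical helper for illustration; not part of codebase-digest.
    """
    command = ["cdigest", str(path)]
    if max_depth is not None:
        command += ["-d", str(max_depth)]
    if output is not None:
        command += ["-o", output]
    if show_size:
        command.append("--show-size")
    if ignore_ext:
        # --ignore-ext takes one or more extensions, e.g. .pyc .log
        command += ["--ignore-ext", *ignore_ext]
    return command


def run_cdigest(*args, **kwargs):
    """Run cdigest and raise CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_cdigest_command(*args, **kwargs), check=True)
```

For instance, `build_cdigest_command("my_project", max_depth=3, output="json", show_size=True, ignore_ext=[".pyc", ".log"])` produces the same arguments as the example invocation above.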

LLM Prompts for Enhanced Codebase Analysis

We've prepared a set of prompts to help you analyze and improve your codebase using Large Language Models. These prompts are stored in the prompt_library directory for easy access and management.

You can use these prompts with various LLM interfaces:

  • Directly in the Cursor.sh IDE for an integrated development experience.
  • With Gemini models for larger codebases (context windows of up to 2,097,152 tokens).
  • In any other LLM interface of your choice.
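
Before pasting a consolidated digest into a model, it can help to check roughly whether it fits the context window. The sketch below uses a crude heuristic of about four characters per token, which is an assumption and not how any particular tokenizer works; real counts vary by model and by language. The function names are hypothetical, and the default limit is the 2,097,152-token figure mentioned above.

```python
import math


def estimate_tokens(text, chars_per_token=4):
    """Rough token estimate; the 4-characters-per-token ratio is an assumption."""
    return math.ceil(len(text) / chars_per_token)


def fits_context(text, limit=2_097_152, chars_per_token=4):
    """Return True if the estimated token count is within `limit`."""
    return estimate_tokens(text, chars_per_token) <= limit
```

Treat the result as a first-pass check only. The token counts reported by Codebase Digest itself, or by the model provider's own tokenizer, are the better guide when a digest is close to the limit.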

Use Cases

  1. Codebase Mapping and Learning: Quickly understand the structure and functionality of a new or complex codebase.
  2. Improving User Stories: Analyze existing code to refine or generate user stories, ensuring better alignment between code and business requirements.
  3. Initial Security Analysis: Perform a preliminary security assessment to identify potential vulnerabilities.
  4. Code Quality Enhancement: Identify areas for improvement in code quality, readability, and maintainability.
  5. Documentation Generation: Automatically generate or improve codebase documentation.
  6. Learning Tool: Use as a teaching aid to explain complex coding concepts or architectures.
  7. Business Alignment: Analyze how the codebase supports business objectives and identify potential improvements.
  8. Stakeholder Communication: Generate insights to facilitate discussions with non-technical stakeholders.

Prompt Categories

I. Code Quality & Understanding:

II. Learning & Knowledge Extraction:

III. Code Improvement & Transformation:

IV. Testing & Security:

V. Business & Stakeholder Analysis:

For detailed instructions on how to use these prompts, please refer to the individual files in the prompt_library directory.

Conclusion

This collection of prompts covers a wide range of analysis techniques, from technical code quality assessments to business-oriented evaluations. By using these prompts with Large Language Models, developers and stakeholders can gain valuable insights into their codebase and its alignment with business objectives.

