Consolidates and analyzes codebases for insights.
Codebase Digest
Codebase Digest is a command-line tool written in Python that helps you analyze and understand your codebase. It provides a structured overview of your project's directory structure, file sizes, token counts, and even consolidates the content of all text-based files into a single output for easy analysis with Large Language Models (LLMs).
Features
- Directory Tree Visualization: Generates a hierarchical tree view of your project's directory structure, optionally including file sizes and ignoring specified files/directories.
- Codebase Statistics: Calculates total files, directories, code size, and token counts for your project.
- File Content Consolidation: Consolidates the content of all text-based files into a single output file (useful for LLM analysis).
- .gitignore Support: Respects your .gitignore file to exclude unwanted files and directories from the analysis.
- Customizable Output: Choose between text and JSON output formats and control how much detail is included.
- Colored Console Output: Provides a visually appealing, informative summary in the console.
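To illustrate what a tree view with a depth limit, file sizes and extension filtering involves, here is a minimal standard-library sketch. It is not cdigest's implementation: the function name `build_tree` and its parameters are hypothetical and only loosely mirror the -d, --show-size and --ignore-ext options, and .gitignore handling is left out.

```python
from pathlib import Path


def build_tree(root, max_depth=None, show_size=False, ignore_exts=(), _depth=0, _prefix=""):
    """Return the lines of a text tree for `root` (hypothetical helper, not cdigest's API).

    max_depth, show_size and ignore_exts loosely mirror -d, --show-size and --ignore-ext.
    """
    if max_depth is not None and _depth >= max_depth:
        return []
    # Directories first, then files, each group sorted by name.
    entries = sorted(Path(root).iterdir(), key=lambda p: (p.is_file(), p.name))
    # Drop files whose extension is in the ignore list (directories are always kept).
    entries = [e for e in entries if not (e.is_file() and e.suffix in ignore_exts)]
    lines = []
    for i, entry in enumerate(entries):
        last = i == len(entries) - 1
        label = entry.name + ("/" if entry.is_dir() else "")
        if show_size and entry.is_file():
            label += f" ({entry.stat().st_size} B)"
        lines.append(_prefix + ("└── " if last else "├── ") + label)
        if entry.is_dir():
            # Children hang under a vertical bar unless this was the last entry.
            child_prefix = _prefix + ("    " if last else "│   ")
            lines.extend(build_tree(entry, max_depth, show_size, ignore_exts, _depth + 1, child_prefix))
    return lines
```

Calling `print("\n".join(build_tree(".", max_depth=2, show_size=True)))` prints the current directory two levels deep, with a byte size next to each file.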
Installation
Option 1: Install via pip (Recommended)
pip install codebase-digest
Option 2: Clone the repository
git clone https://github.com/kamilstanuch/codebase-digest.git
cd codebase-digest
pip install -r requirements.txt
Usage
cdigest [path_to_directory] [options]
Options
path_to_directory: Path to the directory you want to analyze.
-d, --max-depth: Maximum depth for directory traversal.
-o, --output: Output format (text or json). Default: text.
-f, --file: Output file name (default: codebase_analysis.txt or codebase_analysis.json).
--show-tree: Show directory tree in console output (always included in text file output).
--show-size: Show file sizes in directory tree.
--show-ignored: Show ignored files and directories in tree.
--ignore-ext: Additional file extensions to ignore (e.g., .pyc .log).
--no-content: Exclude file contents from the output.
--include-git: Include .git directory in the analysis.
--max-size: Maximum allowed text content size in KB (default: 10240 KB).
--copy-to-clipboard: Copy the output to clipboard.
Example:
cdigest my_project -d 3 -o json --show-size --ignore-ext .pyc .log
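The example above requests JSON output with extra ignored extensions. As a rough illustration of what consolidating a codebase under such options involves, here is a standard-library sketch. It is hypothetical, not cdigest's code: the `consolidate` function, its output keys, skipping (rather than truncating) files once the --max-size budget is used up, and the 4-characters-per-token estimate are all assumptions, so cdigest's real statistics are likely to differ. .gitignore rules are not applied here.

```python
import json
import os
from pathlib import Path


def consolidate(root, ignore_exts=(), max_size_kb=10240, include_content=True, output="text"):
    """Hypothetical sketch: collect stats and text content for files under `root`.

    ignore_exts, max_size_kb, include_content and output loosely mirror
    --ignore-ext, --max-size, --no-content and -o; they are not cdigest's API.
    """
    limit = max_size_kb * 1024  # budget for collected text, in bytes
    stats = {"files": 0, "directories": 0, "bytes": 0, "approx_tokens": 0}
    contents = {}
    used = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune .git in place so os.walk never descends into it (cf. --include-git).
        dirnames[:] = sorted(d for d in dirnames if d != ".git")
        stats["directories"] += len(dirnames)
        for name in sorted(filenames):
            path = Path(dirpath) / name
            if path.suffix in ignore_exts:
                continue
            stats["files"] += 1
            stats["bytes"] += path.stat().st_size
            try:
                text = path.read_text(encoding="utf-8")
            except (UnicodeDecodeError, OSError):
                continue  # treat undecodable files as binary: counted, not consolidated
            stats["approx_tokens"] += len(text) // 4  # crude heuristic, not a tokenizer
            size = len(text.encode("utf-8"))
            # Assumption: files that would exceed the remaining budget are skipped.
            if include_content and used + size <= limit:
                contents[path.relative_to(root).as_posix()] = text
                used += size
    if output == "json":
        return json.dumps({"stats": stats, "contents": contents}, indent=2)
    parts = [f"{key}: {value}" for key, value in stats.items()]
    for rel_path, text in contents.items():
        parts.append(f"\n=== {rel_path} ===\n{text}")
    return "\n".join(parts)
```

Passing `include_content=False` keeps only the statistics, which is the idea behind --no-content.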
LLM Prompts for Enhanced Codebase Analysis
We've prepared a set of prompts to help you analyze and improve your codebase using Large Language Models. These prompts are stored in the prompt_library directory for easy access and management.
You can use these prompts with various LLM interfaces:
- Directly in the Cursor.sh IDE for an integrated development experience.
- With Gemini models for larger codebases (up to 2,097,152 tokens).
- In any other LLM interface of your choice.
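Whichever interface you use, the basic workflow is to place a prompt's text in front of the consolidated digest and check that the result fits the model's context window. The sketch below does that with the rough 4-characters-per-token estimate; it does not call any LLM API, and the helper name `build_llm_input` and the separator line are hypothetical.

```python
# Context size quoted above for larger Gemini models; adjust for your model.
GEMINI_MAX_TOKENS = 2_097_152


def build_llm_input(prompt_text, digest_text, max_tokens=GEMINI_MAX_TOKENS):
    """Join a prompt and a codebase digest into one message (hypothetical helper)."""
    combined = f"{prompt_text.strip()}\n\n--- CODEBASE ---\n\n{digest_text}"
    approx_tokens = len(combined) // 4  # rough heuristic, not the model's tokenizer
    if approx_tokens > max_tokens:
        raise ValueError(
            f"~{approx_tokens} tokens exceeds the {max_tokens}-token budget; "
            "try --ignore-ext or --no-content to shrink the digest"
        )
    return combined
```

In practice, `prompt_text` would be read from one of the files in prompt_library and `digest_text` from cdigest's output file (codebase_analysis.txt by default).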
Use Cases
- Codebase Mapping and Learning: Quickly understand the structure and functionality of a new or complex codebase.
- Improving User Stories: Analyze existing code to refine or generate user stories, ensuring better alignment between code and business requirements.
- Initial Security Analysis: Perform a preliminary security assessment to identify potential vulnerabilities.
- Code Quality Enhancement: Identify areas for improvement in code quality, readability, and maintainability.
- Documentation Generation: Automatically generate or improve codebase documentation.
- Learning Tool: Use as a teaching aid to explain complex coding concepts or architectures.
- Business Alignment: Analyze how the codebase supports business objectives and identify potential improvements.
- Stakeholder Communication: Generate insights to facilitate discussions with non-technical stakeholders.
Prompt Categories
I. Code Quality & Understanding:
- Codebase Error and Inconsistency Analysis: Identify and analyze errors and inconsistencies in the codebase.
- Codebase Risk Assessment: Evaluate potential risks within the codebase.
- Codebase Documentation Generation: Automatically generate or improve codebase documentation.
II. Learning & Knowledge Extraction:
- User Story Reconstruction from Code: Reconstruct and structure user stories based on the codebase.
- Code-Based Mini-Lesson Generation: Create mini-lessons to explain complex coding concepts or architectures.
III. Code Improvement & Transformation:
- Codebase Best Practice Analysis: Analyze the codebase for good and bad programming practices.
- Codebase Translation to Another Programming Language: Translate the codebase from one programming language to another.
- Codebase Refactoring for Improved Readability and Performance: Suggest refactoring improvements for better readability and performance.
IV. Testing & Security:
- Unit Test Generation for Codebase: Generate unit tests for the provided codebase.
- Security Vulnerability Analysis of Codebase: Identify potential security vulnerabilities in the codebase.
V. Business & Stakeholder Analysis:
- Business Impact Analysis: Identify key features and their potential business impact.
- Stakeholder Persona Generation: Infer potential stakeholder personas based on codebase functionalities.
- Business Model Canvas Analysis: Analyze business implications using the Business Model Canvas framework.
- Value Proposition Canvas Analysis: Align technical features with user needs and benefits.
- SWOT Analysis: Evaluate the codebase's current state and future potential.
- Jobs to be Done (JTBD) Analysis: Understand core user needs and identify potential improvements.
- Lean Canvas Analysis: Evaluate business potential and identify areas for improvement or pivot.
- OKR (Objectives and Key Results) Analysis: Align codebase features with potential business objectives and key results.
- Customer Journey Map Analysis: Map how different parts of the codebase support various stages of the user's journey.
- Value Chain Analysis: Understand how the codebase supports the larger value creation process.
For detailed instructions on how to use these prompts, please refer to the individual files in the prompt_library directory.
Conclusion
This collection of prompts covers a wide range of analysis techniques, from technical code quality assessments to business-oriented evaluations. By using these prompts with Large Language Models, developers and stakeholders can gain valuable insights into their codebase and its alignment with business objectives.
Built Distribution
Hashes for codebase_digest-0.1.38-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 7f7b036d3620f4a2d828ba314f83dca74da1aa46f18f9d4e56571e38f1a6f1bb
MD5 | 06252b87d5cc29385522671de7531887
BLAKE2b-256 | bd1f086a240b410ac5efde4e6a9c6351dabf5f03c105333dfc47552a634da333