Consolidates and analyzes codebases for insights.
Codebase Digest
Codebase Digest is a command-line tool written in Python that helps you analyze and understand your codebase. It provides a structured overview of your project's directory structure, file sizes, token counts, and even consolidates the content of all text-based files into a single output for easy analysis with Large Language Models (LLMs).
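The consolidation step described above can be illustrated with a minimal sketch. This is not the tool's actual implementation: the function names `is_probably_text` and `consolidate`, the NUL-byte heuristic for detecting text files, and the `=== path ===` header format are all assumptions made for illustration.

```python
from pathlib import Path


def is_probably_text(path: Path, sample_size: int = 1024) -> bool:
    """Heuristic (an assumption): a file is text if its first bytes hold no NUL byte."""
    with path.open("rb") as handle:
        return b"\x00" not in handle.read(sample_size)


def consolidate(root: Path) -> str:
    """Concatenate every text file under root, each preceded by a path header."""
    sections = []
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        if not is_probably_text(path):
            continue  # skip binary files such as images or compiled artifacts
        body = path.read_text(encoding="utf-8", errors="replace")
        # The header format is hypothetical; it only marks where each file starts.
        sections.append(f"=== {path.relative_to(root).as_posix()} ===\n{body}")
    return "\n\n".join(sections)
```

The resulting string can be written to a file or pasted into an LLM interface as a single block of context.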
Features
- Directory Tree Visualization: Generates a hierarchical tree view of your project's directory structure, optionally including file sizes and ignoring specified files/directories.
- Codebase Statistics: Calculates total files, directories, code size, and token counts for your project.
- File Content Consolidation: Consolidates the content of all text-based files into a single output file (useful for LLM analysis).
- .gitignore Support: Respects your .gitignore file to exclude unwanted files and directories from the analysis.
- Customizable Output: Choose between text or JSON output formats and customize the level of detail included.
- Colored Console Output: Provides a visually appealing and informative summary in the console.
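To make the tree visualization and ignore features concrete, here is a rough sketch of how such a tree could be rendered. It is an illustration under stated assumptions, not the tool's own code: `build_tree` is a hypothetical helper, and the simple `fnmatch` name patterns used here cover only a small part of real .gitignore semantics.

```python
import fnmatch
from pathlib import Path


def build_tree(root: Path, ignore: tuple = (), show_size: bool = False, prefix: str = "") -> list:
    """Return the lines of a tree view of root, skipping names that match ignore patterns."""
    entries = sorted(
        (p for p in root.iterdir()
         if not any(fnmatch.fnmatch(p.name, pattern) for pattern in ignore)),
        key=lambda p: (p.is_file(), p.name),  # directories first, then by name
    )
    lines = []
    for index, entry in enumerate(entries):
        last = index == len(entries) - 1
        label = entry.name
        if entry.is_dir():
            label += "/"
        elif show_size:
            label += f" ({entry.stat().st_size} bytes)"
        lines.append(f"{prefix}{'└── ' if last else '├── '}{label}")
        if entry.is_dir():
            # Indent children; keep the vertical bar while siblings remain below.
            child_prefix = prefix + ("    " if last else "│   ")
            lines.extend(build_tree(entry, ignore, show_size, child_prefix))
    return lines
```

Calling `print("\n".join(build_tree(Path("my_project"), ignore=("*.pyc",), show_size=True)))` would print the tree for a directory named `my_project`.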
Installation
Option 1: Install via pip (Recommended)
pip install codebase-digest
Option 2: Clone the repository
git clone https://github.com/kamilstanuch/codebase-digest.git
cd codebase-digest
pip install -r requirements.txt
Usage
cdigest [path_to_directory] [options]
Options
path_to_directory: Path to the directory you want to analyze.
-d, --max-depth: Maximum depth for directory traversal.
-o, --output: Output format (text or json). Default: text.
-f, --file: Output file name (default: codebase_analysis.txt or codebase_analysis.json).
--show-tree: Show directory tree in console output (always included in text file output).
--show-size: Show file sizes in directory tree.
--show-ignored: Show ignored files and directories in tree.
--ignore-ext: Additional file extensions to ignore (e.g., .pyc .log).
--no-content: Exclude file contents from the output.
--include-git: Include .git directory in the analysis.
--max-size: Maximum allowed text content size in KB (default: 10240 KB).
--copy-to-clipboard: Copy the output to clipboard.
cdigest my_project -d 3 -o json --show-size --ignore-ext .pyc .log
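If you want to drive the tool from a script, the command above can be assembled programmatically. The sketch below uses only flags listed in the Options section; the helper name `build_cdigest_command` and its parameter names are hypothetical, chosen for this example.

```python
import shlex


def build_cdigest_command(path, max_depth=None, output_format="text",
                          show_size=False, ignore_ext=()):
    """Build an argument list for cdigest from the documented options."""
    command = ["cdigest", path]
    if max_depth is not None:
        command += ["-d", str(max_depth)]
    command += ["-o", output_format]
    if show_size:
        command.append("--show-size")
    if ignore_ext:
        command += ["--ignore-ext", *ignore_ext]
    return command


command = build_cdigest_command(
    "my_project", max_depth=3, output_format="json",
    show_size=True, ignore_ext=(".pyc", ".log"),
)
print(shlex.join(command))
# prints: cdigest my_project -d 3 -o json --show-size --ignore-ext .pyc .log
```

Passing the list to `subprocess.run(command, check=True)` would run it, assuming `cdigest` is installed and on your PATH.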
LLM Prompts for Enhanced Codebase Analysis
We've prepared a set of prompts to help you analyze and improve your codebase using Large Language Models. These prompts are stored in the prompt_library directory for easy access and management.
You can use these prompts with various LLM interfaces:
- Directly in the Cursor.sh IDE for an integrated development experience.
- With Gemini models for larger codebases (up to 2,097,152 tokens).
- In any other LLM interface of your choice.
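Before pasting a digest into a model, it can help to check roughly whether it fits the context window. The sketch below compares a crude estimate against the 2,097,152-token figure quoted above. The four-characters-per-token ratio and the reserve for the prompt and reply are assumptions; real tokenizers vary, so prefer the token count the tool itself reports when available.

```python
GEMINI_CONTEXT_TOKENS = 2_097_152  # limit quoted in the list above
CHARS_PER_TOKEN = 4  # rough assumption; actual tokenizers differ


def estimate_tokens(text: str) -> int:
    """Estimate the token count from the character count, rounding up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(text: str, limit: int = GEMINI_CONTEXT_TOKENS,
                    reserve: int = 8_192) -> bool:
    """Return True if the estimate plus a reserve for prompt and reply fits the limit."""
    return estimate_tokens(text) + reserve <= limit
```

If the check fails, options from the list above such as `--max-depth`, `--ignore-ext`, or `--no-content` can shrink the digest.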
Use Cases
- Codebase Mapping and Learning: Quickly understand the structure and functionality of a new or complex codebase.
- Improving User Stories: Analyze existing code to refine or generate user stories, ensuring better alignment between code and business requirements.
- Initial Security Analysis: Perform a preliminary security assessment to identify potential vulnerabilities.
- Code Quality Enhancement: Identify areas for improvement in code quality, readability, and maintainability.
- Documentation Generation: Automatically generate or improve codebase documentation.
- Learning Tool: Use as a teaching aid to explain complex coding concepts or architectures.
Prompt Categories
I. Code Quality & Understanding:
- Codebase Error and Inconsistency Analysis
- Codebase Risk Assessment
- Codebase Documentation Generation
II. Learning & Knowledge Extraction:
III. Code Improvement & Transformation:
- Codebase Best Practice Analysis
- Codebase Translation to Another Programming Language
- Codebase Refactoring for Improved Readability and Performance
IV. Testing & Security:
For detailed instructions on how to use these prompts, please refer to the individual files in the prompt_library directory.
Code Analysis Prompt Library
This repository contains a collection of prompts designed to assist in various aspects of code analysis, improvement, and documentation.
Available Prompts
- User Story Reconstruction (learning_user_story_reconstruction.md): Reconstruct and structure user stories based on the provided codebase.
- Risk Assessment (quality_risk_assessment.md): Identify potential risks within the codebase.
- Best Practice Analysis (improvement_best_practice_analysis.md): Analyze the codebase for good and bad programming practices.
- Language Translation (improvement_language_translation.md): Translate the codebase from one programming language to another.
- Refactoring (improvement_refactoring.md): Suggest refactoring improvements for better readability and performance.
- Unit Test Generation (testing_unit_test_generation.md): Generate unit tests for the provided codebase.
- Business Impact Analysis (business_impact_analysis.md): Analyze the codebase to identify key features and their potential business impact.
- Stakeholder Persona Generation (stakeholder_persona_generation.md): Infer potential stakeholder personas based on the functionalities present in the codebase.
Usage
To use these prompts, select the appropriate prompt file for your analysis needs and follow the instructions provided within each prompt.
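One way to combine a prompt file with a generated digest is sketched below. The helper `build_prompt` and the separator line are assumptions made for illustration; individual prompt files may specify their own way of supplying the codebase, in which case follow those instructions instead.

```python
from pathlib import Path


def build_prompt(prompt_file: Path, digest_file: Path) -> str:
    """Join a prompt's instructions with the codebase digest in one message."""
    instructions = prompt_file.read_text(encoding="utf-8").strip()
    digest = digest_file.read_text(encoding="utf-8").strip()
    # The separator is hypothetical; it only marks where the digest begins.
    return f"{instructions}\n\n--- CODEBASE DIGEST ---\n\n{digest}\n"
```

For example, `build_prompt(Path("prompt_library/quality_risk_assessment.md"), Path("codebase_analysis.txt"))` would pair the risk assessment prompt with the default text output file.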
Contributing
Contributions to expand and improve this prompt library are welcome. Please submit a pull request with your suggested changes or additions.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Hashes for codebase_digest-0.1.33-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 0ed91f471d2106ec778ea53ff7559aec0d465712955b7e41dad23b0699736c24
MD5 | fae3147f3e47392c85185834fdac5c55
BLAKE2b-256 | 30fbd885663b3443e9f6c9a051028b3bb1f44a82a766ccb6c13e8cc100e5fd83