Consolidates and analyzes codebases for insights.
Codebase Digest
Codebase Digest is a command-line tool written in Python that helps you analyze and understand your codebase. It gives a structured overview of your project's directory structure, file sizes, and token counts, and can consolidate the content of all text-based files into a single output for easy analysis with Large Language Models (LLMs).
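To make the consolidation idea concrete, here is a minimal sketch of joining every text file under a directory into one string. It is an illustration only, not the tool's actual implementation: the function names `is_probably_text` and `consolidate`, the NUL-byte text heuristic, and the `=====` header format are all assumptions made for this example.

```python
import os
from pathlib import Path


def is_probably_text(path: Path, sample_size: int = 1024) -> bool:
    """Assumed heuristic: a file is text if its first bytes contain no NUL byte."""
    try:
        with open(path, "rb") as fh:
            return b"\x00" not in fh.read(sample_size)
    except OSError:
        return False


def consolidate(root: str) -> str:
    """Walk `root` and join every text file into one string with path headers."""
    parts = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sort in place so traversal order is deterministic
        for name in sorted(filenames):
            path = Path(dirpath) / name
            if not is_probably_text(path):
                continue  # skip binary files
            rel = path.relative_to(root).as_posix()
            text = path.read_text(encoding="utf-8", errors="replace")
            parts.append(f"===== {rel} =====\n{text}")
    return "\n\n".join(parts)
```

Calling `consolidate("my_project")` would return a single string that can be pasted into an LLM prompt or written to a file.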
Features
- Directory Tree Visualization: Generates a hierarchical tree view of your project's directory structure, optionally including file sizes and ignoring specified files/directories.
- Codebase Statistics: Calculates total files, directories, code size, and token counts for your project.
- File Content Consolidation: Consolidates the content of all text-based files into a single output file (useful for LLM analysis).
- .gitignore Support: Respects your .gitignore file to exclude unwanted files and directories from the analysis.
- Customizable Output: Choose between text or JSON output formats and customize the level of detail included.
- Colored Console Output: Provides a visually appealing and informative summary in the console.
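The tree visualization and ignore features listed above can be approximated in a few lines. The sketch below is hypothetical and simplified: `render_tree` is an invented name, and the ignore patterns are matched against file names with `fnmatch`, which covers only a small part of real .gitignore semantics (no negation, no directory-anchored rules).

```python
import fnmatch
import os


def render_tree(root, ignore=(), show_size=False, max_depth=None):
    """Return the directory tree under `root` as a list of text lines."""
    lines = [os.path.basename(os.path.abspath(root)) + "/"]

    def walk(directory, prefix, depth):
        if max_depth is not None and depth > max_depth:
            return
        # Drop any entry whose name matches one of the ignore patterns.
        entries = sorted(
            e for e in os.listdir(directory)
            if not any(fnmatch.fnmatch(e, pat) for pat in ignore)
        )
        for index, name in enumerate(entries):
            path = os.path.join(directory, name)
            last = index == len(entries) - 1
            connector = "└── " if last else "├── "
            if os.path.isdir(path):
                lines.append(f"{prefix}{connector}{name}/")
                walk(path, prefix + ("    " if last else "│   "), depth + 1)
            else:
                size = f" ({os.path.getsize(path)} bytes)" if show_size else ""
                lines.append(f"{prefix}{connector}{name}{size}")

    walk(root, "", 1)
    return lines
```

For example, `print("\n".join(render_tree(".", ignore=["*.pyc", ".git"], show_size=True)))` prints the current directory as a tree with file sizes.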
Installation
Option 1: Install via pip (Recommended)
pip install codebase-digest
Option 2: Clone the repository
git clone https://github.com/kamilstanuch/codebase-digest.git
cd codebase-digest
pip install -r requirements.txt
Usage
cdigest [path_to_directory] [options]
Options
path_to_directory: Path to the directory you want to analyze.
-d, --max-depth: Maximum depth for directory traversal.
-o, --output: Output format (text or json). Default: text.
-f, --file: Output file name (default: codebase_analysis.txt or codebase_analysis.json).
--show-tree: Show directory tree in console output (always included in text file output).
--show-size: Show file sizes in directory tree.
--show-ignored: Show ignored files and directories in tree.
--ignore-ext: Additional file extensions to ignore (e.g., .pyc .log).
--no-content: Exclude file contents from the output.
--include-git: Include .git directory in the analysis.
--max-size: Maximum allowed text content size in KB (default: 10240 KB).
Example:
cdigest my_project -d 3 -o json --show-size --ignore-ext .pyc .log
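To show how the documented options fit together, the sketch below models them with Python's standard `argparse` module and parses the example command. It mirrors only the flags and defaults listed above and is not the tool's real parser; `build_parser`, `output_file_name`, and the `"."` default for the path are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser that mirrors the options documented above."""
    p = argparse.ArgumentParser(prog="cdigest")
    # Defaulting the path to "." is an assumption, not documented behavior.
    p.add_argument("path_to_directory", nargs="?", default=".")
    p.add_argument("-d", "--max-depth", type=int, default=None)
    p.add_argument("-o", "--output", choices=["text", "json"], default="text")
    p.add_argument("-f", "--file", default=None)
    p.add_argument("--show-tree", action="store_true")
    p.add_argument("--show-size", action="store_true")
    p.add_argument("--show-ignored", action="store_true")
    p.add_argument("--ignore-ext", nargs="+", default=[])
    p.add_argument("--no-content", action="store_true")
    p.add_argument("--include-git", action="store_true")
    p.add_argument("--max-size", type=int, default=10240)  # in KB
    return p


def output_file_name(args: argparse.Namespace) -> str:
    """Pick the documented default file name when -f/--file is not given."""
    if args.file:
        return args.file
    extension = "json" if args.output == "json" else "txt"
    return f"codebase_analysis.{extension}"


if __name__ == "__main__":
    example = "my_project -d 3 -o json --show-size --ignore-ext .pyc .log"
    parsed = build_parser().parse_args(example.split())
    print(parsed)
    print(output_file_name(parsed))
```

In this model the example command analyzes `my_project` to a depth of 3, shows file sizes, skips `.pyc` and `.log` files, and writes JSON to the default file name.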
LLM Prompts for Enhanced Codebase Analysis
We've prepared a set of prompts to help you analyze and improve your codebase using Large Language Models. These prompts are stored in the prompt_library directory for easy access and management.
You can use these prompts with various LLM interfaces:
- Directly in the Cursor.sh IDE for an integrated development experience.
- With Gemini models for larger codebases (up to 2,097,152 tokens).
- In any other LLM interface of your choice.
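Before pasting a consolidated digest into a model, it can help to check roughly whether it fits the context window. The sketch below uses the 2,097,152-token figure quoted above together with an assumed average of four characters per token. That ratio is only a rule of thumb, real tokenizers differ by model and language, and the function names are hypothetical.

```python
import math

GEMINI_CONTEXT_TOKENS = 2_097_152  # figure quoted in the list above
CHARS_PER_TOKEN = 4  # assumed rough average; real tokenizers vary


def estimate_tokens(text: str) -> int:
    """Rough token estimate from character count (heuristic, not exact)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_in_context(text: str, limit: int = GEMINI_CONTEXT_TOKENS,
                    reserve: int = 0) -> bool:
    """True if the estimate plus `reserve` tokens for the prompt fits `limit`."""
    return estimate_tokens(text) + reserve <= limit
```

If the estimate exceeds the limit, options such as `--max-depth`, `--ignore-ext`, or `--no-content` can be used to shrink the output.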
Use Cases
- Codebase Mapping and Learning: Quickly understand the structure and functionality of a new or complex codebase.
- Improving User Stories: Analyze existing code to refine or generate user stories, ensuring better alignment between code and business requirements.
- Initial Security Analysis: Perform a preliminary security assessment to identify potential vulnerabilities.
- Code Quality Enhancement: Identify areas for improvement in code quality, readability, and maintainability.
- Documentation Generation: Automatically generate or improve codebase documentation.
- Learning Tool: Use as a teaching aid to explain complex coding concepts or architectures.
Prompt Categories
I. Code Quality & Understanding:
- Codebase Error and Inconsistency Analysis
- Codebase Risk Assessment
- Codebase Documentation Generation
II. Learning & Knowledge Extraction:
III. Code Improvement & Transformation:
- Codebase Best Practice Analysis
- Codebase Translation to Another Programming Language
- Codebase Refactoring for Improved Readability and Performance
IV. Testing & Security:
For detailed instructions on how to use these prompts, please refer to the individual files in the prompt_library directory.
Hashes for codebase_digest-0.1.30-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | fc2258ac1a28e61d58234acb257105f2d09567b73aa86c2baf43da042a19ddd8
MD5 | bb669d778c3fe84f5014e5c4af1e93c2
BLAKE2b-256 | 51d339b64318bb4b678aa741ce365f26410f75b341b018e1b241bb01f97a5abb