Consolidates and analyzes codebases for insights.
Codebase Digest
Codebase Digest is a command-line tool written in Python that helps you analyze and understand your codebase. It provides a structured overview of your project's directory structure, file sizes, token counts, and even consolidates the content of all text-based files into a single output for easy analysis with Large Language Models (LLMs).
Table of Contents
- Features
- Installation
- Usage
- Configuration
- Ignore Functionality
- LLM Prompts for Enhanced Analysis
- Contributing
- License
Features
- 📊 Directory Tree Visualization: Generate a hierarchical view of your project structure
- 📈 Codebase Statistics: Calculate total files, directories, code size, and token counts
- 📄 File Content Consolidation: Combine all text-based files into a single output
- 🚫 Flexible Ignore System: Support for custom patterns, defaults, and .gitignore files
- 🎨 Multiple Output Formats: Choose between text, JSON, Markdown, XML, or HTML
- 🌈 Colored Console Output: Visually appealing and informative summaries
- 🧠 LLM Analysis Support: Comprehensive prompt library for in-depth codebase analysis
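To make the directory tree feature concrete, here is a minimal sketch of a depth-limited tree renderer. It is an illustration only, not the tool's actual implementation: the function name `render_tree` and its `max_depth` parameter are hypothetical, and the real output layout may differ.

```python
import os
from typing import List, Optional


def render_tree(root: str, max_depth: Optional[int] = None) -> List[str]:
    """Return indented lines describing the directory tree under root.

    Hypothetical helper for illustration; not part of codebase-digest.
    """
    lines = [os.path.basename(os.path.abspath(root)) + "/"]

    def walk(path: str, depth: int) -> None:
        # Stop descending once the requested depth is exceeded.
        if max_depth is not None and depth > max_depth:
            return
        for name in sorted(os.listdir(path)):
            full = os.path.join(path, name)
            indent = "  " * depth
            if os.path.isdir(full):
                lines.append(f"{indent}{name}/")
                walk(full, depth + 1)
            else:
                lines.append(f"{indent}{name}")

    walk(root, 1)
    return lines


if __name__ == "__main__":
    print("\n".join(render_tree(".", max_depth=2)))
```

Entries are sorted by name so that the output is stable between runs.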
Installation
Via pip (Recommended)
pip install codebase-digest
From source
git clone https://github.com/kamilstanuch/codebase-digest.git
cd codebase-digest
pip install -r requirements.txt
Usage
Basic usage:
cdigest [path_to_directory] [options]
Examples:
- Analyze a project with default settings:
  cdigest /path/to/my_project
- Analyze with custom depth and output format:
  cdigest /path/to/my_project -d 3 -o markdown
- Ignore specific files and folders:
  cdigest /path/to/my_project --ignore "*.log" "temp_folder" "config.ini"
- Show file sizes and include git directory:
  cdigest /path/to/my_project --show-size --include-git
- Analyze and copy output to clipboard:
  cdigest /path/to/my_project --copy-to-clipboard
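If you want to drive the command from a script, the examples above can be assembled programmatically. The sketch below is one possible approach: `build_cdigest_command` and `run_cdigest` are hypothetical helper names, and only flags documented in this README (`-d`, `-o`, `-f`, `--ignore`) are used. It assumes `cdigest` is installed and on your `PATH`.

```python
import subprocess
from typing import List, Optional, Sequence


def build_cdigest_command(
    path: str,
    max_depth: Optional[int] = None,
    output_format: Optional[str] = None,
    output_file: Optional[str] = None,
    ignore: Optional[Sequence[str]] = None,
) -> List[str]:
    """Build an argument list for the cdigest CLI (hypothetical helper)."""
    cmd = ["cdigest", path]
    if max_depth is not None:
        cmd += ["-d", str(max_depth)]
    if output_format:
        cmd += ["-o", output_format]
    if output_file:
        cmd += ["-f", output_file]
    if ignore:
        cmd += ["--ignore", *ignore]
    return cmd


def run_cdigest(cmd: List[str]) -> subprocess.CompletedProcess:
    """Run the command; raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    command = build_cdigest_command(
        "/path/to/my_project", max_depth=3, output_format="markdown"
    )
    print(" ".join(command))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with patterns such as `*.log`.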
Configuration
Option | Description |
---|---|
path_to_directory | Path to the directory you want to analyze |
-d, --max-depth | Maximum depth for directory traversal |
-o, --output-format | Output format (text, json, markdown, xml, or html). Default: text |
-f, --file | Output file name |
--show-size | Show file sizes in directory tree |
--show-ignored | Show ignored files and directories in tree |
--ignore | Patterns to ignore (e.g., '*.pyc' '.venv' 'node_modules') |
--keep-defaults | Keep default ignore patterns when using --ignore |
--no-content | Exclude file contents from the output |
--include-git | Include .git directory in the analysis |
--max-size | Maximum allowed text content size in KB (default: 10240 KB) |
--copy-to-clipboard | Copy the output to clipboard |
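As a rough illustration of what a size limit like `--max-size` implies, the sketch below sums file sizes and compares the total against a limit in KB. This is an assumption-laden example: the helper names are hypothetical, and how the tool itself measures text content or reacts when the limit is exceeded is not described here.

```python
import os
from typing import Iterable

# Default taken from the options table above (10240 KB).
DEFAULT_MAX_SIZE_KB = 10240


def total_size_kb(paths: Iterable[str]) -> float:
    """Return the combined on-disk size of the given files in KB."""
    return sum(os.path.getsize(p) for p in paths) / 1024


def within_max_size(paths: Iterable[str], max_size_kb: float = DEFAULT_MAX_SIZE_KB) -> bool:
    """Return True if the combined size does not exceed the limit.

    Hypothetical check for illustration; not codebase-digest's own logic.
    """
    return total_size_kb(paths) <= max_size_kb
```

A check like this can be useful before pasting consolidated output into an LLM with a limited context window.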
Ignore Functionality
Default Ignore Patterns
The following patterns are ignored by default:
DEFAULT_IGNORE_PATTERNS = [
    '*.pyc', '*.pyo', '*.pyd', '__pycache__',  # Python
    'node_modules', 'bower_components',        # JavaScript
    '.git', '.svn', '.hg', '.gitignore',       # Version control
    'venv', '.venv', 'env',                    # Virtual environments
    '.idea', '.vscode',                        # IDEs
    '*.log', '*.bak', '*.swp', '*.tmp',        # Temporary and log files
    '.DS_Store',                               # macOS
    'Thumbs.db',                               # Windows
    'build', 'dist',                           # Build directories
    '*.egg-info',                              # Python egg info
    '*.so', '*.dylib', '*.dll'                 # Compiled libraries
]
Custom Ignore Patterns
You can specify additional patterns to ignore using the --ignore option. These patterns will be added to the default ignore patterns unless --no-default-ignores is used.
Patterns can use wildcards (* and ?) and can be:
- Filenames (e.g., 'file.txt')
- Directory names (e.g., 'node_modules')
- File extensions (e.g., '*.pyc')
- Paths (e.g., '/path/to/ignore')
Example:
cdigest /path/to/my_project --ignore "*.txt" "temp" "/path/to/specific/file.py"
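To see how wildcard patterns of these kinds can be evaluated, here is a minimal sketch built on Python's standard `fnmatch` module. It assumes shell-style matching against both the full path and each path component; `is_ignored` is a hypothetical name, and the tool's real matching rules may differ.

```python
import fnmatch
import os
from typing import Iterable


def is_ignored(path: str, patterns: Iterable[str]) -> bool:
    """Return True if the path or any of its components matches a pattern.

    Hypothetical helper for illustration; not codebase-digest's matcher.
    """
    normalized = path.replace(os.sep, "/")
    parts = [part for part in normalized.split("/") if part]
    for pattern in patterns:
        # Match the whole path (covers '*.pyc' and explicit paths).
        if fnmatch.fnmatchcase(normalized, pattern):
            return True
        # Match individual file or directory names (covers 'node_modules').
        if any(fnmatch.fnmatchcase(part, pattern) for part in parts):
            return True
    return False


if __name__ == "__main__":
    patterns = ["*.pyc", "node_modules", "/path/to/ignore"]
    for candidate in ["src/app.pyc", "node_modules/lib/index.js", "src/main.py"]:
        print(candidate, is_ignored(candidate, patterns))
```

`fnmatchcase` is used instead of `fnmatch.fnmatch` so that matching is case-sensitive on every platform.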
.cdigestignore File
You can create a .cdigestignore file in your project root to specify project-specific ignore patterns. Each line in this file will be treated as an ignore pattern.
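The one-pattern-per-line format can be read with a few lines of Python. The sketch below is a minimal illustration with a hypothetical function name, `load_ignore_file`. Skipping blank lines and trimming surrounding whitespace are assumptions, not documented behavior.

```python
from pathlib import Path
from typing import List


def load_ignore_file(project_root: str, filename: str = ".cdigestignore") -> List[str]:
    """Read ignore patterns, one per line, from the project root.

    Hypothetical helper for illustration; not codebase-digest's own loader.
    """
    ignore_path = Path(project_root) / filename
    if not ignore_path.is_file():
        return []
    patterns = []
    for line in ignore_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line:  # assumption: blank lines carry no pattern
            patterns.append(line)
    return patterns
```

Returning an empty list when the file is missing lets callers treat the file as optional.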
Overriding Default Ignores
To use only your custom ignore patterns without the default ones, use the --no-default-ignores option:
cdigest /path/to/my_project --no-default-ignores --ignore "custom_pattern" "another_pattern"
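The relationship between default and custom patterns can be sketched as a simple merge. This is an illustrative example only: `effective_patterns` is a hypothetical name, the `DEFAULTS` list is an abbreviated stand-in for the full default list, and removing duplicates is an assumption rather than documented behavior.

```python
from typing import Iterable, List

# Abbreviated stand-in for the default ignore list, for illustration only.
DEFAULTS = ["node_modules", ".git", "build", "dist"]


def effective_patterns(
    custom: Iterable[str],
    use_defaults: bool = True,
    defaults: Iterable[str] = DEFAULTS,
) -> List[str]:
    """Combine default and custom patterns, dropping duplicates in order.

    Hypothetical helper for illustration; not codebase-digest's own logic.
    """
    combined = (list(defaults) if use_defaults else []) + list(custom)
    # dict.fromkeys keeps the first occurrence of each pattern, in order.
    return list(dict.fromkeys(combined))


if __name__ == "__main__":
    print(effective_patterns(["custom_pattern", "another_pattern"]))
    print(effective_patterns(["custom_pattern", "another_pattern"], use_defaults=False))
```

With `use_defaults=False`, only the custom patterns remain, which mirrors the intent of overriding the defaults.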
LLM Prompts for Enhanced Analysis
Codebase Digest includes a comprehensive set of prompts in the prompt_library directory to help you analyze your codebase using Large Language Models. These prompts cover various aspects of code analysis and business alignment:
Use Cases
- Codebase Mapping and Learning: Quickly understand the structure and functionality of a new or complex codebase.
- Improving User Stories: Analyze existing code to refine or generate user stories.
- Initial Security Analysis: Perform a preliminary security assessment.
- Code Quality Enhancement: Identify areas for improvement in code quality, readability, and maintainability.
- Documentation Generation: Automatically generate or improve codebase documentation.
- Learning Tool: Use as a teaching aid to explain complex coding concepts or architectures.
- Business Alignment: Analyze how the codebase supports business objectives.
- Stakeholder Communication: Generate insights to facilitate discussions with non-technical stakeholders.
Prompt Categories
- Code Quality & Understanding
- Learning & Knowledge Extraction
- Code Improvement & Transformation
- Testing & Security
- Business & Stakeholder Analysis
For detailed instructions on using these prompts, refer to the individual files in the prompt_library directory.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
This project is licensed under the MIT License - see the LICENSE file for details.