A tool to generate comprehensive Markdown artifacts of directory structures and file contents
codemapper
Overview
CodeMapper is a Python script that creates a single Markdown document representing the structure and contents of a given directory or GitHub repository. It gives developers, AI systems, and analysts a quick, thorough overview of a codebase's layout and content.
See audio explainers for this project:
- Podcasts, auto-generated by Gemini (NotebookLM)
Features
- Generates a hierarchical table of contents based on file structure
- Creates an accurate file tree representation of the directory structure
- Produces code blocks for each file's contents with appropriate syntax highlighting
- Respects .gitignore rules when processing files and directories
- Excludes .git directories by default
- Supports various file types with appropriate code fence highlighting
- Handles file encoding detection for accurate content reading
- Provides an option to include files normally ignored by .gitignore
- Can clone and analyze GitHub repositories
- Saves output in a '_mapped' directory
- Automatically acknowledges large and binary files without printing their contents
- Displays file type and size information for large and binary files
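As an illustration of the last two features, here is a minimal sketch of how large and binary files might be acknowledged without printing their contents. The size threshold, the NUL-byte heuristic, the placeholder wording, and the function names are all assumptions made for this example; the README does not state how CodeMapper implements this.

```python
import os

# Hypothetical cutoff; the tool's real threshold is not stated in this README.
LARGE_FILE_BYTES = 1_000_000


def looks_binary(path, sample_size=8192):
    """Heuristic: treat a file as binary if its first bytes contain a NUL."""
    with open(path, "rb") as handle:
        return b"\0" in handle.read(sample_size)


def human_size(num_bytes):
    """Format a byte count with a binary-prefixed unit."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB"):
        if size < 1024 or unit == "GiB":
            return f"{size:.1f} {unit}"
        size /= 1024


def acknowledge(path):
    """Return a placeholder line for a large or binary file, else None."""
    size = os.path.getsize(path)
    if looks_binary(path):
        kind = "binary"
    elif size > LARGE_FILE_BYTES:
        kind = "large"
    else:
        return None  # small text file: its contents would be printed instead
    ext = os.path.splitext(path)[1] or "no extension"
    return f"[{kind} file, type: {ext}, size: {human_size(size)}; contents omitted]"
```

A caller would print the returned placeholder in place of a code block whenever `acknowledge` returns a string.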
Requirements
- Python 3.6+
- pathspec library (for handling .gitignore rules)
- chardet library (for file encoding detection)
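To show what the two libraries contribute, the sketch below filters paths with pathspec and decodes file bytes with chardet. The helper names `make_ignore_matcher` and `read_text` are hypothetical, and the fallbacks used when a library is missing are only crude stand-ins so the example runs anywhere; this is not CodeMapper's actual code.

```python
import fnmatch
import os

try:
    import pathspec  # third-party: pip install pathspec
except ImportError:
    pathspec = None
try:
    import chardet  # third-party: pip install chardet
except ImportError:
    chardet = None


def make_ignore_matcher(patterns):
    """Return a function that reports whether a relative path is ignored."""
    if pathspec is not None:
        # "gitwildmatch" is pathspec's name for .gitignore-style patterns.
        spec = pathspec.PathSpec.from_lines("gitwildmatch", patterns)
        return spec.match_file

    # Stand-in when pathspec is missing: glob against the basename only.
    def fallback(rel_path):
        name = os.path.basename(rel_path)
        return any(fnmatch.fnmatch(name, pattern) for pattern in patterns)

    return fallback


def read_text(path):
    """Decode a file using chardet's guess, falling back to UTF-8."""
    with open(path, "rb") as handle:
        raw = handle.read()
    guess = chardet.detect(raw)["encoding"] if chardet is not None else None
    return raw.decode(guess or "utf-8", errors="replace")
```

In practice the patterns would come from the lines of a project's .gitignore file, for example `make_ignore_matcher(open(".gitignore").read().splitlines())`.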
Installation
From PyPI
You can install CodeMapper directly from PyPI using pip:
pip install codemapper
From Source
- Clone this repository:
  git clone https://github.com/yourusername/codemapper.git
- Install the required dependencies:
  pip install pathspec chardet
Usage
Run the script from the command line, providing the path to the directory or GitHub repository URL you want to analyze:
codemapper <path_to_directory_or_github_url> [--include-ignored]
Options
- <path_to_directory_or_github_url>: The path to the directory or GitHub repository URL you want to analyze (required)
- --include-ignored: Include files that are normally ignored by .gitignore (optional)
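A command line with this shape could be parsed roughly as follows. This is a sketch using the standard argparse module; the destination name `target` and the `is_github_url` helper are assumptions for illustration, and the real tool may detect repository URLs differently.

```python
import argparse


def build_parser():
    """Build a parser for one positional target and one optional flag."""
    parser = argparse.ArgumentParser(prog="codemapper")
    parser.add_argument(
        "target",
        metavar="path_to_directory_or_github_url",
        help="directory path or GitHub repository URL to analyze",
    )
    parser.add_argument(
        "--include-ignored",
        action="store_true",
        help="include files normally ignored by .gitignore",
    )
    return parser


def is_github_url(target):
    """Hypothetical check for deciding whether to clone before mapping."""
    return target.startswith(("https://github.com/", "git@github.com:"))


if __name__ == "__main__":
    args = build_parser().parse_args()
    source = "GitHub repository" if is_github_url(args.target) else "local directory"
    print(f"Mapping {source}: {args.target} (include ignored: {args.include_ignored})")
```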
Output
The script generates a Markdown file named <directory_name>codemap.md in the '_mapped' directory. This file contains:
- A table of contents for easy navigation
- A file tree representation of the directory structure
- The contents of each file, formatted with appropriate syntax highlighting
- Information about large and binary files (type and size) without their contents
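The pieces listed above can be sketched in a few lines. The example below builds an indented file tree and a fenced section for one file; the extension-to-language map, the heading format, and the function names are assumptions for illustration rather than the layout CodeMapper actually emits.

```python
import os

# Hypothetical extension-to-fence-language map; extend as needed.
FENCE_LANGS = {".py": "python", ".md": "markdown", ".yml": "yaml", ".json": "json"}


def file_tree(root):
    """Return an indented listing of root, skipping .git directories."""
    lines = [os.path.basename(os.path.abspath(root)) + "/"]

    def walk(directory, depth):
        for name in sorted(os.listdir(directory)):
            if name == ".git":
                continue
            full = os.path.join(directory, name)
            is_dir = os.path.isdir(full)
            lines.append("  " * depth + name + ("/" if is_dir else ""))
            if is_dir:
                walk(full, depth + 1)

    walk(root, 1)
    return "\n".join(lines)


def file_section(root, rel_path):
    """Return a heading plus a fenced code block holding one file's contents."""
    ext = os.path.splitext(rel_path)[1]
    lang = FENCE_LANGS.get(ext, "")  # unknown extensions get a plain fence
    full = os.path.join(root, rel_path)
    with open(full, encoding="utf-8", errors="replace") as handle:
        body = handle.read()
    fence = "`" * 3
    return f"### {rel_path}\n\n{fence}{lang}\n{body.rstrip()}\n{fence}\n"
```

A full document would concatenate the tree with one `file_section` per file and write the result into the '_mapped' directory.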
Example use and output:
codemapper https://github.com/shaneholloman/ansible-role-apache
Use Cases
- Quickly understand the structure of new or unfamiliar projects
- Generate documentation for your codebase
- Facilitate code reviews by providing a comprehensive overview
- Aid in onboarding new team members to a project
- Assist AI systems in analyzing and understanding codebases
- Analyze GitHub repositories without needing to clone them manually
TODO
- Add support for creating directly from repo url
- Find a good way to include images in the artifacts. One option is to base64-encode them directly into the Markdown file, but that could consume a lot of tokens at prompt time. Suggestions welcome.
- Add support for other Git hosting services (e.g., GitLab, Bitbucket)
- Implement a progress indicator for cloning/analyzing large repositories
- Improve the table of contents, which needs work in some cases; we may add some ignore rules
- Use changelog.md for version history
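For the image-embedding idea in the list above, the size cost can be estimated before committing to it. The sketch below encodes image bytes as a data URI and computes the base64 length for a given input size; the function names are hypothetical, and it measures characters only, since token counts depend on the tokenizer in use.

```python
import base64


def image_data_uri(raw, mime="image/png"):
    """Encode raw image bytes as a data URI that Markdown can embed."""
    encoded = base64.b64encode(raw).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def base64_length(num_bytes):
    """Number of base64 characters for num_bytes of input, padding included."""
    return 4 * ((num_bytes + 2) // 3)


# Base64 emits 4 characters per 3 input bytes, so text grows by about a third.
print(base64_length(300_000))  # 400000
```

A 300,000-byte image would therefore add 400,000 characters to the Markdown file, which suggests a size cap or an opt-in flag if this TODO is implemented.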
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Thanks to the pathspec and chardet libraries for making this tool possible.
Version History
- 3.2.0 (2024-09-23):
  - Improved handling of large and binary files:
    - Large and binary files are now always acknowledged without attempting to print their contents
    - File type and size information is displayed for large and binary files
  - Removed option to include large file contents as it's not practical for binary files
  - Simplified command-line options by removing flags related to large file handling
  - Added PyPI installation support
[For full version history, see changelog.md]
Don't forget to star this repository if you find it useful!
File details
Details for the file codemapper-3.2.3.tar.gz.
File metadata
- Download URL: codemapper-3.2.3.tar.gz
- Upload date:
- Size: 11.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.9
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | c3b2ca07b15f27cf40e7d6281cdb861a9a1819cab5433a92ef213f4e62f91eab |
MD5 | 00910e9c947a97b45e891b5968d8f7bb |
BLAKE2b-256 | 58d11c8ef17cfe7057ee241c9644606697070d96894e018c7dc9fd537db72db8 |
File details
Details for the file codemapper-3.2.3-py3-none-any.whl.
File metadata
- Download URL: codemapper-3.2.3-py3-none-any.whl
- Upload date:
- Size: 9.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.9
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 60bfc369f21491f82ab16a075bd3e67b42c03492bad6cf7a486df926510e691e |
MD5 | 3f6aaea7b6cd5206e1e7c91b27937843 |
BLAKE2b-256 | 5a17e4e74238a13551b342b65684655fbc47546e08fec09b03b5a1dabf7d7174 |