LLM-optimized codebase snapshot generator
mdrepoatlas: Codebase to Markdown (LLM-Optimized Snapshot Generator)
mdrepoatlas converts a software project into a single structured Markdown document (code_base.md) designed for Large Language Models to navigate efficiently.
Instead of uploading repositories, zipping folders, or pasting fragments, mdrepoatlas produces a deterministic, navigable, AI-ready snapshot of your codebase.
Why mdrepoatlas Exists
LLMs do not understand repositories.
They understand documents.
Traditional repo exports create problems:
- ❌ Too many irrelevant files (node_modules, binaries)
- ❌ No navigation structure
- ❌ Context fragmentation
- ❌ Token waste
- ❌ Hard for LLMs to reason globally
mdrepoatlas solves this by generating a single authoritative document:
code_base.md
containing:
✅ Metadata header
✅ Project fingerprint detection
✅ Directory tree
✅ Language-grouped index
✅ Deterministic file ordering
✅ Binary/build exclusion
✅ Size-safe embedding
✅ Stable navigation anchors
The result is a document an LLM can read, internalize, and navigate efficiently.
Example Output
code_base.md
├── Metadata Header
├── Project Navigation Guide
├── Directory Structure
├── File Index (grouped by language)
└── Full Source Files
└── ### FILE: src/main.py (...)
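The "File Index (grouped by language)" section can be pictured with a minimal sketch. It assumes languages are inferred from file extensions. The `EXT_TO_LANGUAGE` table and `build_language_index` are hypothetical names for illustration, not mdrepoatlas internals.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Illustrative mapping only; the tool's real detection rules may differ.
EXT_TO_LANGUAGE = {
    ".py": "Python",
    ".js": "JavaScript",
    ".ts": "TypeScript",
    ".rs": "Rust",
    ".go": "Go",
    ".c": "C",
    ".f90": "Fortran",
}


def build_language_index(paths):
    """Group relative paths by language, with sorted groups and sorted paths."""
    groups = defaultdict(list)
    for path in sorted(paths):
        suffix = PurePosixPath(path).suffix.lower()
        groups[EXT_TO_LANGUAGE.get(suffix, "Other")].append(path)
    # Sorting the language names keeps the index order stable between runs.
    return dict(sorted(groups.items()))


index = build_language_index(["src/b.py", "src/a.py", "web/app.ts", "Makefile"])
```

Sorting both the group names and the paths inside each group is what makes such an index reproducible from one run to the next.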
Supported Projects
mdrepoatlas is framework-agnostic.
Works with:
- Python / Django / FastAPI
- React / Next.js / Node
- C / C++
- Fortran
- Rust / Go
- Mixed monorepos
- Research repositories
- Scientific computing projects
- Enterprise platforms
Installation
Clone:
git clone https://github.com/DavidoffichW/mdrepoatlas.git
cd mdrepoatlas
Install (editable)
pip install -e .
Usage
Interactive mode:
mdrepoatlas
Non-interactive:
mdrepoatlas /path/to/repo -t /path/to/output -o code_base.md
Exclude patterns (comma-separated; supports globs):
mdrepoatlas /path/to/repo -x "node_modules/**,dist/**,*.pdf"
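How mdrepoatlas matches these patterns internally is not documented here. As a rough sketch, assuming patterns are compared against POSIX-style relative paths with Python's `fnmatch`, a comma-separated exclude string could be handled like this (`parse_excludes` and `is_excluded` are hypothetical names, not part of mdrepoatlas):

```python
from fnmatch import fnmatch


def parse_excludes(raw):
    """Split a comma-separated pattern string into a clean list."""
    return [p.strip() for p in raw.split(",") if p.strip()]


def is_excluded(rel_path, patterns):
    """Return True if a relative POSIX path matches any pattern.

    Assumption: a pattern without a slash (e.g. "*.pdf") is tested against
    the file name only; other patterns are tested against the full relative
    path. fnmatch's "*" also matches "/", so "dist/**" covers nested files.
    """
    name = rel_path.rsplit("/", 1)[-1]
    for pattern in patterns:
        target = rel_path if "/" in pattern else name
        if fnmatch(target, pattern):
            return True
    return False


patterns = parse_excludes("node_modules/**, dist/**, *.pdf")
```

With these assumptions, `is_excluded("docs/manual.pdf", patterns)` is true and `is_excluded("src/main.py", patterns)` is false.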
Disable default exclusions:
mdrepoatlas /path/to/repo --no-default-excludes
Size limits:
mdrepoatlas /path/to/repo --max-file-bytes 1048576 --max-total-bytes 0
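The two limits can be read as a per-file cap and an overall budget. The sketch below is one possible interpretation, not the tool's actual logic. It assumes that a value of `0` means "no limit" and that files over a limit are summarized rather than embedded. `plan_embedding` is a hypothetical helper.

```python
def plan_embedding(files, max_file_bytes, max_total_bytes):
    """Decide which files to embed in full.

    files: list of (path, size_in_bytes) tuples, already in output order.
    Assumption: a limit of 0 means "no limit".
    Returns (embedded, summarized) lists of paths.
    """
    embedded, summarized = [], []
    total = 0
    for path, size in files:
        too_big = max_file_bytes and size > max_file_bytes
        over_budget = max_total_bytes and total + size > max_total_bytes
        if too_big or over_budget:
            summarized.append(path)
        else:
            embedded.append(path)
            total += size
    return embedded, summarized


files = [("src/main.py", 2_000), ("data/dump.sql", 5_000_000), ("README.md", 800)]
embedded, summarized = plan_embedding(files, max_file_bytes=1_048_576, max_total_bytes=0)
```

Under these assumptions, only `data/dump.sql` exceeds the 1 MiB per-file cap, so it is the only file summarized.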
No dependencies required.
Python 3.8+ recommended.
You will be prompted for:
| Prompt | Description |
|---|---|
| Source directory | Project root |
| Target directory | Output location |
| Excludes | Optional glob patterns |
| Default excludes | Skip builds/binaries |
| Size limits | Prevent huge files |
Example:
Source directory:
~/projects/my_app
Target directory:
~/exports
Exclude:
docs/build/**, *.csv
Output:
exports/code_base.md
Default Smart Exclusions
Automatically removes noise:
- .git/
- node_modules/
- virtual environments
- build artifacts
- binaries
- media files
- compiled objects
- caches
The LLM receives signal only.
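A minimal sketch of this kind of filtering, combined with deterministic ordering, is shown below. The directory names and suffixes are illustrative assumptions, not the tool's real default list, and `iter_source_files` is a hypothetical name.

```python
import os
import tempfile

# Illustrative assumptions, not the tool's real default list.
SKIP_DIRS = {".git", "node_modules", ".venv", "__pycache__", "build", "dist"}
SKIP_SUFFIXES = (".pyc", ".o", ".so", ".png", ".jpg", ".zip")


def iter_source_files(root):
    """Yield relative POSIX paths under root, skipping noise, in sorted order."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into skipped directories;
        # sorting makes the traversal order reproducible across runs.
        dirnames[:] = sorted(d for d in dirnames if d not in SKIP_DIRS)
        for name in sorted(filenames):
            if name.lower().endswith(SKIP_SUFFIXES):
                continue
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            yield rel.replace(os.sep, "/")


# Small demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as root:
    for rel in ("src/main.py", "src/util.py", "node_modules/x/index.js", "logo.png"):
        path = os.path.join(root, *rel.split("/"))
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w") as fh:
            fh.write("")
    kept = list(iter_source_files(root))
```

Pruning `dirnames` in place means large directories such as `node_modules/` are never traversed at all, which matters for speed on big repositories.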
Why This Works Well For LLMs
The generated document teaches the model how to read it.
Key design principles:
1. Deterministic Structure
Every file appears as:
### FILE: path/to/file.py (metadata)
LLMs can jump instantly.
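The same regularity makes the snapshot easy to process with scripts. The sketch below assumes each heading sits on its own line in exactly the form `### FILE: <path> (<metadata>)`. The metadata text in the sample is made up for illustration, and `index_file_sections` is a hypothetical helper.

```python
import re

# Assumes headings look exactly like "### FILE: <path> (<metadata>)".
FILE_HEADING = re.compile(
    r"^### FILE: (?P<path>.+?) \((?P<meta>.*)\)\s*$", re.MULTILINE
)


def index_file_sections(snapshot_text):
    """Map each file path to the character offset of its heading."""
    return {m.group("path"): m.start() for m in FILE_HEADING.finditer(snapshot_text)}


# The metadata inside the parentheses is invented for this example.
sample = (
    "### FILE: src/main.py (python, 12 lines)\n"
    "print('hello')\n"
    "### FILE: src/util.py (python, 3 lines)\n"
    "x = 1\n"
)
index = index_file_sections(sample)
```

The resulting offsets can be used to slice a single file's section out of a large snapshot without reading the rest.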
2. Navigation Before Content
Models first learn:
- project structure
- entrypoints
- languages
- priorities
before reading implementation.
3. Context Efficiency
Instead of scanning thousands of irrelevant files:
- binaries are omitted
- minified bundles are skipped
- oversized files are summarized
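The three cases above can be approximated with simple heuristics. This is a sketch under stated assumptions, not the tool's actual detection logic. It treats a NUL byte as a sign of binary content and a very long line as a sign of minification. The thresholds and the `classify` function are hypothetical.

```python
def looks_binary(data):
    """Heuristic: a NUL byte in the first 8 KiB suggests a binary file."""
    return b"\x00" in data[:8192]


def looks_minified(text, max_line_length=2000):
    """Heuristic: any extremely long line suggests a minified bundle."""
    return any(len(line) > max_line_length for line in text.splitlines())


def classify(data, max_file_bytes=1_048_576):
    """Return 'omit', 'skip', 'summarize', or 'embed' for raw file bytes."""
    if looks_binary(data):
        return "omit"
    text = data.decode("utf-8", errors="replace")
    if looks_minified(text):
        return "skip"
    if len(data) > max_file_bytes:
        return "summarize"
    return "embed"
```

For example, `classify(b"print('hi')\n")` returns `"embed"`, while a single 8,000-character line of JavaScript is classified as `"skip"`.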
Example Prompt for ChatGPT / Claude
After generating code_base.md, upload it and start with:
🔹 Recommended Initialization Prompt
You are now analyzing a full project snapshot.
The uploaded file `code_base.md` is an authoritative
LLM-optimized export of the repository.
Instructions:
1. Read the metadata header first.
2. Use the Directory Structure and Index sections to build a mental map.
3. Treat each "### FILE:" section as an independent module.
4. Do NOT assume missing files exist outside the snapshot.
5. Prefer entrypoints and core modules when reasoning.
First task:
Summarize the system architecture and identify primary subsystems.
🔹 Example Follow-up Prompts
Architecture understanding:
Explain the project architecture using only the snapshot.
Refactoring:
Identify architectural weaknesses and propose improvements.
Bug investigation:
Search for potential concurrency or state-management issues.
Feature design:
Design a new feature consistent with existing patterns.
Recommended LLM Workflow
- Run mdrepoatlas
- Upload code_base.md
- Initialize model using prompt above
- Work normally
You now have full-repo reasoning.
Design Philosophy
mdrepoatlas treats an LLM as:
a deterministic reader of structured technical documents.
The goal is not compression.
The goal is cognitive alignment between repository and model.
Comparison
| Method | Result |
|---|---|
| Upload repo | ❌ inconsistent |
| Paste files | ❌ fragmented |
| Zip archive | ❌ opaque |
| mdrepoatlas | ✅ structured understanding |
Roadmap
Planned improvements:
- pip installable CLI
- gitignore parsing
- incremental snapshots
- diff snapshots
- multi-document mode
- token estimation
- IDE integration
- local LLM pipeline support
Contributing
PRs welcome.
Good areas:
- language detection
- ordering heuristics
- performance
- additional exclusions
- LLM workflow research
License
MIT License.
Author
Created to bridge software engineering and AI reasoning workflows.