
LLM-optimized codebase snapshot generator

Project description

mdrepoatlas: Codebase to Markdown (LLM-Optimized Snapshot Generator)

mdrepoatlas converts a software project into a single structured Markdown document (code_base.md) designed for large language models (LLMs) to navigate efficiently.

Instead of uploading repositories, zipping folders, or pasting fragments, mdrepoatlas produces a deterministic, navigable, AI-ready snapshot of your codebase.


Why mdrepoatlas Exists

LLMs do not understand repositories.

They understand documents.

Traditional repo exports create problems:

  • ❌ Too many irrelevant files (node_modules, binaries)
  • ❌ No navigation structure
  • ❌ Context fragmentation
  • ❌ Token waste
  • ❌ Hard for LLMs to reason globally

mdrepoatlas solves this by generating a single authoritative document:


code_base.md

containing:

✅ Metadata header
✅ Project fingerprint detection
✅ Directory tree
✅ Language-grouped index
✅ Deterministic file ordering
✅ Binary/build exclusion
✅ Size-safe embedding (per-file and total byte limits)
✅ Stable navigation anchors

The result is a document an LLM can read, internalize, and navigate efficiently.
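The "Project fingerprint detection" step is not documented in detail. Below is a minimal sketch of one common approach. It assumes detection works by looking for well-known marker files at the repository root. The `MARKERS` table and `fingerprint()` are hypothetical and are not mdrepoatlas's API.

```python
from typing import Iterable, List

# Hypothetical marker-file table; mdrepoatlas's real heuristics may differ.
MARKERS = {
    "pyproject.toml": "Python",
    "setup.py": "Python",
    "manage.py": "Django",
    "package.json": "Node.js",
    "next.config.js": "Next.js",
    "Cargo.toml": "Rust",
    "go.mod": "Go",
    "CMakeLists.txt": "C/C++ (CMake)",
}


def fingerprint(root_files: Iterable[str]) -> List[str]:
    """Return the sorted, de-duplicated ecosystems whose marker files are present."""
    return sorted({MARKERS[name] for name in root_files if name in MARKERS})


print(fingerprint(["package.json", "next.config.js", "README.md"]))
# ['Next.js', 'Node.js']
print(fingerprint(["pyproject.toml", "setup.py", "manage.py"]))
# ['Django', 'Python']
```

Sorting the result keeps the metadata header identical across runs, which fits the deterministic-output goal.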


Example Output


code_base.md
├── Metadata Header
├── Project Navigation Guide
├── Directory Structure
├── File Index (grouped by language)
└── Full Source Files
    └── ### FILE: src/main.py (...)
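The Directory Structure section uses box-drawing connectors like the outline above. The following sketch renders such a tree from a list of relative file paths, assuming forward-slash paths. `build_tree()` and `render_tree()` are hypothetical helpers and are not part of mdrepoatlas's documented interface.

```python
from typing import Dict, List


def build_tree(paths: List[str]) -> Dict[str, dict]:
    """Turn POSIX-style relative file paths into a nested dict (dirs map to dicts)."""
    root: Dict[str, dict] = {}
    for path in paths:
        node = root
        for part in path.split("/"):
            node = node.setdefault(part, {})
    return root


def render_tree(node: Dict[str, dict], prefix: str = "") -> List[str]:
    """Render the nested dict with ├──/└── connectors, children in sorted order."""
    lines: List[str] = []
    names = sorted(node)
    for i, name in enumerate(names):
        last = i == len(names) - 1
        child = node[name]
        label = name + "/" if child else name  # mark directories with a trailing slash
        lines.append(prefix + ("└── " if last else "├── ") + label)
        if child:
            lines.extend(render_tree(child, prefix + ("    " if last else "│   ")))
    return lines


paths = ["src/main.py", "src/utils/io.py", "README.md"]
print("\n".join(render_tree(build_tree(paths))))
# ├── README.md
# └── src/
#     ├── main.py
#     └── utils/
#         └── io.py
```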


Supported Projects

mdrepoatlas is framework-agnostic.

Works with:

  • Python / Django / FastAPI
  • React / Next.js / Node
  • C / C++
  • Fortran
  • Rust / Go
  • Mixed monorepos
  • Research repositories
  • Scientific computing projects
  • Enterprise platforms

Installation

Clone:

git clone https://github.com/DavidoffichW/mdrepoatlas.git
cd mdrepoatlas

Install (editable):

pip install -e .

Usage

Interactive mode:

mdrepoatlas

Non-interactive:

mdrepoatlas /path/to/repo -t /path/to/output -o code_base.md

Exclude patterns (comma-separated; supports globs):

mdrepoatlas /path/to/repo -x "node_modules/**,dist/**,*.pdf"
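mdrepoatlas's own matcher is not documented here. Below is a minimal sketch of how comma-separated glob excludes like the `-x` value above could be evaluated. It assumes paths are relative to the repository root and use forward slashes. `parse_excludes()` and `is_excluded()` are hypothetical names.

```python
from fnmatch import fnmatchcase
from typing import List


def parse_excludes(arg: str) -> List[str]:
    """Split a comma-separated exclude string into trimmed, non-empty glob patterns."""
    return [p.strip() for p in arg.split(",") if p.strip()]


def is_excluded(rel_path: str, patterns: List[str]) -> bool:
    """Return True if a POSIX-style relative path matches any pattern.

    In fnmatchcase, '*' also matches '/', so 'node_modules/**' covers everything
    below node_modules and '*.pdf' matches PDFs at any depth.
    """
    return any(fnmatchcase(rel_path, pat) for pat in patterns)


patterns = parse_excludes("node_modules/**, dist/**, *.pdf")
print(patterns)                                              # ['node_modules/**', 'dist/**', '*.pdf']
print(is_excluded("node_modules/react/index.js", patterns))  # True
print(is_excluded("docs/manual.pdf", patterns))              # True
print(is_excluded("src/main.py", patterns))                  # False
```

`fnmatchcase` is used instead of `fnmatch` so matching stays case-sensitive on every OS, which keeps results deterministic.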

Disable default exclusions:

mdrepoatlas /path/to/repo --no-default-excludes

Size limits:

mdrepoatlas /path/to/repo --max-file-bytes 1048576 --max-total-bytes 0
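The docs do not spell out how the two limits interact. Below is a minimal sketch that rests on two stated assumptions. First, a limit of 0 disables that check; the `--max-total-bytes 0` example above suggests this but does not confirm it. Second, files over a limit are set aside rather than truncated. `plan_embedding()` is a hypothetical name.

```python
from typing import List, Tuple


def plan_embedding(
    files: List[Tuple[str, int]],
    max_file_bytes: int,
    max_total_bytes: int,
) -> Tuple[List[str], List[str]]:
    """Split (path, size) pairs into files to embed and files to set aside.

    Assumption: a limit of 0 means "no limit" for that check.
    Files are visited in the given order, which should already be deterministic.
    """
    embedded: List[str] = []
    set_aside: List[str] = []
    total = 0
    for path, size in files:
        too_big = max_file_bytes > 0 and size > max_file_bytes
        over_budget = max_total_bytes > 0 and total + size > max_total_bytes
        if too_big or over_budget:
            set_aside.append(path)  # candidate for a summary instead of full text
            continue
        embedded.append(path)
        total += size
    return embedded, set_aside


files = [("src/main.py", 4_000), ("data/big.json", 2_000_000), ("README.md", 1_500)]
print(plan_embedding(files, max_file_bytes=1_048_576, max_total_bytes=0))
# (['src/main.py', 'README.md'], ['data/big.json'])
print(plan_embedding(files, max_file_bytes=0, max_total_bytes=5_000))
# (['src/main.py'], ['data/big.json', 'README.md'])
```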

No third-party dependencies are required.

Python 3.8+ is recommended.


In interactive mode, you will be prompted for:

  • Source directory: project root
  • Target directory: output location
  • Excludes: optional glob patterns
  • Default excludes: skip builds/binaries
  • Size limits: prevent huge files

Example:

Source directory:
~/projects/my_app

Target directory:
~/exports

Excludes:
docs/build/**, *.csv

Output:

~/exports/code_base.md

Default Smart Exclusions

Automatically removes noise:

  • .git/
  • node_modules/
  • virtual environments
  • build artifacts
  • binaries
  • media files
  • compiled objects
  • caches

The LLM receives only the signal, not the noise.
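The exact default list is not given here. The sketch below shows the two kinds of checks such exclusions usually combine: directory and extension rules, and a NUL-byte test for binaries. The sets are illustrative, and `excluded_by_default()` and `looks_binary()` are hypothetical names.

```python
from pathlib import PurePosixPath

# Illustrative defaults only; mdrepoatlas's actual list may differ.
DEFAULT_EXCLUDED_DIRS = {".git", "node_modules", ".venv", "venv", "__pycache__",
                         "build", "dist", ".pytest_cache"}
DEFAULT_EXCLUDED_SUFFIXES = {".png", ".jpg", ".gif", ".mp4", ".zip",
                             ".o", ".so", ".dll", ".exe", ".pyc", ".class"}


def excluded_by_default(rel_path: str) -> bool:
    """True if any parent directory or the file extension is on a default list."""
    p = PurePosixPath(rel_path)
    if any(part in DEFAULT_EXCLUDED_DIRS for part in p.parts[:-1]):
        return True
    return p.suffix.lower() in DEFAULT_EXCLUDED_SUFFIXES


def looks_binary(sample: bytes) -> bool:
    """Common heuristic: a NUL byte in the first few KB suggests binary content."""
    return b"\x00" in sample


print(excluded_by_default("node_modules/lodash/index.js"))  # True
print(excluded_by_default("assets/logo.PNG"))               # True
print(excluded_by_default("src/app.py"))                    # False
print(looks_binary(b"\x7fELF\x02\x01\x01\x00"))             # True
print(looks_binary(b"print('hi')\n"))                       # False
```

In practice `looks_binary` would be given only the first few kilobytes of each file, read in binary mode, so large files are never loaded just to be rejected.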


Why This Works Well For LLMs

The generated document includes a navigation guide that teaches the model how to read it.

Key design principles:

1. Deterministic Structure

Every file appears as:

### FILE: path/to/file.py (metadata)

This lets an LLM jump directly to any file by searching for its header.
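Below is a sketch of emitting these sections in a deterministic order. It assumes files are sorted by relative path and that the metadata consists of line and byte counts. The real header fields are not specified here, and `render_files()` is hypothetical.

```python
from typing import Dict, List

FENCE = "`" * 3  # built indirectly so this example nests cleanly inside Markdown


def render_files(files: Dict[str, str]) -> str:
    """Emit one '### FILE:' section per file, ordered by path.

    The '(lines: N, bytes: M)' metadata is a hypothetical stand-in for
    whatever fields mdrepoatlas actually writes.
    """
    out: List[str] = []
    for path in sorted(files):  # sorting makes the output order deterministic
        text = files[path]
        n_lines = len(text.splitlines())
        n_bytes = len(text.encode("utf-8"))
        out.append(f"### FILE: {path} (lines: {n_lines}, bytes: {n_bytes})")
        out.append(FENCE)
        out.append(text.rstrip("\n"))
        out.append(FENCE)
        out.append("")
    return "\n".join(out)


files = {"src/util.py": "def f():\n    return 1\n", "README.md": "# Demo\n"}
print(render_files(files).splitlines()[0])
# ### FILE: README.md (lines: 1, bytes: 7)
```

Because the order depends only on the paths, two runs over the same tree produce identical headers. This is what makes the "### FILE:" lines reliable search targets.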


2. Navigation Before Content

Models first learn:

  • project structure
  • entrypoints
  • languages
  • priorities

before reading implementation.
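The language-grouped index is one way the document can show languages before any implementation. The sketch below groups paths by file extension. The extension table is a small hypothetical sample, not mdrepoatlas's detection logic, and `language_index()` and `render_index()` are hypothetical names.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Dict, List

# Illustrative extension table (hypothetical and far from exhaustive).
EXT_TO_LANG = {".py": "Python", ".js": "JavaScript", ".ts": "TypeScript",
               ".tsx": "TypeScript", ".c": "C", ".h": "C", ".cpp": "C++",
               ".f90": "Fortran", ".rs": "Rust", ".go": "Go", ".md": "Markdown"}


def language_index(paths: List[str]) -> Dict[str, List[str]]:
    """Group paths by language; languages and paths are both in sorted order."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for path in sorted(paths):
        lang = EXT_TO_LANG.get(PurePosixPath(path).suffix.lower(), "Other")
        groups[lang].append(path)
    return dict(sorted(groups.items()))


def render_index(groups: Dict[str, List[str]]) -> str:
    """Format the groups as a plain-text index with per-language counts."""
    lines: List[str] = []
    for lang, members in groups.items():
        lines.append(f"{lang} ({len(members)})")
        lines.extend(f"  - {p}" for p in members)
    return "\n".join(lines)


print(render_index(language_index(["src/app.ts", "main.py", "lib/util.py", "Makefile"])))
# Other (1)
#   - Makefile
# Python (2)
#   - lib/util.py
#   - main.py
# TypeScript (1)
#   - src/app.ts
```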


3. Context Efficiency

Instead of scanning thousands of irrelevant files:

  • binaries are omitted
  • minified bundles skipped
  • oversized files summarized
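How minified bundles are recognized is not documented. Below is a sketch of a typical heuristic. It assumes that a ".min." filename or a very high average line length marks a file as minified. `looks_minified()` and the 300-character threshold are hypothetical.

```python
from pathlib import PurePosixPath


def looks_minified(rel_path: str, text: str, max_avg_line: int = 300) -> bool:
    """Heuristic guess at whether a file is a minified bundle.

    Hypothetical rule: a '.min.' filename, or an average line length above
    max_avg_line characters, counts as minified.
    """
    if ".min." in PurePosixPath(rel_path).name:
        return True
    lines = text.splitlines() or [""]  # avoid dividing by zero on empty files
    avg = sum(len(line) for line in lines) / len(lines)
    return avg > max_avg_line


print(looks_minified("dist/app.min.js", "x"))                            # True
print(looks_minified("src/app.js", "const a = 1;\nconsole.log(a);\n"))   # False
print(looks_minified("dist/bundle.js", "var a=1;" * 100))                # True
```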

Example Prompt for ChatGPT / Claude

After generating code_base.md, upload it and start with:


🔹 Recommended Initialization Prompt

You are now analyzing a full project snapshot.

The uploaded file `code_base.md` is an authoritative
LLM-optimized export of the repository.

Instructions:
1. Read the metadata header first.
2. Use the Directory Structure and Index sections to build a mental map.
3. Treat each "### FILE:" section as an independent module.
4. Do NOT assume missing files exist outside the snapshot.
5. Prefer entrypoints and core modules when reasoning.

First task:
Summarize the system architecture and identify primary subsystems.


🔹 Example Follow-up Prompts

Architecture understanding:

Explain the project architecture using only the snapshot.

Refactoring:

Identify architectural weaknesses and propose improvements.

Bug investigation:

Search for potential concurrency or state-management issues.

Feature design:

Design a new feature consistent with existing patterns.

Recommended LLM Workflow

  1. Run mdrepoatlas
  2. Upload code_base.md
  3. Initialize model using prompt above
  4. Work normally

The model can now reason across the whole repository, as long as the snapshot fits in its context window.


Design Philosophy

mdrepoatlas treats an LLM as:

a deterministic reader of structured technical documents.

The goal is not compression.

The goal is cognitive alignment between repository and model.


Comparison

  • Upload repo: ❌ inconsistent
  • Paste files: ❌ fragmented
  • Zip archive: ❌ opaque
  • mdrepoatlas: ✅ structured understanding

Roadmap

Planned improvements:

  • pip installable CLI
  • .gitignore parsing
  • incremental snapshots
  • diff snapshots
  • multi-document mode
  • token estimation
  • IDE integration
  • local LLM pipeline support

Contributing

PRs welcome.

Good areas:

  • language detection
  • ordering heuristics
  • performance
  • additional exclusions
  • LLM workflow research

License

MIT License.


Author

Created to bridge software engineering and AI reasoning workflows.


Download files

Source Distribution

mdrepoatlas-0.1.2.tar.gz (14.5 kB)

Uploaded Source

Built Distribution


mdrepoatlas-0.1.2-py3-none-any.whl (12.2 kB)

Uploaded Python 3

File details

Details for the file mdrepoatlas-0.1.2.tar.gz.

File metadata

  • Download URL: mdrepoatlas-0.1.2.tar.gz
  • Upload date:
  • Size: 14.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.2

File hashes

Hashes for mdrepoatlas-0.1.2.tar.gz
  • SHA256: b04a7204dd4fd40ab5707f5e3784921bcb500b09b9c9c94a011537a178a83bd4
  • MD5: 4aa9e056fd4a853086f184d6f4108ab5
  • BLAKE2b-256: 51bbd293aab32ffe4611c508379ea70e240e448c5d64b014444c9adba050b356


File details

Details for the file mdrepoatlas-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: mdrepoatlas-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 12.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.2

File hashes

Hashes for mdrepoatlas-0.1.2-py3-none-any.whl
  • SHA256: fb401f5ef536935e8b5a1c5c2657e87a4be67b11e20307684d37d3203dbba0a5
  • MD5: e89480f5cf93b3820b362d421296c7c2
  • BLAKE2b-256: 2a93101b62087470ca4f3bba163cc91b1e920bbb37b22f1ca077dc7fd4110745

