Enterprise-grade code dump utility for monorepos
create-dump
create-dump is a production-ready CLI tool for automated code archival in large-scale monorepos. It generates branded Markdown dumps with Git metadata, integrity checksums, flexible archiving, retention policies, path safety, full concurrency, and SRE-grade observability.
Designed for SRE-heavy environments (Telegram bots, microservices, monorepos), it ensures reproducible snapshots for debugging, forensics, compliance audits, and CI/CD pipelines. It also includes a rollback command to restore a project from a dump file.
Built for Python 3.11+, leveraging AnyIO, Pydantic, Typer, Rich, and Prometheus metrics.
⚡ Quick Start
Prerequisites
- Python: 3.11 or higher
- Git: Optional, but recommended for metadata and git ls-files support.
- Docker/Podman: Optional, if running via container.
Installation
Via PyPI:
pip install create-dump
Via Source (Dev):
git clone https://github.com/dhruv13x/create-dump.git
cd create-dump
pip install -e .[dev]
Run (The "Hello World")
Navigate to your project root and run:
create-dump
This creates a Markdown snapshot of your current directory in create_dump_output/ (or in the project root).
Demo Snippet
Copy-paste this to see it in action:
# 1. Install
pip install create-dump
# 2. Dump your current folder (skipping files matched by .gitignore, without a table of contents)
create-dump single --use-gitignore --no-toc
# 3. View the result
head -n 20 *_all_create_dump_*.md
✨ Features
Core
- Branded Markdown: Auto-generated TOC (list or tree), language detection, and metadata headers.
- Smart Collection: Respects .gitignore automatically. Use --git-ls-files for blazing fast, Git-native file discovery.
- Multi-Mode:
  - single: Dump one project/directory.
  - batch: Recursively dump multiple subprojects in a monorepo.
  - rollback: Restore a project from a dump file.
- Live Watch: Run with --watch to auto-update the dump whenever files change.
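As a rough illustration of Git-native discovery, the sketch below asks Git for its tracked files instead of walking the directory tree. It is not create-dump's actual collector: the helper names `parse_ls_files` and `list_tracked_files` are hypothetical, and only the `git ls-files -z` command itself is standard Git.

```python
import subprocess
from pathlib import Path


def parse_ls_files(output: bytes) -> list[str]:
    """Split NUL-delimited `git ls-files -z` output into relative paths."""
    return [p.decode("utf-8", "surrogateescape") for p in output.split(b"\0") if p]


def list_tracked_files(root: Path) -> list[str]:
    """Return files tracked by Git under `root`, or [] if Git cannot list them."""
    try:
        result = subprocess.run(
            ["git", "ls-files", "-z"],  # -z: NUL-separated, safe for odd filenames
            cwd=root,
            capture_output=True,
            check=True,
        )
    except (subprocess.CalledProcessError, FileNotFoundError):
        # Not a Git repository, git is not installed, or `root` does not exist.
        return []
    return parse_ls_files(result.stdout)
```

Because Git already knows which files are tracked, this avoids evaluating ignore patterns against every path on disk, which is the likely reason the Git-backed mode is described as the fast one.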
Performance
- Async & Concurrent: Powered by anyio with up to 16 parallel workers for massive speedups on large repos.
- Smart Caching: Hashes config and file metadata to skip processing unchanged files.
- Low Footprint: Optimized for CI/CD pipelines.
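The worker limit and the metadata-based cache described above can be sketched together. This is an illustration under stated assumptions, not the tool's implementation: it uses the standard library's asyncio as a stand-in for anyio, and `cache_key`, `process_all`, and the in-memory `cache` dict are hypothetical names.

```python
import asyncio
import hashlib
import json
from pathlib import Path

MAX_WORKERS = 16  # mirrors the "up to 16 parallel workers" figure above


def cache_key(config: dict, path: Path) -> str:
    """Hash the config plus file size and mtime; a change in any of them changes the key."""
    stat = path.stat()
    payload = json.dumps(
        {
            "config": config,
            "path": str(path),
            "size": stat.st_size,
            "mtime_ns": stat.st_mtime_ns,
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


async def process_all(paths: list[Path], config: dict, cache: dict[str, str]) -> list[Path]:
    """Process files with bounded concurrency; return the files that were actually read."""
    limiter = asyncio.Semaphore(MAX_WORKERS)
    processed: list[Path] = []

    async def worker(path: Path) -> None:
        async with limiter:
            key = cache_key(config, path)
            if key in cache:
                return  # same config and metadata as a previous run: skip
            # Blocking file I/O runs in a thread so the event loop stays free.
            content = await asyncio.to_thread(path.read_bytes)
            cache[key] = hashlib.sha256(content).hexdigest()
            processed.append(path)

    await asyncio.gather(*(worker(p) for p in paths))
    return processed
```

Calling `asyncio.run(process_all(paths, config, cache))` twice with the same `cache` reads every file on the first run and none on the second, as long as neither the config nor the files' size and modification time changed in between.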
Security & SRE
- Secret Scanning: Integrated detect-secrets scanning.
  - Fail on secret detection: --scan-secrets
  - Auto-redact secrets: --hide-secrets
- Safe Paths: Anti-traversal guards to prevent Zip-Slip attacks.
- Observability: Prometheus metrics server (default port 8000) and structured JSON logging.
- ChatOps: Native push notifications to Slack, Discord, Telegram, and ntfy.sh.
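A common way to build an anti-traversal guard is to resolve every candidate path and check that it still lies under the intended root. The sketch below shows that general technique; `safe_join` is a hypothetical helper, and create-dump's own checks may differ.

```python
from pathlib import Path


def safe_join(root: Path, relative: str) -> Path:
    """Resolve `relative` under `root`, rejecting any path that escapes it."""
    root = root.resolve()
    # resolve() collapses ".." segments and follows symlinks before the check.
    candidate = (root / relative).resolve()
    if not candidate.is_relative_to(root):
        raise ValueError(f"unsafe path outside {root}: {relative!r}")
    return candidate
```

With this check, `safe_join(root, "src/app.py")` returns a path inside `root`, while `"../evil.txt"` or an absolute path such as `"/etc/passwd"` raises `ValueError`. The same guard matters most when restoring files, because archive or dump entries are untrusted input.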
🛠️ Configuration
You can configure create-dump via CLI arguments, Environment Variables (loaded into config), or a TOML file (create_dump.toml or pyproject.toml).
Environment Variables & TOML Keys
Define these in the [tool.create-dump] section of pyproject.toml or create_dump.toml.
| Key | Type | Description | Default |
|---|---|---|---|
| dest | Path | Default output destination. | . |
| use_gitignore | Bool | Exclude files listed in .gitignore. | true |
| git_meta | Bool | Include Branch/Commit hash in header. | true |
| max_file_size_kb | Int | Skip files larger than this KB. | 5000 |
| excluded_dirs | List | Directories to always ignore (e.g., .git, node_modules). | [...] |
| metrics_port | Int | Port for Prometheus metrics. | 8000 |
| git_ls_files | Bool | Use Git index for file list. | false |
| scan_secrets | Bool | Enable secret scanning. | false |
| hide_secrets | Bool | Redact secrets if found. | false |
CLI Arguments (create-dump single)
| Flag | Shorthand | Description |
|---|---|---|
| --dest <path> | | Output directory. |
| --watch | | Enable live-watch mode. |
| --git-ls-files | | Use git ls-files (fastest collection). |
| --diff-since <ref> | | Dump only files changed since Git ref. |
| --scan-secrets | | Enable secret detection. |
| --hide-secrets | | Redact detected secrets (requires scan). |
| --secret-patterns | | Add custom regex patterns for secrets. |
| --scan-todos | | Extract TODO/FIXME comments into summary. |
| --archive | -a | Compress previous dumps into ZIP. |
| --compress | -c | Gzip the output (.md.gz). |
| --format <fmt> | | Output format (md or json). |
| --db-provider <type> | | Dump DB (postgres, mysql) alongside code. |
| --db-host <host> | | Database host (default: localhost). |
| --db-port <port> | | Database port. |
| --db-user <user> | | Database user. |
| --db-pass-env <var> | | Env var name containing DB password. |
| --notify-slack <url> | | Send webhook on completion. |
| --dry-run | -d | Simulate without writing files. |
Run create-dump --help for the full list.
🏗️ Architecture
Directory Tree
src/create_dump/
├── cli/ # Entry points (main, single, batch)
├── scanning/ # Secret scanning & security
├── collector/ # File gathering (glob/git)
├── workflow/ # Processing pipelines
├── writing/ # Output generation (MD/JSON)
├── archive/ # Compression & retention
├── rollback/ # Restore functionality
└── core.py # Config & Models
Data Flow
- CLI Entry: User invokes create-dump (via Typer).
- Config Load: Merges defaults, pyproject.toml, and CLI args.
- Collector: Finds files via glob or git ls-files. Applies ignores.
- Processor (Async):
  - Reads file content.
  - Middlewares: Secret Scan -> TODO Scan -> Language Detect.
- Writer: Aggregates processed files into a Markdown/JSON artifact.
- Post-Process:
  - Archiver: Rotates old dumps.
  - Notifier: Sends Slack/Discord alerts.
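The middleware stage in the flow above can be pictured as a chain of functions that each annotate a file record. The following sketch is purely illustrative: `FileRecord`, the regular expressions, and the extension map are hypothetical, and the simple secret pattern is a stand-in for detect-secrets rather than how the tool actually scans.

```python
import re
from dataclasses import dataclass, field
from pathlib import PurePath
from typing import Callable


@dataclass
class FileRecord:
    path: str
    content: str
    language: str = "text"
    todos: list[str] = field(default_factory=list)
    has_secret: bool = False


Middleware = Callable[[FileRecord], FileRecord]

# Hypothetical patterns for illustration only.
SECRET_RE = re.compile(r"(?i)(api[_-]?key|password)\s*=\s*['\"][^'\"]+['\"]")
TODO_RE = re.compile(r"#\s*(TODO|FIXME)\b[: \t]*(.*)")
LANGUAGES = {".py": "python", ".md": "markdown", ".toml": "toml"}


def scan_secrets(rec: FileRecord) -> FileRecord:
    rec.has_secret = bool(SECRET_RE.search(rec.content))
    return rec


def scan_todos(rec: FileRecord) -> FileRecord:
    rec.todos = [f"{m.group(1)}: {m.group(2).strip()}" for m in TODO_RE.finditer(rec.content)]
    return rec


def detect_language(rec: FileRecord) -> FileRecord:
    rec.language = LANGUAGES.get(PurePath(rec.path).suffix, "text")
    return rec


# Same order as the data flow: Secret Scan -> TODO Scan -> Language Detect.
PIPELINE: list[Middleware] = [scan_secrets, scan_todos, detect_language]


def process(rec: FileRecord) -> FileRecord:
    """Run one file record through every middleware in order."""
    for step in PIPELINE:
        rec = step(rec)
    return rec
```

Keeping each step as a plain function over one record means the steps can be reordered or extended without touching the collector or the writer.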
🐞 Troubleshooting
| Error Message | Possible Cause | Solution |
|---|---|---|
| No matching files found | .gitignore or exclude patterns are too aggressive. | Check patterns or run with --no-use-gitignore. |
| RecursionError | Deeply nested directory or symlink loop. | Use --exclude on the problematic path. |
| Secret detected in ... | Code contains an API key/password. | Rotate the key! Or use --hide-secrets / --secret-patterns to ignore/redact. |
| git ls-files failed | Not inside a Git repository. | Run git init or don't use --git-ls-files. |
| DB connection failed | Wrong credentials or host unreachable. | Check --db-host, --db-port, and --db-pass-env. |
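To see why a symlink loop can cause trouble, and to locate one before choosing an --exclude path, the sketch below walks a tree iteratively and visits each real directory only once. It is a generic diagnostic aid with a hypothetical `walk_files` helper, not part of create-dump.

```python
import os
from pathlib import Path


def walk_files(root: Path) -> list[Path]:
    """Walk `root` without recursion, following directory symlinks but never revisiting a directory."""
    seen: set[str] = set()
    stack = [root]
    files: list[Path] = []
    while stack:
        current = stack.pop()
        real = os.path.realpath(current)
        if real in seen:
            continue  # already visited via another path: this is where a loop would start
        seen.add(real)
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir():  # follows symlinks by default
                    stack.append(Path(entry.path))
                elif entry.is_file():
                    files.append(Path(entry.path))
    return sorted(files)
```

A naive recursive walk that follows symlinks would descend into the same directory again and again; tracking resolved paths in `seen` breaks the cycle.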
Debug Mode:
Run with -v or --verbose to see detailed logs and stack traces.
create-dump single -v
🤝 Contributing
We love contributions! Please check our CONTRIBUTING.md for details.
Dev Setup:
- Clone: git clone ...
- Install: pip install -e .[dev]
- Test: pytest
- Lint: ruff check .
🗺️ Roadmap
- Smart Caching: Re-use processing for unchanged files.
- Rollback Command: Restore projects from dumps.
- Database Dumps: Postgres/MySQL integration.
- S3/Cloud Upload: Direct upload of dumps/archives.
- PDF Export: Generate PDF reports instead of Markdown.
- Plugin System: Allow custom middleware via Python entrypoints.