A CLI tool to export and import schema definitions and data from CockroachDB in SQL, JSON, YAML, or chunked CSV formats.
crdb-dump
A feature-rich CLI for exporting and importing CockroachDB schemas and data. Includes support for parallel chunked exports, manifest checksums, BYTES/UUID/ARRAY/VECTOR types, multi-schema (non-public) objects, permission introspection, secure resumable imports, S3-compatible storage (MinIO, Cohesity), region-aware filtering, and automatic retry logic.
Requires Python 3.10+.
⚠️ Breaking changes in 0.4.0
- All object names are now three-part `database.schema.table` (filenames, manifests, resume-log keys, and `--tables` input). Objects in non-`public` schemas are now exported and restored correctly. Two-part `--tables` input now means `schema.table` (the database is taken from `--db`), not the old `database.table`. Use `db.schema.table` to be explicit, or a bare `table` for the `public` schema.
- Data chunk files are now named `db.schema.table_NNN.csv|sql`; manifests are `db.schema.table.manifest.json`. Pre-0.4.0 dumps are not compatible.
See CHANGELOG.md for the full list.
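The 0.4.0 rule for `--tables` input can be illustrated with a small resolver. This is a minimal sketch, not crdb-dump's own code: `resolve_table_name` is a hypothetical helper, and it assumes identifiers contain no quoted dots.

```python
def resolve_table_name(name: str, default_db: str, default_schema: str = "public") -> str:
    """Expand a --tables entry to a three-part db.schema.table name.

    Hypothetical helper illustrating the 0.4.0 rule. It splits on '.', so it
    does not handle quoted identifiers that themselves contain dots.
    """
    parts = name.strip().split(".")
    if any(not p for p in parts):
        raise ValueError(f"empty name component in {name!r}")
    if len(parts) == 1:
        # bare table -> public schema of the --db database
        return f"{default_db}.{default_schema}.{parts[0]}"
    if len(parts) == 2:
        # two-part input is schema.table (not database.table as before 0.4.0)
        return f"{default_db}.{parts[0]}.{parts[1]}"
    if len(parts) == 3:
        # already explicit: db.schema.table
        return ".".join(parts)
    raise ValueError(f"too many name components in {name!r}")
```

For example, with `--db=mydb`, `users` resolves to `mydb.public.users` and `sales.orders` to `mydb.sales.orders`.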
🚀 Features
- ✅ Schema export: tables, views, sequences, enums (objects in any schema, not just `public`)
- ✅ Full-database dumps use native `SHOW CREATE ALL TABLES` / `SHOW CREATE ALL TYPES` (dependency-ordered, FK constraints validated post-load)
- ✅ Data export: CSV or SQL with chunking, gzip, and ordering
- ✅ Types: handles BYTES, UUIDs, STRING[], TIMESTAMP, enums, VECTOR
- ✅ Schema output formats: `sql`, `json`, `yaml`
- ✅ Resumable `COPY`-based imports with chunk-level tracking
- ✅ Permission exports: roles, grants, role memberships
- ✅ Parallel loading (`--parallel-load`) and manifest verification
- ✅ Dry-run for schema or chunk loading
- ✅ TLS and insecure auth supported
- ✅ Schema diff support (`--diff`)
- ✅ Full logging to `logs/crdb_dump.log`
- ✅ Automatic retry logic with exponential backoff for transient failures
- ✅ Fault-tolerant, resumable imports with `--resume-log` or `--resume-log-dir`
- ✅ Region-aware export/import via `--region`
- ✅ S3-compatible storage support (`--use-s3`) with MinIO, Cohesity, or AWS
- ✅ CSV header validation (`--validate-csv`)
- ✅ Python-based S3 bucket creation (via `boto3`) for MinIO
📦 Installation
pip install crdb-dump
🧪 Local Testing
./test-local.sh
This script will:
- Start a multi-region demo CockroachDB cluster
- Create test schema + data
- Export schema and chunked data (CSV)
- Verify chunk checksums
- Dry-run and real import with retry/resume
- Upload chunks to MinIO (S3-compatible)
- Download and verify import from S3
- Use Python (`boto3`) to create S3 buckets
🔧 CLI Overview
crdb-dump --help
crdb-dump export --help
crdb-dump load --help
Example usage:
crdb-dump export --db=mydb --data --per-table
crdb-dump load --db=mydb --schema=... --data-dir=... --resume-log=resume.json
🔐 Connection
export CRDB_URL="cockroachdb://root@localhost:26257/defaultdb?sslmode=disable"
# or
export CRDB_URL="postgresql://root@localhost:26257/defaultdb?sslmode=disable"
Alternatively:
--db mydb --host localhost --certs-dir ~/certs
Use --print-connection to verify the resolved URL.
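To inspect what a `CRDB_URL` contains before running a command, a standard-library check like the one below can help. It is a hypothetical sketch, not the logic behind `--print-connection`; it only splits the URL into parts, masks any password, and leaves the port as `None` when the URL omits it (the driver's default then applies).

```python
from urllib.parse import parse_qs, urlsplit


def describe_crdb_url(url: str) -> dict:
    """Split a cockroachdb:// or postgresql:// URL into parts, masking the password."""
    parts = urlsplit(url)
    if parts.scheme not in ("cockroachdb", "postgresql", "postgres"):
        raise ValueError(f"unexpected scheme: {parts.scheme!r}")
    # keep the last value if a query parameter is repeated
    query = {key: values[-1] for key, values in parse_qs(parts.query).items()}
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "password": "****" if parts.password else None,
        "host": parts.hostname,
        "port": parts.port,  # None if the URL has no explicit port
        "database": parts.path.lstrip("/") or None,
        "sslmode": query.get("sslmode"),
    }
```

Usage: `describe_crdb_url(os.environ["CRDB_URL"])` after `import os`.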
🏗 Export Options
crdb-dump export \
--db=mydb \
--per-table \
--data \
--data-format=csv \
--chunk-size=1000 \
--data-order=id \
--data-compress \
--data-parallel \
--verify \
--include-permissions \
--archive
Schema Output
| Option | Description |
|---|---|
| `--per-table` | One file per object (e.g., `table_mydb.public.users.sql`) |
| `--format` | Output format: `sql`, `json`, `yaml` |
| `--diff` | Show schema diff against a previous `.sql` file |
| `--tables` | Comma-separated names to include: `table`, `schema.table`, or `db.schema.table` |
| `--exclude-tables` | Skip specific table names (same forms as `--tables`) |
| `--include-permissions` | Export roles, grants, and memberships |
| `--region` | Only export tables matching this region |
Data Export
| Option | Description |
|---|---|
| `--data` | Enable data export |
| `--data-format` | Format: `csv` or `sql` |
| `--chunk-size` | Number of rows per chunk |
| `--data-split` | Output one file per table |
| `--data-compress` | Output `.csv.gz` |
| `--data-order` | Order rows by column(s) |
| `--data-order-desc` | Use descending order |
| `--data-parallel` | Parallel export across tables |
| `--verify` | Verify chunk checksums |
| `--region` | Filter tables by region in manifests |
| `--use-s3` | Upload exported chunks to S3 |
| `--s3-bucket` | S3 bucket name |
| `--s3-prefix` | Key prefix under which to store chunks |
| `--s3-endpoint` | S3-compatible endpoint URL |
| `--s3-access-key` | S3 access key (can use env) |
| `--s3-secret-key` | S3 secret key (can use env) |
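The chunking and `--verify` options above amount to splitting a table's rows into files of `--chunk-size` rows and recording a checksum per file. The sketch below shows that idea with the standard library only. The manifest fields (`table`, `chunks`, `file`, `rows`, `sha256`) and the three-digit zero-padded chunk index are assumptions for illustration, not crdb-dump's actual manifest schema.

```python
import csv
import hashlib
import json
from pathlib import Path
from typing import Iterable, Sequence


def write_chunks(out_dir: Path, table: str, header: Sequence[str],
                 rows: Iterable[Sequence], chunk_size: int) -> Path:
    """Write rows as <table>_NNN.csv chunks plus <table>.manifest.json with SHA-256 digests."""
    out_dir.mkdir(parents=True, exist_ok=True)
    chunks: list[dict] = []
    buf: list[Sequence] = []

    def flush() -> None:
        path = out_dir / f"{table}_{len(chunks) + 1:03d}.csv"
        with path.open("w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(header)  # every chunk carries its own header row
            writer.writerows(buf)
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        chunks.append({"file": path.name, "rows": len(buf), "sha256": digest})
        buf.clear()

    for row in rows:
        buf.append(row)
        if len(buf) == chunk_size:
            flush()
    if buf:  # final partial chunk
        flush()

    manifest = out_dir / f"{table}.manifest.json"
    manifest.write_text(json.dumps({"table": table, "chunks": chunks}, indent=2))
    return manifest


def verify_chunks(manifest: Path) -> list[str]:
    """Return names of chunk files that are missing or whose SHA-256 differs from the manifest."""
    data = json.loads(manifest.read_text())
    bad = []
    for entry in data["chunks"]:
        path = manifest.parent / entry["file"]
        if not path.exists() or hashlib.sha256(path.read_bytes()).hexdigest() != entry["sha256"]:
            bad.append(entry["file"])
    return bad
```

Hashing each finished file (rather than the rows in memory) means verification on the import side only needs the files and the manifest.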
⛓ Import Options
crdb-dump load \
--db=mydb \
--schema=crdb_dump_output/mydb/mydb_schema.sql \
--data-dir=crdb_dump_output/mydb \
--resume-log=resume.json \
--validate-csv \
--parallel-load \
--print-connection
| Option | Description |
|---|---|
| `--schema` | `.sql` file to apply |
| `--data-dir` | Folder containing chunked CSV + manifests |
| `--resume-log` | Track loaded chunks in a single JSON file |
| `--resume-log-dir` | Per-table resume logs (e.g. `resume/users.json`) |
| `--validate-csv` | Ensure chunk headers match DB schema |
| `--parallel-load` | Load chunks in parallel |
| `--region` | Only import chunks from matching region |
| `--dry-run` | Print actions but don't execute |
| `--use-s3` | Download chunks from S3 |
| `--s3-bucket` | S3 bucket name |
| `--s3-prefix` | Path prefix inside the bucket |
| `--s3-endpoint` | S3-compatible endpoint (MinIO, Cohesity) |
| `--s3-access-key` | S3 access key |
| `--s3-secret-key` | S3 secret key |
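A header check in the spirit of `--validate-csv` can be sketched as follows. It assumes the first CSV line is the header and that the table's column names were already fetched (for example from `information_schema.columns`); `check_csv_header` is a hypothetical name, not a crdb-dump function. It also reports a column-order mismatch, which a loader that passes an explicit column list to `COPY` might tolerate.

```python
import csv
import gzip
from pathlib import Path
from typing import Sequence


def check_csv_header(path: Path, db_columns: Sequence[str]) -> list[str]:
    """Compare a chunk's header row with the table's columns; [] means they match."""
    opener = gzip.open if path.suffix == ".gz" else open  # handles .csv.gz chunks too
    with opener(path, "rt", newline="") as f:
        header = next(csv.reader(f), [])
    problems = []
    missing = [c for c in db_columns if c not in header]
    extra = [c for c in header if c not in db_columns]
    if missing:
        problems.append(f"missing columns: {missing}")
    if extra:
        problems.append(f"unexpected columns: {extra}")
    if not missing and not extra and list(header) != list(db_columns):
        problems.append("same columns, different order")
    return problems
```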
🔄 Fault Tolerance & Resume Support
- ✅ Retries failed operations with exponential backoff
- ✅ Resumable imports: `--resume-log` (single file), `--resume-log-dir` (per-table), `--resume-strict` (abort on failure)
Resume state is written after each successful chunk, so restarts are safe and idempotent.
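The retry and resume behavior described above can be sketched as one loop: skip chunks already recorded in a JSON resume log, retry each chunk with exponential backoff, and persist the log after every success. This is a minimal sketch, not crdb-dump's implementation. `load_chunk` stands in for whatever actually loads a file (crdb-dump uses `COPY`), and the log layout (`{"loaded": [...]}`) and retry defaults are assumptions.

```python
import json
import os
import random
import time
from pathlib import Path
from typing import Callable, Iterable


def load_with_resume(
    chunks: Iterable[str],
    load_chunk: Callable[[str], None],
    resume_log: Path,
    max_attempts: int = 5,
    base_delay: float = 0.5,
    retry_on: tuple = (Exception,),  # narrow this to transient error types in practice
    sleep: Callable[[float], None] = time.sleep,
) -> list[str]:
    """Load chunks not yet in the resume log; return the ones loaded during this run."""
    done: set[str] = set()
    if resume_log.exists():
        done = set(json.loads(resume_log.read_text()).get("loaded", []))

    loaded_now: list[str] = []
    for chunk in chunks:
        if chunk in done:
            continue  # recorded by an earlier run
        for attempt in range(1, max_attempts + 1):
            try:
                load_chunk(chunk)
                break
            except retry_on:
                if attempt == max_attempts:
                    raise  # give up; the log still lists every earlier success
                # exponential backoff (0.5s, 1s, 2s, ...) with up to 10% jitter
                sleep(base_delay * 2 ** (attempt - 1) * (1 + 0.1 * random.random()))
        done.add(chunk)
        loaded_now.append(chunk)
        # write a temp file and rename it so a crash never leaves a half-written log
        tmp = resume_log.with_name(resume_log.name + ".tmp")
        tmp.write_text(json.dumps({"loaded": sorted(done)}))
        os.replace(tmp, resume_log)
    return loaded_now
```

In this sketch a crash between a successful load and the log write would reload that one chunk on restart, so the load step itself has to tolerate a repeat for restarts to stay idempotent.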
☁️ S3 / MinIO / Cohesity Example
crdb-dump export \
--db=mydb \
--per-table \
--data \
--chunk-size=1000 \
--data-format=csv \
--use-s3 \
--s3-bucket=crdb-test-bucket \
--s3-endpoint=http://localhost:9000 \
--s3-access-key=minioadmin \
--s3-secret-key=minioadmin \
--s3-prefix=test1/ \
--out-dir=crdb_dump_output
crdb-dump load \
--db=mydb \
--data-dir=crdb_dump_output/mydb \
--resume-log-dir=resume/ \
--parallel-load \
--validate-csv \
--use-s3 \
--s3-bucket=crdb-test-bucket \
--s3-endpoint=http://localhost:9000 \
--s3-access-key=minioadmin \
--s3-secret-key=minioadmin \
--s3-prefix=test1/
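Creating a bucket on MinIO (or another S3-compatible endpoint) and uploading an export directory can be done with `boto3`, as in the sketch below. The functions take an already-configured client so they can be tested without a server. `ensure_bucket`, `upload_dir`, and `s3_key` are hypothetical names, and the `prefix/filename` key layout is an assumption about how `--s3-prefix` is applied.

```python
from pathlib import Path


def s3_key(prefix: str, filename: str) -> str:
    """Join a prefix such as 'test1/' and a file name into an object key (assumed layout)."""
    prefix = prefix.strip("/")
    return f"{prefix}/{filename}" if prefix else filename


def ensure_bucket(client, bucket: str) -> None:
    """Create the bucket if head_bucket fails.

    botocore raises ClientError here (404 if missing, 403 if not yours);
    a real tool should inspect the error code instead of catching everything.
    """
    try:
        client.head_bucket(Bucket=bucket)
    except Exception:
        client.create_bucket(Bucket=bucket)


def upload_dir(client, local_dir: Path, bucket: str, prefix: str) -> list[str]:
    """Upload every top-level file (chunks and manifests) in local_dir; return the keys."""
    keys = []
    for path in sorted(local_dir.iterdir()):
        if path.is_file():
            key = s3_key(prefix, path.name)
            client.upload_file(str(path), bucket, key)
            keys.append(key)
    return keys
```

With MinIO the client would typically be `boto3.client("s3", endpoint_url="http://localhost:9000", aws_access_key_id="minioadmin", aws_secret_access_key="minioadmin")`. On AWS outside `us-east-1`, `create_bucket` also needs a `CreateBucketConfiguration` with a `LocationConstraint`.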
🔍 Schema Diff Example
crdb-dump export --db=mydb --diff=old_schema.sql
Output:
crdb_dump_output/mydb/mydb_schema.diff
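A unified diff between two schema dumps, like the `.diff` file written above, can be produced with Python's `difflib`. This is an illustrative sketch of the idea, not necessarily how crdb-dump builds its diff; `schema_diff` is a hypothetical name.

```python
import difflib


def schema_diff(old_sql: str, new_sql: str,
                old_name: str = "old_schema.sql", new_name: str = "new_schema.sql") -> str:
    """Return a unified diff of two schema dumps (empty string if they are identical)."""
    return "".join(difflib.unified_diff(
        old_sql.splitlines(keepends=True),
        new_sql.splitlines(keepends=True),
        fromfile=old_name,
        tofile=new_name,
    ))
```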
🧪 Testing
Requires Python 3.10+.
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
# Unit tests (no database needed)
pytest -m "not integration"
# Integration tests (need a reachable CockroachDB)
export CRDB_URL="cockroachdb://root@localhost:26257/defaultdb?sslmode=disable"
pytest -m integration
# Full end-to-end (needs cockroach + Docker/MinIO)
./test-local.sh
🚀 Releasing (maintainers)
Releases publish to PyPI via the Release GitHub Action
(.github/workflows/release.yml) using PyPI Trusted Publishing (OIDC), so no
API tokens are stored in the repo.
One-time setup on PyPI: add a Trusted Publisher for the crdb-dump project →
owner viragtripathi, repository crdb-dump, workflow release.yml.
To cut a release:
- Bump `version` in `pyproject.toml` and update `CHANGELOG.md`; merge to `main`.
- Run the Release workflow (Actions → Release → Run workflow) and enter the same version (e.g. `0.4.0`).
The workflow verifies the input matches the packaged version, runs the full test
suite against a CockroachDB container, builds the sdist/wheel, publishes to PyPI,
and creates a v<version> GitHub Release with auto-generated notes.
❤️ Contributing
Pull requests welcome! Star ⭐ the repo, file issues, or request features at: