A CLI tool to export and import schema definitions and data from CockroachDB in SQL, JSON, YAML, or chunked CSV formats.

crdb-dump

A feature-rich CLI for exporting and importing CockroachDB schemas and data. Includes support for parallel chunked exports, manifest checksums, BYTES/UUID/ARRAY types, permission introspection, secure resumable imports, S3-compatible storage (MinIO, Cohesity), region-aware filtering, and automatic retry logic.


🚀 Features

  • ✅ Schema export: tables, views, sequences, enums
  • ✅ Data export: CSV or SQL with chunking, gzip, and ordering
  • ✅ Types: handles BYTES, UUIDs, STRING[], TIMESTAMP, enums
  • ✅ Schema output formats: sql, json, yaml
  • ✅ Resumable COPY-based imports with chunk-level tracking
  • ✅ Permission exports: roles, grants, role memberships
  • ✅ Parallel loading (--parallel-load) and manifest verification
  • ✅ Dry-run for schema or chunk loading
  • ✅ TLS and insecure auth supported
  • ✅ Schema diff support (--diff)
  • ✅ Full logging via logs/crdb_dump.log
  • ✅ Automatic retry logic with exponential backoff for transient failures
  • ✅ Fault-tolerant, resumable imports with --resume-log or --resume-log-dir
  • ✅ Region-aware export/import via --region
  • ✅ S3-compatible support (--use-s3) with MinIO, Cohesity, or AWS
  • ✅ CSV header validation (--validate-csv)
  • ✅ Python-based S3 bucket creation (via boto3) for MinIO

📦 Installation

pip install crdb-dump

🧪 Local Testing

./test-local.sh

This script will:

  • Start a multi-region demo CockroachDB cluster
  • Create test schema + data
  • Export schema and chunked data (CSV)
  • Verify chunk checksums
  • Dry-run and real import with retry/resume
  • Upload chunks to MinIO (S3-compatible)
  • Download and verify import from S3
  • Use Python (boto3) to create S3 buckets
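
The bucket-creation step uses boto3 directly rather than shelling out to an S3 CLI. A minimal sketch of that idea, assuming MinIO's default endpoint and the credentials used in the S3 example later in this README (illustrative, not the script's exact code):

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:9000",   # MinIO default endpoint
    aws_access_key_id="minioadmin",
    aws_secret_access_key="minioadmin",
)

bucket = "crdb-test-bucket"
try:
    s3.head_bucket(Bucket=bucket)            # does the bucket already exist?
except ClientError:
    s3.create_bucket(Bucket=bucket)          # create it if not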

🔧 CLI Overview

crdb-dump --help
crdb-dump export --help
crdb-dump load --help

Example usage:

crdb-dump export --db=mydb --data --per-table
crdb-dump load --db=mydb --schema=... --data-dir=... --resume-log=resume.json

🔐 Connection

export CRDB_URL="cockroachdb://root@localhost:26257/defaultdb?sslmode=disable"
# or
export CRDB_URL="postgresql://root@localhost:26257/defaultdb?sslmode=disable"

Alternatively:

--db mydb --host localhost --certs-dir ~/certs

Use --print-connection to verify the resolved connection URL.


🏗 Export Options

crdb-dump export \
  --db=mydb \
  --per-table \
  --data \
  --data-format=csv \
  --chunk-size=1000 \
  --data-order=id \
  --data-compress \
  --data-parallel \
  --verify \
  --include-permissions \
  --archive

Schema Output

Option                   Description
--per-table              One file per object (e.g., table_users.sql)
--format                 Output format: sql, json, yaml
--diff                   Show schema diff against a previous .sql file
--tables                 Comma-separated fully qualified names to include
--exclude-tables         Skip specific fully qualified table names
--include-permissions    Export roles, grants, and memberships
--region                 Only export tables matching this region

Data Export

Option                   Description
--data                   Enable data export
--data-format            Format: csv or sql
--chunk-size             Number of rows per chunk
--data-split             Output one file per table
--data-compress          Write compressed .csv.gz chunks
--data-order             Order rows by the given column(s)
--data-order-desc        Use descending order
--data-parallel          Export tables in parallel
--verify                 Verify chunk checksums
--region                 Filter tables by region in manifests
--use-s3                 Upload exported chunks to S3
--s3-bucket              S3 bucket name
--s3-prefix              Key prefix under which to store chunks
--s3-endpoint            S3-compatible endpoint URL
--s3-access-key          S3 access key (can also be set via environment)
--s3-secret-key          S3 secret key (can also be set via environment)
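
Conceptually, --verify recomputes each exported chunk's checksum and compares it with the value recorded at export time. A minimal sketch of that check, assuming a hypothetical manifest that maps chunk file names to SHA-256 digests (crdb-dump's actual manifest layout may differ):

import hashlib
import json
from pathlib import Path

def verify_chunks(manifest_path):
    """Recompute each chunk's SHA-256 and compare with the recorded digest."""
    manifest_file = Path(manifest_path)
    manifest = json.loads(manifest_file.read_text())
    ok = True
    for name, expected in manifest.items():   # hypothetical layout: {"chunk.csv": "<sha256>"}
        actual = hashlib.sha256((manifest_file.parent / name).read_bytes()).hexdigest()
        if actual != expected:
            print(f"checksum mismatch: {name}")
            ok = False
    return ok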

⛓ Import Options

crdb-dump load \
  --db=mydb \
  --schema=crdb_dump_output/mydb/mydb_schema.sql \
  --data-dir=crdb_dump_output/mydb \
  --resume-log=resume.json \
  --validate-csv \
  --parallel-load \
  --print-connection

Option                   Description
--schema                 .sql file to apply
--data-dir               Folder containing chunked CSV files and manifests
--resume-log             Track loaded chunks in a single JSON file
--resume-log-dir         Per-table resume logs (e.g., resume/users.json)
--validate-csv           Ensure chunk headers match the DB schema
--parallel-load          Load chunks in parallel
--region                 Only import chunks from the matching region
--dry-run                Print actions without executing them
--use-s3                 Download chunks from S3
--s3-bucket              S3 bucket name
--s3-prefix              Path prefix inside the bucket
--s3-endpoint            S3-compatible endpoint (MinIO, Cohesity)
--s3-access-key          S3 access key
--s3-secret-key          S3 secret key
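
Conceptually, --validate-csv compares each chunk's header row with the live table definition before loading. A sketch of that comparison using information_schema; psycopg2 and the helper below are illustrative assumptions, not crdb-dump internals:

import csv
import psycopg2

def csv_header_matches(conn, table, csv_path):
    """Compare a chunk's CSV header with the table's column order."""
    with conn.cursor() as cur:
        cur.execute(
            "SELECT column_name FROM information_schema.columns "
            "WHERE table_name = %s ORDER BY ordinal_position",
            (table,),
        )
        db_columns = [row[0] for row in cur.fetchall()]
    with open(csv_path, newline="") as f:
        header = next(csv.reader(f))
    return header == db_columns

# Usage, reusing the README's connection URL:
# conn = psycopg2.connect("postgresql://root@localhost:26257/defaultdb?sslmode=disable")
# csv_header_matches(conn, "users", "crdb_dump_output/mydb/users_chunk_0001.csv")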

🔄 Fault Tolerance & Resume Support

  • ✅ Retries failed operations with exponential backoff

  • ✅ Resumable imports:

    • --resume-log (single file)
    • --resume-log-dir (per-table)
    • --resume-strict (abort on failure)

Resume state is written after each successful chunk, so restarts are safe and idempotent.
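
The retry behavior follows the standard exponential-backoff pattern: wait longer after each failed attempt, add a little jitter, and give up after a bounded number of tries. A minimal sketch of the technique (attempt counts, delays, and the broad except clause are illustrative assumptions, not crdb-dump's internals):

import random
import time

def with_retries(operation, attempts=5, base_delay=0.5):
    """Run operation(), retrying transient failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:                      # real code should catch transient errors only
            if attempt == attempts - 1:
                raise                          # out of attempts: surface the error
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)                  # back off before retrying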


☁️ S3 / MinIO / Cohesity Example

crdb-dump export \
  --db=mydb \
  --per-table \
  --data \
  --chunk-size=1000 \
  --data-format=csv \
  --use-s3 \
  --s3-bucket=crdb-test-bucket \
  --s3-endpoint=http://localhost:9000 \
  --s3-access-key=minioadmin \
  --s3-secret-key=minioadmin \
  --s3-prefix=test1/ \
  --out-dir=crdb_dump_output

crdb-dump load \
  --db=mydb \
  --data-dir=crdb_dump_output/mydb \
  --resume-log-dir=resume/ \
  --parallel-load \
  --validate-csv \
  --use-s3 \
  --s3-bucket=crdb-test-bucket \
  --s3-endpoint=http://localhost:9000 \
  --s3-access-key=minioadmin \
  --s3-secret-key=minioadmin \
  --s3-prefix=test1/

🔍 Schema Diff Example

crdb-dump export --db=mydb --diff=old_schema.sql

Output:

crdb_dump_output/mydb/mydb_schema.diff

🧪 Testing

pytest -m unit
pytest -m integration
./test-local.sh

❤️ Contributing

Pull requests welcome! Star ⭐ the repo, file issues, or request features at:

👉 https://github.com/viragtripathi/crdb-dump/issues
