Skip to main content

A reliable CLI tool for migrating data between ClickHouse instances

Project description

ClickHouse Migrator ๐Ÿš€

A reliable, high-performance CLI tool for migrating data between ClickHouse instances with support for ClickHouse Cloud and self-hosted deployments.

PyPI version Python versions License: MIT

โœจ Features

  • ๐Ÿ”„ Bidirectional Migration: Migrate between ClickHouse Cloud โ†” Self-hosted, or between any ClickHouse instances
  • ๐ŸŽฏ Custom Query Support: Use custom SQL queries to selectively migrate data
  • ๐Ÿ“Š Real-time Progress Tracking: Beautiful progress bars with speed, ETA, and statistics
  • โšก High Performance: Parallel processing and configurable batch sizes for optimal speed
  • ๐Ÿ”„ Resume Capability: Resume interrupted migrations from checkpoints
  • ๐Ÿ›ก๏ธ Reliable: Built-in retry mechanisms and comprehensive error handling
  • ๐Ÿ“‹ Schema Migration: Automatically migrate table structures
  • โœ… Data Verification: Optional data integrity verification after migration
  • ๐Ÿ”ง Flexible Configuration: CLI arguments or YAML configuration files
  • ๐Ÿ“ Comprehensive Logging: Detailed logs for monitoring and troubleshooting

๐Ÿš€ Quick Start

Installation

# Install from PyPI
pip install clickhouse-migrator

# Or install from source
git clone https://github.com/your-org/clickhouse-migrator.git
cd clickhouse-migrator
pip install -e .

Basic Usage

# Migrate all tables between instances
clickhouse-migrator migrate \\
  clickhouse://user:password@source-host:8123/database \\
  clickhouse://user:password@target-host:8123/database

# Migrate specific tables with custom batch size
clickhouse-migrator migrate \\
  https://user:pass@cloud.clickhouse.com:8443/mydb \\
  http://localhost:8123/mydb \\
  --tables users orders products \\
  --batch-size 50000 \\
  --workers 8

# Migrate with custom query
clickhouse-migrator migrate \\
  clickhouse://user:pass@source:8123/db \\
  clickhouse://user:pass@target:8123/db \\
  --query "SELECT * FROM users WHERE created_at > '2024-01-01'"

๐Ÿ“– Documentation

URI Format

ClickHouse connection URIs support multiple formats:

clickhouse://username:password@host:port/database
clickhouses://username:password@host:port/database  # Secure connection
https://username:password@host:port/database         # HTTPS
http://username:password@host:port/database          # HTTP

Examples:

  • ClickHouse Cloud: https://user:pass@abc123.us-east-1.aws.clickhouse.cloud:8443/mydb
  • Self-hosted: clickhouse://default:password@localhost:8123/mydb
  • Secure self-hosted: clickhouses://user:pass@my-server:8443/production

Command Reference

migrate - Main Migration Command

clickhouse-migrator migrate [OPTIONS] SOURCE_URI TARGET_URI

Options:

  • --tables, -t: Specific tables to migrate (default: all tables)
  • --exclude-tables: Tables to exclude from migration
  • --query, -q: Custom SQL query for data selection
  • --batch-size: Rows per batch (default: 100,000)
  • --workers, -w: Number of parallel workers (default: 4)
  • --no-schema: Skip schema migration
  • --no-data: Skip data migration (schema only)
  • --drop-target: Drop target tables before migration
  • --verify: Verify data integrity after migration (default: true)
  • --resume: Resume from previous checkpoint
  • --checkpoint-file: Custom checkpoint file path
  • --dry-run: Perform dry run without actual migration
  • --config, -c: Use configuration file

list-tables - List Available Tables

clickhouse-migrator list-tables [OPTIONS] URI

Options:

  • --database, -d: Specific database to list tables from

inspect-table - Inspect Table Structure

clickhouse-migrator inspect-table [OPTIONS] URI TABLE

Options:

  • --database, -d: Database name (default: from URI)
  • --limit, -l: Number of sample rows to display (default: 10)

test-connection - Test Connections

clickhouse-migrator test-connection SOURCE_URI TARGET_URI

generate-config - Generate Configuration Template

clickhouse-migrator generate-config [OPTIONS]

Options:

  • --output, -o: Output file path (default: migration-config.yaml)

Configuration File

For complex migrations, use a YAML configuration file:

# migration-config.yaml
source:
  uri: "clickhouse://user:password@source-host:8123/database"
  timeout: 30
  max_retries: 3

target:
  uri: "clickhouse://user:password@target-host:8123/database"
  timeout: 30
  max_retries: 3

tables:
  - name: "users"
    where_clause: "created_at > '2024-01-01'"
    create_table: true
    drop_target: false
  - name: "orders"
    query: "SELECT * FROM orders WHERE status = 'completed'"

exclude_tables:
  - "temp_table"
  - "backup_table"

migrate_schema: true
migrate_data: true
verify_data: true

batch:
  size: 100000
  parallel_workers: 4
  memory_limit_mb: 1024

progress:
  update_interval: 1000
  checkpoint_interval: 50000

resume: false

Then run:

clickhouse-migrator migrate --config migration-config.yaml

๐Ÿ”ง Advanced Usage

Large Dataset Migration

For very large datasets, optimize performance:

clickhouse-migrator migrate \\
  source_uri target_uri \\
  --batch-size 500000 \\
  --workers 16 \\
  --checkpoint-interval 100000

Selective Migration with Custom Query

clickhouse-migrator migrate \\
  source_uri target_uri \\
  --query "SELECT id, name, email FROM users WHERE active = 1 AND created_at > '2024-01-01'"

Resume Interrupted Migration

clickhouse-migrator migrate \\
  source_uri target_uri \\
  --resume \\
  --checkpoint-file my_migration_checkpoint.json

Schema-Only Migration

clickhouse-migrator migrate \\
  source_uri target_uri \\
  --no-data \\
  --tables table1 table2 table3

๐Ÿ—๏ธ Architecture

The tool is built with a modular architecture:

  • Core Engine: ClickHouseMigrator - Main migration orchestrator
  • Connection Manager: ClickHouseConnection - Handles database connections with retry logic
  • Progress Tracker: MigrationProgress - Real-time progress tracking and checkpointing
  • Configuration: MigrationConfig - Flexible configuration management
  • CLI Interface: Rich command-line interface with helpful commands

๐Ÿ” Monitoring and Troubleshooting

Progress Tracking

The tool provides real-time progress information:

Migration Plan:
Source: clickhouse://user@source:8123/mydb
Target: clickhouse://user@target:8123/mydb
Migrate Schema: True
Migrate Data: True
Batch Size: 100,000
Parallel Workers: 4

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ Migrating users  โ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆโ–ˆ  100% โ”‚
โ”‚ 1,000,000/1,000,000 rows โ€ข 25.5K rows/s โ€ข 00:00:00       โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Migration Progress Summary
โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚ Table       โ”‚ Status   โ”‚ Progress โ”‚ Rows        โ”‚ Rate (rows/s)โ”‚ Time Elapsed โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚ users       โ”‚ Completedโ”‚ 100.0%   โ”‚ 1,000,000/1,000,000โ”‚ 25,532      โ”‚ 0:00:39      โ”‚
โ”‚ orders      โ”‚ Running  โ”‚ 45.2%    โ”‚ 452,000/1,000,000  โ”‚ 23,156      โ”‚ 0:00:19      โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Logging

Enable detailed logging:

clickhouse-migrator --log-level DEBUG --log-file migration.log migrate ...

Common Issues and Solutions

Connection Issues:

  • Verify URI format and credentials
  • Check network connectivity: clickhouse-migrator test-connection source_uri target_uri
  • Ensure ClickHouse is running and accessible

Performance Issues:

  • Adjust --batch-size based on available memory
  • Increase --workers for better parallelization
  • Use --checkpoint-interval to balance between performance and resume capability

Memory Issues:

  • Reduce --batch-size
  • Decrease number of --workers
  • Monitor system resources during migration

๐Ÿงช Development

Setup Development Environment

git clone https://github.com/your-org/clickhouse-migrator.git
cd clickhouse-migrator

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\\Scripts\\activate

# Install development dependencies
pip install -e ".[dev]"

# Install pre-commit hooks
pre-commit install

Running Tests

# Run all tests
pytest

# Run with coverage
pytest --cov=clickhouse_migrator --cov-report=html

# Run specific test file
pytest tests/test_connection.py

Code Quality

# Format code
black src/ tests/
isort src/ tests/

# Lint code
flake8 src/ tests/
mypy src/

๐Ÿค Contributing

We welcome contributions! Please see our Contributing Guide for details.

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests
  5. Run the test suite
  6. Submit a pull request

๐Ÿ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

๐Ÿ™ Acknowledgments

  • clickhouse-connect for ClickHouse Python connectivity
  • Rich for beautiful terminal interfaces
  • Click for command-line interface framework

๐Ÿ”— Links


Made with โค๏ธ for the ClickHouse community

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

hawky_clickhouse_migrator-1.0.0.tar.gz (38.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

hawky_clickhouse_migrator-1.0.0-py3-none-any.whl (28.5 kB view details)

Uploaded Python 3

File details

Details for the file hawky_clickhouse_migrator-1.0.0.tar.gz.

File metadata

File hashes

Hashes for hawky_clickhouse_migrator-1.0.0.tar.gz
Algorithm Hash digest
SHA256 ec81275fcecdc66e01479e9ab0d1aca9f5ecae59ff19c7b148a087eb4b56db4f
MD5 eb8b0347ff77b2466c7998aca97c8071
BLAKE2b-256 2850956a098ce4f79676762d48b5b8abf13cc6058673025fa7b49b263c49fea3

See more details on using hashes here.

File details

Details for the file hawky_clickhouse_migrator-1.0.0-py3-none-any.whl.

File metadata

File hashes

Hashes for hawky_clickhouse_migrator-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 99de9c1d35df24ba7cd7fe082f737dfab6b9f53533267bc4b1e3815ff97b7404
MD5 00906fd7eb015d4b13be779967d7c692
BLAKE2b-256 223aa23e3bf10e66e8d9b053fe96ec867c5d3903ae5e581b9e93cafa26984774

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page