A reliable CLI tool for migrating data between ClickHouse instances
Project description
ClickHouse Migrator ๐
A reliable, high-performance CLI tool for migrating data between ClickHouse instances with support for ClickHouse Cloud and self-hosted deployments.
โจ Features
- ๐ Bidirectional Migration: Migrate between ClickHouse Cloud โ Self-hosted, or between any ClickHouse instances
- ๐ฏ Custom Query Support: Use custom SQL queries to selectively migrate data
- ๐ Real-time Progress Tracking: Beautiful progress bars with speed, ETA, and statistics
- โก High Performance: Parallel processing and configurable batch sizes for optimal speed
- ๐ Resume Capability: Resume interrupted migrations from checkpoints
- ๐ก๏ธ Reliable: Built-in retry mechanisms and comprehensive error handling
- ๐ Schema Migration: Automatically migrate table structures
- โ Data Verification: Optional data integrity verification after migration
- ๐ง Flexible Configuration: CLI arguments or YAML configuration files
- ๐ Comprehensive Logging: Detailed logs for monitoring and troubleshooting
๐ Quick Start
Installation
# Install from PyPI
pip install clickhouse-migrator
# Or install from source
git clone https://github.com/your-org/clickhouse-migrator.git
cd clickhouse-migrator
pip install -e .
Basic Usage
# Migrate all tables between instances
clickhouse-migrator migrate \\
clickhouse://user:password@source-host:8123/database \\
clickhouse://user:password@target-host:8123/database
# Migrate specific tables with custom batch size
clickhouse-migrator migrate \\
https://user:pass@cloud.clickhouse.com:8443/mydb \\
http://localhost:8123/mydb \\
--tables users orders products \\
--batch-size 50000 \\
--workers 8
# Migrate with custom query
clickhouse-migrator migrate \\
clickhouse://user:pass@source:8123/db \\
clickhouse://user:pass@target:8123/db \\
--query "SELECT * FROM users WHERE created_at > '2024-01-01'"
๐ Documentation
URI Format
ClickHouse connection URIs support multiple formats:
clickhouse://username:password@host:port/database
clickhouses://username:password@host:port/database # Secure connection
https://username:password@host:port/database # HTTPS
http://username:password@host:port/database # HTTP
Examples:
- ClickHouse Cloud:
https://user:pass@abc123.us-east-1.aws.clickhouse.cloud:8443/mydb - Self-hosted:
clickhouse://default:password@localhost:8123/mydb - Secure self-hosted:
clickhouses://user:pass@my-server:8443/production
Command Reference
migrate - Main Migration Command
clickhouse-migrator migrate [OPTIONS] SOURCE_URI TARGET_URI
Options:
--tables, -t: Specific tables to migrate (default: all tables)--exclude-tables: Tables to exclude from migration--query, -q: Custom SQL query for data selection--batch-size: Rows per batch (default: 100,000)--workers, -w: Number of parallel workers (default: 4)--no-schema: Skip schema migration--no-data: Skip data migration (schema only)--drop-target: Drop target tables before migration--verify: Verify data integrity after migration (default: true)--resume: Resume from previous checkpoint--checkpoint-file: Custom checkpoint file path--dry-run: Perform dry run without actual migration--config, -c: Use configuration file
list-tables - List Available Tables
clickhouse-migrator list-tables [OPTIONS] URI
Options:
--database, -d: Specific database to list tables from
inspect-table - Inspect Table Structure
clickhouse-migrator inspect-table [OPTIONS] URI TABLE
Options:
--database, -d: Database name (default: from URI)--limit, -l: Number of sample rows to display (default: 10)
test-connection - Test Connections
clickhouse-migrator test-connection SOURCE_URI TARGET_URI
generate-config - Generate Configuration Template
clickhouse-migrator generate-config [OPTIONS]
Options:
--output, -o: Output file path (default: migration-config.yaml)
Configuration File
For complex migrations, use a YAML configuration file:
# migration-config.yaml
source:
uri: "clickhouse://user:password@source-host:8123/database"
timeout: 30
max_retries: 3
target:
uri: "clickhouse://user:password@target-host:8123/database"
timeout: 30
max_retries: 3
tables:
- name: "users"
where_clause: "created_at > '2024-01-01'"
create_table: true
drop_target: false
- name: "orders"
query: "SELECT * FROM orders WHERE status = 'completed'"
exclude_tables:
- "temp_table"
- "backup_table"
migrate_schema: true
migrate_data: true
verify_data: true
batch:
size: 100000
parallel_workers: 4
memory_limit_mb: 1024
progress:
update_interval: 1000
checkpoint_interval: 50000
resume: false
Then run:
clickhouse-migrator migrate --config migration-config.yaml
๐ง Advanced Usage
Large Dataset Migration
For very large datasets, optimize performance:
clickhouse-migrator migrate \\
source_uri target_uri \\
--batch-size 500000 \\
--workers 16 \\
--checkpoint-interval 100000
Selective Migration with Custom Query
clickhouse-migrator migrate \\
source_uri target_uri \\
--query "SELECT id, name, email FROM users WHERE active = 1 AND created_at > '2024-01-01'"
Resume Interrupted Migration
clickhouse-migrator migrate \\
source_uri target_uri \\
--resume \\
--checkpoint-file my_migration_checkpoint.json
Schema-Only Migration
clickhouse-migrator migrate \\
source_uri target_uri \\
--no-data \\
--tables table1 table2 table3
๐๏ธ Architecture
The tool is built with a modular architecture:
- Core Engine:
ClickHouseMigrator- Main migration orchestrator - Connection Manager:
ClickHouseConnection- Handles database connections with retry logic - Progress Tracker:
MigrationProgress- Real-time progress tracking and checkpointing - Configuration:
MigrationConfig- Flexible configuration management - CLI Interface: Rich command-line interface with helpful commands
๐ Monitoring and Troubleshooting
Progress Tracking
The tool provides real-time progress information:
Migration Plan:
Source: clickhouse://user@source:8123/mydb
Target: clickhouse://user@target:8123/mydb
Migrate Schema: True
Migrate Data: True
Batch Size: 100,000
Parallel Workers: 4
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
โ Migrating users โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ 100% โ
โ 1,000,000/1,000,000 rows โข 25.5K rows/s โข 00:00:00 โ
โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ
Migration Progress Summary
โโโโโโโโโโโโโโโฌโโโโโโโโโโโฌโโโโโโโโโโโฌโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโ
โ Table โ Status โ Progress โ Rows โ Rate (rows/s)โ Time Elapsed โ
โโโโโโโโโโโโโโโผโโโโโโโโโโโผโโโโโโโโโโโผโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโผโโโโโโโโโโโโโโโค
โ users โ Completedโ 100.0% โ 1,000,000/1,000,000โ 25,532 โ 0:00:39 โ
โ orders โ Running โ 45.2% โ 452,000/1,000,000 โ 23,156 โ 0:00:19 โ
โโโโโโโโโโโโโโโดโโโโโโโโโโโดโโโโโโโโโโโดโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโดโโโโโโโโโโโโโโโ
Logging
Enable detailed logging:
clickhouse-migrator --log-level DEBUG --log-file migration.log migrate ...
Common Issues and Solutions
Connection Issues:
- Verify URI format and credentials
- Check network connectivity:
clickhouse-migrator test-connection source_uri target_uri - Ensure ClickHouse is running and accessible
Performance Issues:
- Adjust
--batch-sizebased on available memory - Increase
--workersfor better parallelization - Use
--checkpoint-intervalto balance between performance and resume capability
Memory Issues:
- Reduce
--batch-size - Decrease number of
--workers - Monitor system resources during migration
๐งช Development
Setup Development Environment
git clone https://github.com/your-org/clickhouse-migrator.git
cd clickhouse-migrator
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\\Scripts\\activate
# Install development dependencies
pip install -e ".[dev]"
# Install pre-commit hooks
pre-commit install
Running Tests
# Run all tests
pytest
# Run with coverage
pytest --cov=clickhouse_migrator --cov-report=html
# Run specific test file
pytest tests/test_connection.py
Code Quality
# Format code
black src/ tests/
isort src/ tests/
# Lint code
flake8 src/ tests/
mypy src/
๐ค Contributing
We welcome contributions! Please see our Contributing Guide for details.
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Run the test suite
- Submit a pull request
๐ License
This project is licensed under the MIT License - see the LICENSE file for details.
๐ Acknowledgments
- clickhouse-connect for ClickHouse Python connectivity
- Rich for beautiful terminal interfaces
- Click for command-line interface framework
๐ Links
Made with โค๏ธ for the ClickHouse community
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file hawky_clickhouse_migrator-1.0.0.tar.gz.
File metadata
- Download URL: hawky_clickhouse_migrator-1.0.0.tar.gz
- Upload date:
- Size: 38.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
ec81275fcecdc66e01479e9ab0d1aca9f5ecae59ff19c7b148a087eb4b56db4f
|
|
| MD5 |
eb8b0347ff77b2466c7998aca97c8071
|
|
| BLAKE2b-256 |
2850956a098ce4f79676762d48b5b8abf13cc6058673025fa7b49b263c49fea3
|
File details
Details for the file hawky_clickhouse_migrator-1.0.0-py3-none-any.whl.
File metadata
- Download URL: hawky_clickhouse_migrator-1.0.0-py3-none-any.whl
- Upload date:
- Size: 28.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
99de9c1d35df24ba7cd7fe082f737dfab6b9f53533267bc4b1e3815ff97b7404
|
|
| MD5 |
00906fd7eb015d4b13be779967d7c692
|
|
| BLAKE2b-256 |
223aa23e3bf10e66e8d9b053fe96ec867c5d3903ae5e581b9e93cafa26984774
|