Sync data from other DB to ClickHouse, current support postgres and mysql, and support full and increment ETL.
Project description
Synch
Introduction
Sync data from other DB to ClickHouse, current support postgres and mysql, and support full and increment ETL.
Features
- Full data etl and real time increment etl.
- Support DDL and DML sync, current support
add column
anddrop column
andchange column
of DDL, and full support of DML also. - Custom configurable items.
- Support kafka and redis as broker.
- Multiple source db sync to ClickHouse at the same time。
- Support ClickHouse
MergeTree
,CollapsingMergeTree
,VersionedCollapsingMergeTree
.
Requirements
- redis, cache mysql binlog file and position and as broker, support redis cluster also.
- kafka, need if you use kafka as broker.
- clickhouse-jdbc-bridge, need if you use postgres and set
auto_full_etl = true
, or execsynch etl
command. - sentry, error reporting, worked if set
dsn
in config.
Install
> pip install synch[all]
Extra pip dependencies:
mysql
, if your source db isMySQL
.postgres
, if your source db isPostgreSQL
.kafka
, if your broker type iskafka
.sentry
, if your usesentry
.all
, install all dependencies.
Example:
> pip install synch[mysql,sentry]
Usage
Config file synch.yaml
synch will read default config from ./synch.yaml
, or you can use synch -c
specify config file.
See full example config in synch.yaml
.
Full data etl
Maybe you need make full data etl before continuous sync data from MySQL to ClickHouse or redo data etl with --renew
.
> synch --alias mysql_db etl -h
Usage: synch etl [OPTIONS]
Make etl from source table to ClickHouse.
Options:
--schema TEXT Schema to full etl.
--renew Etl after try to drop the target tables.
-t, --table TEXT Tables to full etl.
-h, --help Show this message and exit.
Full etl from table test.test
:
> synch etl --schema test --tables test
Produce
Listen all MySQL binlog and produce to broker.
> synch --alias mysql_db produce
Consume
Consume message from broker and insert to ClickHouse,and you can skip error rows with --skip-error
. And synch will do full etl at first when set auto_full_etl = true
in config.
> synch --alias mysql_db consume -h
Usage: synch consume [OPTIONS]
Consume from broker and insert into ClickHouse.
Options:
--schema TEXT Schema to consume. [required]
--skip-error Skip error rows.
--last-msg-id TEXT Redis stream last msg id or kafka msg offset, depend on
broker_type in config.
-h, --help Show this message and exit.
Consume schema test
and insert into ClickHouse
:
> synch --alias mysql_db consume --schema test
One consumer consume one schema
ClickHouse Table Engine
Now synch support MergeTree
, CollapsingMergeTree
, VersionedCollapsingMergeTree
.
MergeTree
, default common choices.CollapsingMergeTree
, see detail in CollapsingMergeTree.VersionedCollapsingMergeTree
, see detail in VersionedCollapsingMergeTree.
Use docker-compose(recommended)
Redis Broker, lightweight and for low concurrency
version: "3"
services:
producer:
depends_on:
- redis
image: long2ice/synch
command: synch --alias mysql_db produce
volumes:
- ./synch.yaml:/synch/synch.yaml
# one service consume on schema
consumer.test:
depends_on:
- redis
image: long2ice/synch
command: synch --alias mysql_db consume --schema test
volumes:
- ./synch.yaml:/synch/synch.yaml
redis:
hostname: redis
image: redis:latest
volumes:
- redis
volumes:
redis:
Kafka Broker, for high concurrency
version: "3"
services:
zookeeper:
image: bitnami/zookeeper:3
hostname: zookeeper
environment:
- ALLOW_ANONYMOUS_LOGIN=yes
volumes:
- zookeeper:/bitnami
kafka:
image: bitnami/kafka:2
hostname: kafka
environment:
- KAFKA_CFG_ZOOKEEPER_CONNECT=zookeeper:2181
- ALLOW_PLAINTEXT_LISTENER=yes
- JMX_PORT=23456
- KAFKA_CFG_AUTO_CREATE_TOPICS_ENABLE=true
- KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://kafka:9092
depends_on:
- zookeeper
volumes:
- kafka:/bitnami
kafka-manager:
image: hlebalbau/kafka-manager
ports:
- "9000:9000"
environment:
ZK_HOSTS: "zookeeper:2181"
KAFKA_MANAGER_AUTH_ENABLED: "false"
command: -Dpidfile.path=/dev/null
producer:
depends_on:
- redis
- kafka
- zookeeper
image: long2ice/synch
command: synch --alias mysql_db produce
volumes:
- ./synch.yaml:/synch/synch.yaml
# one service consume on schema
consumer.test:
depends_on:
- redis
- kafka
- zookeeper
image: long2ice/synch
command: synch --alias mysql_db consume --schema test
volumes:
- ./synch.yaml:/synch/synch.yaml
redis:
hostname: redis
image: redis:latest
volumes:
- redis:/data
volumes:
redis:
kafka:
zookeeper:
Important
- You need always keep a primary key or unique key without null or composite primary key.
- DDL sync not support postgres.
- Postgres sync is not fully test, be careful use it in production.
QQ Group
Support this project
- Just give a star!
- Join QQ group for communication.
- Donation.
AliPay
WeChat Pay
PayPal
Donate money by paypal to my account long2ice.
ThanksTo
Powerful Python IDE Pycharm from Jetbrains.
License
This project is licensed under the Apache-2.0 License.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.