
Project description

Synch


Chinese documentation

Introduction

Synch syncs data from other databases to ClickHouse. It currently supports PostgreSQL and MySQL, with both full and incremental ETL.


Features

  • Full data ETL and real-time incremental ETL.
  • DDL and DML sync. DDL support currently covers add column, drop column, and change column; DML is fully supported.
  • Email error reports.
  • Kafka and Redis supported as brokers.
  • Multiple source databases can sync to ClickHouse at the same time.
  • Supports the ClickHouse MergeTree, CollapsingMergeTree, VersionedCollapsingMergeTree, and ReplacingMergeTree table engines.
  • Supports ClickHouse clusters.

Requirements

  • Python >= 3.7
  • Redis: caches the MySQL binlog file and position, and can serve as the broker; Redis Cluster is also supported.
  • Kafka: required if you use Kafka as the broker.
  • clickhouse-jdbc-bridge: required if you use PostgreSQL and set auto_full_etl = true, or run the synch etl command.
  • Sentry: error reporting; active when a dsn is set in the config.

Install

> pip install synch

Usage

Config file synch.yaml

synch reads its config from ./synch.yaml by default; use synch -c to specify a different config file.

See the full example config in synch.yaml.
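For orientation, here is a sketch of a few settings this README refers to elsewhere (core.monitoring, auto_full_etl, broker_type, and the Sentry dsn). The exact key nesting is an assumption; treat the synch.yaml example shipped with the project as the authoritative schema.

```yaml
# Illustrative fragment only -- key nesting is assumed, not verified.
core:
  monitoring: true     # create the synch database in ClickHouse and record counts
  auto_full_etl: true  # run a full ETL automatically before consuming
  broker_type: redis   # redis or kafka
sentry:
  dsn: ""              # set a DSN here to enable error reporting
```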

Full data etl

You may need to run a full data ETL before continuously syncing data from MySQL to ClickHouse, or redo the ETL with --renew.

> synch --alias mysql_db etl -h

Usage: synch etl [OPTIONS]

  Make etl from source table to ClickHouse.

Options:
  --schema TEXT     Schema to full etl.
  --renew           Etl after try to drop the target tables.
  -t, --table TEXT  Tables to full etl.
  -h, --help        Show this message and exit.

Full ETL of the tables test and test2 in schema test:

> synch etl --schema test --table test --table test2

Produce

Listen to the MySQL binlog and produce events to the broker.

> synch --alias mysql_db produce

Consume

Consume messages from the broker and insert them into ClickHouse; you can skip error rows with --skip-error. If auto_full_etl = true is set in the config, synch performs a full ETL first.

> synch --alias mysql_db consume -h

Usage: synch consume [OPTIONS]

  Consume from broker and insert into ClickHouse.

Options:
  --schema TEXT       Schema to consume.  [required]
  --skip-error        Skip error rows.
  --last-msg-id TEXT  Redis stream last msg id or kafka msg offset, depend on
                      broker_type in config.

  -h, --help          Show this message and exit.

Consume the schema test and insert into ClickHouse:

> synch --alias mysql_db consume --schema test
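The --last-msg-id option above takes a Redis stream ID (or a Kafka offset) to resume from. Redis stream IDs are strings of the form `<milliseconds>-<sequence>`, ordered by timestamp and then sequence number; a small pure-Python sketch of that ordering (not synch code):

```python
def parse_stream_id(msg_id: str) -> tuple:
    """Split a Redis stream ID like '1609459200000-0' into (ms, seq)."""
    ms, seq = msg_id.split("-")
    return (int(ms), int(seq))

# Entries are ordered by timestamp first, then by sequence number.
ids = ["1609459200000-1", "1609459300000-0", "1609459200000-0"]
ordered = sorted(ids, key=parse_stream_id)
print(ordered[0])  # '1609459200000-0' is the earliest entry
```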

Monitor

Set core.monitoring to true and synch will automatically create a synch database in ClickHouse and insert monitoring data into it.

Table structure:

create table if not exists synch.log
(
    alias      String,
    schema     String,
    table      String,
    num        int,
    type       int, -- 1:producer, 2:consumer
    created_at DateTime
)
    engine = MergeTree partition by toYYYYMM(created_at) order by created_at;
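As an example of reading this table, a hypothetical query counting rows handled by consumers per table over the last day (type = 2 marks consumers, per the comment above):

```sql
-- rows inserted by consumers per table in the last 24 hours
SELECT schema, `table`, sum(num) AS rows
FROM synch.log
WHERE type = 2
  AND created_at > now() - INTERVAL 1 DAY
GROUP BY schema, `table`;
```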

ClickHouse Table Engine

synch currently supports the MergeTree, CollapsingMergeTree, VersionedCollapsingMergeTree, and ReplacingMergeTree engines.
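To build intuition for why the collapsing engines suit synced data, here is a pure-Python sketch (not synch code) of the sign-column bookkeeping CollapsingMergeTree performs: an update is written as a cancel row (sign -1) plus a new version (sign +1), and rows whose signs cancel disappear at merge time.

```python
from collections import defaultdict

def collapse(rows):
    """Keep, per key, only the row versions whose signs do not cancel out."""
    state = defaultdict(list)
    for key, data, sign in rows:
        if sign == 1:
            state[key].append(data)
        else:
            state[key].remove(data)  # a -1 row cancels a matching +1 row
    return {k: v[-1] for k, v in state.items() if v}

# An update is written as a cancel row (-1) plus the new version (+1).
rows = [
    ("user1", {"balance": 10}, 1),
    ("user1", {"balance": 10}, -1),
    ("user1", {"balance": 25}, 1),
    ("user2", {"balance": 5}, 1),
]
print(collapse(rows))  # {'user1': {'balance': 25}, 'user2': {'balance': 5}}
```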

Use docker-compose (recommended)

Redis broker, lightweight and suited to low concurrency
version: "3"
services:
  producer:
    depends_on:
      - redis
    image: long2ice/synch
    command: synch --alias mysql_db produce
    volumes:
      - ./synch.yaml:/synch/synch.yaml
  # one consumer service per schema
  consumer.test:
    depends_on:
      - redis
    image: long2ice/synch
    command: synch --alias mysql_db consume --schema test
    volumes:
      - ./synch.yaml:/synch/synch.yaml
  redis:
    hostname: redis
    image: redis:latest
    volumes:
      - redis:/data
volumes:
  redis:
Kafka broker, for high concurrency
version: "3"
services:
  zookeeper:
    image: bitnami/zookeeper:3
    hostname: zookeeper
    environment:
      - ALLOW_ANONYMOUS_LOGIN=yes
    volumes:
      - zookeeper:/bitnami
  kafka:
    image: bitnami/kafka:2
    hostname: kafka
    environment:
      - KAFKA_CFG_ZOOKEEPER_CONNECT=zookeeper:2181
      - ALLOW_PLAINTEXT_LISTENER=yes
      - JMX_PORT=23456
      - KAFKA_CFG_AUTO_CREATE_TOPICS_ENABLE=true
      - KAFKA_ADVERTISED_LISTENERS=PLAINTEXT://kafka:9092
    depends_on:
      - zookeeper
    volumes:
      - kafka:/bitnami
  kafka-manager:
    image: hlebalbau/kafka-manager
    ports:
      - "9000:9000"
    environment:
      ZK_HOSTS: "zookeeper:2181"
      KAFKA_MANAGER_AUTH_ENABLED: "false"
    command: -Dpidfile.path=/dev/null
  producer:
    depends_on:
      - redis
      - kafka
      - zookeeper
    image: long2ice/synch
    command: synch --alias mysql_db produce
    volumes:
      - ./synch.yaml:/synch/synch.yaml
  # one consumer service per schema
  consumer.test:
    depends_on:
      - redis
      - kafka
      - zookeeper
    image: long2ice/synch
    command: synch --alias mysql_db consume --schema test
    volumes:
      - ./synch.yaml:/synch/synch.yaml
  redis:
    hostname: redis
    image: redis:latest
    volumes:
      - redis:/data
volumes:
  redis:
  kafka:
  zookeeper:

Important

  • Each synced table must always have a non-null primary key or unique key, or a composite primary key.
  • DDL sync is not supported for PostgreSQL.
  • PostgreSQL sync is not fully tested; use it in production with care.
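For instance, a MySQL source table that satisfies the first requirement might look like this (an illustrative table, not part of synch):

```sql
CREATE TABLE test.test (
    id         INT UNSIGNED NOT NULL AUTO_INCREMENT,
    name       VARCHAR(255),
    updated_at DATETIME,
    PRIMARY KEY (id)  -- non-null primary key, as synch requires
);
```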

Support this project

AliPay · WeChatPay · PayPal
PayPal donations go to the account long2ice.

Thanks to

The powerful Python IDE PyCharm, from JetBrains.


License

This project is licensed under the Apache-2.0 License.

Project details


Download files

Download the file for your platform.

Source Distribution

synch-0.7.1.tar.gz (26.5 kB)

Uploaded Source

Built Distribution


synch-0.7.1-py3-none-any.whl (39.9 kB)

Uploaded Python 3

File details

Details for the file synch-0.7.1.tar.gz.

File metadata

  • Download URL: synch-0.7.1.tar.gz
  • Upload date:
  • Size: 26.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.3.0 requests-toolbelt/0.9.1 tqdm/4.50.2 CPython/3.8.6

File hashes

Hashes for synch-0.7.1.tar.gz:

  • SHA256: a27fddb0dcd8ba256f528bd002feaeba250520cfe241a1b1bf7951512c21e502
  • MD5: 3bf74fcc52f17fc2d737f8dc19fe65d6
  • BLAKE2b-256: c79f1d4be6b8489e908a232c0718098b303f88c525f6f184d8b8e85603a46d14


File details

Details for the file synch-0.7.1-py3-none-any.whl.

File metadata

  • Download URL: synch-0.7.1-py3-none-any.whl
  • Upload date:
  • Size: 39.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.3.0 requests-toolbelt/0.9.1 tqdm/4.50.2 CPython/3.8.6

File hashes

Hashes for synch-0.7.1-py3-none-any.whl:

  • SHA256: 116b9246abce671ef0e221899e84ae96b17c55f45337ff95d23a63f18f20d934
  • MD5: 48df0bc0378c37aed5e61c78dca50189
  • BLAKE2b-256: 5b3a9f9d2e1b88a1677d3d69c57b10f169f52bbc671dde2210fef3d918fd5741

