
Python CLI tool for ClickHouse schema migrations with distributed locking and cluster support.

PyClickHouseMigrator

SQL-first ClickHouse migrations for Python teams.

PyClickHouseMigrator is a small, predictable migration runner for ClickHouse. It keeps schema changes in plain .sql files, applies them in order, stores migration state inside ClickHouse, validates checksums, supports rollback SQL, and fits naturally into CI/CD.

No ORM. No schema diff engine. No framework. Just a clean migration flow for teams that want their ClickHouse DDL to live in Git.

Highlights

  • Plain SQL migrations — migration files are .sql, not Python modules.
  • Explicit statement blocks — each -- @stmt block is executed as one ClickHouse query; the migrator does not split SQL by semicolons.
  • Rollback SQL — -- migrator:down stores the rollback SQL used by migrator rollback.
  • Checksum validation — detects edited or missing applied migration files.
  • Dry-run mode — preview pending migrations without writing migration state.
  • Baseline — adopt an existing database without executing historical migrations.
  • Advisory locking — protects common CI/CD concurrency cases.
  • Cluster-aware service tables — migration metadata and lock tables can be created with ON CLUSTER and replicated engines.
  • Small dependency set — depends on click and clickhouse-driver.

Install

pip install py-clickhouse-migrator

With uv:

uv add py-clickhouse-migrator

The CLI command is:

migrator --help

Quick start

Set the ClickHouse connection URL. The database in the URL must already exist.

export CLICKHOUSE_MIGRATE_URL=clickhouse://default@localhost:9000/mydb

Create a migrations directory and a first migration:

migrator init
migrator new create_users_table

Edit the generated file in ./db/migrations:

-- migrator:up
-- @stmt
CREATE TABLE IF NOT EXISTS users
(
    id UInt64,
    name String,
    created_at DateTime DEFAULT now()
)
ENGINE = MergeTree
ORDER BY id

-- migrator:down
-- @stmt
DROP TABLE IF EXISTS users

Preview what will run:

migrator up --dry-run

Apply pending migrations:

migrator up

Check status:

migrator show

Rollback the last applied migration:

migrator rollback

init and new work offline. Commands that read or write migration state require a ClickHouse connection.

Migration format

A migration is a .sql file with two required sections:

-- migrator:up
-- @stmt
-- SQL to apply the migration

-- migrator:down
-- @stmt
-- SQL to roll it back

Rules:

  • each file must contain exactly one -- migrator:up section and exactly one -- migrator:down section;
  • -- migrator:up must appear before -- migrator:down;
  • SQL must be placed inside -- @stmt blocks;
  • the up section must contain at least one non-empty statement block;
  • the down section may be empty;
  • each -- @stmt block is sent to ClickHouse as one query;
  • the migrator does not split SQL by ;.

This is intentional. It avoids fragile semicolon splitting and makes multi-statement migrations explicit.

Good:

-- migrator:up
-- @stmt
CREATE TABLE IF NOT EXISTS events
(
    id UInt64,
    message String
)
ENGINE = MergeTree
ORDER BY id

-- @stmt
ALTER TABLE events ADD COLUMN IF NOT EXISTS created_at DateTime DEFAULT now()

-- migrator:down
-- @stmt
ALTER TABLE events DROP COLUMN IF EXISTS created_at

-- @stmt
DROP TABLE IF EXISTS events

Avoid putting multiple ClickHouse queries into one block:

-- bad
-- @stmt
CREATE TABLE a (id UInt64) ENGINE = MergeTree ORDER BY id;
CREATE TABLE b (id UInt64) ENGINE = MergeTree ORDER BY id;

Use separate blocks instead:

-- good
-- @stmt
CREATE TABLE a (id UInt64) ENGINE = MergeTree ORDER BY id

-- @stmt
CREATE TABLE b (id UInt64) ENGINE = MergeTree ORDER BY id
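As a sketch of how this format can be parsed, the following hypothetical parser (not the library's actual implementation) splits a migration file into its up and down statement blocks:

```python
def parse_migration(text: str) -> dict[str, list[str]]:
    """Split migration text into 'up' and 'down' lists of -- @stmt blocks."""
    sections: dict[str, list[str]] = {"up": [], "down": []}
    current = None          # section currently being read ("up" / "down")
    block: list[str] = []   # lines of the statement being collected

    def flush() -> None:
        # Close the current -- @stmt block, if it holds any SQL.
        if current is not None:
            stmt = "\n".join(block).strip()
            if stmt:
                sections[current].append(stmt)
        block.clear()

    for line in text.splitlines():
        stripped = line.strip()
        if stripped == "-- migrator:up":
            flush()
            current = "up"
        elif stripped == "-- migrator:down":
            flush()
            current = "down"
        elif stripped == "-- @stmt":
            flush()
        elif current is not None:
            block.append(line)
    flush()
    return sections
```

Each -- @stmt marker starts a new block, so an up section with two markers yields two separate queries, matching the "good" example above.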

See Migration format for more examples.

Commands

init

Create the migrations directory. Default: ./db/migrations.

migrator init
migrator --path ./migrations init

Works offline.

new

Create a timestamped .sql migration file.

migrator new create_users_table

Generated filename format:

YYYYMMDDHHmmss_create_users_table.sql

The name suffix is optional but recommended; only letters, digits, and underscores are allowed.

Works offline.

up

Apply pending migrations in filename order.

migrator up          # apply all pending migrations
migrator up 3        # apply next 3 pending migrations
migrator up --dry-run

Option                     | Default    | Description
N                          | all        | Optional positional limit: number of pending migrations to apply.
--lock / --no-lock         | --lock     | Enable or disable the migration lock.
--lock-ttl                 | 600        | Lock TTL in seconds.
--lock-retry               | 3          | Lock acquire retry attempts.
--dry-run                  | off        | Print pending migration SQL without executing it.
--validate / --no-validate | --validate | Enable or disable preflight validation with EXPLAIN AST.
--allow-dirty              | off        | Skip checksum mismatch failures for this run.

Example output:

20260421140000_create_users_table.sql applied [✔]
20260421143000_add_events_table.sql applied [✔]

rollback

Rollback applied migrations in reverse order.

migrator rollback        # rollback last applied migration
migrator rollback 3      # rollback last 3 applied migrations
migrator rollback --dry-run

Option                     | Default    | Description
N                          | 1          | Optional positional limit: number of migrations to roll back.
--lock / --no-lock         | --lock     | Enable or disable the migration lock.
--lock-ttl                 | 600        | Lock TTL in seconds.
--lock-retry               | 3          | Lock acquire retry attempts.
--dry-run                  | off        | Print rollback SQL without executing it.
--validate / --no-validate | --validate | Enable or disable preflight validation with EXPLAIN AST.

Rollback uses the down SQL stored in db_migrations at the time the migration was applied, not the current file content.

show

Display applied migrations, pending migrations, HEAD, baseline markers, and integrity warnings.

migrator show
migrator show --all

Option | Default | Description
--all  | off     | Show all applied migrations. By default, only the latest 5 applied migrations are shown.

Example:

Applied:
  [X] 20260421143000_add_events_table.sql (HEAD)
  [X] 20260421140000_create_users_table.sql

Pending:
  [ ] 20260421150000_add_status_column.sql

Applied: 2 | Pending: 1

Possible applied migration markers:

Marker   | Meaning
HEAD     | Latest applied migration row.
baseline | Migration was recorded by baseline, not executed.
modified | Applied migration file exists but its checksum no longer matches.
missing  | Applied migration file is missing locally.

baseline

Adopt an existing ClickHouse database without replaying historical DDL.

migrator baseline

Use this when your database schema already exists and you have .sql migration files that represent that existing schema. Point the migrator at that directory with --path or CLICKHOUSE_MIGRATE_DIR, or put the files in the default ./db/migrations directory. Then run migrator baseline once and use migrator up for future migrations.

Baseline behavior:

  • creates the db_migrations service table if it does not exist;
  • requires db_migrations to have no rows;
  • marks all current .sql files in the migrations directory as already applied (baseline rows);
  • does not execute SQL;
  • does not validate that the ClickHouse schema matches the files;
  • baseline rows are not selected by rollback;
  • baseline rows are excluded from checksum validation.

See Baseline existing databases.

repair

Update stored checksums to match current migration files.

migrator repair

Use this only after intentionally editing already-applied migration files and confirming that the database state is still consistent with those edits. migrator show and migrator up will report that applied migration SQL changed; repair updates the stored checksums to accept the current file content in future checks. It does not execute SQL, does not modify your application schema, and skips missing files.

lock-info

Show active migration lock information.

migrator lock-info

Example:

Locked by: runner-01:1234:abc123ef
Locked at: 2026-04-21 14:30:00
Expires at: 2026-04-21 14:40:00

force-unlock

Manually release a stuck lock.

migrator force-unlock

Use this when a deployment process crashed and the lock did not get released. Locks also expire automatically after their TTL.

Configuration

Global options can be provided through CLI flags or environment variables.

CLI option                 | Environment variable                        | Default         | Description
--url                      | CLICKHOUSE_MIGRATE_URL                      | (none)          | ClickHouse connection URL. Required for DB commands.
--path                     | CLICKHOUSE_MIGRATE_DIR                      | ./db/migrations | Migrations directory.
--cluster                  | CLICKHOUSE_MIGRATE_CLUSTER                  | (none)          | ClickHouse cluster name for service table DDL.
--connect-retries          | CLICKHOUSE_MIGRATE_CONNECT_RETRIES          | 0               | Connection retry attempts.
--connect-retries-interval | CLICKHOUSE_MIGRATE_CONNECT_RETRIES_INTERVAL | 1               | Seconds between connection retries.
--send-receive-timeout     | CLICKHOUSE_MIGRATE_SEND_RECEIVE_TIMEOUT     | 600             | ClickHouse client send/receive timeout in seconds.
-v, --verbose              | (none)                                      | off             | Enable DEBUG logging.
-q, --quiet                | (none)                                      | off             | Suppress INFO/WARNING logs; command output such as dry-run SQL is still printed.

Connection URL example:

clickhouse://user:password@host:9000/database

For TLS/secure connections, use connection parameters supported by clickhouse-driver URLs, for example:

clickhouse://user:password@host:9440/database?secure=True
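The anatomy of such a URL can be inspected with Python's standard library. This only illustrates the URL's components, not how clickhouse-driver parses it internally:

```python
from urllib.parse import parse_qs, urlparse

url = "clickhouse://user:password@host:9440/database?secure=True"
parts = urlparse(url)
# First query value per key; ClickHouse connection options appear here.
params = {k: v[0] for k, v in parse_qs(parts.query).items()}

# parts.username == "user", parts.hostname == "host", parts.port == 9440
# parts.path.lstrip("/") == "database", params["secure"] == "True"
```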

Docker

docker pull maksimburtsev/py-clickhouse-migrator:latest

Run migrations:

docker run --rm \
  -v "$PWD/db/migrations:/migrations" \
  -e CLICKHOUSE_MIGRATE_URL=clickhouse://default@clickhouse:9000/mydb \
  maksimburtsev/py-clickhouse-migrator:latest \
  up

Inside the Docker image, the default migrations directory is /migrations.

See Docker usage.

CI/CD

In CI/CD, the usual deployment step is intentionally simple:

migrator up

GitHub Actions example:

- name: Run ClickHouse migrations
  run: migrator up
  env:
    CLICKHOUSE_MIGRATE_URL: ${{ secrets.CLICKHOUSE_MIGRATE_URL }}

Recommended deployment pattern:

  1. build and test application code;
  2. run migrator up once per deployment;
  3. deploy application code that depends on the new schema.

The CLI exits with 0 on success and 1 for handled migration errors such as invalid migrations, connection failures, checksum mismatches, missing databases, and lock errors.

See CI/CD usage.

Checksum validation

When a migration is applied, PyClickHouseMigrator stores a SHA-256 checksum in db_migrations.

The checksum is computed from the up and down statement blocks extracted by the migrator, not from the migration file as a whole. This makes the checksum tied to the SQL blocks the tool will send to ClickHouse.
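As an illustration of the idea (not the tool's exact canonicalization), hashing the extracted blocks rather than the raw file makes the checksum insensitive to edits outside -- @stmt blocks:

```python
import hashlib

def migration_checksum(up_stmts: list[str], down_stmts: list[str]) -> str:
    """SHA-256 over the extracted statement blocks, not the whole file."""
    h = hashlib.sha256()
    for stmt in up_stmts + down_stmts:
        h.update(stmt.strip().encode("utf-8"))
        h.update(b"\x00")  # separator so block boundaries are part of the hash
    return h.hexdigest()
```

Under this scheme, reflowing whitespace around a statement leaves the checksum unchanged, while editing the statement itself changes it.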

On migrator up, applied migration checksums are compared with the current local files. If an applied migration was modified or deleted, the command fails unless --allow-dirty is used.

Useful commands:

migrator show          # display modified/missing markers
migrator up            # fail fast on checksum mismatch
migrator up --allow-dirty
migrator repair        # update stored checksums after intentional edits

Preflight validation

By default, up, rollback, and their dry-run variants validate statements with EXPLAIN AST before execution.

migrator up --no-validate
migrator rollback --no-validate

Validation is best-effort. It catches many syntax and parse problems early, but it is not a guarantee that execution will succeed and it is not a production-safety analyzer.

Locking

up, rollback, and baseline use an advisory migration lock by default.

The lock is stored in a ClickHouse service table named _migrations_lock. It has a TTL, a lock owner identity, retry behavior, and manual recovery commands.

migrator up --lock-ttl 900
migrator up --lock-retry 10
migrator up --no-lock
migrator lock-info
migrator force-unlock

The lock is meant to protect common deployment races, for example two CI jobs starting at the same time. It is still best practice to run migrations from a single deployment job or Kubernetes Job.
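The acquire/expire behavior can be modeled in a few lines. This in-memory sketch is only an analogy for the real lock, which lives in the _migrations_lock table in ClickHouse:

```python
import time

class AdvisoryLock:
    """Toy model of a TTL-based advisory lock with retry on acquire."""

    def __init__(self, ttl: float):
        self.ttl = ttl
        self.owner: str | None = None
        self.expires_at = 0.0

    def acquire(self, owner: str, retries: int = 3, interval: float = 0.0) -> bool:
        for _ in range(retries + 1):
            now = time.monotonic()
            # The lock is free if never taken, released, or past its TTL.
            if self.owner is None or now >= self.expires_at:
                self.owner = owner
                self.expires_at = now + self.ttl
                return True
            time.sleep(interval)
        return False

    def release(self, owner: str) -> None:
        # Only the current holder may release (force-unlock bypasses this).
        if self.owner == owner:
            self.owner = None
```

The TTL is what makes force-unlock rarely necessary: a crashed holder's lock simply becomes acquirable once it expires.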

Cluster mode

Set a cluster name when you want the migrator's own service tables to be created across a ClickHouse cluster:

export CLICKHOUSE_MIGRATE_CLUSTER=my_cluster
migrator up

When cluster mode is enabled:

  • db_migrations is created with ON CLUSTER and a replicated engine;
  • _migrations_lock is created with ON CLUSTER and a replicated replacing engine;
  • service table writes use cluster consistency settings;
  • your migration SQL is executed exactly as written.

PyClickHouseMigrator does not inject ON CLUSTER into user migrations. If your ClickHouse DDL must run on the whole cluster, write ON CLUSTER in the migration yourself.

See Cluster mode.

Python API

from py_clickhouse_migrator import Migrator

migrator = Migrator(
    database_url="clickhouse://default@localhost:9000/mydb",
    migrations_dir="./db/migrations",
)

migrator.up()

See Python API.

Known limitations

  • ClickHouse DDL is not transactional. A multi-statement migration can partially apply if a later statement fails.
  • The advisory lock is best-effort. It reduces common concurrency problems, but it is not a substitute for a single well-defined migration job.
  • The target database must exist before the migrator runs.
  • Baseline does not compare migration files with the existing database schema.
  • Preflight validation is best-effort and can be disabled with --no-validate.
  • Each -- @stmt block must contain one ClickHouse query.

See Known limitations.

License

MIT

Download files

Download the file for your platform.

Source Distribution

py_clickhouse_migrator-2.0.0.tar.gz (1.7 MB)

Uploaded Source

Built Distribution


py_clickhouse_migrator-2.0.0-py3-none-any.whl (21.3 kB)

Uploaded Python 3

File details

Details for the file py_clickhouse_migrator-2.0.0.tar.gz.

File metadata

  • Download URL: py_clickhouse_migrator-2.0.0.tar.gz
  • Upload date:
  • Size: 1.7 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for py_clickhouse_migrator-2.0.0.tar.gz
Algorithm   | Hash digest
SHA256      | ca11eb5d657146b0574e7889ce4b879e450de87c1ae56e7c6dd29584cea7da6e
MD5         | f2bf10c9ea03e0c8e476137eec7efbdc
BLAKE2b-256 | 2f973d189b1e0e56ccf98e66483e4ac33e80e886a6f1dc952db72a6fb677f4b9


Provenance

The following attestation bundles were made for py_clickhouse_migrator-2.0.0.tar.gz:

Publisher: release.yml on Maksim-Burtsev/PyClickHouseMigrator

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file py_clickhouse_migrator-2.0.0-py3-none-any.whl.

File metadata

File hashes

Hashes for py_clickhouse_migrator-2.0.0-py3-none-any.whl
Algorithm   | Hash digest
SHA256      | 47d5cc5eb9cd0b7c7760de5fceab48c6d2ada0fea3c3d454444948f8395f0ed7
MD5         | bee3254db151edde7fcbf294fb9b9479
BLAKE2b-256 | 87b408bde17bac4d8e437851bd4ce7b7b13af0cdf980e87bfdf8f52437b01b1e


Provenance

The following attestation bundles were made for py_clickhouse_migrator-2.0.0-py3-none-any.whl:

Publisher: release.yml on Maksim-Burtsev/PyClickHouseMigrator

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
