Skip to main content

CLI tool to incrementally optimize all partitions of ClickHouse tables

Project description

ClickHouse Optimizer

Python 3.12+ License: BSD-3-Clause

A robust CLI tool for incrementally optimizing all partitions of ClickHouse tables. Optimize large tables efficiently by processing partitions sequentially with comprehensive progress tracking and merge monitoring.

🚀 Why ClickHouse Optimizer?

ClickHouse tables can accumulate many small parts over time, degrading query performance. The OPTIMIZE TABLE command typically processes all partitions at once, which can:

  • Overwhelm system resources on large tables
  • Block other operations during optimization
  • Fail on timeout without completing any work
  • Provide no visibility into progress

If you generate a file with a list of partitions to optimize, you can run the OPTIMIZE TABLE command manually. However, this approach is error-prone, failing when the OPTIMIZE TABLE command times out, and can be time-consuming.

This tool solves these problems by:

  • Processing partitions sequentially to manage resource usage
  • Monitoring merge completion with real-time progress bars
  • Graceful timeout handling - continues with next partition if one times out
  • Rich console output with ETA calculations

📦 Installation

pip

pip install clickhouse-optimizer

uvx (no install needed)

uvx clickhouse-optimizer --help

Docker

docker pull gavinmroy/clickhouse-optimizer

🔧 Quick Start

Basic Usage

# Optimize all partitions of a table
clickhouse-optimizer --host ch.example.com --user admin --password secret --database mydb mytable

# Run with verbose logging to see detailed progress
clickhouse-optimizer --verbose --host ch.example.com --user admin --password secret --database mydb mytable

Environment Variables

Set connection parameters via environment variables:

export CLICKHOUSE_HOST=ch.example.com
export CLICKHOUSE_USER=admin
export CLICKHOUSE_PASSWORD=secret
export CLICKHOUSE_DATABASE=mydb

clickhouse-optimizer mytable

Docker

# Using CLI arguments
docker run --rm gavinmroy/clickhouse-optimizer \
  --host ch.example.com --user admin --password secret \
  --database mydb mytable

# Using environment variables
docker run --rm \
  -e CLICKHOUSE_HOST=ch.example.com \
  -e CLICKHOUSE_USER=admin \
  -e CLICKHOUSE_PASSWORD=secret \
  -e CLICKHOUSE_DATABASE=mydb \
  gavinmroy/clickhouse-optimizer mytable

Advanced Options

# Custom timeouts and polling intervals
clickhouse-optimizer \
  --optimize-timeout 7200 \
  --poll-interval 10 \
  mytable

📋 Command Reference

Option Environment Variable Description Default
--host CLICKHOUSE_HOST ClickHouse server hostname Required
--port CLICKHOUSE_PORT ClickHouse server port 9440
--database CLICKHOUSE_DATABASE Database name Required
--user CLICKHOUSE_USER Username for authentication Required
--password CLICKHOUSE_PASSWORD Password for authentication Required
--secure CLICKHOUSE_SECURE Use secure connection False
--verbose CLICKHOUSE_VERBOSE Enable verbose logging False
--optimize-timeout CLICKHOUSE_OPTIMIZE_TIMEOUT Max seconds to wait for merges 43200 (12h)
--poll-interval CLICKHOUSE_POLL_INTERVAL Seconds between status checks 5

🏗️ How It Works

  1. Discovery: Queries system.parts to find all active partitions
  2. Sequential Processing: Optimizes one partition at a time
  3. Merge Monitoring: Tracks merge progress via system.merges
  4. Progress Display: Shows completion status with Rich progress bars
  5. Fault Tolerance: Continues with next partition if one times out

🎯 Use Cases

Large Production Tables

Perfect for tables with hundreds of partitions where full optimization would be too resource-intensive.

Maintenance Windows

Ideal for scheduled optimization during low-traffic periods with predictable progress tracking.

Performance Recovery

Quickly improve query performance on tables with many small parts without system overload.

Safe Operations

Monitor optimization progress with detailed logging and progress tracking.

🛠️ Development

Setup Development Environment

git clone https://github.com/gmr/clickhouse-optimizer
cd clickhouse-optimizer
uv sync

# Install pre-commit hooks
uv run pre-commit install

Code Quality

# Run linting
uv run ruff check

# Format code
uv run ruff format

# Run all pre-commit hooks
uv run pre-commit run --all-files

Testing

# Run tests with coverage
uv run coverage run -m pytest

# Show coverage report
uv run coverage report

# Generate HTML coverage report
uv run coverage html

Build

# Build package distributions
uv build

📄 License

BSD 3-Clause License - see LICENSE file for details.

🤝 Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes with tests
  4. Run the pre-commit hooks
  5. Submit a pull request

📊 Architecture

This tool follows a clean, modular architecture:

  • Settings: Pydantic-based configuration with automatic CLI parsing
  • Optimizer: Core business logic with progress tracking and error handling
  • CLI: Minimal entry point with Rich console formatting

The optimizer discovers partitions, processes them sequentially, and monitors merge completion through ClickHouse system tables.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

clickhouse_optimizer-1.0.0.tar.gz (45.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

clickhouse_optimizer-1.0.0-py3-none-any.whl (9.8 kB view details)

Uploaded Python 3

File details

Details for the file clickhouse_optimizer-1.0.0.tar.gz.

File metadata

  • Download URL: clickhouse_optimizer-1.0.0.tar.gz
  • Upload date:
  • Size: 45.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for clickhouse_optimizer-1.0.0.tar.gz
Algorithm Hash digest
SHA256 e1b45153548a479eca2f48da3c55a8917785f93324c7e9f691662ccc505e5852
MD5 d21b1258f3970bf8d47c837cfd4e4a5b
BLAKE2b-256 a03b7a64b6ef4225c4b88a85c587db7c612283813768c7b07fb83ec756fb7194

See more details on using hashes here.

Provenance

The following attestation bundles were made for clickhouse_optimizer-1.0.0.tar.gz:

Publisher: release.yaml on gmr/clickhouse-optimizer

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file clickhouse_optimizer-1.0.0-py3-none-any.whl.

File metadata

File hashes

Hashes for clickhouse_optimizer-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 fbbd4907de8eb5c379fe217a3d565689006f0e926508f2d4d05efc0551c11cdb
MD5 3901c343a9c126d6c68bce7fc37ce3bc
BLAKE2b-256 64440244592be9283bdf68ea2363e4689a8483a54d57554ef9ae5ff76d94ff09

See more details on using hashes here.

Provenance

The following attestation bundles were made for clickhouse_optimizer-1.0.0-py3-none-any.whl:

Publisher: release.yaml on gmr/clickhouse-optimizer

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page