CLI tool to incrementally optimize all partitions of ClickHouse tables
Project description
ClickHouse Optimizer
A robust CLI tool for incrementally optimizing all partitions of ClickHouse tables. Optimize large tables efficiently by processing partitions sequentially with comprehensive progress tracking and merge monitoring.
🚀 Why ClickHouse Optimizer?
ClickHouse tables can accumulate many small parts over time, degrading query performance. The OPTIMIZE TABLE command typically processes all partitions at once, which can:
- Overwhelm system resources on large tables
- Block other operations during optimization
- Fail on timeout without completing any work
- Provide no visibility into progress
If you generate a file with a list of partitions to optimize, you can run the OPTIMIZE TABLE command manually. However, this approach is error-prone, failing when the OPTIMIZE TABLE command times out, and can be time-consuming.
This tool solves these problems by:
- ✅ Processing partitions sequentially to manage resource usage
- ✅ Monitoring merge completion with real-time progress bars
- ✅ Graceful timeout handling - continues with next partition if one times out
- ✅ Rich console output with ETA calculations
📦 Installation
pip
pip install clickhouse-optimizer
uvx (no install needed)
uvx clickhouse-optimizer --help
Docker
docker pull gavinmroy/clickhouse-optimizer
🔧 Quick Start
Basic Usage
# Optimize all partitions of a table
clickhouse-optimizer --host ch.example.com --user admin --password secret --database mydb mytable
# Run with verbose logging to see detailed progress
clickhouse-optimizer --verbose --host ch.example.com --user admin --password secret --database mydb mytable
Environment Variables
Set connection parameters via environment variables:
export CLICKHOUSE_HOST=ch.example.com
export CLICKHOUSE_USER=admin
export CLICKHOUSE_PASSWORD=secret
export CLICKHOUSE_DATABASE=mydb
clickhouse-optimizer mytable
Docker
# Using CLI arguments
docker run --rm gavinmroy/clickhouse-optimizer \
--host ch.example.com --user admin --password secret \
--database mydb mytable
# Using environment variables
docker run --rm \
-e CLICKHOUSE_HOST=ch.example.com \
-e CLICKHOUSE_USER=admin \
-e CLICKHOUSE_PASSWORD=secret \
-e CLICKHOUSE_DATABASE=mydb \
gavinmroy/clickhouse-optimizer mytable
Advanced Options
# Custom timeouts and polling intervals
clickhouse-optimizer \
--optimize-timeout 7200 \
--poll-interval 10 \
mytable
📋 Command Reference
| Option | Environment Variable | Description | Default |
|---|---|---|---|
--host |
CLICKHOUSE_HOST |
ClickHouse server hostname | Required |
--port |
CLICKHOUSE_PORT |
ClickHouse server port | 9440 |
--database |
CLICKHOUSE_DATABASE |
Database name | Required |
--user |
CLICKHOUSE_USER |
Username for authentication | Required |
--password |
CLICKHOUSE_PASSWORD |
Password for authentication | Required |
--secure |
CLICKHOUSE_SECURE |
Use secure connection | False |
--verbose |
CLICKHOUSE_VERBOSE |
Enable verbose logging | False |
--optimize-timeout |
CLICKHOUSE_OPTIMIZE_TIMEOUT |
Max seconds to wait for merges | 43200 (12h) |
--poll-interval |
CLICKHOUSE_POLL_INTERVAL |
Seconds between status checks | 5 |
🏗️ How It Works
- Discovery: Queries
system.partsto find all active partitions - Sequential Processing: Optimizes one partition at a time
- Merge Monitoring: Tracks merge progress via
system.merges - Progress Display: Shows completion status with Rich progress bars
- Fault Tolerance: Continues with next partition if one times out
🎯 Use Cases
Large Production Tables
Perfect for tables with hundreds of partitions where full optimization would be too resource-intensive.
Maintenance Windows
Ideal for scheduled optimization during low-traffic periods with predictable progress tracking.
Performance Recovery
Quickly improve query performance on tables with many small parts without system overload.
Safe Operations
Monitor optimization progress with detailed logging and progress tracking.
🛠️ Development
Setup Development Environment
git clone https://github.com/gmr/clickhouse-optimizer
cd clickhouse-optimizer
uv sync
# Install pre-commit hooks
uv run pre-commit install
Code Quality
# Run linting
uv run ruff check
# Format code
uv run ruff format
# Run all pre-commit hooks
uv run pre-commit run --all-files
Testing
# Run tests with coverage
uv run coverage run -m pytest
# Show coverage report
uv run coverage report
# Generate HTML coverage report
uv run coverage html
Build
# Build package distributions
uv build
📄 License
BSD 3-Clause License - see LICENSE file for details.
🤝 Contributing
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Make your changes with tests
- Run the pre-commit hooks
- Submit a pull request
📊 Architecture
This tool follows a clean, modular architecture:
- Settings: Pydantic-based configuration with automatic CLI parsing
- Optimizer: Core business logic with progress tracking and error handling
- CLI: Minimal entry point with Rich console formatting
The optimizer discovers partitions, processes them sequentially, and monitors merge completion through ClickHouse system tables.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file clickhouse_optimizer-1.0.0.tar.gz.
File metadata
- Download URL: clickhouse_optimizer-1.0.0.tar.gz
- Upload date:
- Size: 45.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
e1b45153548a479eca2f48da3c55a8917785f93324c7e9f691662ccc505e5852
|
|
| MD5 |
d21b1258f3970bf8d47c837cfd4e4a5b
|
|
| BLAKE2b-256 |
a03b7a64b6ef4225c4b88a85c587db7c612283813768c7b07fb83ec756fb7194
|
Provenance
The following attestation bundles were made for clickhouse_optimizer-1.0.0.tar.gz:
Publisher:
release.yaml on gmr/clickhouse-optimizer
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
clickhouse_optimizer-1.0.0.tar.gz -
Subject digest:
e1b45153548a479eca2f48da3c55a8917785f93324c7e9f691662ccc505e5852 - Sigstore transparency entry: 931424271
- Sigstore integration time:
-
Permalink:
gmr/clickhouse-optimizer@02c016248e67faf9d1ca0fcfd749fbe1ef982c17 -
Branch / Tag:
refs/tags/1.0.0 - Owner: https://github.com/gmr
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
release.yaml@02c016248e67faf9d1ca0fcfd749fbe1ef982c17 -
Trigger Event:
release
-
Statement type:
File details
Details for the file clickhouse_optimizer-1.0.0-py3-none-any.whl.
File metadata
- Download URL: clickhouse_optimizer-1.0.0-py3-none-any.whl
- Upload date:
- Size: 9.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
fbbd4907de8eb5c379fe217a3d565689006f0e926508f2d4d05efc0551c11cdb
|
|
| MD5 |
3901c343a9c126d6c68bce7fc37ce3bc
|
|
| BLAKE2b-256 |
64440244592be9283bdf68ea2363e4689a8483a54d57554ef9ae5ff76d94ff09
|
Provenance
The following attestation bundles were made for clickhouse_optimizer-1.0.0-py3-none-any.whl:
Publisher:
release.yaml on gmr/clickhouse-optimizer
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
clickhouse_optimizer-1.0.0-py3-none-any.whl -
Subject digest:
fbbd4907de8eb5c379fe217a3d565689006f0e926508f2d4d05efc0551c11cdb - Sigstore transparency entry: 931424356
- Sigstore integration time:
-
Permalink:
gmr/clickhouse-optimizer@02c016248e67faf9d1ca0fcfd749fbe1ef982c17 -
Branch / Tag:
refs/tags/1.0.0 - Owner: https://github.com/gmr
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
release.yaml@02c016248e67faf9d1ca0fcfd749fbe1ef982c17 -
Trigger Event:
release
-
Statement type: