Skip to main content

StarRocks Backup and Restore automation tool

Project description

StarRocks Backup & Restore - CLI Usage Guide

Overview

The StarRocks Backup & Restore tool provides production-grade automation for backup and restore operations.

Important: This tool requires StarRocks 3.5 or later. Earlier versions are not supported due to differences in the SHOW FRONTENDS and SHOW BACKENDS command output formats, which are used for cluster health checks.

📋 View Release Notes & Changelog

Summary

Installation

Option 1: Install from PyPI (Recommended for Production)

We recommend using a virtual environment to ensure proper script availability and dependency isolation:

# Create and activate a virtual environment
python3 -m venv .venv
source .venv/bin/activate  # On Linux/Mac
# .venv\Scripts\activate    # On Windows

# Install the package from PyPI
pip install starrocks-br

# Verify the installation
starrocks-br --help

Note: Always activate the virtual environment before using the tool. The starrocks-br command will only be available when the virtual environment is activated.

Option 2: Download Pre-built Standalone Executable

If you prefer not to manage Python environments, you can download a bundled executable that includes the Python runtime and all dependencies.

  1. Download the artifact for your platform from the latest Build Executables workflow run (Artifacts section).

    • starrocks-br-linux-x86_64 → Linux (Intel/AMD)
    • starrocks-br-windows-x86_64 → Windows (Intel/AMD)
    • starrocks-br-macos-arm64 → macOS on Apple Silicon (M1/M2/M3)
    • starrocks-br-macos-x86_64 → macOS on Intel chips
  2. Extract the ZIP file (artifacts are delivered as ZIPs).

  3. Make the file executable (Linux/macOS):

    chmod +x starrocks-br
    
  4. Run it directly:

    ./starrocks-br --help        # Linux/macOS
    .\starrocks-br.exe --help    # Windows (PowerShell)
    
  5. Keep it updated: Download the latest artifact whenever a new release is published. (Future releases will bundle executables automatically.)

Need to build it yourself? Clone the repo and run ./build_executable.sh to recreate the executable locally (see script for details).

Option 3: Using Devbox (Recommended for Development)

Note: This requires cloning the repository first.

Devbox is a reproducible development environment that installs all required tools (Python, dependencies, virtualenv) in one step.

# Clone the repository
git clone https://github.com/deep-bi/starrocks-br
cd starrocks-br

# Install devbox (if not already installed)
curl -fsSL https://get.jetpack.io/devbox | bash

# Start devbox shell - this automatically:
# - Installs Python 3.11 and dependencies
# - Creates a virtual environment (.venv)
# - Installs the package in editable mode
# - Installs development dependencies
devbox shell

# Once inside the devbox shell, you're ready to go:
starrocks-br --help
pytest

Option 4: Manual Development Setup

# Clone the repository
git clone https://github.com/deep-bi/starrocks-br
cd starrocks-br

# Create and activate virtual environment
python3 -m venv .venv
source .venv/bin/activate

# Install in editable mode with development dependencies
pip install -e ".[dev]"

# The CLI is now available as: starrocks-br

Quick Start

After installing the CLI (via PyPI, executable download, Devbox, or manual setup), follow these steps:

  1. Activate your virtual environment (if not already active):

    source .venv/bin/activate  # On Linux/Mac
    # .venv\Scripts\activate    # On Windows
    
  2. Verify installation:

    starrocks-br --help
    
  3. Create your config.yaml file (see Configuration section below)

  4. Set your password as an environment variable:

    export STARROCKS_PASSWORD="your_password"
    

    On Windows (PowerShell):

    $env:STARROCKS_PASSWORD="your_password"
    

    On Windows (Command Prompt):

    set STARROCKS_PASSWORD=your_password
    
  5. Initialize the ops schema:

    starrocks-br init --config config.yaml
    
  6. Start using the tool - see Commands section below for details

Configuration

Important: After installing the package, you need to create your own config.yaml file. This file is not included in the package - each user creates it with their own StarRocks connection details. You can place it anywhere and reference it using the --config parameter.

Create a config.yaml file in your working directory (or any location you prefer) with your StarRocks connection details:

host: "127.0.0.1"
port: 9030
user: "root"
database: "your_database"
repository: "your_repo_name"

Password Management

The database password must be provided via the STARROCKS_PASSWORD environment variable. This is a security measure to prevent storing credentials in configuration files.

export STARROCKS_PASSWORD="your_password"

Connecting with TLS/SSL

The tool can make secure connections to StarRocks using TLS. Add an optional tls section to your config.yaml when you need encryption.

Scenario 1: Server Authentication (Most Common)

Use this setup when the client only needs to verify the StarRocks server certificate.

host: "127.0.0.1"
port: 9030
user: "root"
database: "your_database"
repository: "your_repo_name"

tls:
  enabled: true
  ca_cert: "/path/to/ca.pem"
  • enabled: Turns TLS on or off.
  • ca_cert: Certificate Authority file used to validate the server certificate.
  • verify_server_cert (optional, default true): Disable only if you need to skip certificate validation.

Scenario 2: Mutual TLS (mTLS)

Use this when both the client and server must present certificates.

host: "127.0.0.1"
port: 9030
user: "root"
database: "your_database"
repository: "your_repo_name"

tls:
  enabled: true
  ca_cert: "/path/to/ca.pem"
  client_cert: "/path/to/client-cert.pem"
  client_key: "/path/to/client-key.pem"
  • client_cert: Client certificate presented to the server.
  • client_key: Private key paired with the client certificate.

Regardless of the scenario, the connection defaults to modern TLS versions (TLSv1.2, TLSv1.3). Provide a tls_versions list if you need different protocol settings.

Note: The repository must be created in StarRocks using the CREATE REPOSITORY command before running backups. For example:

CREATE REPOSITORY `your_repo_name`
WITH S3
ON LOCATION "s3://your-backup-bucket/backups/"
PROPERTIES (
    "aws.s3.access_key" = "your-access-key",
    "aws.s3.secret_key" = "your-secret-key",
    "aws.s3.endpoint" = "https://s3.amazonaws.com"
);

Commands

Initialize Schema

Before running backups, initialize the ops database and control tables:

starrocks-br init --config config.yaml

What it does:

  • Creates ops database
  • Creates ops.table_inventory: Inventory groups mapping to databases/tables
  • Creates ops.backup_history: Backup operation history
  • Creates ops.restore_history: Restore operation history
  • Creates ops.run_status: Job concurrency control
  • Creates ops.backup_partitions: Partition manifest for each backup (enables intelligent restore)

Next step: Populate ops.table_inventory with your backup groups. For example:

INSERT INTO ops.table_inventory (inventory_group, database_name, table_name)
VALUES
  ('daily_facts', 'your_db', 'fact_sales'),
  ('weekly_dims', 'your_db', 'dim_users'),
  ('weekly_dims', 'your_db', 'dim_products'),
  ('full_db_backup', 'your_db', '*'); -- Wildcard for all tables

Note: If you skip this step, the ops schema will be auto-created on your first backup/restore operation (with a warning).

Backup Commands

Backups are managed through "inventory groups" defined in ops.table_inventory. This provides a flexible way to schedule different backup strategies for different sets of tables.

1. Full Backup

Runs a full backup for all tables within a specified inventory group.

starrocks-br backup full --config config.yaml --group <group_name>

Parameters:

  • --group: The inventory group to back up.

Internal flow:

  1. Load config → verify cluster health → ensure repository exists
  2. Reserve job slot (prevent concurrent backups)
  3. Query ops.table_inventory for all tables in the specified group.
  4. Generate a unique backup label.
  5. Build and execute the BACKUP command for the resolved tables.
  6. Poll SHOW BACKUP until completion and log results.

2. Incremental Backup

Backs up only the partitions that have changed since the last successful full backup for a given inventory group.

starrocks-br backup incremental --config config.yaml --group <group_name>

Parameters:

  • --group: The inventory group to back up.
  • --baseline-backup (Optional): Specify a backup label to use as the baseline instead of the latest full backup.

Internal flow:

  1. Load config → verify cluster health → ensure repository exists
  2. Reserve job slot
  3. Find the latest successful full backup for the group to use as a baseline.
  4. Find recent partitions from information_schema.partitions for tables in the group.
  5. Generate a unique backup label.
  6. Build and execute the BACKUP command for the new partitions.
  7. Poll SHOW BACKUP until completion and log results.

Restore Commands

Intelligent Point-in-Time Restore

Restores data to a specific point in time using intelligent backup chain resolution. This command automatically determines the correct sequence of backups needed for restore.

starrocks-br restore \
  --config config.yaml \
  --target-label my_db_20251016_inc \
  --group daily_facts \
  --rename-suffix _restored

Parameters:

  • --config: Path to config YAML file (required)
  • --target-label: Backup label to restore to (required)
  • --group: Optional inventory group to filter tables to restore (cannot be used with --table)
  • --table: Optional table name to restore (table name only, database comes from config). Cannot be used with --group
  • --rename-suffix: Suffix for temporary tables during restore (default: _restored)

How it works:

  • For full backups: Restores directly from the target backup
  • For incremental backups: Automatically restores the base full backup first, then applies the incremental
  • Safety mechanism: Uses temporary tables with the specified suffix, then performs atomic rename to make restored data live

Three Restore Modes:

  • Disaster Recovery: Restore all tables from a backup (omit both --group and --table parameters)
  • Surgical Restore by Group: Restore only specific table groups (use --group parameter)
  • Single Table Restore: Restore a specific table (use --table parameter). The table name should not include the database prefix - the database comes from the config file.

Table Name Format: When using --table, provide only the table name (e.g., fact_sales), not database.table_name. The database is taken from the database field in your config file. For multiple tables, set up an inventory group and use --group instead.

Purpose of --rename-suffix: The restore process creates temporary tables with the specified suffix (e.g., table_restored) to avoid conflicts with existing tables. Once the restore is complete and verified, the tool performs atomic renames to swap the original tables with the restored data. This ensures data safety and allows for rollback if needed.

Internal flow:

  1. Load config → verify cluster health → ensure repository exists
  2. Find the correct restore sequence (full backup + optional incremental)
  3. Get tables from backup manifest (optionally filtered by group)
  4. Execute restore flow with atomic renames
  5. Log to ops.restore_history

Example Usage Scenarios

Initial Setup

# 1. Initialize ops schema (run once)
starrocks-br init --config config.yaml

# 2. Populate table inventory with your groups (in StarRocks)
INSERT INTO ops.table_inventory (inventory_group, database_name, table_name)
VALUES
  ('daily_incrementals', 'sales_db', 'fact_orders'),
  ('weekly_full', 'sales_db', 'dim_customers'),
  ('weekly_full', 'sales_db', 'dim_products');

Daily Incremental Backup (Mon-Sat)

# Run via cron at 01:00
0 1 * * 1-6 cd /path/to/starrocks-br && source .venv/bin/activate && starrocks-br backup incremental --config config.yaml --group daily_incrementals

Weekly Full Backup (Sunday)

# Run via cron at 01:00 on Sundays
0 1 * * 0 cd /path/to/starrocks-br && source .venv/bin/activate && starrocks-br backup full --config config.yaml --group weekly_full

Disaster Recovery - Point-in-Time Restore

# Restore to a specific backup point (automatically handles full + incremental chain)
starrocks-br restore \
  --config config.yaml \
  --target-label sales_db_20251015_inc \
  --group daily_facts

# Restore all tables from a full backup
starrocks-br restore \
  --config config.yaml \
  --target-label sales_db_20251014_full

# Restore a single table from a backup
starrocks-br restore \
  --config config.yaml \
  --target-label sales_db_20251015_inc \
  --table fact_sales

Error Handling

The CLI automatically handles:

  • Job slot conflicts: Prevents overlapping backups/restores via ops.run_status
  • Label collisions: Automatically appends _r# suffix if label exists
  • Cluster health: Verifies FE/BE status before starting operations
  • Repository validation: Ensures repository exists and is accessible
  • Graceful failures: All errors are logged to history tables with proper status

Monitoring

All operations are logged to:

  • ops.backup_history: Tracks all backup attempts with status, timestamps, and error messages
  • ops.restore_history: Tracks all restore operations with verification checksums
  • ops.run_status: Tracks active jobs to prevent conflicts

Query examples:

-- Check recent backup status
SELECT label, backup_type, status, started_at, finished_at
FROM ops.backup_history
ORDER BY started_at DESC
LIMIT 10;

-- Check for failed backups
SELECT label, backup_type, error_message, started_at
FROM ops.backup_history
WHERE status = 'FAILED'
ORDER BY started_at DESC;

-- Check active jobs
SELECT scope, label, state, started_at
FROM ops.run_status
WHERE state = 'ACTIVE';

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

starrocks_br-0.3.0.tar.gz (58.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

starrocks_br-0.3.0-py3-none-any.whl (33.1 kB view details)

Uploaded Python 3

File details

Details for the file starrocks_br-0.3.0.tar.gz.

File metadata

  • Download URL: starrocks_br-0.3.0.tar.gz
  • Upload date:
  • Size: 58.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for starrocks_br-0.3.0.tar.gz
Algorithm Hash digest
SHA256 f5263d36f0ec82bfe8fcc635b97311dfec78cfcdb15a8109e257d945adec3ef6
MD5 85acb2b86f5569131831a6c8893d98db
BLAKE2b-256 5f35194a0632e1e8ba220634f5ea4b751f0048d6ef17a0225526fa3756cdc593

See more details on using hashes here.

File details

Details for the file starrocks_br-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: starrocks_br-0.3.0-py3-none-any.whl
  • Upload date:
  • Size: 33.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for starrocks_br-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 5cae2e7c64bb31a33b65ea6a3a743b6ab9645971d217467cff0494f336a4fbcd
MD5 6121e383aa693b84e7ae50e5d4d396a4
BLAKE2b-256 e99dee8e26c6b27e8ecc2a596ce8d34c7f8301c626ea2d6c24cff6309e0a8dbd

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page