Skip to main content

CLI tool to clean up your BigQuery old and unused datasets and tables.

Project description

🧹 BigQuery Cleaner

Python 3.10+ PyPI version uv License: MIT

BigQuery Cleaner is a powerful CLI tool designed to help you declutter your Google BigQuery environment. It identifies tables that haven't been queried recently and provides safe mechanisms to rename or prepare them for deletion.


🚀 Quick Start

Get up and running in seconds:

# 1. Install via uv (recommended)
uv tool install bigquery-cleaner

# Or via pip
pip install bigquery-cleaner

# 2. Find unused tables (older than 30 days and not queried)
bigquery-cleaner list-unused-tables --project your-gcp-project --all-datasets --days 30

✨ Features

  • 🔍 Unused Table Detection: Scans INFORMATION_SCHEMA.JOBS to find tables that aren't being used.
  • 📊 Storage Insight: Displays table sizes in GB and provides per-dataset and grand total summaries.
  • 📂 Multi-Dataset Support: Target specific datasets, exclude others, or scan your entire project.
  • 🏷️ Safe Renaming: Dry-run mode allows you to see what would happen before making changes.
  • 🔄 Easy Revert: Renamed a table by mistake? Revert it easily with the revert-renamed-tables command.
  • 🗑️ Permanent Cleanup: Use delete-tables to remove suffixed tables once you've confirmed they are no longer needed.
  • 🧹 Dataset Cleanup: Remove empty datasets that no longer contain any tables or views using delete-empty-datasets.
  • ⚙️ Configurable: Use a cleaner.toml file to save your project defaults and lookback windows.
  • Built with Speed: Powered by uv, Typer, and Rich for a beautiful, fast terminal experience.

📋 Prerequisites

  • Python 3.10+
  • uv package manager installed.
  • Google Cloud Credentials: Configured via Application Default Credentials (ADC).
    gcloud auth application-default login
    

🛠️ Installation

Using uv (Recommended)

uv tool install bigquery-cleaner

Using pip

pip install bigquery-cleaner

From Source (Development)

# Clone the repository
git clone https://github.com/elvainch/bigquery-cleaner.git
cd bigquery-cleaner

# Sync dependencies and install the tool
uv sync
uv tool install .

📖 Usage Guide

Help Command

Every command and sub-command supports the --help flag for detailed information on available options.

Example: bigquery-cleaner list-unused-tables --help

Run bigquery-cleaner --help to see all available commands.

Connectivity Check

Ensure your credentials and project access are working:

bigquery-cleaner ping --project YOUR_PROJECT

Exploration

List available datasets and tables:

# List all datasets
bigquery-cleaner datasets --project YOUR_PROJECT

# List tables in specific datasets
bigquery-cleaner tables --datasets dataset1,dataset2 --project YOUR_PROJECT

Identifying Waste

The core functionality to find old, unreferenced tables:

# List unused tables across all datasets
bigquery-cleaner list-unused-tables --all-datasets --days 90

Cleanup Operations

Safely rename unused tables with a suffix:

# Dry run first!
bigquery-cleaner rename-old-tables --all-datasets --days 90 --dry-run

# Perform the rename
bigquery-cleaner rename-old-tables --all-datasets --days 90

# Delete renamed tables after verification
# Dry run first!
bigquery-cleaner delete-tables --all-datasets --suffix "_renamed_20241225" --dry-run

# Perform the deletion
bigquery-cleaner delete-tables --all-datasets --suffix "_renamed_20241225"

# Remove empty datasets
bigquery-cleaner delete-empty-datasets --all-datasets

⚙️ Configuration

Tired of typing the same flags? Create a cleaner.toml file in your project root. All CLI options can be persisted here:

[bigquery_cleaner]
# GCP Project ID (defaults to ADC project if omitted)
project = "your-gcp-project"

# List of datasets to scan (dataset names only)
datasets = ["dataset1", "dataset2"]

# List of datasets to ignore
exclude_datasets = ["logs_dataset", "temp_staging"]

# Extra projects whose INFORMATION_SCHEMA.JOBS history should count as usage.
# The main project is always included automatically.
jobs_projects = ["analytics-project", "bi-project"]

# If true, scans all datasets in the project (overrides 'datasets' list)
all_datasets = true

# Lookback window in days for identifying unused tables (default: 30)
days = 60

# Suffix used for renaming and identifying tables for deletion (default: _renamed_YYYYMMDD)
rename_suffix = "_old_backup"

# Default behavior for commands (true = dry run by default)
dry_run = false

# Logging level (DEBUG, INFO, WARNING, ERROR)
log_level = "INFO"

# BigQuery Location (e.g. "US", "EU"). 
# Note: Multi-dataset mode usually auto-detects this.
location = "US"

Then run with:

bigquery-cleaner list-unused-tables --config cleaner.toml

📝 Notes

  • Detection Logic: The list-unused-tables command identifies tables created more than N days ago that do not appear in INFORMATION_SCHEMA.JOBS.referenced_tables within that same window.
  • Cross-Project Usage Checks: Set jobs_projects or pass --jobs-projects to also scan query history from other projects that may read the same tables.
  • Rich Output: All results are displayed in beautiful, sortable tables thanks to the Rich library. Includes total table counts and storage size summaries.
  • Linting & Quality: The project uses Ruff for fast linting and formatting.

Check out the project on:

Developed by Alan Vainsencher.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

bigquery_cleaner-0.1.7.tar.gz (66.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

bigquery_cleaner-0.1.7-py3-none-any.whl (19.4 kB view details)

Uploaded Python 3

File details

Details for the file bigquery_cleaner-0.1.7.tar.gz.

File metadata

  • Download URL: bigquery_cleaner-0.1.7.tar.gz
  • Upload date:
  • Size: 66.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.5.25

File hashes

Hashes for bigquery_cleaner-0.1.7.tar.gz
Algorithm Hash digest
SHA256 cf3f706c31bb4489b557842a63663d1a5552eb4a20cf553efd240d9d032df3fc
MD5 fef57e9c94dd9010fc0a25b55b41dcf2
BLAKE2b-256 91163aa2516307a19fc619dda5414d36a250228bbce1b0448f594730be3a2051

See more details on using hashes here.

File details

Details for the file bigquery_cleaner-0.1.7-py3-none-any.whl.

File metadata

File hashes

Hashes for bigquery_cleaner-0.1.7-py3-none-any.whl
Algorithm Hash digest
SHA256 88c0ea99ede41c828866b2d86b907ba9a299c4606dc4434c4e0d3379fba71e5b
MD5 b55d5e41ec7ba5f2349e0333dd96e7b8
BLAKE2b-256 e04aa6ec95764bf0c0b5be9b195578bd9257715223f18c9d3aca741847604699

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page