
BQ FinOps CLI

BigQuery cost monitoring and optimization toolkit for data engineers

Monitor BigQuery costs, analyze usage patterns, and optimize table structures with a simple command-line interface.


🎯 Features

  • Cost Analysis: Analyze BigQuery costs over time periods
  • Query Insights: Identify expensive queries and usage patterns
  • Table Analysis: Evaluate table structure and optimization opportunities
  • Optimization Recommendations: Get actionable suggestions for cost reduction
  • DDL Generation: Auto-generate partitioning/clustering DDL
  • Multi-dimensional Breakdown: Costs by dataset, user, table
  • CLI & Python API: Use as command-line tool or import as library

📦 Installation

pip install bq-finops-cli

Requirements:

  • Python 3.8+
  • GCP credentials with BigQuery access
  • BigQuery API enabled in your project

🚀 Quick Start

1. Set Up Authentication

# Set up GCP credentials
gcloud auth application-default login

# Or point to a service account key
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account-key.json"

2. Analyze Costs

# Analyze costs for the last 30 days
bq-finops analyze costs -p my-gcp-project

# Analyze specific date range
bq-finops analyze costs -p my-project \
  --start-date 2025-01-01 \
  --end-date 2025-01-31

Sample Output:

============================================================
💰 COST SUMMARY
============================================================
Total Cost (USD)        $1,234.56
Query Count             15,432
Avg Cost per Query      $0.0800
Total Bytes Processed   987,654,321,098

============================================================
🔥 TOP EXPENSIVE QUERIES
============================================================
╔══════════════════════╦════════════════════════╦═════════════╦══════════════╗
║ Job ID               ║ User                   ║ Cost        ║ Bytes        ║
╠══════════════════════╬════════════════════════╬═════════════╬══════════════╣
║ job_abc123...        ║ user@example.com       ║ $125.50     ║ 20,123,456   ║
║ job_def456...        ║ analyst@example.com    ║ $98.30      ║ 15,789,012   ║
╚══════════════════════╩════════════════════════╩═════════════╩══════════════╝

3. Analyze a Table

bq-finops analyze table -p my-project -d warehouse -t fact_sales

4. Generate Optimization Report

bq-finops optimize report -p my-project -d warehouse -t fact_sales

5. Generate DDL

# Add partitioning
bq-finops optimize generate-ddl -p my-project \
  -d warehouse -t fact_sales \
  --partition-column sale_date

# Add partitioning + clustering
bq-finops optimize generate-ddl -p my-project \
  -d warehouse -t fact_sales \
  --partition-column sale_date \
  --cluster-columns customer_id \
  --cluster-columns product_id

📚 Commands

Analyze Commands

bq-finops analyze costs

Analyze BigQuery costs over a time period.

Options:

  • -p, --project-id: GCP project ID (required)
  • -s, --start-date: Start date (YYYY-MM-DD)
  • -e, --end-date: End date (YYYY-MM-DD)
  • -d, --datasets: Filter by datasets (multiple allowed)
  • -f, --format: Output format (table/json)

Examples:

# Last 30 days (default)
bq-finops analyze costs -p my-project

# Specific date range
bq-finops analyze costs -p my-project \
  --start-date 2025-01-01 --end-date 2025-01-31

# Specific datasets
bq-finops analyze costs -p my-project \
  -d warehouse -d marts

# JSON output
bq-finops analyze costs -p my-project --format json

bq-finops analyze table

Analyze a specific table for optimization opportunities.

Options:

  • -p, --project-id: GCP project ID (required)
  • -d, --dataset-id: Dataset ID (required)
  • -t, --table-id: Table ID (required)

Example:

bq-finops analyze table -p my-project -d warehouse -t dim_employee

Optimize Commands

bq-finops optimize report

Generate an optimization report with recommendations.

Options:

  • -p, --project-id: GCP project ID (required)
  • -d, --dataset-id: Dataset ID (required)
  • -t, --table-id: Table ID (required)

Example:

bq-finops optimize report -p my-project -d warehouse -t fact_payroll

bq-finops optimize generate-ddl

Generate DDL for table optimization.

Options:

  • -p, --project-id: GCP project ID (required)
  • -d, --dataset-id: Dataset ID (required)
  • -t, --table-id: Table ID (required)
  • --partition-column: Column to partition by
  • --cluster-columns: Columns to cluster by (multiple allowed)

Examples:

# Partition only
bq-finops optimize generate-ddl -p my-project \
  -d warehouse -t fact_sales \
  --partition-column sale_date

# Cluster only
bq-finops optimize generate-ddl -p my-project \
  -d warehouse -t fact_sales \
  --cluster-columns customer_id --cluster-columns product_id

# Both
bq-finops optimize generate-ddl -p my-project \
  -d warehouse -t fact_sales \
  --partition-column sale_date \
  --cluster-columns customer_id --cluster-columns product_id

Utility Commands

bq-finops examples

Show example commands.

bq-finops --version

Show version.

bq-finops --help

Show help.

🐍 Python API

Use BQ FinOps programmatically:

Cost Analysis

from bq_finops import CostAnalyzer

# Initialize analyzer
analyzer = CostAnalyzer(project_id="my-gcp-project")

# Analyze costs
report = analyzer.analyze_period(
    start_date="2025-01-01",
    end_date="2025-01-31",
    datasets=["warehouse", "marts"]
)

print(f"Total cost: ${report.total_cost:.2f}")
print(f"Query count: {report.query_count}")

# Analyze a specific table
table_analysis = analyzer.analyze_table("warehouse", "fact_sales")
print(f"Table size: {table_analysis['size_gb']} GB")

# Get dataset cost
dataset_cost = analyzer.get_dataset_cost("warehouse", days=30)
print(f"Dataset cost: ${dataset_cost['total_cost_usd']:.2f}")

Query Optimization

from bq_finops import QueryOptimizer

# Initialize optimizer
optimizer = QueryOptimizer(project_id="my-gcp-project")

# Generate partition DDL
ddl = optimizer.generate_partition_ddl(
    "warehouse",
    "fact_sales",
    "sale_date"
)
print(ddl)

# Generate cluster DDL
ddl = optimizer.generate_cluster_ddl(
    "warehouse",
    "fact_sales",
    ["customer_id", "product_id"],
    partition_column="sale_date"
)
print(ddl)

# Analyze a query
query = "SELECT * FROM `project.dataset.table` WHERE date > '2025-01-01'"
analysis = optimizer.analyze_query(query)
print(f"Recommendations: {analysis['recommendations']}")

# Generate optimization report
report = optimizer.generate_optimization_report("warehouse", "fact_sales")
print(f"Current state: {report['current_state']}")
print(f"Recommendations: {report['recommendations']}")

💰 Cost Optimization Tips

1. Partition Large Tables

  • Impact: 50-90% cost reduction
  • Best for: Tables > 1 GB with date/timestamp column
  • Recommended: Partition by a DATE or TIMESTAMP column (see the sketch below)
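
As a concrete illustration, here is a minimal sketch of a partitioning migration using the google-cloud-bigquery client directly (independent of bq-finops-cli); the project, table, and column names are placeholders:

from google.cloud import bigquery

client = bigquery.Client(project="my-gcp-project")

# BigQuery cannot add partitioning to an existing table in place,
# so copy the data into a new, partitioned table.
ddl = """
CREATE TABLE `my-gcp-project.warehouse.fact_sales_partitioned`
PARTITION BY sale_date
AS SELECT * FROM `my-gcp-project.warehouse.fact_sales`
"""
client.query(ddl).result()  # wait for the copy job to finish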

2. Cluster Frequently Filtered Columns

  • Impact: 10-30% performance improvement
  • Best for: Tables > 1 GB with common filter patterns
  • Recommended: Cluster by 2-4 columns max (see the sketch below)
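
Clustering is declared in the same CREATE TABLE statement as partitioning. A hedged sketch, again using google-cloud-bigquery rather than the bq-finops API, with illustrative column names:

from google.cloud import bigquery

client = bigquery.Client(project="my-gcp-project")

# CLUSTER BY accepts up to four columns; order them from the most
# to the least frequently filtered.
ddl = """
CREATE TABLE `my-gcp-project.warehouse.fact_sales_clustered`
PARTITION BY sale_date
CLUSTER BY customer_id, product_id
AS SELECT * FROM `my-gcp-project.warehouse.fact_sales`
"""
client.query(ddl).result()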

3. Set Data Retention Policies

  • Impact: Reduces storage costs
  • Best for: Time-series data
  • Recommended: Set partition expiration (e.g., 90 days; see the sketch below)
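
A sketch of setting partition expiration with a plain DDL statement (table name illustrative); expired partitions are deleted automatically and stop accruing storage cost:

from google.cloud import bigquery

client = bigquery.Client(project="my-gcp-project")

# Partitions older than 90 days are dropped automatically.
client.query("""
ALTER TABLE `my-gcp-project.warehouse.fact_sales`
SET OPTIONS (partition_expiration_days = 90)
""").result()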

4. Avoid SELECT *

  • Impact: 50-80% cost reduction per query
  • Recommendation: Select only needed columns (see the dry-run sketch below)
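
One way to quantify the difference is a dry run, which reports the bytes a query would scan (the quantity on-demand pricing bills for) without executing it. A sketch with illustrative table and column names:

from google.cloud import bigquery

client = bigquery.Client(project="my-gcp-project")
cfg = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)

for sql in (
    "SELECT * FROM `my-gcp-project.warehouse.fact_sales`",
    "SELECT sale_date, amount FROM `my-gcp-project.warehouse.fact_sales`",
):
    # A dry-run job returns immediately with the scan estimate.
    job = client.query(sql, job_config=cfg)
    print(f"{job.total_bytes_processed:>16,} bytes  <-  {sql}")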

5. Use Partition Filters

  • Impact: 90%+ cost reduction per query
  • Recommendation: Always filter on the partition column (see the sketch below)
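
BigQuery can enforce this tip at the table level via the require_partition_filter option, which rejects any query that omits a partition filter. A minimal sketch (table name illustrative):

from google.cloud import bigquery

client = bigquery.Client(project="my-gcp-project")

# After this, a query without a WHERE on sale_date fails instead of
# silently scanning every partition.
client.query("""
ALTER TABLE `my-gcp-project.warehouse.fact_sales`
SET OPTIONS (require_partition_filter = true)
""").result()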

🎯 Use Cases

FinOps Dashboard

Monitor monthly BigQuery spending:

# Get current month costs
bq-finops analyze costs -p my-project \
  --start-date 2025-11-01 \
  --format json > november_costs.json

Optimization Audit

Audit all tables for optimization opportunities:

# Check each table
for table in fact_sales fact_orders fact_inventory; do
  bq-finops optimize report -p my-project -d warehouse -t "$table"
done

CI/CD Integration

Add cost checks to your CI pipeline:

# Gate deployment on current spend: jq -e exits non-zero when the
# check is false, i.e. when total cost is $100 or more
bq-finops analyze costs -p dev-project --format json | \
  jq -e '.total_cost < 100'

🔧 Development

Setup

git clone https://github.com/yourusername/bq-finops-cli.git
cd bq-finops-cli
pip install -e ".[dev]"

Run Tests

pytest tests/ -v --cov=bq_finops

Code Quality

black src/
flake8 src/

📖 Examples

See the examples/ directory for sample configurations and scripts.

🤝 Contributing

Contributions welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new features
  4. Submit a pull request

📄 License

MIT License - see LICENSE file for details


💡 Why BQ FinOps CLI?

  • Save Money: Identify and fix expensive queries
  • Optimize Performance: Proper partitioning/clustering
  • Data Governance: Track usage by team/dataset
  • Proactive Monitoring: Catch cost spikes early
  • Best Practices: Built-in BigQuery optimization rules

🎓 Related Projects

This module is part of the Payroll & Workforce Analytics Modernization project:

  • synthetic-payroll-lab - Generate test data
  • scd2-bq-engine - SCD Type 2 automation
  • dataform-warehouse-blueprints - SQLX template generator

Built with ❤️ for data engineers who care about costs.
