BigQuery cost monitoring and optimization toolkit
Project description
BQ FinOps CLI
BigQuery cost monitoring and optimization toolkit for data engineers
Monitor BigQuery costs, analyze usage patterns, and optimize table structures with a simple command-line interface.
๐ฏ Features
- Cost Analysis: Analyze BigQuery costs over time periods
- Query Insights: Identify expensive queries and usage patterns
- Table Analysis: Evaluate table structure and optimization opportunities
- Optimization Recommendations: Get actionable suggestions for cost reduction
- DDL Generation: Auto-generate partitioning/clustering DDL
- Multi-dimensional Breakdown: Costs by dataset, user, table
- CLI & Python API: Use as command-line tool or import as library
๐ฆ Installation
pip install bq-finops-cli
Requirements:
- Python 3.8+
- GCP credentials with BigQuery access
- BigQuery API enabled in your project
๐ Quick Start
1. Setup Authentication
# Set up GCP credentials
gcloud auth application-default login
# Or set service account key
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account-key.json"
2. Analyze Costs
# Analyze costs for the last 30 days
bq-finops analyze costs -p my-gcp-project
# Analyze specific date range
bq-finops analyze costs -p my-project \
--start-date 2025-01-01 \
--end-date 2025-01-31
Sample Output:
============================================================
๐ฐ COST SUMMARY
============================================================
Total Cost (USD) $1,234.56
Query Count 15,432
Avg Cost per Query $0.0800
Total Bytes Processed 987,654,321,098
============================================================
๐ฅ TOP EXPENSIVE QUERIES
============================================================
โโโโโโโโโโโโโโโโโโโโโโโโฆโโโโโโโโโโโโโโโโโโโโโโโโโฆโโโโโโโโโโโโโโฆโโโโโโโโโโโโโโโ
โ Job ID โ User โ Cost โ Bytes โ
โ โโโโโโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโฌโโโโโโโโโโโโโโโฃ
โ job_abc123... โ user@example.com โ $125.50 โ 20,123,456 โ
โ job_def456... โ analyst@example.com โ $98.30 โ 15,789,012 โ
โโโโโโโโโโโโโโโโโโโโโโโโฉโโโโโโโโโโโโโโโโโโโโโโโโโฉโโโโโโโโโโโโโโฉโโโโโโโโโโโโโโโ
3. Analyze a Table
bq-finops analyze table -p my-project -d warehouse -t fact_sales
4. Generate Optimization Report
bq-finops optimize report -p my-project -d warehouse -t fact_sales
5. Generate DDL
# Add partitioning
bq-finops optimize generate-ddl -p my-project \
-d warehouse -t fact_sales \
--partition-column sale_date
# Add partitioning + clustering
bq-finops optimize generate-ddl -p my-project \
-d warehouse -t fact_sales \
--partition-column sale_date \
--cluster-columns customer_id \
--cluster-columns product_id
๐ Commands
Analyze Commands
bq-finops analyze costs
Analyze BigQuery costs over a time period.
Options:
-p, --project-id: GCP project ID (required)-s, --start-date: Start date (YYYY-MM-DD)-e, --end-date: End date (YYYY-MM-DD)-d, --datasets: Filter by datasets (multiple allowed)-f, --format: Output format (table/json)
Examples:
# Last 30 days (default)
bq-finops analyze costs -p my-project
# Specific date range
bq-finops analyze costs -p my-project \
--start-date 2025-01-01 --end-date 2025-01-31
# Specific datasets
bq-finops analyze costs -p my-project \
-d warehouse -d marts
# JSON output
bq-finops analyze costs -p my-project --format json
bq-finops analyze table
Analyze a specific table for optimization opportunities.
Options:
-p, --project-id: GCP project ID (required)-d, --dataset-id: Dataset ID (required)-t, --table-id: Table ID (required)
Example:
bq-finops analyze table -p my-project -d warehouse -t dim_employee
Optimize Commands
bq-finops optimize report
Generate optimization report with recommendations.
Options:
-p, --project-id: GCP project ID (required)-d, --dataset-id: Dataset ID (required)-t, --table-id: Table ID (required)
Example:
bq-finops optimize report -p my-project -d warehouse -t fact_payroll
bq-finops optimize generate-ddl
Generate DDL for table optimization.
Options:
-p, --project-id: GCP project ID (required)-d, --dataset-id: Dataset ID (required)-t, --table-id: Table ID (required)--partition-column: Column to partition by--cluster-columns: Columns to cluster by (multiple allowed)
Examples:
# Partition only
bq-finops optimize generate-ddl -p my-project \
-d warehouse -t fact_sales \
--partition-column sale_date
# Cluster only
bq-finops optimize generate-ddl -p my-project \
-d warehouse -t fact_sales \
--cluster-columns customer_id --cluster-columns product_id
# Both
bq-finops optimize generate-ddl -p my-project \
-d warehouse -t fact_sales \
--partition-column sale_date \
--cluster-columns customer_id --cluster-columns product_id
Utility Commands
bq-finops examples
Show example commands.
bq-finops --version
Show version.
bq-finops --help
Show help.
๐ Python API
Use BQ FinOps programmatically:
Cost Analysis
from bq_finops import CostAnalyzer
# Initialize analyzer
analyzer = CostAnalyzer(project_id="my-gcp-project")
# Analyze costs
report = analyzer.analyze_period(
start_date="2025-01-01",
end_date="2025-01-31",
datasets=["warehouse", "marts"]
)
print(f"Total cost: ${report.total_cost:.2f}")
print(f"Query count: {report.query_count}")
# Analyze a specific table
table_analysis = analyzer.analyze_table("warehouse", "fact_sales")
print(f"Table size: {table_analysis['size_gb']} GB")
# Get dataset cost
dataset_cost = analyzer.get_dataset_cost("warehouse", days=30)
print(f"Dataset cost: ${dataset_cost['total_cost_usd']:.2f}")
Query Optimization
from bq_finops import QueryOptimizer
# Initialize optimizer
optimizer = QueryOptimizer(project_id="my-gcp-project")
# Generate partition DDL
ddl = optimizer.generate_partition_ddl(
"warehouse",
"fact_sales",
"sale_date"
)
print(ddl)
# Generate cluster DDL
ddl = optimizer.generate_cluster_ddl(
"warehouse",
"fact_sales",
["customer_id", "product_id"],
partition_column="sale_date"
)
print(ddl)
# Analyze a query
query = "SELECT * FROM `project.dataset.table` WHERE date > '2025-01-01'"
analysis = optimizer.analyze_query(query)
print(f"Recommendations: {analysis['recommendations']}")
# Generate optimization report
report = optimizer.generate_optimization_report("warehouse", "fact_sales")
print(f"Current state: {report['current_state']}")
print(f"Recommendations: {report['recommendations']}")
๐ฐ Cost Optimization Tips
1. Partition Large Tables
- Impact: 50-90% cost reduction
- Best for: Tables > 1 GB with date/timestamp column
- Recommended: Partition by DATE or TIMESTAMP column
2. Cluster Frequently Filtered Columns
- Impact: 10-30% performance improvement
- Best for: Tables > 1 GB with common filter patterns
- Recommended: Cluster by 2-4 columns max
3. Set Data Retention Policies
- Impact: Reduces storage costs
- Best for: Time-series data
- Recommended: Set partition expiration (e.g., 90 days)
4. Avoid SELECT *
- Impact: 50-80% cost reduction per query
- Recommendation: Select only needed columns
5. Use Partition Filters
- Impact: 90%+ cost reduction per query
- Recommendation: Always filter on partition column
๐ฏ Use Cases
FinOps Dashboard
Monitor monthly BigQuery spending:
# Get current month costs
bq-finops analyze costs -p my-project \
--start-date 2025-11-01 \
--format json > november_costs.json
Optimization Audit
Audit all tables for optimization opportunities:
# Check each table
for table in fact_sales fact_orders fact_inventory; do
bq-finops optimize report -p my-project -d warehouse -t $table
done
CI/CD Integration
Add cost checks to your CI pipeline:
# Validate query before deployment
bq-finops analyze costs -p dev-project --format json | \
jq '.total_cost < 100' # Fail if > $100
๐ง Development
Setup
git clone https://github.com/yourusername/bq-finops-cli.git
cd bq-finops-cli
pip install -e ".[dev]"
Run Tests
pytest tests/ -v --cov=bq_finops
Code Quality
black src/
flake8 src/
๐ Examples
See the examples/ directory for sample configurations and scripts.
๐ค Contributing
Contributions welcome! Please:
- Fork the repository
- Create a feature branch
- Add tests for new features
- Submit a pull request
๐ License
MIT License - see LICENSE file for details
๐ Links
- PyPI: https://pypi.org/project/bq-finops-cli/
- GitHub: https://github.com/yourusername/bq-finops-cli
- Documentation: https://github.com/yourusername/bq-finops-cli#readme
๐ก Why BQ FinOps CLI?
- Save Money: Identify and fix expensive queries
- Optimize Performance: Proper partitioning/clustering
- Data Governance: Track usage by team/dataset
- Proactive Monitoring: Catch cost spikes early
- Best Practices: Built-in BigQuery optimization rules
๐ Related Projects
This module is part of the Payroll & Workforce Analytics Modernization project:
synthetic-payroll-lab- Generate test datascd2-bq-engine- SCD Type 2 automationdataform-warehouse-blueprints- SQLX template generator
Built with โค๏ธ for data engineers who care about costs.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file bq_finops_cli-0.1.0.tar.gz.
File metadata
- Download URL: bq_finops_cli-0.1.0.tar.gz
- Upload date:
- Size: 19.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
453265aea74b8fbdd47cfa338464f8188687879df9956bbe473de3e18b585081
|
|
| MD5 |
edf23678737978bb14c683e58b0c89a9
|
|
| BLAKE2b-256 |
0a3ef0d1407e5fa3ecdc74445f57259560e8b936c636e743b574c7576b62acee
|
File details
Details for the file bq_finops_cli-0.1.0-py3-none-any.whl.
File metadata
- Download URL: bq_finops_cli-0.1.0-py3-none-any.whl
- Upload date:
- Size: 16.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
8ccaab8d24fe0a54d9fb489e52f694646b127962b2270342cdc06d26eff86730
|
|
| MD5 |
bc9ff86862f1134045beb7584da4b4c7
|
|
| BLAKE2b-256 |
5a35174da2a43c5fa74644b503eebf31a9570f2d3ea5c0bae3ed846b5da5e749
|