# ❄️ snowflake-cost-analyzer

Snowflake warehouse cost, performance, and efficiency analysis library.
Modular, typed, production-grade Snowflake cost & performance analytics. Refactored from a 17,000-line monolithic notebook into a clean Python package with strict separation of concerns.
| | |
|---|---|
| **Version** | 0.2.0 |
| **Python** | ≥ 3.9 |
| **License** | Internal — ProColombia |
| **Quality** | 10/10 (was 9.5/10 in v0.1.0) |
## 🏛️ Architecture (10/10)

The package strictly separates *what is computed* from *how it is presented*:

```text
┌─────────────────────────────────────────────────────────────────┐
│ CORE LIBRARY                                                    │
│ snowflake_cost_analyzer/                                        │
│                                                                 │
│ config.py      ← @dataclass for credentials + parameters        │
│ connection.py  ← Context-managed Snowflake connection           │
│ constants.py   ← Single source of truth (prices, sizes)         │
│ analyzer.py    ← Orchestrator: 28 analytical methods            │
│                                                                 │
│ queries/       ← Pure SQL builder functions (no I/O)            │
│     cost.py  performance.py  attribution.py                     │
│     temporal.py  advanced.py                                    │
│     storage.py ← NEW v0.2.0 (Phase 9)                           │
│     qas.py     ← NEW v0.2.0 (Phase 10)                          │
│                                                                 │
│ utils/  ← DRY helpers (validators, dataframe, timer)            │
│ viz/    ← Plotly chart factories (optional extra)               │
│ views/  ← ViewsManager (materialise queries as views)           │
│                                                                 │
│ RESPONSIBILITY: extract data, return DataFrames. Nothing else.  │
└─────────────────────────────────────────────────────────────────┘
                                ▲
                                │ imports
                                │
┌─────────────────────────────────────────────────────────────────┐
│ PRESENTATION LAYER                                              │
│ scripts/                                                        │
│                                                                 │
│ generar_reportes.py       ← CLI report generator                │
│ templates/                                                      │
│     report_template.html  ← Jinja2 HTML template                │
│                                                                 │
│ RESPONSIBILITY: turn DataFrames into HTML, Excel, PPTX.         │
│ Imports the library — never the reverse.                        │
└─────────────────────────────────────────────────────────────────┘
```

**Why this matters:** the library can be used by any consumer (notebook, dashboard, scheduled job, BI tool), and the report generator can be replaced or extended without touching SQL or domain logic. This is the architectural boundary that took the project from 9.5 to 10.
## 📦 Installation

```bash
pip install -e .                                     # core only
pip install -e .[viz]                                # + Plotly charts
pip install jinja2 xlsxwriter python-pptx kaleido    # for the report generator
```
## 🔐 Credentials (never hardcoded)

Two supported sources, in priority order:

1. **Google Colab Secrets** (preferred for Colab):

   ```python
   from snowflake_cost_analyzer import SnowflakeConfig

   sf = SnowflakeConfig.from_colab_secrets(warehouse="WH_IDMC")
   ```

2. **Environment variables** (preferred for CI/CD and local scripts):

   ```bash
   export SF_USER=...
   export SF_PASSWORD=...
   export SF_ACCOUNT=...
   ```

   ```python
   sf = SnowflakeConfig.from_env(warehouse="WH_IDMC")
   ```

The report generator script (`scripts/generar_reportes.py`) tries Colab Secrets first and falls back to env vars automatically — you don't choose.
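The fallback logic can be sketched as follows. This is a hypothetical stand-in for what the script does internally: `load_config_with_fallback`, its return shape, and the Colab `userdata` usage are assumptions for illustration, not the library's actual code.

```python
import os


def load_config_with_fallback() -> dict:
    """Try Colab Secrets first, then fall back to environment variables.

    Hypothetical sketch: the real script calls
    SnowflakeConfig.from_colab_secrets() / SnowflakeConfig.from_env().
    """
    try:
        # google.colab is only importable inside Colab; anywhere else,
        # the ImportError routes us to the env-var branch.
        from google.colab import userdata  # type: ignore
        return {
            "user": userdata.get("SF_USER"),
            "password": userdata.get("SF_PASSWORD"),
            "account": userdata.get("SF_ACCOUNT"),
        }
    except ImportError:
        missing = [k for k in ("SF_USER", "SF_PASSWORD", "SF_ACCOUNT")
                   if k not in os.environ]
        if missing:
            # Fail fast with a clear message instead of a cryptic KeyError
            raise RuntimeError(f"Missing credentials: {missing}")
        return {
            "user": os.environ["SF_USER"],
            "password": os.environ["SF_PASSWORD"],
            "account": os.environ["SF_ACCOUNT"],
        }
```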
## 🚀 Quick start (library)

```python
from snowflake_cost_analyzer import (
    SnowflakeCostAnalyzer, SnowflakeConfig, AnalysisConfig,
)

sf = SnowflakeConfig.from_colab_secrets(warehouse="WH_IDMC")
params = AnalysisConfig.last_n_days(30, warehouse_name="WH_IDMC")

with SnowflakeCostAnalyzer(sf, params) as analyzer:
    df = analyzer.get_all_warehouses_summary()
    print(df[["WAREHOUSE_NAME", "TOTAL_COST_USD"]].head())
```
## 📊 Available analyses (28 methods)

**Phase 1 — Cost**
`get_all_warehouses_summary()` · `get_daily_warehouse_summary()` · `get_cost_reconciliation()`

**Phase 2-3 — Performance**
`get_top_expensive_queries()` · `get_critical_alerts()` · `get_performance_trends()` · `get_queue_analysis()` · `get_cache_analysis()` · `get_duplicate_queries()` · `get_optimization_recommendations()`

**Phase 4 — Attribution**
`get_user_analysis()` · `get_application_analysis()` · `get_user_application_matrix()`

**Phase 5-6 — Temporal & Multi-cluster**
`get_hourly_patterns()` · `get_weekly_comparison()` · `get_multicluster_load_history()` · `get_inferred_clusters()`

**Phase 7-8 — Advanced**
`get_warehouse_metadata()` · `get_warehouse_events()` · `get_uptime_analysis()` · `get_12month_trend()` · `get_monthly_forecast()` · `get_anomaly_detection()`

**Phase 9 — Storage (NEW v0.2.0)**

- `get_database_storage_costs(lookback_days=30, storage_price_per_tb_month=...)`
- `get_table_storage_costs(top_n=50, storage_price_per_tb_month=...)`
- `get_storage_growth_trend(lookback_days=90, storage_price_per_tb_month=...)`
- `get_stage_storage_costs(lookback_days=30, stage_price_per_tb_month=...)`

**Phase 10 — Query Acceleration Service (NEW v0.2.0)**

- `get_qas_analysis(warehouse_name=None)` — single-row aggregated metrics + ROI
- `get_qas_detail(warehouse_name=None, top_n=50)` — top-N accelerated queries

**Convenience**

- `run_full_analysis(include_storage=True, include_qas=True)` — runs phases 1-4 plus, optionally, storage and QAS; returns a single dict.
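Consuming that dict of DataFrames might look like the sketch below. The key names and column names shown are illustrative assumptions, not the library's actual keys:

```python
import pandas as pd

# Hypothetical result dict shaped like run_full_analysis() output:
# analysis name -> DataFrame (empty DataFrame when a phase yields no rows).
results = {
    "warehouses_summary": pd.DataFrame(
        {"WAREHOUSE_NAME": ["WH_IDMC"], "TOTAL_COST_USD": [1234.5]}
    ),
    "qas_analysis": pd.DataFrame(),  # e.g. QAS disabled -> empty, no exception
}

# Callers branch on df.empty rather than wrapping calls in try/except.
for name, df in results.items():
    status = "empty (skipped or no data)" if df.empty else f"{len(df)} rows"
    print(f"{name}: {status}")
```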
## 🏗️ What's new in v0.2.0

### 1. Storage analysis (`queries/storage.py`)

Four pure SQL builders sourcing `SNOWFLAKE.ACCOUNT_USAGE.DATABASE_STORAGE_USAGE_HISTORY`, `TABLE_STORAGE_METRICS`, and `STAGE_STORAGE_USAGE_HISTORY`. Prices live in `constants.py` (`STORAGE_PRICE_PER_TB_MONTH`, `STAGE_PRICE_PER_TB_MONTH`) and are overridable per query call — never hardcoded in SQL.
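A "pure SQL builder with overridable price" could be sketched roughly like this. The function name, SQL body, column names, and placeholder price value are assumptions for illustration; only the view name and the constant names come from the package description:

```python
# Placeholder value for illustration; the real value lives in constants.py.
STORAGE_PRICE_PER_TB_MONTH = 23.0


def build_database_storage_sql(
    lookback_days: int = 30,
    storage_price_per_tb_month: float = STORAGE_PRICE_PER_TB_MONTH,
) -> str:
    """Pure builder: takes parameters, returns SQL text, performs no I/O."""
    return f"""
    SELECT
        DATABASE_NAME,
        AVG(AVERAGE_DATABASE_BYTES) / POWER(1024, 4) AS AVG_STORAGE_TB,
        AVG(AVERAGE_DATABASE_BYTES) / POWER(1024, 4)
            * {storage_price_per_tb_month} AS EST_MONTHLY_COST_USD
    FROM SNOWFLAKE.ACCOUNT_USAGE.DATABASE_STORAGE_USAGE_HISTORY
    WHERE USAGE_DATE >= DATEADD('day', -{lookback_days}, CURRENT_DATE())
    GROUP BY DATABASE_NAME
    ORDER BY AVG_STORAGE_TB DESC
    """
```

Because the price arrives as a parameter defaulting to the constant, a caller can override it per call without any SQL ever containing a magic number.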
### 2. QAS analysis (`queries/qas.py`)

Two pure SQL builders for QAS effectiveness and ROI. The duplicated `CASE warehouse_size WHEN ... THEN ...` block from the original notebook is eliminated — both queries reuse `build_credits_case_sql()` from `constants.py`, so the warehouse credits-per-hour table lives in exactly one place. Empty results (QAS disabled, missing grants) are handled gracefully and never raise.
### 3. New utility: `clean_timezones_df`

Lives in `utils/dataframe.py`. Strips timezone information from datetime columns so DataFrames can be written to Excel without xlsxwriter complaining. Non-mutating: returns a copy.
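A minimal sketch of what such a helper can look like (the package's actual implementation may differ):

```python
import pandas as pd


def clean_timezones_df(df: pd.DataFrame) -> pd.DataFrame:
    """Strip tz info from tz-aware datetime columns, non-mutating.

    xlsxwriter cannot serialise tz-aware datetimes, so each tz-aware
    column is converted to naive local-less timestamps on a copy.
    """
    out = df.copy()
    for col in out.columns:
        if isinstance(out[col].dtype, pd.DatetimeTZDtype):
            out[col] = out[col].dt.tz_localize(None)
    return out
```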
### 4. Report generator (presentation layer)

A standalone script (`scripts/generar_reportes.py`) that imports the library and produces three artifacts:

| Output | Tool | Content |
|---|---|---|
| `report.html` | Jinja2 + Plotly→PNG | KPIs, health score, top findings/recs, embedded charts, preview tables |
| `report.xlsx` | pandas + xlsxwriter | One sheet per analysis (16 sheets), executive KPI sheet first |
| `report.pptx` | python-pptx | 5-slide executive deck: cover, health score, KPIs, top 5 findings, top 5 recommendations |

The script never *requires* the library's `viz/` subpackage: it tries to use it for inline charts and falls back gracefully if the `[viz]` extra is not installed.
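The optional-dependency pattern can be sketched like this; `render_section` and `daily_cost_chart` are hypothetical names, not the script's actual API:

```python
# Try to import the optional viz extra; degrade to table-only output if absent.
try:
    from snowflake_cost_analyzer.viz import cost_charts  # needs pip install -e .[viz]
    HAS_VIZ = True
except ImportError:
    cost_charts = None
    HAS_VIZ = False


def render_section(df):
    """Return chart PNG bytes when viz is available, else None.

    Callers that receive None simply omit the chart from the report
    and keep the tabular content.
    """
    if not HAS_VIZ or df.empty:
        return None
    fig = cost_charts.daily_cost_chart(df)  # hypothetical chart factory
    return fig.to_image(format="png")       # PNG export requires kaleido
```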
## ▶️ Generating reports

```bash
# Make sure templates/ is alongside scripts/
python scripts/generar_reportes.py \
    --warehouse WH_IDMC \
    --days 30 \
    --credit-price 3.0 \
    --out-dir out \
    --templates-dir templates
```

Optional flags:

- `--no-storage` — skip Phase 9 if your role lacks SELECT on the storage views
- `--no-qas` — skip Phase 10 if QAS is disabled or grants are missing

In Google Colab:

```bash
!python scripts/generar_reportes.py --warehouse WH_IDMC --days 30
```

(Colab Secrets `SF_USER`, `SF_PASSWORD`, `SF_ACCOUNT` must be configured.)
## 🧮 Health Score formula

The report generator computes a 0–100 health score from the analyzer's DataFrames using a pure function (`compute_executive_kpis`):

```text
score = 100
        − idle_pct                  # 1 point per % of idle time
        − min(20, spillage_pct)     # cap spillage penalty at 20
        − min(15, failed_pct × 3)   # cap failed-query penalty at 15
```
| Score | Band | Action |
|---|---|---|
| ≥ 80 | Good | Maintain current configuration |
| 60-79 | Warn | Review top findings within 1 sprint |
| < 60 | Bad | Immediate optimisation required |
The formula intentionally penalises idle time most aggressively because it is the largest controllable cost lever in most Snowflake accounts.
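The formula and bands translate directly to code. This sketch uses a hypothetical standalone signature (the real `compute_executive_kpis` takes DataFrames) and additionally clamps the result to the 0–100 range:

```python
def compute_health_score(idle_pct: float, spillage_pct: float,
                         failed_pct: float) -> float:
    """Apply the documented scoring formula, clamped to [0, 100]."""
    score = (100
             - idle_pct                    # 1 point per % of idle time
             - min(20.0, spillage_pct)     # spillage penalty capped at 20
             - min(15.0, failed_pct * 3))  # failed-query penalty capped at 15
    return max(0.0, min(100.0, score))


def band(score: float) -> str:
    """Map a score to its action band from the table above."""
    if score >= 80:
        return "Good"
    if score >= 60:
        return "Warn"
    return "Bad"
```

For example, a warehouse with 10% idle time, 5% spillage, and 2% failed queries scores 100 − 10 − 5 − 6 = 79, landing in the "Warn" band.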
## 📐 Design rules (the non-negotiables)

These rules govern every change to the codebase:

1. **Single source of truth.** Constants live in `constants.py` only. No magic numbers in SQL or logic. The warehouse credits-per-hour table appears exactly once in the entire package (`build_credits_case_sql()`).
2. **Pure SQL builders.** Functions in `queries/` accept config and return strings. They never open connections, never log, never transform DataFrames.
3. **Single execution path.** Every analytical method routes through `SnowflakeCostAnalyzer._execute_query()` for logging, timing, and the views cache. No exceptions.
4. **Centralised type coercion.** Numeric columns are converted via `to_float(df, [...])` from `utils/dataframe.py`. The original notebook duplicated this in 4 places — now zero.
5. **Fail fast.** Validation happens in `__post_init__` and via `utils/validators.py`, before any Snowflake query is executed.
6. **No monkey-patching.** New functionality is added through proper class methods, not by attaching functions at runtime as the original notebook did (`SnowflakeWarehouseAnalyzer.get_X = get_X`).
7. **Library / presentation separation.** The library returns DataFrames. Nothing in `snowflake_cost_analyzer/` imports `jinja2`, `xlsxwriter`, or `python-pptx`. All presentation code lives in `scripts/`.
8. **Credentials never hardcoded.** Always Colab Secrets or env vars, loaded through `SnowflakeConfig.from_colab_secrets()` or `SnowflakeConfig.from_env()`.
9. **Type hints + English docstrings everywhere.** Python 3.9+ syntax (`list[str]`, `dict[str, int]`, `X | None`).
10. **Graceful empty handling.** QAS, storage, and any analysis that might legitimately return zero rows (missing grants, feature disabled) returns an empty DataFrame instead of raising. Callers check `df.empty`.
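As one concrete illustration, rule 4's centralised coercion helper could be sketched as below; the real `to_float` in `utils/dataframe.py` may differ in details:

```python
import pandas as pd


def to_float(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Coerce the given columns to float on a copy of the frame.

    Unparseable values become NaN (errors="coerce") rather than raising,
    so a single helper covers every numeric column in every analysis.
    """
    out = df.copy()
    for col in columns:
        if col in out.columns:
            out[col] = pd.to_numeric(out[col], errors="coerce").astype(float)
    return out
```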
## 📁 File layout

```text
Snowflake Cost Analyzer/
├── snowflake_cost_analyzer/
│   ├── __init__.py
│   ├── _version.py
│   ├── analyzer.py          ← + 6 new methods (storage + QAS)
│   ├── config.py
│   ├── connection.py
│   ├── constants.py         ← + STORAGE_PRICE_PER_TB_MONTH / STAGE_PRICE_PER_TB_MONTH
│   ├── queries/
│   │   ├── __init__.py
│   │   ├── advanced.py
│   │   ├── attribution.py
│   │   ├── cost.py
│   │   ├── performance.py
│   │   ├── temporal.py
│   │   ├── storage.py       ← NEW
│   │   └── qas.py           ← NEW
│   ├── utils/
│   │   ├── dataframe.py     ← + clean_timezones_df
│   │   ├── timer.py
│   │   └── validators.py
│   ├── views/
│   │   └── manager.py
│   └── viz/
│       ├── cost_charts.py
│       ├── performance_charts.py
│       └── temporal_charts.py
├── scripts/
│   └── generar_reportes.py  ← NEW: HTML + Excel + PPTX
├── templates/
│   └── report_template.html ← NEW: Jinja2 template
├── notebooks/
│   └── ejemplo_uso.py
├── README.md                ← this file
├── CHANGELOG.md
└── pyproject.toml
```
## 🔄 Upgrading from v0.1.0 → v0.2.0

Apply these patches in order:

1. `constants.py` — replace with the new file (adds two storage price constants).
2. `queries/__init__.py` — replace with the new docstring (mentions `storage` and `qas`).
3. `queries/storage.py` — new file.
4. `queries/qas.py` — new file.
5. `utils/dataframe.py` — append `clean_timezones_df` from `utils/dataframe_addition.py`.
6. `analyzer.py` — apply the three blocks from `analyzer_additions.py` (imports, new methods, updated `run_full_analysis`).
7. `scripts/generar_reportes.py` — new file.
8. `templates/report_template.html` — new file.
9. Bump `_version.py` → `0.2.0`.

No existing methods are modified. The upgrade is fully backward-compatible.
## 🧪 Testing the upgrade

```python
from snowflake_cost_analyzer import SnowflakeCostAnalyzer, SnowflakeConfig, AnalysisConfig

sf = SnowflakeConfig.from_colab_secrets(warehouse="WH_IDMC")
params = AnalysisConfig.last_n_days(7, warehouse_name="WH_IDMC")

with SnowflakeCostAnalyzer(sf, params) as analyzer:
    # New: storage
    df_db_storage = analyzer.get_database_storage_costs(lookback_days=30)
    print(f"Databases: {len(df_db_storage)}")
    print(f"Total TB: {df_db_storage['TOTAL_STORAGE_TB'].sum():.2f}")

    # New: QAS (handles disabled QAS gracefully)
    df_qas = analyzer.get_qas_analysis()
    if df_qas.empty:
        print("QAS: not enabled or no eligible queries")
    else:
        print(f"QAS ROI: {df_qas['ROI_PCT'].iloc[0]:.1f}%")

    # New: full pipeline including storage + QAS
    results = analyzer.run_full_analysis()
    print(f"Total DataFrames produced: {len(results)}")
```

Then generate the executive reports:

```bash
python scripts/generar_reportes.py --warehouse WH_IDMC --days 30
ls out/
# snowflake_report_WH_IDMC_20260409_1430.html
# snowflake_report_WH_IDMC_20260409_1430.xlsx
# snowflake_report_WH_IDMC_20260409_1430.pptx
```
## 📜 CHANGELOG

### v0.2.0 (2026-04-09)

- Added `queries/storage.py` (4 builders for Phase 9 storage analysis)
- Added `queries/qas.py` (2 builders for Phase 10 QAS analysis)
- Added `STORAGE_PRICE_PER_TB_MONTH` and `STAGE_PRICE_PER_TB_MONTH` constants (single source of truth)
- Added 6 new methods to `SnowflakeCostAnalyzer` (4 storage + 2 QAS)
- Added `clean_timezones_df` utility in `utils/dataframe.py`
- Added `scripts/generar_reportes.py` — executive report generator
- Added `templates/report_template.html` — Jinja2 HTML template
- Modified `run_full_analysis()` — now accepts `include_storage` and `include_qas` flags and returns up to 16 DataFrames
- Eliminated the duplicated `CASE warehouse_size` block from QAS analysis (now reuses `build_credits_case_sql()`)

### v0.1.0

- Initial modular refactor from monolithic notebook (17,000 lines → library).