
Snowflake warehouse cost, performance, and efficiency analysis library


❄️ snowflake-cost-analyzer

Modular, typed, production-grade Snowflake cost & performance analytics. Refactored from a 17,000-line monolithic notebook into a clean Python package with strict separation of concerns.

Version 0.2.0
Python ≥ 3.9
License Internal — ProColombia
Quality 10/10 (was 9.5/10 in v0.1.0)

🏛️ Architecture (10/10)

The package strictly separates what is computed from how it is presented:

┌─────────────────────────────────────────────────────────────────┐
│                       CORE LIBRARY                              │
│              snowflake_cost_analyzer/                           │
│                                                                 │
│  config.py        ← @dataclass for credentials + parameters     │
│  connection.py    ← Context-managed Snowflake connection        │
│  constants.py     ← Single source of truth (prices, sizes)      │
│  analyzer.py      ← Orchestrator: 28 analytical methods         │
│                                                                 │
│  queries/         ← Pure SQL builder functions (no I/O)         │
│    cost.py        performance.py    attribution.py              │
│    temporal.py    advanced.py                                   │
│    storage.py    ← NEW v0.2.0 (Phase 9)                         │
│    qas.py        ← NEW v0.2.0 (Phase 10)                        │
│                                                                 │
│  utils/           ← DRY helpers (validators, dataframe, timer)  │
│  viz/             ← Plotly chart factories (optional extra)     │
│  views/           ← ViewsManager (materialise queries as views) │
│                                                                 │
│  RESPONSIBILITY: extract data, return DataFrames. Nothing else. │
└─────────────────────────────────────────────────────────────────┘
                              ▲
                              │  imports
                              │
┌─────────────────────────────────────────────────────────────────┐
│                  PRESENTATION LAYER                             │
│                       scripts/                                  │
│                                                                 │
│  generar_reportes.py   ← CLI report generator                   │
│  templates/                                                     │
│    report_template.html  ← Jinja2 HTML template                 │
│                                                                 │
│  RESPONSIBILITY: turn DataFrames into HTML, Excel, PPTX.        │
│  Imports the library — never the reverse.                       │
└─────────────────────────────────────────────────────────────────┘

Why this matters: the library can be used by any consumer (notebook, dashboard, scheduled job, BI tool), and the report generator can be replaced or extended without touching SQL or domain logic. This is the architectural boundary that took the project from 9.5 to 10.
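
For example, the same library surface serves a scheduled job with no presentation code at all. A minimal sketch (the CSV export is illustrative, not part of the package):

from snowflake_cost_analyzer import (
    SnowflakeCostAnalyzer, SnowflakeConfig, AnalysisConfig,
)

# Cron-style consumer: pull one DataFrame, persist it, exit.
sf = SnowflakeConfig.from_env(warehouse="WH_IDMC")
params = AnalysisConfig.last_n_days(7, warehouse_name="WH_IDMC")

with SnowflakeCostAnalyzer(sf, params) as analyzer:
    df = analyzer.get_all_warehouses_summary()
    df.to_csv("daily_summary.csv", index=False)  # output format chosen by the consumer, not the library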


📦 Installation

pip install -e .                # core only
pip install -e .[viz]           # + Plotly charts
pip install jinja2 xlsxwriter python-pptx kaleido   # for the report generator

🔐 Credentials (never hardcoded)

Two supported sources, in priority order:

1. Google Colab Secrets (preferred for Colab):

from snowflake_cost_analyzer import SnowflakeConfig
sf = SnowflakeConfig.from_colab_secrets(warehouse="WH_IDMC")

2. Environment variables (preferred for CI/CD and local scripts):

export SF_USER=...
export SF_PASSWORD=...
export SF_ACCOUNT=...
sf = SnowflakeConfig.from_env(warehouse="WH_IDMC")

The report generator script (scripts/generar_reportes.py) tries Colab Secrets first and falls back to env vars automatically — you don't choose.
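
Conceptually the fallback reduces to a try/except around the Colab path. A sketch of the pattern (not the script's literal code):

from snowflake_cost_analyzer import SnowflakeConfig

# Prefer Colab Secrets; fall back to SF_USER / SF_PASSWORD / SF_ACCOUNT.
try:
    sf = SnowflakeConfig.from_colab_secrets(warehouse="WH_IDMC")
except Exception:  # not running in Colab, or secrets not configured
    sf = SnowflakeConfig.from_env(warehouse="WH_IDMC")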


🚀 Quick start (library)

from snowflake_cost_analyzer import (
    SnowflakeCostAnalyzer, SnowflakeConfig, AnalysisConfig,
)

sf = SnowflakeConfig.from_colab_secrets(warehouse="WH_IDMC")
params = AnalysisConfig.last_n_days(30, warehouse_name="WH_IDMC")

with SnowflakeCostAnalyzer(sf, params) as analyzer:
    df = analyzer.get_all_warehouses_summary()
    print(df[["WAREHOUSE_NAME", "TOTAL_COST_USD"]].head())

📊 Available analyses (28 methods)

Phase 1 — Cost

  • get_all_warehouses_summary()
  • get_daily_warehouse_summary()
  • get_cost_reconciliation()

Phase 2-3 — Performance

  • get_top_expensive_queries() · get_critical_alerts() · get_performance_trends()
  • get_queue_analysis() · get_cache_analysis() · get_duplicate_queries()
  • get_optimization_recommendations()

Phase 4 — Attribution

  • get_user_analysis() · get_application_analysis() · get_user_application_matrix()

Phase 5-6 — Temporal & Multi-cluster

  • get_hourly_patterns() · get_weekly_comparison()
  • get_multicluster_load_history() · get_inferred_clusters()

Phase 7-8 — Advanced

  • get_warehouse_metadata() · get_warehouse_events() · get_uptime_analysis()
  • get_12month_trend() · get_monthly_forecast() · get_anomaly_detection()

Phase 9 — Storage (NEW v0.2.0)

  • get_database_storage_costs(lookback_days=30, storage_price_per_tb_month=...)
  • get_table_storage_costs(top_n=50, storage_price_per_tb_month=...)
  • get_storage_growth_trend(lookback_days=90, storage_price_per_tb_month=...)
  • get_stage_storage_costs(lookback_days=30, stage_price_per_tb_month=...)

Phase 10 — Query Acceleration Service (NEW v0.2.0)

  • get_qas_analysis(warehouse_name=None) — single-row aggregated metrics + ROI
  • get_qas_detail(warehouse_name=None, top_n=50) — top-N accelerated queries

Convenience

  • run_full_analysis(include_storage=True, include_qas=True) — runs Phases 1-4 and, optionally, the storage and QAS analyses, returning a single dict of DataFrames (see the sketch below).
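
A typical consumer walks the returned dict and checks each DataFrame for emptiness. A sketch (the dict's key names are not specified here, so the loop stays generic):

# sf and params as in the quick start above
with SnowflakeCostAnalyzer(sf, params) as analyzer:
    results = analyzer.run_full_analysis(include_storage=True, include_qas=True)

for name, df in results.items():
    status = "empty (no data or feature disabled)" if df.empty else f"{len(df)} rows"
    print(f"{name}: {status}")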

🏗️ What's new in v0.2.0

1. Storage analysis (queries/storage.py)

Four pure SQL builders sourcing SNOWFLAKE.ACCOUNT_USAGE.DATABASE_STORAGE_USAGE_HISTORY, TABLE_STORAGE_METRICS, and STAGE_STORAGE_USAGE_HISTORY. Prices live in constants.py (STORAGE_PRICE_PER_TB_MONTH, STAGE_PRICE_PER_TB_MONTH) and are overridable per query call — never hardcoded in SQL.
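
A pure builder takes parameters and returns a SQL string, nothing more. The function name and exact SQL below are illustrative, but the ACCOUNT_USAGE view and the price-as-parameter pattern follow the description above:

from snowflake_cost_analyzer.constants import STORAGE_PRICE_PER_TB_MONTH

def build_database_storage_sql(  # hypothetical name; the real builder may differ
    lookback_days: int = 30,
    storage_price_per_tb_month: float = STORAGE_PRICE_PER_TB_MONTH,
) -> str:
    """Pure function: no connection, no logging, just SQL text."""
    return f"""
    SELECT
        DATABASE_NAME,
        AVG(AVERAGE_DATABASE_BYTES) / POWER(1024, 4) AS TOTAL_STORAGE_TB,
        AVG(AVERAGE_DATABASE_BYTES) / POWER(1024, 4)
            * {storage_price_per_tb_month} AS MONTHLY_COST_USD
    FROM SNOWFLAKE.ACCOUNT_USAGE.DATABASE_STORAGE_USAGE_HISTORY
    WHERE USAGE_DATE >= DATEADD('day', -{lookback_days}, CURRENT_DATE())
    GROUP BY DATABASE_NAME
    ORDER BY TOTAL_STORAGE_TB DESC
    """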

2. QAS analysis (queries/qas.py)

Two pure SQL builders for QAS effectiveness and ROI. The duplicated CASE warehouse_size WHEN ... THEN ... block from the original notebook is eliminated — both queries reuse build_credits_case_sql() from constants.py so the warehouse credits-per-hour table lives in exactly one place. Empty results (QAS disabled, missing grants) are handled gracefully and never raise.
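
The deduplication works because the size-to-credits mapping is rendered into SQL by a single function. A sketch of the idea (the dict uses Snowflake's published credits-per-hour rates; the actual implementation in constants.py may differ in detail):

# constants.py: single source of truth for warehouse credit rates
WAREHOUSE_CREDITS_PER_HOUR = {
    "X-Small": 1, "Small": 2, "Medium": 4, "Large": 8,
    "X-Large": 16, "2X-Large": 32, "3X-Large": 64, "4X-Large": 128,
}

def build_credits_case_sql(column: str = "warehouse_size") -> str:
    """Render the mapping as a SQL CASE expression, reused by every query that needs it."""
    whens = "\n".join(
        f"WHEN '{size}' THEN {credits}"
        for size, credits in WAREHOUSE_CREDITS_PER_HOUR.items()
    )
    return f"CASE {column}\n{whens}\nELSE 0\nEND"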

3. New utility: clean_timezones_df

Lives in utils/dataframe.py. Strips timezone information from datetime columns so DataFrames can be written to Excel without xlsxwriter complaining. Non-mutating: returns a copy.
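
In pandas terms the helper amounts to tz_localize(None) on every timezone-aware column of a copy. A sketch consistent with that description (the packaged implementation may differ in detail):

import pandas as pd

def clean_timezones_df(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with timezone info stripped from tz-aware datetime columns."""
    out = df.copy()  # non-mutating: the caller's DataFrame is untouched
    for col in out.columns:
        if isinstance(out[col].dtype, pd.DatetimeTZDtype):
            out[col] = out[col].dt.tz_localize(None)
    return out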

4. Report generator (presentation layer)

A standalone script (scripts/generar_reportes.py) that imports the library and produces three artifacts:

Output       Tool                  Content
-----------  --------------------  -------------------------------------------------------------
report.html  Jinja2 + Plotly→PNG   KPIs, health score, top findings/recs, embedded charts, preview tables
report.xlsx  pandas + xlsxwriter   One sheet per analysis (16 sheets), executive KPI sheet first
report.pptx  python-pptx           5-slide executive deck: cover, health score, KPIs, top 5 findings, top 5 recommendations

The script does not require the library's viz/ subpackage: it tries to use it for inline charts and falls back gracefully if the [viz] extra is not installed.
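
The graceful fallback is the standard optional-import pattern. A sketch (not the script's literal code):

# Charts are optional: degrade to tables-only reports if the [viz] extra is absent.
try:
    from snowflake_cost_analyzer import viz
    HAS_VIZ = True
except ImportError:
    viz = None
    HAS_VIZ = False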


▶️ Generating reports

# Make sure templates/ is alongside scripts/
python scripts/generar_reportes.py \
    --warehouse WH_IDMC \
    --days 30 \
    --credit-price 3.0 \
    --out-dir out \
    --templates-dir templates

Optional flags:

  • --no-storage — skip Phase 9 if your role lacks SELECT on the storage views
  • --no-qas — skip Phase 10 if QAS is disabled or grants are missing

In Google Colab:

!python scripts/generar_reportes.py --warehouse WH_IDMC --days 30

(Colab Secrets SF_USER, SF_PASSWORD, SF_ACCOUNT must be configured.)


🧮 Health Score formula

The report generator computes a 0–100 health score from the analyzer's DataFrames using a pure function (compute_executive_kpis):

score = 100
       − idle_pct                 # 1 point per % of idle time
       − min(20, spillage_pct)    # cap spillage penalty at 20
       − min(15, failed_pct × 3)  # cap failed-query penalty at 15

Score   Band   Action
-----   ----   -----------------------------------
≥ 80    Good   Maintain current configuration
60-79   Warn   Review top findings within 1 sprint
< 60    Bad    Immediate optimisation required

The formula intentionally penalises idle time most aggressively because it is the largest controllable cost lever in most Snowflake accounts.
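
As a worked example: a warehouse idle 12% of the time, with 35% spillage and a 6% failure rate, scores 100 − 12 − 20 − 15 = 53, landing in the Bad band. In code (a sketch of the formula above; compute_executive_kpis itself works on the analyzer's DataFrames, not bare percentages):

def health_score(idle_pct: float, spillage_pct: float, failed_pct: float) -> float:
    """Apply the 0-100 scoring formula with its two penalty caps."""
    score = 100 - idle_pct - min(20, spillage_pct) - min(15, failed_pct * 3)
    return max(0.0, score)  # clamp at zero is an assumption; the source formula has no explicit floor

print(health_score(12, 35, 6))  # 53.0 → Bad: immediate optimisation required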


📐 Design rules (the non-negotiables)

These rules govern every change to the codebase:

  1. Single source of truth. Constants live in constants.py only. No magic numbers in SQL or logic. The warehouse credits-per-hour table appears exactly once in the entire package (build_credits_case_sql()).

  2. Pure SQL builders. Functions in queries/ accept config and return strings. They never open connections, never log, never transform DataFrames.

  3. Single execution path. Every analytical method routes through SnowflakeCostAnalyzer._execute_query() for logging, timing, and the views cache. No exceptions.

  4. Centralised type coercion. Numeric columns are converted via to_float(df, [...]) from utils/dataframe.py. The original notebook duplicated this in 4 places — now zero.

  5. Fail Fast. Validation happens in __post_init__ and via utils/validators.py, before any Snowflake query is executed.

  6. No monkey-patching. New functionality is added through proper class methods, not by attaching functions at runtime as the original notebook did (SnowflakeWarehouseAnalyzer.get_X = get_X).

  7. Library / presentation separation. The library returns DataFrames. Nothing in snowflake_cost_analyzer/ imports jinja2, xlsxwriter, or python-pptx. All presentation code lives in scripts/.

  8. Credentials never hardcoded. Always Colab Secrets or env vars, loaded through SnowflakeConfig.from_colab_secrets() or SnowflakeConfig.from_env().

  9. Type hints + English docstrings everywhere. Python 3.9+ syntax (list[str], dict[str, int], X | None).

  10. Graceful empty handling. QAS, storage, and any analysis that might legitimately return zero rows (missing grants, feature disabled) returns an empty DataFrame instead of raising. Callers check df.empty.
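
Rules 1, 5, and 8 meet in the config object: validation runs in __post_init__, before any query executes. A sketch of the pattern (the class name is an illustrative stand-in, and any fields beyond the SF_* credentials are assumptions):

import os
from dataclasses import dataclass

@dataclass
class ConfigSketch:  # illustrative stand-in for SnowflakeConfig
    user: str
    password: str
    account: str
    warehouse: str

    def __post_init__(self) -> None:
        # Fail fast: reject incomplete credentials before touching Snowflake.
        for name in ("user", "password", "account"):
            if not getattr(self, name):
                raise ValueError(f"Missing required credential: {name}")

    @classmethod
    def from_env(cls, warehouse: str) -> "ConfigSketch":
        return cls(
            user=os.environ.get("SF_USER", ""),
            password=os.environ.get("SF_PASSWORD", ""),
            account=os.environ.get("SF_ACCOUNT", ""),
            warehouse=warehouse,
        )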


📁 File layout

Snowflake Cost Analyzer/
├── snowflake_cost_analyzer/
│   ├── __init__.py
│   ├── _version.py
│   ├── analyzer.py              ← + 6 new methods (storage + QAS)
│   ├── config.py
│   ├── connection.py
│   ├── constants.py             ← + STORAGE_PRICE_PER_TB_MONTH / STAGE_PRICE_PER_TB_MONTH
│   ├── queries/
│   │   ├── __init__.py
│   │   ├── advanced.py
│   │   ├── attribution.py
│   │   ├── cost.py
│   │   ├── performance.py
│   │   ├── temporal.py
│   │   ├── storage.py           ← NEW
│   │   └── qas.py               ← NEW
│   ├── utils/
│   │   ├── dataframe.py         ← + clean_timezones_df
│   │   ├── timer.py
│   │   └── validators.py
│   ├── views/
│   │   └── manager.py
│   └── viz/
│       ├── cost_charts.py
│       ├── performance_charts.py
│       └── temporal_charts.py
├── scripts/
│   └── generar_reportes.py      ← NEW: HTML + Excel + PPTX
├── templates/
│   └── report_template.html     ← NEW: Jinja2 template
├── notebooks/
│   └── ejemplo_uso.py
├── README.md                    ← this file
├── CHANGELOG.md
└── pyproject.toml

🔄 Upgrading from v0.1.0 → v0.2.0

Apply these patches in order:

  1. constants.py — replace with the new file (adds two storage price constants).
  2. queries/__init__.py — replace with the new docstring (mentions storage and qas).
  3. queries/storage.py — new file.
  4. queries/qas.py — new file.
  5. utils/dataframe.py — append clean_timezones_df from utils/dataframe_addition.py.
  6. analyzer.py — apply the three blocks from analyzer_additions.py (imports, new methods, updated run_full_analysis).
  7. scripts/generar_reportes.py — new file.
  8. templates/report_template.html — new file.
  9. Bump _version.py to 0.2.0.

No existing methods are modified. The upgrade is fully backward-compatible.


🧪 Testing the upgrade

from snowflake_cost_analyzer import SnowflakeCostAnalyzer, SnowflakeConfig, AnalysisConfig

sf = SnowflakeConfig.from_colab_secrets(warehouse="WH_IDMC")
params = AnalysisConfig.last_n_days(7, warehouse_name="WH_IDMC")

with SnowflakeCostAnalyzer(sf, params) as analyzer:
    # New: storage
    df_db_storage = analyzer.get_database_storage_costs(lookback_days=30)
    print(f"Databases: {len(df_db_storage)}")
    print(f"Total TB: {df_db_storage['TOTAL_STORAGE_TB'].sum():.2f}")

    # New: QAS (handles disabled QAS gracefully)
    df_qas = analyzer.get_qas_analysis()
    if df_qas.empty:
        print("QAS: not enabled or no eligible queries")
    else:
        print(f"QAS ROI: {df_qas['ROI_PCT'].iloc[0]:.1f}%")

    # New: full pipeline including storage + QAS
    results = analyzer.run_full_analysis()
    print(f"Total DataFrames produced: {len(results)}")

Then generate the executive reports:

python scripts/generar_reportes.py --warehouse WH_IDMC --days 30
ls out/
# snowflake_report_WH_IDMC_20260409_1430.html
# snowflake_report_WH_IDMC_20260409_1430.xlsx
# snowflake_report_WH_IDMC_20260409_1430.pptx

📜 CHANGELOG

v0.2.0 (2026-04-09)

  • Added queries/storage.py (4 builders for Phase 9 storage analysis)
  • Added queries/qas.py (2 builders for Phase 10 QAS analysis)
  • Added STORAGE_PRICE_PER_TB_MONTH and STAGE_PRICE_PER_TB_MONTH constants (single source of truth)
  • Added 6 new methods to SnowflakeCostAnalyzer (4 storage + 2 QAS)
  • Added clean_timezones_df utility in utils/dataframe.py
  • Added scripts/generar_reportes.py — executive report generator
  • Added templates/report_template.html — Jinja2 HTML template
  • Modified run_full_analysis() — now accepts include_storage and include_qas flags and returns up to 16 DataFrames
  • Eliminated the duplicated CASE warehouse_size block from QAS analysis (now reuses build_credits_case_sql())

v0.1.0

  • Initial modular refactor from monolithic notebook (17,000 lines → library).
