
Yahoo Finance OHLCV → BigQuery: multi-interval ingestion + LLM-friendly docs + Stooq verification

Project description

yfinance-bigquery

Idempotent Yahoo Finance OHLCV → BigQuery ingestion across 5 intervals, with first-class documentation for SQL/LLM agents and internal-consistency verification.

Install

pip install yfinance-bigquery

Quickstart

gcloud auth application-default login

# 1. Seed your symbol universe from the S&P 500 Wikipedia page
yfinance-bigquery universe init \
    --dim-symbols myproject.mydataset.dim_symbols \
    --create-if-missing

# 2. Sync the last week of daily bars for every active ticker
yfinance-bigquery sync \
    --interval 1d \
    --dataset myproject.mydataset.yfinance_v2_analytics \
    --dim-symbols myproject.mydataset.dim_symbols

# 3. Spot-check internal consistency for the current year
yfinance-bigquery verify \
    --source internal \
    --interval 1d \
    --aggregation symbol-season \
    --metric all \
    --season 2026 \
    --table myproject.mydataset.ohlcv_1d
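Step 2 can be re-run over an overlapping window because ingestion is idempotent. The sketch below shows that property in plain Python rather than BigQuery. It assumes upsert (MERGE-style) semantics keyed on (symbol, bar_start_utc), the pair the no_duplicate_bars check treats as unique. `upsert_bars` is a hypothetical name, the dict stands in for the target table, and the prices are placeholders. This is not the package's actual write path.

```python
from datetime import datetime, timezone

Key = tuple[str, datetime]


def upsert_bars(table: dict[Key, dict], batch: list[dict]) -> dict[Key, dict]:
    """Merge a batch of bars into `table`, keyed on (symbol, bar_start_utc).

    Illustrative only: `table` is an in-memory stand-in for a BigQuery table.
    A bar whose key already exists replaces the stored row instead of being
    appended, so applying the same batch twice leaves `table` unchanged.
    """
    for bar in batch:
        table[(bar["symbol"], bar["bar_start_utc"])] = bar
    return table


# Usage: the same batch applied twice yields the same table.
ts = datetime(2026, 5, 8, tzinfo=timezone.utc)
batch = [
    {"symbol": "AAPL", "bar_start_utc": ts, "close": 10.5},
    {"symbol": "MSFT", "bar_start_utc": ts, "close": 20.0},
]
table = upsert_bars({}, batch)
snapshot = dict(table)
upsert_bars(table, batch)
print(table == snapshot, len(table))  # True 2
```

Keying on the bar's natural identity rather than appending is what makes a replayed sync harmless: a revised bar overwrites the stored one instead of adding a second row.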

Backfill

Backfill all 5 intervals in resumable yearly chunks:

yfinance-bigquery sync \
    --interval all \
    --start 2020-01-01 --end 2026-05-11 \
    --chunk-by year --resume \
    --dataset myproject.mydataset.yfinance_v2_analytics \
    --dim-symbols myproject.mydataset.dim_symbols

--resume skips chunks already recorded as successful in <dataset>._yfinance_ingest_runs; pass --runs-table to keep the run log in a sidecar dataset instead. Re-running with the same --chunk-by value is safe. Switching to a different value between runs (for example, from --chunk-by year to --chunk-by yearmonth) re-processes data that was already loaded, because a chunk is skipped only when it exactly matches a recorded one.
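The exact-match skip rule can be sketched as follows. This illustrates the behavior described above under stated assumptions; it is not the package's code. Chunks are modeled as half-open [start, end) date ranges aligned to calendar years or months (whether the CLI treats --end as inclusive is not stated here), and the run log is modeled as a set of (start, end) pairs for successful chunks. `chunk_range` and `pending_chunks` are hypothetical names.

```python
from datetime import date

Chunk = tuple[date, date]


def chunk_range(start: date, end: date, by: str) -> list[Chunk]:
    """Split [start, end) into calendar-aligned chunks ("year" or "yearmonth")."""
    chunks: list[Chunk] = []
    cur = start
    while cur < end:
        if by == "year":
            nxt = date(cur.year + 1, 1, 1)
        elif by == "yearmonth":
            # December (month 12) rolls over to January of the next year.
            nxt = date(cur.year + cur.month // 12, cur.month % 12 + 1, 1)
        else:
            raise ValueError(f"unsupported chunk size: {by!r}")
        chunks.append((cur, min(nxt, end)))
        cur = nxt
    return chunks


def pending_chunks(chunks: list[Chunk], succeeded: set[Chunk]) -> list[Chunk]:
    """Keep only chunks with no exact (start, end) match in the success log."""
    return [c for c in chunks if c not in succeeded]


yearly = chunk_range(date(2020, 1, 1), date(2026, 5, 11), "year")
monthly = chunk_range(date(2020, 1, 1), date(2026, 5, 11), "yearmonth")
print(len(yearly), len(pending_chunks(yearly, set(yearly[:3]))))  # 7 4
print(len(pending_chunks(monthly, set(yearly))))  # 77
```

Because monthly chunks never equal yearly ones, a log filled by a yearly run skips nothing when the same range is resumed month by month.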

Universe management

# Initialize dim_symbols (first run)
yfinance-bigquery universe init \
    --dim-symbols myproject.mydataset.dim_symbols \
    --create-if-missing

# Refresh constituents (tracks additions and marks removals with date_removed)
yfinance-bigquery universe refresh \
    --dim-symbols myproject.mydataset.dim_symbols

# List all active tickers
yfinance-bigquery universe list \
    --dim-symbols myproject.mydataset.dim_symbols
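The refresh semantics (append newcomers, keep departed tickers but stamp date_removed) reduce to a set difference against the currently active rows. A minimal in-memory sketch, assuming each dim_symbols row has symbol and date_removed plus a hypothetical date_added column. The function names and the OLDCO/NEWCO tickers are placeholders, and the real command's schema and SQL may differ.

```python
from datetime import date


def refresh_universe(rows: list[dict], constituents: set[str], today: date) -> list[dict]:
    """Stamp date_removed on departed tickers and append rows for new ones.

    Removed rows are kept rather than deleted, so history stays queryable.
    `date_added` is a hypothetical column used here for illustration.
    """
    active = {r["symbol"] for r in rows if r["date_removed"] is None}
    updated = [
        {**r, "date_removed": today}
        if r["date_removed"] is None and r["symbol"] not in constituents
        else r
        for r in rows
    ]
    for symbol in sorted(constituents - active):
        updated.append({"symbol": symbol, "date_added": today, "date_removed": None})
    return updated


def active_symbols(rows: list[dict]) -> list[str]:
    """Analogue of `universe list`: tickers with no removal date."""
    return sorted(r["symbol"] for r in rows if r["date_removed"] is None)


rows = [
    {"symbol": "AAPL", "date_added": date(2020, 1, 1), "date_removed": None},
    {"symbol": "OLDCO", "date_added": date(2020, 1, 1), "date_removed": None},
]
rows = refresh_universe(rows, {"AAPL", "NEWCO"}, today=date(2026, 5, 11))
print(active_symbols(rows))  # ['AAPL', 'NEWCO']
```

Running the refresh again with an unchanged constituent list is a no-op, which fits the idempotent style of the rest of the tool.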

Documentation

yfinance-bigquery docs --format llm > LLM_CONTEXT.md

Five formats are supported:

  • bq-apply: push column descriptions to BigQuery
  • llm: a single Markdown file suitable for loading into an LLM context window
  • dictionary: JSON rows for a data dictionary table
  • markdown: a human-readable column reference
  • dbt: a dbt YAML schema stub
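To show what a dbt schema stub contains, here is a minimal generator for the standard dbt schema.yml layout (version: 2, models, columns). The function name and the sample column descriptions are hypothetical placeholders, not the package's actual output.

```python
import json


def dbt_schema_stub(model: str, columns: list[tuple[str, str]]) -> str:
    """Render a minimal dbt schema.yml for one model.

    json.dumps produces a double-quoted scalar that YAML reads as a plain
    string, so descriptions containing colons or quotes stay valid YAML.
    """
    lines = ["version: 2", "", "models:", f"  - name: {model}", "    columns:"]
    for name, description in columns:
        lines.append(f"      - name: {name}")
        lines.append(f"        description: {json.dumps(description)}")
    return "\n".join(lines) + "\n"


print(dbt_schema_stub("ohlcv_1d", [
    ("symbol", "Ticker symbol"),
    ("bar_start_utc", "Bar start time in UTC"),
]))
```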

Verification

Internal-consistency checks run entirely inside BigQuery; no external data source is required. All 5 metrics are zero-tolerance: a violation fraction greater than 0 is a FAIL.

# Check all metrics across all intervals for 2026
yfinance-bigquery verify \
    --source internal \
    --interval all \
    --aggregation symbol-season \
    --metric all \
    --season 2026 \
    --table-prefix myproject.mydataset.ohlcv

The 5 metrics are:

  • ohlc_monotonic: high >= max(open, close) and min(open, close) >= low for every bar
  • volume_non_negative: volume is NULL or >= 0
  • no_future_bars: no bar has a trading_date after today
  • trading_day_alignment: no weekend bars (1d) and no bars outside trading hours (intraday)
  • no_duplicate_bars: no two bars share the same (symbol, bar_start_utc)
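Each rule is a row-level predicate, and the zero-tolerance verdict is a fraction test. The package evaluates these as SQL inside BigQuery; the plain-Python sketch below only mirrors the logic for daily bars. Assumptions: bars are dicts using the column names shown, every copy of a duplicated key counts as a violation, and the intraday trading-hours rule is omitted because the exchange calendar it needs is not described here. `violation_fractions`, `verdicts`, and the sample prices are hypothetical.

```python
from collections import Counter
from datetime import date, datetime, timezone


def violation_fractions(bars: list[dict], today: date) -> dict[str, float]:
    """Fraction of daily bars violating each internal-consistency rule."""
    if not bars:
        return {}
    key_counts = Counter((b["symbol"], b["bar_start_utc"]) for b in bars)
    rules = {
        "ohlc_monotonic": lambda b: (b["high"] >= max(b["open"], b["close"])
                                     and min(b["open"], b["close"]) >= b["low"]),
        "volume_non_negative": lambda b: b["volume"] is None or b["volume"] >= 0,
        "no_future_bars": lambda b: b["trading_date"] <= today,
        # 1d only: weekday() is Monday=0 ... Friday=4, so 5 and 6 are weekends.
        "trading_day_alignment": lambda b: b["trading_date"].weekday() < 5,
        # Every copy of a repeated (symbol, bar_start_utc) key is a violation.
        "no_duplicate_bars": lambda b: key_counts[(b["symbol"], b["bar_start_utc"])] == 1,
    }
    return {name: sum(not ok(b) for b in bars) / len(bars) for name, ok in rules.items()}


def verdicts(fractions: dict[str, float]) -> dict[str, str]:
    """Zero tolerance: any violation fraction above 0 is a FAIL."""
    return {name: "FAIL" if f > 0 else "PASS" for name, f in fractions.items()}


def bar(day: int, **overrides) -> dict:
    """Build a placeholder AAPL daily bar for May 2026."""
    row = {"symbol": "AAPL", "bar_start_utc": datetime(2026, 5, day, tzinfo=timezone.utc),
           "trading_date": date(2026, 5, day), "open": 10.0, "high": 11.0,
           "low": 9.5, "close": 10.5, "volume": 1000}
    return {**row, **overrides}


# 2026-05-08 is a Friday and 2026-05-09 a Saturday.
result = verdicts(violation_fractions([bar(8), bar(9)], today=date(2026, 5, 11)))
print(result["trading_day_alignment"], result["ohlc_monotonic"])  # FAIL PASS
```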

Seed your data dictionary

If you maintain a data_dictionary table (one row per column with business definitions, tags, and lineage), you can seed it directly:

yfinance-bigquery docs --format dictionary --apply \
    --dataset mydataset \
    --table myproject.mydataset.ohlcv_1d \
    --dictionary-table myproject.shared_ops.data_dictionary

The command atomically replaces the rows for the given (dataset, table) pair only; all other entries in the dictionary table are left untouched. The target table must have this schema:

dataset, table, column, dtype, description, business_definition,
owner, tags ARRAY<STRING>, source_system, upstream_lineage_json,
created_at TIMESTAMP, updated_at TIMESTAMP
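The scoping rule (replace only the target (dataset, table) rows and leave everything else alone) can be sketched in memory as follows. In BigQuery, one way to make the delete and insert atomic is a single MERGE statement or a multi-statement transaction; how the package does it is not stated here. The function name, the list stand-in for the table, and the sample rows are illustrative assumptions.

```python
def replace_dictionary_rows(existing: list[dict], new_rows: list[dict],
                            dataset: str, table: str) -> list[dict]:
    """Return the dictionary with the rows for (dataset, table) swapped for new_rows.

    Rows for any other (dataset, table) pair are returned unchanged, and new
    rows that point at a different target are rejected rather than written.
    """
    target = (dataset, table)
    stray = [r for r in new_rows if (r["dataset"], r["table"]) != target]
    if stray:
        raise ValueError(f"{len(stray)} new row(s) do not belong to {dataset}.{table}")
    kept = [r for r in existing if (r["dataset"], r["table"]) != target]
    return kept + list(new_rows)


existing = [
    {"dataset": "mydataset", "table": "ohlcv_1d", "column": "old_col"},
    {"dataset": "other", "table": "t", "column": "x"},
]
new = [{"dataset": "mydataset", "table": "ohlcv_1d", "column": "close"}]
print([r["column"] for r in replace_dictionary_rows(existing, new, "mydataset", "ohlcv_1d")])
# ['x', 'close']
```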

MIT licensed. This software does not include or distribute Yahoo Finance data.

Download files

Source Distribution

yfinance_bigquery-0.1.0.tar.gz (126.6 kB view details)

Uploaded Source

Built Distribution

yfinance_bigquery-0.1.0-py3-none-any.whl (37.8 kB view details)

Uploaded Python 3

File details

Details for the file yfinance_bigquery-0.1.0.tar.gz.

File metadata

  • Download URL: yfinance_bigquery-0.1.0.tar.gz
  • Size: 126.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for yfinance_bigquery-0.1.0.tar.gz
Algorithm Hash digest
SHA256 6d41d15ed5d294ff28d34156c3994d563f83ba0c16da8a506b78ac273e155ff7
MD5 004c3b8650882f3b9a152cbc65c6805f
BLAKE2b-256 ccbd4919d0de3b7d896e694e2d81adad86eca92aec10170c12de5db3ae936733

See more details on using hashes here.
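To check a downloaded archive against the published SHA256 digest, hash the file and compare. A minimal sketch using only hashlib; the helper names are hypothetical. pip can also enforce pinned hashes itself in hash-checking mode (--require-hashes).

```python
import hashlib


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA256 of a file, read in chunks so large files need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: str, expected_hex: str) -> bool:
    """True if the file's SHA256 equals the published digest (case-insensitive)."""
    return sha256_file(path) == expected_hex.strip().lower()


# Usage, with the sdist digest listed above:
# matches("yfinance_bigquery-0.1.0.tar.gz",
#         "6d41d15ed5d294ff28d34156c3994d563f83ba0c16da8a506b78ac273e155ff7")
```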

Provenance

The following attestation bundles were made for yfinance_bigquery-0.1.0.tar.gz:

Publisher: release.yml on blahovec-labs/yfinance-bigquery

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file yfinance_bigquery-0.1.0-py3-none-any.whl.

File hashes

Hashes for yfinance_bigquery-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 ce4c8cb18d159b5bee59fda6cc7f27d3926fc4ef35cd57c8cc20b5fce46e0b0d
MD5 b2f3cf97c16a7227fa5e84377bbc05c4
BLAKE2b-256 f5ee6d659ef3e96b2c6043e73b2811852b557f709c42c8f0202cf071111f907b

See more details on using hashes here.

Provenance

The following attestation bundles were made for yfinance_bigquery-0.1.0-py3-none-any.whl:

Publisher: release.yml on blahovec-labs/yfinance-bigquery

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
