
Yahoo Finance OHLCV → BigQuery: multi-interval ingestion + LLM-friendly docs + Stooq verification


yfinance-bigquery

Idempotent Yahoo Finance OHLCV → BigQuery ingestion across 5 intervals, with first-class documentation for SQL/LLM agents and internal-consistency verification.

Install

pip install yfinance-bigquery

Quickstart

gcloud auth application-default login

# 1. Seed your symbol universe from the S&P 500 Wikipedia page
yfinance-bigquery universe init \
    --dim-symbols myproject.mydataset.dim_symbols \
    --create-if-missing

# 2. Sync the last week of daily bars for every active ticker
yfinance-bigquery sync \
    --interval 1d \
    --dataset myproject.mydataset.yfinance_v2_analytics \
    --dim-symbols myproject.mydataset.dim_symbols

# 3. Spot-check internal consistency for the current year
yfinance-bigquery verify \
    --source internal \
    --interval 1d \
    --aggregation symbol-season \
    --metric all \
    --season 2026 \
    --table myproject.mydataset.ohlcv_1d
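Re-running step 2 over an overlapping window is meant to be safe because ingestion is idempotent. The sketch below shows one way to get that property. It assumes each bar is keyed on (symbol, bar_start_utc), which is the key the no_duplicate_bars check uses. It is an in-memory illustration built around a hypothetical upsert_bars helper, not this package's implementation. In BigQuery the equivalent would typically be a MERGE on that key.

```python
from datetime import datetime, timezone

# (symbol, bar_start_utc) -> bar row; a hypothetical stand-in for an OHLCV table.
BarKey = tuple[str, datetime]


def upsert_bars(table: dict[BarKey, dict], bars: list[dict]) -> dict[BarKey, dict]:
    """Insert or overwrite bars keyed by (symbol, bar_start_utc).

    Applying the same batch twice leaves the table unchanged, so a re-run over
    an overlapping window cannot create duplicate bars.
    """
    for bar in bars:
        key = (bar["symbol"], bar["bar_start_utc"])
        table[key] = bar  # last write wins; never a second row for the same key
    return table


if __name__ == "__main__":
    bar_start = datetime(2026, 5, 8, 13, 30, tzinfo=timezone.utc)
    batch = [{"symbol": "AAA", "bar_start_utc": bar_start, "close": 10.5}]
    table: dict[BarKey, dict] = {}
    upsert_bars(table, batch)
    upsert_bars(table, batch)  # simulate running the same sync again
    print(len(table))  # 1
```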

Backfill

Backfill all 5 intervals in resumable yearly chunks:

yfinance-bigquery sync \
    --interval all \
    --start 2020-01-01 --end 2026-05-11 \
    --chunk-by year --resume \
    --dataset myproject.mydataset.yfinance_v2_analytics \
    --dim-symbols myproject.mydataset.dim_symbols

--resume skips chunks already recorded as successful in <dataset>._yfinance_ingest_runs. Pass --runs-table to write the run log to a sidecar dataset instead. Re-running with the same --chunk-by value is safe. Switching --chunk-by between runs (for example, from year to yearmonth) re-processes previously completed work, because a chunk is skipped only when its boundaries exactly match a recorded success.
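A minimal sketch of why chunk boundaries must match exactly. It assumes inclusive date ranges and a run log that stores each successful chunk as its exact (start, end) pair. chunk_ranges and pending_chunks are hypothetical helpers, not part of the package's API.

```python
from datetime import date, timedelta


def chunk_ranges(start: date, end: date, chunk_by: str) -> list[tuple[date, date]]:
    """Split the inclusive range [start, end] into calendar-aligned chunks.

    chunk_by is "year" or "yearmonth"; the first and last chunks are clipped
    to start and end.
    """
    if chunk_by not in ("year", "yearmonth"):
        raise ValueError(f"unsupported chunk_by: {chunk_by!r}")
    chunks = []
    cursor = start
    while cursor <= end:
        if chunk_by == "year":
            boundary = date(cursor.year, 12, 31)
        else:
            # Last day of the month = first day of the next month minus one day.
            if cursor.month == 12:
                next_month = date(cursor.year + 1, 1, 1)
            else:
                next_month = date(cursor.year, cursor.month + 1, 1)
            boundary = next_month - timedelta(days=1)
        chunk_end = min(boundary, end)
        chunks.append((cursor, chunk_end))
        cursor = chunk_end + timedelta(days=1)
    return chunks


def pending_chunks(chunks: list[tuple[date, date]],
                   succeeded: set[tuple[date, date]]) -> list[tuple[date, date]]:
    """Keep only chunks whose exact (start, end) pair is not a recorded success."""
    return [c for c in chunks if c not in succeeded]


if __name__ == "__main__":
    yearly = chunk_ranges(date(2020, 1, 1), date(2026, 5, 11), "year")
    monthly = chunk_ranges(date(2020, 1, 1), date(2026, 5, 11), "yearmonth")
    done = set(yearly)  # pretend the yearly backfill fully succeeded
    print(len(pending_chunks(yearly, done)))   # 0
    print(len(pending_chunks(monthly, done)))  # 77
```

With the Backfill range above, yearly chunking yields 7 chunks and yearmonth yields 77. None of the monthly chunks equals a recorded yearly chunk, so under these assumptions nothing would be skipped after switching.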

Universe management

# Initialize dim_symbols (first run)
yfinance-bigquery universe init \
    --dim-symbols myproject.mydataset.dim_symbols \
    --create-if-missing

# Refresh constituents (tracks additions and marks removals with date_removed)
yfinance-bigquery universe refresh \
    --dim-symbols myproject.mydataset.dim_symbols

# List all active tickers
yfinance-bigquery universe list \
    --dim-symbols myproject.mydataset.dim_symbols
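The refresh behaviour (add new tickers, stamp removed ones with date_removed) can be sketched as a set difference. This assumes a dim_symbols-like mapping from ticker to row, where an unset date_removed means the ticker is active. The helper names are hypothetical, and re-adding a previously removed ticker is not modelled.

```python
from datetime import date


def refresh_universe(dim: dict[str, dict], constituents: set[str], today: date) -> dict[str, dict]:
    """Add new constituents and stamp date_removed on tickers that dropped out."""
    for symbol in constituents:
        if symbol not in dim:
            dim[symbol] = {"date_removed": None}  # newly added, active
    for symbol, row in dim.items():
        if symbol not in constituents and row["date_removed"] is None:
            row["date_removed"] = today  # keep the row, but mark it as removed
    return dim


def active_symbols(dim: dict[str, dict]) -> list[str]:
    """Tickers whose date_removed is still unset (treated as active)."""
    return sorted(s for s, row in dim.items() if row["date_removed"] is None)


if __name__ == "__main__":
    dim = {"AAA": {"date_removed": None}, "BBB": {"date_removed": None}}
    refresh_universe(dim, {"AAA", "CCC"}, date(2026, 5, 11))
    print(active_symbols(dim))  # ['AAA', 'CCC']
```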

Documentation

yfinance-bigquery docs --format llm > LLM_CONTEXT.md

Five formats are supported:

  • bq-apply: push column descriptions to BigQuery
  • llm: a single Markdown file suitable for loading into an LLM context window
  • dictionary: JSON rows for a data dictionary table
  • markdown: a human-readable column reference
  • dbt: a dbt YAML schema stub
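To show what the dbt format targets, here is a minimal sketch that renders column descriptions into a dbt schema.yml stub (version 2, one model, a columns list). The helper name and the example descriptions are placeholders. The package's actual output may include more fields.

```python
import json


def dbt_schema_stub(model: str, columns: dict[str, str]) -> str:
    """Render {column: description} as a dbt schema.yml stub for one model."""
    lines = ["version: 2", "", "models:", f"  - name: {model}", "    columns:"]
    for name, description in columns.items():
        lines.append(f"      - name: {name}")
        # json.dumps emits a double-quoted, escaped string, which YAML accepts.
        lines.append(f"        description: {json.dumps(description)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(dbt_schema_stub("ohlcv_1d", {
        "symbol": "Ticker symbol",
        "bar_start_utc": "Bar start time in UTC",
    }), end="")
```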

Verification

Internal-consistency checks run entirely inside BigQuery; no external data source is required. All 5 metrics are zero-tolerance: a metric FAILs as soon as its violation fraction is above 0, so a single violating bar is enough.

# Check all metrics across all intervals for 2026
yfinance-bigquery verify \
    --source internal \
    --interval all \
    --aggregation symbol-season \
    --metric all \
    --season 2026 \
    --table-prefix myproject.mydataset.ohlcv

The 5 metrics are:

  • ohlc_monotonic: high >= max(open, close) and min(open, close) >= low for every bar
  • volume_non_negative: volume is NULL or >= 0
  • no_future_bars: no bar has a trading_date after today
  • trading_day_alignment: no weekend bars (1d) or out-of-hours bars (intraday)
  • no_duplicate_bars: no two bars share the same (symbol, bar_start_utc)
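To make the zero-tolerance rule concrete, here is a sketch of the metric semantics applied to in-memory daily bars. It assumes rows with symbol, bar_start_utc, trading_date, open, high, low, close and volume fields, and non-NULL prices. The real checks run as SQL inside BigQuery. The intraday out-of-hours rule is omitted because it depends on exchange session times that are not specified here.

```python
from datetime import date, datetime, timezone

METRICS = ["ohlc_monotonic", "volume_non_negative", "no_future_bars",
           "trading_day_alignment", "no_duplicate_bars"]


def violation_fractions(bars: list[dict], today: date) -> dict[str, float]:
    """Fraction of daily bars violating each internal-consistency metric."""
    counts = dict.fromkeys(METRICS, 0)
    seen = set()
    for b in bars:
        lo_oc, hi_oc = min(b["open"], b["close"]), max(b["open"], b["close"])
        if not (b["high"] >= hi_oc and lo_oc >= b["low"]):
            counts["ohlc_monotonic"] += 1
        if b["volume"] is not None and b["volume"] < 0:
            counts["volume_non_negative"] += 1
        if b["trading_date"] > today:
            counts["no_future_bars"] += 1
        if b["trading_date"].weekday() >= 5:  # 5 = Saturday, 6 = Sunday (daily bars only)
            counts["trading_day_alignment"] += 1
        key = (b["symbol"], b["bar_start_utc"])
        if key in seen:
            counts["no_duplicate_bars"] += 1  # every repeat of a key counts once
        seen.add(key)
    n = len(bars)
    return {m: (c / n if n else 0.0) for m, c in counts.items()}


def verdicts(fractions: dict[str, float]) -> dict[str, str]:
    """Zero tolerance: any violation fraction above 0 is a FAIL."""
    return {m: "FAIL" if f > 0 else "PASS" for m, f in fractions.items()}


if __name__ == "__main__":
    bars = [
        {"symbol": "AAA", "bar_start_utc": datetime(2026, 5, 8, 13, 30, tzinfo=timezone.utc),
         "trading_date": date(2026, 5, 8), "open": 10.0, "high": 11.0, "low": 9.0,
         "close": 10.5, "volume": 1000},
        # Saturday bar whose high is below its close: two violations.
        {"symbol": "AAA", "bar_start_utc": datetime(2026, 5, 9, 13, 30, tzinfo=timezone.utc),
         "trading_date": date(2026, 5, 9), "open": 10.0, "high": 10.2, "low": 9.5,
         "close": 10.4, "volume": None},
    ]
    print(verdicts(violation_fractions(bars, today=date(2026, 5, 11))))
```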

Seed your data dictionary

If you maintain a data_dictionary table (one row per column with business definitions, tags, and lineage), you can seed it directly:

yfinance-bigquery docs --format dictionary --apply \
    --dataset mydataset \
    --table myproject.mydataset.ohlcv_1d \
    --dictionary-table myproject.shared_ops.data_dictionary

The command atomically replaces only the rows for the given (dataset, table) pair; all other entries in the dictionary table are left untouched. The target table must have this schema:

dataset, table, column, dtype, description, business_definition,
owner, tags ARRAY<STRING>, source_system, upstream_lineage_json,
created_at TIMESTAMP, updated_at TIMESTAMP
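One way to get an atomic per-(dataset, table) replacement in BigQuery is a multi-statement transaction that deletes and re-inserts the rows between BEGIN TRANSACTION and COMMIT TRANSACTION. The sketch below only builds that SQL. The @target_dataset, @target_table and @rows query parameters and the helper name are assumptions, not necessarily what the package executes.

```python
# Column order of the required target schema; table and column are backtick-quoted.
DICTIONARY_COLUMNS = [
    "dataset", "`table`", "`column`", "dtype", "description", "business_definition",
    "owner", "tags", "source_system", "upstream_lineage_json", "created_at", "updated_at",
]


def replace_dictionary_rows_sql(dictionary_table: str) -> str:
    """Build a BigQuery transaction that swaps the rows for one (dataset, table) pair.

    @rows is assumed to be an ARRAY<STRUCT<...>> query parameter whose fields
    follow DICTIONARY_COLUMNS order.
    """
    if "`" in dictionary_table:
        raise ValueError("table id must not contain backticks")
    cols = ", ".join(DICTIONARY_COLUMNS)
    return (
        "BEGIN TRANSACTION;\n"
        f"DELETE FROM `{dictionary_table}`\n"
        "WHERE dataset = @target_dataset AND `table` = @target_table;\n"
        f"INSERT INTO `{dictionary_table}` ({cols})\n"
        "SELECT * FROM UNNEST(@rows);\n"
        "COMMIT TRANSACTION;\n"
    )


if __name__ == "__main__":
    print(replace_dictionary_rows_sql("myproject.shared_ops.data_dictionary"))
```

Because the DELETE is filtered on the target (dataset, table) pair, rows for other tables in the dictionary are never touched, and the transaction ensures readers see either the old rows or the new ones.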

MIT licensed. This software does not include or distribute Yahoo Finance data.

Download files


Source Distribution

yfinance_bigquery-0.2.0.tar.gz (149.6 kB)

Uploaded Source

Built Distribution


yfinance_bigquery-0.2.0-py3-none-any.whl (49.6 kB)

Uploaded Python 3

File details

Details for the file yfinance_bigquery-0.2.0.tar.gz.

File metadata

  • Download URL: yfinance_bigquery-0.2.0.tar.gz
  • Upload date:
  • Size: 149.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for yfinance_bigquery-0.2.0.tar.gz
Algorithm Hash digest
SHA256 15a80673779ecc5d24063c400e8f51ea4098715e73354d4ac80a6681224dc319
MD5 42fa0f4ecc20feac7fa0dba39bc68cf1
BLAKE2b-256 ee7cf4cfc81fdda841ce491dd781fbafdf56f061d32c19b4a4e86f770a1db804


Provenance

The following attestation bundles were made for yfinance_bigquery-0.2.0.tar.gz:

Publisher: release.yml on blahovec-labs/yfinance-bigquery

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file yfinance_bigquery-0.2.0-py3-none-any.whl.

File metadata

File hashes

Hashes for yfinance_bigquery-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 b1967e297242bd07cc25e4bd53542e539232a710119f40aec7ee49fef52478f6
MD5 134d519e168e3e9d12c8289f60767427
BLAKE2b-256 24a5385b54e2c1ec10ebe2c1a0f6b4fd03cc22eefd0dd71ba1af7ee862f6a235


Provenance

The following attestation bundles were made for yfinance_bigquery-0.2.0-py3-none-any.whl:

Publisher: release.yml on blahovec-labs/yfinance-bigquery

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
