Yahoo Finance OHLCV → BigQuery: multi-interval ingestion + LLM-friendly docs + Stooq verification
yfinance-bigquery
Idempotent Yahoo Finance OHLCV → BigQuery ingestion across 5 intervals, with first-class documentation for SQL/LLM agents and internal-consistency verification.
Install
pip install yfinance-bigquery
Quickstart
gcloud auth application-default login
# 1. Seed your symbol universe from the S&P 500 Wikipedia page
yfinance-bigquery universe init \
--dim-symbols myproject.mydataset.dim_symbols \
--create-if-missing
# 2. Sync the last week of daily bars for every active ticker
yfinance-bigquery sync \
--interval 1d \
--dataset myproject.mydataset.yfinance_v2_analytics \
--dim-symbols myproject.mydataset.dim_symbols
# 3. Spot-check internal consistency for the current year
yfinance-bigquery verify \
--source internal \
--interval 1d \
--aggregation symbol-season \
--metric all \
--season 2026 \
--table myproject.mydataset.ohlcv_1d
Backfill
Backfill all 5 intervals in resumable yearly chunks:
yfinance-bigquery sync \
--interval all \
--start 2020-01-01 --end 2026-05-11 \
--chunk-by year --resume \
--dataset myproject.mydataset.yfinance_v2_analytics \
--dim-symbols myproject.mydataset.dim_symbols
--resume skips chunks already recorded as success in
<dataset>._yfinance_ingest_runs. Override with --runs-table if you
want the run log in a sidecar dataset. Re-running with the same
--chunk-by is safe. Switching --chunk-by from year to month between
runs will re-process the range, because a chunk is skipped only when
it matches a recorded chunk exactly.
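To illustrate why chunk boundaries must match exactly, here is a minimal Python sketch of calendar chunking plus a resume filter. It is not the package's implementation: the function names, the shape of the success key (interval, chunk start, chunk end), and the in-memory set standing in for the run log are all assumptions made for illustration.

```python
from datetime import date, timedelta


def make_chunks(start: date, end: date, chunk_by: str) -> list[tuple[date, date]]:
    """Split the inclusive range [start, end] into calendar chunks.

    Hypothetical helper: chunk_by is 'year' or 'month'.
    """
    if chunk_by not in ("year", "month"):
        raise ValueError(f"unsupported chunk_by: {chunk_by!r}")
    chunks = []
    cursor = start
    while cursor <= end:
        if chunk_by == "year":
            next_start = date(cursor.year + 1, 1, 1)
        elif cursor.month == 12:
            next_start = date(cursor.year + 1, 1, 1)
        else:
            next_start = date(cursor.year, cursor.month + 1, 1)
        # A chunk ends the day before the next one starts, clipped to `end`.
        chunk_end = min(next_start - timedelta(days=1), end)
        chunks.append((cursor, chunk_end))
        cursor = next_start
    return chunks


def pending_chunks(interval, chunks, succeeded):
    """Keep only chunks whose exact (interval, start, end) key is not logged.

    `succeeded` is an assumed stand-in for the rows of the run-log table.
    """
    return [c for c in chunks if (interval, c[0], c[1]) not in succeeded]


yearly = make_chunks(date(2020, 1, 1), date(2026, 5, 11), "year")
run_log = {("1d", s, e) for s, e in yearly}  # pretend every yearly chunk succeeded

# Same --chunk-by: every chunk matches the log, so nothing is left to do.
same = pending_chunks("1d", yearly, run_log)

# Different --chunk-by: no monthly chunk equals a yearly one, so all are pending.
monthly = make_chunks(date(2020, 1, 1), date(2026, 5, 11), "month")
switched = pending_chunks("1d", monthly, run_log)
```

Under these assumptions the yearly re-run has no pending chunks, while the monthly run treats every chunk as new, which is the re-processing behavior described above.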
Universe management
# Initialize dim_symbols (first run)
yfinance-bigquery universe init \
--dim-symbols myproject.mydataset.dim_symbols \
--create-if-missing
# Refresh constituents (tracks additions and marks removals with date_removed)
yfinance-bigquery universe refresh \
--dim-symbols myproject.mydataset.dim_symbols
# List all active tickers
yfinance-bigquery universe list \
--dim-symbols myproject.mydataset.dim_symbols
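The refresh behavior (track additions, mark removals with date_removed) can be pictured with a small in-memory sketch. This is only an illustration under stated assumptions: the row shape, the date_added field, and both function names are hypothetical, and the real command operates on the dim_symbols table in BigQuery rather than on Python lists. Re-adding a previously removed symbol is not handled here.

```python
from datetime import date


def refresh_universe(rows, latest_symbols, today):
    """Return a new row list reflecting the latest constituent set.

    rows: list of dicts with 'symbol' and 'date_removed' (None while active).
    latest_symbols: iterable of tickers currently in the index.
    """
    latest = set(latest_symbols)
    known = {r["symbol"] for r in rows}
    refreshed = []
    for r in rows:
        r = dict(r)  # copy so the caller's rows are not mutated
        if r["date_removed"] is None and r["symbol"] not in latest:
            r["date_removed"] = today  # mark the removal, keep the history
        refreshed.append(r)
    for symbol in sorted(latest - known):
        # 'date_added' is an assumed column, used only for this sketch.
        refreshed.append({"symbol": symbol, "date_added": today, "date_removed": None})
    return refreshed


def active_symbols(rows):
    """Tickers that have not been marked as removed."""
    return sorted(r["symbol"] for r in rows if r["date_removed"] is None)


before = [
    {"symbol": "AAA", "date_added": date(2020, 1, 1), "date_removed": None},
    {"symbol": "BBB", "date_added": date(2020, 1, 1), "date_removed": None},
]
after = refresh_universe(before, ["AAA", "CCC"], today=date(2026, 5, 11))
```

Removed symbols stay in the table with a date_removed value instead of being deleted, so historical bars for them remain attributable.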
Documentation
yfinance-bigquery docs --format llm > LLM_CONTEXT.md
Five formats are supported: bq-apply (push column descriptions to BigQuery),
llm (a single Markdown file suitable for stuffing into an LLM context window),
dictionary (JSON rows for a data dictionary table), markdown (human-readable
column reference), and dbt (a dbt YAML schema stub).
Verification
Internal-consistency checks run entirely inside BigQuery; no external data source is required. All 5 metrics are zero-tolerance: any violation fraction > 0 is a FAIL.
# Check all metrics across all intervals for 2026
yfinance-bigquery verify \
--source internal \
--interval all \
--aggregation symbol-season \
--metric all \
--season 2026 \
--table-prefix myproject.mydataset.ohlcv
The 5 metrics are:
- ohlc_monotonic: high >= open/close >= low for every bar
- volume_non_negative: volume is NULL or >= 0
- no_future_bars: no bar has a trading_date after today
- trading_day_alignment: no weekend bars (1d) or out-of-hours bars (intraday)
- no_duplicate_bars: no two bars share the same (symbol, bar_start_utc)
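As a rough picture of what these checks mean, the following Python sketch evaluates the five metrics over a list of daily bars held in memory and applies the zero-tolerance rule. It is an assumption-laden illustration, not the package's code: the real checks run as SQL inside BigQuery, the function names are hypothetical, trading_day_alignment is reduced to the weekend test for 1d bars, and the duplicate count (every bar whose key appears more than once) is one possible definition.

```python
from collections import Counter
from datetime import date, datetime, timezone

METRICS = (
    "ohlc_monotonic",
    "volume_non_negative",
    "no_future_bars",
    "trading_day_alignment",
    "no_duplicate_bars",
)


def violation_fractions(bars, today):
    """Return {metric: violating bars / total bars} for daily bars."""
    if not bars:
        return {name: 0.0 for name in METRICS}
    key_counts = Counter((b["symbol"], b["bar_start_utc"]) for b in bars)
    violations = {name: 0 for name in METRICS}
    for b in bars:
        body_high = max(b["open"], b["close"])
        body_low = min(b["open"], b["close"])
        if not (b["high"] >= body_high and body_low >= b["low"]):
            violations["ohlc_monotonic"] += 1
        if b["volume"] is not None and b["volume"] < 0:
            violations["volume_non_negative"] += 1
        if b["trading_date"] > today:
            violations["no_future_bars"] += 1
        if b["trading_date"].weekday() >= 5:  # Saturday or Sunday
            violations["trading_day_alignment"] += 1
        if key_counts[(b["symbol"], b["bar_start_utc"])] > 1:
            violations["no_duplicate_bars"] += 1
    return {name: violations[name] / len(bars) for name in METRICS}


def verdict(fractions):
    """Zero tolerance: any fraction above 0 fails the whole check."""
    return "FAIL" if any(f > 0 for f in fractions.values()) else "PASS"


bars = [
    {"symbol": "AAA", "bar_start_utc": datetime(2026, 1, 5, tzinfo=timezone.utc),
     "trading_date": date(2026, 1, 5),
     "open": 10.0, "high": 12.0, "low": 9.0, "close": 11.0, "volume": 100},
    # high is below open here, so this bar violates ohlc_monotonic
    {"symbol": "AAA", "bar_start_utc": datetime(2026, 1, 6, tzinfo=timezone.utc),
     "trading_date": date(2026, 1, 6),
     "open": 11.0, "high": 10.5, "low": 10.0, "close": 10.2, "volume": None},
]
fractions = violation_fractions(bars, today=date(2026, 1, 7))
```

A NULL volume (None here) passes volume_non_negative, matching the metric definition, while a single bad bar out of any number is enough to fail the run.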
Seed your data dictionary
If you maintain a data_dictionary table (one row per column with business
definitions, tags, and lineage), you can seed it directly:
yfinance-bigquery docs --format dictionary --apply \
--dataset mydataset \
--table myproject.mydataset.ohlcv_1d \
--dictionary-table myproject.shared_ops.data_dictionary
This atomically replaces the rows for the given (dataset, table) pair only;
other entries in the dictionary table are untouched. Required target schema:
dataset, table, column, dtype, description, business_definition,
owner, tags ARRAY<STRING>, source_system, upstream_lineage_json,
created_at TIMESTAMP, updated_at TIMESTAMP
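The scoped-replace semantics can be modeled in a few lines of Python. This sketch only mirrors the described behavior in memory; the function name and row dicts are hypothetical, and how the tool achieves atomicity inside BigQuery is not shown here.

```python
def replace_table_rows(existing, dataset, table, new_rows):
    """Return the dictionary rows with one (dataset, table) slice swapped out.

    The full replacement list is built before anything is returned, so a
    caller never observes a half-updated slice.
    """
    for row in new_rows:
        if (row["dataset"], row["table"]) != (dataset, table):
            raise ValueError("new rows must all belong to the target table")
    # Rows for every other (dataset, table) pair are carried over unchanged.
    kept = [r for r in existing if (r["dataset"], r["table"]) != (dataset, table)]
    return kept + list(new_rows)


existing = [
    {"dataset": "mydataset", "table": "ohlcv_1d", "column": "open", "description": "old"},
    {"dataset": "mydataset", "table": "other", "column": "id", "description": "keep me"},
]
seeded = replace_table_rows(
    existing,
    "mydataset",
    "ohlcv_1d",
    [{"dataset": "mydataset", "table": "ohlcv_1d", "column": "open", "description": "new"}],
)
```

Re-running the seed with the same inputs yields the same result, which is what makes the operation safe to repeat.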
MIT licensed. This software does not include or distribute Yahoo Finance data.