Statcast → BigQuery: idempotent ingestion + LLM-friendly docs + Baseball Savant verification
statcast-bigquery
Idempotent Statcast → BigQuery ingestion, with first-class documentation for SQL/LLM agents and round-trip validation against Baseball Savant.
Install
pip install statcast-bigquery
Quickstart
gcloud auth application-default login
statcast-bigquery sync \
--start 2024-04-01 --end 2024-10-31 \
--table myproject.mydataset.statcast_pitches
Backfill
Backfill historical seasons in resumable chunks:
statcast-bigquery sync \
--start 2015-04-01 --end 2026-05-11 \
--chunk-by year --resume \
--table myproject.mydataset.statcast_pitches
--resume skips chunks already recorded as success in
<dataset>._statcast_ingest_runs. To keep the run log in a sidecar
dataset, override its location with --runs-table. Re-running with the
same --chunk-by is safe. Switching --chunk-by between runs (for
example, year → month) re-processes the range, because a chunk is
skipped only when it exactly matches a recorded one.
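The skip rule above can be illustrated with a minimal sketch. It assumes chunks are aligned to calendar years or months and that a chunk is identified by its (start, end) pair; the package's actual chunk boundaries and run-log key are not documented here, and chunk_range and pending_chunks are hypothetical names, not part of the statcast-bigquery API.

```python
from datetime import date, timedelta


def chunk_range(start: date, end: date, chunk_by: str) -> list[tuple[date, date]]:
    """Split the inclusive range [start, end] into calendar-aligned chunks.

    Assumption: chunks break on calendar year or month boundaries.
    """
    if chunk_by not in ("year", "month"):
        raise ValueError(f"unsupported chunk_by: {chunk_by!r}")
    chunks = []
    cursor = start
    while cursor <= end:
        if chunk_by == "year":
            next_start = date(cursor.year + 1, 1, 1)
        else:
            # December rolls over to January of the following year.
            next_start = date(cursor.year + (cursor.month == 12), cursor.month % 12 + 1, 1)
        chunk_end = min(next_start - timedelta(days=1), end)
        chunks.append((cursor, chunk_end))
        cursor = next_start
    return chunks


def pending_chunks(chunks, succeeded):
    """Keep only chunks whose (start, end) pair is not recorded as a success."""
    return [chunk for chunk in chunks if chunk not in succeeded]


# Two yearly chunks succeeded earlier; a yearly re-run only needs the third.
yearly = chunk_range(date(2015, 4, 1), date(2017, 5, 11), "year")
succeeded = set(yearly[:2])
print(pending_chunks(yearly, succeeded))

# A monthly re-run matches none of the recorded yearly chunks, so all are pending.
monthly = chunk_range(date(2015, 4, 1), date(2015, 12, 31), "month")
print(len(pending_chunks(monthly, succeeded)))
```

Under these assumptions, the monthly chunk starting 2015-04-01 ends on 2015-04-30, so it never equals the recorded yearly chunk that ends on 2015-12-31, which is why changing the granularity triggers re-processing.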
Documentation
statcast-bigquery docs --format llm > STATCAST_FOR_LLMS.md
Seed your data dictionary
If you maintain a data_dictionary table (one row per column with
business definitions, tags, lineage), you can seed it directly:
statcast-bigquery docs --format dictionary --apply \
--dataset mydataset --table myproject.mydataset.statcast_pitches \
--dictionary-table myproject.shared_ops.data_dictionary
This atomically replaces the rows for the given (dataset, table) pair
only; other entries in the dictionary table are untouched. Required
target schema:
dataset, table, column, dtype, description, business_definition,
owner, tags ARRAY<STRING>, source_system, upstream_lineage_json,
created_at TIMESTAMP, updated_at TIMESTAMP
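The scoped-replace behavior can be modeled in memory as a rough sketch. This is not the package's implementation: replace_entries and make_row are hypothetical helpers, the dictionary table is represented as a list of dicts, and the placeholder values (including the "STRING" dtype and the dataset and table names) are illustrative assumptions.

```python
# Column names taken from the required target schema listed above.
REQUIRED_COLUMNS = (
    "dataset", "table", "column", "dtype", "description",
    "business_definition", "owner", "tags", "source_system",
    "upstream_lineage_json", "created_at", "updated_at",
)


def make_row(dataset, table, column, **overrides):
    """Build a dictionary row with placeholder values (illustrative only)."""
    row = {name: None for name in REQUIRED_COLUMNS}
    row.update(dataset=dataset, table=table, column=column, dtype="STRING", tags=[])
    row.update(overrides)
    return row


def replace_entries(existing, dataset, table, new_rows):
    """Return a new row list with the (dataset, table) entries swapped out.

    Validation happens before anything is built, so a bad input leaves
    `existing` untouched; this stands in for the atomic replace.
    """
    for row in new_rows:
        missing = [name for name in REQUIRED_COLUMNS if name not in row]
        if missing:
            raise ValueError(f"row is missing columns: {missing}")
        if (row["dataset"], row["table"]) != (dataset, table):
            raise ValueError("row belongs to a different (dataset, table)")
    kept = [r for r in existing if (r["dataset"], r["table"]) != (dataset, table)]
    return kept + list(new_rows)


current = [
    make_row("other_dataset", "other_table", "id"),
    make_row("mydataset", "statcast_pitches", "stale_column"),
]
fresh = [make_row("mydataset", "statcast_pitches", "pitch_type")]
updated = replace_entries(current, "mydataset", "statcast_pitches", fresh)
print([(r["dataset"], r["table"], r["column"]) for r in updated])
```

The entry for other_dataset survives, the stale statcast_pitches row is dropped, and the fresh row is added, which mirrors the "other entries are untouched" guarantee.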
Verification
statcast-bigquery verify \
--source baseball-savant \
--aggregation player-season \
--metric all --season 2024 \
--table myproject.mydataset.statcast_pitches
MIT licensed. This software does not include or distribute MLB data.
File details
Details for the file statcast_bigquery-0.3.1.tar.gz.
File metadata
- Download URL: statcast_bigquery-0.3.1.tar.gz
- Upload date:
- Size: 849.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 49ab16f02b8e62f1f7e7fb290931141e64753da059691d4087ed3c5e0df2207e |
| MD5 | 5f9b70755291a7c4a1a7ace74c737497 |
| BLAKE2b-256 | aca17e02f694ee0a23ac3dc385efe815a7b5b287947ec68e4e88e2db608f957e |
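A downloaded copy of the source distribution can be checked against the SHA256 digest listed above. The following is a minimal sketch using only the standard library; the function names and the local file path in the usage note are hypothetical.

```python
import hashlib

# SHA256 digest published for statcast_bigquery-0.3.1.tar.gz (copied from the table above).
EXPECTED_SDIST_SHA256 = "49ab16f02b8e62f1f7e7fb290931141e64753da059691d4087ed3c5e0df2207e"


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in fixed-size blocks so large archives are not loaded at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_published_digest(path, expected=EXPECTED_SDIST_SHA256):
    """Return True when the file's SHA256 equals the expected hex digest."""
    return sha256_of_file(path) == expected.lower()
```

For example, matches_published_digest("statcast_bigquery-0.3.1.tar.gz") should return True for an intact download saved under that (assumed) local name.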
Provenance
The following attestation bundles were made for statcast_bigquery-0.3.1.tar.gz:
Publisher: release.yml on blahovec-labs/statcast-bigquery
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: statcast_bigquery-0.3.1.tar.gz
  - Subject digest: 49ab16f02b8e62f1f7e7fb290931141e64753da059691d4087ed3c5e0df2207e
- Sigstore transparency entry: 1508664301
- Sigstore integration time:
- Permalink: blahovec-labs/statcast-bigquery@9b96fe1fa89e979fb0daf81b5982770944a08ad1
- Branch / Tag: refs/tags/v0.3.1
- Owner: https://github.com/blahovec-labs
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@9b96fe1fa89e979fb0daf81b5982770944a08ad1
- Trigger Event: push
File details
Details for the file statcast_bigquery-0.3.1-py3-none-any.whl.
File metadata
- Download URL: statcast_bigquery-0.3.1-py3-none-any.whl
- Upload date:
- Size: 61.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7bad805eddac6520e2b3512f6729db1cd67b92db219d9e23e078bcc1cbdc638a |
| MD5 | f52c5d02ebb38aaf08a489dd5035d8d2 |
| BLAKE2b-256 | 242fba46b2d8c57012b6045e2d7831d6175098708a40223180653b53664c0fed |
Provenance
The following attestation bundles were made for statcast_bigquery-0.3.1-py3-none-any.whl:
Publisher: release.yml on blahovec-labs/statcast-bigquery
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: statcast_bigquery-0.3.1-py3-none-any.whl
  - Subject digest: 7bad805eddac6520e2b3512f6729db1cd67b92db219d9e23e078bcc1cbdc638a
- Sigstore transparency entry: 1508664349
- Sigstore integration time:
- Permalink: blahovec-labs/statcast-bigquery@9b96fe1fa89e979fb0daf81b5982770944a08ad1
- Branch / Tag: refs/tags/v0.3.1
- Owner: https://github.com/blahovec-labs
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@9b96fe1fa89e979fb0daf81b5982770944a08ad1
- Trigger Event: push