ETL pipeline for CA Biositing project
CA Biositing Pipeline
ETL pipeline for the CA Biositing project — extracting biomass feedstock data from Google Sheets and external sources, transforming it with pandas, and loading it into PostgreSQL.
Workflows are orchestrated with Prefect and share database models from the companion ca-biositing-datamodels package.
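As a rough illustration of how the extract, transform, and load stages fit together, the sketch below chains three plain functions. Everything in it is hypothetical: `extract`, `transform`, `load`, `run_flow`, the column names, and the `feedstock` table are invented for illustration. An in-memory SQLite database stands in for PostgreSQL, and `clean_name` is only a simplified stand-in for pyjanitor-style header cleanup. In Prefect, steps like these would typically be `@task` functions called from a `@flow` function; the decorators are left out so the sketch runs without Prefect installed.

```python
import re
import sqlite3


def clean_name(col: str) -> str:
    # Simplified header cleanup (not pyjanitor's exact rules): lowercase,
    # collapse runs of non-alphanumerics to "_", trim stray underscores.
    return re.sub(r"[^0-9a-z]+", "_", col.strip().lower()).strip("_")


def extract() -> list[dict[str, str]]:
    # Hypothetical stand-in for rows pulled from a Google Sheet.
    return [
        {"Crop Name": " Almond ", "Dry Tons (2022)": "1,200"},
        {"Crop Name": "walnut", "Dry Tons (2022)": "850"},
    ]


def transform(rows: list[dict[str, str]]) -> list[dict[str, object]]:
    records = []
    for row in rows:
        rec = {clean_name(k): v for k, v in row.items()}
        records.append({
            "crop_name": rec["crop_name"].strip().title(),
            # Strip thousands separators before converting to a number.
            "dry_tons_2022": float(rec["dry_tons_2022"].replace(",", "")),
        })
    return records


def load(conn: sqlite3.Connection, records: list[dict[str, object]]) -> int:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS feedstock "
        "(crop_name TEXT PRIMARY KEY, dry_tons_2022 REAL)"
    )
    conn.executemany(
        "INSERT INTO feedstock (crop_name, dry_tons_2022) "
        "VALUES (:crop_name, :dry_tons_2022)",
        records,
    )
    conn.commit()
    return len(records)


def run_flow(conn: sqlite3.Connection) -> int:
    # In a Prefect setup each step would typically be a @task inside a @flow.
    return load(conn, transform(extract()))


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(run_flow(conn))  # 2
```

Keeping each stage a separate function is what makes the split useful: a transform can be tested on fixture rows without touching a spreadsheet or a database.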
Installation
pip install ca-biositing-pipeline
Quick Start
from ca_biositing.pipeline.flows.primary_ag_product import primary_ag_product_flow
# Run the primary agricultural product ETL flow
primary_ag_product_flow()
What's Included
- Extract — Pull data from Google Sheets, shapefiles, and public datasets (USDA Census/Survey, LandIQ, Billion Ton)
- Transform — Clean and reshape with pandas and pyjanitor
- Load — Upsert into PostgreSQL with foreign-key resolution
- Flows — Prefect flows combining extract/transform/load steps
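The Load step upserts rows while resolving foreign keys. One common way to do that is a get-or-create lookup for the parent row, followed by `INSERT ... ON CONFLICT ... DO UPDATE` for the child row. The sketch below assumes hypothetical `crop` and `production` tables and uses the standard library's SQLite driver, which accepts the same `ON CONFLICT` upsert clause as PostgreSQL. It is an illustration of the technique, not the package's actual loader.

```python
import sqlite3

# Hypothetical schema: production rows reference crops by surrogate id.
SCHEMA = """
CREATE TABLE IF NOT EXISTS crop (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS production (
    crop_id  INTEGER NOT NULL REFERENCES crop(id),
    county   TEXT NOT NULL,
    year     INTEGER NOT NULL,
    dry_tons REAL,
    UNIQUE (crop_id, county, year)
);
"""


def connect(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    return conn


def resolve_crop_id(conn: sqlite3.Connection, name: str) -> int:
    # Get-or-create the parent row by its natural key, then fetch its id.
    conn.execute("INSERT INTO crop (name) VALUES (?) ON CONFLICT(name) DO NOTHING", (name,))
    return conn.execute("SELECT id FROM crop WHERE name = ?", (name,)).fetchone()[0]


def upsert_production(conn: sqlite3.Connection, records: list[dict]) -> None:
    for rec in records:
        crop_id = resolve_crop_id(conn, rec["crop"])
        conn.execute(
            """
            INSERT INTO production (crop_id, county, year, dry_tons)
            VALUES (?, ?, ?, ?)
            ON CONFLICT (crop_id, county, year)
            DO UPDATE SET dry_tons = excluded.dry_tons
            """,
            (crop_id, rec["county"], rec["year"], rec["dry_tons"]),
        )
    conn.commit()


if __name__ == "__main__":
    conn = connect()
    upsert_production(conn, [{"crop": "Almond", "county": "Kern", "year": 2022, "dry_tons": 100.0}])
    # Re-running with a corrected value updates the row instead of duplicating it.
    upsert_production(conn, [{"crop": "Almond", "county": "Kern", "year": 2022, "dry_tons": 120.0}])
    print(conn.execute("SELECT COUNT(*), MAX(dry_tons) FROM production").fetchone())  # (1, 120.0)
```

Resolving the parent by its natural key (`name`) keeps the load idempotent: rerunning a flow over the same source data updates existing rows rather than inserting duplicates.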
Key Dependencies
- ca-biositing-datamodels — shared database models
- Prefect — workflow orchestration
- pandas — data manipulation
- gspread — Google Sheets integration
- GeoPandas — geospatial data handling
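On the Google Sheets side, gspread's `Worksheet.get_all_records()` returns one dict per data row, keyed by the header row. The sketch below wraps that call in a hypothetical `extract_rows` helper that skips rows whose required cells are blank. It accepts any object with a `get_all_records()` method, so it can be exercised with a fake worksheet instead of live credentials; the credentials file name and sheet key in the comment are placeholders.

```python
from typing import Any, Protocol


class RecordSource(Protocol):
    # The one gspread Worksheet method this sketch relies on.
    def get_all_records(self) -> list[dict[str, Any]]: ...


def extract_rows(ws: RecordSource, required: tuple[str, ...] = ("crop",)) -> list[dict[str, Any]]:
    """Return worksheet rows, skipping rows where any required cell is blank."""
    rows = []
    for rec in ws.get_all_records():
        if all(str(rec.get(col, "")).strip() for col in required):
            rows.append(rec)
    return rows


# With real credentials (placeholder file name and sheet key):
#   import gspread
#   gc = gspread.service_account(filename="service_account.json")
#   ws = gc.open_by_key("<SHEET_KEY>").sheet1
#   rows = extract_rows(ws)
```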
Acknowledgement
We acknowledge software engineering support from the University of Washington Scientific Software Engineering Center (SSEC), as part of the Schmidt Sciences Virtual Institute for Scientific Software (VISS).
License
CA Biositing Pipeline is licensed under the open source BSD 3-Clause License.
Download files
- Source distribution: ca_biositing_pipeline-2026.4.6.tar.gz
- Built distribution: ca_biositing_pipeline-2026.4.6-py3-none-any.whl
File details
Details for the file ca_biositing_pipeline-2026.4.6.tar.gz.
File metadata
- Download URL: ca_biositing_pipeline-2026.4.6.tar.gz
- Upload date:
- Size: 515.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d87e3d841baa9be8da5cd6bb64b957d62a27b753221a18fb7054f05900d94368 |
| MD5 | e71edc4a42476e4cda5d9b4cb34d3fe6 |
| BLAKE2b-256 | e7d85985834b02c03ef57aff8a770d67d7c30b7bc12cdf1dce6354fedefc4390 |
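To check a downloaded archive against the published digests, hash it locally and compare. A minimal sketch using the standard library's `hashlib`; `digest_of` and `verify` are hypothetical helper names, and the file is read in chunks so large archives need not fit in memory. (Python 3.11+ also provides `hashlib.file_digest` for the same job.)

```python
import hashlib
from pathlib import Path
from typing import Union


def digest_of(path: Union[str, Path], algorithm: str = "sha256", chunk_size: int = 1 << 16) -> str:
    """Return the hex digest of a file, reading it in fixed-size chunks."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps calling read() until it returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: Union[str, Path], expected_hex: str, algorithm: str = "sha256") -> bool:
    """Compare a file's digest with a published one (case-insensitive)."""
    return digest_of(path, algorithm) == expected_hex.strip().lower()


# SHA256 published above for the source distribution.
EXPECTED_SDIST_SHA256 = "d87e3d841baa9be8da5cd6bb64b957d62a27b753221a18fb7054f05900d94368"

# Usage, after downloading the archive into the current directory:
#   verify("ca_biositing_pipeline-2026.4.6.tar.gz", EXPECTED_SDIST_SHA256)
```

pip can enforce the same check at install time in hash-checking mode: pin `ca-biositing-pipeline==2026.4.6 --hash=sha256:<digest>` in a requirements file and install with `pip install --require-hashes -r requirements.txt`. In that mode pip expects every requirement, dependencies included, to be pinned with a hash.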
Provenance
The following attestation bundles were made for ca_biositing_pipeline-2026.4.6.tar.gz:
- Publisher: cd.yml on sustainability-software-lab/ca-biositing
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: ca_biositing_pipeline-2026.4.6.tar.gz
  - Subject digest: d87e3d841baa9be8da5cd6bb64b957d62a27b753221a18fb7054f05900d94368
  - Sigstore transparency entry: 1244157938
  - Sigstore integration time:
- Permalink: sustainability-software-lab/ca-biositing@e3ed4de53e9b36533072e96de41a084c9dc786c0
- Branch / Tag: refs/tags/v2026.4.6
- Owner: https://github.com/sustainability-software-lab
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: cd.yml@e3ed4de53e9b36533072e96de41a084c9dc786c0
- Trigger Event: release
File details
Details for the file ca_biositing_pipeline-2026.4.6-py3-none-any.whl.
File metadata
- Download URL: ca_biositing_pipeline-2026.4.6-py3-none-any.whl
- Upload date:
- Size: 513.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
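The "Python 3" tag comes from the wheel's file name, which follows the binary distribution format convention `{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl`. Here `py3-none-any` marks a pure-Python wheel for any Python 3 interpreter, with no ABI requirement and no platform restriction. A minimal parser sketch follows; `WheelName` and `parse_wheel_filename` are hypothetical names, and for real use the `packaging` library's `packaging.utils.parse_wheel_filename` also validates names and versions.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelName:
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    # Fields are dash-separated; dashes inside the project name are written
    # as "_", and dotted tag sets such as "py2.py3" stay within one field.
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        dist, version, py, abi, plat = parts
        build = None
    elif len(parts) == 6:
        dist, version, build, py, abi, plat = parts
    else:
        raise ValueError(f"unexpected wheel file name: {filename}")
    return WheelName(dist, version, build, py, abi, plat)


if __name__ == "__main__":
    print(parse_wheel_filename("ca_biositing_pipeline-2026.4.6-py3-none-any.whl"))
```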
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8b3be3693be95bcd9891e4ba35d77571fd94951cbb1a4949c7ccd3050ba42395 |
| MD5 | dc57110de21ddd4ecb9cb90f6c4d44aa |
| BLAKE2b-256 | 7973b05840445421801b91e1abaf856652eda18fc85580ff8078cbc6a6e46063 |
Provenance
The following attestation bundles were made for ca_biositing_pipeline-2026.4.6-py3-none-any.whl:
- Publisher: cd.yml on sustainability-software-lab/ca-biositing
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: ca_biositing_pipeline-2026.4.6-py3-none-any.whl
  - Subject digest: 8b3be3693be95bcd9891e4ba35d77571fd94951cbb1a4949c7ccd3050ba42395
  - Sigstore transparency entry: 1244157945
  - Sigstore integration time:
- Permalink: sustainability-software-lab/ca-biositing@e3ed4de53e9b36533072e96de41a084c9dc786c0
- Branch / Tag: refs/tags/v2026.4.6
- Owner: https://github.com/sustainability-software-lab
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: cd.yml@e3ed4de53e9b36533072e96de41a084c9dc786c0
- Trigger Event: release