
ETL pipeline for the CA Biositing project


CA Biositing Pipeline

ETL pipeline for the CA Biositing project — extracting biomass feedstock data from Google Sheets and external sources, transforming it with pandas, and loading it into PostgreSQL.

Workflows are orchestrated with Prefect and share database models from the companion ca-biositing-datamodels package.
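The package's own extractors are not shown on this page. As a hedged illustration of the first step, the sketch below pulls one tab of a Google Sheet through the sheet's CSV export URL and parses it into row dictionaries using only the standard library. It assumes the sheet is shared as "anyone with the link can view". The helper names (sheet_csv_url, parse_sheet_csv, extract_sheet) are hypothetical and are not part of ca-biositing-pipeline, which may use an authenticated Sheets client instead.

```python
import csv
import io
import urllib.request


def sheet_csv_url(sheet_id: str, gid: int = 0) -> str:
    """Build the CSV-export URL for one tab (gid) of a Google Sheet.

    Assumption: the sheet is link-shared for viewing; private sheets
    need an authenticated client instead.
    """
    return (
        f"https://docs.google.com/spreadsheets/d/{sheet_id}"
        f"/export?format=csv&gid={gid}"
    )


def parse_sheet_csv(text: str) -> list[dict[str, str]]:
    """Turn CSV text into one dict per data row, keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def extract_sheet(sheet_id: str, gid: int = 0) -> list[dict[str, str]]:
    """Download one tab and parse it (requires network access)."""
    with urllib.request.urlopen(sheet_csv_url(sheet_id, gid)) as resp:
        return parse_sheet_csv(resp.read().decode("utf-8"))
```

For example, parse_sheet_csv("County,Tons\nFresno,120\n") returns [{'County': 'Fresno', 'Tons': '120'}]. Keeping the download and the parsing in separate functions lets the parsing step be tested without network access.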

Installation

pip install ca-biositing-pipeline

Quick Start

from ca_biositing.pipeline.flows.primary_ag_product import primary_ag_product_flow

# Run the primary agricultural product ETL flow
primary_ag_product_flow()

What's Included

  • Extract — Pull data from Google Sheets, shapefiles, and public datasets (USDA Census/Survey, LandIQ, Billion Ton)
  • Transform — Clean and reshape with pandas and pyjanitor
  • Load — Upsert into PostgreSQL with foreign-key resolution
  • Flows — Prefect flows combining extract/transform/load steps
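The Transform and Load steps listed above can be sketched as follows. This is a minimal, standard-library illustration and not the package's implementation. The sketch uses sqlite3 as a stand-in for PostgreSQL and a regex in place of pyjanitor's clean_names(). The table names, columns, and REQUIRED fields are hypothetical. The INSERT ... ON CONFLICT upsert syntax shown here is also valid in PostgreSQL (9.5 and later), although PostgreSQL drivers such as psycopg use %s placeholders rather than ?.

```python
import re
import sqlite3

# Hypothetical required columns, for this illustration only.
REQUIRED = ("county", "crop", "tons")

# Hypothetical parent/child tables standing in for the real datamodels.
SCHEMA = """
CREATE TABLE county (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE production (
    county_id INTEGER NOT NULL REFERENCES county(id),
    crop      TEXT NOT NULL,
    tons      REAL,
    UNIQUE (county_id, crop)
);
"""


def clean_name(name: str) -> str:
    """Roughly mimic pyjanitor's clean_names(): trim, snake_case, lowercase."""
    return re.sub(r"[^0-9A-Za-z]+", "_", name.strip()).strip("_").lower()


def transform(rows: list[dict]) -> list[dict]:
    """Normalize headers, drop incomplete rows, and coerce tons to float."""
    records = []
    for row in rows:
        rec = {clean_name(k): (v or "").strip() for k, v in row.items() if k is not None}
        if any(not rec.get(col) for col in REQUIRED):
            continue  # skip rows missing any required field
        rec["tons"] = float(rec["tons"].replace(",", ""))  # "1,200" -> 1200.0
        records.append(rec)
    return records


def load(conn: sqlite3.Connection, records: list[dict]) -> None:
    """Upsert records, resolving the county name to its foreign key first."""
    for rec in records:
        # Foreign-key resolution: ensure the parent row exists, then fetch its id.
        conn.execute(
            "INSERT INTO county (name) VALUES (?) ON CONFLICT(name) DO NOTHING",
            (rec["county"],),
        )
        (county_id,) = conn.execute(
            "SELECT id FROM county WHERE name = ?", (rec["county"],)
        ).fetchone()
        # Insert the child row, or update tons if (county_id, crop) already exists.
        conn.execute(
            "INSERT INTO production (county_id, crop, tons) VALUES (?, ?, ?) "
            "ON CONFLICT(county_id, crop) DO UPDATE SET tons = excluded.tons",
            (county_id, rec["crop"], rec["tons"]),
        )
    conn.commit()
```

Suppose transform() receives two Fresno rows for the same crop and one Kern row with a blank tonnage. It keeps the two Fresno rows. Loading them produces one county row and one production row whose tons value comes from the later row, because the second insert hits the unique key and updates the existing row. Matching on a natural key like this makes reruns idempotent, which matters when a flow is retried.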

Key Dependencies

Links

Contributors

Acknowledgement

We acknowledge software engineering support from the University of Washington Scientific Software Engineering Center (SSEC), as part of the Schmidt Sciences Virtual Institute for Scientific Software (VISS).

License

CA Biositing Pipeline is licensed under the open source BSD 3-Clause License.

Download files


Source Distribution

ca_biositing_pipeline-2026.4.6.tar.gz (515.4 kB)

Uploaded Source

Built Distribution


ca_biositing_pipeline-2026.4.6-py3-none-any.whl (513.6 kB)

Uploaded Python 3

File details

Details for the file ca_biositing_pipeline-2026.4.6.tar.gz.

File metadata

  • Download URL: ca_biositing_pipeline-2026.4.6.tar.gz
  • Size: 515.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ca_biositing_pipeline-2026.4.6.tar.gz
Algorithm Hash digest
SHA256 d87e3d841baa9be8da5cd6bb64b957d62a27b753221a18fb7054f05900d94368
MD5 e71edc4a42476e4cda5d9b4cb34d3fe6
BLAKE2b-256 e7d85985834b02c03ef57aff8a770d67d7c30b7bc12cdf1dce6354fedefc4390
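To confirm that a downloaded file matches the SHA256 digest listed above, the sketch below hashes the file in chunks with Python's hashlib and compares the result to the published value. The function names are illustrative and are not part of the package. The same check works for the wheel if you substitute its SHA256 digest.

```python
import hashlib
from pathlib import Path

# SHA256 digest published above for the 2026.4.6 source distribution.
EXPECTED_SDIST_SHA256 = (
    "d87e3d841baa9be8da5cd6bb64b957d62a27b753221a18fb7054f05900d94368"
)


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large downloads are not read into memory at once."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex: str) -> bool:
    """Return True if the file's SHA256 matches the expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()
```

Usage: verify("ca_biositing_pipeline-2026.4.6.tar.gz", EXPECTED_SDIST_SHA256). You can also let pip enforce the check by running pip install --require-hashes -r requirements.txt with a line such as ca-biositing-pipeline==2026.4.6 --hash=sha256:<digest>. List both the sdist and wheel digests on that line, because pip may select either file.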


Provenance

The following attestation bundles were made for ca_biositing_pipeline-2026.4.6.tar.gz:

Publisher: cd.yml on sustainability-software-lab/ca-biositing


File details

Details for the file ca_biositing_pipeline-2026.4.6-py3-none-any.whl.

File hashes

Hashes for ca_biositing_pipeline-2026.4.6-py3-none-any.whl
Algorithm Hash digest
SHA256 8b3be3693be95bcd9891e4ba35d77571fd94951cbb1a4949c7ccd3050ba42395
MD5 dc57110de21ddd4ecb9cb90f6c4d44aa
BLAKE2b-256 7973b05840445421801b91e1abaf856652eda18fc85580ff8078cbc6a6e46063


Provenance

The following attestation bundles were made for ca_biositing_pipeline-2026.4.6-py3-none-any.whl:

Publisher: cd.yml on sustainability-software-lab/ca-biositing

