GCP Pipeline Reference: Generic FDP Transformation — dbt models for ODP to FDP (JOIN + MAP patterns)

These details have not been verified by PyPI

Project links

Project description

Generic Transformation

Unit 2 of Generic 3-Unit Deployment

FDP Transformation - dbt models for ODP → FDP transformation.

Flow Diagram

                         Generic TRANSFORMATION FLOW
                         ──────────────────────

  BigQuery ODP                    dbt                      BigQuery FDP
  ────────────                    ───                      ────────────

  odp_generic.customers ─────┐
                        │    ┌─────────────────┐
  odp_generic.accounts  ─────┼───►│   JOIN Logic    │────────► fdp_generic.event_transaction_excess
                        │    └─────────────────┘
                        │
  odp_generic.decision  ─────┼───────────────────────────────► fdp_generic.portfolio_account_excess
                        │
  odp_generic.applications ──┼───────────────────────────────► fdp_generic.portfolio_account_facility
                        │
                        └───────────────────────

Pattern

MULTI-TARGET:

JOIN: 2 ODP sources (customers, accounts) → 1 FDP target (event_transaction_excess)
MAP: 1 ODP source (decision) → 1 FDP target (portfolio_account_excess)
MAP: 1 ODP source (applications) → 1 FDP target (portfolio_account_facility)

Step	Description
1	Staging models clean and type-cast raw ODP data
2	`data_quality_check` macro validates data integrity
3	`incremental_strategy` macro handles incremental load logic
4	`event_transaction_excess` performs `INNER JOIN` between Customers and Accounts
5	`portfolio_account_excess` maps Decision ODP 1:1 to FDP

Data Mapping

Source Table	Key Fields
`odp_generic.customers`	customer_id, ssn, first_name, last_name, dob, status
`odp_generic.accounts`	account_id, customer_id, account_type, balance, open_date
`odp_generic.decision`	decision_id, customer_id, application_id, decision_code, score, decision_date
`odp_generic.applications`	application_id, customer_id, loan_amount, interest_rate, term_months, application_date, status, event_type, account_type

Target Table	Description
`fdp_generic.event_transaction_excess`	Joined customer-account view
`fdp_generic.portfolio_account_excess`	Decision-based portfolio view
`fdp_generic.portfolio_account_facility`	Applications-based facility view

Components

Directory	Purpose
`dbt/models/staging/generic/`	Staging views (clean raw ODP data)
`dbt/models/fdp/`	FDP incremental models (JOIN and MAP logic)
`dbt/macros/`	Custom macros (schema routing, data quality, incremental strategy)
`dbt/models/marts/`	Marts models — placeholder (empty)
`dbt/models/analytics/`	Analytics models — placeholder (empty)

Key files

Layer	File	Purpose
Staging	`stg_generic_customers.sql`	Customer data view
Staging	`stg_generic_accounts.sql`	Account data view
Staging	`stg_generic_decision.sql`	Decision data view
Staging	`stg_generic_applications.sql`	Applications data view
Staging	`_generic_sources.yml`	Source definitions
FDP	`event_transaction_excess.sql`	JOIN: customers + accounts
FDP	`portfolio_account_excess.sql`	MAP: decision → portfolio
FDP	`portfolio_account_facility.sql`	MAP: applications → facility
FDP	`_fdp_generic_models.yml`	Model schemas and tests
Macro	`generate_schema_name.sql`	Routes models to Terraform-managed datasets
Macro	`data_quality_check.sql`	Data quality validation
Macro	`incremental_strategy.sql`	Incremental load logic

Library-Driven Ease of Use

The Generic transformation unit uses the gcp-pipeline-transform library for lineage and audit:

Schema Routing: Uses a custom generate_schema_name macro to route staging models to odp_generic and FDP models to fdp_generic, matching the Terraform-managed datasets.
Automated Lineage: Audit columns (_run_id, _extract_date, _transformed_at) are preserved from ingestion through to FDP, maintaining end-to-end traceability.
Incremental Processing: All FDP models use merge strategy with on_schema_change='append_new_columns' for efficient incremental loads.

Note on PII Masking: The mask_pii macro from gcp-pipeline-transform is available but not currently applied in the FDP models. PII masking can be re-enabled per-environment when the library's get_masking_level Jinja whitespace issue is resolved.

How to Replicate this JOIN Transformation (3-to-1)

To create a new transformation unit that joins multiple entities, follow the Creating New Deployment Guide.

Key steps for this JOIN pattern:

Register Library: Point your dbt_project.yml to the gcp-pipeline-transform macro paths.
Schema Routing: Create a generate_schema_name macro to map logical schemas to your Terraform-managed datasets.
Staging Models: Create views for your ODP tables. Include all audit columns (_run_id, _extract_date, _processed_at).
FDP Models: Implement your JOIN/MAP logic with incremental materialization and merge strategy.

Infrastructure & Configurations

Google Cloud Resources

This deployment requires the following GCP infrastructure, provisioned via Terraform:

Data Warehouse: BigQuery datasets odp_generic (source) and fdp_generic (target).
Processing: dbt (running on Cloud Composer or as a standalone process) for executing transformations.

For detailed infrastructure definitions, see infrastructure/terraform/systems/generic/transformation/.

dbt Configuration (`dbt_project.yml`)

The transformation behavior is controlled by variables and configurations in dbt_project.yml:

Variable	Description	Default / Source
`gcp_project_id`	Target GCP Project	`GCP_PROJECT_ID` env var
`source_dataset`	Source ODP dataset	`odp_generic`
`staging_dataset`	Intermediate staging dataset	`stg_generic`
`fdp_dataset`	Target FDP dataset	`fdp_generic`
`masking_level`	PII masking strategy (`FULL`, `PARTIAL`, `NONE`)	`NONE`
`extract_date`	Date of data extract	`null` (optional filter)
`all_entities`	List of entities to process	`['customers', 'accounts', 'decision', 'applications']`

Technology Stack & Documentation

Google BigQuery - Serverless data warehouse
dbt (data build tool) - Transformation workflow
dbt-bigquery Adapter - dbt to BigQuery connector
Data Modeling in dbt - Best practices for models

Dependencies

Library	Purpose
`dbt-bigquery`	dbt adapter for BigQuery
`gcp-pipeline-transform`	Shared macros (audit columns)

Execution & Testing

1. Local Development Setup

Initialize the virtual environment:

./scripts/setup_deployment_venv.sh bigquery-to-mapped-product
source deployments/bigquery-to-mapped-product/venv/bin/activate

2. Local dbt Execution

Run dbt models locally against the development BigQuery dataset:

cd dbt
dbt run --profiles-dir . --target dev

3. Data Quality Validation

Run dbt tests to verify transformation logic and PII masking:

dbt test --profiles-dir . --target dev

4. Cloud Execution

In production, this unit is triggered by the generic_transformation_dag once ingestion is successful. The transformation runs via Cloud Build executing dbt run --target int against BigQuery.

SQL Example

-- fdp_generic.event_transaction_excess (simplified)
SELECT
    {{ dbt_utils.generate_surrogate_key(['c.customer_id', 'a.account_id', 'c._extract_date']) }} as event_key,
    c.customer_id,
    c.ssn as ssn_masked,
    c.first_name,
    c.last_name,
    a.account_id,
    a.balance as current_balance,
    -- Audit columns
    c._run_id,
    c._extract_date,
    CURRENT_TIMESTAMP() as _transformed_at
FROM {{ ref('stg_generic_customers') }} c
INNER JOIN {{ ref('stg_generic_accounts') }} a
    ON c.customer_id = a.customer_id

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

1.0.29

Mar 24, 2026

1.0.14

Mar 22, 2026

1.0.13

Mar 17, 2026

1.0.11

Mar 15, 2026

1.0.10

Mar 12, 2026

1.0.9

Mar 12, 2026

1.0.8

Mar 11, 2026

1.0.7

Mar 11, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

gcp_pipeline_ref_transform-1.0.29.tar.gz (14.3 kB view details)

Uploaded Mar 24, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

gcp_pipeline_ref_transform-1.0.29-py3-none-any.whl (16.6 kB view details)

Uploaded Mar 24, 2026 Python 3

File details

Details for the file gcp_pipeline_ref_transform-1.0.29.tar.gz.

File metadata

Download URL: gcp_pipeline_ref_transform-1.0.29.tar.gz
Upload date: Mar 24, 2026
Size: 14.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for gcp_pipeline_ref_transform-1.0.29.tar.gz
Algorithm	Hash digest
SHA256	`49761d0ba780f80b23e1201882428a6fcf84019b6cf73ad1a4b5980e20b40b3f`
MD5	`07febff6f4c5d3fa261b78c9650f0476`
BLAKE2b-256	`0e2dadacfd7b7b1039ce6225d05cf52c87dba79a27a1e399b7094b20d55d908d`

See more details on using hashes here.

File details

Details for the file gcp_pipeline_ref_transform-1.0.29-py3-none-any.whl.

File metadata

Download URL: gcp_pipeline_ref_transform-1.0.29-py3-none-any.whl
Upload date: Mar 24, 2026
Size: 16.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for gcp_pipeline_ref_transform-1.0.29-py3-none-any.whl
Algorithm	Hash digest
SHA256	`4d51691093b76013801ce4655909c4f83f26d8f5f7b4d98aeb08a5eb1394616d`
MD5	`02af65509581ff9b9d15f674a53878cf`
BLAKE2b-256	`54d1fbc829e7768b2872529299d15d53cffcb0a4e94fb5704d11b550af01cb2a`

See more details on using hashes here.

gcp-pipeline-ref-transform 1.0.29

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

Generic Transformation

Flow Diagram

Pattern

Data Mapping

Components

Key files

Library-Driven Ease of Use

How to Replicate this JOIN Transformation (3-to-1)

Infrastructure & Configurations

Google Cloud Resources

dbt Configuration (dbt_project.yml)

Technology Stack & Documentation

Dependencies

Execution & Testing

1. Local Development Setup

2. Local dbt Execution

3. Data Quality Validation

4. Cloud Execution

SQL Example

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

dbt Configuration (`dbt_project.yml`)