Skip to main content

GCP Pipeline Reference: Generic FDP Transformation — dbt models for ODP to FDP (JOIN + MAP patterns)

Project description

Generic Transformation

Unit 2 of Generic 3-Unit Deployment

FDP Transformation - dbt models for ODP → FDP transformation.


Flow Diagram

                         Generic TRANSFORMATION FLOW
                         ──────────────────────

  BigQuery ODP                    dbt                      BigQuery FDP
  ────────────                    ───                      ────────────

  odp_generic.customers ─────┐
                        │    ┌─────────────────┐
  odp_generic.accounts  ─────┼───►│   JOIN Logic    │────────► fdp_generic.event_transaction_excess
                        │    └─────────────────┘
                        │
  odp_generic.decision  ─────┼───────────────────────────────► fdp_generic.portfolio_account_excess
                        │
  odp_generic.applications ──┼───────────────────────────────► fdp_generic.portfolio_account_facility
                        │
                        └───────────────────────

Pattern

MULTI-TARGET:

  1. JOIN: 2 ODP sources (customers, accounts) → 1 FDP target (event_transaction_excess)
  2. MAP: 1 ODP source (decision) → 1 FDP target (portfolio_account_excess)
  3. MAP: 1 ODP source (applications) → 1 FDP target (portfolio_account_facility)
Step Description
1 Staging models clean and type-cast raw ODP data
2 add_audit_columns macro injects run_id and source_file
3 mask_pii macro applies environment-aware masking to sensitive fields
4 event_transaction_excess performs INNER JOIN between Customers and Accounts
5 portfolio_account_excess maps Decision ODP 1:1 to FDP

Data Mapping

Source Table Key Fields
odp_generic.customers customer_id, ssn, first_name, last_name, dob, status
odp_generic.accounts account_id, customer_id, account_type, balance, open_date
odp_generic.decision decision_id, customer_id, application_id, decision_code, score, decision_date
odp_generic.applications application_id, customer_id, loan_amount, interest_rate, term_months, application_date, status, event_type, account_type
Target Table Description
fdp_generic.event_transaction_excess Joined customer-account view
fdp_generic.portfolio_account_excess Decision-based portfolio view
fdp_generic.portfolio_account_facility Applications-based facility view

Components

Directory Purpose
dbt/models/staging/generic/ Staging models (clean raw data)
dbt/models/fdp/ FDP models (JOIN and MAP logic)

Key files

Layer File
Staging stg_customers.sql
Staging stg_accounts.sql
Staging stg_decision.sql
Staging stg_applications.sql
FDP event_transaction_excess.sql
FDP portfolio_account_excess.sql
FDP portfolio_account_facility.sql

Library-Driven Ease of Use

The Generic transformation unit uses the gcp-pipeline-transform library to ensure data privacy and lineage with zero local macro development:

  1. Zero-Bleed PII Masking: Uses {{ mask_pii(column, 'SSN') }}. The library automatically applies the correct mask (Full in Prod, Partial in Staging) based on the environment.
  2. Automated Lineage: Uses {{ add_audit_columns() }} to inject run_id and source_file variables, maintaining the E2E lineage established in the ingestion layer.
  3. Metadata Enrichment: Replaces hardcoded business logic with generic library macros that interpret rules from the EntitySchema.

How to Replicate this JOIN Transformation (3-to-1)

To create a new transformation unit that joins multiple entities, follow the Creating New Deployment Guide.

Key steps for this JOIN pattern:

  1. Register Library: Point your dbt_project.yml to the gcp-pipeline-transform macro paths.
  2. Staging Models: Create views for your ODP tables. Use add_audit_columns for consistency.
  3. FDP Models: Implement your LEFT JOIN logic. Apply mask_pii to all sensitive fields.
  4. Governance: Run validate_no_pii_in_export in your CI/CD to prevent leakage.

Infrastructure & Configurations

Google Cloud Resources

This deployment requires the following GCP infrastructure, provisioned via Terraform:

  • Data Warehouse: BigQuery datasets odp_generic (source) and fdp_generic (target).
  • Processing: dbt (running on Cloud Composer or as a standalone process) for executing transformations.

For detailed infrastructure definitions, see infrastructure/terraform/systems/generic/transformation/.

dbt Configuration (dbt_project.yml)

The transformation behavior is controlled by variables and configurations in dbt_project.yml:

Variable Description Default / Source
gcp_project_id Target GCP Project GCP_PROJECT_ID env var
source_dataset Source ODP dataset odp_generic
staging_dataset Intermediate staging dataset stg_generic
fdp_dataset Target FDP dataset fdp_generic
marts_dataset Marts dataset marts_generic
analytics_dataset Analytics dataset analytics_generic
extract_date Date of data extract null (optional filter)
generic_entities List of entities to process ['customers', 'accounts', 'decision', 'applications']
masking_level PII masking strategy (FULL, PARTIAL, NONE) AUTO

Technology Stack & Documentation


Dependencies

Library Purpose
dbt-bigquery dbt adapter for BigQuery
gcp-pipeline-transform Shared macros (audit columns)

Execution & Testing

1. Local Development Setup

Initialize the virtual environment:

./scripts/setup_deployment_venv.sh bigquery-to-mapped-product
source deployments/bigquery-to-mapped-product/venv/bin/activate

2. Local dbt Execution

Run dbt models locally against the development BigQuery dataset:

cd dbt
dbt run --profiles-dir . --target dev

3. Data Quality Validation

Run dbt tests to verify transformation logic and PII masking:

dbt test --profiles-dir . --target dev

4. Governance Verification

Use the library macro to ensure no unmasked PII exists in your models before deployment:

{{ validate_no_pii_in_export('fdp_generic.event_transaction_excess') }}

5. Cloud Execution

In production, this unit is triggered by the generic_odp_load_dag once ingestion is successful. The transformation is executed via a BashOperator running dbt run.


SQL Example

-- fdp_generic.event_transaction_excess
SELECT
    c.customer_id,
    c.first_name,
    c.last_name,
    a.account_id,
    a.current_balance,
    -- Audit columns
    c._run_id,
    CURRENT_TIMESTAMP() as _transformed_at
FROM {{ ref('stg_generic_customers') }} c
JOIN {{ ref('stg_generic_accounts') }} a 
    ON c.customer_id = a.customer_id

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

gcp_pipeline_ref_transform-1.0.13.tar.gz (13.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

gcp_pipeline_ref_transform-1.0.13-py3-none-any.whl (15.0 kB view details)

Uploaded Python 3

File details

Details for the file gcp_pipeline_ref_transform-1.0.13.tar.gz.

File metadata

File hashes

Hashes for gcp_pipeline_ref_transform-1.0.13.tar.gz
Algorithm Hash digest
SHA256 720e99f7cb80a44a8c7b5869cb6cf52261888120fffdecfdbb04b4447dff7504
MD5 2a77326374036e4268226ba0cf75bdf5
BLAKE2b-256 270cc6a7f0b1066b7e143019048757ec352b25041962f4edc1d8e1b0f3b8ae90

See more details on using hashes here.

File details

Details for the file gcp_pipeline_ref_transform-1.0.13-py3-none-any.whl.

File metadata

File hashes

Hashes for gcp_pipeline_ref_transform-1.0.13-py3-none-any.whl
Algorithm Hash digest
SHA256 2952e302634bc398cc66f67733462e3c2f74b2fce7deefeb1ed70cb80d998118
MD5 304f65f9978dea8f477b7e600a5dc6ce
BLAKE2b-256 137190990f503369b6053f4827d8ec05328601804f84a9a1c3ad4754ee69e12a

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page