GCP Pipeline Reference: Generic FDP Transformation — dbt models for ODP to FDP (JOIN + MAP patterns)
Project description
Generic Transformation
Unit 2 of Generic 3-Unit Deployment
FDP Transformation - dbt models for ODP → FDP transformation.
Flow Diagram
Generic TRANSFORMATION FLOW
──────────────────────
BigQuery ODP dbt BigQuery FDP
──────────── ─── ────────────
odp_generic.customers ─────┐
│ ┌─────────────────┐
odp_generic.accounts ─────┼───►│ JOIN Logic │────────► fdp_generic.event_transaction_excess
│ └─────────────────┘
│
odp_generic.decision ─────┼───────────────────────────────► fdp_generic.portfolio_account_excess
│
└───────────────────────
Pattern
MULTI-TARGET:
- JOIN: 2 ODP sources (customers, accounts) → 1 FDP target (
event_transaction_excess) - MAP: 1 ODP source (decision) → 1 FDP target (
portfolio_account_excess)
| Step | Description |
|---|---|
| 1 | Staging models clean and type-cast raw ODP data |
| 2 | add_audit_columns macro injects run_id and source_file |
| 3 | mask_pii macro applies environment-aware masking to sensitive fields |
| 4 | event_transaction_excess performs INNER JOIN between Customers and Accounts |
| 5 | portfolio_account_excess maps Decision ODP 1:1 to FDP |
Data Mapping
| Source Table | Key Fields |
|---|---|
odp_generic.customers |
customer_id, ssn, first_name, last_name, dob, status |
odp_generic.accounts |
account_id, customer_id, account_type, balance, open_date |
odp_generic.decision |
decision_id, customer_id, application_id, decision_code, score, decision_date |
| Target Table | Description |
|---|---|
fdp_generic.event_transaction_excess |
Joined customer-account view |
fdp_generic.portfolio_account_excess |
Decision-based portfolio view |
Components
| Directory | Purpose |
|---|---|
dbt/models/staging/generic/ |
Staging models (clean raw data) |
dbt/models/fdp/ |
FDP models (JOIN and MAP logic) |
Library-Driven Ease of Use
The Generic transformation unit uses the gcp-pipeline-transform library to ensure data privacy and lineage with zero local macro development:
- Zero-Bleed PII Masking: Uses
{{ mask_pii(column, 'SSN') }}. The library automatically applies the correct mask (Full in Prod, Partial in Staging) based on the environment. - Automated Lineage: Uses
{{ add_audit_columns() }}to injectrun_idandsource_filevariables, maintaining the E2E lineage established in the ingestion layer. - Metadata Enrichment: Replaces hardcoded business logic with generic library macros that interpret rules from the
EntitySchema.
How to Replicate this JOIN Transformation (3-to-1)
To create a new transformation unit that joins multiple entities, follow the Creating New Deployment Guide.
Key steps for this JOIN pattern:
- Register Library: Point your
dbt_project.ymlto thegcp-pipeline-transformmacro paths. - Staging Models: Create views for your ODP tables. Use
add_audit_columnsfor consistency. - FDP Models: Implement your
LEFT JOINlogic. Applymask_piito all sensitive fields. - Governance: Run
validate_no_pii_in_exportin your CI/CD to prevent leakage.
Infrastructure & Configurations
Google Cloud Resources
This deployment requires the following GCP infrastructure, provisioned via Terraform:
- Data Warehouse: BigQuery datasets
odp_generic(source) andfdp_generic(target). - Processing: dbt (running on Cloud Composer or as a standalone process) for executing transformations.
For detailed infrastructure definitions, see infrastructure/terraform/systems/generic/transformation/.
dbt Configuration (dbt_project.yml)
The transformation behavior is controlled by variables and configurations in dbt_project.yml:
| Variable | Description | Default / Source |
|---|---|---|
gcp_project_id |
Target GCP Project | GCP_PROJECT_ID env var |
source_dataset |
Source ODP dataset | odp_generic |
staging_dataset |
Intermediate staging dataset | stg_generic |
fdp_dataset |
Target FDP dataset | fdp_generic |
marts_dataset |
Marts dataset | marts_generic |
analytics_dataset |
Analytics dataset | analytics_generic |
extract_date |
Date of data extract | null (optional filter) |
generic_entities |
List of entities to process | ['customers', 'accounts', 'decision'] |
masking_level |
PII masking strategy (FULL, PARTIAL, NONE) |
AUTO |
Technology Stack & Documentation
- Google BigQuery - Serverless data warehouse
- dbt (data build tool) - Transformation workflow
- dbt-bigquery Adapter - dbt to BigQuery connector
- Data Modeling in dbt - Best practices for models
Dependencies
| Library | Purpose |
|---|---|
dbt-bigquery |
dbt adapter for BigQuery |
gcp-pipeline-transform |
Shared macros (audit columns) |
Execution & Testing
1. Local Development Setup
Initialize the virtual environment:
./scripts/setup_deployment_venv.sh bigquery-to-mapped-product
source deployments/bigquery-to-mapped-product/venv/bin/activate
2. Local dbt Execution
Run dbt models locally against the development BigQuery dataset:
cd dbt
dbt run --profiles-dir . --target dev
3. Data Quality Validation
Run dbt tests to verify transformation logic and PII masking:
dbt test --profiles-dir . --target dev
4. Governance Verification
Use the library macro to ensure no unmasked PII exists in your models before deployment:
{{ validate_no_pii_in_export('fdp_generic.event_transaction_excess') }}
5. Cloud Execution
In production, this unit is triggered by the generic_odp_load_dag once ingestion is successful. The transformation is executed via a BashOperator running dbt run.
SQL Example
-- fdp_generic.event_transaction_excess
SELECT
c.customer_id,
c.first_name,
c.last_name,
a.account_id,
a.current_balance,
-- Audit columns
c._run_id,
CURRENT_TIMESTAMP() as _transformed_at
FROM {{ ref('stg_generic_customers') }} c
JOIN {{ ref('stg_generic_accounts') }} a
ON c.customer_id = a.customer_id
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file gcp_pipeline_ref_transform-1.0.7.tar.gz.
File metadata
- Download URL: gcp_pipeline_ref_transform-1.0.7.tar.gz
- Upload date:
- Size: 12.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
4fb84d1d6226e8bb9e7c007dcbfc330f8d09a404d20841a7383db56fe17f7990
|
|
| MD5 |
6012e40aead01705546d85c66ce82b2a
|
|
| BLAKE2b-256 |
07221d93bf9a73b0b0f2a774c8a0216fd6147a161c799ce6b354e9cd0b227c6e
|
File details
Details for the file gcp_pipeline_ref_transform-1.0.7-py3-none-any.whl.
File metadata
- Download URL: gcp_pipeline_ref_transform-1.0.7-py3-none-any.whl
- Upload date:
- Size: 14.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
c5d7bfdd1eaa2afc897afd713965173a8d0009d812a1793a076ef981282bf106
|
|
| MD5 |
969836b5dae8d4abe474a529b4f7fd4d
|
|
| BLAKE2b-256 |
6a2732b99ce77ec5a400c99ab94a621ab51137a5f8a263395c9f46c973c4fc2e
|