GCP Pipeline Reference: Generic FDP Transformation — dbt models for ODP to FDP (JOIN + MAP patterns)
Project description
Generic Transformation
Unit 2 of Generic 3-Unit Deployment
FDP Transformation - dbt models for ODP → FDP transformation.
Flow Diagram
```
GENERIC TRANSFORMATION FLOW
───────────────────────────

BigQuery ODP                       dbt                 BigQuery FDP
────────────                       ───                 ────────────

odp_generic.customers ──────┐
                            │  ┌──────────────┐
odp_generic.accounts ───────┴─►│  JOIN Logic  │──────► fdp_generic.event_transaction_excess
                               └──────────────┘

odp_generic.decision ─────────────────────────────────► fdp_generic.portfolio_account_excess

odp_generic.applications ─────────────────────────────► fdp_generic.portfolio_account_facility
```
Pattern
MULTI-TARGET:
- JOIN: 2 ODP sources (customers, accounts) → 1 FDP target (event_transaction_excess)
- MAP: 1 ODP source (decision) → 1 FDP target (portfolio_account_excess)
- MAP: 1 ODP source (applications) → 1 FDP target (portfolio_account_facility)
| Step | Description |
|---|---|
| 1 | Staging models clean and type-cast raw ODP data |
| 2 | data_quality_check macro validates data integrity |
| 3 | incremental_strategy macro handles incremental load logic |
| 4 | event_transaction_excess performs INNER JOIN between Customers and Accounts |
| 5 | portfolio_account_excess maps Decision ODP 1:1 to FDP |
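As an illustration of step 1, a staging model such as `stg_generic_customers.sql` typically type-casts the raw ODP columns and carries the audit columns through unchanged. The sketch below is an assumption about the model's shape (exact casts and source definition may differ):

```sql
-- dbt/models/staging/generic/stg_generic_customers.sql (illustrative sketch)
SELECT
    CAST(customer_id AS STRING) AS customer_id,
    ssn,
    first_name,
    last_name,
    CAST(dob AS DATE)           AS dob,
    status,
    -- Audit columns preserved from ingestion for end-to-end lineage
    _run_id,
    _extract_date,
    _processed_at
FROM {{ source('odp_generic', 'customers') }}
```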
Data Mapping
| Source Table | Key Fields |
|---|---|
| `odp_generic.customers` | customer_id, ssn, first_name, last_name, dob, status |
| `odp_generic.accounts` | account_id, customer_id, account_type, balance, open_date |
| `odp_generic.decision` | decision_id, customer_id, application_id, decision_code, score, decision_date |
| `odp_generic.applications` | application_id, customer_id, loan_amount, interest_rate, term_months, application_date, status, event_type, account_type |
| Target Table | Description |
|---|---|
| `fdp_generic.event_transaction_excess` | Joined customer-account view |
| `fdp_generic.portfolio_account_excess` | Decision-based portfolio view |
| `fdp_generic.portfolio_account_facility` | Applications-based facility view |
Components
| Directory | Purpose |
|---|---|
| `dbt/models/staging/generic/` | Staging views (clean raw ODP data) |
| `dbt/models/fdp/` | FDP incremental models (JOIN and MAP logic) |
| `dbt/macros/` | Custom macros (schema routing, data quality, incremental strategy) |
| `dbt/models/marts/` | Marts models — placeholder (empty) |
| `dbt/models/analytics/` | Analytics models — placeholder (empty) |
Key files
| Layer | File | Purpose |
|---|---|---|
| Staging | `stg_generic_customers.sql` | Customer data view |
| Staging | `stg_generic_accounts.sql` | Account data view |
| Staging | `stg_generic_decision.sql` | Decision data view |
| Staging | `stg_generic_applications.sql` | Applications data view |
| Staging | `_generic_sources.yml` | Source definitions |
| FDP | `event_transaction_excess.sql` | JOIN: customers + accounts |
| FDP | `portfolio_account_excess.sql` | MAP: decision → portfolio |
| FDP | `portfolio_account_facility.sql` | MAP: applications → facility |
| FDP | `_fdp_generic_models.yml` | Model schemas and tests |
| Macro | `generate_schema_name.sql` | Routes models to Terraform-managed datasets |
| Macro | `data_quality_check.sql` | Data quality validation |
| Macro | `incremental_strategy.sql` | Incremental load logic |
Library-Driven Ease of Use
The Generic transformation unit uses the `gcp-pipeline-transform` library for lineage and audit:
- Schema Routing: a custom `generate_schema_name` macro routes staging models to `odp_generic` and FDP models to `fdp_generic`, matching the Terraform-managed datasets.
- Automated Lineage: audit columns (`_run_id`, `_extract_date`, `_transformed_at`) are preserved from ingestion through to FDP, maintaining end-to-end traceability.
- Incremental Processing: all FDP models use the `merge` strategy with `on_schema_change='append_new_columns'` for efficient incremental loads.
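A minimal sketch of how such a `generate_schema_name` override is commonly written in dbt is shown below; the variable names and routing conditions are assumptions, not the project's actual macro:

```sql
-- dbt/macros/generate_schema_name.sql (illustrative sketch)
{% macro generate_schema_name(custom_schema_name, node) -%}
    {%- if custom_schema_name is none -%}
        {{ target.schema }}
    {%- elif custom_schema_name == 'staging' -%}
        {# Route staging models to the Terraform-managed ODP dataset #}
        {{ var('source_dataset', 'odp_generic') }}
    {%- elif custom_schema_name == 'fdp' -%}
        {# Route FDP models to the Terraform-managed FDP dataset #}
        {{ var('fdp_dataset', 'fdp_generic') }}
    {%- else -%}
        {{ custom_schema_name | trim }}
    {%- endif -%}
{%- endmacro %}
```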
Note on PII Masking: the `mask_pii` macro from `gcp-pipeline-transform` is available but not currently applied in the FDP models. PII masking can be re-enabled per environment once the library's `get_masking_level` Jinja whitespace issue is resolved.
How to Replicate this JOIN Transformation (3-to-1)
To create a new transformation unit that joins multiple entities, follow the Creating New Deployment Guide.
Key steps for this JOIN pattern:
- Register Library: point your `dbt_project.yml` to the `gcp-pipeline-transform` macro paths.
- Schema Routing: create a `generate_schema_name` macro to map logical schemas to your Terraform-managed datasets.
- Staging Models: create views for your ODP tables. Include all audit columns (`_run_id`, `_extract_date`, `_processed_at`).
- FDP Models: implement your JOIN/MAP logic with `incremental` materialization and the `merge` strategy.
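The FDP model settings described in the steps above can be sketched as a dbt config header; the `unique_key` shown here is an assumed example, not the project's actual key:

```sql
-- Config header for an FDP model (illustrative sketch)
{{
    config(
        materialized='incremental',
        incremental_strategy='merge',
        unique_key='event_key',
        on_schema_change='append_new_columns'
    )
}}
```

With `merge`, rows whose `unique_key` already exists in the target are updated rather than duplicated on each incremental run.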
Infrastructure & Configurations
Google Cloud Resources
This deployment requires the following GCP infrastructure, provisioned via Terraform:
- Data Warehouse: BigQuery datasets `odp_generic` (source) and `fdp_generic` (target).
- Processing: dbt (running on Cloud Composer or as a standalone process) for executing transformations.
For detailed infrastructure definitions, see infrastructure/terraform/systems/generic/transformation/.
dbt Configuration (dbt_project.yml)
The transformation behavior is controlled by variables and configurations in dbt_project.yml:
| Variable | Description | Default / Source |
|---|---|---|
| `gcp_project_id` | Target GCP project | `GCP_PROJECT_ID` env var |
| `source_dataset` | Source ODP dataset | `odp_generic` |
| `staging_dataset` | Intermediate staging dataset | `stg_generic` |
| `fdp_dataset` | Target FDP dataset | `fdp_generic` |
| `masking_level` | PII masking strategy (FULL, PARTIAL, NONE) | `NONE` |
| `extract_date` | Date of data extract | `null` (optional filter) |
| `all_entities` | List of entities to process | `['customers', 'accounts', 'decision', 'applications']` |
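Assuming a conventional dbt layout, the variables above might be declared in `dbt_project.yml` roughly as follows (a sketch, not the actual file):

```yaml
# dbt_project.yml (excerpt, illustrative)
vars:
  gcp_project_id: "{{ env_var('GCP_PROJECT_ID') }}"
  source_dataset: odp_generic
  staging_dataset: stg_generic
  fdp_dataset: fdp_generic
  masking_level: NONE
  extract_date: null
  all_entities: ['customers', 'accounts', 'decision', 'applications']
```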
Technology Stack & Documentation
- Google BigQuery - Serverless data warehouse
- dbt (data build tool) - Transformation workflow
- dbt-bigquery Adapter - dbt to BigQuery connector
- Data Modeling in dbt - Best practices for models
Dependencies
| Library | Purpose |
|---|---|
| `dbt-bigquery` | dbt adapter for BigQuery |
| `gcp-pipeline-transform` | Shared macros (audit columns) |
Execution & Testing
1. Local Development Setup
Initialize the virtual environment:
```shell
./scripts/setup_deployment_venv.sh bigquery-to-mapped-product
source deployments/bigquery-to-mapped-product/venv/bin/activate
```
2. Local dbt Execution
Run dbt models locally against the development BigQuery dataset:
```shell
cd dbt
dbt run --profiles-dir . --target dev
```
3. Data Quality Validation
Run dbt tests to verify transformation logic and PII masking:
```shell
dbt test --profiles-dir . --target dev
```
4. Cloud Execution
In production, this unit is triggered by the `generic_transformation_dag` once ingestion succeeds. The transformation runs via Cloud Build executing `dbt run --target int` against BigQuery.
SQL Example
```sql
-- fdp_generic.event_transaction_excess (simplified)
SELECT
    {{ dbt_utils.generate_surrogate_key(['c.customer_id', 'a.account_id', 'c._extract_date']) }} AS event_key,
    c.customer_id,
    c.ssn AS ssn_masked,
    c.first_name,
    c.last_name,
    a.account_id,
    a.balance AS current_balance,
    -- Audit columns
    c._run_id,
    c._extract_date,
    CURRENT_TIMESTAMP() AS _transformed_at
FROM {{ ref('stg_generic_customers') }} c
INNER JOIN {{ ref('stg_generic_accounts') }} a
    ON c.customer_id = a.customer_id
```
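For comparison with the JOIN pattern above, a MAP-pattern model such as `portfolio_account_excess` could look like the following sketch, mirroring the `odp_generic.decision` key fields listed earlier (the surrogate-key construction is an assumption):

```sql
-- fdp_generic.portfolio_account_excess (simplified MAP sketch)
SELECT
    {{ dbt_utils.generate_surrogate_key(['decision_id', '_extract_date']) }} AS portfolio_key,
    decision_id,
    customer_id,
    application_id,
    decision_code,
    score,
    decision_date,
    -- Audit columns
    _run_id,
    _extract_date,
    CURRENT_TIMESTAMP() AS _transformed_at
FROM {{ ref('stg_generic_decision') }}
```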
File details
Details for the file gcp_pipeline_ref_transform-1.0.29.tar.gz.
- Size: 14.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.15

| Algorithm | Hash digest |
|---|---|
| SHA256 | `49761d0ba780f80b23e1201882428a6fcf84019b6cf73ad1a4b5980e20b40b3f` |
| MD5 | `07febff6f4c5d3fa261b78c9650f0476` |
| BLAKE2b-256 | `0e2dadacfd7b7b1039ce6225d05cf52c87dba79a27a1e399b7094b20d55d908d` |
File details
Details for the file gcp_pipeline_ref_transform-1.0.29-py3-none-any.whl.
- Size: 16.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.15

| Algorithm | Hash digest |
|---|---|
| SHA256 | `4d51691093b76013801ce4655909c4f83f26d8f5f7b4d98aeb08a5eb1394616d` |
| MD5 | `02af65509581ff9b9d15f674a53878cf` |
| BLAKE2b-256 | `54d1fbc829e7768b2872529299d15d53cffcb0a4e94fb5704d11b550af01cb2a` |