GCP Pipeline Reference: Generic FDP Transformation — dbt models for ODP to FDP (JOIN + MAP patterns)

These details have not been verified by PyPI

Project links

Project description

Generic Transformation

Unit 2 of Generic 3-Unit Deployment

FDP Transformation - dbt models for ODP → FDP transformation.

Flow Diagram

                         Generic TRANSFORMATION FLOW
                         ──────────────────────

  BigQuery ODP                    dbt                      BigQuery FDP
  ────────────                    ───                      ────────────

  odp_generic.customers ─────┐
                        │    ┌─────────────────┐
  odp_generic.accounts  ─────┼───►│   JOIN Logic    │────────► fdp_generic.event_transaction_excess
                        │    └─────────────────┘
                        │
  odp_generic.decision  ─────┼───────────────────────────────► fdp_generic.portfolio_account_excess
                        │
                        └───────────────────────

Pattern

MULTI-TARGET:

JOIN: 2 ODP sources (customers, accounts) → 1 FDP target (event_transaction_excess)
MAP: 1 ODP source (decision) → 1 FDP target (portfolio_account_excess)

Step	Description
1	Staging models clean and type-cast raw ODP data
2	`add_audit_columns` macro injects `run_id` and `source_file`
3	`mask_pii` macro applies environment-aware masking to sensitive fields
4	`event_transaction_excess` performs `INNER JOIN` between Customers and Accounts
5	`portfolio_account_excess` maps Decision ODP 1:1 to FDP

Data Mapping

Source Table	Key Fields
`odp_generic.customers`	customer_id, ssn, first_name, last_name, dob, status
`odp_generic.accounts`	account_id, customer_id, account_type, balance, open_date
`odp_generic.decision`	decision_id, customer_id, application_id, decision_code, score, decision_date

Target Table	Description
`fdp_generic.event_transaction_excess`	Joined customer-account view
`fdp_generic.portfolio_account_excess`	Decision-based portfolio view

Components

Directory	Purpose
`dbt/models/staging/generic/`	Staging models (clean raw data)
`dbt/models/fdp/`	FDP models (JOIN and MAP logic)

Library-Driven Ease of Use

The Generic transformation unit uses the gcp-pipeline-transform library to ensure data privacy and lineage with zero local macro development:

Zero-Bleed PII Masking: Uses {{ mask_pii(column, 'SSN') }}. The library automatically applies the correct mask (Full in Prod, Partial in Staging) based on the environment.
Automated Lineage: Uses {{ add_audit_columns() }} to inject run_id and source_file variables, maintaining the E2E lineage established in the ingestion layer.
Metadata Enrichment: Replaces hardcoded business logic with generic library macros that interpret rules from the EntitySchema.

How to Replicate this JOIN Transformation (3-to-1)

To create a new transformation unit that joins multiple entities, follow the Creating New Deployment Guide.

Key steps for this JOIN pattern:

Register Library: Point your dbt_project.yml to the gcp-pipeline-transform macro paths.
Staging Models: Create views for your ODP tables. Use add_audit_columns for consistency.
FDP Models: Implement your LEFT JOIN logic. Apply mask_pii to all sensitive fields.
Governance: Run validate_no_pii_in_export in your CI/CD to prevent leakage.

Infrastructure & Configurations

Google Cloud Resources

This deployment requires the following GCP infrastructure, provisioned via Terraform:

Data Warehouse: BigQuery datasets odp_generic (source) and fdp_generic (target).
Processing: dbt (running on Cloud Composer or as a standalone process) for executing transformations.

For detailed infrastructure definitions, see infrastructure/terraform/systems/generic/transformation/.

dbt Configuration (`dbt_project.yml`)

The transformation behavior is controlled by variables and configurations in dbt_project.yml:

Variable	Description	Default / Source
`gcp_project_id`	Target GCP Project	`GCP_PROJECT_ID` env var
`source_dataset`	Source ODP dataset	`odp_generic`
`staging_dataset`	Intermediate staging dataset	`stg_generic`
`fdp_dataset`	Target FDP dataset	`fdp_generic`
`marts_dataset`	Marts dataset	`marts_generic`
`analytics_dataset`	Analytics dataset	`analytics_generic`
`extract_date`	Date of data extract	`null` (optional filter)
`generic_entities`	List of entities to process	`['customers', 'accounts', 'decision']`
`masking_level`	PII masking strategy (`FULL`, `PARTIAL`, `NONE`)	`AUTO`

Technology Stack & Documentation

Google BigQuery - Serverless data warehouse
dbt (data build tool) - Transformation workflow
dbt-bigquery Adapter - dbt to BigQuery connector
Data Modeling in dbt - Best practices for models

Dependencies

Library	Purpose
`dbt-bigquery`	dbt adapter for BigQuery
`gcp-pipeline-transform`	Shared macros (audit columns)

Execution & Testing

1. Local Development Setup

Initialize the virtual environment:

./scripts/setup_deployment_venv.sh bigquery-to-mapped-product
source deployments/bigquery-to-mapped-product/venv/bin/activate

2. Local dbt Execution

Run dbt models locally against the development BigQuery dataset:

cd dbt
dbt run --profiles-dir . --target dev

3. Data Quality Validation

Run dbt tests to verify transformation logic and PII masking:

dbt test --profiles-dir . --target dev

4. Governance Verification

Use the library macro to ensure no unmasked PII exists in your models before deployment:

{{ validate_no_pii_in_export('fdp_generic.event_transaction_excess') }}

5. Cloud Execution

In production, this unit is triggered by the generic_odp_load_dag once ingestion is successful. The transformation is executed via a BashOperator running dbt run.

SQL Example

-- fdp_generic.event_transaction_excess
SELECT
    c.customer_id,
    c.first_name,
    c.last_name,
    a.account_id,
    a.current_balance,
    -- Audit columns
    c._run_id,
    CURRENT_TIMESTAMP() as _transformed_at
FROM {{ ref('stg_generic_customers') }} c
JOIN {{ ref('stg_generic_accounts') }} a 
    ON c.customer_id = a.customer_id

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

1.0.29

Mar 24, 2026

1.0.14

Mar 22, 2026

1.0.13

Mar 17, 2026

1.0.11

Mar 15, 2026

1.0.10

Mar 12, 2026

1.0.9

Mar 12, 2026

1.0.8

Mar 11, 2026

This version

1.0.7

Mar 11, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

gcp_pipeline_ref_transform-1.0.7.tar.gz (12.8 kB view details)

Uploaded Mar 11, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

gcp_pipeline_ref_transform-1.0.7-py3-none-any.whl (14.8 kB view details)

Uploaded Mar 11, 2026 Python 3

File details

Details for the file gcp_pipeline_ref_transform-1.0.7.tar.gz.

File metadata

Download URL: gcp_pipeline_ref_transform-1.0.7.tar.gz
Upload date: Mar 11, 2026
Size: 12.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for gcp_pipeline_ref_transform-1.0.7.tar.gz
Algorithm	Hash digest
SHA256	`4fb84d1d6226e8bb9e7c007dcbfc330f8d09a404d20841a7383db56fe17f7990`
MD5	`6012e40aead01705546d85c66ce82b2a`
BLAKE2b-256	`07221d93bf9a73b0b0f2a774c8a0216fd6147a161c799ce6b354e9cd0b227c6e`

See more details on using hashes here.

File details

Details for the file gcp_pipeline_ref_transform-1.0.7-py3-none-any.whl.

File metadata

Download URL: gcp_pipeline_ref_transform-1.0.7-py3-none-any.whl
Upload date: Mar 11, 2026
Size: 14.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for gcp_pipeline_ref_transform-1.0.7-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c5d7bfdd1eaa2afc897afd713965173a8d0009d812a1793a076ef981282bf106`
MD5	`969836b5dae8d4abe474a529b4f7fd4d`
BLAKE2b-256	`6a2732b99ce77ec5a400c99ab94a621ab51137a5f8a263395c9f46c973c4fc2e`

See more details on using hashes here.

gcp-pipeline-ref-transform 1.0.7

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

Generic Transformation

Flow Diagram

Pattern

Data Mapping

Components

Library-Driven Ease of Use

How to Replicate this JOIN Transformation (3-to-1)

Infrastructure & Configurations

Google Cloud Resources

dbt Configuration (dbt_project.yml)

Technology Stack & Documentation

Dependencies

Execution & Testing

1. Local Development Setup

2. Local dbt Execution

3. Data Quality Validation

4. Governance Verification

5. Cloud Execution

SQL Example

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

dbt Configuration (`dbt_project.yml`)