Skip to main content

GCP Pipeline Reference: Mainframe Segment Transform — CDP → fixed-width GCS segment files

Project description

mainframe-segment-transform

Deployment type: Cloud Dataflow (Apache Beam) Layer: CDP → GCS outbound Pattern: Read cdp_generic.customer_risk_profile → write fixed-width segment files to GCS

Overview

This pipeline is the final stage of the data platform. It reads the fully-enriched customer_risk_profile CDP table and produces fixed-width, 200-char segment files in GCS that can be consumed by downstream mainframe systems.

cdp_generic.customer_risk_profile
              │
    (Dataflow / Apache Beam)
              │
gs://{bucket}/segments/{run_id}/ACTIVE_APPROVED/segment-*.txt
gs://{bucket}/segments/{run_id}/DECLINED/segment-*.txt
gs://{bucket}/segments/{run_id}/REFERRED/segment-*.txt
gs://{bucket}/segments/{run_id}/PENDING/segment-*.txt

Segment File Format

Each output file contains one record per line, exactly 200 characters wide:

Field Width Format
segment_type 4 ACTI / DECL / REFR / PEND
customer_id 20 left-justified
account_id 20 left-justified
current_balance 15 right-justified, 2 d.p.
risk_score 6 right-justified integer
decision_outcome 10 APPROVED / DECLINED / REFERRED
facility_status 12 left-justified
loan_amount 15 right-justified, 2 d.p.
interest_rate 8 right-justified, 4 d.p.
term_months 4 right-justified
cdp_segment 20 left-justified
extract_date 8 YYYYMMDD
filler 58 space-padded reserved

Full Pipeline Position

Mainframe files (GCS landing)
        │
  [data-pipeline-orchestrator]   ← Airflow DAG triggered by .ok file via Pub/Sub
        │
  [original-data-to-bigqueryload]  ← Dataflow: CSV → odp_generic.*
        │
  [bigquery-to-mapped-product]   ← dbt: ODP → fdp_generic.*
        │
  [fdp-to-consumable-product]    ← dbt: FDP JOIN → cdp_generic.customer_risk_profile
        │
  [mainframe-segment-transform]  ← Dataflow: CDP → GCS fixed-width segment files
        │
  GCS segments bucket (for mainframe)

Running Locally

# Setup venv
./scripts/setup_deployment_venv.sh mainframe-segment-transform
source deployments/mainframe-segment-transform/venv/bin/activate

python deployments/mainframe-segment-transform/src/cdp_example/main.py \
    --project joseph-antony-aruja \
    --cdp_dataset cdp_generic \
    --cdp_table customer_risk_profile \
    --output_bucket joseph-antony-aruja-generic-dev-segments \
    --run_id test_$(date +%Y%m%d_%H%M%S) \
    --runner DirectRunner

Dataflow Execution

python deployments/mainframe-segment-transform/src/cdp_example/main.py \
    --project joseph-antony-aruja \
    --cdp_dataset cdp_generic \
    --cdp_table customer_risk_profile \
    --output_bucket joseph-antony-aruja-generic-dev-segments \
    --run_id prod_$(date +%Y%m%d_%H%M%S) \
    --runner DataflowRunner \
    --region europe-west2 \
    --temp_location gs://joseph-antony-aruja-generic-dev-temp/dataflow-temp

GCS Output

gs://{PROJECT_ID}-generic-{ENV}-segments/
  segments/
    {run_id}/
      ACTIVE_APPROVED/
        segment-00-of-01.txt
      DECLINED/
        segment-00-of-01.txt
      REFERRED/
        segment-00-of-01.txt
      PENDING/
        segment-00-of-01.txt

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

gcp_pipeline_ref_segment_transform-1.0.14.tar.gz (7.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

File details

Details for the file gcp_pipeline_ref_segment_transform-1.0.14.tar.gz.

File metadata

File hashes

Hashes for gcp_pipeline_ref_segment_transform-1.0.14.tar.gz
Algorithm Hash digest
SHA256 b045d97e179a22561db40b8614bacdc5c2ca5769089b0c1e4827b08867fc7e7f
MD5 6c173671d8a1f9db30795063bce3fce8
BLAKE2b-256 48af3b8bac41e9e7328770b9e554f82e9c6ffcd035fb4b5a4fe29fdff1e806d4

See more details on using hashes here.

File details

Details for the file gcp_pipeline_ref_segment_transform-1.0.14-py3-none-any.whl.

File metadata

File hashes

Hashes for gcp_pipeline_ref_segment_transform-1.0.14-py3-none-any.whl
Algorithm Hash digest
SHA256 d1e97f8bcfbbc15fcd6a264d0e37e51d65bb3505bc030f5b55107ffc8ade35c0
MD5 9cedffbbfe6c9bdf196bf43c99ce6081
BLAKE2b-256 591fad6f4b2f8153d546ca43879948be06412abed7ce7a76923de63707b90fcc

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page