
GCP Pipeline Reference: Mainframe Segment Transform — CDP → fixed-width GCS segment files


mainframe-segment-transform

Deployment type: Cloud Dataflow (Apache Beam)
Layer: CDP → GCS outbound
Pattern: Read cdp_generic.customer_risk_profile → write fixed-width segment files to GCS

Overview

This pipeline is the final stage of the data platform. It reads the fully enriched cdp_generic.customer_risk_profile table and writes fixed-width segment files (200 characters per record) to GCS, one directory per segment, for downstream mainframe systems to consume.

cdp_generic.customer_risk_profile
              │
    (Dataflow / Apache Beam)
              │
gs://{bucket}/segments/{run_id}/ACTIVE_APPROVED/segment-*.txt
gs://{bucket}/segments/{run_id}/DECLINED/segment-*.txt
gs://{bucket}/segments/{run_id}/REFERRED/segment-*.txt
gs://{bucket}/segments/{run_id}/PENDING/segment-*.txt
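The fan-out into four directories shown above could be driven by a small routing function. The following is a minimal sketch, not the package's actual code. It assumes each record is a dict carrying a `segment_type` key, and that the four-character codes pair with the directories they abbreviate (ACTI with ACTIVE_APPROVED, and so on); both points are inferred from the names in this README. `segment_dir` and `segment_index` are hypothetical helper names.

```python
# Directory names as they appear in the output layout.
SEGMENT_DIRS = ["ACTIVE_APPROVED", "DECLINED", "REFERRED", "PENDING"]

# Assumption: each 4-character code maps to the directory it abbreviates.
CODE_TO_DIR = {
    "ACTI": "ACTIVE_APPROVED",
    "DECL": "DECLINED",
    "REFR": "REFERRED",
    "PEND": "PENDING",
}


def segment_dir(record):
    """Return the output directory name for a record (hypothetical helper)."""
    code = record["segment_type"]
    if code not in CODE_TO_DIR:
        raise ValueError(f"unknown segment_type: {code!r}")
    return CODE_TO_DIR[code]


def segment_index(record, num_partitions):
    """Return the partition index for a record.

    The (element, num_partitions) signature is the shape a Beam Partition
    function takes, so this could be passed to beam.Partition if desired.
    """
    index = SEGMENT_DIRS.index(segment_dir(record))
    if index >= num_partitions:
        raise ValueError(f"partition {index} out of range for {num_partitions}")
    return index
```

Rejecting unknown codes with an error, rather than silently dropping the record, is a design choice of this sketch; the real pipeline may handle them differently.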

Segment File Format

Each output file contains one record per line, exactly 200 characters wide:

Field             Width  Format
segment_type          4  ACTI / DECL / REFR / PEND
customer_id          20  left-justified
account_id           20  left-justified
current_balance      15  right-justified, 2 d.p.
risk_score            6  right-justified integer
decision_outcome     10  APPROVED / DECLINED / REFERRED
facility_status      12  left-justified
loan_amount          15  right-justified, 2 d.p.
interest_rate         8  right-justified, 4 d.p.
term_months           4  right-justified
cdp_segment          20  left-justified
extract_date          8  YYYYMMDD
filler               58  space-padded reserved
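The layout above can be rendered with a small formatter. This is a minimal sketch under stated assumptions, not the package's implementation: the record is assumed to be a dict keyed by the field names in the table, `extract_date` is assumed to be a `datetime.date`, `decision_outcome` is assumed to be left-justified (the table lists only its values), missing values are written as spaces, and a value that overflows its width raises an error instead of being truncated. `LAYOUT`, `_render` and `format_record` are hypothetical names.

```python
from datetime import date
from decimal import Decimal, ROUND_HALF_UP

RECORD_WIDTH = 200

# (field name, width, kind), in the order given by the format table.
LAYOUT = [
    ("segment_type", 4, "text"),
    ("customer_id", 20, "text"),
    ("account_id", 20, "text"),
    ("current_balance", 15, "dec2"),
    ("risk_score", 6, "int"),
    ("decision_outcome", 10, "text"),  # assumption: left-justified
    ("facility_status", 12, "text"),
    ("loan_amount", 15, "dec2"),
    ("interest_rate", 8, "dec4"),
    ("term_months", 4, "int"),
    ("cdp_segment", 20, "text"),
    ("extract_date", 8, "date"),
]

QUANTA = {"dec2": Decimal("0.01"), "dec4": Decimal("0.0001")}


def _render(name, value, width, kind):
    """Render one field at exactly `width` characters."""
    if value is None:
        return " " * width  # assumption: missing values become spaces
    if kind == "text":
        out = str(value).ljust(width)
    elif kind == "int":
        out = str(int(value)).rjust(width)
    elif kind in QUANTA:
        number = Decimal(str(value)).quantize(QUANTA[kind], rounding=ROUND_HALF_UP)
        out = format(number, "f").rjust(width)
    else:  # kind == "date"
        out = value.strftime("%Y%m%d")
    if len(out) != width:
        raise ValueError(f"{name} does not fit in {width} characters: {value!r}")
    return out


def format_record(record):
    """Return one 200-character line; the trailing filler is spaces."""
    body = "".join(_render(n, record.get(n), w, k) for n, w, k in LAYOUT)
    return body.ljust(RECORD_WIDTH)
```

The declared widths sum to 142 characters, so the final `ljust` supplies the 58-character filler and a line-length check downstream can simply assert `len(line) == 200`.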

Full Pipeline Position

Mainframe files (GCS landing)
        │
  [data-pipeline-orchestrator]   ← Airflow DAG triggered by .ok file via Pub/Sub
        │
  [original-data-to-bigqueryload]  ← Dataflow: CSV → odp_generic.*
        │
  [bigquery-to-mapped-product]   ← dbt: ODP → fdp_generic.*
        │
  [fdp-to-consumable-product]    ← dbt: FDP JOIN → cdp_generic.customer_risk_profile
        │
  [mainframe-segment-transform]  ← Dataflow: CDP → GCS fixed-width segment files
        │
  GCS segments bucket (for mainframe)

Running Locally

# Set up and activate the virtual environment
./scripts/setup_deployment_venv.sh mainframe-segment-transform
source deployments/mainframe-segment-transform/venv/bin/activate

python deployments/mainframe-segment-transform/src/cdp_example/main.py \
    --project joseph-antony-aruja \
    --cdp_dataset cdp_generic \
    --cdp_table customer_risk_profile \
    --output_bucket joseph-antony-aruja-generic-dev-segments \
    --run_id test_$(date +%Y%m%d_%H%M%S) \
    --runner DirectRunner

Dataflow Execution

python deployments/mainframe-segment-transform/src/cdp_example/main.py \
    --project joseph-antony-aruja \
    --cdp_dataset cdp_generic \
    --cdp_table customer_risk_profile \
    --output_bucket joseph-antony-aruja-generic-dev-segments \
    --run_id prod_$(date +%Y%m%d_%H%M%S) \
    --runner DataflowRunner \
    --region europe-west2 \
    --temp_location gs://joseph-antony-aruja-generic-dev-temp/dataflow-temp
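Both commands pass the same pipeline-specific flags (`--project`, `--cdp_dataset`, `--cdp_table`, `--output_bucket`, `--run_id`) alongside runner flags such as `--runner`, `--region` and `--temp_location`. One way an entry point might separate the two groups is sketched below with `argparse`. This is an illustration only: `parse_args` and `source_table` are hypothetical names, marking every flag as required is an assumption, and the real `main.py` may parse its options differently.

```python
import argparse


def parse_args(argv=None):
    """Split pipeline-specific flags from flags left for the runner."""
    # allow_abbrev=False stops argparse from matching unknown runner flags
    # against abbreviated forms of the flags declared here.
    parser = argparse.ArgumentParser(allow_abbrev=False)
    parser.add_argument("--project", required=True)
    parser.add_argument("--cdp_dataset", required=True)
    parser.add_argument("--cdp_table", required=True)
    parser.add_argument("--output_bucket", required=True)
    parser.add_argument("--run_id", required=True)
    # Anything not declared above (for example --runner, --region,
    # --temp_location) is returned untouched in the second list.
    known, runner_args = parser.parse_known_args(argv)
    return known, runner_args


def source_table(args):
    """Build a project.dataset.table reference for the CDP source table."""
    return f"{args.project}.{args.cdp_dataset}.{args.cdp_table}"
```

With the local-run flags shown earlier, `runner_args` would come back as `["--runner", "DirectRunner"]`. Because `--project` is consumed here, a real entry point would likely need to forward it to the runner options as well.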

GCS Output

gs://{PROJECT_ID}-generic-{ENV}-segments/
  segments/
    {run_id}/
      ACTIVE_APPROVED/
        segment-00-of-01.txt
      DECLINED/
        segment-00-of-01.txt
      REFERRED/
        segment-00-of-01.txt
      PENDING/
        segment-00-of-01.txt
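The layout above follows a fixed naming scheme, which a helper along these lines could reproduce. It is a sketch rather than the package's code: `segment_file_prefix` is a hypothetical name, and it returns only the path up to `segment`, on the assumption that the writer appends the shard numbering and the `.txt` suffix.

```python
# Directory names as they appear in the output layout.
SEGMENT_DIRS = ("ACTIVE_APPROVED", "DECLINED", "REFERRED", "PENDING")


def segment_file_prefix(project_id, env, run_id, segment):
    """Return the GCS path prefix for one segment's output files."""
    if segment not in SEGMENT_DIRS:
        raise ValueError(f"unknown segment directory: {segment!r}")
    # Bucket naming follows the {PROJECT_ID}-generic-{ENV}-segments pattern.
    bucket = f"{project_id}-generic-{env}-segments"
    return f"gs://{bucket}/segments/{run_id}/{segment}/segment"
```

For example, `segment_file_prefix("joseph-antony-aruja", "dev", "test_20240131_120000", "DECLINED")` yields a prefix inside the same bucket used by the local-run command. Keeping `run_id` in the path means each execution writes to its own directory, so reruns do not overwrite earlier output.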
