GCP Pipeline Reference: Mainframe Segment Transform — CDP → fixed-width GCS segment files

mainframe-segment-transform

Deployment type: Cloud Dataflow (Apache Beam)
Layer: CDP → GCS outbound
Pattern: Read cdp_generic.customer_risk_profile → write fixed-width segment files to GCS

Overview

This pipeline is the final stage of the data platform. It reads the fully enriched cdp_generic.customer_risk_profile table and writes fixed-width segment files to GCS, one 200-character record per line, for downstream mainframe systems to consume.

cdp_generic.customer_risk_profile
              │
    (Dataflow / Apache Beam)
              │
gs://{bucket}/segments/{run_id}/ACTIVE_APPROVED/segment-*.txt
gs://{bucket}/segments/{run_id}/DECLINED/segment-*.txt
gs://{bucket}/segments/{run_id}/REFERRED/segment-*.txt
gs://{bucket}/segments/{run_id}/PENDING/segment-*.txt
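
The fan-out into four directories can be sketched as a partition function. This is a minimal sketch, not the package's actual code. It assumes, hypothetically, that each row's cdp_segment column already holds one of the four directory names, and the pairing of directory names with the 4-character segment_type codes is inferred from the order in which this README lists them. The names SEGMENT_CODES and segment_index are illustrative. A function with this (element, num_partitions) signature could be handed to Beam's Partition transform, but the sketch uses only the standard library.

```python
# Directory names from the diagram above. The pairing with the
# 4-character segment_type codes is inferred from listing order (assumption).
SEGMENT_CODES = {
    "ACTIVE_APPROVED": "ACTI",
    "DECLINED": "DECL",
    "REFERRED": "REFR",
    "PENDING": "PEND",
}
SEGMENTS = list(SEGMENT_CODES)


def segment_index(row, num_partitions=len(SEGMENTS)):
    """Return the partition index for one CDP row (a dict)."""
    # Assumption: the cdp_segment column holds the directory name.
    name = str(row.get("cdp_segment", "")).strip().upper()
    if name not in SEGMENT_CODES:
        raise ValueError(f"unknown segment: {name!r}")
    index = SEGMENTS.index(name)
    if index >= num_partitions:
        raise ValueError("num_partitions is smaller than the segment count")
    return index


# Made-up rows, grouped by the directory they would land in.
rows = [
    {"customer_id": "C001", "cdp_segment": "DECLINED"},
    {"customer_id": "C002", "cdp_segment": "ACTIVE_APPROVED"},
    {"customer_id": "C003", "cdp_segment": "PENDING"},
]
buckets = {name: [] for name in SEGMENTS}
for row in rows:
    buckets[SEGMENTS[segment_index(row)]].append(row["customer_id"])
```

Raising on an unknown segment, rather than silently dropping the row, is a design choice of the sketch: a mainframe consumer usually cannot tell a missing record from one that never existed.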

Segment File Format

Each output file contains one record per line, exactly 200 characters wide:

Field             Width  Format
segment_type          4  ACTI / DECL / REFR / PEND
customer_id          20  left-justified
account_id           20  left-justified
current_balance      15  right-justified, 2 d.p.
risk_score            6  right-justified integer
decision_outcome     10  APPROVED / DECLINED / REFERRED
facility_status      12  left-justified
loan_amount          15  right-justified, 2 d.p.
interest_rate         8  right-justified, 4 d.p.
term_months           4  right-justified
cdp_segment          20  left-justified
extract_date          8  YYYYMMDD
filler               58  space-padded reserved
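
The layout above can be sketched as a small formatter. This is an illustration under stated assumptions, not the package's implementation: field names and widths come from the table, while the space padding of right-justified numbers, the rendering of a missing value as blanks, and the choice to raise rather than truncate on overflow are all assumptions. The names LAYOUT and format_record are hypothetical.

```python
from datetime import date
from decimal import Decimal

# (field, width, kind); names and widths are taken from the table above.
LAYOUT = [
    ("segment_type", 4, "left"),
    ("customer_id", 20, "left"),
    ("account_id", 20, "left"),
    ("current_balance", 15, "dec2"),
    ("risk_score", 6, "int"),
    ("decision_outcome", 10, "left"),
    ("facility_status", 12, "left"),
    ("loan_amount", 15, "dec2"),
    ("interest_rate", 8, "dec4"),
    ("term_months", 4, "int"),
    ("cdp_segment", 20, "left"),
    ("extract_date", 8, "date"),
    ("filler", 58, "filler"),
]
RECORD_WIDTH = 200
assert sum(width for _, width, _ in LAYOUT) == RECORD_WIDTH


def _render(name, value, width, kind):
    """Render one field to exactly `width` characters."""
    if kind == "filler":
        return " " * width
    if value is None:
        text = ""  # assumption: a missing value becomes blanks
    elif kind == "dec2":
        text = f"{Decimal(str(value)):.2f}"
    elif kind == "dec4":
        text = f"{Decimal(str(value)):.4f}"
    elif kind == "int":
        text = str(int(value))
    elif kind == "date":
        text = value.strftime("%Y%m%d")
    else:
        text = str(value)
    if len(text) > width:
        # Assumption: overflow is an error, not a silent truncation.
        raise ValueError(f"{name}: {text!r} exceeds {width} characters")
    return text.ljust(width) if kind == "left" else text.rjust(width)


def format_record(row):
    """Format one row (a dict) as a 200-character fixed-width line."""
    return "".join(
        _render(name, row.get(name), width, kind) for name, width, kind in LAYOUT
    )


# Made-up sample row for illustration.
sample = {
    "segment_type": "ACTI", "customer_id": "C001", "account_id": "A001",
    "current_balance": "1234.5", "risk_score": 712,
    "decision_outcome": "APPROVED", "facility_status": "ACTIVE",
    "loan_amount": 25000, "interest_rate": "4.25", "term_months": 60,
    "cdp_segment": "ACTIVE_APPROVED", "extract_date": date(2024, 1, 31),
}
line = format_record(sample)
```

Values pass through Decimal rather than float so that the 2 and 4 decimal-place fields do not pick up binary floating-point noise. The module-level assert guards against a layout edit that changes the total width.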

Full Pipeline Position

Mainframe files (GCS landing)
        │
  [data-pipeline-orchestrator]   ← Airflow DAG triggered by .ok file via Pub/Sub
        │
  [original-data-to-bigqueryload]  ← Dataflow: CSV → odp_generic.*
        │
  [bigquery-to-mapped-product]   ← dbt: ODP → fdp_generic.*
        │
  [fdp-to-consumable-product]    ← dbt: FDP JOIN → cdp_generic.customer_risk_profile
        │
  [mainframe-segment-transform]  ← Dataflow: CDP → GCS fixed-width segment files
        │
  GCS segments bucket (for mainframe)

Running Locally

# Setup venv
./scripts/setup_deployment_venv.sh mainframe-segment-transform
source deployments/mainframe-segment-transform/venv/bin/activate

python deployments/mainframe-segment-transform/src/cdp_example/main.py \
    --project joseph-antony-aruja \
    --cdp_dataset cdp_generic \
    --cdp_table customer_risk_profile \
    --output_bucket joseph-antony-aruja-generic-dev-segments \
    --run_id test_$(date +%Y%m%d_%H%M%S) \
    --runner DirectRunner

Dataflow Execution

python deployments/mainframe-segment-transform/src/cdp_example/main.py \
    --project joseph-antony-aruja \
    --cdp_dataset cdp_generic \
    --cdp_table customer_risk_profile \
    --output_bucket joseph-antony-aruja-generic-dev-segments \
    --run_id prod_$(date +%Y%m%d_%H%M%S) \
    --runner DataflowRunner \
    --region europe-west2 \
    --temp_location gs://joseph-antony-aruja-generic-dev-temp/dataflow-temp
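
Both commands mix job-specific flags with standard Beam pipeline options. One common way to separate them, shown here as a sketch rather than the contents of the real main.py, is argparse's parse_known_args: the job's own flags are parsed, and everything else (--project, --runner, --region, --temp_location) is left over for Beam's PipelineOptions. The function name parse_args and the decision to make all four job flags required are assumptions.

```python
import argparse


def parse_args(argv=None):
    """Split job-specific flags from the flags meant for the Beam runner."""
    parser = argparse.ArgumentParser(
        description="CDP to fixed-width segment files",
        allow_abbrev=False,  # avoid prefix matching between --run_id and --runner
    )
    parser.add_argument("--cdp_dataset", required=True)
    parser.add_argument("--cdp_table", required=True)
    parser.add_argument("--output_bucket", required=True)
    parser.add_argument("--run_id", required=True)
    # Unrecognised flags come back untouched as the second element.
    return parser.parse_known_args(argv)


job, runner_args = parse_args([
    "--project", "joseph-antony-aruja",
    "--cdp_dataset", "cdp_generic",
    "--cdp_table", "customer_risk_profile",
    "--output_bucket", "joseph-antony-aruja-generic-dev-segments",
    "--run_id", "test_20240131_120000",
    "--runner", "DirectRunner",
])
```

Here job carries the four job-specific values, and runner_args holds the remaining flags in their original order, ready to be passed on to the runner.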

GCS Output

gs://{PROJECT_ID}-generic-{ENV}-segments/
  segments/
    {run_id}/
      ACTIVE_APPROVED/
        segment-00-of-01.txt
      DECLINED/
        segment-00-of-01.txt
      REFERRED/
        segment-00-of-01.txt
      PENDING/
        segment-00-of-01.txt
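
The naming scheme above can be expressed as a few path helpers, which a downstream job might use to locate a run's files. This is a sketch: the helper names are hypothetical, and only the bucket pattern and directory layout come from this README. The shard part of each file name (for example 00-of-01) is chosen by the writer, so the helpers stop at the segment prefix and a glob.

```python
def segments_bucket(project_id, env):
    """Bucket name following the {PROJECT_ID}-generic-{ENV}-segments pattern."""
    return f"{project_id}-generic-{env}-segments"


def segment_prefix(bucket, run_id, segment):
    """Prefix under which one segment's shard files are written."""
    return f"gs://{bucket}/segments/{run_id}/{segment}/segment"


def segment_glob(bucket, run_id, segment):
    """Glob matching every shard file of one segment for one run."""
    return segment_prefix(bucket, run_id, segment) + "-*.txt"


bucket = segments_bucket("joseph-antony-aruja", "dev")
pattern = segment_glob(bucket, "test_20240131_120000", "DECLINED")
```

With the project and environment used in the commands earlier in this README, segments_bucket reproduces the value passed as --output_bucket.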
