Skip to main content

GCP Pipeline Reference: Mainframe Segment Transform — CDP → fixed-width GCS segment files

Project description

mainframe-segment-transform

Deployment type: Cloud Dataflow (Apache Beam) Layer: CDP → GCS outbound Pattern: Read cdp_generic.customer_risk_profile → write fixed-width segment files to GCS

Overview

This pipeline is the final stage of the data platform. It reads the fully-enriched customer_risk_profile CDP table and produces fixed-width, 200-char segment files in GCS that can be consumed by downstream mainframe systems.

cdp_generic.customer_risk_profile
              │
    (Dataflow / Apache Beam)
              │
gs://{bucket}/segments/{run_id}/ACTIVE_APPROVED/segment-*.txt
gs://{bucket}/segments/{run_id}/DECLINED/segment-*.txt
gs://{bucket}/segments/{run_id}/REFERRED/segment-*.txt
gs://{bucket}/segments/{run_id}/PENDING/segment-*.txt

Segment File Format

Each output file contains one record per line, exactly 200 characters wide:

Field Width Format
segment_type 4 ACTI / DECL / REFR / PEND
customer_id 20 left-justified
account_id 20 left-justified
current_balance 15 right-justified, 2 d.p.
risk_score 6 right-justified integer
decision_outcome 10 APPROVED / DECLINED / REFERRED
facility_status 12 left-justified
loan_amount 15 right-justified, 2 d.p.
interest_rate 8 right-justified, 4 d.p.
term_months 4 right-justified
cdp_segment 20 left-justified
extract_date 8 YYYYMMDD
filler 58 space-padded reserved

Full Pipeline Position

Mainframe files (GCS landing)
        │
  [data-pipeline-orchestrator]   ← Airflow DAG triggered by .ok file via Pub/Sub
        │
  [original-data-to-bigqueryload]  ← Dataflow: CSV → odp_generic.*
        │
  [bigquery-to-mapped-product]   ← dbt: ODP → fdp_generic.*
        │
  [fdp-to-consumable-product]    ← dbt: FDP JOIN → cdp_generic.customer_risk_profile
        │
  [mainframe-segment-transform]  ← Dataflow: CDP → GCS fixed-width segment files
        │
  GCS segments bucket (for mainframe)

Running Locally

# Setup venv
./scripts/setup_deployment_venv.sh mainframe-segment-transform
source deployments/mainframe-segment-transform/venv/bin/activate

python deployments/mainframe-segment-transform/src/cdp_example/main.py \
    --project joseph-antony-aruja \
    --cdp_dataset cdp_generic \
    --cdp_table customer_risk_profile \
    --output_bucket joseph-antony-aruja-generic-dev-segments \
    --run_id test_$(date +%Y%m%d_%H%M%S) \
    --runner DirectRunner

Dataflow Execution

python deployments/mainframe-segment-transform/src/cdp_example/main.py \
    --project joseph-antony-aruja \
    --cdp_dataset cdp_generic \
    --cdp_table customer_risk_profile \
    --output_bucket joseph-antony-aruja-generic-dev-segments \
    --run_id prod_$(date +%Y%m%d_%H%M%S) \
    --runner DataflowRunner \
    --region europe-west2 \
    --temp_location gs://joseph-antony-aruja-generic-dev-temp/dataflow-temp

GCS Output

gs://{PROJECT_ID}-generic-{ENV}-segments/
  segments/
    {run_id}/
      ACTIVE_APPROVED/
        segment-00-of-01.txt
      DECLINED/
        segment-00-of-01.txt
      REFERRED/
        segment-00-of-01.txt
      PENDING/
        segment-00-of-01.txt

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

gcp_pipeline_ref_segment_transform-1.0.10.tar.gz (5.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

File details

Details for the file gcp_pipeline_ref_segment_transform-1.0.10.tar.gz.

File metadata

File hashes

Hashes for gcp_pipeline_ref_segment_transform-1.0.10.tar.gz
Algorithm Hash digest
SHA256 51c210d5573ca49c8dbd046f34d1d1b17a3e4668065bfabc139cbf096c1ca1ea
MD5 a50df0399712962fba8a5bc490621688
BLAKE2b-256 7271265aa96101156d88b5a86b8bd86642ffc4a9f4dd4c62eaf9fd84689e078a

See more details on using hashes here.

File details

Details for the file gcp_pipeline_ref_segment_transform-1.0.10-py3-none-any.whl.

File metadata

File hashes

Hashes for gcp_pipeline_ref_segment_transform-1.0.10-py3-none-any.whl
Algorithm Hash digest
SHA256 25c980ed25b903d81a9b7e69ab71f0363e3300aa9410b790732e65afe9e9c73e
MD5 3bab519dbff7e00775e7e986ec96fec6
BLAKE2b-256 17889ccef3e86e8c3e07908d4162e93a51393ce7125b59d2ca5f3e3239d91013

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page