GCP Pipeline Reference: Mainframe Segment Transform — CDP → fixed-width GCS segment files
mainframe-segment-transform
Deployment type: Cloud Dataflow (Apache Beam)
Layer: CDP → GCS outbound
Pattern: Read cdp_generic.customer_risk_profile → write fixed-width segment files to GCS
Overview
This pipeline is the final stage of the data platform. It reads the fully enriched
customer_risk_profile CDP table and writes fixed-width segment files, 200 characters
per record, to GCS for consumption by downstream mainframe systems.
cdp_generic.customer_risk_profile
│
(Dataflow / Apache Beam)
│
gs://{bucket}/segments/{run_id}/ACTIVE_APPROVED/segment-*.txt
gs://{bucket}/segments/{run_id}/DECLINED/segment-*.txt
gs://{bucket}/segments/{run_id}/REFERRED/segment-*.txt
gs://{bucket}/segments/{run_id}/PENDING/segment-*.txt
Segment File Format
Each output file contains one record per line, exactly 200 characters wide:
| Field | Width | Format |
|---|---|---|
| segment_type | 4 | ACTI / DECL / REFR / PEND |
| customer_id | 20 | left-justified |
| account_id | 20 | left-justified |
| current_balance | 15 | right-justified, 2 d.p. |
| risk_score | 6 | right-justified integer |
| decision_outcome | 10 | APPROVED / DECLINED / REFERRED |
| facility_status | 12 | left-justified |
| loan_amount | 15 | right-justified, 2 d.p. |
| interest_rate | 8 | right-justified, 4 d.p. |
| term_months | 4 | right-justified |
| cdp_segment | 20 | left-justified |
| extract_date | 8 | YYYYMMDD |
| filler | 58 | space-padded reserved |
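The layout above can be sketched as a small formatter. This is an illustrative sketch, not the package's actual code: the names `LAYOUT`, `render_field`, and `format_record` are hypothetical, the left-justification of `segment_type` and `decision_outcome` is an assumption (the table does not state it), and raising on overflow rather than truncating is a design choice made here. Null handling is not covered.

```python
from datetime import date
from decimal import Decimal

# (field name, width, kind) in output order; widths are taken from the table.
# "left" for segment_type and decision_outcome is an assumption.
LAYOUT = [
    ("segment_type", 4, "left"),
    ("customer_id", 20, "left"),
    ("account_id", 20, "left"),
    ("current_balance", 15, "dec2"),
    ("risk_score", 6, "int"),
    ("decision_outcome", 10, "left"),
    ("facility_status", 12, "left"),
    ("loan_amount", 15, "dec2"),
    ("interest_rate", 8, "dec4"),
    ("term_months", 4, "int"),
    ("cdp_segment", 20, "left"),
    ("extract_date", 8, "date"),
]
FILLER_WIDTH = 58
RECORD_WIDTH = 200


def render_field(name, value, width, kind):
    """Render one value to exactly `width` characters, or raise ValueError."""
    if kind == "left":
        text = str(value).ljust(width)
    elif kind == "int":
        text = str(int(value)).rjust(width)
    elif kind == "dec2":
        text = f"{Decimal(str(value)):.2f}".rjust(width)
    elif kind == "dec4":
        text = f"{Decimal(str(value)):.4f}".rjust(width)
    elif kind == "date":
        text = value.strftime("%Y%m%d")  # expects a datetime.date
    else:
        raise ValueError(f"unknown kind {kind!r} for field {name}")
    if len(text) != width:
        # Assumption: overflow is an error; a real pipeline might truncate.
        raise ValueError(f"{name} renders to {len(text)} chars, expected {width}")
    return text


def format_record(row):
    """Build one fixed-width record (no trailing newline) from a dict."""
    parts = [render_field(n, row[n], w, k) for n, w, k in LAYOUT]
    line = "".join(parts) + " " * FILLER_WIDTH
    assert len(line) == RECORD_WIDTH
    return line


example_row = {
    "segment_type": "ACTI",
    "customer_id": "CUST0001",
    "account_id": "ACC0001",
    "current_balance": "1234.5",
    "risk_score": 720,
    "decision_outcome": "APPROVED",
    "facility_status": "ACTIVE",
    "loan_amount": "25000",
    "interest_rate": "0.0525",
    "term_months": 36,
    "cdp_segment": "ACTIVE_APPROVED",
    "extract_date": date(2024, 1, 31),
}
example_line = format_record(example_row)
```

The field widths sum to 142, so the 58-character filler brings every record to exactly 200 characters.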
Full Pipeline Position
Mainframe files (GCS landing)
│
[data-pipeline-orchestrator] ← Airflow DAG triggered by .ok file via Pub/Sub
│
[original-data-to-bigqueryload] ← Dataflow: CSV → odp_generic.*
│
[bigquery-to-mapped-product] ← dbt: ODP → fdp_generic.*
│
[fdp-to-consumable-product] ← dbt: FDP JOIN → cdp_generic.customer_risk_profile
│
[mainframe-segment-transform] ← Dataflow: CDP → GCS fixed-width segment files
│
GCS segments bucket (for mainframe)
Running Locally
# Set up the virtual environment
./scripts/setup_deployment_venv.sh mainframe-segment-transform
source deployments/mainframe-segment-transform/venv/bin/activate
python deployments/mainframe-segment-transform/src/cdp_example/main.py \
--project joseph-antony-aruja \
--cdp_dataset cdp_generic \
--cdp_table customer_risk_profile \
--output_bucket joseph-antony-aruja-generic-dev-segments \
--run_id test_$(date +%Y%m%d_%H%M%S) \
--runner DirectRunner
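After a local run, one way to sanity-check the output is to confirm that every record is exactly 200 characters wide. The helper below is a minimal sketch; `find_bad_lines` is a hypothetical name and is not part of this package.

```python
RECORD_WIDTH = 200


def find_bad_lines(lines, width=RECORD_WIDTH):
    """Return (line_number, length) for every record whose width is wrong."""
    bad = []
    for number, line in enumerate(lines, start=1):
        # Strip only the line terminator so trailing space padding is kept.
        record = line.rstrip("\r\n")
        if len(record) != width:
            bad.append((number, len(record)))
    return bad
```

It accepts any iterable of lines, so a segment file copied down from the bucket can be passed in directly as an open file handle, for example `find_bad_lines(open(path, newline=""))`, where `path` is wherever the file was saved locally. An empty list means every record has the expected width.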
Dataflow Execution
python deployments/mainframe-segment-transform/src/cdp_example/main.py \
--project joseph-antony-aruja \
--cdp_dataset cdp_generic \
--cdp_table customer_risk_profile \
--output_bucket joseph-antony-aruja-generic-dev-segments \
--run_id prod_$(date +%Y%m%d_%H%M%S) \
--runner DataflowRunner \
--region europe-west2 \
--temp_location gs://joseph-antony-aruja-generic-dev-temp/dataflow-temp
GCS Output
gs://{PROJECT_ID}-generic-{ENV}-segments/
  segments/
    {run_id}/
      ACTIVE_APPROVED/
        segment-00-of-01.txt
      DECLINED/
        segment-00-of-01.txt
      REFERRED/
        segment-00-of-01.txt
      PENDING/
        segment-00-of-01.txt
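The naming scheme above can be expressed as a pair of small helpers. This is a sketch under stated assumptions: `segments_bucket` and `segment_prefix` are hypothetical names, and the returned value is a file-name prefix to which the runner is assumed to append the shard suffix and `.txt`.

```python
# Segment folder names as they appear in the output layout.
SEGMENT_FOLDERS = ("ACTIVE_APPROVED", "DECLINED", "REFERRED", "PENDING")


def segments_bucket(project_id, env):
    """Bucket name following the {PROJECT_ID}-generic-{ENV}-segments pattern."""
    return f"{project_id}-generic-{env}-segments"


def segment_prefix(bucket, run_id, segment):
    """GCS file-name prefix for one segment folder within a run."""
    if segment not in SEGMENT_FOLDERS:
        raise ValueError(f"unknown segment folder: {segment!r}")
    return f"gs://{bucket}/segments/{run_id}/{segment}/segment"


# Example with placeholder values (not a real project or run).
example_prefix = segment_prefix(
    segments_bucket("my-project", "dev"), "test_20240131_120000", "DECLINED"
)
```

Keeping `run_id` in the path means each run writes to its own directory, so earlier outputs are not overwritten.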