Extract mainframe EBCDIC binary files using real COBOL copybooks. Zero MIPS.

Ztract

Status: Active development · Phase 1 in progress · Star the repo to follow along


Read any mainframe EBCDIC file on your laptop. Zero MIPS spent.


30 Seconds

pip install ztract

ztract convert \
  --copybook CUSTMAST.cpy \
  --input    CUST.MASTER.DAT \
  --recfm    FB  --lrecl 500 \
  --codepage cp277 \
  --output   customers.csv
⠿ extract-prod  ████████████████████  2,400,000 rec  ✓
  15,234 rec/s · elapsed 2m 37s · 0 rejects

Done. 2,400,000 records → customers.csv

2.4 million Norwegian customer records, correctly decoded (æ ø å Æ Ø Å), in under 3 minutes. On a laptop. Zero mainframe CPU.


What is Ztract?

Ztract is a Python CLI tool that extracts, transforms, and compares mainframe EBCDIC binary files using real COBOL copybooks — no Spark, no cluster, no proprietary tooling.

All the hard parsing (COMP-3 packed decimal, REDEFINES, OCCURS DEPENDING ON, RDW/BDW headers) is handled by Cobrix — a battle-tested, open-source COBOL parser — running as a subprocess. Python handles connectivity, output, orchestration, and observability.

The result: pull files from your mainframe via FTP, SFTP, or Zowe, decode them with your existing .cpy copybooks, and write to CSV, Parquet, a database, or back to the mainframe. In one command.


Features

  • Real COBOL copybooks — use your .cpy files as-is, no conversion step, no JSON schema
  • All IBM record formats — F, FB, V, VB, FBA, VBA (including BDW/RDW and ASA carriage control)
  • Norwegian & Scandinavian first — cp277 primary, full æ ø å Æ Ø Å support out of the box
  • Bidirectional — read from mainframe and write back; mainframe-to-mainframe flows via Ztract
  • Streaming — never loads a full file into memory; millions of records handled on any machine
  • Field-level EBCDIC diff — compare two EBCDIC files field-by-field using your copybook as schema
  • Mock data generator — generate realistic synthetic EBCDIC test data from any copybook
  • YAML pipelines — define multi-step extract/transform/load workflows in a single file
  • Copybook inspector — visualise any .cpy file as a formatted field table in seconds
  • Enterprise observability — structured JSON logs, immutable audit trail, reject files with full context

Why not other tools?

Ztract Python EBCDIC libs Cobrix (Spark) Proprietary tools
Real COBOL copybooks ❌ custom schema
REDEFINES / OCCURS ✅ Cobrix ⚠️ partial
cp277 Norwegian ⚠️ varies
No Spark required
pip install
EBCDIC diff
Mock generator
FTP/SFTP/Zowe built-in varies
Write back to mainframe varies
Open source
Cost Free Free Free $$$$

Installation

pip install ztract

Requirements:

  • Python 3.10+
  • Java JRE 11+ on PATH (java -version to check — download from Adoptium if needed)

That's it. Everything else — Cobrix engine, diff tools, progress bars — is bundled.

Optional database drivers:

pip install "ztract[postgres]"   # PostgreSQL (psycopg2)
pip install "ztract[mysql]"      # MySQL (PyMySQL)
pip install "ztract[mssql]"      # SQL Server (pyodbc)
pip install "ztract[all-db]"     # All three (quotes keep zsh from globbing the brackets)

Quick Start

Inspect a copybook

ztract inspect --copybook CUSTMAST.cpy
┌─────────────────┬───────┬───────────────┬────────┬────────────┐
│ Field           │ Level │ PIC           │ Offset │ Size       │
├─────────────────┼───────┼───────────────┼────────┼────────────┤
│ CUST-ID         │ 05    │ 9(10)         │ 0      │ 10         │
│ CUST-NAME       │ 05    │ X(50)         │ 10     │ 50         │
│ CUST-ADDR       │ 05    │ X(80)         │ 60     │ 80         │
│ CUST-CITY       │ 05    │ X(30)         │ 140    │ 30         │
│ CUST-AMT        │ 05    │ S9(9)V99      │ 170    │ 6 (COMP-3) │
│ CUST-DATE       │ 05    │ 9(8)          │ 176    │ 8          │
└─────────────────┴───────┴───────────────┴────────┴────────────┘
Total record length: 500 bytes
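
The 6-byte size listed for CUST-AMT follows from the COMP-3 (packed decimal) layout: two digits per byte, with the sign in the final nibble. S9(9)V99 has 11 digits, so 12 nibbles, so 6 bytes. The Python sketch below illustrates that arithmetic and a basic decode. The helper names comp3_size and unpack_comp3 are illustrative assumptions; in Ztract the real decoding is done by Cobrix, not by code like this.

```python
from decimal import Decimal


def comp3_size(digits: int) -> int:
    """Bytes needed for `digits` packed digits plus the trailing sign nibble."""
    return (digits + 2) // 2  # same as ceil((digits + 1) / 2)


def unpack_comp3(raw: bytes, scale: int = 0) -> Decimal:
    """Decode packed decimal bytes; `scale` is the implied decimal count (V99 -> 2)."""
    nibbles = []
    for byte in raw:
        nibbles.append(byte >> 4)    # high nibble
        nibbles.append(byte & 0x0F)  # low nibble
    *digits, sign = nibbles          # the last nibble carries the sign
    if any(d > 9 for d in digits) or sign < 0x0A:
        raise ValueError(f"invalid packed decimal: {raw.hex()}")
    value = int("".join(map(str, digits)))
    if sign in (0x0B, 0x0D):         # B and D are negative; A, C, E, F are positive
        value = -value
    return Decimal(value).scaleb(-scale)


# S9(9)V99: 11 digits -> 6 bytes, matching the inspect table.
size = comp3_size(11)
amount = unpack_comp3(bytes.fromhex("00001234567c"), scale=2)
print(size, amount)  # 6 12345.67
```

A sign nibble below 0xA is rejected here, which is the kind of condition the "invalid sign nibble" warning in validation output refers to.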

Validate before extracting

ztract validate \
  --copybook CUSTMAST.cpy \
  --input    CUST.MASTER.DAT \
  --recfm    FB  --lrecl 500 \
  --codepage cp277 \
  --sample   1000
Validation complete (1,000 sample records)
  ✓ Decoded:   998
  ⚠ Warnings:    2  (invalid sign nibble — see rejects)
  ✗ Errors:      0
  CUST-AMT  min: 0.00   max: 9,999,999.99   null: 0.1%
  CUST-NAME sample: Bjørn Hansen, Åse Eriksen, Ole Nordmann

Convert to CSV

ztract convert \
  --copybook CUSTMAST.cpy \
  --input    CUST.MASTER.DAT \
  --recfm    FB  --lrecl 500 \
  --codepage cp277 \
  --output   customers.csv

Convert via FTP (pull direct from z/OS)

ztract convert \
  --copybook CUSTMAST.cpy \
  --input    ftp://mf01.bank.com/BEL.CUST.MASTER \
  --recfm    FB  --lrecl 500 \
  --codepage cp277 \
  --output   customers.parquet

Multiple outputs in one pass

ztract convert \
  --copybook CUSTMAST.cpy \
  --input    CUST.MASTER.DAT \
  --recfm    FB  --lrecl 500 \
  --codepage cp277 \
  --output   customers.csv \
  --output   customers.parquet \
  --output   postgresql://user:pass@localhost/dwh?table=customer_master

All three targets are written concurrently from a single read pass.
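
The single-read-pass idea can be sketched in Python as a fan-out loop: each decoded record is read once and handed to every output. This is a simplified, sequential illustration under assumed names (fan_out, to_jsonl); it does not show how Ztract actually schedules its writers.

```python
import csv
import io
import json
from typing import Callable, Iterable


def fan_out(records: Iterable[dict], sinks: list[Callable[[dict], object]]) -> int:
    """Consume the source once and pass every record to each sink in turn."""
    count = 0
    for record in records:
        for sink in sinks:
            sink(record)
        count += 1
    return count


# Two in-memory targets stand in for customers.csv and a JSON Lines file.
csv_buf, jsonl_buf = io.StringIO(), io.StringIO()
writer = csv.DictWriter(csv_buf, fieldnames=["CUST-ID", "CUST-NAME"])
writer.writeheader()


def to_jsonl(record: dict) -> None:
    jsonl_buf.write(json.dumps(record, ensure_ascii=False) + "\n")


# A one-shot iterator: it can only be read once, like a streamed file.
source = iter([
    {"CUST-ID": "0000000123", "CUST-NAME": "Åse Eriksen"},
    {"CUST-ID": "0000000456", "CUST-NAME": "Bjørn Hansen"},
])
written = fan_out(source, [writer.writerow, to_jsonl])
```

Because the source is an iterator, memory use stays flat regardless of how many targets are attached.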

Diff two EBCDIC files field-by-field

ztract diff \
  --copybook CUSTMAST.cpy \
  --before   CUST_JAN.DAT \
  --after    CUST_FEB.DAT \
  --key      CUST-ID \
  --codepage cp277 \
  --recfm    FB  --lrecl 500
ADDED    [CUST-ID=000456]  CUST-NAME=Bjørn Hansen
DELETED  [CUST-ID=000123]  CUST-NAME=Ole Nordmann
CHANGED  [CUST-ID=000789]
  CUST-ADDR:  "Oslo Gate 1" → "Bergen Gate 5"
  CUST-AMT:   12,345.67 → 12,500.00

Diff complete: 1 added · 1 deleted · 47 changed · 999,951 unchanged
of 1,000,000 total records · 43 seconds
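
The ADDED / DELETED / CHANGED classification above can be illustrated with a small keyed comparison over already-decoded records. This Python sketch is an assumption about the general technique, not Ztract's implementation: diff_records is a hypothetical name, and it holds both sides in memory for brevity.

```python
def diff_records(before: list[dict], after: list[dict], key: str):
    """Classify records by key and list field-level changes for shared keys."""
    old = {record[key]: record for record in before}
    new = {record[key]: record for record in after}
    added = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())
    changed = {}
    for k in sorted(old.keys() & new.keys()):
        fields = {
            name: (value, new[k].get(name))
            for name, value in old[k].items()
            if value != new[k].get(name)
        }
        if fields:
            changed[k] = fields
    return added, deleted, changed


january = [
    {"CUST-ID": "000123", "CUST-NAME": "Ole Nordmann", "CUST-ADDR": "Oslo Gate 2"},
    {"CUST-ID": "000789", "CUST-NAME": "Åse Eriksen", "CUST-ADDR": "Oslo Gate 1"},
]
february = [
    {"CUST-ID": "000456", "CUST-NAME": "Bjørn Hansen", "CUST-ADDR": "Oslo Gate 3"},
    {"CUST-ID": "000789", "CUST-NAME": "Åse Eriksen", "CUST-ADDR": "Bergen Gate 5"},
]
added, deleted, changed = diff_records(january, february, key="CUST-ID")
```

Only fields whose values differ are reported for a changed record, so unchanged fields such as CUST-NAME do not appear in the result.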

Generate synthetic EBCDIC test data

ztract generate \
  --copybook CUSTMAST.cpy \
  --records  100000 \
  --codepage cp277 \
  --recfm    FB  --lrecl 500 \
  --seed     42 \
  --output   CUST_MOCK.DAT

Generate with boundary value edge cases

ztract generate \
  --copybook COMPLEX_NUMERIC.cpy \
  --records  1000 \
  --edge-cases \
  --seed     42 \
  --recfm    FB  --lrecl 300 \
  --output   NUMERIC_TEST.DAT

With --edge-cases, every 100th record is a boundary-value record, cycling through all zeros, all maximum values, and all negatives. This catches encoding bugs that ordinary random data misses.

Norwegian field names (NAVN, ADRESSE, TELEFON, BY) are detected automatically, so the generator produces realistic Scandinavian test data with valid packed decimals, correct EBCDIC encoding, and reproducible output.
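
One way to picture the --edge-cases behaviour is a seeded generator that substitutes a boundary value at every 100th position. The Python sketch below works on a single numeric field; generate_amounts and its parameters are hypothetical, and the choice of the most negative value for the "all negatives" case is an assumption.

```python
import random
from itertools import cycle


def generate_amounts(count: int, digits: int = 11, edge_cases: bool = False,
                     seed: int = 42, every: int = 100) -> list[int]:
    """Return unscaled integer values for one numeric field of `digits` digits."""
    rng = random.Random(seed)          # same seed -> same sequence
    max_value = 10 ** digits - 1
    boundaries = cycle([0, max_value, -max_value])
    values = []
    for position in range(1, count + 1):
        if edge_cases and position % every == 0:
            values.append(next(boundaries))   # boundary record
        else:
            values.append(rng.randint(-max_value, max_value))
    return values


values = generate_amounts(400, edge_cases=True)
boundary_records = [values[99], values[199], values[299], values[399]]
```

With a fixed seed, two runs produce identical data, which is what makes a failing test case repeatable.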


CLI Commands

Command          Description
ztract convert   Extract EBCDIC → CSV / JSON Lines / Parquet / DB
ztract diff      Field-level comparison of two EBCDIC files
ztract generate  Generate synthetic EBCDIC test data from a copybook
ztract run       Execute a multi-step YAML pipeline
ztract inspect   Display copybook layout as a formatted field table
ztract validate  Pre-flight check: decode N sample records, report stats
ztract status    Show recent job history from audit log
ztract init      Scaffold a new Ztract project directory

YAML Pipelines

Define multi-step workflows in a single file:

# monthly-reconciliation.yaml
version: "1.0"
job:
  name: customer-monthly-reconciliation

connections:
  prod: &prod
    type: ftp
    host: mf01.bank.com
    user: ${PROD_USER}
    password: ${PROD_PASS}
    transfer_mode: binary

steps:
  - name: extract-prod
    action: convert
    input:
      connection: *prod
      dataset: BEL.CUST.MASTER
      record_format: FB
      lrecl: 500
      codepage: cp277
    copybook: ./copybooks/CUSTMAST.cpy
    output:
      - type: csv
        path: ./output/prod_customers.csv
    expose_as: prod_data

  - name: diff-vs-last-month
    action: diff
    input:
      before: ./archive/CUST_LAST.DAT
      after:  $ref:prod_data.csv
    copybook: ./copybooks/CUSTMAST.cpy
    diff:
      key_fields: [CUST-ID]
    output:
      - type: console
      - type: csv
        path: ./output/monthly_changes.csv

  - name: push-report-to-mainframe
    action: upload
    input:
      path: ./output/monthly_changes.csv
    output:
      connection: *prod
      dataset: BEL.CUST.CHANGERPT
      site_commands:
        recfm: FB
        lrecl: 500
        blksize: 27500   # for FB, must be a whole multiple of lrecl (55 x 500)
        space_unit: CYLINDERS
        primary: 5
        secondary: 2

Run the full pipeline, do a dry run, or run a single step:

ztract run monthly-reconciliation.yaml
ztract run monthly-reconciliation.yaml --dry-run
ztract run monthly-reconciliation.yaml --step extract-prod

Record Formats

Format  Description
F       Fixed length: record size from copybook
FB      Fixed Blocked: records in fixed-size blocks
V       Variable length with 4-byte RDW headers
VB      Variable Blocked: BDW + RDW headers
FBA     Fixed Blocked + ASA carriage control (first byte stripped)
VBA     Variable Blocked + ASA carriage control (first byte stripped)
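
To make the difference between fixed and variable formats concrete, here is a minimal Python sketch of splitting a byte stream into records. It assumes the general z/OS layout, in which each variable-length record begins with a 4-byte record descriptor word (RDW): a 2-byte big-endian length that counts the RDW itself, then 2 bytes that are zero for unspanned records. The function names are illustrative, block descriptor words are not handled, and this is not how Ztract or Cobrix is implemented.

```python
import io
import struct
from typing import BinaryIO, Iterator


def read_fixed(stream: BinaryIO, lrecl: int) -> Iterator[bytes]:
    """Yield fixed-length records of exactly `lrecl` bytes."""
    while chunk := stream.read(lrecl):
        if len(chunk) != lrecl:
            raise ValueError(f"truncated record: {len(chunk)} of {lrecl} bytes")
        yield chunk


def read_rdw(stream: BinaryIO) -> Iterator[bytes]:
    """Yield record payloads from a stream of RDW-prefixed records."""
    while header := stream.read(4):
        if len(header) != 4:
            raise ValueError("truncated RDW")
        length, _reserved = struct.unpack(">HH", header)
        if length < 4:
            raise ValueError(f"invalid RDW length {length}")
        payload = stream.read(length - 4)   # length includes the 4-byte RDW
        if len(payload) != length - 4:
            raise ValueError("truncated record payload")
        yield payload


fixed_records = list(read_fixed(io.BytesIO(b"AAAABBBB"), lrecl=4))
variable_data = struct.pack(">HH", 7, 0) + b"abc" + struct.pack(">HH", 9, 0) + b"hello"
variable_records = list(read_rdw(io.BytesIO(variable_data)))
```

Both readers are generators, so only one record is held in memory at a time.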

EBCDIC Code Pages

Code page  Aliases                            Region
cp277      norway, norwegian, danish, nordic  Denmark / Norway (primary)
cp037      us, usa, canada, default           USA / Canada (default)
cp273      germany, german, austria           Germany / Austria
cp875      greek, greece                      Greece
cp870      eastern_europe, poland, czech      Eastern Europe
cp1047     latin1, open_systems               Latin-1 / USS
cp838      thailand, thai                     Thailand
cp1025     cyrillic, russian                  Russia / CIS

Use the alias anywhere a codepage is expected: --codepage norway or --codepage cp277.
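
Alias handling of this kind amounts to a lookup table. The Python sketch below builds one from the aliases listed above; resolve_codepage and the case-insensitive matching are assumptions for illustration, not a description of Ztract's internals.

```python
# Aliases copied from the code page table; the mapping structure is illustrative.
CODEPAGE_ALIASES = {
    "cp277": ("norway", "norwegian", "danish", "nordic"),
    "cp037": ("us", "usa", "canada", "default"),
    "cp273": ("germany", "german", "austria"),
    "cp875": ("greek", "greece"),
    "cp870": ("eastern_europe", "poland", "czech"),
    "cp1047": ("latin1", "open_systems"),
    "cp838": ("thailand", "thai"),
    "cp1025": ("cyrillic", "russian"),
}

# Invert the table so each alias points at its code page.
_ALIAS_LOOKUP = {
    alias: codepage
    for codepage, aliases in CODEPAGE_ALIASES.items()
    for alias in aliases
}


def resolve_codepage(name: str) -> str:
    """Return the canonical code page for a code page name or alias."""
    key = name.strip().lower()
    if key in CODEPAGE_ALIASES:
        return key
    if key in _ALIAS_LOOKUP:
        return _ALIAS_LOOKUP[key]
    raise ValueError(f"unknown code page or alias: {name!r}")


resolved = [resolve_codepage(n) for n in ("norway", "cp277", "default", "latin1")]
```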


Output Targets

Target            Description
.csv              Comma or pipe delimited, UTF-8, Excel-compatible BOM option
.jsonl            JSON Lines, one object per record, ensure_ascii=False
.parquet          Apache Parquet via pyarrow, schema auto-derived from copybook
postgresql://...  PostgreSQL via psycopg2 (optional: pip install "ztract[postgres]")
mysql://...       MySQL via PyMySQL (optional: pip install "ztract[mysql]")
mssql://...       SQL Server via pyodbc (optional: pip install "ztract[mssql]")
ftp://...         Write back to z/OS via FTP with SITE commands for dataset allocation
sftp://...        Write back to z/OS via SFTP

Connectivity

Type           Example
Local file     --input ./CUST.DAT
FTP            --input ftp://user:pass@mf01.bank.com/BEL.CUST.DATA
SFTP           --input sftp://user@mf01.bank.com/BEL.CUST.DATA
Zowe (z/OSMF)  --zowe-profile MYPROD --dataset BEL.CUST.DATA
Zowe (zftp)    --zowe-profile MYPROD --zowe-backend zftp --dataset BEL.CUST.DATA

Zowe transfer modes: binary (default), text, encoding, record (zftp only, preserves VB RDW headers).

SFTP z/OS paths: MVS dataset names are auto-formatted (BEL.CUST.DATA -> //'BEL.CUST.DATA'). USS paths pass through unchanged.
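
The path rule can be sketched in Python as follows. The detection heuristic (anything starting with "/" is treated as a USS path), the upper-casing, and the name format_sftp_path are assumptions for illustration. The validation uses the standard MVS naming limits: qualifiers of 1 to 8 characters and at most 44 characters in total.

```python
import re

# One qualifier: a letter or national character, then up to 7 more characters.
_QUALIFIER = r"[A-Z#@$][A-Z0-9#@$-]{0,7}"
_DATASET = re.compile(_QUALIFIER + r"(?:\." + _QUALIFIER + r")*")


def format_sftp_path(path: str) -> str:
    """Quote MVS dataset names for SFTP; leave USS paths untouched."""
    if path.startswith("/"):
        return path                      # USS path (or already formatted)
    name = path.upper()
    if len(name) > 44 or not _DATASET.fullmatch(name):
        raise ValueError(f"not a valid MVS dataset name: {path!r}")
    return f"//'{name}'"


mvs_path = format_sftp_path("BEL.CUST.DATA")
uss_path = format_sftp_path("/u/prod/cust.dat")
```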

Credentials in YAML support ${ENV_VAR} interpolation, so passwords never need to be hardcoded in a pipeline file.
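
As an illustration of what ${ENV_VAR} interpolation involves, the Python sketch below substitutes placeholders from an environment mapping and fails loudly when a variable is missing. The function name interpolate and the fail-on-missing behaviour are assumptions; Ztract's own handling may differ.

```python
import os
import re
from typing import Mapping

_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def interpolate(value: str, env: Mapping[str, str] = os.environ) -> str:
    """Replace each ${NAME} in `value` with env[NAME]."""
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _PLACEHOLDER.sub(substitute, value)


# A plain dict stands in for the real environment in this example.
fake_env = {"PROD_USER": "svc_ztract", "PROD_PASS": "s3cret"}
user = interpolate("${PROD_USER}", fake_env)
url = interpolate("ftp://${PROD_USER}:${PROD_PASS}@mf01.bank.com", fake_env)
```

Raising on a missing variable avoids silently connecting with an empty password.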


Observability

Every job produces:

Operational log (./logs/ztract_YYYY-MM-DD.log) — structured JSON, rotated daily, 30-day retention, suitable for ELK/Splunk ingestion.

Audit trail (./audit/ztract_audit.log) — immutable append-only JSON Lines, one entry per job execution. Records user, machine, source dataset, record counts, status. Never rotated, never deleted. Compliance-ready.

Reject file (./rejects/<job>_<step>_<timestamp>_rejects.jsonl) — every failed record preserved with original EBCDIC hex bytes, decoded fields (if available), error type, error message, and byte offset. Re-processable.


Acknowledgements

Ztract's COBOL parsing engine is built on Cobrix by AbsaOSS — an outstanding open-source COBOL/EBCDIC parser for Apache Spark. We use the standalone cobol-parser module (no Spark dependency) and are grateful for the years of work that went into handling REDEFINES, OCCURS DEPENDING ON, packed decimal, and every IBM record format correctly. Standing on their shoulders.

Table diff powered by daff (Apache 2.0).
Binary diff powered by multidiff (MIT).
Console output powered by rich (MIT).

See NOTICE for full attribution.


Contributing

Copybook contributions are especially welcome. If you have anonymised or synthetic copybooks for common mainframe layouts (banking, insurance, retail), consider adding them to copybooks/. They make Ztract more useful for everyone and are tested automatically via ztract generate.

See CONTRIBUTING.md to get started.


License

Apache License 2.0 — see LICENSE.


Built with ❤️ for the mainframe community · github.com/SRRC-1334/ztract


Download files

Source Distribution

ztract-0.1.0.dev1.tar.gz (7.1 MB)

Uploaded Source

Built Distribution

ztract-0.1.0.dev1-py3-none-any.whl (7.1 MB)

Uploaded Python 3

File details

Details for the file ztract-0.1.0.dev1.tar.gz.

File metadata

  • Download URL: ztract-0.1.0.dev1.tar.gz
  • Upload date:
  • Size: 7.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for ztract-0.1.0.dev1.tar.gz
Algorithm Hash digest
SHA256 1440e9ecc9c83453b154f07f87ae251cc408489e9a46303019aa2dc9cd60e8f9
MD5 32b048e5bb192d1c5ddc2a357ed78593
BLAKE2b-256 2165af013bd124b4b55e3ff56a8e4691785ea4a55e414fa80d4d3a6ada656923

File details

Details for the file ztract-0.1.0.dev1-py3-none-any.whl.

File metadata

  • Download URL: ztract-0.1.0.dev1-py3-none-any.whl
  • Upload date:
  • Size: 7.1 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for ztract-0.1.0.dev1-py3-none-any.whl
Algorithm Hash digest
SHA256 366fd72307aa4a08abf6a3fabf1149384bf373f8dbf9be59bac4da724eb9558d
MD5 431a0ad33648d4b24697656fe4dab96f
BLAKE2b-256 c36b67dd31a5236bbdcc0fe36b8c356f0472bb9c86aa3c377129d910dbc366e7
