ztract

Extract mainframe EBCDIC binary files using real COBOL copybooks. Zero MIPS.

⚡ Status: Active development · Phase 1 in progress · Star the repo to follow along

Read any mainframe EBCDIC file on your laptop. Zero MIPS spent.

30 Seconds

pip install ztract

ztract convert \
  --copybook CUSTMAST.cpy \
  --input    CUST.MASTER.DAT \
  --recfm    FB  --lrecl 500 \
  --codepage cp277 \
  --output   customers.csv

⠿ extract-prod  ████████████████████  2,400,000 rec  ✓
  15,234 rec/s · elapsed 2m 37s · 0 rejects

Done. 2,400,000 records → customers.csv

2.4 million Norwegian customer records, correctly decoded (æ ø å Æ Ø Å), in under 3 minutes. On a laptop. Zero mainframe CPU.

What is Ztract?

Ztract is a Python CLI tool that extracts, transforms, and compares mainframe EBCDIC binary files using real COBOL copybooks — no Spark, no cluster, no proprietary tooling.

All the hard parsing (COMP-3 packed decimal, REDEFINES, OCCURS DEPENDING ON, RDW/BDW headers) is handled by Cobrix — a battle-tested, open-source COBOL parser — running as a subprocess. Python handles connectivity, output, orchestration, and observability.
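To give a feel for what "hard parsing" means here, the following is a minimal Python sketch of decoding one COMP-3 (packed decimal) field. It is an illustration only, not Ztract's or Cobrix's code; the function name `unpack_comp3` and its `scale` parameter are hypothetical.

```python
from decimal import Decimal

def unpack_comp3(raw: bytes, scale: int = 0) -> Decimal:
    """Decode an IBM packed-decimal field (hypothetical helper, not Ztract's API)."""
    # Each byte holds two 4-bit nibbles; the last nibble is the sign.
    nibbles = []
    for byte in raw:
        nibbles.append(byte >> 4)
        nibbles.append(byte & 0x0F)
    *digits, sign = nibbles
    if any(d > 9 for d in digits):
        raise ValueError(f"invalid digit nibble in {raw.hex()}")
    if sign < 0x0A:
        raise ValueError(f"invalid sign nibble in {raw.hex()}")
    value = int("".join(str(d) for d in digits) or "0")
    if sign in (0x0B, 0x0D):  # B and D mean negative; A, C, E, F mean positive
        value = -value
    # scale is the number of implied decimal places (the 99 after V in the PIC).
    return Decimal(value).scaleb(-scale)

amount = unpack_comp3(bytes.fromhex("00001234567C"), scale=2)
print(amount)  # 12345.67
```

A field with a sign nibble outside A-F raises `ValueError`, which is the kind of record a decoder would route to a reject file rather than silently emit.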

The result: pull files from your mainframe via FTP, SFTP, or Zowe, decode them with your existing .cpy copybooks, and write to CSV, Parquet, a database, or back to the mainframe. In one command.

Features

Real COBOL copybooks — use your .cpy files as-is, no conversion step, no JSON schema
All IBM record formats — F, FB, V, VB, FBA, VBA (including BDW/RDW and ASA carriage control)
Norwegian & Scandinavian first — cp277 primary, full æ ø å Æ Ø Å support out of the box
Bidirectional — read from mainframe and write back; mainframe-to-mainframe flows via Ztract
Streaming — never loads a full file into memory; millions of records handled on any machine
Field-level EBCDIC diff — compare two EBCDIC files field-by-field using your copybook as schema
Mock data generator — generate realistic synthetic EBCDIC test data from any copybook
YAML pipelines — define multi-step extract/transform/load workflows in a single file
Copybook inspector — visualise any .cpy file as a formatted field table in seconds
Enterprise observability — structured JSON logs, immutable audit trail, reject files with full context
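The streaming feature above can be pictured with a small generator that reads one fixed-length record at a time. This is a sketch under assumptions, not Ztract's implementation: `iter_fixed_records` is a hypothetical name, and the stream is assumed to be buffered (as returned by `open(path, "rb")` or `io.BytesIO`).

```python
import io
from typing import BinaryIO, Iterator

def iter_fixed_records(stream: BinaryIO, lrecl: int) -> Iterator[bytes]:
    """Yield LRECL-sized records one by one; memory use stays at one record."""
    while True:
        record = stream.read(lrecl)
        if not record:
            return  # clean end of file
        if len(record) != lrecl:
            raise ValueError(f"trailing partial record of {len(record)} bytes")
        yield record

# Two 10-byte records stand in for a real FB dataset.
data = io.BytesIO(b"A" * 10 + b"B" * 10)
records = list(iter_fixed_records(data, lrecl=10))
print(len(records))  # 2
```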

Why not other tools?

Feature	Ztract	Python EBCDIC libs	Cobrix (Spark)	Proprietary tools
Real COBOL copybooks	✅	❌ custom schema	✅	✅
REDEFINES / OCCURS	✅ Cobrix	⚠️ partial	✅	✅
cp277 Norwegian	✅	⚠️ varies	✅	✅
No Spark required	✅	✅	❌	✅
pip install	✅	✅	❌	❌
EBCDIC diff	✅	❌	❌	❌
Mock generator	✅	❌	❌	❌
FTP/SFTP/Zowe built-in	✅	❌	❌	varies
Write back to mainframe	✅	❌	❌	varies
Open source	✅	✅	✅	❌
Cost	Free	Free	Free	$$$$

Installation

pip install ztract

Requirements:

Python 3.10+
Java JRE 11+ on PATH (java -version to check — download from Adoptium if needed)

That's it. Everything else — Cobrix engine, diff tools, progress bars — is bundled.

Optional database drivers:

pip install "ztract[postgres]"   # PostgreSQL (psycopg2)
pip install "ztract[mysql]"      # MySQL (PyMySQL)
pip install "ztract[mssql]"      # SQL Server (pyodbc)
pip install "ztract[all-db]"     # All three

Quick Start

Inspect a copybook

ztract inspect --copybook CUSTMAST.cpy

┌─────────────────┬───────┬───────────────┬────────┬────────────┐
│ Field           │ Level │ PIC           │ Offset │ Size       │
├─────────────────┼───────┼───────────────┼────────┼────────────┤
│ CUST-ID         │ 05    │ 9(10)         │ 0      │ 10         │
│ CUST-NAME       │ 05    │ X(50)         │ 10     │ 50         │
│ CUST-ADDR       │ 05    │ X(80)         │ 60     │ 80         │
│ CUST-CITY       │ 05    │ X(30)         │ 140    │ 30         │
│ CUST-AMT        │ 05    │ S9(9)V99      │ 170    │ 6 (COMP-3) │
│ CUST-DATE       │ 05    │ 9(8)          │ 176    │ 8          │
└─────────────────┴───────┴───────────────┴────────┴────────────┘
Total record length: 500 bytes
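The Offset and Size columns follow from the PIC clauses. As a rough sketch (not the inspector's actual code), the sizes can be recomputed like this, assuming simple PIC strings made of X, 9, S and V, no SIGN SEPARATE, and a hypothetical helper named `storage_size`:

```python
import re

def storage_size(pic: str, comp3: bool = False) -> int:
    """Byte size of a simple PIC clause (hypothetical helper for illustration)."""
    # Expand repeat counts, e.g. 9(9) becomes 999999999.
    expanded = re.sub(r"([X9])\((\d+)\)",
                      lambda m: m.group(1) * int(m.group(2)),
                      pic.upper())
    # S and V take no storage in this simplified model.
    positions = sum(1 for ch in expanded if ch in "X9")
    if comp3:
        # Packed decimal: two digits per byte plus one trailing sign nibble.
        return positions // 2 + 1
    return positions

layout = [
    ("CUST-ID", "9(10)", False),
    ("CUST-NAME", "X(50)", False),
    ("CUST-ADDR", "X(80)", False),
    ("CUST-CITY", "X(30)", False),
    ("CUST-AMT", "S9(9)V99", True),
    ("CUST-DATE", "9(8)", False),
]

offset = 0
table = []
for name, pic, comp3 in layout:
    size = storage_size(pic, comp3)
    table.append((name, offset, size))
    offset += size

for row in table:
    print(row)
```

The six fields listed end at byte 184; whatever occupies the rest of the 500-byte record is not shown in the table above.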

Validate before extracting

ztract validate \
  --copybook CUSTMAST.cpy \
  --input    CUST.MASTER.DAT \
  --recfm    FB  --lrecl 500 \
  --codepage cp277 \
  --sample   1000

Validation complete (1,000 sample records)
  ✓ Decoded:   998
  ⚠ Warnings:    2  (invalid sign nibble — see rejects)
  ✗ Errors:      0
  CUST-AMT  min: 0.00   max: 9,999,999.99   null: 0.1%
  CUST-NAME sample: Bjørn Hansen, Åse Eriksen, Ole Nordmann

Convert to CSV

ztract convert \
  --copybook CUSTMAST.cpy \
  --input    CUST.MASTER.DAT \
  --recfm    FB  --lrecl 500 \
  --codepage cp277 \
  --output   customers.csv

Convert via FTP (pull direct from z/OS)

ztract convert \
  --copybook CUSTMAST.cpy \
  --input    ftp://mf01.bank.com/BEL.CUST.MASTER \
  --recfm    FB  --lrecl 500 \
  --codepage cp277 \
  --output   customers.parquet

Multiple outputs in one pass

ztract convert \
  --copybook CUSTMAST.cpy \
  --input    CUST.MASTER.DAT \
  --recfm    FB  --lrecl 500 \
  --codepage cp277 \
  --output   customers.csv \
  --output   customers.parquet \
  --output   "postgresql://user:pass@localhost/dwh?table=customer_master"

All three targets are written concurrently from a single read pass.
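The single-read-pass idea can be sketched as a fan-out loop: the source is iterated once and each record is handed to every sink. This sketch writes sequentially for simplicity and does not attempt the concurrency described above; `fan_out` and the in-memory buffers are illustrative assumptions, not Ztract internals.

```python
import csv
import io
import json

def fan_out(records, sinks):
    """Hand each record to every sink; the source is consumed exactly once."""
    count = 0
    for record in records:
        for sink in sinks:
            sink(record)
        count += 1
    return count

csv_buf, jsonl_buf = io.StringIO(), io.StringIO()
writer = csv.DictWriter(csv_buf, fieldnames=["CUST-ID", "CUST-NAME"],
                        lineterminator="\n")
writer.writeheader()

def write_jsonl(record):
    # ensure_ascii=False keeps characters such as ø and Å readable in the output.
    jsonl_buf.write(json.dumps(record, ensure_ascii=False) + "\n")

# A one-shot iterator: it could not be read a second time.
source = iter([
    {"CUST-ID": "000456", "CUST-NAME": "Bjørn Hansen"},
    {"CUST-ID": "000789", "CUST-NAME": "Åse Eriksen"},
])
written = fan_out(source, [writer.writerow, write_jsonl])
print(written)  # 2
```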

Diff two EBCDIC files field-by-field

ztract diff \
  --copybook CUSTMAST.cpy \
  --before   CUST_JAN.DAT \
  --after    CUST_FEB.DAT \
  --key      CUST-ID \
  --codepage cp277 \
  --recfm    FB  --lrecl 500

ADDED    [CUST-ID=000456]  CUST-NAME=Bjørn Hansen
DELETED  [CUST-ID=000123]  CUST-NAME=Ole Nordmann
CHANGED  [CUST-ID=000789]
  CUST-ADDR:  "Oslo Gate 1" → "Bergen Gate 5"
  CUST-AMT:   12,345.67 → 12,500.00

Diff complete: 1 added · 1 deleted · 47 changed · 999,951 unchanged
of 1,000,000 total records · 43 seconds
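Conceptually, a keyed field-level diff classifies records by key and then compares the fields of matching records. The sketch below works on already-decoded records held in memory, so it illustrates the output categories rather than how Ztract processes large files; `diff_records` is a hypothetical name.

```python
def diff_records(before, after, key):
    """Classify decoded records as added, deleted, or changed by a key field."""
    old = {rec[key]: rec for rec in before}
    new = {rec[key]: rec for rec in after}
    added = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())
    changed = {}
    for k in sorted(old.keys() & new.keys()):
        # Keep only the fields whose values differ: {field: (before, after)}.
        fields = {f: (old[k][f], new[k].get(f))
                  for f in old[k] if old[k][f] != new[k].get(f)}
        if fields:
            changed[k] = fields
    return added, deleted, changed

jan = [
    {"CUST-ID": "000123", "CUST-ADDR": "Storgata 2"},
    {"CUST-ID": "000789", "CUST-ADDR": "Oslo Gate 1"},
]
feb = [
    {"CUST-ID": "000456", "CUST-ADDR": "Havnegata 9"},
    {"CUST-ID": "000789", "CUST-ADDR": "Bergen Gate 5"},
]
added, deleted, changed = diff_records(jan, feb, key="CUST-ID")
print(added, deleted, changed)
```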

Generate synthetic EBCDIC test data

ztract generate \
  --copybook CUSTMAST.cpy \
  --records  100000 \
  --codepage cp277 \
  --recfm    FB  --lrecl 500 \
  --seed     42 \
  --output   CUST_MOCK.DAT

Generate with boundary value edge cases

ztract generate \
  --copybook COMPLEX_NUMERIC.cpy \
  --records  1000 \
  --edge-cases \
  --seed     42 \
  --recfm    FB  --lrecl 300 \
  --output   NUMERIC_TEST.DAT

With --edge-cases, every 100th record cycles through boundary values: all zeros, all maximum values, all negatives. This catches encoding bugs that ordinary random data misses.

Norwegian field names (NAVN, ADRESSE, TELEFON, BY) are detected automatically, so the generator produces realistic Scandinavian test data with valid packed decimal, correct EBCDIC encoding, and reproducible output.
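For a packed decimal field, the boundary values mentioned for --edge-cases (zero, maximum, most negative) look like this at the byte level. The sketch is an illustration with a hypothetical `pack_comp3` helper, not the generator's code; values are unscaled integers, so for S9(9)V99 they are in hundredths.

```python
def pack_comp3(value: int, digits: int) -> bytes:
    """Encode an integer as packed decimal with `digits` digit positions."""
    if abs(value) >= 10 ** digits:
        raise ValueError(f"{value} does not fit in {digits} digits")
    sign = "d" if value < 0 else "c"  # D = negative, C = positive
    text = f"{abs(value):0{digits}d}" + sign
    if len(text) % 2:
        text = "0" + text  # pad with a leading zero nibble to fill whole bytes
    return bytes.fromhex(text)

digits = 11  # S9(9)V99 has 11 digit positions
edge_values = [0, 10 ** digits - 1, -(10 ** digits - 1)]
edge_records = [pack_comp3(v, digits) for v in edge_values]

for raw in edge_records:
    print(raw.hex())
```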

CLI Commands

Command	Description
`ztract convert`	Extract EBCDIC → CSV / JSON Lines / Parquet / DB
`ztract diff`	Field-level comparison of two EBCDIC files
`ztract generate`	Generate synthetic EBCDIC test data from a copybook
`ztract run`	Execute a multi-step YAML pipeline
`ztract inspect`	Display copybook layout as a formatted field table
`ztract validate`	Pre-flight check: decode N sample records, report stats
`ztract status`	Show recent job history from audit log
`ztract init`	Scaffold a new Ztract project directory

YAML Pipelines

Define multi-step workflows in a single file:

# monthly-reconciliation.yaml
version: "1.0"
job:
  name: customer-monthly-reconciliation

connections:
  prod: &prod
    type: ftp
    host: mf01.bank.com
    user: ${PROD_USER}
    password: ${PROD_PASS}
    transfer_mode: binary

steps:
  - name: extract-prod
    action: convert
    input:
      connection: *prod
      dataset: BEL.CUST.MASTER
      record_format: FB
      lrecl: 500
      codepage: cp277
    copybook: ./copybooks/CUSTMAST.cpy
    output:
      - type: csv
        path: ./output/prod_customers.csv
    expose_as: prod_data

  - name: diff-vs-last-month
    action: diff
    input:
      before: ./archive/CUST_LAST.DAT
      after:  $ref:prod_data.csv
    copybook: ./copybooks/CUSTMAST.cpy
    diff:
      key_fields: [CUST-ID]
    output:
      - type: console
      - type: csv
        path: ./output/monthly_changes.csv

  - name: push-report-to-mainframe
    action: upload
    input:
      path: ./output/monthly_changes.csv
    output:
      connection: *prod
      dataset: BEL.CUST.CHANGERPT
      site_commands:
        recfm: FB
        lrecl: 500
        blksize: 27500
        space_unit: CYLINDERS
        primary: 5
        secondary: 2

ztract run monthly-reconciliation.yaml
ztract run monthly-reconciliation.yaml --dry-run
ztract run monthly-reconciliation.yaml --step extract-prod

Record Formats

Format	Description
`F`	Fixed length — record size from copybook
`FB`	Fixed Blocked — records in fixed-size blocks
`V`	Variable length with 4-byte RDW headers
`VB`	Variable Blocked — BDW + RDW headers
`FBA`	Fixed Blocked + ASA carriage control (first byte stripped)
`VBA`	Variable Blocked + ASA carriage control (first byte stripped)
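To illustrate the RDW header used by the variable formats: the first two bytes hold the big-endian record length, including the 4-byte RDW itself, and the next two bytes are zero for unspanned records. The sketch below assumes a stream of RDW-prefixed records with no BDWs present; `iter_rdw_records` and `make_record` are hypothetical helpers, not part of Ztract.

```python
import io
import struct
from typing import BinaryIO, Iterator

def iter_rdw_records(stream: BinaryIO) -> Iterator[bytes]:
    """Yield record payloads from a stream of RDW-prefixed records."""
    while True:
        header = stream.read(4)
        if not header:
            return  # clean end of file
        if len(header) < 4:
            raise ValueError("truncated RDW")
        # Bytes 0-1: record length including the RDW; bytes 2-3: segment info.
        length, _segment = struct.unpack(">HH", header)
        if length < 4:
            raise ValueError(f"invalid RDW length {length}")
        payload = stream.read(length - 4)
        if len(payload) != length - 4:
            raise ValueError("truncated record")
        yield payload

def make_record(payload: bytes) -> bytes:
    """Prefix a payload with an RDW (test helper for this sketch)."""
    return struct.pack(">HH", len(payload) + 4, 0) + payload

stream = io.BytesIO(make_record(b"abc") + make_record(b"hello!"))
payloads = list(iter_rdw_records(stream))
print(payloads)  # [b'abc', b'hello!']
```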

EBCDIC Code Pages

Code page	Aliases	Region
`cp277` ⭐	`norway`, `norwegian`, `danish`, `nordic`	Denmark / Norway — primary
`cp037`	`us`, `usa`, `canada`, `default`	USA / Canada — default
`cp273`	`germany`, `german`, `austria`	Germany / Austria
`cp875`	`greek`, `greece`	Greece
`cp870`	`eastern_europe`, `poland`, `czech`	Eastern Europe
`cp1047`	`latin1`, `open_systems`	Latin-1 / USS
`cp838`	`thailand`, `thai`	Thailand
`cp1025`	`cyrillic`, `russian`	Russia / CIS

Use the alias anywhere a codepage is expected: --codepage norway or --codepage cp277.
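Alias resolution like this amounts to a lookup table. The sketch below rebuilds part of the table above as a Python dict; the function name `resolve_codepage` and the case-insensitive matching are assumptions, not confirmed Ztract behaviour. The decode example uses cp037 because that codec ships with Python's standard library; other code pages in the table may need an additional codec.

```python
CODEPAGE_ALIASES = {
    "cp277": ("norway", "norwegian", "danish", "nordic"),
    "cp037": ("us", "usa", "canada", "default"),
    "cp273": ("germany", "german", "austria"),
    "cp875": ("greek", "greece"),
}

# Invert the table once: alias -> canonical code page name.
_LOOKUP = {alias: cp for cp, aliases in CODEPAGE_ALIASES.items() for alias in aliases}

def resolve_codepage(name: str) -> str:
    """Return the canonical code page for a name or alias (hypothetical helper)."""
    key = name.strip().lower()
    if key in CODEPAGE_ALIASES:
        return key
    if key in _LOOKUP:
        return _LOOKUP[key]
    raise ValueError(f"unknown code page: {name!r}")

print(resolve_codepage("norway"))  # cp277
# EBCDIC bytes C8 C5 D3 D3 D6 spell HELLO in cp037.
print(b"\xc8\xc5\xd3\xd3\xd6".decode(resolve_codepage("usa")))  # HELLO
```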

Output Targets

Format	Description
`.csv`	Comma or pipe delimited, UTF-8, Excel-compatible BOM option
`.jsonl`	JSON Lines, one object per record, `ensure_ascii=False`
`.parquet`	Apache Parquet via pyarrow, schema auto-derived from copybook
`postgresql://...`	PostgreSQL via psycopg2 (optional: `pip install ztract[postgres]`)
`mysql://...`	MySQL via PyMySQL (optional: `pip install ztract[mysql]`)
`mssql://...`	SQL Server via pyodbc (optional: `pip install ztract[mssql]`)
`ftp://...`	Write back to z/OS via FTP with SITE commands for dataset allocation
`sftp://...`	Write back to z/OS via SFTP

Connectivity

Type	Example
Local file	`--input ./CUST.DAT`
FTP	`--input ftp://user:pass@mf01.bank.com/BEL.CUST.DATA`
SFTP	`--input sftp://user@mf01.bank.com/BEL.CUST.DATA`
Zowe (z/OSMF)	`--zowe-profile MYPROD --dataset BEL.CUST.DATA`
Zowe (zftp)	`--zowe-profile MYPROD --zowe-backend zftp --dataset BEL.CUST.DATA`

Zowe transfer modes: binary (default), text, encoding, record (zftp only, preserves VB RDW headers).

SFTP z/OS paths: MVS dataset names are auto-formatted (BEL.CUST.DATA -> //'BEL.CUST.DATA'). USS paths pass through unchanged.
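The path rule can be sketched in a few lines. This assumes that a leading slash is what distinguishes a USS path from an MVS dataset name, which is a simplification; `format_sftp_path` is a hypothetical name.

```python
def format_sftp_path(path: str) -> str:
    """Wrap MVS dataset names as //'DSN'; pass USS paths through unchanged."""
    if path.startswith("/"):
        return path  # USS path, or a dataset name that is already formatted
    return f"//'{path}'"

print(format_sftp_path("BEL.CUST.DATA"))     # //'BEL.CUST.DATA'
print(format_sftp_path("/u/batch/cust.dat")) # /u/batch/cust.dat
```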

Credentials support ${ENV_VAR} interpolation in YAML, so passwords never need to be hardcoded.
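A minimal sketch of ${ENV_VAR} interpolation, assuming that an unset variable should be an error rather than an empty string (Ztract's actual behaviour is not documented here). The `interpolate` function is hypothetical; in real use the mapping would be `os.environ`.

```python
import re

_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")

def interpolate(value: str, env) -> str:
    """Replace each ${NAME} with env[NAME]; fail loudly if NAME is missing."""
    def replace(match):
        name = match.group(1)
        if name not in env:
            raise KeyError(f"environment variable {name} is not set")
        return env[name]
    return _VAR.sub(replace, value)

# A plain dict stands in for os.environ so the example is reproducible.
env = {"PROD_USER": "batch01", "PROD_PASS": "s3cret"}
print(interpolate("ftp://${PROD_USER}@mf01.bank.com", env))
# ftp://batch01@mf01.bank.com
```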

Observability

Every job produces:

Operational log (./logs/ztract_YYYY-MM-DD.log) — structured JSON, rotated daily, 30-day retention, suitable for ELK/Splunk ingestion.

Audit trail (./audit/ztract_audit.log) — immutable append-only JSON Lines, one entry per job execution. Records user, machine, source dataset, record counts, status. Never rotated, never deleted. Compliance-ready.

Reject file (./rejects/<job>_<step>_<timestamp>_rejects.jsonl) — every failed record preserved with original EBCDIC hex bytes, decoded fields (if available), error type, error message, and byte offset. Re-processable.
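As an illustration of what a re-processable reject entry might contain, the sketch below builds one JSON Lines record from the items listed above. The key names (`raw_hex`, `byte_offset`, and so on) and the `build_reject` function are assumptions for this example, not Ztract's actual schema.

```python
import json

def build_reject(raw: bytes, offset: int, error_type: str,
                 message: str, decoded=None) -> str:
    """Serialise one failed record as a JSON Lines entry (hypothetical schema)."""
    entry = {
        "byte_offset": offset,
        "raw_hex": raw.hex(),  # original EBCDIC bytes, losslessly preserved
        "decoded": decoded,    # fields decoded before the failure, if any
        "error_type": error_type,
        "error_message": message,
    }
    return json.dumps(entry, ensure_ascii=False)

raw = bytes.fromhex("c2d1d6d9d500001234567a")
line = build_reject(raw, offset=1500, error_type="invalid_sign_nibble",
                    message="unexpected sign nibble in CUST-AMT")
print(line)

# Re-processing starts by recovering the exact original bytes.
recovered = bytes.fromhex(json.loads(line)["raw_hex"])
```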

Acknowledgements

Ztract's COBOL parsing engine is built on Cobrix by AbsaOSS — an outstanding open-source COBOL/EBCDIC parser for Apache Spark. We use the standalone cobol-parser module (no Spark dependency) and are grateful for the years of work that went into handling REDEFINES, OCCURS DEPENDING ON, packed decimal, and every IBM record format correctly. Standing on their shoulders.

Table diff powered by daff (Apache 2.0).
Binary diff powered by multidiff (MIT).
Console output powered by rich (MIT).

See NOTICE for full attribution.

Contributing

Copybook contributions especially welcome — if you have anonymised/synthetic copybooks for common mainframe layouts (banking, insurance, retail), consider adding them to copybooks/. They make Ztract more useful for everyone and are tested automatically via ztract generate.

See CONTRIBUTING.md to get started.

License

Apache License 2.0 — see LICENSE.

Built with ❤️ for the mainframe community · github.com/SRRC-1334/ztract

Project details

This version

0.1.0.dev1 pre-release

Apr 6, 2026

Download files

Source Distribution

ztract-0.1.0.dev1.tar.gz (7.1 MB)

Uploaded Apr 6, 2026 Source

Built Distribution

ztract-0.1.0.dev1-py3-none-any.whl (7.1 MB)

Uploaded Apr 6, 2026 Python 3

File details

Details for the file ztract-0.1.0.dev1.tar.gz.

File metadata

Download URL: ztract-0.1.0.dev1.tar.gz
Upload date: Apr 6, 2026
Size: 7.1 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for ztract-0.1.0.dev1.tar.gz
Algorithm	Hash digest
SHA256	`1440e9ecc9c83453b154f07f87ae251cc408489e9a46303019aa2dc9cd60e8f9`
MD5	`32b048e5bb192d1c5ddc2a357ed78593`
BLAKE2b-256	`2165af013bd124b4b55e3ff56a8e4691785ea4a55e414fa80d4d3a6ada656923`

File details

Details for the file ztract-0.1.0.dev1-py3-none-any.whl.

File metadata

Download URL: ztract-0.1.0.dev1-py3-none-any.whl
Upload date: Apr 6, 2026
Size: 7.1 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for ztract-0.1.0.dev1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`366fd72307aa4a08abf6a3fabf1149384bf373f8dbf9be59bac4da724eb9558d`
MD5	`431a0ad33648d4b24697656fe4dab96f`
BLAKE2b-256	`c36b67dd31a5236bbdcc0fe36b8c356f0472bb9c86aa3c377129d910dbc366e7`
