Extract mainframe EBCDIC binary files using real COBOL copybooks. Zero MIPS.
⚡ Status: Active development · Phase 1 in progress · Star the repo to follow along
Read any mainframe EBCDIC file on your laptop. Zero MIPS spent.
30 Seconds
pip install ztract
ztract convert \
--copybook CUSTMAST.cpy \
--input CUST.MASTER.DAT \
--recfm FB --lrecl 500 \
--codepage cp277 \
--output customers.csv
⠿ extract-prod ████████████████████ 2,400,000 rec ✓
15,234 rec/s · elapsed 2m 37s · 0 rejects
Done. 2,400,000 records → customers.csv
2.4 million Norwegian customer records, correctly decoded (æ ø å Æ Ø Å), in under 3 minutes. On a laptop. Zero mainframe CPU.
What is Ztract?
Ztract is a Python CLI tool that extracts, transforms, and compares mainframe EBCDIC binary files using real COBOL copybooks — no Spark, no cluster, no proprietary tooling.
All the hard parsing (COMP-3 packed decimal, REDEFINES, OCCURS DEPENDING ON, RDW/BDW headers) is handled by Cobrix — a battle-tested, open-source COBOL parser — running as a subprocess. Python handles connectivity, output, orchestration, and observability.
The result: pull files from your mainframe via FTP, SFTP, or Zowe, decode them with your existing .cpy copybooks, and write to CSV, Parquet, a database, or back to the mainframe. In one command.
Features
- Real COBOL copybooks: use your .cpy files as-is, no conversion step, no JSON schema
- All IBM record formats: F, FB, V, VB, FBA, VBA (including BDW/RDW and ASA carriage control)
- Norwegian & Scandinavian first: cp277 primary, full æ ø å Æ Ø Å support out of the box
- Bidirectional: read from mainframe and write back; mainframe-to-mainframe flows via Ztract
- Streaming: never loads a full file into memory; millions of records handled on any machine
- Field-level EBCDIC diff: compare two EBCDIC files field-by-field using your copybook as schema
- Mock data generator: generate realistic synthetic EBCDIC test data from any copybook
- YAML pipelines: define multi-step extract/transform/load workflows in a single file
- Copybook inspector: visualise any .cpy file as a formatted field table in seconds
- Enterprise observability: structured JSON logs, immutable audit trail, reject files with full context
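The streaming item in the list above comes down to reading one LRECL-sized slice at a time instead of the whole file. A minimal sketch of that idea for F/FB data follows. It is not Ztract's actual reader: `iter_fixed_records` is a hypothetical helper, and cp037 stands in for the code page only because it ships with Python's standard library.

```python
import io
from typing import BinaryIO, Iterator


def iter_fixed_records(stream: BinaryIO, lrecl: int) -> Iterator[bytes]:
    """Yield fixed-length records one at a time (hypothetical helper).

    Only one record is held in memory at any moment, regardless of file size.
    """
    if lrecl <= 0:
        raise ValueError("lrecl must be positive")
    while True:
        record = stream.read(lrecl)
        if not record:
            return  # clean end of input
        if len(record) != lrecl:
            raise ValueError(f"truncated record: got {len(record)} of {lrecl} bytes")
        yield record


# Demo input: three 8-byte records encoded as EBCDIC (cp037), kept in memory.
data = "".join(name.ljust(8) for name in ("OLE", "KARI", "PER")).encode("cp037")
names = [
    record.decode("cp037").rstrip()
    for record in iter_fixed_records(io.BytesIO(data), lrecl=8)
]
print(names)
```

With a real file, `open(path, "rb")` would replace the `io.BytesIO` object; the generator itself does not change.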
Why not other tools?
| | Ztract | Python EBCDIC libs | Cobrix (Spark) | Proprietary tools |
|---|---|---|---|---|
| Real COBOL copybooks | ✅ | ❌ custom schema | ✅ | ✅ |
| REDEFINES / OCCURS | ✅ Cobrix | ⚠️ partial | ✅ | ✅ |
| cp277 Norwegian | ✅ | ⚠️ varies | ✅ | ✅ |
| No Spark required | ✅ | ✅ | ❌ | ✅ |
| pip install | ✅ | ✅ | ❌ | ❌ |
| EBCDIC diff | ✅ | ❌ | ❌ | ❌ |
| Mock generator | ✅ | ❌ | ❌ | ❌ |
| FTP/SFTP/Zowe built-in | ✅ | ❌ | ❌ | varies |
| Write back to mainframe | ✅ | ❌ | ❌ | varies |
| Open source | ✅ | ✅ | ✅ | ❌ |
| Cost | Free | Free | Free | $$$$ |
Installation
pip install ztract
Requirements:
- Python 3.10+
- Java JRE 11+ on PATH (run java -version to check; download from Adoptium if needed)
That's it. Everything else — Cobrix engine, diff tools, progress bars — is bundled.
Optional database drivers:
pip install ztract[postgres] # PostgreSQL (psycopg2)
pip install ztract[mysql] # MySQL (PyMySQL)
pip install ztract[mssql] # SQL Server (pyodbc)
pip install ztract[all-db] # All three
Quick Start
Inspect a copybook
ztract inspect --copybook CUSTMAST.cpy
┌─────────────────┬───────┬───────────────┬────────┬────────────┐
│ Field │ Level │ PIC │ Offset │ Size │
├─────────────────┼───────┼───────────────┼────────┼────────────┤
│ CUST-ID │ 05 │ 9(10) │ 0 │ 10 │
│ CUST-NAME │ 05 │ X(50) │ 10 │ 50 │
│ CUST-ADDR │ 05 │ X(80) │ 60 │ 80 │
│ CUST-CITY │ 05 │ X(30) │ 140 │ 30 │
│ CUST-AMT │ 05 │ S9(9)V99 │ 170 │ 6 (COMP-3) │
│ CUST-DATE │ 05 │ 9(8) │ 176 │ 8 │
└─────────────────┴───────┴───────────────┴────────┴────────────┘
Total record length: 500 bytes
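The Offset and Size columns follow mechanically from the PIC clauses: a display field takes one byte per character, and a COMP-3 field packs two digits per byte plus a trailing sign nibble. The sketch below reproduces the numbers in the table above. The helper names are illustrative and not part of Ztract, and the field list is copied from the sample output rather than parsed from a copybook.

```python
def display_size(length: int) -> int:
    """PIC X(n) / PIC 9(n) DISPLAY: one byte per character."""
    return length


def comp3_size(digits: int) -> int:
    """COMP-3: two digits per byte, plus one nibble for the sign."""
    return digits // 2 + 1


# (field name, size in bytes), in copybook order.
fields = [
    ("CUST-ID", display_size(10)),
    ("CUST-NAME", display_size(50)),
    ("CUST-ADDR", display_size(80)),
    ("CUST-CITY", display_size(30)),
    ("CUST-AMT", comp3_size(9 + 2)),  # S9(9)V99 has 11 digits
    ("CUST-DATE", display_size(8)),
]

# Offsets are the running sum of the sizes of all preceding fields.
layout = {}
offset = 0
for name, size in fields:
    layout[name] = (offset, size)
    offset += size

for name, (start, size) in layout.items():
    print(f"{name:<10} offset={start:>3} size={size}")
print("end of listed fields:", offset)
```

The six listed fields end at byte 184; the remainder of the 500-byte record is presumably taken up by fields or FILLER not shown in the sample output.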
Validate before extracting
ztract validate \
--copybook CUSTMAST.cpy \
--input CUST.MASTER.DAT \
--recfm FB --lrecl 500 \
--codepage cp277 \
--sample 1000
Validation complete (1,000 sample records)
✓ Decoded: 998
⚠ Warnings: 2 (invalid sign nibble — see rejects)
✗ Errors: 0
CUST-AMT min: 0.00 max: 9,999,999.99 null: 0.1%
CUST-NAME sample: Bjørn Hansen, Åse Eriksen, Ole Nordmann
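The sign-nibble warning refers to the last half-byte of a COMP-3 field, which must hold a sign code (conventionally C or F for positive, D for negative). A minimal decoder sketch under that convention is shown here to make the warning concrete. It is not Cobrix's implementation, and `unpack_comp3` is a hypothetical name.

```python
from decimal import Decimal

POSITIVE_SIGNS = {0xA, 0xC, 0xE, 0xF}
NEGATIVE_SIGNS = {0xB, 0xD}


def unpack_comp3(raw: bytes, scale: int = 0) -> Decimal:
    """Decode a packed-decimal (COMP-3) field.

    `scale` is the number of implied decimal places (the digits after V).
    """
    if not raw:
        raise ValueError("empty COMP-3 field")
    # Split every byte into its high and low nibble.
    nibbles = []
    for byte in raw:
        nibbles.append(byte >> 4)
        nibbles.append(byte & 0x0F)
    sign = nibbles.pop()  # the final nibble carries the sign
    if sign not in POSITIVE_SIGNS | NEGATIVE_SIGNS:
        raise ValueError(f"invalid sign nibble: {sign:X}")
    if any(digit > 9 for digit in nibbles):
        raise ValueError("invalid digit nibble")
    value = int("".join(str(digit) for digit in nibbles) or "0")
    if sign in NEGATIVE_SIGNS:
        value = -value
    return Decimal(value).scaleb(-scale)


# S9(9)V99 COMP-3 occupies 6 bytes: 11 digit nibbles followed by the sign.
print(unpack_comp3(bytes.fromhex("00001234567C"), scale=2))
print(unpack_comp3(bytes.fromhex("00001250000D"), scale=2))
```

A record whose final nibble is, say, 2 would raise `ValueError` here, which is the kind of case a validation pass would be expected to route to the reject file.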
Convert to CSV
ztract convert \
--copybook CUSTMAST.cpy \
--input CUST.MASTER.DAT \
--recfm FB --lrecl 500 \
--codepage cp277 \
--output customers.csv
Convert via FTP (pull direct from z/OS)
ztract convert \
--copybook CUSTMAST.cpy \
--input ftp://mf01.bank.com/BEL.CUST.MASTER \
--recfm FB --lrecl 500 \
--codepage cp277 \
--output customers.parquet
Multiple outputs in one pass
ztract convert \
--copybook CUSTMAST.cpy \
--input CUST.MASTER.DAT \
--recfm FB --lrecl 500 \
--codepage cp277 \
--output customers.csv \
--output customers.parquet \
--output postgresql://user:pass@localhost/dwh?table=customer_master
All three targets written concurrently from a single read pass.
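The single-read-pass idea can be illustrated with a plain fan-out loop: each decoded record is read once and handed to every writer. This sketch is sequential rather than concurrent and says nothing about how Ztract schedules its writers; `fan_out` and the in-memory sinks are assumptions made for the example.

```python
import csv
import io
import json
from typing import Callable, Iterable


def fan_out(records: Iterable[dict], sinks: list[Callable[[dict], object]]) -> int:
    """Send every record to every sink while consuming the source only once."""
    count = 0
    for record in records:
        for sink in sinks:
            sink(record)
        count += 1
    return count


# Two in-memory targets standing in for customers.csv and a JSON Lines file.
csv_buf = io.StringIO()
jsonl_buf = io.StringIO()
csv_writer = csv.DictWriter(
    csv_buf, fieldnames=["CUST-ID", "CUST-NAME"], lineterminator="\n"
)
csv_writer.writeheader()


def write_jsonl(record: dict) -> None:
    jsonl_buf.write(json.dumps(record, ensure_ascii=False) + "\n")


# A one-shot iterator: it cannot be read twice, so one pass must feed both sinks.
source = iter(
    [
        {"CUST-ID": "000456", "CUST-NAME": "Bjørn Hansen"},
        {"CUST-ID": "000123", "CUST-NAME": "Ole Nordmann"},
    ]
)
written = fan_out(source, [csv_writer.writerow, write_jsonl])
print(written, "records written to", 2, "targets")
```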
Diff two EBCDIC files field-by-field
ztract diff \
--copybook CUSTMAST.cpy \
--before CUST_JAN.DAT \
--after CUST_FEB.DAT \
--key CUST-ID \
--codepage cp277 \
--recfm FB --lrecl 500
ADDED [CUST-ID=000456] CUST-NAME=Bjørn Hansen
DELETED [CUST-ID=000123] CUST-NAME=Ole Nordmann
CHANGED [CUST-ID=000789]
CUST-ADDR: "Oslo Gate 1" → "Bergen Gate 5"
CUST-AMT: 12,345.67 → 12,500.00
Diff complete: 1 added · 1 deleted · 47 changed · 999,951 unchanged
of 1,000,000 total records · 43 seconds
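Conceptually, a keyed field-level diff classifies each key as added, deleted, or changed, and for changed records lists only the fields that differ. The sketch below shows that classification on already-decoded records. It holds both sides in memory for brevity, so it does not reflect how Ztract handles large files, and `diff_by_key` is a hypothetical function.

```python
def diff_by_key(before: list[dict], after: list[dict], key: str):
    """Classify records as added, deleted, or changed, matching on `key`."""
    old = {record[key]: record for record in before}
    new = {record[key]: record for record in after}

    added = [new[k] for k in sorted(new.keys() - old.keys())]
    deleted = [old[k] for k in sorted(old.keys() - new.keys())]

    changed = {}
    for k in sorted(old.keys() & new.keys()):
        field_names = sorted(old[k].keys() | new[k].keys())
        # Keep only the fields whose values differ: field -> (before, after).
        delta = {
            name: (old[k].get(name), new[k].get(name))
            for name in field_names
            if old[k].get(name) != new[k].get(name)
        }
        if delta:
            changed[k] = delta
    return added, deleted, changed


january = [
    {"CUST-ID": "000123", "CUST-NAME": "Ole Nordmann"},
    {"CUST-ID": "000789", "CUST-ADDR": "Oslo Gate 1"},
]
february = [
    {"CUST-ID": "000456", "CUST-NAME": "Bjørn Hansen"},
    {"CUST-ID": "000789", "CUST-ADDR": "Bergen Gate 5"},
]
added, deleted, changed = diff_by_key(january, february, key="CUST-ID")
print(len(added), "added,", len(deleted), "deleted,", len(changed), "changed")
```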
Generate synthetic EBCDIC test data
ztract generate \
--copybook CUSTMAST.cpy \
--records 100000 \
--codepage cp277 \
--recfm FB --lrecl 500 \
--seed 42 \
--output CUST_MOCK.DAT
Generate with boundary value edge cases
ztract generate \
--copybook COMPLEX_NUMERIC.cpy \
--records 1000 \
--edge-cases \
--seed 42 \
--recfm FB --lrecl 300 \
--output NUMERIC_TEST.DAT
With --edge-cases, every 100th record cycles through boundary values: all zeros, all max values, all negatives. Catches encoding bugs that normal random data misses.
Norwegian field names automatically detected (NAVN, ADRESSE, TELEFON, BY) — generates realistic Scandinavian test data with valid packed decimal, correct EBCDIC encoding, and reproducible output.
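To show what seeded generation with periodic boundary values can look like for a packed-decimal field, here is a small sketch. It assumes the three boundary values are zero, the largest positive value, and the largest negative value; the exact values Ztract emits are not specified here, and `pack_comp3` and `generate_amounts` are hypothetical names.

```python
import random
from typing import Iterator


def pack_comp3(value: int, digits: int) -> bytes:
    """Encode an integer as COMP-3 with `digits` digit positions."""
    if abs(value) >= 10**digits:
        raise ValueError("value does not fit in the field")
    sign = 0xD if value < 0 else 0xC
    text = str(abs(value)).zfill(digits)
    if digits % 2 == 0:
        text = "0" + text  # digits plus sign must fill whole bytes
    nibbles = [int(ch) for ch in text] + [sign]
    return bytes(
        (nibbles[i] << 4) | nibbles[i + 1] for i in range(0, len(nibbles), 2)
    )


def generate_amounts(count: int, digits: int, seed: int, edge_every: int = 100) -> Iterator[int]:
    """Yield random values; every `edge_every`-th value is a boundary case."""
    rng = random.Random(seed)  # a fixed seed makes the sequence reproducible
    limit = 10**digits - 1
    edges = (0, limit, -limit)  # assumed cycle: zero, max, most negative
    for index in range(1, count + 1):
        if index % edge_every == 0:
            yield edges[(index // edge_every - 1) % len(edges)]
        else:
            yield rng.randint(-limit, limit)


values = list(generate_amounts(count=300, digits=11, seed=42))
for position in (100, 200, 300):
    print(position, pack_comp3(values[position - 1], digits=11).hex().upper())
```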
CLI Commands
| Command | Description |
|---|---|
| ztract convert | Extract EBCDIC → CSV / JSON Lines / Parquet / DB |
| ztract diff | Field-level comparison of two EBCDIC files |
| ztract generate | Generate synthetic EBCDIC test data from a copybook |
| ztract run | Execute a multi-step YAML pipeline |
| ztract inspect | Display copybook layout as a formatted field table |
| ztract validate | Pre-flight check: decode N sample records, report stats |
| ztract status | Show recent job history from audit log |
| ztract init | Scaffold a new Ztract project directory |
YAML Pipelines
Define multi-step workflows in a single file:
# monthly-reconciliation.yaml
version: "1.0"
job:
  name: customer-monthly-reconciliation
connections:
  prod: &prod
    type: ftp
    host: mf01.bank.com
    user: ${PROD_USER}
    password: ${PROD_PASS}
    transfer_mode: binary
steps:
  - name: extract-prod
    action: convert
    input:
      connection: *prod
      dataset: BEL.CUST.MASTER
      record_format: FB
      lrecl: 500
      codepage: cp277
    copybook: ./copybooks/CUSTMAST.cpy
    output:
      - type: csv
        path: ./output/prod_customers.csv
    expose_as: prod_data
  - name: diff-vs-last-month
    action: diff
    input:
      before: ./archive/CUST_LAST.DAT
      after: $ref:prod_data.csv
    copybook: ./copybooks/CUSTMAST.cpy
    diff:
      key_fields: [CUST-ID]
    output:
      - type: console
      - type: csv
        path: ./output/monthly_changes.csv
  - name: push-report-to-mainframe
    action: upload
    input:
      path: ./output/monthly_changes.csv
    output:
      connection: *prod
      dataset: BEL.CUST.CHANGERPT
      site_commands:
        recfm: FB
        lrecl: 500
        blksize: 27500  # for FB, must be a multiple of lrecl (55 x 500)
        space_unit: CYLINDERS
        primary: 5
        secondary: 2
ztract run monthly-reconciliation.yaml
ztract run monthly-reconciliation.yaml --dry-run
ztract run monthly-reconciliation.yaml --step extract-prod
Record Formats
| Format | Description |
|---|---|
| F | Fixed length, record size from copybook |
| FB | Fixed Blocked, records in fixed-size blocks |
| V | Variable length with 4-byte RDW headers |
| VB | Variable Blocked, BDW + RDW headers |
| FBA | Fixed Blocked + ASA carriage control (first byte stripped) |
| VBA | Variable Blocked + ASA carriage control (first byte stripped) |
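For V-format data, each record is prefixed by a 4-byte RDW whose first two bytes give the record length as a big-endian integer that includes the RDW itself. A reader sketch under that layout is shown below. BDW handling for VB and spanned records are left out, and `iter_rdw_records` is a hypothetical helper rather than Ztract's reader.

```python
import io
import struct
from typing import BinaryIO, Iterator

RDW_SIZE = 4


def iter_rdw_records(stream: BinaryIO) -> Iterator[bytes]:
    """Yield the payload of each RDW-prefixed record in the stream."""
    while True:
        header = stream.read(RDW_SIZE)
        if not header:
            return  # clean end of input
        if len(header) < RDW_SIZE:
            raise ValueError("truncated RDW")
        # Bytes 0-1: length including the RDW. Bytes 2-3: segment info, unused here.
        length, _segment = struct.unpack(">HH", header)
        if length < RDW_SIZE:
            raise ValueError(f"invalid RDW length: {length}")
        payload = stream.read(length - RDW_SIZE)
        if len(payload) != length - RDW_SIZE:
            raise ValueError("truncated record payload")
        yield payload


# Two variable-length records: a 3-byte payload and a 1-byte payload.
data = struct.pack(">HH", 7, 0) + b"abc" + struct.pack(">HH", 5, 0) + b"z"
payloads = list(iter_rdw_records(io.BytesIO(data)))
print(payloads)
```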
EBCDIC Code Pages
| Code page | Aliases | Region |
|---|---|---|
| cp277 ⭐ | norway, norwegian, danish, nordic | Denmark / Norway (primary) |
| cp037 | us, usa, canada, default | USA / Canada (default) |
| cp273 | germany, german, austria | Germany / Austria |
| cp875 | greek, greece | Greece |
| cp870 | eastern_europe, poland, czech | Eastern Europe |
| cp1047 | latin1, open_systems | Latin-1 / USS |
| cp838 | thailand, thai | Thailand |
| cp1025 | cyrillic, russian | Russia / CIS |
Use the alias anywhere a codepage is expected: --codepage norway or --codepage cp277.
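Why the code page matters can be seen with Python's built-in codecs: the same bytes decode to different text under different EBCDIC code pages. The sketch below resolves an alias with a plain dictionary (only part of the table is copied) and then decodes with cp273 and cp037, which ship with the standard library. cp277 is not exercised because it may not be available as a built-in Python codec. `resolve_codepage` is a hypothetical function, not Ztract's API.

```python
# A subset of the alias table above; the mapping structure is an assumption.
ALIASES = {
    "norway": "cp277", "norwegian": "cp277", "danish": "cp277", "nordic": "cp277",
    "us": "cp037", "usa": "cp037", "canada": "cp037", "default": "cp037",
    "germany": "cp273", "german": "cp273", "austria": "cp273",
}


def resolve_codepage(name: str) -> str:
    """Map an alias to its code page; unknown names pass through unchanged."""
    key = name.strip().lower()
    return ALIASES.get(key, key)


# Encode a German name as cp273, then decode it with the right and wrong page.
raw = "Müller".encode(resolve_codepage("german"))
right = raw.decode(resolve_codepage("german"))
wrong = raw.decode(resolve_codepage("us"))
print(raw.hex().upper(), "->", right, "vs", wrong)
```

Plain letters and digits share positions across these code pages, which is why a wrong setting often goes unnoticed until a national character such as ü, æ, or ø appears.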
Output Targets
| Format | Description |
|---|---|
| .csv | Comma or pipe delimited, UTF-8, Excel-compatible BOM option |
| .jsonl | JSON Lines, one object per record, ensure_ascii=False |
| .parquet | Apache Parquet via pyarrow, schema auto-derived from copybook |
| postgresql://... | PostgreSQL via psycopg2 (optional: pip install ztract[postgres]) |
| mysql://... | MySQL via PyMySQL (optional: pip install ztract[mysql]) |
| mssql://... | SQL Server via pyodbc (optional: pip install ztract[mssql]) |
| ftp://... | Write back to z/OS via FTP with SITE commands for dataset allocation |
| sftp://... | Write back to z/OS via SFTP |
Connectivity
| Type | Example |
|---|---|
| Local file | --input ./CUST.DAT |
| FTP | --input ftp://user:pass@mf01.bank.com/BEL.CUST.DATA |
| SFTP | --input sftp://user@mf01.bank.com/BEL.CUST.DATA |
| Zowe (z/OSMF) | --zowe-profile MYPROD --dataset BEL.CUST.DATA |
| Zowe (zftp) | --zowe-profile MYPROD --zowe-backend zftp --dataset BEL.CUST.DATA |
Zowe transfer modes: binary (default), text, encoding, record (zftp only, preserves VB RDW headers).
SFTP z/OS paths: MVS dataset names are auto-formatted (BEL.CUST.DATA -> //'BEL.CUST.DATA'). USS paths pass through unchanged.
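The path rule above can be sketched as a small function. It assumes that anything starting with a slash is a USS path (or an already-quoted dataset) and that everything else is an MVS dataset name. The name check follows the usual MVS rules (qualifiers of 1 to 8 characters, 44 characters in total) and does not handle member names or GDG suffixes. `format_sftp_path` is a hypothetical name.

```python
import re

# One qualifier: starts with a letter or national character, up to 8 characters.
_QUALIFIER = r"[A-Z#@$][A-Z0-9#@$-]{0,7}"
_DATASET = re.compile(rf"^{_QUALIFIER}(\.{_QUALIFIER})*$")


def format_sftp_path(path: str) -> str:
    """Quote MVS dataset names for z/OS SFTP; leave USS paths unchanged."""
    if path.startswith("/"):
        return path  # USS path, or a dataset that is already in //'...' form
    if len(path) > 44 or not _DATASET.match(path):
        raise ValueError(f"not a valid MVS dataset name: {path!r}")
    return f"//'{path}'"


print(format_sftp_path("BEL.CUST.DATA"))
print(format_sftp_path("/u/prod/cust.dat"))
```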
Credentials support ${ENV_VAR} interpolation in YAML, so passwords never need to be hardcoded in a pipeline file.
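A minimal sketch of ${ENV_VAR} interpolation, assuming that an unset variable should fail loudly rather than be replaced by an empty string. The behaviour Ztract actually implements may differ, and `interpolate` is a hypothetical function.

```python
import os
import re
from typing import Mapping, Optional

_VAR = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def interpolate(value: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Replace each ${NAME} in `value` with the matching environment variable."""
    source = os.environ if env is None else env

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in source:
            raise KeyError(f"environment variable {name} is not set")
        return source[name]

    return _VAR.sub(substitute, value)


# An explicit mapping is passed here so the demo does not depend on the shell.
demo_env = {"PROD_USER": "svc_extract", "PROD_PASS": "s3cret"}
print(interpolate("${PROD_USER}", demo_env))
print(interpolate("ftp://${PROD_USER}@mf01.bank.com", demo_env))
```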
Observability
Every job produces:
Operational log (./logs/ztract_YYYY-MM-DD.log) — structured JSON, rotated daily, 30-day retention, suitable for ELK/Splunk ingestion.
Audit trail (./audit/ztract_audit.log) — immutable append-only JSON Lines, one entry per job execution. Records user, machine, source dataset, record counts, status. Never rotated, never deleted. Compliance-ready.
Reject file (./rejects/<job>_<step>_<timestamp>_rejects.jsonl) — every failed record preserved with original EBCDIC hex bytes, decoded fields (if available), error type, error message, and byte offset. Re-processable.
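To make the reject file concrete, the sketch below appends one JSON Lines entry per failed record and shows how the original bytes can be recovered from the hex field for re-processing. The key names (`raw_hex`, `byte_offset`, and so on) are assumptions chosen for illustration and are not Ztract's documented schema.

```python
import json
import os
import tempfile
from datetime import datetime, timezone
from typing import Optional


def reject_entry(raw: bytes, offset: int, error_type: str, message: str,
                 fields: Optional[dict] = None) -> dict:
    """Build one reject entry; all key names here are illustrative."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "byte_offset": offset,
        "raw_hex": raw.hex().upper(),  # original EBCDIC bytes, kept losslessly
        "error_type": error_type,
        "error_message": message,
        "fields": fields,  # decoded fields, when any could be decoded
    }


def append_jsonl(path: str, entry: dict) -> None:
    """Append one JSON object per line; existing lines are never rewritten."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry, ensure_ascii=False) + "\n")


with tempfile.TemporaryDirectory() as workdir:
    reject_path = os.path.join(workdir, "demo_rejects.jsonl")
    bad_record = bytes.fromhex("000000000012")
    append_jsonl(reject_path, reject_entry(
        bad_record, offset=1000, error_type="invalid_sign_nibble",
        message="sign nibble 2 is not a valid COMP-3 sign",
    ))
    with open(reject_path, encoding="utf-8") as handle:
        entries = [json.loads(line) for line in handle]

# The hex field round-trips back to the exact bytes that failed.
recovered = bytes.fromhex(entries[0]["raw_hex"])
print(entries[0]["error_type"], recovered == bad_record)
```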
Acknowledgements
Ztract's COBOL parsing engine is built on Cobrix by AbsaOSS — an outstanding open-source COBOL/EBCDIC parser for Apache Spark. We use the standalone cobol-parser module (no Spark dependency) and are grateful for the years of work that went into handling REDEFINES, OCCURS DEPENDING ON, packed decimal, and every IBM record format correctly. Standing on their shoulders.
Table diff powered by daff (Apache 2.0).
Binary diff powered by multidiff (MIT).
Console output powered by rich (MIT).
See NOTICE for full attribution.
Contributing
Copybook contributions especially welcome — if you have anonymised/synthetic copybooks for common mainframe layouts (banking, insurance, retail), consider adding them to copybooks/. They make Ztract more useful for everyone and are tested automatically via ztract generate.
See CONTRIBUTING.md to get started.
License
Apache License 2.0 — see LICENSE.