A parser for fixed-format 80-column COBOL copybooks.
python-copybook
A Python library for parsing fixed-format COBOL copybook files into structured, traversable data models.
Overview
python-copybook reads .cpy and .cbl copybook files and produces a clean hierarchy of parsed COBOL data fields. It handles multi-line statements, continuation lines, inline comments, REDEFINES, OCCURS, RENAMES, and all standard data definition clauses.
The library is structured as a five-stage pipeline:
File → CobolLine → CobolStatement → CobolField → CobolNode tree → rendered output
Each stage is independent and produces typed objects, so you can stop at any layer depending on what you need.
Current Scope
This library currently supports flat, self-contained copybooks only.
- Fixed-format 80-column COBOL records
- No COPY statement resolution: copybooks with imports are not yet supported
- Offset, length, and position calculation is in progress
- Buffer and memory view generation (mapping a raw data record to named fields) is planned
Installation
pip install python-copybook
Requirements
Python 3.12+
Quick Start
from copybook.parser import read_copybook, read_statements, parse_fields
from copybook.tree import build_tree, build_flat_tree
from copybook.render import render_copybook, render_flat_copybook
# Stage 1 — read physical lines
lines = read_copybook("CLAIMREC.cpy")
# Stage 2 — assemble logical statements
statements = read_statements(lines)
# Stage 3 — parse fields from statements
fields = parse_fields(statements)
# Stage 4 — build hierarchy tree
tree = build_tree(fields)
# Stage 5 — render output
print(render_copybook(tree))
Pipeline Stages
Stage 1 — read_copybook(path) → list[CobolLine]
Reads the file and slices each physical line into its fixed-format column regions. Lines are stored raw — no interpretation occurs at this stage.
Columns 1–6 — Sequence number
Column 7 — Indicator (* = comment, - = continuation, space = normal)
Columns 8–72 — Area (actual COBOL content)
Columns 73–80 — Identification (free-form comment)
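The column layout above can be illustrated with a minimal, standalone sketch. It is not the library's implementation: `SlicedLine` and `slice_fixed_format` are hypothetical names, and padding short lines to 80 characters is an assumption made here so that every slice is well defined.

```python
from typing import NamedTuple


class SlicedLine(NamedTuple):
    """Hypothetical container for the four fixed-format regions."""
    sequence: str        # columns 1-6
    indicator: str       # column 7
    area: str            # columns 8-72
    identification: str  # columns 73-80


def slice_fixed_format(raw: str) -> SlicedLine:
    """Split one physical line into its column regions without interpreting it."""
    # Assumption: short lines are padded to 80 characters before slicing.
    padded = raw.rstrip("\r\n").ljust(80)
    return SlicedLine(
        sequence=padded[0:6],
        indicator=padded[6],
        area=padded[7:72],
        identification=padded[72:80],
    )


line = slice_fixed_format("000100     05  POLICY-TYPE        PIC 9.")
print(repr(line.indicator), line.area.strip())
```

Column numbers in COBOL are 1-based, so column 7 corresponds to Python index 6 and columns 8 to 72 correspond to the slice `[7:72]`.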
Stage 2 — read_statements(lines) → list[CobolStatement]
Joins physical lines into complete logical declarations. In fixed-format COBOL, a declaration ends with a period and may span multiple lines. This stage handles:
- Full-line comments (* or / indicator)
- Continuation lines (- indicator)
- Inline comments (*> syntax, COBOL 2002+)
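As a rough illustration of the three cases listed above, the sketch below joins (indicator, area) pairs into period-terminated statements. It is a simplified stand-in, not the library's code: `join_statements` is a hypothetical helper, and it assumes that `*>` and the terminating period never appear inside a literal.

```python
def join_statements(lines: list[tuple[str, str]]) -> list[str]:
    """Join (indicator, area) pairs into period-terminated statements."""
    statements: list[str] = []
    pending: list[str] = []  # fragments of the statement being assembled
    for indicator, area in lines:
        if indicator in ("*", "/"):
            continue  # full-line comment
        # Drop an inline comment introduced by *> (assumed not inside a literal).
        text = area.split("*>", 1)[0].strip()
        if not text:
            continue
        if indicator == "-" and pending:
            # Continuation: glue onto the previous fragment without a space.
            pending[-1] += text
        else:
            pending.append(text)
        if text.endswith("."):
            statements.append(" ".join(pending))
            pending = []
    return statements


sample = [
    ("*", "Claim record layout"),
    (" ", "01  CLAIM-RECORD."),
    (" ", "    05  POLICY-AMOUNT        *> signed amount"),
    (" ", "        PIC S9(7)V99."),
]
print(join_statements(sample))
```

Real continuation rules for quoted literals are more involved than this; the sketch only shows how a statement boundary can be detected across physical lines.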
Stage 3 — parse_fields(statements) → list[CobolField]
Extracts every COBOL clause from each statement using independent regex searches. Clause order in the source does not matter.
Clauses parsed: PIC/PICTURE, REDEFINES, RENAMES, RENAMES THRU, USAGE, OCCURS, DEPENDING ON, INDEXED BY, VALUE/VALUES ARE, SIGN SEPARATE, SYNCHRONIZED
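One way to see why clause order does not matter is to run a separate regex search per clause over the whole statement. The patterns below are illustrative assumptions covering only three clauses; they are not the patterns shipped in patterns.py, and `extract_clauses` is a hypothetical name.

```python
import re

# Hypothetical patterns; each clause is searched for independently.
# The lookbehind keeps a name such as MY-PIC from matching as a keyword.
PIC_RE = re.compile(r"(?<![\w-])PIC(?:TURE)?\s+(?:IS\s+)?(\S+)", re.IGNORECASE)
OCCURS_RE = re.compile(r"(?<![\w-])OCCURS\s+(\d+)", re.IGNORECASE)
REDEFINES_RE = re.compile(r"(?<![\w-])REDEFINES\s+([A-Z0-9-]+)", re.IGNORECASE)


def extract_clauses(statement: str) -> dict[str, object]:
    """Pull a few clauses out of one statement, regardless of their order."""
    body = statement.rstrip().rstrip(".")  # drop the terminating period
    pic = PIC_RE.search(body)
    occurs = OCCURS_RE.search(body)
    redefines = REDEFINES_RE.search(body)
    return {
        "picture": pic.group(1) if pic else None,
        "occurs": int(occurs.group(1)) if occurs else None,
        "redefines": redefines.group(1).upper() if redefines else None,
    }


first = extract_clauses("05 AMOUNTS PIC S9(7)V99 OCCURS 12 TIMES.")
second = extract_clauses("05 AMOUNTS OCCURS 12 TIMES PIC S9(7)V99.")
print(first == second)
```

Because each search scans the full statement, swapping the PIC and OCCURS clauses yields the same result.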
Stage 4 — build_tree(fields) → list[CobolNode]
Assigns parent-child relationships using level numbers. Returns root-level nodes; all others are reachable via .children.
Two tree views are available:
| Function | Contents |
|---|---|
| build_tree(fields) | Full hierarchy, mirrors source exactly |
| build_flat_tree(fields) | Elementary fields only: no groups, no 88s, no REDEFINES |
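Assigning parents from level numbers is commonly done with a stack of open ancestors. The sketch below shows that general technique under simplifying assumptions: `Node` and `build_hierarchy` are hypothetical stand-ins for CobolNode and build_tree, and the special handling that levels 66 and 77 would need is ignored.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for CobolNode, reduced to what the sketch needs."""
    level: int
    name: str
    # Excluded from repr and comparison to avoid parent/child recursion.
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)
    children: list["Node"] = field(default_factory=list)


def build_hierarchy(items: list[tuple[int, str]]) -> list[Node]:
    """Attach each field to the nearest preceding field with a lower level."""
    roots: list[Node] = []
    stack: list[Node] = []  # chain of currently open ancestors
    for level, name in items:
        node = Node(level, name)
        # Close every open field whose level is not strictly lower.
        while stack and stack[-1].level >= level:
            stack.pop()
        if stack:
            node.parent = stack[-1]
            stack[-1].children.append(node)
        else:
            roots.append(node)
        stack.append(node)
    return roots


roots = build_hierarchy([
    (1, "CLAIM-RECORD"),
    (12, "INSURED-DETAILS"),
    (15, "INSURED-POLICY-NO"),
    (12, "POLICY-DETAILS"),
    (15, "POLICY-TYPE"),
    (88, "PRIVATE"),
    (88, "MEDICARE"),
])
print([child.name for child in roots[0].children])
```

Level 88 entries fall out naturally here: 88 is higher than any data level, so each condition name attaches to the field immediately above it.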
Stage 5 — Rendering
| Function | Output |
|---|---|
| render_copybook(tree) | Full copybook, original level numbers preserved |
| render_flat_copybook(nodes, record_name) | All fields remapped to level 05 under one 01 record |
render_copybook accepts an indent_size parameter (default 4) to control visual depth:
render_copybook(tree, indent_size=0) # flat — all fields at Area B
render_copybook(tree, indent_size=4) # default — 4 spaces per depth level
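The effect of indent_size can be pictured as a per-depth multiplier. The helper below is a hypothetical sketch of that idea only; it does not reproduce the exact column positions that render_copybook emits.

```python
def render_indented(rows: list[tuple[int, str]], indent_size: int = 4) -> str:
    """Render (depth, text) rows, indenting each by depth * indent_size spaces."""
    return "\n".join(" " * (depth * indent_size) + text for depth, text in rows)


rows = [
    (0, "01 Claim-Record."),
    (1, "12 Policy-Details."),
    (2, "15 Policy-Type PIC 9."),
]
print(render_indented(rows, indent_size=4))
print(render_indented(rows, indent_size=0))
```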
Data Models
CobolLine
One physical line from the file. Never mutated after creation.
line.line_number # int — 1-based
line.indicator # str — single char, space = normal
line.area # str — raw columns 8–72
line.is_comment # bool — indicator is * or /
line.is_continuation # bool — indicator is -
line.raw_line # str — reconstructed 80-char record
CobolStatement
One complete logical declaration, potentially spanning multiple lines.
stmt.area # str — full joined text e.g. '15 Field PIC X(10).'
stmt.source_lines # list[CobolLine] — contributing physical lines
stmt.reserved_words # list[str] — COBOL keywords found in this statement
stmt.copybook # str — source file path
CobolField
One parsed data field with all clauses extracted.
field.level # int — level number
field.name # str — uppercased field name
field.picture # str | None — e.g. '9(07)', 'X(15)', 'S9(7)V99'
field.redefines # str | None — target field name
field.usage # str | None — e.g. 'COMP-3', 'BINARY', 'DISPLAY'
field.occurs # int | None — fixed array size
field.occurs_max # int | None — upper bound for variable arrays
field.depending_on # str | None — field holding current array size
field.indexed_by # str | None — index name for OCCURS tables
field.value # str | None — VALUE clause content
field.sign_separate # bool — SIGN LEADING/TRAILING SEPARATE present
field.synchronized # bool — SYNC/SYNCHRONIZED present
# Derived properties
field.is_filler # name == 'FILLER'
field.is_condition # level == 88
field.is_group # no PIC clause
field.is_redefine # has REDEFINES clause
field.is_array # has OCCURS clause
field.is_variable_array # has DEPENDING ON clause
field.is_numeric # PIC contains 9
field.is_signed # PIC starts with S
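The derived properties can be read as simple predicates over the stored attributes. The sketch below restates a few of them on a hypothetical `FieldSketch` dataclass, following the comments above literally; the real CobolField may define them differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldSketch:
    """Hypothetical, reduced model of a parsed field."""
    level: int
    name: str
    picture: Optional[str] = None
    redefines: Optional[str] = None
    occurs: Optional[int] = None

    @property
    def is_filler(self) -> bool:
        return self.name == "FILLER"

    @property
    def is_condition(self) -> bool:
        return self.level == 88

    @property
    def is_group(self) -> bool:
        return self.picture is None  # no PIC clause

    @property
    def is_numeric(self) -> bool:
        return self.picture is not None and "9" in self.picture

    @property
    def is_signed(self) -> bool:
        return self.picture is not None and self.picture.upper().startswith("S")


amount = FieldSketch(15, "POLICY-AMOUNT", picture="S9(7)V99")
record = FieldSketch(1, "CLAIM-RECORD")
print(amount.is_numeric, amount.is_signed, record.is_group)
```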
CobolNode
A node in the field hierarchy tree. Wraps a CobolField with parent/child references.
node.cobol_field # CobolField
node.parent # CobolNode | None
node.children # list[CobolNode]
node.level # shortcut to cobol_field.level
node.name # shortcut to cobol_field.name
node.to_dict() # JSON string — parent serialized as name string to avoid circular refs
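Serializing the parent as a name rather than as an object is what keeps the output free of circular references. A minimal sketch of that idea follows; `TreeNode`, `as_plain`, and `to_json` are hypothetical names, and the key layout is an assumption rather than the library's actual output format.

```python
import json
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TreeNode:
    """Hypothetical node with parent and child references."""
    name: str
    level: int
    parent: Optional["TreeNode"] = field(default=None, repr=False, compare=False)
    children: list["TreeNode"] = field(default_factory=list)

    def as_plain(self) -> dict:
        """Convert to plain data; the parent is reduced to its name."""
        return {
            "name": self.name,
            "level": self.level,
            "parent": self.parent.name if self.parent else None,
            "children": [child.as_plain() for child in self.children],
        }

    def to_json(self) -> str:
        return json.dumps(self.as_plain(), indent=2)


root = TreeNode("CLAIM-RECORD", 1)
child = TreeNode("POLICY-TYPE", 5, parent=root)
root.children.append(child)
print(root.to_json())
```

Passing the node objects straight to `json.dumps` would fail, since each child points back at its parent; flattening the parent to a string breaks that cycle.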
Examples
Parse and print a copybook tree
from copybook.parser import read_copybook, read_statements, parse_fields
from copybook.tree import build_tree
from copybook.render import render_copybook
lines = read_copybook("CLAIMREC.cpy")
statements = read_statements(lines)
fields = parse_fields(statements)
tree = build_tree(fields)
print(render_copybook(tree, indent_size=4))
01 Claim-Record.
12 Insured-Details.
15 Insured-Policy-No PIC 9(07).
15 Insured-Last-Name PIC X(15).
12 Policy-Details.
15 Policy-Type PIC 9.
88 Private VALUE 1.
88 Medicare VALUE 2.
Generate a flat copybook (no groups, no REDEFINES)
from copybook.tree import build_flat_tree
from copybook.render import render_flat_copybook
flat = build_flat_tree(fields)
print(render_flat_copybook(flat, "FLAT-CLAIMREC"))
01 FLAT-CLAIMREC.
05 Insured-Policy-No PIC 9(07).
05 Insured-Last-Name PIC X(15).
05 Insured-First-Name PIC X(10).
05 Policy-Type PIC 9.
05 Policy-Benefit-Date-Num PIC 9(08).
05 Policy-Amount PIC S9(7)V99.
Inspect a field
for field in fields:
    if field.is_numeric:
        print(f"{field.name}: PIC {field.picture}")
Serialize a tree node to JSON
tree = build_tree(fields)
print(tree[0].to_dict())
Project Structure
copybook/
    models.py: CobolLine, CobolStatement, CobolField, CobolNode
    patterns.py: regex patterns, layout constants, reserved words
    parser.py: read_copybook, read_statements, parse_fields
    tree.py: build_tree, build_flat_tree
    render.py: render_copybook, render_flat_copybook
    reserved_words.txt: COBOL reserved word list
Supported COBOL Features
| Feature | Supported |
|---|---|
| Fixed-format 80-column records | ✅ |
| Full-line comments (*, / indicator) | ✅ |
| Continuation lines (- indicator) | ✅ |
| Inline comments (*>) | ✅ |
| Multi-line statements | ✅ |
| PIC / PICTURE clause | ✅ |
| REDEFINES | ✅ |
| OCCURS n TIMES | ✅ |
| OCCURS n TO m TIMES DEPENDING ON | ✅ |
| INDEXED BY | ✅ |
| VALUE / VALUES ARE | ✅ |
| USAGE / COMP / COMP-3 / BINARY etc. | ✅ |
| SIGN LEADING/TRAILING SEPARATE | ✅ |
| SYNCHRONIZED / SYNC | ✅ |
| RENAMES / RENAMES THRU | ✅ |
| Level 66, 77, 88 | ✅ |
| :tag: style replacement tokens | ✅ |
| Free-format COBOL | ❌ |
Roadmap
- PIC clause byte length calculator (DISPLAY, COMP-3, BINARY, etc.)
- Field offset and position computation
- REDEFINES-aware offset handling (overlapping storage)
- OCCURS multiplier in offset calculation
- Raw record buffer slicing by field name
- Multiple memory view strategies (full tree, flat, storage-only)
- COPY statement resolution across multiple copybook files
License
GPL-3.0 — see LICENSE for details.