A parser for fixed-format 80-column COBOL copybooks.

python-copybook

A Python library for parsing fixed-format COBOL copybook files into structured, traversable data models.

Overview

python-copybook reads .cpy and .cbl copybook files and produces a clean hierarchy of parsed COBOL data fields. It handles multi-line statements, continuation lines, full-line and inline comments, REDEFINES, OCCURS, RENAMES, and the other data definition clauses listed under Supported COBOL Features.

The library is structured as a five-stage pipeline:

File → CobolLine → CobolStatement → CobolField → CobolNode tree → rendered output

Each stage is independent and produces typed objects, so you can stop at any layer depending on what you need.

Current Scope

This library currently supports flat, self-contained copybooks only.

  • Fixed-format 80-column COBOL records
  • No COPY statement resolution — copybooks with imports are not yet supported
  • Offset, length, and position calculation is in progress
  • Buffer and memory view generation (mapping a raw data record to named fields) is planned

Installation

pip install python-copybook

Requirements

Python 3.12+

Quick Start

from copybook.parser import read_copybook, read_statements, parse_fields
from copybook.tree import build_tree, build_flat_tree
from copybook.render import render_copybook, render_flat_copybook

# Stage 1 — read physical lines
lines = read_copybook("CLAIMREC.cpy")

# Stage 2 — assemble logical statements
statements = read_statements(lines)

# Stage 3 — parse fields from statements
fields = parse_fields(statements)

# Stage 4 — build hierarchy tree
tree = build_tree(fields)

# Stage 5 — render output
print(render_copybook(tree))

Pipeline Stages

Stage 1 — read_copybook(path) → list[CobolLine]

Reads the file and slices each physical line into its fixed-format column regions. Lines are stored raw — no interpretation occurs at this stage.

Columns 1–6   — Sequence number
Column  7     — Indicator  (* or / = comment, - = continuation, space = normal)
Columns 8–72  — Area  (actual COBOL content)
Columns 73–80 — Identification (free-form comment)
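
The column layout above can be illustrated with plain string slicing. This is a minimal sketch, not the library's code: SlicedLine and slice_line are hypothetical names, and padding short lines to 80 characters is an assumption made here so that every slice is well defined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SlicedLine:
    """Hypothetical stand-in for CobolLine holding the four column regions."""
    sequence: str        # columns 1-6
    indicator: str       # column 7
    area: str            # columns 8-72
    identification: str  # columns 73-80


def slice_line(raw: str) -> SlicedLine:
    # Assumption: short lines are right-padded to a full 80-column record.
    record = raw.rstrip("\r\n").ljust(80)
    return SlicedLine(
        sequence=record[0:6],
        indicator=record[6],
        area=record[7:72],
        identification=record[72:80],
    )


line = slice_line("000100 01  CLAIM-RECORD.")
print(repr(line.indicator), line.area.rstrip())
```

Because Python slices are zero-based and end-exclusive, columns 8–72 become record[7:72], which always yields a 65-character area.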

Stage 2 — read_statements(lines) → list[CobolStatement]

Joins physical lines into complete logical declarations. In fixed-format COBOL, a declaration ends with a period and may span multiple lines. This stage handles:

  • Full-line comments (* or / indicator)
  • Continuation lines (- indicator)
  • Inline comments (*> syntax, COBOL 2002+)
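
The joining rules above can be sketched roughly as follows. The function name assemble_statements and the (indicator, area) tuple input are assumptions for illustration only. The sketch is also deliberately simplified: it does not protect a *> or a period that appears inside a quoted literal, and it does not handle continued literals.

```python
def assemble_statements(lines: list[tuple[str, str]]) -> list[str]:
    """Join (indicator, area) pairs into period-terminated statements."""
    statements: list[str] = []
    buffer = ""
    for indicator, area in lines:
        if indicator in ("*", "/"):
            continue  # full-line comment: skip entirely
        # Drop everything from an inline *> comment to the end of the line.
        text = area.split("*>", 1)[0].strip()
        if not text:
            continue
        if indicator == "-":
            buffer += text  # continuation: glue onto the broken word
        else:
            buffer = f"{buffer} {text}".strip()
        if buffer.endswith("."):
            statements.append(buffer)
            buffer = ""
    return statements


lines = [
    ("*", "Claim record layout"),
    (" ", "01  CLAIM-RECORD."),
    (" ", "    05 POLICY-AMOUNT          *> signed amount"),
    (" ", "       PIC S9(7)V99."),
    (" ", "    05 INSURED-LAST-NA"),
    ("-", "         ME PIC X(15)."),
]
statements = assemble_statements(lines)
for statement in statements:
    print(statement)
```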

Stage 3 — parse_fields(statements) → list[CobolField]

Extracts every COBOL clause from each statement using independent regex searches. Clause order in the source does not matter.

Clauses parsed: PIC/PICTURE, REDEFINES, RENAMES, RENAMES THRU, USAGE, OCCURS, DEPENDING ON, INDEXED BY, VALUE/VALUES ARE, SIGN SEPARATE, SYNCHRONIZED

Stage 4 — build_tree(fields) → list[CobolNode]

Assigns parent-child relationships using level numbers. Returns root-level nodes; all others are reachable via .children.
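
One common way to derive parent-child links from level numbers is a stack of open ancestors. The sketch below assumes that approach; it is not taken from the library's source. Node and build_tree_sketch are hypothetical names, and levels 66 and 77, which need special handling, are left out.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    """Hypothetical stand-in for CobolNode."""
    level: int
    name: str
    parent: "Node | None" = field(default=None, repr=False)
    children: list["Node"] = field(default_factory=list)


def build_tree_sketch(items: list[tuple[int, str]]) -> list[Node]:
    roots: list[Node] = []
    stack: list[Node] = []  # open ancestors, outermost first
    for level, name in items:
        node = Node(level, name)
        # Close every open ancestor whose level is not lower than this one.
        while stack and stack[-1].level >= level:
            stack.pop()
        if stack:
            node.parent = stack[-1]
            stack[-1].children.append(node)
        else:
            roots.append(node)
        stack.append(node)
    return roots


roots = build_tree_sketch([
    (1, "CLAIM-RECORD"),
    (12, "INSURED-DETAILS"),
    (15, "INSURED-POLICY-NO"),
    (15, "INSURED-LAST-NAME"),
    (12, "POLICY-DETAILS"),
    (15, "POLICY-TYPE"),
    (88, "PRIVATE"),
    (88, "MEDICARE"),
])
print([child.name for child in roots[0].children])
```

Because 88 is numerically higher than any ordinary level, condition names attach to the field that precedes them without extra logic.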

Two tree views are available:

Function                   Contents
build_tree(fields)         Full hierarchy — mirrors source exactly
build_flat_tree(fields)    Elementary fields only — no groups, no 88s, no REDEFINES

Stage 5 — Rendering

Function                                    Output
render_copybook(tree)                       Full copybook, original level numbers preserved
render_flat_copybook(nodes, record_name)    All fields remapped to level 05 under one 01 record

render_copybook accepts an indent_size parameter (default 4) to control visual depth:

render_copybook(tree, indent_size=0)   # flat — all fields at Area B
render_copybook(tree, indent_size=4)   # default — 4 spaces per depth level
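
The effect of indent_size can be sketched as a column calculation. The rule below is an assumption inferred from the sample output in this README (the 01 record starts in Area A at column 8, and nested fields start at column 12 plus indent_size per extra depth level). indent_for is a hypothetical helper, not a library function.

```python
AREA_A = 7   # zero-based offset of column 8
AREA_B = 11  # zero-based offset of column 12


def indent_for(depth: int, indent_size: int = 4) -> str:
    """Leading spaces for a field at the given tree depth (0 = the 01 record)."""
    if depth == 0:
        return " " * AREA_A
    # Assumed rule: depth 1 sits at Area B, deeper levels add indent_size each.
    return " " * (AREA_B + (depth - 1) * indent_size)


print(indent_for(0) + "01  Claim-Record.")
print(indent_for(1) + "12 Insured-Details.")
print(indent_for(2) + "15 Insured-Policy-No        PIC 9(07).")
```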

Data Models

CobolLine

One physical line from the file. Never mutated after creation.

line.line_number      # int — 1-based
line.indicator        # str — single char, space = normal
line.area             # str — raw columns 8–72
line.is_comment       # bool — indicator is * or /
line.is_continuation  # bool — indicator is -
line.raw_line         # str — reconstructed 80-char record

CobolStatement

One complete logical declaration, potentially spanning multiple lines.

stmt.area             # str — full joined text e.g. '15 Field PIC X(10).'
stmt.source_lines     # list[CobolLine] — contributing physical lines
stmt.reserved_words   # list[str] — COBOL keywords found in this statement
stmt.copybook         # str — source file path

CobolField

One parsed data field with all clauses extracted.

field.level           # int  — level number
field.name            # str  — uppercased field name
field.picture         # str | None  — e.g. '9(07)', 'X(15)', 'S9(7)V99'
field.redefines       # str | None  — target field name
field.usage           # str | None  — e.g. 'COMP-3', 'BINARY', 'DISPLAY'
field.occurs          # int | None  — fixed array size
field.occurs_max      # int | None  — upper bound for variable arrays
field.depending_on    # str | None  — field holding current array size
field.indexed_by      # str | None  — index name for OCCURS tables
field.value           # str | None  — VALUE clause content
field.sign_separate   # bool — SIGN LEADING/TRAILING SEPARATE present
field.synchronized    # bool — SYNC/SYNCHRONIZED present

# Derived properties
field.is_filler         # name == 'FILLER'
field.is_condition      # level == 88
field.is_group          # no PIC clause
field.is_redefine       # has REDEFINES clause
field.is_array          # has OCCURS clause
field.is_variable_array # has DEPENDING ON clause
field.is_numeric        # PIC contains 9
field.is_signed         # PIC starts with S

CobolNode

A node in the field hierarchy tree. Wraps a CobolField with parent/child references.

node.cobol_field      # CobolField
node.parent           # CobolNode | None
node.children         # list[CobolNode]
node.level            # shortcut to cobol_field.level
node.name             # shortcut to cobol_field.name
node.to_dict()        # JSON string — parent serialized as name string to avoid circular refs
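
A node that points to its parent while the parent lists it as a child forms a cycle, which a JSON encoder cannot walk. The sketch below shows the idea of serializing the parent as a name only. NodeSketch and to_plain are hypothetical names, and the exact keys the library emits are not documented here.

```python
import json
from dataclasses import dataclass, field


@dataclass(eq=False)
class NodeSketch:
    """Hypothetical stand-in for CobolNode."""
    name: str
    level: int
    parent: "NodeSketch | None" = field(default=None, repr=False)
    children: list["NodeSketch"] = field(default_factory=list)

    def to_plain(self) -> dict:
        return {
            "name": self.name,
            "level": self.level,
            # Emit only the parent's name so the structure stays acyclic.
            "parent": self.parent.name if self.parent else None,
            "children": [child.to_plain() for child in self.children],
        }


root = NodeSketch("CLAIM-RECORD", 1)
child = NodeSketch("POLICY-TYPE", 15, parent=root)
root.children.append(child)
print(json.dumps(root.to_plain(), indent=2))
```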

Examples

Parse and print a copybook tree

from copybook.parser import read_copybook, read_statements, parse_fields
from copybook.tree import build_tree
from copybook.render import render_copybook

lines      = read_copybook("CLAIMREC.cpy")
statements = read_statements(lines)
fields     = parse_fields(statements)
tree       = build_tree(fields)

print(render_copybook(tree, indent_size=4))

Output:

       01  Claim-Record.
           12 Insured-Details.
               15 Insured-Policy-No        PIC 9(07).
               15 Insured-Last-Name        PIC X(15).
           12 Policy-Details.
               15 Policy-Type              PIC 9.
                   88 Private              VALUE 1.
                   88 Medicare             VALUE 2.

Generate a flat copybook (no groups, no REDEFINES)

from copybook.tree import build_flat_tree
from copybook.render import render_flat_copybook

flat = build_flat_tree(fields)
print(render_flat_copybook(flat, "FLAT-CLAIMREC"))

Output:

       01  FLAT-CLAIMREC.
           05 Insured-Policy-No        PIC 9(07).
           05 Insured-Last-Name        PIC X(15).
           05 Insured-First-Name       PIC X(10).
           05 Policy-Type              PIC 9.
           05 Policy-Benefit-Date-Num  PIC 9(08).
           05 Policy-Amount            PIC S9(7)V99.

Inspect a field

for field in fields:
    if field.is_numeric:
        print(f"{field.name}: PIC {field.picture}")

Serialize a tree node to JSON

tree = build_tree(fields)
print(tree[0].to_dict())

Project Structure

copybook/
    models.py          — CobolLine, CobolStatement, CobolField, CobolNode
    patterns.py        — regex patterns, layout constants, reserved words
    parser.py          — read_copybook, read_statements, parse_fields
    tree.py            — build_tree, build_flat_tree
    render.py          — render_copybook, render_flat_copybook
    reserved_words.txt — COBOL reserved word list

Supported COBOL Features

Feature Supported
Fixed-format 80-column records
Full-line comments (*, / indicator)
Continuation lines (- indicator)
Inline comments (*>)
Multi-line statements
PIC / PICTURE clause
REDEFINES
OCCURS n TIMES
OCCURS n TO m TIMES DEPENDING ON
INDEXED BY
VALUE / VALUES ARE
USAGE / COMP / COMP-3 / BINARY etc.
SIGN LEADING/TRAILING SEPARATE
SYNCHRONIZED / SYNC
RENAMES / RENAMES THRU
Level 66, 77, 88
:tag: style replacement tokens
Free-format COBOL

Roadmap

  • PIC clause byte length calculator (DISPLAY, COMP-3, BINARY, etc.)
  • Field offset and position computation
  • REDEFINES-aware offset handling (overlapping storage)
  • OCCURS multiplier in offset calculation
  • Raw record buffer slicing by field name
  • Multiple memory view strategies (full tree, flat, storage-only)
  • COPY statement resolution across multiple copybook files

License

GPL-3.0 — see LICENSE for details.
