Extract PDS members from IEBPTPCH output files with support for both ASCII and EBCDIC formats

IEBPTPCH PDS Extractor

A command-line utility and Python library for extracting PDS members from IEBPTPCH output files. It handles both ASCII and EBCDIC input files and converts EBCDIC content to ASCII (UTF-8) during extraction.

Overview

This utility processes output files created by the IBM IEBPTPCH utility, which can punch the members of a Partitioned Data Set (PDS) into a single sequential data set. The typical workflow is:

  1. Create IEBPTPCH output using JCL (see Creating IEBPTPCH Output)
  2. Transfer the file from the mainframe to your local system (binary mode keeps the original EBCDIC bytes)
  3. Extract individual members using this Python utility

Why Use This Tool Instead of FTP Clients?

While FTP clients like FileZilla can transfer mainframe files and convert EBCDIC to ASCII, this tool offers key advantages for mainframe migration projects:

🔄 Migration-Critical Benefits

  • One-Time Binary Transfer: Transfer the IEBPTPCH file once in binary mode, then run as many EBCDIC-to-ASCII conversions as you need locally, without re-transferring it from the mainframe
  • Encoding Preservation: Mainframe source code often contains hard-coded special characters that need the correct code page. If a conversion used the wrong encoding, you can simply re-run it against the original EBCDIC file locally, without asking customers to re-transfer the file or check the mainframe again
  • Multiple Encoding Support: Supports 25+ EBCDIC code pages with automatic fallback for better compatibility
  • Individual Member Extraction: Extracts each PDS member as a separate file with proper member names, rather than a single large file
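To see why the code page matters, here is a small illustration that uses only Python's built-in codecs (it does not call the extractor): the same two bytes decode to different characters under cp037 and cp500.

```python
# Two raw EBCDIC bytes as they might appear in a binary-transferred file.
raw = bytes([0x4A, 0x5A])

us_canada = raw.decode("cp037")      # IBM EBCDIC US/Canada
international = raw.decode("cp500")  # IBM EBCDIC International

print(f"cp037: {us_canada!r}  cp500: {international!r}")
# prints: cp037: '¢!'  cp500: '[]'
```

Because the original bytes stay on your machine, trying another code page is just a matter of re-running the conversion.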

⚡ Automation Benefits

  • Scriptable: Command-line interface and Python API for integration into migration pipelines
  • File Extensions: Add appropriate extensions (.jcl, .cbl, .asm, etc.) for better file organization
  • Batch Processing: Process entire libraries without manual intervention

💼 Migration Efficiency

  • Reduced Mainframe Load: Minimize mainframe resource usage and connect time
  • Faster Iteration: Test different encodings and processing options locally
  • Cost Efficiency: Reduce mainframe costs during migration projects

💡 Best Practice: Use this tool when migrating mainframe source code to ensure accurate encoding conversion and efficient member extraction.

Installation

From PyPI (Recommended)

pip install iebptpch-pds-extractor

From Source

git clone https://github.com/arunkumars-mf/iebptpch-pds-extractor.git
cd iebptpch-pds-extractor
pip install .

Development Installation

git clone https://github.com/arunkumars-mf/iebptpch-pds-extractor.git
cd iebptpch-pds-extractor
pip install -e .

Creating IEBPTPCH Output

Use this JCL to convert your PDS to a sequential file suitable for this extractor:

//PDSEXTJ JOB 'PDS 2 PS',CLASS=A,MSGCLASS=X,NOTIFY=&SYSUID
//*
//IEBPTPCH EXEC PGM=IEBPTPCH
//*
//SYSUT1 DD DISP=SHR,DSN=<YOUR.SOURCE.LIBRARY>
//*
//SYSUT2 DD DSN=<YOUR.SOURCE.LIBRARY.PS>,
//          DISP=(NEW,CATLG,DELETE),UNIT=SYSDA,
//          SPACE=(CYL,(5,5),RLSE)
//*
//SYSPRINT DD SYSOUT=*
//SYSIN DD *
 PUNCH TYPORG=PO
/*

Replace:

  • <YOUR.SOURCE.LIBRARY> with your actual PDS name
  • <YOUR.SOURCE.LIBRARY.PS> with your desired output dataset name

Notes:

  • The PUNCH TYPORG=PO control statement tells IEBPTPCH to punch the members of a partitioned data set (all members, since no MEMBER statements are given)
  • The output file will contain all PDS members with member name headers
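  • Transfer this output file to your local system for processing with this Python utility: a binary transfer keeps the original EBCDIC bytes (extract with -f ebcdic), while an ASCII/text-mode transfer converts the file to ASCII (extract with the default -f ascii)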
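If you are not sure how a file was transferred, a quick byte count usually tells you. The sketch below is a hypothetical heuristic (not necessarily how the extractor's own format detection works): the EBCDIC blank is 0x40, while the ASCII blank is 0x20, and source listings are full of blanks.

```python
def looks_like_ebcdic(data: bytes) -> bool:
    """Hypothetical heuristic: guess whether a transferred file is still EBCDIC.

    In EBCDIC the blank is byte 0x40 (an '@' in ASCII); in ASCII it is 0x20.
    Whichever blank byte dominates the sample is a strong hint.
    """
    sample = data[:8192]
    return sample.count(0x40) > sample.count(0x20)
```

For example, `looks_like_ebcdic(Path("input.txt").read_bytes())` (with `from pathlib import Path`) returning True suggests running the extractor with -f ebcdic.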

Features

  • Extract individual PDS members from IEBPTPCH output files
  • Support for both ASCII and EBCDIC input formats
  • Automatic format detection with manual override option
  • Configurable EBCDIC encoding (default: cp037) with automatic fallback to alternative encodings
  • Add custom file extensions to extracted members
  • Customizable member name detection pattern with multiple fallback patterns
  • Support for logical record length (LRECL) processing
  • Robust error handling and encoding fallback mechanisms
  • Both command-line interface and Python API
  • Cross-platform compatibility (Windows, macOS, Linux)

Command Line Usage

After installation, the iebptpch-pds-extractor command will be available:

iebptpch-pds-extractor -i INPUT_FILE -o OUTPUT_DIRECTORY [options]

Required Arguments

  • -i, --input: Input IEBPTPCH output file path
  • -o, --output: Output directory for extracted PDS members

Optional Arguments

  • -f, --format: Input file format (ascii or ebcdic, default: ascii)
  • -e, --extension: File extension to add to extracted members (without dot)
  • -d, --delimiter: Regular expression pattern to identify member names (default: MEMBER\s+NAME\s+(\S+))
  • -c, --encoding: EBCDIC encoding to use for conversion (default: cp037, only used when format is ebcdic)
  • -l, --lrecl: Logical record length (default: 81, i.e. 80 data bytes plus 1 carriage-control character in the first position)
  • -v, --verbose: Enable verbose output
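Choosing the right -l value is simple arithmetic for binary transfers. Assuming fixed-length (RECFM=FB) records transferred in binary mode, the file is just records laid end to end with no line terminators, so its size should be an exact multiple of the LRECL. The helper below is a hypothetical sanity check, not part of the package:

```python
def record_count(file_size: int, lrecl: int = 81) -> int:
    """Number of fixed-length records in a binary (EBCDIC) transfer.

    Assumes RECFM=FB data with no line terminators between records.
    """
    if lrecl <= 0:
        raise ValueError("lrecl must be positive")
    records, leftover = divmod(file_size, lrecl)
    if leftover:
        raise ValueError(
            f"{file_size} bytes is not a multiple of LRECL {lrecl}; "
            "check the data set's record length"
        )
    return records
```

An 8,100-byte file, for instance, holds 100 records at the default LRECL of 81 but does not divide evenly by 133, which hints that -l 133 would be the wrong choice for it.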

Examples

Basic Usage

Extract members from an ASCII file:

iebptpch-pds-extractor -i input.txt -o output_dir

EBCDIC Input

Extract members from an EBCDIC file:

iebptpch-pds-extractor -i input.txt -o output_dir -f ebcdic

Add File Extensions

Extract members and add a file extension that matches the library's content type:

JCL Files

iebptpch-pds-extractor -i JCL_LIBRARY.txt -o output_dir -e jcl

COBOL Source Files

iebptpch-pds-extractor -i COBOL_LIBRARY.txt -o output_dir -e cbl

Assembler Source Files

iebptpch-pds-extractor -i ASM_LIBRARY.txt -o output_dir -e asm

Other File Types

# Procedures
iebptpch-pds-extractor -i PROC_LIBRARY.txt -o output_dir -e proc

# PL/I Source Files
iebptpch-pds-extractor -i PLI_LIBRARY.txt -o output_dir -e pli

# REXX Scripts
iebptpch-pds-extractor -i REXX_LIBRARY.txt -o output_dir -e rexx

# Include Files
iebptpch-pds-extractor -i INCLUDE_LIBRARY.txt -o output_dir -e inc
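For batch processing, the per-library commands above can be generated from a script. This is a sketch under stated assumptions: the prefix-to-extension mapping simply mirrors the examples above, the libraries/ and extracted/ directory names are hypothetical, and it uses -f ebcdic on the assumption that the files were transferred in binary mode. Only the documented -i, -o, -f and -e flags are used.

```python
import shutil
import subprocess
from pathlib import Path

# Hypothetical mapping from library-name prefix to the -e extension,
# following the examples above (JCL_LIBRARY.txt -> jcl, and so on).
EXTENSIONS = {
    "JCL": "jcl",
    "COBOL": "cbl",
    "ASM": "asm",
    "PROC": "proc",
    "PLI": "pli",
    "REXX": "rexx",
    "INCLUDE": "inc",
}


def build_command(input_file: Path, output_root: Path,
                  file_format: str = "ebcdic") -> list:
    """Build one extractor command line using only the documented flags."""
    prefix = input_file.stem.split("_")[0].upper()
    command = [
        "iebptpch-pds-extractor",
        "-i", str(input_file),
        "-o", str(output_root / input_file.stem),  # one folder per library
        "-f", file_format,
    ]
    extension = EXTENSIONS.get(prefix)
    if extension:
        command += ["-e", extension]
    return command


if __name__ == "__main__":
    # Hypothetical layout: transferred files in ./libraries, output in ./extracted.
    if shutil.which("iebptpch-pds-extractor") is None:
        print("iebptpch-pds-extractor is not on PATH; install the package first.")
    else:
        for path in sorted(Path("libraries").glob("*_LIBRARY.txt")):
            subprocess.run(build_command(path, Path("extracted")), check=True)
```

Libraries whose prefix is not in the mapping are still extracted, just without an extension.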

Advanced Options

Custom EBCDIC encoding:

iebptpch-pds-extractor -i input.txt -o output_dir -f ebcdic -c cp500

Custom delimiter pattern:

iebptpch-pds-extractor -i input.txt -o output_dir -d "^MEMBER:\s+(\S+)"
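Before passing a custom -d pattern, it can help to try it on a few sample lines. This sketch assumes the member name is taken from the pattern's first capture group (as in both documented patterns); the sample header lines are made up for illustration.

```python
import re

DEFAULT_PATTERN = r"MEMBER\s+NAME\s+(\S+)"  # documented default for -d
CUSTOM_PATTERN = r"^MEMBER:\s+(\S+)"         # the custom example above


def member_name(line: str, pattern: str):
    """Return the text captured by the pattern's first group, or None."""
    match = re.search(pattern, line)
    return match.group(1) if match else None


print(member_name(" MEMBER NAME  PAYROLL", DEFAULT_PATTERN))     # PAYROLL
print(member_name("MEMBER: PAYROLL", CUSTOM_PATTERN))            # PAYROLL
print(member_name("//STEP1 EXEC PGM=IEFBR14", DEFAULT_PATTERN))  # None
```

Note that a ^-anchored pattern such as the custom example only matches when the header text starts in the very first column; if header lines in your file begin with a carriage-control character, the pattern has to allow for it.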

Custom LRECL:

iebptpch-pds-extractor -i input.txt -o output_dir -f ebcdic -l 133

Combining options:

iebptpch-pds-extractor -i COBOL_LIBRARY.txt -o output_dir -f ebcdic -e cbl -l 133 -v

Python API Usage

You can also use the extractor programmatically:

from iebptpch_pds_extractor import PDSExtractor

# Create extractor instance
extractor = PDSExtractor(
    input_file="path/to/input.txt",
    output_dir="path/to/output",
    file_format="ascii",  # or "ebcdic"
    extension="jcl",      # optional file extension
    verbose=True
)

# Extract members
member_count = extractor.extract()
print(f"Extracted {member_count} members")

API Parameters

  • input_file (str): Path to the input IEBPTPCH output file
  • output_dir (str): Directory where extracted members will be saved
  • file_format (str): Input file format ('ascii' or 'ebcdic', default: 'ascii')
  • extension (str): File extension to add to extracted members (default: '')
  • delimiter (str): Regular expression pattern to identify member names
  • encoding (str): EBCDIC encoding to use for conversion (default: 'cp037')
  • lrecl (int): Logical record length (default: 81)
  • verbose (bool): Enable verbose output (default: False)

Supported EBCDIC Encodings

Common EBCDIC Encodings

  • cp037 - IBM EBCDIC US/Canada (default)
  • cp500 - IBM EBCDIC International
  • cp1047 - IBM EBCDIC Latin-1/Open Systems

Country-specific EBCDIC Encodings

  • cp273 - IBM EBCDIC Germany
  • cp277 - IBM EBCDIC Denmark/Norway
  • cp278 - IBM EBCDIC Finland/Sweden
  • cp280 - IBM EBCDIC Italy
  • cp284 - IBM EBCDIC Spain
  • cp285 - IBM EBCDIC UK
  • cp297 - IBM EBCDIC France
  • And many more...

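Python's built-in codecs include only some of these code pages (for example cp037, cp273 and cp500; cp1047 is not among them). For the full list of code pages that Python itself ships, see the Standard Encodings table in the Python codecs documentation.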
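To check which of the listed code pages your own Python interpreter provides as a built-in codec, a short lookup loop is enough. This only inspects the standard codec registry; it says nothing about how the extractor handles code pages that are missing from it.

```python
import codecs

# EBCDIC code pages from the list above.
CANDIDATES = ["cp037", "cp500", "cp1047", "cp273", "cp277", "cp278",
              "cp280", "cp284", "cp285", "cp297"]


def has_builtin_codec(name: str) -> bool:
    """True if this Python interpreter has a codec registered under this name."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


for name in CANDIDATES:
    status = "built in" if has_builtin_codec(name) else "not in the standard library"
    print(f"{name}: {status}")
```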

Requirements

  • Python 3.6 or higher
  • No external dependencies required (uses standard library only)

How It Works

  1. The script reads the input file in binary mode
  2. If the format is EBCDIC, it converts each line to ASCII using the specified encoding
  3. It processes the content based on the specified LRECL (logical record length)
  4. It identifies member names using the provided delimiter pattern
  5. For each member, it creates a new file in the output directory
  6. Content lines are written to the appropriate member file, with the first character (carriage control) removed
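The steps above can be sketched in a few dozen lines of standard-library Python. This is a simplified, standalone illustration, not the package's source: read_records and extract_members are hypothetical names, encoding fallback and alternative delimiter patterns are omitted, and stripping the trailing blanks that pad fixed-length records is a choice made for this sketch.

```python
import re
from pathlib import Path

# Default member-header pattern documented for the -d/--delimiter option.
DEFAULT_DELIMITER = r"MEMBER\s+NAME\s+(\S+)"


def read_records(data, file_format="ascii", encoding="cp037", lrecl=81):
    """Steps 1-3: turn the raw bytes of the input file into text records."""
    if file_format == "ebcdic":
        # A binary transfer of fixed-length records has no line breaks, so
        # decode everything and cut it into LRECL-sized slices. This assumes
        # a single-byte code page, where one byte decodes to one character.
        text = data.decode(encoding)
        return [text[i:i + lrecl] for i in range(0, len(text), lrecl)]
    # An ASCII (text-mode) transfer already has one record per line.
    return data.decode("utf-8", errors="replace").splitlines()


def extract_members(data, output_dir, extension="",
                    delimiter=DEFAULT_DELIMITER, **read_options):
    """Steps 4-6: write one output file per PDS member; return the count."""
    pattern = re.compile(delimiter)
    members = {}  # member name -> list of content lines
    current = None
    for record in read_records(data, **read_options):
        match = pattern.search(record)
        if match:
            current = match.group(1)
            members[current] = []
        elif current is not None:
            # Drop the carriage-control character in column 1, plus the
            # trailing blanks that pad fixed-length records.
            members[current].append(record[1:].rstrip())

    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    suffix = f".{extension}" if extension else ""
    for name, lines in members.items():
        (out / f"{name}{suffix}").write_text("\n".join(lines) + "\n",
                                             encoding="utf-8")
    return len(members)
```

Usage, mirroring the CLI example `-f ebcdic -e jcl`: `extract_members(Path("input.txt").read_bytes(), "output_dir", extension="jcl", file_format="ebcdic")`.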

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Changelog

See CHANGELOG.md for version history and changes.

Citation

If you use this software in your research or project, please cite it as:

@software{selvam_iebptpch_pds_extractor_2024,
  author = {Selvam, Arunkumar},
  title = {IEBPTPCH PDS Extractor},
  url = {https://github.com/arunkumars-mf/iebptpch-pds-extractor},
  version = {1.0.2},
  year = {2024}
}

APA Style: Selvam, A. (2024). IEBPTPCH PDS Extractor (Version 1.0.2) [Computer software]. https://github.com/arunkumars-mf/iebptpch-pds-extractor

IEEE Style: A. Selvam, "IEBPTPCH PDS Extractor," Version 1.0.2, 2024. [Online]. Available: https://github.com/arunkumars-mf/iebptpch-pds-extractor

Download files

Source Distribution

iebptpch_pds_extractor-1.0.2.tar.gz (20.3 kB)

Built Distribution

iebptpch_pds_extractor-1.0.2-py3-none-any.whl (11.2 kB)

File details

Details for the file iebptpch_pds_extractor-1.0.2.tar.gz.

File metadata

  • Download URL: iebptpch_pds_extractor-1.0.2.tar.gz
  • Size: 20.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.5

File hashes

Hashes for iebptpch_pds_extractor-1.0.2.tar.gz
Algorithm Hash digest
SHA256 a5251d7467c286071700b82182d38b24470402a9621ebedc90c962dc403503e2
MD5 50ed5f96e36cb1162d5cc0d0e2e34859
BLAKE2b-256 ba932917ec6ac4b59730e8f6fb0865c38c120ccb308373a466808447403c9d81

File details

Details for the file iebptpch_pds_extractor-1.0.2-py3-none-any.whl.

File hashes

Hashes for iebptpch_pds_extractor-1.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 e15cd122f15337e42c2d323d1fc56acbb66aba09644e3aff4e0691ea16c50753
MD5 9fdd3af7aaa1285327b3dd9e643f846d
BLAKE2b-256 fd774f0e1bc2f4fb7cf4249bfb4d7ac3a979923c190aded3c7a50b0a66070081
