Extract PDS members from IEBPTPCH output files with support for both ASCII and EBCDIC formats
IEBPTPCH PDS Extractor
A command line utility and Python library to extract PDS members from IEBPTPCH output files. This tool can handle both ASCII and EBCDIC formatted input files and convert EBCDIC content to ASCII (UTF-8) during extraction.
Overview
This utility processes output files created by the IBM IEBPTPCH utility, which converts Partitioned Data Sets (PDS) to sequential files. The typical workflow is:
- Create IEBPTPCH output using JCL (see Creating IEBPTPCH Output)
- Transfer the file from mainframe to your local system
- Extract individual members using this Python utility
Why Use This Tool Instead of FTP Clients?
While FTP clients like FileZilla can transfer mainframe files and convert EBCDIC to ASCII, this tool offers key advantages for mainframe migration projects:
🔄 Migration-Critical Benefits
- One-Time Binary Transfer: Transfer the IEBPTPCH file once in binary mode, then perform multiple EBCDIC-to-ASCII conversions locally without re-transferring from mainframe
- Encoding Preservation: Mainframe source code often contains hard-coded special characters that require precise encoding conversion. Because the original EBCDIC file stays on your local system, you can re-check a suspect conversion against it without asking the customer to re-transfer the file or inspect the mainframe again
- Multiple Encoding Support: Supports 25+ EBCDIC code pages with automatic fallback for better compatibility
- Individual Member Extraction: Extracts each PDS member as a separate file with proper member names, rather than a single large file
⚡ Automation Benefits
- Scriptable: Command-line interface and Python API for integration into migration pipelines
- File Extensions: Add appropriate extensions (.jcl, .cbl, .asm, etc.) for better file organization
- Batch Processing: Process entire libraries without manual intervention
💼 Migration Efficiency
- Reduced Mainframe Load: Minimize mainframe resource usage and connect time
- Faster Iteration: Test different encodings and processing options locally
- Cost Efficiency: Reduce mainframe costs during migration projects
💡 Best Practice: Use this tool when migrating mainframe source code to ensure accurate encoding conversion and efficient member extraction.
Installation
From PyPI (Recommended)
pip install iebptpch-pds-extractor
From Source
git clone https://github.com/arunkumars-mf/iebptpch-pds-extractor.git
cd iebptpch-pds-extractor
pip install .
Development Installation
git clone https://github.com/arunkumars-mf/iebptpch-pds-extractor.git
cd iebptpch-pds-extractor
pip install -e .
Creating IEBPTPCH Output
Use this JCL to convert your PDS to a sequential file suitable for this extractor:
//PDSEXTJ JOB 'PDS 2 PS',CLASS=A,MSGCLASS=X,NOTIFY=&SYSUID
//*
//IEBPTPCH EXEC PGM=IEBPTPCH
//*
//SYSUT1 DD DISP=SHR,DSN=<YOUR.SOURCE.LIBRARY>
//*
//SYSUT2 DD DSN=<YOUR.SOURCE.LIBRARY.PS>,
// DISP=(NEW,CATLG,DELETE),UNIT=SYSDA,
// SPACE=(CYL,(5,5),RLSE)
//*
//SYSPRINT DD SYSOUT=*
//SYSIN DD *
  PUNCH TYPORG=PO
/*
Replace:
- <YOUR.SOURCE.LIBRARY> with your actual PDS name
- <YOUR.SOURCE.LIBRARY.PS> with your desired output dataset name
Notes:
- The PUNCH TYPORG=PO control statement tells IEBPTPCH to process a partitioned dataset
- The output file will contain all PDS members with member name headers
- Transfer this output file to your local system for processing with this Python utility
Features
- Extract individual PDS members from IEBPTPCH output files
- Support for both ASCII and EBCDIC input formats
- Automatic format detection with manual override option
- Configurable EBCDIC encoding (default: cp037) with automatic fallback to alternative encodings
- Add custom file extensions to extracted members
- Customizable member name detection pattern with multiple fallback patterns
- Support for logical record length (LRECL) processing
- Robust error handling and encoding fallback mechanisms
- Multiple member name detection patterns for improved compatibility
- Both command-line interface and Python API
- Cross-platform compatibility (Windows, macOS, Linux)
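The automatic format detection listed above can be pictured with a simple heuristic. The sketch below is an illustration only and is not taken from the package's source: the function name `guess_format` is hypothetical, and it assumes the file contains the `MEMBER NAME` header text that the default delimiter pattern looks for.

```python
def guess_format(data: bytes, ebcdic_encoding: str = "cp037") -> str:
    """Guess whether raw IEBPTPCH output is ASCII or EBCDIC.

    Hypothetical heuristic: look for the "MEMBER NAME" header marker,
    first as ASCII bytes, then as bytes in the given EBCDIC code page.
    """
    marker = "MEMBER NAME"
    if marker.encode("ascii") in data:
        return "ascii"
    if marker.encode(ebcdic_encoding) in data:
        return "ebcdic"
    # Neither form of the marker was found; the caller should fall back
    # to an explicit -f / file_format setting.
    return "unknown"


if __name__ == "__main__":
    header = "VMEMBER NAME  PAYROLL1"  # made-up sample header record
    print(guess_format(header.encode("ascii")))
    print(guess_format(header.encode("cp037")))
```

When a heuristic like this cannot decide, passing the format explicitly with `-f` is the safer choice.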
Command Line Usage
After installation, the iebptpch-pds-extractor command will be available:
iebptpch-pds-extractor -i INPUT_FILE -o OUTPUT_DIRECTORY [options]
Required Arguments
- -i, --input: Input IEBPTPCH output file path
- -o, --output: Output directory for extracted PDS members
Optional Arguments
- -f, --format: Input file format (ascii or ebcdic, default: ascii)
- -e, --extension: File extension to add to extracted members (without dot)
- -d, --delimiter: Regular expression pattern to identify member names (default: MEMBER\s+NAME\s+(\S+))
- -c, --encoding: EBCDIC encoding to use for conversion (default: cp037, only used when format is ebcdic)
- -l, --lrecl: Logical record length (default: 81, which is 80 + 1 for the first character)
- -v, --verbose: Enable verbose output
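To see what the default delimiter pattern captures, the following sketch applies it to a sample record with Python's `re` module. The sample header line and the helper name `find_member_name` are assumptions made for illustration; real header records may be laid out differently.

```python
import re

# Default delimiter pattern as documented for the -d / --delimiter option.
DEFAULT_DELIMITER = r"MEMBER\s+NAME\s+(\S+)"


def find_member_name(record, pattern=DEFAULT_DELIMITER):
    """Return the captured member name, or None for an ordinary content record."""
    match = re.search(pattern, record)
    return match.group(1) if match else None


if __name__ == "__main__":
    # Hypothetical records: a member header and a normal JCL line.
    print(find_member_name("VMEMBER NAME  PAYROLL1"))
    print(find_member_name(" //STEP1 EXEC PGM=IEFBR14"))
```

A custom pattern passed with `-d` should keep a single capture group around the member name, as the default does.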
Examples
Basic Usage
Extract members from an ASCII file:
iebptpch-pds-extractor -i input.txt -o output_dir
EBCDIC Input
Extract members from an EBCDIC file:
iebptpch-pds-extractor -i input.txt -o output_dir -f ebcdic
Add File Extensions
Extract members and add file extensions based on content type:
JCL Files
iebptpch-pds-extractor -i JCL_LIBRARY.txt -o output_dir -e jcl
COBOL Source Files
iebptpch-pds-extractor -i COBOL_LIBRARY.txt -o output_dir -e cbl
Assembler Source Files
iebptpch-pds-extractor -i ASM_LIBRARY.txt -o output_dir -e asm
Other File Types
# Procedures
iebptpch-pds-extractor -i PROC_LIBRARY.txt -o output_dir -e proc
# PL/I Source Files
iebptpch-pds-extractor -i PLI_LIBRARY.txt -o output_dir -e pli
# REXX Scripts
iebptpch-pds-extractor -i REXX_LIBRARY.txt -o output_dir -e rexx
# Include Files
iebptpch-pds-extractor -i INCLUDE_LIBRARY.txt -o output_dir -e inc
Advanced Options
Custom EBCDIC encoding:
iebptpch-pds-extractor -i input.txt -o output_dir -f ebcdic -c cp500
Custom delimiter pattern:
iebptpch-pds-extractor -i input.txt -o output_dir -d "^MEMBER:\s+(\S+)"
Custom LRECL:
iebptpch-pds-extractor -i input.txt -o output_dir -f ebcdic -l 133
Combining options:
iebptpch-pds-extractor -i COBOL_LIBRARY.txt -o output_dir -f ebcdic -e cbl -l 133 -v
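For batch processing of several libraries, the commands above can be generated from a small Python script. This is a sketch under stated assumptions: the library file names, the output layout, and the helper names `build_command` and `run_batch` are illustrative, and only the flags documented above are used.

```python
import shlex
import subprocess

# Hypothetical mapping of transferred IEBPTPCH files to file extensions.
LIBRARIES = {
    "JCL_LIBRARY.txt": "jcl",
    "COBOL_LIBRARY.txt": "cbl",
    "ASM_LIBRARY.txt": "asm",
}


def build_command(input_file, output_dir, extension,
                  file_format="ebcdic", encoding="cp037"):
    """Build the argument list for one iebptpch-pds-extractor run."""
    return [
        "iebptpch-pds-extractor",
        "-i", input_file,
        "-o", output_dir,
        "-f", file_format,
        "-c", encoding,
        "-e", extension,
    ]


def run_batch(libraries, output_root, dry_run=True):
    """Print each command; execute it as well when dry_run is False."""
    commands = []
    for input_file, extension in libraries.items():
        # One output sub-directory per extension keeps member types apart.
        cmd = build_command(input_file, f"{output_root}/{extension}", extension)
        commands.append(cmd)
        print(" ".join(shlex.quote(part) for part in cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    run_batch(LIBRARIES, "output_dir")
```

Keeping `dry_run=True` lets you review the generated commands before running them against real files.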
Python API Usage
You can also use the extractor programmatically:
from iebptpch_pds_extractor import PDSExtractor
# Create extractor instance
extractor = PDSExtractor(
    input_file="path/to/input.txt",
    output_dir="path/to/output",
    file_format="ascii",  # or "ebcdic"
    extension="jcl",  # optional file extension
    verbose=True
)
# Extract members
member_count = extractor.extract()
print(f"Extracted {member_count} members")
API Parameters
- input_file (str): Path to the input IEBPTPCH output file
- output_dir (str): Directory where extracted members will be saved
- file_format (str): Input file format ('ascii' or 'ebcdic', default: 'ascii')
- extension (str): File extension to add to extracted members (default: '')
- delimiter (str): Regular expression pattern to identify member names
- encoding (str): EBCDIC encoding to use for conversion (default: 'cp037')
- lrecl (int): Logical record length (default: 81)
- verbose (bool): Enable verbose output (default: False)
Supported EBCDIC Encodings
Common EBCDIC Encodings
- cp037 - IBM EBCDIC US/Canada (default)
- cp500 - IBM EBCDIC International
- cp1047 - IBM EBCDIC Latin-1/Open Systems
Country-specific EBCDIC Encodings
- cp273 - IBM EBCDIC Germany
- cp277 - IBM EBCDIC Denmark/Norway
- cp278 - IBM EBCDIC Finland/Sweden
- cp280 - IBM EBCDIC Italy
- cp284 - IBM EBCDIC Spain
- cp285 - IBM EBCDIC UK
- cp297 - IBM EBCDIC France
- And many more...
For a complete list, see the Python codecs documentation.
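Choosing the right code page matters because letters and digits usually decode identically while some special characters do not. The sketch below decodes the same bytes with two code pages using only Python's built-in codecs; the helper name `decode_with` is hypothetical, and the sample bytes are made up for illustration.

```python
def decode_with(data, encodings):
    """Decode the same bytes with each code page and collect the results."""
    results = {}
    for name in encodings:
        try:
            results[name] = data.decode(name)
        except (LookupError, UnicodeDecodeError) as exc:
            # LookupError means this Python installation has no such codec.
            results[name] = f"<failed: {exc.__class__.__name__}>"
    return results


if __name__ == "__main__":
    # "IF A" in EBCDIC followed by byte 0x5A, which cp037 and cp500
    # map to different special characters.
    sample = "IF A".encode("cp037") + b"\x5a"
    for name, text in decode_with(sample, ["cp037", "cp500"]).items():
        print(f"{name}: {text!r}")
```

Comparing candidate decodings of a line that contains known special characters is a quick way to confirm which `-c` value matches the source system.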
Requirements
- Python 3.6 or higher
- No external dependencies required (uses standard library only)
How It Works
- The script reads the input file in binary mode
- If the format is EBCDIC, it converts each line to ASCII using the specified encoding
- It processes the content based on the specified LRECL (logical record length)
- It identifies member names using the provided delimiter pattern
- For each member, it creates a new file in the output directory
- Content lines are written to the appropriate member file, with the first character (carriage control) removed
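The steps above can be sketched in a few lines of Python. This is a minimal illustration and not the package's actual implementation: the function name `extract_members` is hypothetical, and it assumes a binary-transferred EBCDIC file made of fixed-length records with no line terminators.

```python
import re
from pathlib import Path


def extract_members(data, output_dir, lrecl=81, encoding="cp037",
                    pattern=r"MEMBER\s+NAME\s+(\S+)", extension=""):
    """Split fixed-length records into one file per member; return the member count."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)

    members = {}    # member name -> list of content lines
    current = None  # name of the member currently being collected

    # Walk the raw bytes one logical record (lrecl bytes) at a time.
    for offset in range(0, len(data), lrecl):
        record = data[offset:offset + lrecl].decode(encoding)
        match = re.search(pattern, record)
        if match:
            # A header record starts a new member.
            current = match.group(1)
            members[current] = []
        elif current is not None:
            # Drop the first (carriage-control) character and trailing padding.
            members[current].append(record[1:].rstrip())

    suffix = f".{extension}" if extension else ""
    for name, lines in members.items():
        (out / f"{name}{suffix}").write_text("\n".join(lines) + "\n",
                                             encoding="utf-8")
    return len(members)
```

Records that appear before the first member header are ignored in this sketch; how the real tool treats them is not covered here.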
Contributing
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Changelog
See CHANGELOG.md for version history and changes.
Citation
If you use this software in your research or project, please cite it as:
@software{selvam_iebptpch_pds_extractor_2024,
author = {Selvam, Arunkumar},
title = {IEBPTPCH PDS Extractor},
url = {https://github.com/arunkumars-mf/iebptpch-pds-extractor},
version = {1.0.2},
year = {2024}
}
APA Style: Selvam, A. (2024). IEBPTPCH PDS Extractor (Version 1.0.2) [Computer software]. https://github.com/arunkumars-mf/iebptpch-pds-extractor
IEEE Style: A. Selvam, "IEBPTPCH PDS Extractor," Version 1.0.2, 2024. [Online]. Available: https://github.com/arunkumars-mf/iebptpch-pds-extractor
Support
- Issues: GitHub Issues
- Documentation: Project Documentation
- Examples: Examples Directory
Related Projects
- COBOL Copybook to JSON - Convert COBOL copybooks to JSON schema format