A Python library to convert COBOL EBCDIC files to Parquet format based on a copybook
Project description
pycobol2parquet
pycobol2parquet is a Python library that converts COBOL EBCDIC files to Parquet format.
I released pycobol2csv back in 2021, and it has since been deployed to multiple production systems. One piece of feedback I received was a request to convert COBOL data directly to Parquet for analytical workloads.
It was straightforward to reuse the same underlying knowledge and code to generate Parquet files.
Install the Python module:
```
pip install pycobol2parquet
```
To use the module:
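```python
from pycobol2parquet import convert_cobol_file, decode_copybook_file
# Example inputs from the test2 data set; "cp037" (EBCDIC US/Canada) is an assumed codepage
copybook_file, data_file = "testdata/test2/DWSTUB.txt", "testdata/test2/DWSTUB_DATA.DAT"
output_file, codepage = "DWSTUB_DATA_output.parquet", "cp037"
row_length, cobol_struc = decode_copybook_file(copybook_file)
convert_cobol_file(copybook_file, data_file, output_file, codepage, debug=False)
```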
- copybook_file: copybook filename
- data_file: data filename
- output_file: output parquet filename
- codepage: EBCDIC codepage, given as a Python codec name such as cp037; refer to https://docs.python.org/3.7/library/codecs.html#standard-encodings for details
- debug: set to True for more debug information; defaults to False (off)
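Because `codepage` is a Python codec name, it decides how each EBCDIC byte maps to a character. The snippet below uses only the standard library, not pycobol2parquet, and assumes `cp037` (EBCDIC US/Canada) purely for illustration; your mainframe data may use another EBCDIC codepage such as `cp500` or `cp1140`.

```python
# Bytes spelling "HELLO 123" in EBCDIC code page 037 (US/Canada).
ebcdic_bytes = bytes([0xC8, 0xC5, 0xD3, 0xD3, 0xD6, 0x40, 0xF1, 0xF2, 0xF3])

# Decode with the same codec name you would pass as `codepage`.
text = ebcdic_bytes.decode("cp037")
print(text)  # HELLO 123

# The same byte can mean different characters in different EBCDIC codepages.
print(bytes([0x4A]).decode("cp037"))  # ¢ (cent sign)
print(bytes([0x4A]).decode("cp500"))  # [ (left square bracket)
```

Letters, digits and spaces usually agree across the common EBCDIC codepages, so a wrong codepage often shows up only in punctuation or national characters.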
Please refer to convert_cobol_test_main.py for details.
Test
Two sets of test data were created from scratch. Each set includes a copybook and an EBCDIC data file.
To test:
```
python convert_cobol_test_main.py --copybook testdata\test2\DWSTUB.txt --data testdata\test2\DWSTUB_DATA.DAT --output DWSTUB_DATA_output.parquet
```
Known issues and limitations
- Be aware of the resources available in your runtime environment, and make sure the COBOL file size does not exceed them or cause performance issues.
To handle large COBOL files, you can split them into smaller chunks and process the chunks in parallel. Please refer to the Medium post for details.
- When creating Parquet files, the library detects data types automatically. This keeps the number of parameters passed to the conversion function small.
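The chunking idea in the first item above can be sketched with the standard library. This is a minimal sketch, not part of pycobol2parquet and not necessarily the approach from the Medium post. It assumes fixed-length records with no record descriptor words, so every chunk boundary must fall on a multiple of the record length (for example the `row_length` returned by `decode_copybook_file`, assuming that value is the record length in bytes). The helper name `split_fixed_length_file` is hypothetical.

```python
import os


def split_fixed_length_file(data_file, row_length, rows_per_chunk, out_dir):
    """Split a fixed-length-record file into chunk files on record boundaries.

    Hypothetical helper: assumes every record is exactly `row_length` bytes
    (no RDW/BDW headers). Returns the list of chunk file paths in order.
    """
    if row_length <= 0 or rows_per_chunk <= 0:
        raise ValueError("row_length and rows_per_chunk must be positive")
    size = os.path.getsize(data_file)
    if size % row_length != 0:
        # A partial trailing record usually means the record length is wrong.
        raise ValueError(f"file size {size} is not a multiple of row_length {row_length}")

    chunk_bytes = row_length * rows_per_chunk
    os.makedirs(out_dir, exist_ok=True)
    base = os.path.basename(data_file)
    paths = []
    with open(data_file, "rb") as src:
        index = 0
        while True:
            # Each read returns whole records because chunk_bytes is a multiple of row_length.
            block = src.read(chunk_bytes)
            if not block:
                break
            path = os.path.join(out_dir, f"{base}.part{index:04d}")
            with open(path, "wb") as dst:
                dst.write(block)
            paths.append(path)
            index += 1
    return paths
```

Each chunk file could then be converted on its own, for example by submitting one `convert_cobol_file(copybook_file, chunk, chunk + ".parquet", codepage)` call per chunk to a `concurrent.futures.ProcessPoolExecutor`. This assumes the conversions are independent of each other and that whatever reads the output can handle a set of Parquet files.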
Built Distribution
File details
Details for the file pycobol2parquet-0.0.3-py3-none-any.whl.
File metadata
- Download URL: pycobol2parquet-0.0.3-py3-none-any.whl
- Upload date:
- Size: 9.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.13
File hashes
Algorithm | Hash digest
---|---
SHA256 | e0129a9fe9b2ade8f8cfc622357291938a0ec0c90f909bf51eaf7dd0e894f8f9
MD5 | 45e39a335c00d3f7a21f6fde28f1b484
BLAKE2b-256 | 55aed75cb1cf180cb1d127c01c52faca752538fa695addb9633d0ef009568201
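To check a downloaded wheel against the SHA256 digest listed above, the standard library's `hashlib` is enough. A minimal sketch; the helper name `sha256_of_file` is illustrative, and the path you pass is wherever you saved the wheel.

```python
import hashlib

# SHA256 digest published for pycobol2parquet-0.0.3-py3-none-any.whl.
EXPECTED_SHA256 = "e0129a9fe9b2ade8f8cfc622357291938a0ec0c90f909bf51eaf7dd0e894f8f9"


def sha256_of_file(path, block_size=1 << 16):
    """Return the hex SHA256 digest of a file, reading it in blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"" at EOF.
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()
```

For example, `sha256_of_file("pycobol2parquet-0.0.3-py3-none-any.whl") == EXPECTED_SHA256` should be `True` for an intact download. pip can also enforce this at install time in hash-checking mode (`--require-hashes`, with a `--hash=sha256:...` entry in a requirements file).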