A Python library to convert COBOL EBCDIC files to Parquet format based on a copybook

pycobol2parquet

pycobol2parquet is a Python library that converts COBOL EBCDIC files to Parquet format.

I released pycobol2csv in 2021, and it has since been deployed to multiple production systems. One piece of feedback I received asked whether COBOL files could be converted directly to Parquet for analytical workloads.

Reusing the same underlying knowledge and code to generate Parquet files is straightforward.

Install the Python module:

pip install pycobol2parquet

To use the module:

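from pycobol2parquet import convert_cobol_file, decode_copybook_file

# Parse the copybook to get the record length and the field structure.
row_length, cobol_struc = decode_copybook_file(copybook_file)

# Convert the EBCDIC data file to Parquet using the copybook layout.
convert_cobol_file(copybook_file, data_file, output_file, codepage, debug=False)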

Please refer to convert_cobol_test_main.py for details.
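
To illustrate what the codepage argument is about, here is a minimal sketch that decodes one fixed-length EBCDIC record using only Python's built-in codecs. It does not use pycobol2parquet. The function name decode_fixed_record, the two-field layout, and the choice of "cp037" are assumptions made for illustration; the codepage names that the library actually accepts are not stated here.

```python
# Assumption: "cp037" (EBCDIC US/Canada) stands in for the codepage value.
CODEPAGE = "cp037"


def decode_fixed_record(record, widths, codepage=CODEPAGE):
    """Slice one fixed-length record into text fields and decode each from EBCDIC."""
    fields = []
    offset = 0
    for width in widths:
        raw = record[offset:offset + width]
        # Decode the EBCDIC bytes and drop trailing padding spaces.
        fields.append(raw.decode(codepage).rstrip())
        offset += width
    return fields


# Build a sample record by encoding text to EBCDIC.
# Hypothetical layout: NAME PIC X(5), STATE PIC X(3).
sample = "ALICE".encode(CODEPAGE) + "NY ".encode(CODEPAGE)
print(decode_fixed_record(sample, [5, 3]))  # ['ALICE', 'NY']
```

This sketch only covers plain text (PIC X) fields. Numeric formats such as packed decimal need their own decoding, which is driven by the copybook.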

test

Two sets of test data have been created from scratch. Each set includes a copybook and an EBCDIC data file.

To test:

python convert_cobol_test_main.py --copybook testdata\test2\DWSTUB.txt --data testdata\test2\DWSTUB_DATA.DAT --output DWSTUB_DATA_output.parquet

known issues and limitations

  • Be aware of the resources available in your runtime environment, and make sure the COBOL file is not so large that it exceeds those limits or causes performance issues.

To handle large COBOL files, you can split them into smaller chunks and then process the chunks in parallel. Please refer to the Medium post for details.

  • When creating Parquet files, the library detects data types automatically. This keeps the parameters passed to the conversion function simple.
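
The advice about splitting large files can be sketched as follows. This is not the approach from the Medium post; it is a minimal example that assumes the data file holds fixed-length records with no separators, so that every chunk can end on a record boundary. The helper name split_fixed_length_file, the rows_per_chunk parameter, and the chunk file naming are hypothetical. The row_length value is assumed to be the one returned by decode_copybook_file.

```python
from pathlib import Path


def split_fixed_length_file(data_file, row_length, rows_per_chunk, out_dir):
    """Split a fixed-length record file into chunk files that end on record boundaries."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Each chunk holds a whole number of records, so no record is cut in half.
    chunk_bytes = row_length * rows_per_chunk
    chunk_paths = []
    with open(data_file, "rb") as src:
        index = 0
        while True:
            block = src.read(chunk_bytes)
            if not block:
                break
            chunk_path = out_dir / f"chunk_{index:05d}.dat"
            chunk_path.write_bytes(block)
            chunk_paths.append(chunk_path)
            index += 1
    return chunk_paths
```

Each chunk path could then be passed to convert_cobol_file with the same copybook, for example from a process pool, producing one Parquet file per chunk. That step is left out so the sketch runs without the library installed.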

