cdef-converter

cdef-converter is a Python CLI tool that converts the CSV files in a directory to Parquet format, processing them in parallel.

Features

  • Convert multiple CSV files to Parquet format in parallel
  • Detect file encoding automatically
  • Generate summary of processed files
  • Track progress with rich console output

Installation

pip install cdef-converter

Usage

cdef-converter /path/to/input/directory --processes 4
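As a rough illustration of what a command like the one above might do, here is a minimal sketch that fans CSV files out to a pool of worker processes. It is not the tool's actual implementation: the function names (convert_one, plan_jobs, convert_all) are hypothetical, and the use of pyarrow for reading CSV and writing Parquet is an assumption, since this README does not say which library cdef-converter uses.

```python
from multiprocessing import Pool
from pathlib import Path


def convert_one(job: tuple[str, str]) -> str:
    """Convert one CSV file to Parquet and return the output path."""
    # Assumption: pyarrow does the conversion. It is imported here so the
    # module can still be loaded on a machine without pyarrow installed.
    import pyarrow.csv as pacsv
    import pyarrow.parquet as pq

    csv_path, out_dir = job
    table = pacsv.read_csv(csv_path)
    out_path = Path(out_dir) / (Path(csv_path).stem + ".parquet")
    pq.write_table(table, str(out_path))
    return str(out_path)


def plan_jobs(input_dir: str, out_dir: str) -> list[tuple[str, str]]:
    """Pair every *.csv file directly inside input_dir with the output directory."""
    return [(str(p), out_dir) for p in sorted(Path(input_dir).glob("*.csv"))]


def convert_all(input_dir: str, out_dir: str = "registers", processes: int = 4) -> list[str]:
    """Convert all CSV files in input_dir using a pool of worker processes."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    jobs = plan_jobs(input_dir, out_dir)
    with Pool(processes=processes) as pool:
        # Each worker handles one file at a time; results keep the job order.
        return pool.map(convert_one, jobs)
```

Calling convert_all("/path/to/input/directory", processes=4) from a script guarded by `if __name__ == "__main__":` would mirror the command above under these assumptions.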

Options

  • input_directory: Path to the directory containing CSV files (required)
  • output_directory: Path to the directory where Parquet files will be saved (default: ./registers)
  • --processes: Number of processes to use for parallel processing (default: 4)
  • --encoding-chunk-size: Chunk size in MB for encoding detection (default: 1 MB)
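The options above could be declared roughly as follows. This is a sketch using the standard library's argparse; the real tool may use a different CLI framework, and treating output_directory as an optional positional argument is an assumption based on how the list is written. build_parser is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="cdef-converter",
        description="Convert CSV files to Parquet format.",
    )
    parser.add_argument(
        "input_directory",
        help="Path to the directory containing CSV files",
    )
    # Assumption: output_directory is positional and may be omitted.
    parser.add_argument(
        "output_directory",
        nargs="?",
        default="./registers",
        help="Directory where Parquet files will be saved",
    )
    parser.add_argument(
        "--processes",
        type=int,
        default=4,
        help="Number of processes to use for parallel processing",
    )
    parser.add_argument(
        "--encoding-chunk-size",
        type=int,
        default=1,
        help="Chunk size in MB for encoding detection",
    )
    return parser
```

With this sketch, build_parser().parse_args(["/data/in", "--processes", "8"]) yields the documented defaults for every option that was not given.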

Output

  • Parquet files are saved in the output directory (default: ./registers)
  • A summary JSON file named register_summary.json is generated
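The layout of register_summary.json is not documented here, so the following sketch only loads the file and lists its top-level entries without assuming any particular schema. summary_entries is a hypothetical helper, not part of cdef-converter.

```python
import json
from pathlib import Path


def summary_entries(path: str = "register_summary.json") -> list[str]:
    """Return the top-level keys (or list positions) found in the summary file."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if isinstance(data, dict):
        # JSON object: report its keys in sorted order.
        return sorted(data)
    if isinstance(data, list):
        # JSON array: report one label per element.
        return [f"item {i}" for i in range(len(data))]
    # Any other JSON value has no entries to list.
    return []
```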

License

This project is licensed under the MIT License - see the LICENSE file for details.

Recent Changes

Encoding Chunk Size Modification

We've updated the encoding chunk size option to use megabytes (MB) instead of kilobytes (KB) for easier user input:

  • The default encoding chunk size is now 1 MB.
  • Users can specify the encoding chunk size in MB using the --encoding-chunk-size option.
  • Internally, the program converts the MB value to bytes for processing.
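The MB-to-bytes conversion and its use in sampling a file can be sketched as below. Two assumptions are made: 1 MB is taken to mean 1024 × 1024 bytes, and the detection step is a simple trial decode against a list of candidate encodings, which stands in for whatever detector the tool really uses. Both function names are hypothetical.

```python
import codecs


def mb_to_bytes(size_mb: int) -> int:
    """Convert a size in MB to bytes (assuming 1 MB = 1024 * 1024 bytes)."""
    return size_mb * 1024 * 1024


def detect_encoding(
    path: str,
    chunk_size_mb: int = 1,
    candidates: tuple[str, ...] = ("utf-8", "cp1252", "latin-1"),
) -> str:
    """Return the first candidate encoding that decodes the leading chunk."""
    with open(path, "rb") as fh:
        sample = fh.read(mb_to_bytes(chunk_size_mb))
    for encoding in candidates:
        decoder = codecs.getincrementaldecoder(encoding)()
        try:
            # final=False tolerates a multi-byte character cut off at the
            # chunk boundary, which a plain bytes.decode() would reject.
            decoder.decode(sample, final=False)
        except UnicodeDecodeError:
            continue
        return encoding
    raise ValueError(f"None of {candidates} could decode {path}")
```

Trial decoding is deliberately simple: it picks the first encoding that does not fail, so the order of candidates matters, and a statistical detector would likely do better on ambiguous input.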

Error Handling Improvements

  • Enhanced exception handling throughout the codebase.
  • Added more specific error messages for common issues like file not found and permission errors.
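One way such specific messages could be produced is to catch the built-in exceptions separately and re-raise them with a clearer description. This is a sketch of the pattern only; ConversionError and read_sample are hypothetical names, not taken from the cdef-converter codebase.

```python
class ConversionError(Exception):
    """Raised with a user-facing message when a file cannot be processed."""


def read_sample(path: str, size: int = 1024) -> bytes:
    """Read the first `size` bytes of a file, translating common OS errors."""
    try:
        with open(path, "rb") as fh:
            return fh.read(size)
    except FileNotFoundError as exc:
        # Chain the original exception so the traceback is preserved.
        raise ConversionError(f"Input file not found: {path}") from exc
    except PermissionError as exc:
        raise ConversionError(f"Permission denied while reading: {path}") from exc
```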

Type Hinting Updates

  • Updated type hints to be compatible with Python 3.12.
  • Replaced MPQueue with Queue from the queue module for better type checking.
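To illustrate the kind of annotation this change refers to, the sketch below types a progress queue as Queue[str] from the queue module. The function names are hypothetical, and the example runs in a single process; passing a queue obtained from multiprocessing.Manager().Queue() in its place is an assumption about how the tool might share it between workers.

```python
from queue import Empty, Queue


def report_progress(progress: Queue[str], filename: str) -> None:
    """Announce that a file has been processed."""
    progress.put(filename)


def drain(progress: Queue[str]) -> list[str]:
    """Collect every message currently waiting in the queue."""
    items: list[str] = []
    while True:
        try:
            items.append(progress.get_nowait())
        except Empty:
            # Nothing left to read right now.
            return items
```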

Code Structure and Style

  • Improved code organization and readability.
  • Added or updated docstrings for better function documentation.

Performance Optimization

  • Implemented dynamic chunk size adjustment for very large files in the encoding detection process.
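The adjustment rule itself is not described here, so the following is only one plausible shape for it: keep the base chunk size for ordinary files, grow it for very large ones, and cap it. The threshold, the growth rule, the cap, and the function name are all assumptions made for illustration.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def effective_chunk_size(
    file_size: int,
    base_chunk: int = 1 * MIB,
    large_file_threshold: int = 1 * GIB,  # assumed cutoff for "very large"
    max_chunk: int = 16 * MIB,            # assumed upper bound
) -> int:
    """Pick how many bytes to sample from a file for encoding detection."""
    if file_size <= 0:
        return 0
    chunk = base_chunk
    if file_size > large_file_threshold:
        # Assumed rule: add one base chunk per threshold-sized block of the
        # file, then cap the result at max_chunk.
        scaled = base_chunk * (file_size // large_file_threshold + 1)
        chunk = min(max_chunk, scaled)
    # Never ask for more bytes than the file contains.
    return min(chunk, file_size)
```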

Download files

Source Distribution

cdef_converter-0.2.0.tar.gz (9.7 kB)

Uploaded Source

Built Distribution

cdef_converter-0.2.0-py3-none-any.whl (10.9 kB)

Uploaded Python 3

File details

Details for the file cdef_converter-0.2.0.tar.gz.

File metadata

  • Download URL: cdef_converter-0.2.0.tar.gz
  • Size: 9.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.6

File hashes

Hashes for cdef_converter-0.2.0.tar.gz

Algorithm    Hash digest
SHA256       9e797942dc7e630b21463b7ed779470043222ca98b67233f483c658f777e85f5
MD5          755c391804fdfc7d3632253f4e5ac259
BLAKE2b-256  e32711967e73d09efd1d404a5fbaaec5c572362e9442b7a453558007793a088e

File details

Details for the file cdef_converter-0.2.0-py3-none-any.whl.

File hashes

Hashes for cdef_converter-0.2.0-py3-none-any.whl

Algorithm    Hash digest
SHA256       2734562bdfe51e20986c4c2dd1fdf615e4b0e6b6e767f2c2b4454d0118cbd0dc
MD5          68f0098e3c633852f597d72ed50acd46
BLAKE2b-256  0ee6db94ca7334744c9fbb2bf090a676b1198cd8f955883e3e68485fb27623c1
