cdef-converter

cdef-converter is a Python command-line tool that converts CSV files to Parquet format, processing multiple files in parallel.

Features

  • Convert multiple CSV files to Parquet format in parallel
  • Automatically detect file encodings
  • Generate a summary of processed files
  • Track progress with Rich console output
  • Display real-time status updates and a summary
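The parallel conversion described above can be sketched with the standard library's `concurrent.futures` module. This is a minimal illustration under stated assumptions, not cdef-converter's actual implementation: `convert_all` and the `worker` callable it receives are hypothetical names, and the worker stands in for whatever per-file CSV-to-Parquet step the tool performs.

```python
from concurrent.futures import Executor, ProcessPoolExecutor, as_completed
from pathlib import Path
from typing import Callable


def convert_all(
    csv_files: list[Path],
    worker: Callable[[Path], dict],
    processes: int = 4,
    executor_cls: type[Executor] = ProcessPoolExecutor,
) -> list[dict]:
    """Run a hypothetical per-file `worker` on every CSV file in parallel.

    Results are collected in completion order, so callers that need a stable
    order should sort them afterwards.
    """
    results: list[dict] = []
    with executor_cls(max_workers=processes) as pool:
        # Submit one task per file and remember which future belongs to which path.
        futures = {pool.submit(worker, path): path for path in csv_files}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results.append(future.result())
            except Exception as exc:
                # Record the failure and keep processing the remaining files.
                results.append({"file": path.name, "status": "error", "error": str(exc)})
    return results
```

Passing `executor_cls=ThreadPoolExecutor` makes the function easy to test without spawning processes; with the default process pool, `worker` must be a module-level function so that it can be pickled.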

Installation

pip install cdef-converter

Usage

cdef-converter /path/to/input/directory --processes 4

Options

  • input_directory: Path to the directory containing CSV files (required)
  • output_directory: Path to the directory where Parquet files will be saved (default: ./registers)
  • --processes: Number of processes to use for parallel processing (default: 4)
  • --encoding-chunk-size: Chunk size in MB for encoding detection (default: 1 MB)
  • --recursive: Recursively search for files in subdirectories
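One way to express the documented interface with `argparse` is shown below. It is a hypothetical reconstruction for illustration only: the option names and defaults come from the list above, while treating `output_directory` as an optional positional argument and parsing `--encoding-chunk-size` as a float are assumptions.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented cdef-converter options."""
    parser = argparse.ArgumentParser(prog="cdef-converter")
    parser.add_argument("input_directory", type=Path,
                        help="Directory containing CSV files")
    # Assumption: the output directory is an optional positional argument.
    parser.add_argument("output_directory", type=Path, nargs="?",
                        default=Path("./registers"),
                        help="Where Parquet files are written (default: ./registers)")
    parser.add_argument("--processes", type=int, default=4,
                        help="Number of worker processes (default: 4)")
    # Assumption: fractional MB values are accepted.
    parser.add_argument("--encoding-chunk-size", type=float, default=1.0,
                        help="Chunk size in MB for encoding detection (default: 1)")
    parser.add_argument("--recursive", action="store_true",
                        help="Search subdirectories for CSV files")
    return parser
```

For example, `build_parser().parse_args(["/data/in", "--processes", "8"])` leaves `output_directory` at its `./registers` default; argparse exposes `--encoding-chunk-size` as `args.encoding_chunk_size`.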

Output

  • Parquet files are saved in /path/to/your/fixed/output/directory/registers
  • A summary JSON file is generated at register_summary.json
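A summary file such as register_summary.json could be produced by aggregating per-file results, as in this sketch. The function name `write_summary` and the keys it uses (`file`, `status`, `total_files`, `succeeded`, `failed`, `files`) are assumptions for illustration; the real file's schema is not documented on this page.

```python
import json
from pathlib import Path


def write_summary(results: list[dict], summary_path: Path) -> dict:
    """Aggregate per-file results and write them to a JSON summary file.

    The keys used here are illustrative assumptions, not the actual schema
    of register_summary.json.
    """
    summary = {
        "total_files": len(results),
        "succeeded": sum(1 for r in results if r.get("status") == "ok"),
        "failed": sum(1 for r in results if r.get("status") != "ok"),
        # Sort by file name so the output is stable regardless of completion order.
        "files": sorted(results, key=lambda r: r.get("file", "")),
    }
    summary_path.write_text(json.dumps(summary, indent=2), encoding="utf-8")
    return summary
```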

License

This project is licensed under the MIT License - see the LICENSE file for details.

Recent Changes

Real-time Status and Summary Display

  • Implemented a live updating display with Rich library
  • Added a status table showing current file processing status for each process
  • Included a summary table that updates in real-time as files are processed
  • Added a progress bar showing overall completion status
  • Displayed the most recent log message below the progress bar

Encoding Chunk Size Modification

  • The default encoding chunk size is now 1 MB
  • Users can specify the encoding chunk size in MB using the --encoding-chunk-size option
  • Internally, the program converts the MB value to bytes for processing
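The MB-to-bytes conversion and a simple encoding check on the first chunk of a file might look like the following. This sketch assumes that 1 MB means 1,048,576 bytes and uses trial decoding against a hypothetical list of candidate encodings rather than a statistical detector, so it approximates the described behavior and is not the tool's own code.

```python
import codecs
from pathlib import Path

BYTES_PER_MB = 1024 * 1024  # assumption: 1 MB = 2**20 bytes
CANDIDATE_ENCODINGS = ("utf-8", "cp1252")  # hypothetical order of encodings to try


def mb_to_bytes(chunk_size_mb: float) -> int:
    """Convert an --encoding-chunk-size value given in MB to a byte count."""
    if chunk_size_mb <= 0:
        raise ValueError("encoding chunk size must be positive")
    return int(chunk_size_mb * BYTES_PER_MB)


def detect_encoding(path: Path, chunk_size_mb: float = 1.0) -> str:
    """Guess a file's encoding by trial-decoding only its first chunk."""
    with path.open("rb") as fh:
        sample = fh.read(mb_to_bytes(chunk_size_mb))
    for encoding in CANDIDATE_ENCODINGS:
        decoder = codecs.getincrementaldecoder(encoding)()
        try:
            # final=False tolerates a multi-byte character cut off at the chunk boundary.
            decoder.decode(sample, final=False)
            return encoding
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte to a character, so it is a safe last resort.
    return "latin-1"
```

Reading only a bounded sample keeps detection cheap on large files, at the cost of missing encoding problems that first appear later in the file.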

Error Handling Improvements

  • Enhanced exception handling throughout the codebase
  • Added more specific error messages for common issues like file not found and permission errors
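A minimal sketch of translating common OS errors into specific messages follows. `ConversionError` and `read_header` are hypothetical names, and the message wording is illustrative rather than taken from the tool.

```python
from pathlib import Path


class ConversionError(Exception):
    """Hypothetical error type carrying a user-facing message."""


def read_header(path: Path, encoding: str = "utf-8") -> str:
    """Return the first line of a CSV file, with specific messages for common failures."""
    try:
        with path.open("r", encoding=encoding, newline="") as fh:
            return fh.readline().rstrip("\r\n")
    except FileNotFoundError as exc:
        raise ConversionError(f"File not found: {path}") from exc
    except PermissionError as exc:
        raise ConversionError(f"Permission denied while reading: {path}") from exc
    except UnicodeDecodeError as exc:
        raise ConversionError(f"Could not decode {path} as {encoding}: {exc.reason}") from exc
```

Chaining with `from exc` keeps the original traceback available for debugging while the message shown to the user stays short.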

Type Hinting Updates

  • Updated type hints to be compatible with Python 3.12
  • Added missing type annotations to functions and variables

Code Structure and Style

  • Improved code organization and readability
  • Added or updated docstrings for better function documentation

Performance Optimization

  • Implemented dynamic chunk size adjustment for very large files in the encoding detection process
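One possible dynamic sizing policy is sketched below. The rule itself (sample 1% of the file, bounded below by the configured chunk size, above by a 10 MB cap, and never more than the file) is a hypothetical example; the page does not say how cdef-converter actually adjusts the chunk size.

```python
BYTES_PER_MB = 1024 * 1024


def adaptive_chunk_size(
    file_size: int,
    base_chunk: int = BYTES_PER_MB,
    sample_percent: int = 1,
    max_chunk: int = 10 * BYTES_PER_MB,
) -> int:
    """Return how many bytes to read for encoding detection (hypothetical policy)."""
    if file_size <= 0:
        return 0
    # Integer arithmetic keeps the result exact even for very large files.
    scaled = file_size * sample_percent // 100
    # At least base_chunk, at most max_chunk, and never more than the file holds.
    return min(file_size, max(base_chunk, min(scaled, max_chunk)))
```

For a 500 MB file this samples 5 MB, while files under 100 MB fall back to the base chunk and very large files are capped at 10 MB.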

Recursive File Processing

  • Added a --recursive option to search for files in subdirectories
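The difference between a flat and a recursive search can be expressed with pathlib's `glob` and `rglob`. `find_csv_files` is a hypothetical helper, and it assumes input files use a lowercase `.csv` extension.

```python
from pathlib import Path


def find_csv_files(input_directory: Path, recursive: bool = False) -> list[Path]:
    """List CSV files, descending into subdirectories only when requested."""
    pattern = "*.csv"  # assumption: lowercase extension only
    matches = input_directory.rglob(pattern) if recursive else input_directory.glob(pattern)
    # Sort for a deterministic processing order and skip anything that is not a regular file.
    return sorted(p for p in matches if p.is_file())
```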

Logging Enhancements

  • Implemented a custom logging setup with both console and file output
  • Added rich formatting to console log messages
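A console-plus-file logging setup can be built from the standard logging module, as in this sketch. The logger name is hypothetical, and the console handler here is a plain `StreamHandler`; the Rich library provides `rich.logging.RichHandler` as a handler that adds rich console formatting.

```python
import logging
from pathlib import Path


def setup_logging(log_file: Path, level: int = logging.INFO) -> logging.Logger:
    """Configure a logger that writes each record to the console and to a file."""
    logger = logging.getLogger("cdef_converter")  # hypothetical logger name
    logger.setLevel(level)
    logger.propagate = False  # keep records from also reaching the root logger
    # Close and drop handlers from any earlier call so messages are not duplicated.
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
        handler.close()

    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")

    console_handler = logging.StreamHandler()  # writes to stderr by default
    console_handler.setFormatter(formatter)
    logger.addHandler(console_handler)

    file_handler = logging.FileHandler(log_file, encoding="utf-8")
    file_handler.setFormatter(formatter)
    logger.addHandler(file_handler)
    return logger
```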


Download files


Source Distribution

cdef_converter-0.4.0.tar.gz (10.7 kB)

Built Distribution

cdef_converter-0.4.0-py3-none-any.whl (11.6 kB, Python 3)

File details

Details for the file cdef_converter-0.4.0.tar.gz.

File metadata

  • Download URL: cdef_converter-0.4.0.tar.gz
  • Size: 10.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.6

File hashes

Hashes for cdef_converter-0.4.0.tar.gz

  • SHA256: f5d87ed86ba6cd68c4aaa2e0e1bc378c1280fb79147cd106d66d4944eea73950
  • MD5: e126fedbfcea1586fd89e4231c860fc6
  • BLAKE2b-256: 8d8efc27c13bab1806ea370a1ab11e7ad77ffb55d4161b01f291627e06f5bae4


File details

Details for the file cdef_converter-0.4.0-py3-none-any.whl.

File hashes

Hashes for cdef_converter-0.4.0-py3-none-any.whl

  • SHA256: 2c6bb0810beadf0316169aaa5a88b3a7c7da6fdb629c306e48f475497ad1b29f
  • MD5: 490860050f90adb92233863bad9eca28
  • BLAKE2b-256: c13df2e7ffb7dcf720d78de10e7ce93bdb723ac51638a8d33ad90d548d9d70aa

