cdef-converter

cdef-converter is a Python CLI tool that converts CSV files to Parquet format, processing multiple files in parallel.

Features

  • Convert multiple CSV files to Parquet format in parallel
  • Detect file encoding automatically
  • Generate summary of processed files
  • Progress tracking with Rich console output
  • Real-time status updates and summary display

Installation

pip install cdef-converter

Usage

cdef-converter /path/to/input/directory --processes 4

Options

  • input_directory: Path to the directory containing CSV files (required)
  • output_directory: Path to the directory where Parquet files will be saved (default: ./registers)
  • --processes: Number of processes to use for parallel processing (default: 4)
  • --encoding-chunk-size: Chunk size in MB for encoding detection (default: 1 MB)
  • --recursive: Recursively search for files in subdirectories
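The --processes option points to a worker-pool design. Below is a minimal sketch of fanning files out across workers with the standard library's `concurrent.futures`. It is an assumption about the approach, not the tool's actual code. The per-file worker `count_rows` is a hypothetical stand-in that counts data rows instead of writing Parquet, so the example runs without pyarrow or polars. `process_files` is likewise an illustrative name.

```python
from __future__ import annotations

import csv
from concurrent.futures import Executor, ProcessPoolExecutor, as_completed
from pathlib import Path
from typing import Callable


def count_rows(path: Path) -> tuple[str, int]:
    """Hypothetical stand-in for the per-file CSV-to-Parquet conversion.

    A real worker would read the CSV and write a Parquet file. This one only
    counts data rows so the sketch needs nothing beyond the standard library.
    """
    with path.open(newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row
        return path.name, sum(1 for _ in reader)


def process_files(
    paths: list[Path],
    worker: Callable[[Path], tuple[str, int]] = count_rows,
    processes: int = 4,
    executor_cls: type[Executor] = ProcessPoolExecutor,
) -> dict[str, int]:
    """Run `worker` on every path with at most `processes` concurrent workers."""
    results: dict[str, int] = {}
    with executor_cls(max_workers=processes) as pool:
        futures = [pool.submit(worker, p) for p in paths]
        # as_completed yields futures as they finish. A live progress display
        # would be updated at this point.
        for future in as_completed(futures):
            name, rows = future.result()
            results[name] = rows
    return results
```

The `executor_cls` parameter lets tests substitute a `ThreadPoolExecutor`. With the default process pool, call `process_files` under an `if __name__ == "__main__":` guard so that platforms which spawn worker processes can import the module safely.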

Output

  • Parquet files are saved in /path/to/your/fixed/output/directory/registers
  • A summary JSON file is generated at register_summary.json
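A summary file like register_summary.json can be produced with the `json` module after all files are processed. In this sketch, the field names (`total_files`, `total_rows`, `files`) and the decision to place the file in the output directory are assumptions, because the tool's actual schema is not documented here. The input is a mapping of file name to row count, such as a worker pool might return.

```python
import json
from pathlib import Path


def write_summary(results: dict[str, int], output_dir: Path) -> Path:
    """Write register_summary.json into output_dir and return its path.

    The keys used below are illustrative assumptions, not the tool's schema.
    """
    output_dir.mkdir(parents=True, exist_ok=True)
    summary = {
        "total_files": len(results),
        "total_rows": sum(results.values()),
        # Sort by file name so the output is stable between runs.
        "files": [
            {"file": name, "rows": rows} for name, rows in sorted(results.items())
        ],
    }
    summary_path = output_dir / "register_summary.json"
    summary_path.write_text(json.dumps(summary, indent=2), encoding="utf-8")
    return summary_path
```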

License

This project is licensed under the MIT License - see the LICENSE file for details.

Recent Changes

Real-time Status and Summary Display

  • Implemented a live-updating display using the Rich library
  • Added a status table showing current file processing status for each process
  • Included a summary table that updates in real-time as files are processed
  • Added a progress bar showing overall completion status
  • Added a line below the progress bar that shows the most recent log message

Encoding Chunk Size Modification

  • The default encoding chunk size is now 1 MB
  • Users can specify the encoding chunk size in MB using the --encoding-chunk-size option
  • Internally, the program converts the MB value to bytes for processing
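Converting the --encoding-chunk-size value to bytes and sampling that many bytes from the start of each file might look like the sketch below. It assumes 1 MB = 1024 × 1024 bytes and uses trial decoding against a short, hypothetical list of candidate encodings. The tool itself may rely on a dedicated detection library instead. An incremental decoder is used so that a multi-byte UTF-8 character cut off at the end of the chunk is not mistaken for invalid data.

```python
import codecs
from pathlib import Path

# Hypothetical candidate list, tried in order after checking for a UTF-8 BOM.
CANDIDATE_ENCODINGS = ("utf-8", "cp1252")


def mb_to_bytes(chunk_size_mb: float) -> int:
    """Convert an --encoding-chunk-size value in MB to bytes (1 MB = 1024 * 1024)."""
    return int(chunk_size_mb * 1024 * 1024)


def detect_encoding(path: Path, chunk_size_mb: float = 1.0) -> str:
    """Guess a file's encoding from its first chunk by trial decoding."""
    with path.open("rb") as f:
        sample = f.read(mb_to_bytes(chunk_size_mb))
    if sample.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    for encoding in CANDIDATE_ENCODINGS:
        decoder = codecs.getincrementaldecoder(encoding)()
        try:
            # final=False tolerates a multi-byte character split at the chunk edge.
            decoder.decode(sample, final=False)
        except UnicodeDecodeError:
            continue
        return encoding
    return "latin-1"  # decodes any byte sequence, so it serves as a last resort
```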

Error Handling Improvements

  • Enhanced exception handling throughout the codebase
  • Added more specific error messages for common issues like file not found and permission errors
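One way to produce clearer messages for missing files and permission problems is to catch the specific `OSError` subclasses and re-raise them with the affected path. The helper name `read_header` and the message wording are hypothetical.

```python
from pathlib import Path


def read_header(path: Path) -> str:
    """Return the first line of a CSV file, re-raising common I/O errors with context.

    Hypothetical helper. The message wording is illustrative only.
    """
    try:
        with path.open(encoding="utf-8") as f:
            return f.readline().rstrip("\r\n")
    except FileNotFoundError as exc:
        # Keep the original exception chained for debugging.
        raise FileNotFoundError(f"Input file not found: {path}") from exc
    except PermissionError as exc:
        raise PermissionError(f"Permission denied while reading {path}") from exc
```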

Type Hinting Updates

  • Updated type hints to be compatible with Python 3.12
  • Added missing type annotations to functions and variables

Code Structure and Style

  • Improved code organization and readability
  • Added or updated docstrings for better function documentation

Performance Optimization

  • Implemented dynamic chunk size adjustment for very large files in the encoding detection process
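The README does not describe how the chunk size is adjusted. As an illustration only, one possible policy grows the sample with file size (here 1% of the file) while keeping it between the configured base chunk and an upper cap. The function name, the 1% ratio, and the 10 MB cap are all assumptions.

```python
MB = 1024 * 1024


def adjust_chunk_size(
    file_size: int,
    base_chunk: int = 1 * MB,
    max_chunk: int = 10 * MB,
    sample_divisor: int = 100,
) -> int:
    """Return how many bytes to sample for encoding detection.

    Hypothetical policy: read small files completely. For larger files, sample
    1/sample_divisor of the file, clamped to [base_chunk, max_chunk].
    """
    if file_size <= base_chunk:
        return file_size
    # Integer division avoids floating-point rounding in the byte count.
    return max(base_chunk, min(file_size // sample_divisor, max_chunk))
```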

Recursive File Processing

  • Added a --recursive option to search for files in subdirectories
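A --recursive flag typically switches between listing a single directory and walking the whole tree. The sketch below uses `pathlib.Path.glob` and `Path.rglob`. Matching on a case-insensitive `.csv` suffix is an assumption about which files the tool picks up, and `find_csv_files` is a hypothetical name.

```python
from pathlib import Path


def find_csv_files(input_dir: Path, recursive: bool = False) -> list[Path]:
    """Return CSV files in input_dir, including subdirectories when recursive is True.

    Matching on a case-insensitive ".csv" suffix is an assumption of this sketch.
    """
    candidates = input_dir.rglob("*") if recursive else input_dir.glob("*")
    # Skip directories and non-CSV files; sort for a deterministic order.
    return sorted(p for p in candidates if p.is_file() and p.suffix.lower() == ".csv")
```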

Logging Enhancements

  • Implemented a custom logging setup with both console and file output
  • Added Rich formatting to console log messages
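A console-plus-file logging setup can be built with the standard `logging` module, as in the sketch below. The logger name and format string are assumptions. For Rich-formatted console output, the `StreamHandler` could be replaced with `rich.logging.RichHandler` from the Rich library.

```python
import logging
from pathlib import Path


def setup_logging(log_file: Path, level: int = logging.INFO) -> logging.Logger:
    """Configure a logger that writes to both the console and a log file."""
    logger = logging.getLogger("cdef_converter")  # hypothetical logger name
    logger.setLevel(level)
    logger.propagate = False  # keep records from also reaching the root logger

    # Close and drop handlers from any earlier call so messages are not duplicated.
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
        handler.close()

    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")

    console_handler = logging.StreamHandler()  # could be swapped for rich.logging.RichHandler
    console_handler.setFormatter(formatter)

    file_handler = logging.FileHandler(log_file, encoding="utf-8")
    file_handler.setFormatter(formatter)

    logger.addHandler(console_handler)
    logger.addHandler(file_handler)
    return logger
```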

Download files

Source Distribution

cdef_converter-0.5.0.tar.gz (10.4 kB)

Built Distribution

cdef_converter-0.5.0-py3-none-any.whl (11.3 kB, Python 3)

File details

Details for the file cdef_converter-0.5.0.tar.gz.

File metadata

  • Download URL: cdef_converter-0.5.0.tar.gz
  • Size: 10.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.6

File hashes

Hashes for cdef_converter-0.5.0.tar.gz
  • SHA256: 07d7c7fe95b3027b11d0241b0b07fef87035615ae0d39e19d0e755fa23d1ee01
  • MD5: 5bb283791236130991815a13527638ea
  • BLAKE2b-256: 09d6e50bbb6322dc4e719e8b25201b148b120bc5714ee05bffc72f6e3bf88c6c

File details

Details for the file cdef_converter-0.5.0-py3-none-any.whl.

File hashes

Hashes for cdef_converter-0.5.0-py3-none-any.whl
  • SHA256: a901701ace7b2bce1db2adb8b9200b6c80aa4d6399a151460e77d9174ce77b7f
  • MD5: 1dd1f57e7d166c26a2ad903830ddbdbf
  • BLAKE2b-256: 9784c99bd7e68595b43b04da91b27f4aefadfa9a6f2fa7d4201868caa6ebb5fa
