cdef-converter

cdef-converter is a Python CLI tool that converts CSV files to Parquet format efficiently.

Features

  • Convert multiple CSV files to Parquet format in parallel
  • Detect file encoding automatically
  • Generate summary of processed files
  • Progress tracking with rich console output
  • Real-time status updates and summary display
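
The automatic encoding detection can be illustrated with a small standard-library sketch. It does not reproduce the tool's actual detector, which the README does not describe. Instead it swaps in a simpler technique: reading the first chunk of the file and trying a fixed list of candidate encodings in order. The function name, the candidate list, and the fallback are assumptions made for illustration.

```python
import codecs
from pathlib import Path

# Hypothetical candidate list, ordered from strictest to most permissive.
CANDIDATE_ENCODINGS = ("utf-8", "cp1252", "latin-1")


def guess_encoding(path: Path, chunk_size_bytes: int = 1024 * 1024) -> str:
    """Return the first candidate encoding that decodes the leading chunk."""
    with path.open("rb") as handle:
        sample = handle.read(chunk_size_bytes)
    for encoding in CANDIDATE_ENCODINGS:
        # An incremental decoder tolerates a multi-byte character that is
        # cut in half at the end of the chunk.
        decoder = codecs.getincrementaldecoder(encoding)()
        try:
            decoder.decode(sample, final=False)
        except UnicodeDecodeError:
            continue
        return encoding
    # latin-1 accepts every byte, so this line is only a safety net.
    return CANDIDATE_ENCODINGS[-1]
```

A statistical detector would usually be more accurate than this trial-decoding approach, which cannot tell apart single-byte encodings that accept the same bytes.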

Installation

pip install cdef-converter

Usage

cdef-converter /path/to/input/directory --processes 4

Options

  • input_directory: Path to the directory containing CSV files (required)
  • output_directory: Path to the directory where Parquet files will be saved (default: ./registers)
  • --processes: Number of processes to use for parallel processing (default: 4)
  • --encoding-chunk-size: Chunk size in MB for encoding detection (default: 1 MB)
  • --recursive: Recursively search for files in subdirectories
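
To make the option list concrete, here is a minimal argparse sketch of a command line with the same names and defaults. It is not taken from the package source: the README does not say which CLI framework is used or whether output_directory is a positional argument, so both are assumptions here.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="cdef-converter")
    parser.add_argument("input_directory", type=Path,
                        help="Directory containing CSV files")
    # Assumption: output_directory is an optional positional argument.
    parser.add_argument("output_directory", type=Path, nargs="?",
                        default=Path("./registers"),
                        help="Directory where Parquet files are saved")
    parser.add_argument("--processes", type=int, default=4,
                        help="Number of worker processes")
    parser.add_argument("--encoding-chunk-size", type=float, default=1.0,
                        help="Chunk size in MB for encoding detection")
    parser.add_argument("--recursive", action="store_true",
                        help="Search subdirectories for files")
    return parser


# Parse the command shown in the Usage section.
args = build_parser().parse_args(["/path/to/input/directory", "--processes", "4"])
```

With these assumptions, omitting output_directory leaves it at ./registers and omitting --recursive leaves subdirectories unsearched.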

Output

  • Parquet files are saved in the output directory (default: ./registers)
  • A summary JSON file is generated at register_summary.json

License

This project is licensed under the MIT License - see the LICENSE file for details.

Recent Changes

Real-time Status and Summary Display

  • Implemented a live updating display with Rich library
  • Added a status table showing current file processing status for each process
  • Included a summary table that updates in real-time as files are processed
  • Added a progress bar showing overall completion status
  • Displayed the most recent log message below the progress bar

Encoding Chunk Size Modification

  • The default encoding chunk size is now 1 MB
  • Users can specify the encoding chunk size in MB using the --encoding-chunk-size option
  • Internally, the program converts the MB value to bytes for processing
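
The MB-to-bytes conversion can be sketched as follows. The helper name is hypothetical, and the sketch assumes binary megabytes (1 MB = 1024 × 1024 bytes); the README does not state which definition the program uses.

```python
BYTES_PER_MB = 1024 * 1024  # assumption: binary megabytes


def chunk_size_in_bytes(chunk_size_mb: float) -> int:
    """Convert a chunk size given in MB to a whole number of bytes."""
    if chunk_size_mb <= 0:
        raise ValueError("chunk size must be positive")
    return int(chunk_size_mb * BYTES_PER_MB)
```

Under that assumption, the default of 1 MB corresponds to 1,048,576 bytes.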

Error Handling Improvements

  • Enhanced exception handling throughout the codebase
  • Added more specific error messages for common issues like file not found and permission errors
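
One common way to produce specific messages for file-not-found and permission errors is to catch the built-in exception types separately and re-raise with context. The sketch below shows that pattern; ConversionError and read_sample are hypothetical names, not identifiers from the package.

```python
from pathlib import Path


class ConversionError(Exception):
    """Hypothetical error type carrying a user-facing message."""


def read_sample(path: Path, size: int = 1024) -> bytes:
    """Read the first bytes of a file, translating common OS errors."""
    try:
        with path.open("rb") as handle:
            return handle.read(size)
    except FileNotFoundError as exc:
        raise ConversionError(f"File not found: {path}") from exc
    except PermissionError as exc:
        raise ConversionError(f"Permission denied: {path}") from exc
    except OSError as exc:
        # Any other I/O problem keeps the original OS message.
        raise ConversionError(f"Could not read {path}: {exc}") from exc
```

Chaining with `from exc` keeps the original traceback available for debugging while the caller sees the shorter message.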

Type Hinting Updates

  • Updated type hints to be compatible with Python 3.12
  • Added missing type annotations to functions and variables

Code Structure and Style

  • Improved code organization and readability
  • Added or updated docstrings for better function documentation

Performance Optimization

  • Implemented dynamic chunk size adjustment for very large files in the encoding detection process
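
The README does not describe how the chunk size is adjusted, so the following is only one plausible rule, with every threshold chosen for illustration: keep the base chunk for ordinary files, scale it with file size once a file exceeds a threshold, and cap the result.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB


def adjusted_chunk_size(
    file_size: int,
    base_chunk: int = 1 * MIB,           # documented default of 1 MB
    large_file_threshold: int = 1 * GIB,  # hypothetical threshold
    max_chunk: int = 16 * MIB,            # hypothetical upper bound
) -> int:
    """Return the number of bytes to sample for encoding detection."""
    if file_size <= large_file_threshold:
        return base_chunk
    # Grow the sample in proportion to how far the file exceeds the threshold.
    scale = file_size / large_file_threshold
    return min(int(base_chunk * scale), max_chunk)
```

With these illustrative values, a 4 GiB file is sampled with a 4 MiB chunk, and no file is ever sampled with more than 16 MiB.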

Recursive File Processing

  • Added a --recursive option to search for files in subdirectories
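
Recursive and non-recursive discovery differ only in which pathlib method is used. A minimal sketch follows, assuming the tool matches files by a *.csv pattern; the function name is hypothetical.

```python
from pathlib import Path


def find_csv_files(input_directory: Path, recursive: bool = False) -> list[Path]:
    """List CSV files directly in a directory, or in its whole subtree."""
    pattern = "*.csv"  # assumption: files are selected by extension
    if recursive:
        matches = input_directory.rglob(pattern)
    else:
        matches = input_directory.glob(pattern)
    # Sort for a stable processing order and skip directories named *.csv.
    return sorted(path for path in matches if path.is_file())
```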

Logging Enhancements

  • Implemented a custom logging setup with both console and file output
  • Added rich formatting to console log messages
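
A logging setup with both console and file output can be sketched with the standard library alone. The logger name, format string, and function name below are assumptions; to get Rich formatting on the console, the plain StreamHandler could be replaced with rich.logging.RichHandler.

```python
import logging
from pathlib import Path


def setup_logging(log_file: Path, level: int = logging.INFO) -> logging.Logger:
    """Configure a logger that writes to the console and to a file."""
    logger = logging.getLogger("cdef_converter")  # hypothetical logger name
    logger.setLevel(level)
    logger.propagate = False   # avoid duplicate output via the root logger
    logger.handlers.clear()    # make repeated calls idempotent

    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")

    console_handler = logging.StreamHandler()
    console_handler.setFormatter(formatter)
    logger.addHandler(console_handler)

    file_handler = logging.FileHandler(log_file, encoding="utf-8")
    file_handler.setFormatter(formatter)
    logger.addHandler(file_handler)

    return logger
```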

Download files

Source Distribution

cdef_converter-0.3.0.tar.gz (10.6 kB)

Uploaded Source

Built Distribution

cdef_converter-0.3.0-py3-none-any.whl (11.5 kB)

Uploaded Python 3

File details

Details for the file cdef_converter-0.3.0.tar.gz.

File metadata

  • Download URL: cdef_converter-0.3.0.tar.gz
  • Size: 10.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.6

File hashes

Hashes for cdef_converter-0.3.0.tar.gz
Algorithm Hash digest
SHA256 4138263872a86a4f3ce95c15234bc132f4493804460ddf162a9c483e9e9594d0
MD5 a8e29a3def870bb17c9a21ddc76044a2
BLAKE2b-256 d66a479accc4a37166a273a18eb4a659881fdd65f7440d4523e556fa1c016529

File details

Details for the file cdef_converter-0.3.0-py3-none-any.whl.

File hashes

Hashes for cdef_converter-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 90ed09a60659850c43bf39580563f397dce4a7324eb89770894d6ae5b886f283
MD5 c2d21374ce6e71e2d511e421e146b9d7
BLAKE2b-256 b75df5e35859edaae3b97f16584204d4e07501984fbe7c2306e3f4cab40a3517
