cdef-converter
cdef-converter is a Python CLI tool that converts CSV files to Parquet format efficiently.
Features
- Convert multiple CSV files to Parquet format in parallel
- Detect file encoding automatically
- Generate summary of processed files
- Progress tracking with rich console output
- Real-time status updates and summary display
Installation
pip install cdef-converter
Usage
cdef-converter /path/to/input/directory --processes 4
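The parallel conversion behind a command like the one above can be sketched roughly as follows. This is a minimal sketch, not cdef-converter's actual source: it assumes pyarrow as the Parquet backend (this README does not say which library the tool uses), and the names parquet_path_for, convert_one, and convert_all are hypothetical.

```python
from multiprocessing import Pool
from pathlib import Path


def parquet_path_for(csv_path: Path, output_dir: Path) -> Path:
    """Map data.csv to <output_dir>/data.parquet (hypothetical naming rule)."""
    return output_dir / (csv_path.stem + ".parquet")


def convert_one(job: tuple[Path, Path]) -> str:
    """Convert a single CSV file to Parquet. Assumes pyarrow is installed."""
    # Imported lazily so this module can be imported without pyarrow present.
    import pyarrow.csv as pv
    import pyarrow.parquet as pq

    csv_path, output_dir = job
    target = parquet_path_for(csv_path, output_dir)
    table = pv.read_csv(str(csv_path))
    pq.write_table(table, str(target))
    return target.name


def convert_all(input_dir: Path, output_dir: Path, processes: int = 4) -> list[str]:
    """Convert every *.csv file in input_dir using a pool of worker processes."""
    output_dir.mkdir(parents=True, exist_ok=True)
    jobs = [(path, output_dir) for path in sorted(input_dir.glob("*.csv"))]
    with Pool(processes=processes) as pool:
        # Each worker handles one file at a time; results keep the job order.
        return pool.map(convert_one, jobs)
```

On platforms that start workers with spawn (Windows, macOS), call convert_all from under an `if __name__ == "__main__":` guard.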
Options
- input_directory: Path to the directory containing CSV files (required)
- output_directory: Path to the directory where Parquet files will be saved (default: ./registers)
- --processes: Number of processes to use for parallel processing (default: 4)
- --encoding-chunk-size: Chunk size in MB for encoding detection (default: 1 MB)
- --recursive: Recursively search for files in subdirectories
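As an illustration, the options above could map onto an argparse parser like the one below. This is a sketch under assumptions, not the tool's real argument handling: the README does not say which CLI library is used, whether output_directory is an optional positional argument, or whether the chunk size accepts fractions, so those choices (and the name build_parser) are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (hypothetical layout)."""
    parser = argparse.ArgumentParser(
        prog="cdef-converter",
        description="Convert CSV files to Parquet format.",
    )
    parser.add_argument(
        "input_directory",
        help="Path to the directory containing CSV files",
    )
    # Assumption: output_directory is an optional positional argument.
    parser.add_argument(
        "output_directory",
        nargs="?",
        default="./registers",
        help="Directory where Parquet files will be saved",
    )
    parser.add_argument(
        "--processes",
        type=int,
        default=4,
        help="Number of processes to use for parallel processing",
    )
    # Assumption: the value is a whole number of megabytes.
    parser.add_argument(
        "--encoding-chunk-size",
        type=int,
        default=1,
        help="Chunk size in MB for encoding detection",
    )
    parser.add_argument(
        "--recursive",
        action="store_true",
        help="Recursively search for files in subdirectories",
    )
    return parser
```

Parsing `["/data/csv", "--processes", "8", "--recursive"]` with this parser yields processes=8, recursive=True, and the documented defaults for everything else.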
Output
- Parquet files are saved in /path/to/your/fixed/output/directory/registers
- A summary JSON file is generated at register_summary.json
License
This project is licensed under the MIT License - see the LICENSE file for details.
Recent Changes
Real-time Status and Summary Display
- Implemented a live updating display with Rich library
- Added a status table showing current file processing status for each process
- Included a summary table that updates in real-time as files are processed
- Added a progress bar showing overall completion status
- Displayed the most recent log message below the progress bar
Encoding Chunk Size Modification
- The default encoding chunk size is now 1 MB
- Users can specify the encoding chunk size in MB using the --encoding-chunk-size option
- Internally, the program converts the MB value to bytes for processing
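The MB-to-bytes conversion and chunk-based detection can be sketched as below. This is an illustration only: it assumes 1 MB means 1024 * 1024 bytes, and it swaps in a simple trial-decoding strategy using the standard library, since this README does not say which detection library the tool relies on. The names mb_to_bytes and guess_encoding are hypothetical.

```python
import codecs
from pathlib import Path

BYTES_PER_MB = 1024 * 1024  # assumption: binary megabytes


def mb_to_bytes(size_mb: float) -> int:
    """Convert a size in megabytes to a whole number of bytes."""
    return int(size_mb * BYTES_PER_MB)


def guess_encoding(
    path: Path,
    chunk_size_mb: float = 1,
    candidates: tuple[str, ...] = ("utf-8", "cp1252"),
) -> str:
    """Return the first candidate encoding that decodes the leading chunk."""
    with open(path, "rb") as fh:
        sample = fh.read(mb_to_bytes(chunk_size_mb))
    for name in candidates:
        decoder = codecs.getincrementaldecoder(name)()
        try:
            # final=False tolerates a multi-byte character that was cut
            # in half at the chunk boundary instead of reporting an error.
            decoder.decode(sample, final=False)
        except UnicodeDecodeError:
            continue
        return name
    # latin-1 maps every byte to a character, so it never fails to decode.
    return "latin-1"
```

Reading only the first chunk keeps detection cheap on large files, at the cost of missing characters that first appear later in the file.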
Error Handling Improvements
- Enhanced exception handling throughout the codebase
- Added more specific error messages for common issues like file not found and permission errors
Type Hinting Updates
- Updated type hints to be compatible with Python 3.12
- Added missing type annotations to functions and variables
Code Structure and Style
- Improved code organization and readability
- Added or updated docstrings for better function documentation
Performance Optimization
- Implemented dynamic chunk size adjustment for very large files in the encoding detection process
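One way such an adjustment could work is sketched below. The README does not describe the actual policy, so everything here is an assumption: the direction of the adjustment (growing the sample for larger files), the 1024 MB threshold, the 16 MB cap, and the name adjusted_chunk_size are all hypothetical.

```python
BYTES_PER_MB = 1024 * 1024


def adjusted_chunk_size(
    file_size: int,
    base_chunk: int = 1 * BYTES_PER_MB,
    large_file_threshold: int = 1024 * BYTES_PER_MB,  # hypothetical threshold
    max_chunk: int = 16 * BYTES_PER_MB,  # hypothetical upper bound
) -> int:
    """Return the number of bytes to sample for encoding detection."""
    if file_size <= large_file_threshold:
        # Ordinary files: use the base chunk, or the whole file if smaller.
        return min(base_chunk, file_size)
    # Very large files: grow the sample in proportion to the file size,
    # but never beyond max_chunk so detection stays cheap.
    scale = file_size / large_file_threshold
    return min(int(base_chunk * scale), max_chunk)
```

With these hypothetical defaults, a 10 MB file is sampled with 1 MB, a 4096 MB file with 4 MB, and anything at or above 16384 MB with the 16 MB cap.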
Recursive File Processing
- Added a --recursive option to search for files in subdirectories
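The difference between a flat and a recursive search can be illustrated with pathlib. This is a sketch, not the tool's own file discovery; find_csv_files is a hypothetical name and the lowercase .csv pattern is an assumption.

```python
from pathlib import Path


def find_csv_files(input_dir: Path, recursive: bool = False) -> list[Path]:
    """List CSV files directly in input_dir, or in all subdirectories too."""
    # "**/" makes glob descend into every subdirectory, including input_dir itself.
    pattern = "**/*.csv" if recursive else "*.csv"
    return sorted(p for p in input_dir.glob(pattern) if p.is_file())
```

On case-sensitive file systems this pattern does not match files ending in .CSV.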
Logging Enhancements
- Implemented a custom logging setup with both console and file output
- Added rich formatting to console log messages
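A logging setup with both console and file output can be sketched with the standard library alone. This is not the tool's actual configuration: the logger name, formats, and levels are assumptions, and a plain StreamHandler stands in for the Rich-formatted console handler.

```python
import logging
import sys
from pathlib import Path


def setup_logging(log_file: Path, name: str = "cdef_converter") -> logging.Logger:
    """Create a logger that writes to the console and to log_file."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    logger.propagate = False  # keep messages out of the root logger

    # Console: informational messages and above, short format.
    console = logging.StreamHandler(sys.stderr)
    console.setLevel(logging.INFO)
    console.setFormatter(logging.Formatter("%(levelname)s %(message)s"))

    # File: everything, with timestamps for later inspection.
    file_handler = logging.FileHandler(log_file, encoding="utf-8")
    file_handler.setLevel(logging.DEBUG)
    file_handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )

    logger.addHandler(console)
    logger.addHandler(file_handler)
    return logger
```

If Rich is installed, the console handler can be replaced with rich.logging.RichHandler to get formatted console output.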
Download files
Source Distribution
cdef_converter-0.4.0.tar.gz (10.7 kB)
Built Distribution
cdef_converter-0.4.0-py3-none-any.whl (11.6 kB)
File details
Details for the file cdef_converter-0.4.0.tar.gz.
File metadata
- Download URL: cdef_converter-0.4.0.tar.gz
- Upload date:
- Size: 10.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | f5d87ed86ba6cd68c4aaa2e0e1bc378c1280fb79147cd106d66d4944eea73950
MD5 | e126fedbfcea1586fd89e4231c860fc6
BLAKE2b-256 | 8d8efc27c13bab1806ea370a1ab11e7ad77ffb55d4161b01f291627e06f5bae4
File details
Details for the file cdef_converter-0.4.0-py3-none-any.whl.
File metadata
- Download URL: cdef_converter-0.4.0-py3-none-any.whl
- Upload date:
- Size: 11.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | 2c6bb0810beadf0316169aaa5a88b3a7c7da6fdb629c306e48f475497ad1b29f
MD5 | 490860050f90adb92233863bad9eca28
BLAKE2b-256 | c13df2e7ffb7dcf720d78de10e7ce93bdb723ac51638a8d33ad90d548d9d70aa