Utility for converting CSV to Parquet files
cdef-utils
cdef-utils is a Python package designed to convert CSV and Parquet files to a standardized Parquet format, specifically tailored for processing register data. It provides utilities for batch processing files, generating summaries, and handling various encoding issues.
Features
- Convert CSV and Parquet files to a standardized Parquet format
- Automatic encoding detection for CSV files
- Batch processing of multiple files
- Generation of summary reports
- Progress tracking and resumable processing
- Rich console output with logging
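The automatic encoding detection listed above could work roughly as follows. This is a minimal sketch using only the standard library, not the package's actual implementation: the function name `detect_encoding`, the candidate list, and the sample size are all assumptions made for illustration.

```python
import codecs

# Hypothetical candidate list, tried in order. latin-1 accepts every byte
# sequence, so it only serves as a last resort.
CANDIDATE_ENCODINGS = ("utf-8", "cp1252", "latin-1")


def detect_encoding(path, candidates=CANDIDATE_ENCODINGS, sample_size=1_000_000):
    """Return the first candidate encoding that decodes a sample of the file."""
    with open(path, "rb") as handle:
        sample = handle.read(sample_size)
    for encoding in candidates:
        # An incremental decoder with final=False tolerates a multi-byte
        # character that was cut off at the end of the sample.
        decoder = codecs.getincrementaldecoder(encoding)(errors="strict")
        try:
            decoder.decode(sample, final=False)
        except UnicodeDecodeError:
            continue
        return encoding
    raise ValueError(f"none of {candidates} could decode {path}")
```

A trial-decode approach like this cannot tell single-byte encodings apart reliably, so the order of the candidate list matters.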
Installation
To install cdef-utils, you can use pip:
pip install cdef-utils
Usage
You can use cdef-utils as a command-line tool:
python -m cdef_utils /path/to/input/directory --summary_file output_summary.json
Arguments
- input_directory: Path to the directory containing the CSV and Parquet files to process
- --summary_file: (Optional) Path to save the summary JSON file (default: "register_summary.json")
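For readers who want to see how these two arguments fit together, here is a sketch of an equivalent `argparse` parser. Only the argument names and the default value come from the description above; the function name `build_parser` and the help strings are assumptions, and the package's real parser may differ.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented command-line arguments."""
    parser = argparse.ArgumentParser(
        prog="cdef_utils",
        description="Convert CSV and Parquet files to a standardized Parquet format.",
    )
    # Positional argument: the directory to scan for input files.
    parser.add_argument(
        "input_directory",
        help="Path to the directory containing CSV and Parquet files to process",
    )
    # Optional argument with the documented default.
    parser.add_argument(
        "--summary_file",
        default="register_summary.json",
        help="Path to save the summary JSON file",
    )
    return parser
```

Calling `build_parser().parse_args(["/path/to/input/directory"])` yields a namespace whose `summary_file` falls back to `register_summary.json`.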
Output
The script will:
- Convert all CSV and Parquet files in the input directory to Parquet format
- Save the converted files in a structured directory format under /path/to/your/fixed/output/directory/registers
- Generate a summary JSON file with details about each processed register
- Display a summary table in the console
- Log processing details and any errors
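The summary step above might look something like the following sketch. It assumes a layout of one subdirectory per register containing `.parquet` files, which the description does not confirm. The field names (`file_count`, `total_bytes`, `files`) and both function names are hypothetical too, and the real summary file may record different details.

```python
import json
from pathlib import Path


def build_summary(registers_root):
    """Collect one entry per register directory (assumed layout) under the root."""
    summary = {}
    register_dirs = sorted(p for p in Path(registers_root).iterdir() if p.is_dir())
    for register_dir in register_dirs:
        files = sorted(register_dir.glob("*.parquet"))
        # Hypothetical fields; the package's real summary may differ.
        summary[register_dir.name] = {
            "file_count": len(files),
            "total_bytes": sum(f.stat().st_size for f in files),
            "files": [f.name for f in files],
        }
    return summary


def write_summary(summary, summary_file="register_summary.json"):
    """Write the summary dictionary to a JSON file."""
    Path(summary_file).write_text(json.dumps(summary, indent=2), encoding="utf-8")
```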
Requirements
- Python 3.7+
- polars
- rich
Configuration
- The OUTPUT_DIRECTORY is set to /path/to/your/fixed/output/directory in the script. Modify this path as needed.
- Logging is configured to save logs in a logs directory in the current working directory.
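As an illustration of the logging setup described above, the sketch below writes log records to a file inside a `logs` directory using the standard `logging` module. The function name `configure_logging`, the logger name, the log file name, and the format string are assumptions, not the package's actual configuration.

```python
import logging
from pathlib import Path


def configure_logging(log_dir="logs", log_name="cdef_utils.log"):
    """Create the log directory if needed and attach a file handler."""
    log_path = Path(log_dir)
    log_path.mkdir(parents=True, exist_ok=True)

    logger = logging.getLogger("cdef_utils")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    # Avoid attaching a second handler if this function is called twice.
    if not logger.handlers:
        handler = logging.FileHandler(log_path / log_name, encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

With the default `log_dir`, the directory is resolved relative to the current working directory, matching the behaviour described above.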
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
This project is licensed under the MIT License.