Utility for converting CSV to Parquet files
cdef-utils
cdef-utils is a Python package that converts CSV and Parquet files to a standardized Parquet format, specifically tailored for processing register data. It provides utilities for batch processing files, generating summaries, and handling character-encoding issues in CSV input.
Features
- Convert CSV and Parquet files to a standardized Parquet format
- Automatic encoding detection for CSV files
- Batch processing of multiple files
- Generation of summary reports
- Progress tracking and resumable processing
- Rich console output with logging
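The package does not document how its automatic encoding detection works. As a rough illustration only, the sketch below tries a short, hypothetical list of candidate encodings against the start of a file and returns the first one that decodes cleanly. The function name detect_encoding, the candidate list and the 64 KiB sample size are assumptions, not part of cdef-utils' API.

```python
import codecs
from pathlib import Path

# Hypothetical candidate list, tried in order. cdef-utils' real detection logic
# is not documented, so this ordering is an assumption. latin-1 maps every byte
# to a character, so it never fails and serves as the last-resort fallback.
CANDIDATE_ENCODINGS = ("utf-8", "cp1252", "latin-1")


def detect_encoding(path: Path, sample_size: int = 64 * 1024) -> str:
    """Return the first candidate encoding that can decode a sample of the file."""
    with path.open("rb") as f:
        sample = f.read(sample_size)

    # A UTF-8 byte-order mark is an unambiguous signal.
    if sample.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"

    for encoding in CANDIDATE_ENCODINGS:
        # An incremental decoder with final=False tolerates a multi-byte
        # character that was cut off at the end of the sample.
        decoder = codecs.getincrementaldecoder(encoding)()
        try:
            decoder.decode(sample, final=False)
        except UnicodeDecodeError:
            continue
        return encoding

    # Unreachable while latin-1 is in the list; kept in case the list changes.
    raise ValueError(f"Could not detect the encoding of {path}")
```

Because cp1252 and latin-1 accept almost any byte sequence, a successful decode is a plausibility check rather than proof of the file's true encoding.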
Installation
To install cdef-utils, you can use pip:
pip install cdef-utils
Usage
You can use cdef-utils as a command-line tool:
python -m cdef_utils /path/to/input/directory --summary_file output_summary.json
Arguments
input_directory
: Path to the directory containing the CSV and Parquet files to process
--summary_file
: (Optional) Path to save the summary JSON file (default: "register_summary.json")
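The documented arguments map naturally onto Python's argparse. The parser below is a hypothetical reconstruction for illustration; the function name build_parser and the use of pathlib.Path argument types are assumptions, and the package's actual entry point may be written differently.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented CLI (not cdef-utils' own code)."""
    parser = argparse.ArgumentParser(
        prog="python -m cdef_utils",
        description="Convert CSV and Parquet files to a standardized Parquet format.",
    )
    # Required positional argument: where the input files live.
    parser.add_argument(
        "input_directory",
        type=Path,
        help="Directory containing the CSV and Parquet files to process",
    )
    # Optional flag with the documented default file name.
    parser.add_argument(
        "--summary_file",
        type=Path,
        default=Path("register_summary.json"),
        help='Path to save the summary JSON file (default: "register_summary.json")',
    )
    return parser
```

For example, build_parser().parse_args(["/data/registers"]) yields a summary_file of Path("register_summary.json"), matching the documented default.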
Output
The script will:
- Convert all CSV and Parquet files in the input directory to Parquet format
- Save the converted files in a structured directory layout under /path/to/your/fixed/output/directory/registers
- Generate a summary JSON file with details about each processed register
- Display a summary table in the console
- Log processing details and any errors
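The batch, summary and resumable-processing behaviour described above could be structured roughly as follows. This is a sketch under several assumptions: the progress file name (.progress.json), the summary schema, the flat output layout (the package's actual structured directory format is not specified) and the convert callback, which stands in for the real conversion step, are all hypothetical.

```python
import json
from pathlib import Path
from typing import Callable, Dict

# Hypothetical stand-in for the real conversion step: takes (source, target)
# and returns a dict of per-register details to store in the summary.
ConvertFn = Callable[[Path, Path], Dict[str, object]]


def process_directory(
    input_dir: Path,
    output_dir: Path,
    summary_file: Path,
    convert: ConvertFn,
    progress_file_name: str = ".progress.json",
) -> Dict[str, Dict[str, object]]:
    output_dir.mkdir(parents=True, exist_ok=True)
    progress_path = output_dir / progress_file_name

    # Resume: reload the results recorded by a previous (possibly interrupted) run.
    summary: Dict[str, Dict[str, object]] = {}
    if progress_path.exists():
        summary = json.loads(progress_path.read_text(encoding="utf-8"))

    sources = sorted(
        p for p in input_dir.iterdir()
        if p.is_file() and p.suffix.lower() in {".csv", ".parquet"}
    )
    for source in sources:
        # Skip files that already succeeded; earlier failures are retried.
        if summary.get(source.name, {}).get("status") == "ok":
            continue
        # Assumes file stems are unique across the input directory.
        target = output_dir / f"{source.stem}.parquet"
        try:
            details = convert(source, target)
            summary[source.name] = {"status": "ok", **details}
        except Exception as exc:  # record the error and continue with the next file
            summary[source.name] = {"status": "error", "error": str(exc)}
        # Checkpoint after every file so an interrupted run can pick up here.
        progress_path.write_text(json.dumps(summary, indent=2), encoding="utf-8")

    summary_file.write_text(json.dumps(summary, indent=2), encoding="utf-8")
    return summary
```

Keying the checkpoint by source file name lets a rerun skip files that were already converted while retrying those that failed.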
Requirements
- Python 3.7+
- polars
- rich
Configuration
- OUTPUT_DIRECTORY is set to /path/to/your/fixed/output/directory in the script. Modify this path as needed.
- Logging is configured to save logs in a logs directory in the current working directory.
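A logging setup along these lines would write to a logs directory in the current working directory, as described above. It is a minimal, stdlib-only sketch: the function name setup_logging, the logger name cdef_utils, the timestamped file name and the message format are assumptions. Since the package lists rich as a dependency, its real console output presumably goes through rich rather than the plain StreamHandler used here.

```python
import logging
from datetime import datetime
from pathlib import Path


def setup_logging(log_dir: Path = Path("logs")) -> Path:
    """Log to a timestamped file in log_dir and to the console (hypothetical setup)."""
    log_dir.mkdir(parents=True, exist_ok=True)
    log_file = log_dir / f"cdef_utils_{datetime.now():%Y%m%d_%H%M%S}.log"

    logger = logging.getLogger("cdef_utils")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate lines if the root logger is configured

    # Remove handlers from an earlier call so messages are not written twice.
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
        handler.close()

    formatter = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")

    file_handler = logging.FileHandler(log_file, encoding="utf-8")
    file_handler.setFormatter(formatter)
    logger.addHandler(file_handler)

    # Plain console handler; rich.logging.RichHandler could be swapped in here.
    console_handler = logging.StreamHandler()
    console_handler.setFormatter(formatter)
    logger.addHandler(console_handler)
    return log_file
```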
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
This project is licensed under the MIT License.
Download files
Source Distribution
cdef_utils-1.0.0.tar.gz (34.0 kB)
Built Distribution
cdef_utils-1.0.0-py3-none-any.whl
Hashes for cdef_utils-1.0.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 80cccbb7042d885bad59d0ad48f847a84bead84b09a8aa84303c0f3481fbc7f8
MD5 | d19aad489333219bee2ce24684583c9f
BLAKE2b-256 | 448cc7f10ab920c3467c71976927ff6de12b07b4f73ffcf9157c338e0588b0ba
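After downloading the wheel, its integrity can be checked against the SHA256 digest listed above using only the standard library. The helper names sha256_of and verify_wheel are hypothetical.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for cdef_utils-1.0.0-py3-none-any.whl.
EXPECTED_SHA256 = "80cccbb7042d885bad59d0ad48f847a84bead84b09a8aa84303c0f3481fbc7f8"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheel(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Return True if the file's SHA256 matches the expected hex digest."""
    return sha256_of(path) == expected.lower()
```

For example, verify_wheel(Path("cdef_utils-1.0.0-py3-none-any.whl")) should return True for an intact download. pip can also enforce the digest through its hash-checking mode, using a requirements line such as cdef-utils==1.0.0 --hash=sha256:<digest> together with pip install --require-hashes -r requirements.txt.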