Utility for converting CSV to Parquet files

Project description

cdef-utils

cdef-utils is a Python package designed to convert CSV and Parquet files to a standardized Parquet format, specifically tailored for processing register data. It provides utilities for batch processing files, generating summaries, and handling various encoding issues.

Features

  • Convert CSV and Parquet files to a standardized Parquet format
  • Automatic encoding detection for CSV files
  • Batch processing of multiple files
  • Generation of summary reports
  • Progress tracking and resumable processing
  • Rich console output with logging
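How cdef-utils detects encodings is not documented here, so the following is only a minimal sketch of one common approach: decode the raw bytes with each candidate encoding in turn and keep the first one that succeeds. The function name `detect_encoding` and the candidate list are assumptions made for illustration, not part of the package.

```python
from typing import Sequence

# Hypothetical candidate list. latin-1 accepts any byte sequence,
# so it must come last and acts as the fallback.
CANDIDATE_ENCODINGS = ("utf-8", "latin-1")


def detect_encoding(raw: bytes, candidates: Sequence[str] = CANDIDATE_ENCODINGS) -> str:
    """Return the first candidate encoding that decodes `raw` without error."""
    for encoding in candidates:
        try:
            raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # not valid in this encoding, try the next one
        return encoding
    raise ValueError("none of the candidate encodings could decode the data")


utf8_bytes = "s\u00f8ster".encode("utf-8")   # valid UTF-8
latin1_bytes = b"s\xf8ster"                  # 0xF8 is not valid UTF-8
print(detect_encoding(utf8_bytes), detect_encoding(latin1_bytes))
```

Trial decoding is cheap but coarse: it cannot tell apart single-byte encodings that all accept the same bytes, which is why the order of the candidates matters.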

Installation

To install cdef-utils, you can use pip:

pip install cdef-utils

Usage

You can use cdef-utils as a command-line tool:

python -m cdef_utils /path/to/input/directory --summary_file output_summary.json

Arguments

  • input_directory: Path to the directory containing CSV and Parquet files to process
  • --summary_file: (Optional) Path to save the summary JSON file (default: "register_summary.json")
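The arguments above map naturally onto `argparse`. The sketch below mirrors the documented command line; `build_parser` is a hypothetical helper, and the package's real entry point may be organized differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser matching the documented arguments (illustrative only)."""
    parser = argparse.ArgumentParser(prog="cdef_utils")
    parser.add_argument(
        "input_directory",
        help="Path to the directory containing CSV and Parquet files to process",
    )
    parser.add_argument(
        "--summary_file",
        default="register_summary.json",  # default documented above
        help="Path to save the summary JSON file",
    )
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["/path/to/input/directory"])
print(args.input_directory, args.summary_file)
```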

Output

The script will:

  1. Convert all CSV and Parquet files in the input directory to Parquet format
  2. Save the converted files in a structured directory format under /path/to/your/fixed/output/directory/registers
  3. Generate a summary JSON file with details about each processed register
  4. Display a summary table in the console
  5. Log processing details and any errors
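As a rough illustration of steps 1 and 3, and of how a run could be made resumable, the sketch below scans a directory, records one summary entry per CSV or Parquet file, and skips files that are already in the summary. The function name and the summary fields (`source_format`, `size_bytes`) are assumptions, and the conversion itself is left as a comment because the package's internals are not described here.

```python
import json
import tempfile
from pathlib import Path

SUPPORTED_SUFFIXES = {".csv", ".parquet"}


def summarize_directory(input_directory, summary_file) -> dict:
    """Record one summary entry per supported file, skipping files already recorded."""
    summary_path = Path(summary_file)
    # Reload an earlier summary so an interrupted run can pick up where it stopped.
    summary = json.loads(summary_path.read_text(encoding="utf-8")) if summary_path.exists() else {}
    for path in sorted(Path(input_directory).iterdir()):
        if not path.is_file() or path.suffix.lower() not in SUPPORTED_SUFFIXES:
            continue
        if path.name in summary:
            continue  # handled in a previous run
        # The actual conversion to Parquet would happen here.
        summary[path.name] = {
            "source_format": path.suffix.lower().lstrip("."),
            "size_bytes": path.stat().st_size,
        }
        # Write after every file so progress survives a crash.
        summary_path.write_text(json.dumps(summary, indent=2), encoding="utf-8")
    return summary


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "persons.csv").write_text("id,name\n1,Ada\n", encoding="utf-8")
    (Path(tmp) / "notes.txt").write_text("ignored", encoding="utf-8")
    summary_file = Path(tmp) / "register_summary.json"
    demo_summary = summarize_directory(tmp, summary_file)
    rerun_summary = summarize_directory(tmp, summary_file)  # second run adds nothing new
    print(json.dumps(demo_summary, indent=2))
```

Persisting the summary after each file trades a little extra I/O for the ability to restart without reprocessing finished files.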

Requirements

  • Python 3.7+
  • polars
  • rich

Configuration

  • OUTPUT_DIRECTORY is hard-coded to /path/to/your/fixed/output/directory in the script. Edit that value to change where the converted files are written.
  • Logging is configured to save logs in a logs directory in the current working directory.
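A logging setup along the lines described above can be sketched with the standard library alone. The helper name `configure_logging`, the log file name, and the message format are assumptions; the package itself also uses rich for console output, which is left out here.

```python
import logging
from pathlib import Path


def configure_logging(base_directory=None, name: str = "cdef_utils") -> logging.Logger:
    """Create a `logs` directory under `base_directory` (default: cwd) and log to a file in it."""
    log_directory = Path(base_directory or Path.cwd()) / "logs"
    log_directory.mkdir(parents=True, exist_ok=True)

    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid attaching duplicate handlers on repeated calls
        handler = logging.FileHandler(log_directory / f"{name}.log", encoding="utf-8")
        handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger
```

Calling `configure_logging()` with no arguments writes to `logs/cdef_utils.log` in the current working directory; passing a directory redirects the logs without touching the rest of the code.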

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License.

Source Distribution

cdef_utils-2.6.0.tar.gz (10.3 kB)

Uploaded Source

Built Distribution

cdef_utils-2.6.0-py3-none-any.whl (11.7 kB)

Uploaded Python 3

File details

Details for the file cdef_utils-2.6.0.tar.gz.

File metadata

  • Download URL: cdef_utils-2.6.0.tar.gz
  • Size: 10.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.6

File hashes

Hashes for cdef_utils-2.6.0.tar.gz
Algorithm Hash digest
SHA256 0db54a3b32260ea19a3eb113abf94cf9240f2db46c772d05eaf7e91e14179a5f
MD5 e68ce30a2f2bab9b6db186f8f17be633
BLAKE2b-256 b0ff64421c400598b19fde643e8a3fb180678c2dffab478561c6fe7a2a1e30b9


File details

Details for the file cdef_utils-2.6.0-py3-none-any.whl.

File metadata

  • Download URL: cdef_utils-2.6.0-py3-none-any.whl
  • Size: 11.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.6

File hashes

Hashes for cdef_utils-2.6.0-py3-none-any.whl
Algorithm Hash digest
SHA256 fded2c83f5c98f0a0e5368165e67a9c640369fc0ef20b74218ce053b4374614f
MD5 d6d98665e20bde18989cf86a03a8db84
BLAKE2b-256 2868ed30c74d1fe255c1183f3f8c0f6acefa0b195794eb9313bf2cd8e0285afe

