A command-line tool for converting between Parquet and CSV file formats

ParquetConv

A command-line tool for converting between Parquet and CSV file formats using pandas.

Features

  • Automatic format detection: Detects whether the input file is Parquet or CSV
  • Bidirectional conversion: Convert Parquet to CSV or CSV to Parquet
  • Flexible output naming: Auto-generates output filenames or allows custom naming
  • Error handling: Reports missing files, unsupported formats, invalid content, and read/write failures with informative messages
  • Force conversion: Option to proceed with the conversion even when the file format cannot be detected with certainty

Installation

The project uses uv for dependency management. Install dependencies with:

uv sync

Usage

Basic Usage

Convert a Parquet file to CSV:

python main.py input.parquet

Convert a CSV file to Parquet:

python main.py input.csv

Advanced Usage

Specify a custom output filename:

python main.py input.parquet -o custom_output.csv
python main.py input.csv -o custom_output.parquet

Force conversion (useful when file format detection is uncertain):

python main.py input_file --force

Command Line Options

  • input_file: Path to the input file (required)
  • -o, --output: Custom output file path (optional)
  • --force: Force conversion even if file format detection is uncertain
  • -h, --help: Show help message
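
The options above map naturally onto the standard library's `argparse`. The following is an illustrative sketch of such a parser, assuming nothing about how `main.py` is really written; `build_parser` is a hypothetical name and the help strings are taken from the list above.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (hypothetical sketch)."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Convert between Parquet and CSV file formats.",
    )
    # Required positional argument.
    parser.add_argument("input_file", help="Path to the input file")
    # Optional output path; None means the name is auto-generated.
    parser.add_argument("-o", "--output", default=None,
                        help="Custom output file path")
    # Boolean flag, False unless given on the command line.
    parser.add_argument("--force", action="store_true",
                        help="Force conversion even if file format detection is uncertain")
    # -h/--help is added by argparse automatically.
    return parser


args = build_parser().parse_args(["data.csv", "-o", "processed_data.parquet"])
```

Here `args.input_file` holds the positional path, `args.output` the custom name, and `args.force` stays `False` because the flag was not passed.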

Examples

# Convert Parquet to CSV with auto-generated filename
python main.py data.parquet
# Output: data.csv

# Convert CSV to Parquet with custom filename
python main.py data.csv -o processed_data.parquet

# Convert with force flag
python main.py unknown_file --force
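
The auto-generated name in the first example (`data.parquet` becomes `data.csv`) can be sketched with `pathlib`. This is an assumption about the naming rule, inferred only from that example; `default_output_path` and its `source_format` parameter are hypothetical names.

```python
from pathlib import Path
from typing import Optional


def default_output_path(input_path: str, source_format: str,
                        output: Optional[str] = None) -> Path:
    """Pick the output path: an explicit -o value wins, otherwise swap the extension.

    source_format is the detected input format, "parquet" or "csv" (assumed labels).
    """
    if output is not None:
        return Path(output)
    # The target is the opposite of the detected input format.
    target_suffix = ".csv" if source_format == "parquet" else ".parquet"
    # with_suffix replaces an existing extension, or appends one if there is none.
    return Path(input_path).with_suffix(target_suffix)
```

Basing the target extension on the detected format rather than on the input extension keeps the rule usable for files such as `unknown_file` that have no extension at all.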

Requirements

  • Python 3.9+
  • pandas >= 2.3.2
  • pyarrow >= 21.0.0

How It Works

  1. File Detection: The tool first checks the file extension, then attempts to read the file to determine its format
  2. Format Conversion: Uses pandas to read the input file and write it out in the opposite format
  3. Output Generation: If no output path is given, derives the output filename from the input name with the extension of the target format
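
The detection step can be sketched as follows. This is one possible approach rather than the tool's confirmed logic: it checks the extension first and then, as an assumption, falls back to the 4-byte `PAR1` magic number that begins every Parquet file instead of attempting a full pandas read. `detect_format` is a hypothetical name.

```python
from pathlib import Path
from typing import Optional

# Parquet files start (and end) with these four bytes.
PARQUET_MAGIC = b"PAR1"


def detect_format(path: Path) -> Optional[str]:
    """Return "parquet", "csv", or None when the format is uncertain."""
    # Step 1: trust a recognised file extension.
    suffix = path.suffix.lower()
    if suffix == ".parquet":
        return "parquet"
    if suffix == ".csv":
        return "csv"
    # Step 2 (assumption): look at the first bytes of the file.
    with path.open("rb") as fh:
        head = fh.read(len(PARQUET_MAGIC))
    if head == PARQUET_MAGIC:
        return "parquet"
    # CSV has no signature, so anything else is reported as uncertain.
    return None
```

A `None` result is the kind of uncertain case the `--force` option is described as covering; how the tool resolves it is not specified here.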

Error Handling

The tool provides clear error messages for:

  • Missing input files
  • Unsupported file formats
  • Read/write errors during conversion
  • Invalid file content
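
One way to turn these failure cases into messages and an exit code is sketched below. It is illustrative only: the function `run`, the message wording, and the exit codes are assumptions, and `convert` stands in for whatever conversion routine is used.

```python
import sys
from pathlib import Path
from typing import Callable


def run(input_file: str, convert: Callable[[Path], None]) -> int:
    """Call convert(path) and map common failures to a message and exit code."""
    path = Path(input_file)
    # Missing input file.
    if not path.is_file():
        print(f"Error: input file not found: {path}", file=sys.stderr)
        return 1
    try:
        convert(path)
    except ValueError as exc:
        # Invalid content or unsupported format; pandas' ParserError
        # is a ValueError subclass, so it is caught here too.
        print(f"Error: could not parse {path}: {exc}", file=sys.stderr)
        return 1
    except OSError as exc:
        # Read/write errors such as permission problems or a full disk.
        print(f"Error: could not read or write file: {exc}", file=sys.stderr)
        return 1
    return 0
```

Returning an integer rather than calling `sys.exit` inside the function keeps it easy to test; a script entry point could end with `sys.exit(run(...))`.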

License

This project is open source and available under the MIT License.
