A utility for converting Parquet files into CSV and vice versa.

ParquetConv

A command-line tool for converting between Parquet and CSV file formats using pandas.

Features

  • Automatic format detection: Detects whether the input file is Parquet or CSV
  • Bidirectional conversion: Convert Parquet to CSV or CSV to Parquet
  • Flexible output naming: Auto-generates output filenames or allows custom naming
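  • Error handling: Reports missing input files, unsupported formats, read/write failures, and invalid file content with informative messages
  • Force conversion: The --force option proceeds with conversion even when the file format cannot be detected with certainty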

Installation

The project uses uv for dependency management. Install dependencies with:

uv sync

Usage

Basic Usage

Convert a Parquet file to CSV:

python main.py input.parquet

Convert a CSV file to Parquet:

python main.py input.csv

Advanced Usage

Specify a custom output filename:

python main.py input.parquet -o custom_output.csv
python main.py input.csv -o custom_output.parquet

Force conversion (useful when file format detection is uncertain):

python main.py input_file --force

Command Line Options

  • input_file: Path to the input file (required)
  • -o, --output: Custom output file path (optional)
  • --force: Force conversion even if file format detection is uncertain
  • -h, --help: Show help message
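
The options above map naturally onto Python's standard argparse module. The following is only a sketch of how such a parser could be declared; the function name build_parser and the help strings are assumptions, not taken from the project's main.py.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented options (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Convert between Parquet and CSV files.",
    )
    # Required positional argument: the file to convert
    parser.add_argument("input_file", help="Path to the input file")
    # Optional custom output path; None means "derive it from the input name"
    parser.add_argument("-o", "--output", default=None,
                        help="Custom output file path")
    # Boolean switch, False unless the flag is given
    parser.add_argument("--force", action="store_true",
                        help="Force conversion even if format detection is uncertain")
    # -h/--help is added by argparse automatically
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration
args = build_parser().parse_args(["input.csv", "-o", "custom_output.parquet"])
print(args.input_file, args.output, args.force)
```

Passing an explicit list to parse_args keeps the example independent of how the script is invoked; a real entry point would call parse_args() with no arguments.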

Examples

# Convert Parquet to CSV with auto-generated filename
python main.py data.parquet
# Output: data.csv

# Convert CSV to Parquet with custom filename
python main.py data.csv -o processed_data.parquet

# Convert with force flag
python main.py unknown_file --force
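
The auto-generated filename in the first example (data.parquet becomes data.csv) can be reproduced by swapping the file extension. The sketch below is one plausible way to do that with pathlib; default_output_path is a hypothetical name, and how the tool actually names the output for a file with no extension is an assumption here.

```python
from pathlib import Path


def default_output_path(input_path: str, source_format: str) -> Path:
    """Derive an output path by replacing the extension (hypothetical helper).

    source_format is "parquet" or "csv"; the output gets the other extension.
    """
    target_suffix = ".csv" if source_format == "parquet" else ".parquet"
    # with_suffix replaces an existing extension, or appends one if none exists
    return Path(input_path).with_suffix(target_suffix)


print(default_output_path("data.parquet", "parquet").name)
print(default_output_path("data.csv", "csv").name)
```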

Requirements

  • Python 3.9+
  • pandas >= 2.3.2
  • pyarrow >= 21.0.0

How It Works

  1. File Detection: The tool first checks the file extension, then attempts to read the file to determine its format
  2. Format Conversion: Uses pandas to read the input file and write it out in the other format (Parquet to CSV, or CSV to Parquet)
  3. Output Generation: If no output path is given, derives the output filename from the input name and gives it the appropriate extension
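
The first two steps can be sketched roughly as follows. This is not the project's actual code: detect_format and convert are hypothetical names, and the content check shown here (looking for the PAR1 magic bytes that open an unencrypted Parquet file) is an assumption about what "attempts to read the file" could mean.

```python
from pathlib import Path
from typing import Optional

PARQUET_MAGIC = b"PAR1"  # unencrypted Parquet files begin with these four bytes


def detect_format(path) -> Optional[str]:
    """Return "parquet", "csv", or None when the format is uncertain."""
    path = Path(path)
    suffix = path.suffix.lower()
    # Step 1a: trust a recognised extension
    if suffix == ".parquet":
        return "parquet"
    if suffix == ".csv":
        return "csv"
    # Step 1b: extension is inconclusive, so inspect the file content
    with path.open("rb") as fh:
        if fh.read(4) == PARQUET_MAGIC:
            return "parquet"
    return None  # uncertain; a caller could decide what --force should do


def convert(input_path, source_format: str, output_path) -> None:
    """Step 2: read with pandas and write the opposite format."""
    import pandas as pd  # imported lazily; Parquet I/O also needs pyarrow

    if source_format == "parquet":
        pd.read_parquet(input_path).to_csv(output_path, index=False)
    else:
        pd.read_csv(input_path).to_parquet(output_path, index=False)
```

Passing index=False keeps the pandas row index out of the output, so a round trip does not add an extra column.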

Error Handling

The tool provides clear error messages for:

  • Missing input files
  • Unsupported file formats
  • Read/write errors during conversion
  • Invalid file content
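
One way to produce messages for these four cases is to wrap the conversion in a single try/except and map each exception type to a message and exit code. The sketch below is an illustration only: run, UnsupportedFormatError, the message wording, and the exit codes are all assumptions rather than the tool's real behavior.

```python
import sys
from pathlib import Path


class UnsupportedFormatError(Exception):
    """Raised when the input is neither Parquet nor CSV (hypothetical)."""


def run(input_file, convert) -> int:
    """Call convert(path) and translate failures into messages and exit codes."""
    path = Path(input_file)
    # Missing input files
    if not path.is_file():
        print(f"Error: input file not found: {path}", file=sys.stderr)
        return 1
    try:
        convert(path)
    except UnsupportedFormatError as exc:
        # Unsupported file formats
        print(f"Error: unsupported file format: {exc}", file=sys.stderr)
        return 2
    except OSError as exc:
        # Read/write errors (permissions, full disk, and so on)
        print(f"Error: could not read or write file: {exc}", file=sys.stderr)
        return 3
    except ValueError as exc:
        # Invalid content; pandas' ParserError is a ValueError subclass
        print(f"Error: invalid file content: {exc}", file=sys.stderr)
        return 4
    return 0
```

The convert argument is any callable taking a path, which keeps the error mapping separate from the pandas code and easy to exercise in isolation.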

License

This project is open source and available under the MIT License.
