

ParquetConv

A command-line tool for converting between Parquet and CSV file formats using pandas.

Features

  • Automatic format detection: Determines whether the input file is Parquet or CSV
  • Bidirectional conversion: Converts Parquet to CSV and CSV to Parquet
  • Flexible output naming: Derives the output filename from the input name, or uses a custom path given with -o
  • Error handling: Reports clear messages for missing files, unsupported formats, invalid content, and read/write failures
  • Forced conversion: The --force option attempts conversion even when the file format cannot be determined with certainty

Installation

Option 1: Install from PyPI (Recommended)

pip install parquetconv

After installation, you can use the parquetconv command directly from anywhere in your terminal.

Option 2: Install from source

Clone the repository and install:

git clone https://github.com/ToyokoLabs/parquetconv.git
cd parquetconv
pip install -e .

Option 3: Development setup with uv

The project uses uv for dependency management. Install dependencies with:

uv sync

Usage

After pip installation

Convert a Parquet file to CSV:

parquetconv input.parquet

Convert a CSV file to Parquet:

parquetconv input.csv

From source or development

python -m parquetconv.cli input.parquet
python -m parquetconv.cli input.csv

Advanced Usage

Specify a custom output filename:

parquetconv input.parquet -o custom_output.csv
parquetconv input.csv -o custom_output.parquet
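When -o is omitted, the output name follows the input name with the extension swapped (data.parquet becomes data.csv in the Examples below). A minimal sketch of that rule, assuming the output is written next to the input; resolve_output_path is a hypothetical helper, not part of parquetconv:

```python
from pathlib import Path
from typing import Optional

# Hypothetical mapping: input format -> extension of the converted file
TARGET_SUFFIX = {"parquet": ".csv", "csv": ".parquet"}


def resolve_output_path(input_path: str, input_format: str,
                        output: Optional[str] = None) -> Path:
    """Use the -o value if given; otherwise swap the input's extension."""
    if output is not None:
        return Path(output)
    if input_format not in TARGET_SUFFIX:
        raise ValueError(f"unsupported input format: {input_format!r}")
    # with_suffix replaces the last extension, or appends one if there is none
    return Path(input_path).with_suffix(TARGET_SUFFIX[input_format])


print(resolve_output_path("data.parquet", "parquet"))  # data.csv
```

For an extensionless input such as unknown_file, Path.with_suffix appends the new extension; whether parquetconv does the same is not documented here.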

Force conversion (useful when file format detection is uncertain):

parquetconv input_file --force

Command Line Options

  • input_file: Path to the input file (required)
  • -o, --output: Custom output file path (optional)
  • --force: Force conversion even if file format detection is uncertain
  • -h, --help: Show help message
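These options map directly onto Python's standard argparse module. The parser below is a sketch that reproduces the documented interface only; the build_parser name and the description text are assumptions, and the real parquetconv.cli module may be organized differently:

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction of the documented CLI, not parquetconv's own code
    parser = argparse.ArgumentParser(
        prog="parquetconv",
        description="Convert between Parquet and CSV file formats.",
    )
    parser.add_argument("input_file", help="Path to the input file")
    parser.add_argument("-o", "--output", help="Custom output file path")
    parser.add_argument(
        "--force",
        action="store_true",  # flag: False unless given
        help="Force conversion even if file format detection is uncertain",
    )
    # -h/--help is added automatically by argparse
    return parser


args = build_parser().parse_args(["data.csv", "-o", "processed_data.parquet"])
print(args.input_file, args.output, args.force)  # data.csv processed_data.parquet False
```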

Examples

# Convert Parquet to CSV with auto-generated filename
parquetconv data.parquet
# Output: data.csv

# Convert CSV to Parquet with custom filename
parquetconv data.csv -o processed_data.parquet

# Convert with force flag
parquetconv unknown_file --force

# Get help
parquetconv --help

Requirements

  • Python 3.9+
  • pandas >= 2.3.2
  • pyarrow >= 21.0.0

How It Works

  1. File Detection: The tool first checks the file extension, then attempts to read the file to determine its format
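  2. Format Conversion: Reads the input file with pandas and writes it out in the other format (Parquet to CSV, or CSV to Parquet)
  3. Output Generation: Writes the output file; if no path is given with -o, the name is derived from the input by swapping the extension (for example, data.parquet becomes data.csv)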
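Step 1 can be approximated without loading the whole file. The sketch below trusts a known extension first and otherwise inspects the content. One substitution to note: instead of attempting a full read with pandas, it checks the 4-byte magic number PAR1 that the Parquet format places at both the start and the end of a file, and treats anything else that decodes as UTF-8 text as CSV. detect_format is a hypothetical name, not parquetconv's API.

```python
import codecs
import os
import tempfile
from pathlib import Path
from typing import Optional, Union

PARQUET_MAGIC = b"PAR1"  # Parquet files begin and end with these 4 bytes
EXTENSIONS = {".parquet": "parquet", ".csv": "csv"}


def detect_format(path: Union[str, Path]) -> Optional[str]:
    """Hypothetical: return 'parquet', 'csv', or None when the format is uncertain."""
    p = Path(path)
    # 1. A known extension wins; the file is not opened at all
    by_extension = EXTENSIONS.get(p.suffix.lower())
    if by_extension is not None:
        return by_extension
    # 2. Otherwise sniff the content
    with p.open("rb") as fh:
        head = fh.read(4096)
        if head.startswith(PARQUET_MAGIC):
            fh.seek(-len(PARQUET_MAGIC), os.SEEK_END)
            # A complete Parquet file also ends with the magic; a truncated one does not
            return "parquet" if fh.read(len(PARQUET_MAGIC)) == PARQUET_MAGIC else None
    # final=False tolerates a multi-byte UTF-8 character cut off at the 4096-byte boundary
    try:
        codecs.getincrementaldecoder("utf-8")().decode(head, final=False)
    except UnicodeDecodeError:
        return None  # binary but not Parquet: uncertain
    return "csv"  # readable text: assume CSV


# Demo on an extensionless file, as in `parquetconv unknown_file --force`
with tempfile.TemporaryDirectory() as tmp:
    unknown = Path(tmp) / "unknown_file"
    # Only the magic bytes, not a readable Parquet file
    unknown.write_bytes(PARQUET_MAGIC + b"\x00" * 8 + PARQUET_MAGIC)
    sniffed_binary = detect_format(unknown)
    unknown.write_text("id,name\n1,a\n")
    sniffed_text = detect_format(unknown)
```

Sniffing only the header and footer keeps detection cheap for large files; a real reader still has to validate the rest of the file during conversion.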

Error Handling

The tool provides clear error messages for:

  • Missing input files
  • Unsupported file formats
  • Read/write errors during conversion
  • Invalid file content
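A minimal sketch of how these error categories could be turned into messages and a nonzero exit status. The function and exception names, the message wording, and exit code 1 are assumptions for illustration, not parquetconv's actual output:

```python
import sys
from pathlib import Path
from typing import Callable


class UnsupportedFormatError(Exception):
    """Hypothetical: raised when the input is neither Parquet nor CSV."""


def run_guarded(convert: Callable[[], object]) -> int:
    """Run a conversion step; print a clear message and return an exit code."""
    try:
        convert()
    except FileNotFoundError as exc:  # missing input file (must precede OSError)
        print(f"Error: input file not found: {exc.filename}", file=sys.stderr)
    except UnsupportedFormatError as exc:  # unsupported file format
        print(f"Error: unsupported file format: {exc}", file=sys.stderr)
    except ValueError as exc:  # invalid content; pandas.errors.ParserError subclasses ValueError
        print(f"Error: invalid file content: {exc}", file=sys.stderr)
    except OSError as exc:  # other read/write errors (permissions, full disk, ...)
        print(f"Error: could not read or write file: {exc}", file=sys.stderr)
    else:
        return 0
    return 1


def _reject_txt() -> None:
    # Hypothetical stand-in for a detection step that rejects the input
    raise UnsupportedFormatError("notes.txt")


exit_codes = [
    run_guarded(lambda: None),                                         # success
    run_guarded(lambda: Path("does-not-exist.parquet").read_bytes()),  # missing file
    run_guarded(_reject_txt),                                          # unsupported format
]
```

In a CLI entry point this would typically end with sys.exit(run_guarded(...)), so shell scripts can detect failed conversions.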

Development

To contribute to the project:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Run tests (if available)
  5. Submit a pull request

License

This project is open source and available under the GNU General Public License v3.0.

Author

Sebastian Bassi - sebastian@toyoko.io

Repository: https://github.com/ToyokoLabs/parquetconv



Download files


Source Distribution

parquetconv-0.2.1.tar.gz (48.1 kB)

Built Distribution

parquetconv-0.2.1-py3-none-any.whl (16.6 kB)

File details

Details for the file parquetconv-0.2.1.tar.gz.

File metadata

  • Download URL: parquetconv-0.2.1.tar.gz
  • Size: 48.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.8.3

File hashes

Hashes for parquetconv-0.2.1.tar.gz
Algorithm    Hash digest
SHA256       b48f03ff42de9636949f2d6552c79f7fdba1c08959d526ec6ec6b0738c549b6d
MD5          0fadc36c57dbfe0c98f80e371e34b39c
BLAKE2b-256  226444b2a1e1803dd93420928db6234927af9907d6389b8d07c322b6c1913d71


File details

Details for the file parquetconv-0.2.1-py3-none-any.whl.


File hashes

Hashes for parquetconv-0.2.1-py3-none-any.whl
Algorithm    Hash digest
SHA256       123b4e05ab2956ed77a919a64045227b82bbba11e622f0cc590c4735c72456f2
MD5          5c37d30c2a3f9babb0aa2b3ed22c5fc3
BLAKE2b-256  b1e324a2273a85d44ba57342db7ad079635043d541982d416ffdc71570a392bd

