ParquetConv
A command-line tool for converting between Parquet and CSV file formats using pandas.
Features
- Automatic format detection: Determines whether the input file is Parquet or CSV
- Bidirectional conversion: Converts Parquet to CSV or CSV to Parquet
- Flexible output naming: Derives the output filename from the input file, or accepts a custom path
- Error handling: Reports clear messages for missing files, unsupported formats, and read/write failures
- Force conversion: Attempts conversion even when the file format cannot be determined with certainty
Installation
Option 1: Install from PyPI (Recommended)
pip install parquetconv
After installation, the parquetconv command is available from any directory in your terminal.
Option 2: Install from source
Clone the repository and install:
git clone https://github.com/ToyokoLabs/parquetconv.git
cd parquetconv
pip install -e .
Option 3: Development setup with uv
The project uses uv for dependency management. Install dependencies with:
uv sync
Usage
After pip installation
Convert a Parquet file to CSV:
parquetconv input.parquet
Convert a CSV file to Parquet:
parquetconv input.csv
From a source checkout or development environment
python -m parquetconv.cli input.parquet
python -m parquetconv.cli input.csv
Advanced Usage
Specify a custom output filename:
parquetconv input.parquet -o custom_output.csv
parquetconv input.csv -o custom_output.parquet
Force conversion (useful when file format detection is uncertain):
parquetconv input_file --force
Command Line Options
- input_file: Path to the input file (required)
- -o, --output: Custom output file path (optional)
- --force: Force conversion even if file format detection is uncertain
- -h, --help: Show help message
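The README does not show the CLI source, so the following is only a hypothetical sketch of how these options could be declared with Python's standard argparse module. The function name build_parser is an illustrative assumption, not part of the package.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options (not the package's real source)."""
    parser = argparse.ArgumentParser(
        prog="parquetconv",
        description="Convert between Parquet and CSV file formats.",
    )
    # Required positional argument: the file to convert
    parser.add_argument("input_file", help="Path to the input file")
    # Optional output path; None means "derive it from the input filename"
    parser.add_argument("-o", "--output", default=None, help="Custom output file path")
    # Boolean flag: try the conversion even if format detection is uncertain
    parser.add_argument(
        "--force",
        action="store_true",
        help="Force conversion even if file format detection is uncertain",
    )
    # -h/--help is added automatically by argparse
    return parser
```

For example, build_parser().parse_args(["data.csv", "-o", "processed_data.parquet"]) yields input_file="data.csv", output="processed_data.parquet" and force=False.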
Examples
# Convert Parquet to CSV with auto-generated filename
parquetconv data.parquet
# Output: data.csv
# Convert CSV to Parquet with custom filename
parquetconv data.csv -o processed_data.parquet
# Convert with force flag
parquetconv unknown_file --force
# Get help
parquetconv --help
Requirements
- Python 3.9+
- pandas >= 2.3.2
- pyarrow >= 21.0.0
How It Works
- File Detection: The tool first checks the file extension, then attempts to read the file to determine its format
- Format Conversion: pandas reads the input file and writes it out in the other format (Parquet to CSV, or CSV to Parquet)
- Output Generation: If no output path is given, the output filename is derived from the input by swapping the extension (for example, data.parquet becomes data.csv)
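The README does not say how the file content is inspected when the extension is not conclusive. The sketch below is one plausible approach, not the tool's actual logic: it checks the extension first, then falls back to the Parquet magic bytes (PAR1) and Python's csv.Sniffer instead of a trial read with pandas. The function names detect_format and default_output_path are hypothetical; a None result marks the "uncertain" case where a flag like --force would matter.

```python
import csv
from pathlib import Path
from typing import Optional

PARQUET_MAGIC = b"PAR1"  # Parquet files begin and end with these 4 bytes


def detect_format(path: str) -> Optional[str]:
    """Return "parquet", "csv", or None if the format is uncertain (hypothetical)."""
    p = Path(path)
    # Step 1: trust a recognised extension
    ext = p.suffix.lower()
    if ext == ".parquet":
        return "parquet"
    if ext == ".csv":
        return "csv"
    # Step 2: no usable extension, so inspect the first bytes of the file
    with p.open("rb") as fh:
        head = fh.read(4096)
    if head.startswith(PARQUET_MAGIC):
        return "parquet"
    if b"\x00" in head:
        return None  # NUL bytes suggest binary content that is not Parquet
    try:
        # Sniffer raises csv.Error when it finds no consistent delimiter
        # (e.g. single-column files), which we report as uncertain
        csv.Sniffer().sniff(head.decode("utf-8", errors="ignore"))
    except csv.Error:
        return None
    return "csv"


def default_output_path(input_path: str, fmt: str) -> str:
    """Swap the extension: data.parquet -> data.csv, data.csv -> data.parquet."""
    target = ".csv" if fmt == "parquet" else ".parquet"
    return str(Path(input_path).with_suffix(target))
```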
Error Handling
The tool provides clear error messages for:
- Missing input files
- Unsupported file formats
- Read/write errors during conversion
- Invalid file content
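One way to surface these errors as clear messages is to raise a single exception type with a user-facing message and catch it at the top level. The sketch below is hypothetical: ConversionError, validate_input and run are illustrative names, the supported-format check uses extensions only, and the real conversion step is replaced by a plain file read so that read errors follow the same path.

```python
import sys
from pathlib import Path

SUPPORTED_SUFFIXES = {".parquet", ".csv"}


class ConversionError(Exception):
    """Hypothetical exception carrying a user-facing error message."""


def validate_input(path: str, force: bool = False) -> Path:
    p = Path(path)
    # Missing input file
    if not p.is_file():
        raise ConversionError(f"Input file not found: {path}")
    # Unsupported format (checked by extension here); force skips the check
    if p.suffix.lower() not in SUPPORTED_SUFFIXES and not force:
        label = p.suffix or "(no extension)"
        raise ConversionError(f"Unsupported file format {label}; use --force to try anyway")
    return p


def run(path: str, force: bool = False) -> int:
    """Return a process exit code: 0 on success, 1 on a reported error."""
    try:
        src = validate_input(path, force)
        # Placeholder for the real conversion: reading the bytes means
        # OS-level read errors (e.g. permissions) are reported below
        src.read_bytes()
    except ConversionError as exc:
        print(f"Error: {exc}", file=sys.stderr)
        return 1
    except OSError as exc:
        print(f"Error reading {path}: {exc}", file=sys.stderr)
        return 1
    return 0
```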
Development
To contribute to the project:
- Fork the repository
- Create a feature branch
- Make your changes
- Run tests (if available)
- Submit a pull request
License
This project is open source and available under the GNU General Public License v3.0.
Author
Sebastian Bassi - sebastian@toyoko.io
Repository
https://github.com/ToyokoLabs/parquetconv
Download files
Source Distribution: parquetconv-0.2.1.tar.gz
Built Distribution: parquetconv-0.2.1-py3-none-any.whl
File details
Details for the file parquetconv-0.2.1.tar.gz.
File metadata
- Download URL: parquetconv-0.2.1.tar.gz
- Upload date:
- Size: 48.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b48f03ff42de9636949f2d6552c79f7fdba1c08959d526ec6ec6b0738c549b6d |
| MD5 | 0fadc36c57dbfe0c98f80e371e34b39c |
| BLAKE2b-256 | 226444b2a1e1803dd93420928db6234927af9907d6389b8d07c322b6c1913d71 |
File details
Details for the file parquetconv-0.2.1-py3-none-any.whl.
File metadata
- Download URL: parquetconv-0.2.1-py3-none-any.whl
- Upload date:
- Size: 16.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 123b4e05ab2956ed77a919a64045227b82bbba11e622f0cc590c4735c72456f2 |
| MD5 | 5c37d30c2a3f9babb0aa2b3ed22c5fc3 |
| BLAKE2b-256 | b1e324a2273a85d44ba57342db7ad079635043d541982d416ffdc71570a392bd |