
parquet-lf

A lingua franca utility for converting between data formats (NDJSON, CSV) and Parquet.

Installation

uv tool install parquet-lf

Usage

Convert to Parquet

# Convert CSV to Parquet
parquet-lf to-parquet csv input.csv -o output.parquet

# Convert NDJSON to Parquet
parquet-lf to-parquet ndjson input.ndjson -o output.parquet

# jsonl is an alias for ndjson
parquet-lf to-parquet jsonl input.jsonl -o output.parquet
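Each to-parquet call converts a single file. To convert a whole directory of CSV files, a short shell loop is enough. The sketch below assumes bash or another POSIX-style shell. csv_dir_to_parquet is a hypothetical helper name. It uses only the documented "to-parquet csv INPUT -o OUTPUT" form and writes each result next to its source with a .parquet extension.

```shell
# Hypothetical helper: convert every *.csv in a directory (default: the
# current one) into a sibling *.parquet file, stopping at the first failure.
csv_dir_to_parquet() {
  local dir=${1:-.} f
  for f in "$dir"/*.csv; do
    # If nothing matches, the shell leaves the pattern unexpanded; skip it.
    [ -e "$f" ] || continue
    echo "converting $f" >&2
    parquet-lf to-parquet csv "$f" -o "${f%.csv}.parquet" || return 1
  done
}
```

For example, csv_dir_to_parquet data converts data/a.csv to data/a.parquet, data/b.csv to data/b.parquet, and so on.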

Convert from Parquet

# Convert Parquet to CSV
parquet-lf from-parquet csv input.parquet -o output.csv

# Convert Parquet to NDJSON
parquet-lf from-parquet ndjson input.parquet -o output.ndjson

# jsonl is an alias for ndjson
parquet-lf from-parquet jsonl input.parquet -o output.jsonl
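Every command converts either to or from Parquet, so going from CSV to NDJSON takes two steps with Parquet in the middle. The sketch below chains the two documented commands through a temporary file. csv_to_ndjson is a hypothetical helper name, and the sketch deliberately does not assume that from-parquet can read from stdin, since that is not documented here.

```shell
# Hypothetical helper: CSV -> Parquet -> NDJSON via a temporary
# intermediate Parquet file that is removed afterwards.
csv_to_ndjson() {
  local csv=$1 out=$2 tmpdir status
  tmpdir=$(mktemp -d) || return 1
  parquet-lf to-parquet csv "$csv" -o "$tmpdir/intermediate.parquet" &&
    parquet-lf from-parquet ndjson "$tmpdir/intermediate.parquet" -o "$out"
  status=$?    # exit status of whichever step ran last
  rm -rf "$tmpdir"
  return "$status"
}
```

Usage: csv_to_ndjson input.csv output.ndjson. Swapping the format arguments gives the reverse direction (to-parquet ndjson, then from-parquet csv).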

Output to stdout

When the -o/--output flag is omitted, output is written to stdout:

# Output CSV to stdout
parquet-lf from-parquet csv input.parquet

# Pipe to another command
parquet-lf from-parquet csv input.parquet | head -10

# Output Parquet to stdout (binary) and redirect to file
parquet-lf to-parquet csv input.csv > output.parquet

Note: Logs are written to stderr, so they won't interfere with piped data.
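Because data and logs travel on separate streams, a pipeline can keep only the data while saving or discarding the logs with an ordinary 2> redirection. A minimal sketch, assuming a POSIX-style shell; parquet_preview is a hypothetical helper name.

```shell
# Hypothetical helper: print the first N lines of a Parquet file as CSV
# (head counts the CSV header as one of the N lines).
# CSV data (stdout) flows into head; log messages (stderr) are appended to
# a log file, or discarded when no log file is given.
parquet_preview() {
  local file=$1 rows=${2:-10} log=${3:-/dev/null}
  parquet-lf from-parquet csv "$file" 2>>"$log" | head -n "$rows"
}
```

Usage: parquet_preview input.parquet 5 convert.log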

Inspect Files

Use the info command to view file metadata and schema without loading the entire dataset:

# Show file info (schema, row count, size)
parquet-lf info examples/sample.parquet

# Show file info with preview of first N rows
parquet-lf info --head 5 examples/sample.parquet
parquet-lf info -n 5 examples/sample.csv

The info command supports all three formats (Parquet, CSV, NDJSON) and detects the format from the file extension.

Help

parquet-lf --help
parquet-lf to-parquet --help
parquet-lf from-parquet --help
parquet-lf info --help

Supported Formats

NDJSON (Newline Delimited JSON)

NDJSON is a format in which each line is a valid JSON object holding one record. Like CSV, it stores exactly one record per line, so it maps directly onto the rows of a table, which makes it well suited to data interchange.

Example NDJSON file:

{"name": "alice", "value": 10}
{"name": "bob", "value": 20}
{"name": "charlie", "value": 30}

jsonl is accepted as an alias for ndjson in both to-parquet and from-parquet.
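JSON does not allow raw line breaks inside strings (they must be escaped as \n), so every NDJSON record occupies exactly one line. That is what makes the format tabular in practice: standard line-oriented tools can count, slice, and split records without a JSON parser. A small illustration using the example records above; the file name is arbitrary.

```shell
# Write the three example records to a file.
cat > sample.ndjson <<'EOF'
{"name": "alice", "value": 10}
{"name": "bob", "value": 20}
{"name": "charlie", "value": 30}
EOF

# One line per record, so the line count is the record count (3 here),
# assuming the last record ends with a newline.
wc -l < sample.ndjson

# Any contiguous run of lines is itself valid NDJSON.
head -n 2 sample.ndjson
tail -n 1 sample.ndjson
```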

CSV

Standard comma-separated values format with a header row.

Example CSV file:

name,value
alice,10
bob,20
charlie,30
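Line-oriented shell tools can inspect simple CSV files like this one, with two caveats: the first line is the header rather than a record, and the CSV format (RFC 4180) allows quoted fields to contain line breaks, so line counts match row counts only when no field does. A sketch under those assumptions:

```shell
cat > sample.csv <<'EOF'
name,value
alice,10
bob,20
charlie,30
EOF

# Column names come from the header line.
head -n 1 sample.csv

# Data rows = all lines after the header (3 here); valid only if no quoted
# field contains an embedded newline.
tail -n +2 sample.csv | wc -l
```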

Example Files

The examples/ directory contains sample data files for experimenting with the CLI:

  • examples/sample.parquet - Parquet format
  • examples/sample.csv - CSV format
  • examples/sample.ndjson - NDJSON format

These files contain the same 5-row dataset with columns: id, name, age, city, score.
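Since the three files hold the same data, running info on each one side by side is a quick way to compare the schema reported for each format. A sketch, assuming it is run from the repository root where examples/ lives; info_all is a hypothetical helper name, and it uses only the documented info -n option.

```shell
# Hypothetical helper: print the info report (schema, row count, size)
# plus a 3-row preview for each file given as an argument.
info_all() {
  local f
  for f in "$@"; do
    echo "== $f"
    parquet-lf info -n 3 "$f" || return 1
  done
}
```

Usage: info_all examples/sample.parquet examples/sample.csv examples/sample.ndjson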

Development

See CONTRIBUTING.md for development setup and guidelines.

License

MIT License - see LICENSE for details.

Download files

Download the file for your platform.

Source Distribution

parquet_lf-1.0.0.tar.gz (7.7 MB)

Uploaded Source

Built Distribution

parquet_lf-1.0.0-py3-none-any.whl (12.7 kB)

Uploaded Python 3

File details

Details for the file parquet_lf-1.0.0.tar.gz.

File metadata

  • Download URL: parquet_lf-1.0.0.tar.gz
  • Upload date:
  • Size: 7.7 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.9.18

File hashes

Hashes for parquet_lf-1.0.0.tar.gz
Algorithm Hash digest
SHA256 a54c312b17aa499323d253454956d8bcf0bfcdc6a2919c3d91e310807b66e34d
MD5 e777a2118a095655ede92eba3bf78ed7
BLAKE2b-256 47832b4b73a6a5b40afe76a6b0479d0590a2ec0136f415edb0ed5af67a6f4187


File details

Details for the file parquet_lf-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: parquet_lf-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 12.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.9.18

File hashes

Hashes for parquet_lf-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 e867d03181a88ff4ce3e354d6cfc3ac66e1d64ba64dc5f2a90ef767670f584de
MD5 5b47f8084daf66a7cb88c2771348b2f2
BLAKE2b-256 16b4694b13a996d5143b2652614b732ae58749cb528b5415784575811a0c253d
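To check a downloaded file against the SHA256 digests listed above, a short helper around sha256sum is enough. This sketch assumes GNU coreutils (on macOS, shasum -a 256 -c accepts the same input format); verify_sha256 is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if FILE's SHA256 digest equals EXPECTED.
# sha256sum -c reads lines of the form "DIGEST  FILENAME" (two spaces) and
# exits non-zero on a mismatch or a missing file.
verify_sha256() {
  printf '%s  %s\n' "$2" "$1" | sha256sum -c -
}

# Example, after downloading the wheel:
# verify_sha256 parquet_lf-1.0.0-py3-none-any.whl e867d03181a88ff4ce3e354d6cfc3ac66e1d64ba64dc5f2a90ef767670f584de
```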

