
A tool to convert CSVs to Parquet files


csv2parquet

Convert a CSV to a Parquet file. You may also find sqlite-parquet-vtable useful.

Usage

First, install the pipenv environment:

pipenv install

Next, create some Parquet files. The tool supports CSV and TSV files.

./csv2parquet file.csv [--row-group-size NNN] [--output output.parquet] [--codec CODEC]

where CODEC is one of snappy, gzip, brotli, or none.

csv2tsv

Sorting your files before converting them can improve compression and query time. Since CSVs are a pain to manipulate, I've included a tool to convert them to TSVs, which standard Unix tools can handle more easily.

./csv2tsv file.csv > file.tsv
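CSVs are awkward for line-oriented tools because a quoted field may itself contain commas or line breaks. The sketch below shows the idea of the conversion using only Python's standard csv module. It is not the actual csv2tsv code: the csv_to_tsv name is hypothetical, and flattening embedded tabs and newlines to spaces is an assumption, since this README does not say how csv2tsv treats them.

```python
import csv
import io


def csv_to_tsv(csv_text):
    """Convert CSV text to TSV text, one record per output line."""
    lines = []
    for row in csv.reader(io.StringIO(csv_text, newline="")):
        cells = []
        for cell in row:
            # Assumption: flatten embedded tabs and line breaks to spaces so
            # every record stays on one line and every tab is a separator.
            for ch in ("\r\n", "\n", "\r", "\t"):
                cell = cell.replace(ch, " ")
            cells.append(cell)
        lines.append("\t".join(cells))
    return "".join(line + "\n" for line in lines)
```

After this step each record is exactly one line and each tab is a field boundary, which is what tools like sort and cut expect.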

csv2tsv can also sort. For example, to sort a CSV on its 4th column (leaving the header row at the top):

./csv2tsv file.csv -k4 > file.tsv

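Under the covers, this delegates to the Unix sort command. See man sort for other options you can pass. Note that sorting by multiple columns (say, the 8th, then the 4th) has an unintuitive syntax, because each key must name both its start and end column (a bare -k8 means "from the 8th column to the end of the line"):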

./csv2tsv file.csv -k8,8 -k4,4 > file.tsv
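As a rough illustration of what a header-preserving, multi-column sort does, here is a small Python sketch. The sort_tsv function is a hypothetical helper, not part of this package, and it compares fields as plain strings by code point, whereas the Unix sort command compares according to the current locale, so the ordering may differ in some cases.

```python
def sort_tsv(lines, columns):
    """Sort TSV lines by the given 1-based columns, keeping the header first."""
    if not lines:
        return []
    header, body = lines[0], lines[1:]

    def key(line):
        fields = line.rstrip("\n").split("\t")
        # A tuple key gives earlier columns priority, like -k8,8 -k4,4.
        return tuple(fields[c - 1] for c in columns)

    return [header] + sorted(body, key=key)
```

For instance, sort_tsv(lines, [8, 4]) orders the data rows by the 8th column and breaks ties with the 4th, leaving the first line in place.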
