A tool to convert CSVs to Parquet files
csv2parquet
Convert a CSV to a Parquet file. You may also find sqlite-parquet-vtable useful.
Usage
First, install the pipenv environment:
pipenv install
Next, create some Parquet files. The tool supports CSV and TSV files.
./csv2parquet file.csv [--row-group-size NNN] [--output output.parquet] [--codec CODEC]
where CODEC is one of snappy, gzip, brotli or none.
csv2tsv
Sorting your files can improve compression and query time. Since CSVs are a pain to manipulate, I've included a tool to convert them to TSVs, which can be more easily manipulated by standard tools.
./csv2tsv file.csv > file.tsv
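To illustrate why CSVs are awkward for standard tools, here is a small sketch that uses only cut. The sample row and the variable names are made up for illustration and are not part of csv2tsv.

```shell
# A quoted CSV field that contains a comma (illustrative data).
csv_line='1,"Smith, Jane",NYC'

# A naive comma split cuts the quoted field in half.
naive_field=$(printf '%s\n' "$csv_line" | cut -d, -f2)
printf '%s\n' "$naive_field"    # prints: "Smith

# The same row as a TSV: cut splits on tabs by default, so the field survives.
tsv_line=$(printf '1\tSmith, Jane\tNYC')
tsv_field=$(printf '%s\n' "$tsv_line" | cut -f2)
printf '%s\n' "$tsv_field"      # prints: Smith, Jane
```

Once the data is tab-separated, line-oriented tools such as cut and sort can address whole columns without a CSV parser.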
csv2tsv can also sort. For example, to sort a CSV on its 4th column (but leave the header row at the top):
./csv2tsv file.csv -k4 > file.tsv
Under the covers, this delegates to the Unix sort command. See man sort for other options you can pass.
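For an existing TSV, a header-preserving sort can be approximated directly with head, tail and sort. This is a minimal sketch, not csv2tsv's actual implementation; the function name sort_tsv_keep_header and the sample data are hypothetical.

```shell
# Hypothetical helper (not part of csv2tsv): sort a TSV on one column
# while keeping the header row as the first line of output.
sort_tsv_keep_header() {
  file=$1
  col=$2
  tab=$(printf '\t')
  head -n 1 "$file"                                       # emit the header untouched
  tail -n +2 "$file" | sort -t "$tab" -k "${col},${col}"  # sort only the data rows
}

# Example with a small throwaway file, sorted on its 2nd column.
tmp=$(mktemp)
printf 'id\tname\n2\tpear\n1\tapple\n' > "$tmp"
sort_tsv_keep_header "$tmp" 2
rm -f "$tmp"
```

The header is split off first because sort has no notion of a header row and would otherwise sort it in with the data.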
Note that sorting by multiple columns (say, the 8th, then the 4th) has an unintuitive syntax:
./csv2tsv file.csv -k8,8 -k4,4 > file.tsv
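The reason for the doubled numbers is that sort's -kN means "from field N to the end of the line", while -kN,N restricts the key to field N alone. The sketch below shows the difference by calling sort directly on a made-up three-column sample (columns 2 and 1 stand in for 8 and 4); LC_ALL=C is set only to make the ordering deterministic.

```shell
tab=$(printf '\t')
rows=$(printf 'b\tx\t1\na\tx\t2')   # two rows that tie on column 2

# -k2 alone: the first key is "field 2 through end of line", so column 3
# breaks the tie before column 1 is ever consulted.
loose=$(printf '%s\n' "$rows" | LC_ALL=C sort -t "$tab" -k2 -k1,1)
printf '%s\n' "$loose"     # row "b x 1" comes first

# -k2,2: the first key is column 2 only, so the tie falls through to column 1.
strict=$(printf '%s\n' "$rows" | LC_ALL=C sort -t "$tab" -k2,2 -k1,1)
printf '%s\n' "$strict"    # row "a x 2" comes first
```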