A tool to convert CSVs to Parquet files
csv2parquet
Convert a CSV to a Parquet file. You may also find sqlite-parquet-vtable useful.
Installing
If you just want to use the tool:
sudo pip install pyarrow csv2parquet
If you want to clone the repo and work on the tool, install its dependencies via pipenv:
pipenv install
Usage
Run csv2parquet on an input file to create a Parquet file. The tool supports CSV and TSV files.
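As a rough illustration of a typical invocation, the sketch below assembles a csv2parquet command line from Python using only the `-n`, `-o`, and `-c` options listed in the usage text in this section. The helper `build_command` and the file names `people.tsv` and `people.parquet` are hypothetical and are not part of the tool.

```python
import shlex


def build_command(csv_file, output=None, codec=None, rows=None):
    """Assemble a csv2parquet argument list (hypothetical helper, not part of the tool)."""
    cmd = ["csv2parquet"]
    if rows is not None:
        cmd += ["-n", str(rows)]   # limit the number of rows, useful for testing
    if output is not None:
        cmd += ["-o", output]      # the Parquet file to write
    if codec is not None:
        cmd += ["-c", codec]       # brotli, gzip, snappy or none
    cmd.append(csv_file)           # positional input file (CSV or TSV)
    return cmd


# Hypothetical file names, chosen only for illustration.
cmd = build_command("people.tsv", output="people.parquet", codec="snappy", rows=1000)
print(shlex.join(cmd))
# prints: csv2parquet -n 1000 -o people.parquet -c snappy people.tsv
```

To actually run the command, the list could be passed to `subprocess.run(cmd, check=True)`, assuming csv2parquet is installed and on the `PATH`.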
usage: csv2parquet [-h] [-n ROWS] [-r ROW_GROUP_SIZE] [-o OUTPUT] [-c CODEC]
                   [-i INCLUDE [INCLUDE ...] | -x EXCLUDE [EXCLUDE ...]]
                   [-R RENAME [RENAME ...]] [-t TYPE [TYPE ...]]
                   csv_file

positional arguments:
  csv_file              input file, can be CSV or TSV

optional arguments:
  -h, --help            show this help message and exit
  -n ROWS, --rows ROWS  The number of rows to include, useful for testing.
  -r ROW_GROUP_SIZE, --row-group-size ROW_GROUP_SIZE
                        The number of rows per row group.
  -o OUTPUT, --output OUTPUT
                        The parquet file
  -c CODEC, --codec CODEC
                        The compression codec to use (brotli, gzip, snappy,
                        none)
  -i INCLUDE [INCLUDE ...], --include INCLUDE [INCLUDE ...]
                        Include the given columns (by index or name)
  -x EXCLUDE [EXCLUDE ...], --exclude EXCLUDE [EXCLUDE ...]
                        Exclude the given columns (by index or name)
  -R RENAME [RENAME ...], --rename RENAME [RENAME ...]
                        Rename a column. Specify the column to be renamed and
                        its new name, eg: 0=age or person_age=age
  -t TYPE [TYPE ...], --type TYPE [TYPE ...]
                        Parse a column as a given type. Specify the column and
                        its type, eg: 0=bool? or person_age=int8. Parse errors
                        are fatal unless the type is followed by a question
                        mark. Valid types are string (default), bool, int8,
                        int16, int32, int64, float32, float64, timestamp
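The `column=type` syntax accepted by `-t` can be easier to follow with a small example. The sketch below is not the tool's own parser; it is a minimal reading of the help text above, and the function name `parse_type_spec` is hypothetical. It assumes that an all-digit column is an index and that a trailing question mark only marks parse errors as non-fatal.

```python
# Type names taken from the help text above.
VALID_TYPES = {
    "string", "bool", "int8", "int16", "int32", "int64",
    "float32", "float64", "timestamp",
}


def parse_type_spec(spec):
    """Split a spec such as '0=bool?' into (column, type_name, lenient).

    Hypothetical helper: a sketch of the documented syntax, not csv2parquet's code.
    """
    # Split on the last '=' because type names never contain one.
    column, sep, type_name = spec.rpartition("=")
    if not sep or not column or not type_name:
        raise ValueError(f"expected column=type, got {spec!r}")

    # A trailing '?' means parse errors for this column are not fatal.
    lenient = type_name.endswith("?")
    if lenient:
        type_name = type_name[:-1]

    if type_name not in VALID_TYPES:
        raise ValueError(f"unknown type {type_name!r} in {spec!r}")

    # Assumption: an all-digit column refers to an index, anything else to a name.
    key = int(column) if column.isdecimal() else column
    return key, type_name, lenient


print(parse_type_spec("0=bool?"))          # (0, 'bool', True)
print(parse_type_spec("person_age=int8"))  # ('person_age', 'int8', False)
```

One practical note, assuming the tool is built on Python's argparse (which the help layout suggests): options that accept several values, such as `-t`, `-i`, `-x`, and `-R`, keep consuming arguments until the next option, so it is safer to put `csv_file` before them or to separate it with `--`.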
Testing
pylint csv2parquet
pytest