A tool to convert CSVs to Parquet files
csv2parquet
Convert a CSV file to a Parquet file. You may also find sqlite-parquet-vtable or parquet-metadata useful.
Installing
If you just want to use the tool:
sudo pip install pyarrow csv2parquet
If you want to clone the repo and work on the tool, install its dependencies via pipenv:
pipenv install
Usage
Run csv2parquet on a CSV or TSV file to create a Parquet file.
usage: csv2parquet [-h] [-n ROWS] [-r ROW_GROUP_SIZE] [-o OUTPUT] [-c CODEC]
                   [-i INCLUDE [INCLUDE ...] | -x EXCLUDE [EXCLUDE ...]]
                   [-R RENAME [RENAME ...]] [-t TYPE [TYPE ...]]
                   csv_file

positional arguments:
  csv_file              input file, can be CSV or TSV

optional arguments:
  -h, --help            show this help message and exit
  -n ROWS, --rows ROWS  The number of rows to include, useful for testing.
  -r ROW_GROUP_SIZE, --row-group-size ROW_GROUP_SIZE
                        The number of rows per row group.
  -o OUTPUT, --output OUTPUT
                        The parquet file
  -c CODEC, --codec CODEC
                        The compression codec to use (brotli, gzip, snappy,
                        zstd, none)
  -i INCLUDE [INCLUDE ...], --include INCLUDE [INCLUDE ...]
                        Include the given columns (by index or name)
  -x EXCLUDE [EXCLUDE ...], --exclude EXCLUDE [EXCLUDE ...]
                        Exclude the given columns (by index or name)
  -R RENAME [RENAME ...], --rename RENAME [RENAME ...]
                        Rename a column. Specify the column to be renamed and
                        its new name, eg: 0=age or person_age=age
  -t TYPE [TYPE ...], --type TYPE [TYPE ...]
                        Parse a column as a given type. Specify the column and
                        its type, eg: 0=bool? or person_age=int8. Parse errors
                        are fatal unless the type is followed by a question
                        mark. Valid types are string (default), base64, bool,
                        float32, float64, int8, int16, int32, int64, timestamp
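
As a rough illustration of how the flags above combine, the following Python sketch assembles a csv2parquet command line and runs it only if the tool is installed. The helper name build_command and the file names people.csv and people.parquet are hypothetical and chosen for the example; only the flags themselves come from the help text. It places csv_file before the options on the assumption that the tool uses argparse-style parsing (which the help format suggests), where a multi-value option such as -t could otherwise absorb a trailing file name.

```python
import os
import shlex
import shutil
import subprocess


def build_command(csv_file, output=None, codec=None, rows=None,
                  renames=None, types=None):
    """Assemble a csv2parquet argument list from the documented flags.

    This is an illustrative helper, not part of csv2parquet itself.
    """
    # The input file goes first so that multi-value options (-R, -t)
    # cannot swallow it as one of their values.
    cmd = ["csv2parquet", csv_file]
    if rows is not None:
        cmd += ["-n", str(rows)]
    if output is not None:
        cmd += ["-o", output]
    if codec is not None:
        cmd += ["-c", codec]
    if renames:
        # Each rename is COLUMN=NEW_NAME; the column is an index or a name.
        cmd += ["-R"] + ["{}={}".format(old, new) for old, new in renames.items()]
    if types:
        # Each type is COLUMN=TYPE; a trailing "?" makes parse errors non-fatal.
        cmd += ["-t"] + ["{}={}".format(col, typ) for col, typ in types.items()]
    return cmd


if __name__ == "__main__":
    cmd = build_command(
        "people.csv",                    # hypothetical input file
        output="people.parquet",         # hypothetical output file
        codec="snappy",
        types={0: "bool?", "person_age": "int8"},
    )
    # shlex.join quotes arguments where a POSIX shell would need it.
    print(shlex.join(cmd))
    # Run the tool only when it is installed and the input file exists.
    if shutil.which("csv2parquet") and os.path.exists("people.csv"):
        subprocess.run(cmd, check=True)
```

When typing such a command directly into a shell, it is worth quoting type specifications that end in a question mark (for example '0=bool?'), since ? is a glob character in most shells.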
Testing
pylint csv2parquet
pytest