
A tool to convert CSVs to Parquet files


csv2parquet


Convert a CSV file to a Parquet file. You may also find sqlite-parquet-vtable or parquet-metadata useful.

Installing

If you just want to use the tool:

sudo pip install pyarrow csv2parquet

If you want to clone the repo and work on the tool, install its dependencies via pipenv:

pipenv install

Usage

Once the tool is installed, run it on an input file to create a Parquet file. Both CSV and TSV input files are supported.

usage: csv2parquet [-h] [-n ROWS] [-r ROW_GROUP_SIZE] [-o OUTPUT] [-c CODEC]
                   [-i INCLUDE [INCLUDE ...] | -x EXCLUDE [EXCLUDE ...]]
                   [-R RENAME [RENAME ...]] [-t TYPE [TYPE ...]]
                   csv_file

positional arguments:
  csv_file              input file, can be CSV or TSV

optional arguments:
  -h, --help            show this help message and exit
  -n ROWS, --rows ROWS  The number of rows to include, useful for testing.
  -r ROW_GROUP_SIZE, --row-group-size ROW_GROUP_SIZE
                        The number of rows per row group.
  -o OUTPUT, --output OUTPUT
                        The parquet file
  -c CODEC, --codec CODEC
                        The compression codec to use (brotli, gzip, snappy,
                        zstd, none)
  -i INCLUDE [INCLUDE ...], --include INCLUDE [INCLUDE ...]
                        Include the given columns (by index or name)
  -x EXCLUDE [EXCLUDE ...], --exclude EXCLUDE [EXCLUDE ...]
                        Exclude the given columns (by index or name)
  -R RENAME [RENAME ...], --rename RENAME [RENAME ...]
                        Rename a column. Specify the column to be renamed and
                        its new name, eg: 0=age or person_age=age
  -t TYPE [TYPE ...], --type TYPE [TYPE ...]
                        Parse a column as a given type. Specify the column and
                        its type, eg: 0=bool? or person_age=int8. Parse errors
                        are fatal unless the type is followed by a question
                        mark. Valid types are string (default), base64, bool,
                        float32, float64, int8, int16, int32, int64, timestamp
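As an illustration of how the options above combine, here is a minimal Python sketch that assembles one possible command line and runs it only if csv2parquet is on the PATH and the input file exists. The file names people.csv and people.parquet are hypothetical, and the column specs are taken from the examples in the help text. The input file is placed before the options on the assumption that the multi-value -t option would otherwise consume it as another TYPE.

```python
import shlex
import shutil
import subprocess
from pathlib import Path

# Hypothetical file names, used only for illustration.
csv_path = Path("people.csv")
parquet_path = Path("people.parquet")

# Every flag below appears in the help text above.
command = [
    "csv2parquet",
    str(csv_path),             # positional input file (CSV or TSV)
    "-o", str(parquet_path),   # the Parquet file to write
    "-c", "snappy",            # compression codec
    "-t",
    "0=bool?",                 # column 0 as bool; "?" makes parse errors non-fatal
    "person_age=int8",         # column person_age as int8; parse errors are fatal
]

# shlex.join quotes "0=bool?" so a shell does not treat "?" as a glob character.
print(shlex.join(command))

# Run the conversion only when the tool and the input file are both available.
if shutil.which("csv2parquet") and csv_path.exists():
    subprocess.run(command, check=True)
```

When typing the command directly into a shell, quote any type spec that ends in a question mark, as the printed command does.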

Testing

pylint csv2parquet
pytest

