
Python support for Parquet file format



fastparquet is a Python implementation of the Parquet format, aiming to integrate into Python-based big-data workflows.

Not all parts of the parquet-format have been implemented or tested yet; see, for example, the Todos linked below. That said, fastparquet can read all the data files from the parquet-compatibility project.

Introduction

Details of this project can be found in the documentation.

The original plan listing expected features can be found in this issue. Please feel free to comment on that list as to missing items and priorities, or raise new issues with bugs or requests.

Requirements

(All development is against recent versions in the default Anaconda channels.)

Required:

  • numba

  • numpy

  • pandas

  • cython

Optional (compression algorithms; gzip is always available):

  • snappy (aka python-snappy)

  • lzo

  • brotli

  • lz4

  • zstandard

Installation

Install using conda:

conda install -c conda-forge fastparquet

Install from PyPI:

pip install fastparquet

Or install the latest version from GitHub:

pip install git+https://github.com/dask/fastparquet

For the pip methods, numba must have been previously installed (using conda).

Usage

Reading

from fastparquet import ParquetFile
pf = ParquetFile('myfile.parq')
df = pf.to_pandas()
df2 = pf.to_pandas(['col1', 'col2'], categories=['col1'])

You may specify which columns to load and which of those to keep as categoricals (if the data uses dictionary encoding). The file path can be a single file, a metadata file pointing to other data files, or a directory (tree) containing data files. The last of these is what Hive/Spark typically outputs.

Writing

from fastparquet import write
write('outfile.parq', df)
write('outfile2.parq', df, row_group_offsets=[0, 10000, 20000],
      compression='GZIP', file_scheme='hive')

The default is to produce a single output file with a single row-group (i.e., logical segment) and no compression. At the moment, only simple data-types and plain encoding are supported, so expect performance to be similar to numpy.savez.
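To illustrate what a list passed as row_group_offsets means, here is a small pure-Python sketch of how starting offsets partition the rows into row-groups. The helper name row_group_slices is hypothetical and is not part of fastparquet; it only mirrors the idea that each offset marks the first row of a new row-group.

```python
def row_group_slices(n_rows, row_group_offsets):
    """Return (start, stop) row ranges implied by a list of starting offsets.

    Hypothetical helper for illustration; not fastparquet's implementation.
    """
    starts = list(row_group_offsets)
    # Each group ends where the next one begins; the last ends at n_rows.
    stops = starts[1:] + [n_rows]
    return [
        (start, min(stop, n_rows))
        for start, stop in zip(starts, stops)
        if start < n_rows  # offsets beyond the data yield no group
    ]


# A 25,000-row frame with the offsets from the write() call above
# would be split into three row-groups.
slices = row_group_slices(25000, [0, 10000, 20000])
print(slices)
```

Under this reading, the offsets [0, 10000, 20000] on a 25,000-row frame give two groups of 10,000 rows followed by one of 5,000.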

History

Since early October 2016, this fork of parquet-python has been undergoing considerable redevelopment. The aim is to have a small, simple, and performant library for reading and writing the Parquet format from Python.
