Python support for Parquet file format

Project description

fastparquet is a Python implementation of the parquet format, aiming to integrate into Python-based big data workflows. It is used implicitly by the projects Dask, Pandas and intake-parquet.

We offer a high degree of support for the features of the parquet format, and very competitive performance, in a small install size and codebase.

Details of this project, how to use it and comparisons to other work can be found in the documentation.

Requirements

(all development is against recent versions in the default anaconda channels and/or conda-forge)

Required:

  • numpy

  • pandas

  • cython >= 0.29.23 (if building from pyx files)

  • cramjam

  • fsspec

Supported compression algorithms:

  • Available by default:

    • gzip

    • snappy

    • brotli

    • lz4

    • zstandard

  • Optionally supported

Installation

Install using conda, to get the latest compiled version:

conda install -c conda-forge fastparquet

or install from PyPI:

pip install fastparquet

You may wish to install numpy first, to help pip’s resolver. pip may then install an appropriate wheel, or compile from source. For the latter, you will need a suitable C compiler toolchain on your system.

You can also install the latest version from GitHub:

pip install git+https://github.com/dask/fastparquet

in which case you should also have cython to be able to rebuild the C files.

Usage

Please refer to the documentation.

Reading

from fastparquet import ParquetFile
pf = ParquetFile('myfile.parq')
df = pf.to_pandas()
df2 = pf.to_pandas(['col1', 'col2'], categories=['col1'])

You may specify which columns to load, and which of those to keep as categoricals (if the data uses dictionary encoding). The file path can be a single file, a metadata file pointing to other data files, or a directory (tree) containing data files. The last of these is what hive/spark typically output.

Writing

from fastparquet import write
write('outfile.parq', df)
write('outfile2.parq', df, row_group_offsets=[0, 10000, 20000],
      compression='GZIP', file_scheme='hive')

The default is to produce a single output file with a single row-group (i.e., logical segment) and no compression. At the moment, only simple data-types and plain encoding are supported, so expect performance to be similar to numpy.savez.

History

This project was forked in October 2016 from parquet-python, which was not designed for vectorised loading of big data or parallel access.
