Intake parquet plugin

# Intake-parquet

[![Build Status](https://travis-ci.org/ContinuumIO/intake-parquet.svg?branch=master)](https://travis-ci.org/ContinuumIO/intake-parquet) [![Documentation Status](https://readthedocs.org/projects/intake-parquet/badge/?version=latest)](http://intake-parquet.readthedocs.io/en/latest/?badge=latest)

[Intake data loader](https://github.com/ContinuumIO/intake/) interface to the parquet binary tabular data format.

Parquet is very popular in the big-data ecosystem because it provides columnar and chunk-wise access to the data, with efficient encodings and compression. This makes the format particularly effective for streaming through large subsections of even larger datasets, hence its common use with Hadoop and Spark.

Parquet data may be single files, directories of files, or nested directories, where the directory names are meaningful in the partitioning of the data.
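As a rough illustration of how directory names can carry partitioning information, the sketch below parses `key=value` directory components out of file paths. The `key=value` naming scheme, the example paths, and the helper `partition_values` are assumptions made for illustration; they are not part of this plugin's API.

```python
from pathlib import PurePosixPath


def partition_values(path):
    """Return the key=value pairs encoded in a file's parent directories.

    Assumes (for illustration only) that partition directories are named
    like ``year=2019``; directories without ``=`` are ignored.
    """
    values = {}
    for part in PurePosixPath(path).parts[:-1]:  # skip the file name itself
        key, sep, value = part.partition("=")
        if sep and key:
            values[key] = value
    return values


# Hypothetical file listing for a nested, partitioned dataset.
files = [
    "data/year=2019/month=12/part.0.parquet",
    "data/year=2020/month=1/part.0.parquet",
]
for f in files:
    print(f, partition_values(f))
```

Note that values recovered this way are strings; a real reader would also need to decide how to convert them to typed columns.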

### Features

The parquet plugin allows for:

  • efficient metadata parsing, so you know the data types and number of records without loading any data

  • random access of partitions

  • column and index selection, so you load only the data you need

  • passing of value-based filters, so that you only load those partitions containing some valid data (NB: does not filter the values within a partition)
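The last point is easy to misread, so here is a minimal sketch of partition-level filtering in plain Python. It assumes each partition exposes per-column minimum and maximum values, which is a common way to implement this kind of pruning but is not confirmed here as the plugin's own mechanism. The names `may_contain` and `select_partitions`, the `(column, op, value)` filter form, and the sample data are all hypothetical.

```python
def may_contain(stats, column, op, value):
    """Return True if a partition with these (min, max) stats could hold a match."""
    lo, hi = stats[column]
    if op == "==":
        return lo <= value <= hi
    if op == ">":
        return hi > value
    if op == ">=":
        return hi >= value
    if op == "<":
        return lo < value
    if op == "<=":
        return lo <= value
    raise ValueError(f"unsupported operator: {op}")


def select_partitions(partitions, filters):
    """Keep partitions that might satisfy every (column, op, value) filter."""
    return [
        p for p in partitions
        if all(may_contain(p["stats"], col, op, val) for col, op, val in filters)
    ]


# Hypothetical partitions with min/max statistics for column "x".
partitions = [
    {"name": "part.0", "stats": {"x": (0, 4)}, "rows": [0, 2, 4]},
    {"name": "part.1", "stats": {"x": (3, 9)}, "rows": [3, 6, 9]},
    {"name": "part.2", "stats": {"x": (10, 15)}, "rows": [10, 15]},
]

kept = select_partitions(partitions, [("x", ">", 5)])
loaded = [row for p in kept for row in p["rows"]]
print([p["name"] for p in kept])  # partitions that may contain x > 5
print(loaded)                     # every row of the kept partitions
```

In this sketch `part.0` is skipped entirely, while `part.1` is loaded whole, including the value 3 that does not satisfy `x > 5`. Any row-level filtering still has to be applied to the loaded data afterwards.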

### Installation

To install with conda:

`conda install -c conda-forge intake-parquet`

### Examples

See the notebook in the examples/ directory.
