parquet-python

parquet-python is a pure-Python implementation (currently read-only) of the parquet format. It comes with a script for reading parquet files and writing the data to stdout as JSON or TSV (without the overhead of JVM startup). Performance has not yet been optimized, but it’s useful for debugging and quick viewing of data in files.

Not all parts of the parquet-format have been implemented or tested yet (e.g. nested data); see Todos below for a full list. That said, parquet-python is capable of reading all the data files from the parquet-compatibility project.

requirements

parquet-python has been tested on Python 2.7, 3.4, and 3.5. It depends on thrift (0.9) and python-snappy (for snappy-compressed files).
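
As a quick way to confirm that these dependencies are present before reading a file, the following minimal sketch checks whether the modules can be imported. The helper name missing_modules is hypothetical, and the sketch assumes Python 3.4 or later, where importlib.util.find_spec is available.

```python
import importlib.util


def missing_modules(module_names):
    """Return the names from module_names that cannot be found for import."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


# "thrift" and "snappy" are the import names of the thrift and
# python-snappy distributions.
missing = missing_modules(["thrift", "snappy"])
if missing:
    print("Missing modules: " + ", ".join(missing))
else:
    print("All dependencies are importable.")
```

This only reports whether the modules can be located; it does not check their versions.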

getting started

parquet-python is available via PyPI and can be installed using pip install parquet. The package includes the parquet command for reading parquet files, e.g. parquet test.parquet. See parquet --help for full usage.

Example

parquet-python currently has two programmatic interfaces with functionality similar to Python’s csv reader. First, it supports a DictReader, which returns a dictionary per row. Second, it has a reader, which returns a list of values for each row. Both functions require a file-like object and support an optional columns argument to read only the specified columns.

import parquet
import json

# assuming parquet file with two rows and three columns:
# foo bar baz
# 1   2   3
# 4   5   6

# parquet files are binary, so open them in "rb" mode
with open("test.parquet", "rb") as fo:
    # prints:
    # {"foo": 1, "bar": 2}
    # {"foo": 4, "bar": 5}
    for row in parquet.DictReader(fo, columns=['foo', 'bar']):
        print(json.dumps(row))


with open("test.parquet", "rb") as fo:
    # prints:
    # 1,2
    # 4,5
    for row in parquet.reader(fo, columns=['foo', 'bar']):
        print(",".join([str(r) for r in row]))
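
Because DictReader yields one dictionary per row, its output can be handed to the standard library's csv.DictWriter. The following is a minimal sketch under a few assumptions: it targets Python 3, the helper names rows_to_csv and parquet_to_csv are hypothetical (they are not part of parquet-python), and the sample rows stand in for what DictReader would yield for the example file above.

```python
import csv
import io


def rows_to_csv(rows, fieldnames):
    """Serialize an iterable of dict rows to a CSV string."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
    return buf.getvalue()


def parquet_to_csv(path, columns):
    """Read the given columns from a parquet file and return them as CSV."""
    # Imported lazily so rows_to_csv can be used without parquet installed.
    import parquet

    with open(path, "rb") as fo:
        return rows_to_csv(parquet.DictReader(fo, columns=columns), columns)


# Stand-in for the rows DictReader would yield for the example file.
sample_rows = [{"foo": 1, "bar": 2}, {"foo": 4, "bar": 5}]
print(rows_to_csv(sample_rows, ["foo", "bar"]))
```

Keeping the serialization in rows_to_csv separate from the file handling means it can be exercised with plain dictionaries, without a parquet file on disk.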

Todos

  • Support the deprecated bitpacking
  • Fix handling of repetition-levels and definition-levels
  • Tests for nested schemas, null data
  • Support reading of data from HDFS via snakebite and/or webhdfs.
  • Implement writing
  • Performance evaluation and optimization (i.e. how does it compare to the C++ and Java implementations)

Contributing

Contributions are made via pull requests. Please include tests with your changes and follow PEP 8.

Release History

1.1

This version

1.0

0.0.0

Download Files

File Name                       Size      Version   File Type   Upload Date
parquet-1.1-py2-none-any.whl    18.3 kB   py2       Wheel       Aug 16, 2016
parquet-1.1-py3-none-any.whl    18.3 kB   py3       Wheel       Aug 16, 2016
parquet-1.1.tar.gz              18.2 kB             Source      Aug 16, 2016
