
Examples for low-level Parquet read/write in Python



The pynock library provides examples for working efficiently with low-level Parquet read/write in Python.

Our intent is to serialize graphs in a way that aligns the data representations required by several popular areas of graph technology:

  • semantic graphs (e.g., W3C)
  • labeled property graphs (e.g., openCypher)
  • probabilistic graphs (e.g., PSL)
  • edge lists (e.g., NetworkX)

This approach also supports distributed partitions based on Parquet, which can scale to very large graphs (1T+ nodes).

For details about the formatting required in Parquet files, see the page.


Note that the pynock library does not provide any support for graph computation or querying, merely for manipulating and validating serialization formats.

Our intent is to provide examples that others in the broader open source developer community can use to help troubleshoot edge cases in Parquet.


This code has been tested and validated using Python 3.8; we make no guarantees regarding correct behavior on other versions.

The Parquet file formats depend on Arrow 5.0.x or later.

For the Python dependencies, see the requirements.txt file.

Set up

To install via pip:

python3 -m pip install -U pynock

To set up this library locally:

python3 -m venv venv
source venv/bin/activate

python3 -m pip install -U pip wheel
python3 -m pip install -r requirements.txt

Usage via CLI

To run examples from CLI:

python3 load-parq --file dat/recipes.parq --debug
python3 load-rdf --file dat/tiny.ttl --save-cvs foo.cvs

For further information:

python3 --help

Usage programmatically in Python

To construct a partition file programmatically, see the sample code, which builds the minimal recipe example as an RDF graph.


For more details about using Arrow and Parquet, see:

  • "Apache Arrow homepage"
  • "Finer-grained Reading and Writing"
  • "Apache Arrow: Read DataFrame With Zero Memory", Dejan Simic, Towards Data Science (2020-06-25)

Why the name?

"Nock" is the English word for the end of an arrow opposite its point.

Package Release

First, verify that will run correctly for the package release process:

python3 -m pip install -e .
python3 -m pytest tests/
python3 -m pip uninstall pynock


Download files


Source Distribution

pynock-1.0.0.tar.gz (21.4 kB)

Built Distribution

pynock-1.0.0-py3-none-any.whl (7.2 kB)
