A proposed standard `NOCK` for a Parquet format that supports efficient distributed serialization of multiple kinds of graph technologies.

pynock

The following describes a proposed standard NOCK for a Parquet format that supports efficient distributed serialization of multiple kinds of graph technologies.

The pynock library provides examples of efficient low-level Parquet reading and writing in Python.

Our intent is to serialize graphs in a way which aligns the data representations required for popular graph technologies and related data sources:

semantic graphs (e.g., W3C formats RDF, TTL, JSON-LD, etc.)
labeled property graphs (e.g., openCypher)
probabilistic graphs (e.g., PSL)
spreadsheet import/export (e.g., CSV)
dataframes (e.g., Pandas, Dask, Spark, etc.)
edge lists (e.g., NetworkX, cuGraph, etc.)
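
As a rough illustration of why one row-oriented representation can serve several of these consumers, here is a minimal sketch using only the Python standard library. The field names (`src_name`, `rel_name`, `dst_name`) and the sample rows are hypothetical and are not the NOCK schema, which is defined in FORMAT.md. The sketch only shows that the same flat rows can be exported as CSV, projected to an edge list, or regrouped into an adjacency mapping.

```python
import csv
import io
from collections import defaultdict

# Hypothetical edge rows. These field names are illustrative only and are
# NOT the column names defined in FORMAT.md.
FIELDS = ["src_name", "rel_name", "dst_name"]
ROWS = [
    {"src_name": "recipe:1", "rel_name": "uses", "dst_name": "ingredient:egg"},
    {"src_name": "recipe:1", "rel_name": "uses", "dst_name": "ingredient:flour"},
    {"src_name": "recipe:2", "rel_name": "uses", "dst_name": "ingredient:egg"},
]


def to_csv(rows):
    """Serialize the rows as CSV text (spreadsheet import/export)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_edge_list(rows):
    """Project the rows to (source, destination) pairs (edge list)."""
    return [(row["src_name"], row["dst_name"]) for row in rows]


def to_adjacency(rows):
    """Group destinations by source node, preserving row order."""
    adjacency = defaultdict(list)
    for row in rows:
        adjacency[row["src_name"]].append(row["dst_name"])
    return dict(adjacency)
```

The edge list produced by `to_edge_list(ROWS)` is a plain list of tuples, which is the general shape that edge-list oriented tools accept.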

This approach also supports efficient distributed partitions based on Parquet, which can scale on a cluster to very large graphs (1 T+ nodes).

For details about the proposed format in Parquet files, see the FORMAT.md file.

If you have questions, suggestions, or bug reports, please open an issue on our public GitHub repo.

Caveats

Note that the pynock library does not provide any support for graph computation or querying, merely for manipulating and validating serialization formats.

Our intent is to provide examples where others from the broader open source developer community can help troubleshoot edge cases in Parquet.

Dependencies

This code has been tested and validated using Python 3.8, and we make no guarantees regarding correct behaviors on other versions.

The Parquet file formats depend on Arrow 5.0.x or later.
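
The version requirement above can be checked before running anything else. The following is a minimal sketch, assuming that Arrow is installed in Python as the `pyarrow` package and that its version string starts with a numeric `major.minor` prefix. The helper names are hypothetical and are not part of pynock.

```python
from importlib import metadata  # available in Python 3.8 and later

# Minimum Arrow version stated in this README, as (major, minor).
MIN_ARROW = (5, 0)


def parse_major_minor(version):
    """Return (major, minor) from a version string such as '5.0.0'.

    Assumes the first two dot-separated parts are plain integers;
    a missing minor part is treated as 0.
    """
    parts = (version.split(".") + ["0"])[:2]
    return int(parts[0]), int(parts[1])


def arrow_is_supported(version, minimum=MIN_ARROW):
    """True if the given version string is at least the minimum."""
    return parse_major_minor(version) >= minimum


def installed_arrow_version():
    """Return the installed pyarrow version string, or None if absent."""
    try:
        return metadata.version("pyarrow")
    except metadata.PackageNotFoundError:
        return None
```

For example, calling `arrow_is_supported(installed_arrow_version())` after confirming that the returned version is not `None` tells you whether the local environment meets the stated minimum.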

For the Python dependencies, the library versioning info is listed in the requirements.txt file.

Set up

To install via pip:

python3 -m pip install -U pynock

To set up this library locally:

python3 -m venv venv
source venv/bin/activate

python3 -m pip install -U pip wheel
python3 -m pip install -r requirements.txt

Usage via CLI

To run examples from CLI:

python3 cli.py load-parq --file dat/recipes.parq --debug

python3 cli.py load-rdf --file dat/tiny.ttl --save-csv foo.csv

For further information:

python3 cli.py --help

Usage programmatically in Python

To construct a partition file programmatically, see the example Jupyter notebooks, which include sample code and debugging output.

Background

For more details about using Arrow and Parquet see:

"Apache Arrow homepage"

"Finer-grained Reading and Writing"

"Apache Arrow: Read DataFrame With Zero Memory"
Dejan Simic
Towards Data Science (2020-06-25)

Why the name?

A nock is the English word for the end of an arrow opposite its point.

If you must have an acronym, the proposed standard NOCK stands for Network Objects for Consistent Knowledge.

Also, the library name had minimal namespace collisions on GitHub and PyPI :)

Developer updates

To set up the build environment locally, also run:

python3 -m pip install -U pip setuptools wheel
python3 -m pip install -r requirements-dev.txt

Note that we require the use of pre-commit hooks. To configure them locally:

pre-commit install
git config --local core.hooksPath .git/hooks/

Package releases

First, verify that setup.py will run correctly for the package release process:

python3 -m pip install -e .
python3 -m pytest -rx tests/
python3 -m pip uninstall pynock

Next, update the semantic version number in setup.py and create a release on GitHub, and make sure to update the local repo:

git stash
git checkout main
git pull

Make sure that you have set up 2FA on your PyPI account so that you can generate an API token: https://pypi.org/manage/account/token/

Then run our PyPI push script:

./bin/push_pypi.sh

Release history

1.2.1 (Oct 11, 2022), this version
1.2.0 (Oct 7, 2022)
1.1.1 (Oct 6, 2022)
1.0.1 (Oct 2, 2022)
1.0.0 (Oct 2, 2022)

Download files

Source Distribution

pynock-1.2.1.tar.gz (50.0 kB), uploaded Oct 11, 2022, Source

Built Distribution

pynock-1.2.1-py3-none-any.whl (9.4 kB), uploaded Oct 11, 2022, Python 3

File details

Details for the file pynock-1.2.1.tar.gz.

File metadata

Download URL: pynock-1.2.1.tar.gz
Upload date: Oct 11, 2022
Size: 50.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.8.10

File hashes

Hashes for pynock-1.2.1.tar.gz
Algorithm	Hash digest
SHA256	`a565624c9c58c86cbfc222d5af9d1ab8bf0448e86d26fb57633aa4cda2cae420`
MD5	`b89864056d1da82f96be992ab9ca10c0`
BLAKE2b-256	`74586f7740085f91b076fddc46433f3f4c2be5824672ac341e78b6b639bdb17e`
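
A downloaded file can be checked against the SHA256 digest listed above. The following is a minimal sketch using only the standard library. The function names are hypothetical, and the local path passed to them is an assumption about where you saved the download.

```python
import hashlib

# SHA256 digest listed above for pynock-1.2.1.tar.gz.
EXPECTED_SHA256 = "a565624c9c58c86cbfc222d5af9d1ab8bf0448e86d26fb57633aa4cda2cae420"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of(path) == expected.lower()
```

For example, `matches_digest("pynock-1.2.1.tar.gz")` should return `True` for an intact copy of the source distribution saved under that name in the current directory.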

File details

Details for the file pynock-1.2.1-py3-none-any.whl.

File metadata

Download URL: pynock-1.2.1-py3-none-any.whl
Upload date: Oct 11, 2022
Size: 9.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.1 CPython/3.8.10

File hashes

Hashes for pynock-1.2.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5c3978771b198245c1fc543380ff27695e339bd4a9b15609d7a8c6d6c304222f`
MD5	`5c29051a974f11f7fbe0f8904868c928`
BLAKE2b-256	`15886ceb04b83d063e24c9fdc82fbdcbe90d9aa38e67a4cddbf78b7e0f4906d5`
