graphique

GraphQL service for arrow tables and parquet data sets.

Project description

GraphQL service for arrow tables and parquet data sets. The schema for a query API is derived automatically.

Usage

% env PARQUET_PATH=... uvicorn graphique.service:app

Open http://localhost:8000/graphql to try out the API in GraphiQL. There is a test fixture at ./tests/fixtures/zipcodes.parquet.
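The same endpoint can also be called from a script instead of GraphiQL. The following is a minimal sketch using only the Python standard library: it builds a standard GraphQL-over-HTTP POST request carrying a generic introspection query, which does not depend on the data set. The helper names `build_request` and `execute` are hypothetical, and the URL assumes the host and port shown above.

```python
import json
import urllib.request

URL = "http://localhost:8000/graphql"  # endpoint from the usage note above

# Standard GraphQL introspection: lists the fields on the root query type.
QUERY = "{ __schema { queryType { fields { name } } } }"


def build_request(query: str, url: str = URL) -> urllib.request.Request:
    """Build a JSON POST request carrying a GraphQL query (hypothetical helper)."""
    body = json.dumps({"query": query}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def execute(query: str, url: str = URL) -> dict:
    """Send the query and decode the JSON response; the service must be running."""
    with urllib.request.urlopen(build_request(query, url)) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only builds and prints the request; call execute(QUERY) against a live service.
    request = build_request(QUERY)
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Calling `execute(QUERY)` against a running service should return the names of the root fields derived from the parquet schema.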

% python3 -m graphique.schema ...

outputs the GraphQL schema for a parquet data set.

Configuration

Graphique uses Starlette's config, which reads settings from environment variables or a .env file. Config variables are used as input to ParquetDataset.

COLUMNS = []: names of columns to read at startup; * indicates all
DEBUG = False: run service in debug mode, which includes timing
DICTIONARIES = None: names of columns to read as dictionaries
FILTERS = None: predicates for which rows to read
INDEX = []: names of columns which represent a sorted composite index or partition keys
MMAP = False: use a memory map to read the files
PARQUET_PATH: path to the parquet directory or file
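One way to picture how these variables combine with their defaults is the sketch below. It does not use Starlette; it reads from a plain mapping such as `os.environ`. It assumes that list-valued settings are written as comma-separated strings and that booleans use common truthy spellings. Those formats, and the `load_settings` helper itself, are assumptions for illustration rather than graphique's documented behavior. FILTERS is left out because its value format is not described here.

```python
import os
from typing import Mapping

# Defaults copied from the list above; PARQUET_PATH has no default.
DEFAULTS = {"COLUMNS": [], "DEBUG": False, "DICTIONARIES": None, "INDEX": [], "MMAP": False}


def split_names(value: str) -> list:
    """Parse 'a,b,c' into ['a', 'b', 'c'] (assumed format)."""
    return [name.strip() for name in value.split(",") if name.strip()]


def parse_bool(value: str) -> bool:
    """Accept common truthy spellings (assumed, not graphique's rule)."""
    return value.strip().lower() in {"1", "true", "yes"}


def load_settings(environ: Mapping[str, str] = os.environ) -> dict:
    """Overlay environment values on the defaults (hypothetical helper)."""
    settings = dict(DEFAULTS)
    settings["PARQUET_PATH"] = environ["PARQUET_PATH"]  # required, so KeyError if unset
    for key in ("COLUMNS", "DICTIONARIES", "INDEX"):
        if key in environ:
            settings[key] = split_names(environ[key])
    for key in ("DEBUG", "MMAP"):
        if key in environ:
            settings[key] = parse_bool(environ[key])
    return settings
```

Passing an explicit mapping instead of `os.environ` keeps the sketch easy to try without touching the real environment.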

API

types

Table: an arrow Table; the primary interface.
Column: an arrow Column (a.k.a. ChunkedArray). Each arrow data type has a corresponding column implementation: Boolean, Int, Long, Float, Decimal, Date, DateTime, Time, Duration, Binary, String, List, Struct. All columns have a values field for their list of scalars. Additional fields vary by type.
Row: scalar fields. Arrow tables are column-oriented, and graphique encourages that usage for performance. A single row field is provided for convenience, but a field for a list of rows is not. Requesting parallel columns is far more efficient.
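The difference between row access and parallel columns can be illustrated with a toy column-oriented table in plain Python. This is only a sketch of the data model, not graphique's or arrow's implementation; the column names, values, and helper functions are made up for illustration.

```python
# A toy column-oriented table: one list of scalars per column name.
# The column names and values are made up for illustration.
table = {
    "zipcode": [501, 544, 1001],
    "state": ["NY", "NY", "MA"],
}


def row(table: dict, index: int) -> dict:
    """Build a single row by picking one scalar from every column."""
    return {name: values[index] for name, values in table.items()}


def parallel_columns(table: dict, names: list) -> dict:
    """Return only the requested columns; no per-row objects are created."""
    return {name: table[name] for name in names}


single = row(table, 0)
selected = parallel_columns(table, ["state"])
# Rows can still be rebuilt client-side by zipping the parallel columns.
pairs = list(zip(table["zipcode"], table["state"]))
```

Building a list of rows would repeat the work in `row` for every index, whereas `parallel_columns` hands back the existing column lists unchanged.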

selection

slice: contiguous selection of rows
search: binary search if the table is sorted, i.e., provides an index
filter: select rows from predicate functions
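The three selection fields can be sketched on a toy table in plain Python. This is a conceptual illustration under stated assumptions, not graphique's implementation: the function names, parameters (`offset`, `length`, `predicate`), and data are hypothetical.

```python
from bisect import bisect_left, bisect_right

# Toy table sorted by "state"; names and values are illustrative only.
table = {
    "state": ["CA", "CA", "NY", "NY", "NY", "TX"],
    "population": [10, 40, 25, 5, 30, 15],
}


def slice_rows(table, offset=0, length=None):
    """Contiguous selection: the same slice is taken from every column."""
    stop = None if length is None else offset + length
    return {name: values[offset:stop] for name, values in table.items()}


def search(table, column, value):
    """Binary search on a sorted column, returning the matching contiguous rows."""
    start = bisect_left(table[column], value)
    stop = bisect_right(table[column], value)
    return slice_rows(table, start, stop - start)


def filter_rows(table, column, predicate):
    """Linear scan: keep rows whose value in `column` satisfies the predicate."""
    keep = [i for i, value in enumerate(table[column]) if predicate(value)]
    return {name: [values[i] for i in keep] for name, values in table.items()}
```

Here `search` only works because the "state" column is sorted, while `filter_rows` makes no such assumption and scans every value.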

projection

columns: provides a field for every Column in the schema
column: access a column of any type by name
row: provides a field for each scalar of a single row
apply: transform columns by applying a function
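As a rough sketch of what projecting and transforming columns means, the snippet below looks up a column by name and applies a function to it on a toy table. The helper signatures, including the `alias` parameter, are hypothetical and are not taken from graphique's schema.

```python
# Toy table; the column names and values are illustrative only.
table = {"city": ["Boston", "Albany"], "latitude": [42.36, 42.65]}


def column(table, name):
    """Access one column of any type by name."""
    return table[name]


def apply(table, name, function, alias=None):
    """Return a new table with `function` applied to every scalar of one column.

    The result replaces the column, or is stored under `alias` if given.
    """
    result = dict(table)  # shallow copy: untouched columns are shared, not copied
    result[alias or name] = [function(value) for value in table[name]]
    return result


upper = apply(table, "city", str.upper)
rounded = apply(table, "latitude", round, alias="latitude_rounded")
```

The original table is left unchanged, so several transformed views can be derived from the same columns.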

aggregation

group: group by given columns, transforming the others into list columns
partition: partition on adjacent values in given columns, transforming the others into list columns
aggregate: apply reduce functions to list columns
tables: return a list of tables by splitting on the scalars in list columns
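The distinction between group and partition, and the role of aggregate, can be shown on a toy table in plain Python. This sketch follows the descriptions above but is not graphique's implementation; the helper names and data are hypothetical.

```python
from itertools import groupby

# Toy table; names and values are illustrative only.
table = {"state": ["NY", "MA", "NY", "NY"], "population": [5, 7, 3, 4]}


def group(table, key):
    """Group by `key` across the whole table; other columns become list columns."""
    groups = {}
    for index, value in enumerate(table[key]):
        groups.setdefault(value, []).append(index)
    return _collect(table, key, groups.items())


def partition(table, key):
    """Split only where adjacent `key` values differ; separate runs are not merged."""
    runs = groupby(range(len(table[key])), key=table[key].__getitem__)
    return _collect(table, key, ((value, list(indices)) for value, indices in runs))


def _collect(table, key, groups):
    """Build the result: one scalar per group for `key`, lists for the rest."""
    result = {name: [] for name in table}
    for value, indices in groups:
        for name in table:
            values = table[name]
            result[name].append(value if name == key else [values[i] for i in indices])
    return result


def aggregate(table, name, reduce_function):
    """Apply a reduce function (e.g. sum) to each list in a list column."""
    return {**table, name: [reduce_function(values) for values in table[name]]}
```

For example, `aggregate(group(table, "state"), "population", sum)` reduces each group's list to a single total, while `partition` keeps the two separate runs of "NY" apart because they are not adjacent.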

ordering

sort: sort table by given columns
min: select rows with smallest values
max: select rows with largest values
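A plain-Python sketch of the ordering fields follows. It assumes that min and max keep every row tied for the extreme value; that reading, along with the helper names and data, is an assumption for illustration rather than a statement of graphique's behavior.

```python
# Toy table; names and values are illustrative only.
table = {"city": ["A", "B", "C", "D"], "population": [30, 10, 20, 10]}


def take(table, indices):
    """Select rows by position from every column."""
    return {name: [values[i] for i in indices] for name, values in table.items()}


def sort(table, *names, reverse=False):
    """Sort rows by the given columns, compared left to right (stable)."""
    def key(i):
        return tuple(table[name][i] for name in names)

    return take(table, sorted(range(len(table[names[0]])), key=key, reverse=reverse))


def select_min(table, name):
    """Keep every row whose value equals the column minimum (ties included)."""
    smallest = min(table[name])
    return take(table, [i for i, value in enumerate(table[name]) if value == smallest])


def select_max(table, name):
    """Keep every row whose value equals the column maximum (ties included)."""
    largest = max(table[name])
    return take(table, [i for i, value in enumerate(table[name]) if value == largest])
```

Selecting the minimum or maximum needs only a single pass over the column, so it avoids the cost of a full sort when only the extreme rows are wanted.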

Performance

Graphique relies on native pyarrow routines wherever possible. Otherwise it falls back to using NumPy, with zero-copy views. Graphique also has custom optimizations for grouping, dictionary-encoded arrays, and chunked arrays.

By default, datasets are read on-demand, with only the necessary columns selected. Additionally, filter(query: ...) is optimized to filter rows while reading the dataset. Although graphique is a long-running service, parquet is performant at reading a subset of data. Optionally specify COLUMNS to read a subset of columns (or *) at startup, trading memory for lower latency.

Specifying an INDEX with COLUMNS indicates the table is sorted, and enables the binary search field. Specifying just INDEX is allowed but only recommended if it corresponds to the partition keys; search(...) is functionally equivalent to filter(query: ...) without COLUMNS.
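To illustrate why a sorted composite index enables binary search, and why search and filter return the same rows, here is a small standard-library sketch. The `INDEX` tuple, column names, and helper functions are hypothetical, and the key list is materialized only for brevity; this is not how graphique or pyarrow implement the lookup.

```python
from bisect import bisect_left, bisect_right

# Rows sorted by the composite index (state, county); values are illustrative only.
INDEX = ("state", "county")
table = {
    "state": ["CA", "CA", "NY", "NY", "NY"],
    "county": ["Kern", "Yolo", "Erie", "Kings", "Kings"],
    "zipcode": [93301, 95616, 14201, 11201, 11203],
}


def search(table, **equals):
    """Binary search on a leading prefix of the sorted index columns."""
    names = INDEX[: len(equals)]
    if set(names) != set(equals):
        raise ValueError("search keys must be a leading prefix of the index")
    target = tuple(equals[name] for name in names)
    # Materialized for brevity; a real index would avoid this O(n) step.
    keys = list(zip(*(table[name] for name in names)))
    start, stop = bisect_left(keys, target), bisect_right(keys, target)
    return {name: values[start:stop] for name, values in table.items()}


def filter_equal(table, **equals):
    """Linear scan with the same predicate; works whether or not the table is sorted."""
    length = len(next(iter(table.values())))
    keep = [i for i in range(length) if all(table[n][i] == v for n, v in equals.items())]
    return {name: [values[i] for i in keep] for name, values in table.items()}
```

Both functions return identical rows for the same keys; the difference is that `search` relies on the sort order to locate a contiguous range, while `filter_equal` must examine every row.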

Installation

% pip install graphique[server]

Dependencies

pyarrow >=6
strawberry-graphql[asgi] >=0.84.4
uvicorn (or other ASGI server)
pytz (optional timestamp support)

Tests

100% branch coverage.

% pytest [--cov]

Changes

0.6

Pyarrow >=6 required
Group by optimized and replaced unique field
Dictionary related optimizations
Null consistency with arrow count functions

0.5

Pyarrow >=5 required
Stricter validation of inputs
Columns can be cast to another arrow data type
Grouping uses large list arrays with 64-bit counts
Datasets are read on-demand or optionally at startup

0.4

Pyarrow >=4 required
sort updated to use new native routines
partition tables by adjacent values and differences
filter supports unknown column types using tagged union pattern
Groups replaced with Table.tables and Table.aggregate fields
Tagged unions used for filter, apply, and partition functions

0.3

Pyarrow >=3 required
any and all fields
String column split field

0.2

ListColumn and StructColumn types
Groups type with aggregate field
group and unique optimized
Pyarrow >=2 required
Statistical fields: mode, stddev, variance
is_in, min, and max optimized

Release history

1.6: May 1, 2024
1.5: Jan 25, 2024
1.4: Nov 5, 2023
1.3: Aug 26, 2023
1.2: May 7, 2023
1.1: Jan 29, 2023
1.0: Oct 29, 2022
0.9: Aug 5, 2022
0.8: May 8, 2022
0.7: Feb 5, 2022
0.6: Oct 28, 2021 (this version)
0.5: Aug 8, 2021
0.4: May 16, 2021
0.3: Feb 1, 2021
0.2: Nov 26, 2020
0.1: Sep 5, 2020
0.0: Feb 29, 2020

Download files

Download the file for your platform.

Source Distribution

graphique-0.6.zip (139.5 kB)
Uploaded Oct 28, 2021 Source

Built Distributions

graphique-0.6-cp310-cp310-win_amd64.whl (96.2 kB)
Uploaded Oct 28, 2021 CPython 3.10 Windows x86-64

graphique-0.6-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (340.3 kB)
Uploaded Oct 28, 2021 CPython 3.10 manylinux: glibc 2.17+ x86-64 manylinux: glibc 2.24+ x86-64

graphique-0.6-cp310-cp310-macosx_10_14_x86_64.whl (96.3 kB)
Uploaded Oct 28, 2021 CPython 3.10 macOS 10.14+ x86-64

graphique-0.6-cp39-cp39-win_amd64.whl (96.2 kB)
Uploaded Oct 28, 2021 CPython 3.9 Windows x86-64

graphique-0.6-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (339.7 kB)
Uploaded Oct 28, 2021 CPython 3.9 manylinux: glibc 2.17+ x86-64 manylinux: glibc 2.24+ x86-64

graphique-0.6-cp39-cp39-macosx_10_14_x86_64.whl (96.3 kB)
Uploaded Oct 28, 2021 CPython 3.9 macOS 10.14+ x86-64

graphique-0.6-cp38-cp38-win_amd64.whl (96.3 kB)
Uploaded Oct 28, 2021 CPython 3.8 Windows x86-64

graphique-0.6-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (350.8 kB)
Uploaded Oct 28, 2021 CPython 3.8 manylinux: glibc 2.17+ x86-64 manylinux: glibc 2.24+ x86-64

graphique-0.6-cp38-cp38-macosx_10_14_x86_64.whl (94.1 kB)
Uploaded Oct 28, 2021 CPython 3.8 macOS 10.14+ x86-64

graphique-0.6-cp37-cp37m-win_amd64.whl (94.9 kB)
Uploaded Oct 28, 2021 CPython 3.7m Windows x86-64

graphique-0.6-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (327.9 kB)
Uploaded Oct 28, 2021 CPython 3.7m manylinux: glibc 2.17+ x86-64 manylinux: glibc 2.24+ x86-64

graphique-0.6-cp37-cp37m-macosx_10_14_x86_64.whl (94.6 kB)
Uploaded Oct 28, 2021 CPython 3.7m macOS 10.14+ x86-64

Hashes for graphique-0.6.zip
Algorithm	Hash digest
SHA256	`2aa0170f58d2c922a63de886db4aba394dc5cb006f45dd4f2c7d507c7fef21ec`
MD5	`31fe117b82ec1dddd0eec57e52ecd072`
BLAKE2b-256	`752959470998dbca8df01aeb34be9ff86c2c6c2de9f9d96f6583426848a4cd91`

Hashes for graphique-0.6-cp310-cp310-win_amd64.whl
Algorithm	Hash digest
SHA256	`d75d06b5505f61b843f42645725ab8954a9f0a3aeb5f7c6c8cf457a7685f6a8a`
MD5	`ce18cc6e7c50c700a32eaf8316a3ef60`
BLAKE2b-256	`3b230ed5091d5ddbb1aee8d96ebb8c632594ef699f2b89c6ca0e899faad9b2a7`

Hashes for graphique-0.6-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl
Algorithm	Hash digest
SHA256	`ac99206fb3d0d21900bbb83977224e98ab78cab399707fda221e1c7742630b3b`
MD5	`076d1e0feee9e01618008f969b1dd08b`
BLAKE2b-256	`3fc084951451f8a005f36b02876e51e65dc8d473bb9e11ae28a69274d6e36ff0`

Hashes for graphique-0.6-cp310-cp310-macosx_10_14_x86_64.whl
Algorithm	Hash digest
SHA256	`83dcb12169dfbad1f8f2cabb5ceeacf2f480a1236836bb5b9e48cab629059132`
MD5	`17eaceb09d2bed89be86dd8fbbe79480`
BLAKE2b-256	`927a83d345af02641f99827a77668d9b017dfb58793149f99d2d5e30d45f39fa`

Hashes for graphique-0.6-cp39-cp39-win_amd64.whl
Algorithm	Hash digest
SHA256	`de6adc6fd2e9b3f630cb5320494297cf93ec6af6c55c9d919eaec64edd32adef`
MD5	`2790eeae1a7cf3b994f24c817a1b96cd`
BLAKE2b-256	`c4e2ca18156e756d488859af782cabdf92cab153f29a81acfeec5e87186c68af`

Hashes for graphique-0.6-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl
Algorithm	Hash digest
SHA256	`c85cb97c68849d50f70ee911d34d58c5db8a20acebeae68f8ff7d304b7b65fd4`
MD5	`5a39c685be8cf78f51415ce7e8efb203`
BLAKE2b-256	`bbd1c424c2d268baa4810cb57541d2243be26f7a345214de6b137554c49ef58f`

Hashes for graphique-0.6-cp39-cp39-macosx_10_14_x86_64.whl
Algorithm	Hash digest
SHA256	`c11a4dd389c65ff24e4c38ff221fd134a0e177a3380424dbe49dfac51958e205`
MD5	`e7a233634652f634048ffd1860633bdf`
BLAKE2b-256	`f176b89cb3ee834dd017d602e547fbaff27c8f13406d9ea4edbd3674e01f50a3`

Hashes for graphique-0.6-cp38-cp38-win_amd64.whl
Algorithm	Hash digest
SHA256	`903c9c8bf179bbc172adbaeb88ab68ca44ccb1f7ec6a9352b05c7352f5c238ea`
MD5	`902eecdd6581967e0d988f52b1de7423`
BLAKE2b-256	`7a639149d2a64b5ec09c41eb9e5cf1e38e1fc366a180d9091d795bb359d59d08`

Hashes for graphique-0.6-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl
Algorithm	Hash digest
SHA256	`ffcea566b02d9d9b6f002e102cb0a752d4e394addef31c42a54837f822cb14ea`
MD5	`ed3c0490906822adb7db786bf1f8665f`
BLAKE2b-256	`d622988a4355bf7f4d0867f2ad84b3c308bc7816240f9bd90b18001534a9e1eb`

Hashes for graphique-0.6-cp38-cp38-macosx_10_14_x86_64.whl
Algorithm	Hash digest
SHA256	`54affbf63686cbb7fdd5e6f6ff3e8769e03096df403b8442ed0237019ac263b2`
MD5	`15d4461a8333d08e2091fb489af74062`
BLAKE2b-256	`e1dc6e4e687521676371899e5c151e8fff2f26193eecd883741434760542167f`

Hashes for graphique-0.6-cp37-cp37m-win_amd64.whl
Algorithm	Hash digest
SHA256	`a038842c8d059f47b6d6c08ba524735d32d187507d6831ca4802ce39e3f71b72`
MD5	`c6f5d40a5a7270fd337651b059137d5d`
BLAKE2b-256	`86b84d3abcb88004715219a0da41dce4d4d0f8e6654a764eb1027edddbeb3cf6`

Hashes for graphique-0.6-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl
Algorithm	Hash digest
SHA256	`92ffea4a0edf4b7130e6e8b55a6af1ff897bd0ebcf0e4307947432377abc7c79`
MD5	`e544bf599f04687e1040992af79b4d8b`
BLAKE2b-256	`d883f37dffe478383b42f09095877fb4d6b2c914a98ecb633ece5b592e681bfd`

Hashes for graphique-0.6-cp37-cp37m-macosx_10_14_x86_64.whl
Algorithm	Hash digest
SHA256	`8cfcc2d03f41f4b822f68e009ed3520671bcd037c982ee2509be6729b81a0179`
MD5	`b6662f791cb2fc2c69665ae57b199ea4`
BLAKE2b-256	`2bc48652eb9e3a97dd7123d0478f814dc04331f01e207c7376a551f6bd446531`