GraphQL service for arrow tables and parquet data sets.
The schema for a query API is derived automatically.
Usage
% env PARQUET_PATH=... uvicorn graphique.service:app
Open http://localhost:8000/graphql to try out the API in GraphiQL. There is a test fixture at ./tests/fixtures/zipcodes.parquet.
% python3 -m graphique.schema ...
outputs the GraphQL schema for a parquet data set.
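Besides GraphiQL, the service can be queried from any HTTP client. The following is a minimal sketch using only the Python standard library, not part of graphique itself. It assumes the default address from the uvicorn command above; the column name `state` and the `length` argument of `slice` are illustrative assumptions, so check the generated schema for the actual fields.

```python
import json
import urllib.request

# Assumed default address from the uvicorn command above.
GRAPHQL_URL = "http://localhost:8000/graphql"

# Hypothetical query: `state` is assumed to be a column in the data set,
# and the `length` argument of `slice` is an assumption about the schema.
QUERY = """
{
  slice(length: 3) {
    columns {
      state { values }
    }
  }
}
"""


def build_request(query, url=GRAPHQL_URL):
    """Build a POST request carrying a GraphQL query as a JSON body."""
    body = json.dumps({"query": query}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def execute(query, url=GRAPHQL_URL):
    """Send the query to a running service and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(query, url)) as response:
        return json.load(response)
```

With the service running, `execute(QUERY)` should return a dictionary with the usual GraphQL `data` (and possibly `errors`) keys.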
Configuration
Graphique uses Starlette's config: values are read from environment variables or a .env file. Config variables are used as input to ParquetDataset.
- COLUMNS = []: names of columns to read at startup; * indicates all
- DEBUG = False: run service in debug mode, which includes timing
- DICTIONARIES = None: names of columns to read as dictionaries
- FILTERS = None: predicates for which rows to read
- INDEX = []: names of columns which represent a sorted composite index or partition keys
- MMAP = False: use a memory map to read the files
- PARQUET_PATH: path to the parquet directory or file
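As an illustration of the variables above, here is a small sketch that renders settings as .env lines. It assumes list-valued settings are written as comma-separated strings (the format Starlette's CommaSeparatedStrings parses) and booleans as true/false; `render_env` is a hypothetical helper, and `zipcode` is an assumed column name.

```python
def render_env(settings):
    """Render a mapping of config names to values as .env file content."""
    lines = []
    for name, value in settings.items():
        if isinstance(value, bool):
            text = "true" if value else "false"
        elif isinstance(value, (list, tuple)):
            # Assumption: lists are parsed from comma-separated strings.
            text = ",".join(value)
        else:
            text = str(value)
        lines.append(f"{name}={text}")
    return "\n".join(lines) + "\n"


settings = {
    "PARQUET_PATH": "tests/fixtures/zipcodes.parquet",
    "COLUMNS": ["*"],       # read all columns at startup
    "INDEX": ["zipcode"],   # hypothetical sorted column
    "MMAP": True,
}
env_text = render_env(settings)
```

Writing `env_text` to a .env file in the working directory should be equivalent to exporting the same names as environment variables.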
API
types
- Table: an arrow Table; the primary interface.
- Column: an arrow Column (a.k.a. ChunkedArray). Each arrow data type has a corresponding column implementation: Boolean, Int, Long, Float, Decimal, Date, DateTime, Time, Duration, Binary, String, List, Struct. All columns have a values field for their list of scalars. Additional fields vary by type.
- Row: scalar fields. Arrow tables are column-oriented, and graphique encourages that usage for performance. A single row field is provided for convenience, but a field for a list of rows is not. Requesting parallel columns is far more efficient.
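If a client does need rows, parallel columns can be zipped together after the response arrives. This is a sketch of that client-side step; the response shape (a mapping of column names to objects with a values list) and the sample data are assumptions for illustration.

```python
def columns_to_rows(columns):
    """Zip parallel column value lists into a list of row dictionaries."""
    names = list(columns)
    value_lists = [columns[name]["values"] for name in names]
    return [dict(zip(names, values)) for values in zip(*value_lists)]


# Hypothetical response fragment for a query requesting two parallel columns.
response_columns = {
    "state": {"values": ["CA", "CA", "NY"]},
    "county": {"values": ["Alameda", "Marin", "Kings"]},
}
rows = columns_to_rows(response_columns)
```

Keeping the conversion on the client leaves the service free to return each column as a single list.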
selection
- slice: contiguous selection of rows
- search: binary search if the table is sorted, i.e., provides an index
- filter: select rows from predicate functions
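The difference between search and filter can be illustrated in plain Python. This is a conceptual sketch using the standard bisect module, not graphique's implementation; the sample values are made up.

```python
from bisect import bisect_left, bisect_right


def search_equal(sorted_values, key):
    """Return the (start, stop) slice of rows equal to key via binary search."""
    return bisect_left(sorted_values, key), bisect_right(sorted_values, key)


def filter_equal(values, key):
    """Return the indices of rows equal to key by scanning every value."""
    return [index for index, value in enumerate(values) if value == key]


zipcodes = [501, 544, 601, 601, 602, 603]  # made-up sorted index column
start, stop = search_equal(zipcodes, 601)
```

Both approaches select the same rows; the binary search only requires that the column be sorted, and in return it avoids evaluating the predicate on every row.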
projection
- columns: provides a field for every Column in the schema
- column: access a column of any type by name
- row: provides a field for each scalar of a single row
- apply: transform columns by applying a function
aggregation
- group: group by given columns, transforming the others into list columns
- partition: partition on adjacent values in given columns, transforming the others into list columns
- aggregate: apply reduce functions to list columns
- tables: return a list of tables by splitting on the scalars in list columns
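To make the distinction between group and partition concrete, here is a conceptual sketch in plain Python. It is not graphique's implementation, and the sample data is made up: grouping collects values by distinct key, while partitioning only merges adjacent equal keys.

```python
from itertools import groupby


def group(keys, values):
    """Collect values for each distinct key, regardless of position."""
    groups = {}
    for key, value in zip(keys, values):
        groups.setdefault(key, []).append(value)
    return groups


def partition(keys, values):
    """Collect values for each run of adjacent equal keys."""
    pairs = zip(keys, values)
    return [
        (key, [value for _, value in run])
        for key, run in groupby(pairs, key=lambda pair: pair[0])
    ]


states = ["CA", "CA", "NY", "CA"]
counts = [1, 2, 3, 4]

grouped = group(states, counts)
partitioned = partition(states, counts)
# Aggregation then reduces each list to a scalar, e.g. a sum per group.
totals = {key: sum(items) for key, items in grouped.items()}
```

On data already sorted by the key, the two produce the same lists, which is why partitioning is a natural fit for indexed tables.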
ordering
- sort: sort table by given columns
- min: select rows with smallest values
- max: select rows with largest values
Performance
Graphique relies on native pyarrow routines wherever possible. Otherwise it falls back to using NumPy, with zero-copy views. Graphique also has custom optimizations for grouping, dictionary-encoded arrays, and chunked arrays.
By default, datasets are read on-demand, with only the necessary columns selected. Additionally filter(query: ...) is optimized to filter rows while reading the dataset. Although graphique is a running service, parquet is performant at reading a subset of data. Optionally specify COLUMNS to read a subset of columns (or *) at startup, trading off memory for latency.
Specifying an INDEX with COLUMNS indicates the table is sorted, and enables the binary search field. Specifying just INDEX is allowed but only recommended if it corresponds to the partition keys; search(...) is functionally equivalent to filter(query: ...) without COLUMNS.
Installation
% pip install graphique[server]
Dependencies
- pyarrow >=6
- strawberry-graphql[asgi] >=0.84.4
- uvicorn (or other ASGI server)
- pytz (optional timestamp support)
Tests
100% branch coverage.
% pytest [--cov]
Changes
0.6
- Pyarrow >=6 required
- Group by optimized and replaced unique field
- Dictionary related optimizations
- Null consistency with arrow count functions
0.5
- Pyarrow >=5 required
- Stricter validation of inputs
- Columns can be cast to another arrow data type
- Grouping uses large list arrays with 64-bit counts
- Datasets are read on-demand or optionally at startup
0.4
- Pyarrow >=4 required
- sort updated to use new native routines
- partition tables by adjacent values and differences
- filter supports unknown column types using tagged union pattern
- Groups replaced with Table.tables and Table.aggregate fields
- Tagged unions used for filter, apply, and partition functions
0.3
- Pyarrow >=3 required
- any and all fields
- String column split field
0.2
- ListColumn and StructColumn types
- Groups type with aggregate field
- group and unique optimized
- pyarrow >= 2 required
- Statistical fields: mode, stddev, variance
- is_in, min, and max optimized