GraphQL service for arrow tables and parquet data sets. The schema for a query API is derived automatically.
Usage
% env PARQUET_PATH=... uvicorn graphique.service:app
Open http://localhost:8000/graphql to try out the API in GraphiQL. There is a test fixture at ./tests/fixtures/zipcodes.parquet.
% python3 -m graphique.schema ...
outputs the GraphQL schema for a parquet data set.
Configuration
Graphique uses Starlette's config, which reads settings from environment variables or a .env file. Config variables are used as input to ParquetDataset.
- COLUMNS = None
- DEBUG = False
- DICTIONARIES = None
- INDEX = None
- MMAP = True
- PARQUET_PATH
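The variables and defaults listed above can be sketched with the standard library alone. This is an illustration, not Graphique's or Starlette's actual parsing: the helper name read_settings, the comma-separated format for list-valued settings, and the accepted boolean spellings are all assumptions, and data.parquet and zipcode are placeholder values.

```python
import os


def read_settings(environ=os.environ):
    """Collect the documented variables, applying the documented defaults."""

    def names(key):
        # Assumption: list-valued settings are comma-separated column names.
        raw = environ.get(key)
        if raw is None:
            return None
        return [name.strip() for name in raw.split(",") if name.strip()]

    def flag(key, default):
        # Assumption: "true" and "1" mean True; any other value means False.
        raw = environ.get(key)
        if raw is None:
            return default
        return raw.strip().lower() in ("true", "1")

    return {
        "COLUMNS": names("COLUMNS"),
        "DEBUG": flag("DEBUG", False),
        "DICTIONARIES": names("DICTIONARIES"),
        "INDEX": names("INDEX"),
        "MMAP": flag("MMAP", True),
        "PARQUET_PATH": environ["PARQUET_PATH"],  # no default: must be set
    }


# Placeholder values standing in for a real environment or .env file.
settings = read_settings(
    {"PARQUET_PATH": "data.parquet", "INDEX": "zipcode", "MMAP": "false"}
)
```

Unset variables fall back to the defaults in the list, and a missing PARQUET_PATH raises a KeyError because it has no default.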
API
types
- Table: an arrow Table; the primary interface.
- Column: an arrow Column (a.k.a. ChunkedArray). Each arrow data type has a corresponding column implementation: Boolean, Int, Long, Float, Decimal, Date, DateTime, Time, Duration, Binary, String, List, Struct. All columns have a values field for their list of scalars. Additional fields vary by type.
- Row: scalar fields. Arrow tables are column-oriented, and graphique encourages that usage for performance. A single row field is provided for convenience, but a field for a list of rows is not. Requesting parallel columns is far more efficient.
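As a rough sketch of requesting parallel columns, the helper below builds a query that asks for the values of several columns at once and posts it to the service. It assumes the table's fields are exposed at the root of the query and that the server accepts a standard JSON POST; the helper names and the column names state and city are hypothetical and depend on the data set.

```python
import json
from urllib import request


def columns_query(*names):
    """Build a query selecting the `values` field of each named column."""
    fields = " ".join(f"{name} {{ values }}" for name in names)
    return f"{{ columns {{ {fields} }} }}"


def execute(query, url="http://localhost:8000/graphql"):
    """POST a query to a running service and decode the JSON response."""
    body = json.dumps({"query": query}).encode()
    req = request.Request(url, data=body, headers={"Content-Type": "application/json"})
    with request.urlopen(req) as response:
        return json.load(response)


# Hypothetical column names; substitute names from the derived schema.
query = columns_query("state", "city")
```

Calling execute(query) requires the service from the Usage section to be running. The response would then hold one list of scalars per requested column rather than one object per row.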
selection
- slice: contiguous selection of rows
- search: binary search if the table is sorted, i.e., provides an index
- filter: select rows from predicate functions
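To illustrate why a sorted table allows search to work by binary search, the sketch below finds the contiguous run of rows equal to a value in a sorted list. It shows the general idea using Python's bisect module, not Graphique's implementation; the function name and sample values are made up.

```python
from bisect import bisect_left, bisect_right


def search_equal(sorted_values, value):
    """Return (start, stop) bounds of the rows equal to `value`."""
    return bisect_left(sorted_values, value), bisect_right(sorted_values, value)


# Made-up sorted index column.
zipcodes = [501, 544, 601, 601, 602, 603]
start, stop = search_equal(zipcodes, 601)
matches = zipcodes[start:stop]
```

Because the matching rows are adjacent in a sorted column, the result is a slice, found in logarithmic time instead of by scanning every row.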
projection
- columns: provides a field for every Column in the schema
- column: access a column of any type by name
- row: provides a field for each scalar of a single row
- apply: transform columns by applying a function
aggregation
- group: group by given columns, transforming the others into list columns
- unique: group by given columns, only retaining one scalar per group
- partition: partition on adjacent values in given columns, transforming the others into list columns
- aggregate: apply reduce functions to list columns
- tables: return a list of tables by splitting on the scalars in list columns
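The difference between group and partition can be sketched on plain Python lists. This only illustrates the semantics described above, not how Graphique computes them; the function names and sample data are hypothetical.

```python
from itertools import groupby


def group(keys, values):
    """Collect values for each distinct key, wherever the key occurs."""
    groups = {}
    for key, value in zip(keys, values):
        groups.setdefault(key, []).append(value)
    return groups


def partition(keys, values):
    """Collect values for each run of adjacent equal keys."""
    pairs = zip(keys, values)
    return [
        (key, [value for _, value in run])
        for key, run in groupby(pairs, key=lambda pair: pair[0])
    ]


states = ["CA", "CA", "NY", "CA"]
counts = [1, 2, 3, 4]

grouped = group(states, counts)
partitioned = partition(states, counts)
# Aggregation then reduces each list to a scalar, e.g. with sum.
totals = {state: sum(items) for state, items in grouped.items()}
```

Grouping merges both runs of "CA" into one list, while partitioning keeps them separate because they are not adjacent. On data already sorted by the key, the two produce the same groups.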
ordering
- sort: sort table by given columns
- min: select rows with smallest values
- max: select rows with largest values
Performance
Graphique relies on native pyarrow routines wherever possible. Otherwise it falls back to using NumPy, with zero-copy views. Graphique also has custom optimizations for grouping, dictionary-encoded arrays, and chunked arrays.
Specifying an INDEX of columns indicates the table is sorted, and enables the binary search field.
Installation
% pip install graphique
Dependencies
- pyarrow >=4
- strawberry-graphql >=0.54
- uvicorn (or other ASGI server)
- pytz (optional timestamp support)
Tests
100% branch coverage.
% pytest [--cov]
Changes
0.4
- Pyarrow >=4 required
- sort updated to use new native routines
- partition tables by adjacent values and differences
- filter supports unknown column types using tagged union pattern
- Groups replaced with Table.tables and Table.aggregate fields
- Tagged unions used for filter, apply, and partition functions
0.3
- Pyarrow >=3 required
- any and all fields
- String column split field
0.2
- ListColumn and StructColumn types
- Groups type with aggregate field
- group and unique optimized
- pyarrow >= 2 required
- Statistical fields: mode, stddev, variance
- is_in, min, and max optimized