GraphQL service for arrow tables and parquet data sets.

GraphQL service for arrow tables and parquet data sets. The schema for a query API is derived automatically.

Usage

% env PARQUET_PATH=... uvicorn graphique.service:app

Open http://localhost:8000/ to try out the API in GraphiQL. There is a test fixture at ./tests/fixtures/zipcodes.parquet.
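Once the service is running, it can also be queried programmatically. The following is a minimal sketch using only the standard library. It assumes the GraphQL endpoint accepts POST requests at the same URL that serves GraphiQL, and it sends a standard introspection query rather than any graphique-specific field. The helper names `build_request` and `post_graphql` are hypothetical.

```python
import json
import urllib.request

# Assumption: the endpoint accepts POSTs at the URL that serves GraphiQL.
URL = "http://localhost:8000/"

# Standard GraphQL introspection: asks for the name of the root query type.
INTROSPECTION = "{ __schema { queryType { name } } }"


def build_request(query: str, url: str = URL) -> urllib.request.Request:
    """Build a JSON POST request carrying a GraphQL query."""
    body = json.dumps({"query": query}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def post_graphql(query: str, url: str = URL) -> dict:
    """Send the query and return the decoded JSON response (needs a running service)."""
    with urllib.request.urlopen(build_request(query, url)) as response:
        return json.load(response)
```

With the service running, `post_graphql(INTROSPECTION)` should return a JSON object whose `data` entry names the root query type.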

% env PARQUET_PATH=... strawberry export-schema graphique.service:app.schema

This outputs the GraphQL schema for a parquet data set.

Configuration

Graphique uses Starlette's config, which reads settings from environment variables or a .env file. The config variables are used as input to a parquet dataset.

  • PARQUET_PATH: path to the parquet directory or file
  • FEDERATED = '': field name to extend type Query with a federated Table
  • DEBUG = False: run service in debug mode, which includes metrics
  • COLUMNS = None: list of names, or mapping of aliases, of columns to select
  • FILTERS = None: json filter query for which rows to read at startup
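As an illustration of how these variables might be set, the sketch below assembles an environment for the service. It assumes that COLUMNS and FILTERS are passed as JSON-encoded strings (the list above only states that FILTERS is json). The helper `service_env` and the column names are hypothetical.

```python
import json
import os


def service_env(parquet_path, columns=None, filters=None, debug=False):
    """Return a copy of the environment with graphique's config variables set.

    Assumption: COLUMNS and FILTERS are JSON-encoded strings.
    """
    env = dict(os.environ)
    env["PARQUET_PATH"] = parquet_path
    # Starlette's boolean cast accepts "true"/"false" or "1"/"0".
    env["DEBUG"] = "true" if debug else "false"
    if columns is not None:
        env["COLUMNS"] = json.dumps(columns)
    if filters is not None:
        env["FILTERS"] = json.dumps(filters)
    return env


# Hypothetical column names; an empty filter reads the whole table at startup.
env = service_env("./tests/fixtures/zipcodes.parquet", columns=["state", "city"], filters={})
```

The resulting mapping could be passed as `env=` to `subprocess.run(["uvicorn", "graphique.service:app"], env=env)`.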

For more options create a custom ASGI app. Call graphique's GraphQL on an arrow Dataset, Scanner, or Table. The GraphQL Table type will be the root Query type.

Supply a mapping of names to datasets for multiple roots, and to enable federation.

import pyarrow.dataset as ds
from graphique import GraphQL

source = ds.dataset(...)
app = GraphQL(source)  # Table is root query type
app = GraphQL.federated({<name>: source, ...}, keys={<name>: [], ...})  # Tables on federated fields

Start like any ASGI app.

uvicorn <module>:app

Configuration options exist to provide a convenient no-code solution, but are subject to change in the future. Using a custom app is recommended for production usage.

API

types

  • Dataset: interface for an arrow dataset, scanner, or table.
  • Table: implements the Dataset interface. Adds typed row, columns, and filter fields from introspecting the schema.
  • Column: interface for an arrow column (a.k.a. ChunkedArray). Each arrow data type has a corresponding column implementation: Boolean, Int, Long, Float, Decimal, Date, Datetime, Time, Duration, Base64, String, List, Struct. All columns have a values field for their list of scalars. Additional fields vary by type.
  • Row: scalar fields. Arrow tables are column-oriented, and graphique encourages that usage for performance. A single row field is provided for convenience, but a field for a list of rows is not. Requesting parallel columns is far more efficient.

selection

  • slice: contiguous selection of rows
  • filter: select rows with simple predicates
  • scan: select rows and project columns with expressions

projection

  • columns: provides a field for every Column in the schema
  • column: access a column of any type by name
  • row: provides a field for each scalar of a single row
  • apply: transform columns by applying a function
  • join: join tables by key columns

aggregation

  • group: group by given columns, and aggregate the others
  • runs: partition on adjacent values in given columns, transforming the others into list columns
  • tables: return a list of tables by splitting on the scalars in list columns
  • flatten: flatten list columns with repeated scalars
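The difference between group and runs is easiest to see on a small example. The following plain-Python sketch only mirrors the semantics described above on made-up rows; it is not how graphique computes them.

```python
from collections import defaultdict
from itertools import groupby

# Hypothetical rows of (state, city), in table order.
rows = [("CA", "Fresno"), ("NY", "Albany"), ("NY", "Buffalo"), ("CA", "Oakland")]

# group: every row with the same key lands in one group, regardless of position.
groups = defaultdict(list)
for state, city in rows:
    groups[state].append(city)

# runs: only adjacent rows with equal keys are combined, so "CA" appears twice.
runs = [(state, [city for _, city in run]) for state, run in groupby(rows, key=lambda r: r[0])]
```

Partitioning into runs needs only a single pass over adjacent values, which is why it suits data that is already sorted by the key.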

ordering

  • sort: sort table by given columns
  • rank: select rows with smallest or largest values

Performance

Graphique relies on native PyArrow routines wherever possible. Otherwise it falls back to using NumPy or custom optimizations.

By default, datasets are read on-demand, with only the necessary rows and columns scanned. Even though graphique runs as a long-lived service, this is practical because parquet is efficient at reading a subset of data. Optionally specify FILTERS in the json filter format to read a subset of rows at startup, trading memory for latency. An empty filter ({}) will read the whole table.

Specifying COLUMNS limits memory usage when rows are read at startup (that is, when FILTERS is set). It makes little difference to speed, since unused columns are never read anyway. Optional aliasing can also be used for camel casing.

If index columns are detected in the schema metadata, then an initial filter will also attempt a binary search on tables.

Installation

% pip install graphique[server]

Dependencies

  • pyarrow
  • strawberry-graphql[asgi,cli]
  • numpy
  • isodate
  • uvicorn (or other ASGI server)

Tests

100% branch coverage.

% pytest [--cov]
