GraphQL service for python dataframes and parquet datasets.

GraphQL service for ibis dataframes, arrow tables, and parquet datasets. The schema for a query API is derived automatically.

Version 2

When this project started, there was no out-of-core execution engine with performance comparable to PyArrow. So graphique effectively included one, built on PyArrow datasets and Acero.

Since then, the ecosystem has grown considerably, with DuckDB, DataFusion, and Ibis. As of version 2, graphique is based on ibis, which provides a common dataframe API for multiple backends. This also gives graphique a default but configurable backend.

As a major version upgrade, version 2 includes incompatible changes from version 1. However, the overall API remains largely the same.

Usage

There is an example app which reads a parquet dataset.

env PARQUET_PATH=... uvicorn graphique.service:app

Open http://localhost:8000/ to try out the API in GraphiQL. There is a test fixture at ./tests/fixtures/zipcodes.parquet.

env PARQUET_PATH=... strawberry export-schema graphique.service:app.schema

This command outputs the GraphQL schema.
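Besides GraphiQL, the running service can be queried from code. The following is a minimal sketch using only the standard library, assuming the example app is listening on http://localhost:8000/ and accepts a standard GraphQL-over-HTTP JSON POST. The helper names `build_request` and `execute` are hypothetical and not part of graphique.

```python
import json
import urllib.request

# Assumption: the example app is running locally on uvicorn's default port.
SERVICE_URL = "http://localhost:8000/"


def build_request(query: str, url: str = SERVICE_URL) -> urllib.request.Request:
    """Build a GraphQL-over-HTTP POST request carrying `query` as JSON."""
    body = json.dumps({"query": query}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def execute(query: str, url: str = SERVICE_URL) -> dict:
    """Send the query and return the decoded JSON response."""
    with urllib.request.urlopen(build_request(query, url)) as response:
        return json.load(response)


# Usage, with the service running (`count` is the row-count field):
# print(execute("{ count }"))
```

Building the request separately from sending it keeps the payload easy to inspect without a live server.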

Configuration

The example app uses Starlette's config, which reads settings from environment variables or a .env file.

  • PARQUET_PATH: path to the parquet directory or file
  • FEDERATED = '': field name to extend type Query with a federated Table
  • METRICS = False: include timings from apollo tracing extension
  • COLUMNS = None: list of names, or mapping of aliases, of columns to select
  • FILTERS = None: json filter query for which rows to read at startup

Configuration options exist to provide a convenient no-code solution, but are subject to change in the future. Using a custom app is recommended for production usage.
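As an illustration of the options above, the sketch below prepares an environment for launching the example app from Python. It is not graphique's own code: the helper `service_env` is hypothetical, the JSON encoding of COLUMNS is an assumption based on FILTERS being described as JSON, and the column names are placeholders.

```python
import json
import os


def service_env(parquet_path, columns=None, metrics=False):
    """Return a copy of the environment with the example app's settings added."""
    env = dict(os.environ)
    env["PARQUET_PATH"] = parquet_path
    # Starlette's boolean cast accepts "true"/"false" (and "1"/"0").
    env["METRICS"] = "true" if metrics else "false"
    if columns is not None:
        # Assumption: COLUMNS is read as JSON (a list of names or an alias mapping).
        env["COLUMNS"] = json.dumps(columns)
    return env


# Hypothetical column names, for illustration only.
env = service_env("./tests/fixtures/zipcodes.parquet", columns=["state", "city"])
# To launch: subprocess.run(["uvicorn", "graphique.service:app"], env=env)
```

The same key and value pairs could instead be written to a .env file, one `NAME=value` per line.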

App

For more options create a custom ASGI app. Call graphique's GraphQL on an ibis Table or arrow Dataset. Supply a mapping of names to datasets for multiple roots, and to enable federation.

import ibis
from graphique import GraphQL

source = ibis.read_*(...)  # or ibis.connect(...).table(...) or pyarrow.dataset.dataset(...)
# apply initial projections or filters to `source`
app = GraphQL(source)  # Table is root query type
app = GraphQL.federated({<name>: source, ...}, keys={<name>: [], ...})  # Tables on federated fields

Start like any ASGI app.

uvicorn <module>:app

API

types

  • Dataset: interface for an ibis table or arrow dataset.
  • Table: implements the Dataset interface. Adds typed row, columns, and filter fields from introspecting the schema.
  • Column: interface for an ibis column. Each data type has a corresponding column implementation: Boolean, Int, BigInt, Float, Decimal, Date, Datetime, Time, Duration, Base64, String, Array, Struct. All columns have a values field for their list of scalars. Additional fields vary by type.
  • Row: scalar fields. Tables are column-oriented, and graphique encourages that usage for performance. A single row field is provided for convenience, but a field for a list of rows is not. Requesting parallel columns is far more efficient.
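Since there is no field for a list of rows, a client that needs row-shaped data can request parallel columns and transpose them locally. The sketch below shows one way to do that. The column names `state` and `city` are hypothetical, the sample response is made up for illustration, and `rows_from_columns` is not part of graphique.

```python
# Parallel columns query; `state` and `city` are hypothetical column names.
QUERY = """
{
  columns {
    state { values }
    city { values }
  }
}
"""


def rows_from_columns(data: dict) -> list[dict]:
    """Transpose the `columns` part of a response into a list of row dicts."""
    columns = {name: field["values"] for name, field in data["columns"].items()}
    return [dict(zip(columns, values)) for values in zip(*columns.values())]


# Made-up response data in the shape the query above would return.
sample = {
    "columns": {
        "state": {"values": ["CA", "NY"]},
        "city": {"values": ["Los Angeles", "New York"]},
    }
}
rows = rows_from_columns(sample)
```

The server still does column-oriented work, and the transposition happens once on the client over plain lists.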

selection

  • slice: contiguous selection of rows
  • filter: select rows by predicates
  • join: join tables by key columns
  • take: rows by index
  • dropNull: remove rows with nulls

projection

  • project: project columns with expressions
  • columns: provides a field for every Column in the schema
  • column: access a column of any type by name
  • row: provides a field for each scalar of a single row
  • cast: cast column types
  • fillNull: fill null values

aggregation

  • group: group by given columns, and aggregate the others
  • distinct: group with all columns
  • runs: provisionally group by adjacency
  • unnest: unnest an array column
  • count: number of rows

ordering

  • order: sort table by given columns
  • options limit and dense: select rows with smallest or largest values

Performance

Performance depends on the ibis backend, which defaults to DuckDB. There are no internal Python loops. Scalars do not become Python types until serialized.

PyArrow is also used for partitioned dataset optimizations, and for any feature which ibis does not support. Table fields are lazily evaluated up until scalars are reached, and automatically cached as needed for multiple fields.

Installation

pip install graphique[server]

Dependencies

  • ibis-framework (with duckdb or other backend)
  • strawberry-graphql[asgi,cli]
  • pyarrow
  • isodate
  • uvicorn (or other ASGI server)

Tests

100% branch coverage.

pytest [--cov]
