GraphQL service for arrow tables and parquet data sets.

GraphQL service for arrow tables and parquet data sets. The schema is derived automatically.

Usage

% env PARQUET_PATH=... uvicorn graphique.service:app

Open http://localhost:8000/graphql to try out the API in GraphiQL. There is a test fixture at ./tests/fixtures/zipcodes.parquet.
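For scripted access instead of GraphiQL, a query can be posted to the same endpoint as JSON. The following is a minimal sketch using only the Python standard library. It assumes the server started above is listening on localhost:8000 and that the root query exposes the Table fields `length` and `names` directly; `build_request` and `run_query` are hypothetical helper names, not part of Graphique.

```python
import json
import urllib.request

# Endpoint from the usage section above.
URL = "http://localhost:8000/graphql"


def build_request(query, url=URL):
    """Build a POST request carrying a GraphQL query as a JSON body."""
    body = json.dumps({"query": query}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def run_query(query, url=URL):
    """Send the query and return the decoded JSON response (needs a running server)."""
    with urllib.request.urlopen(build_request(query, url)) as response:
        return json.load(response)


# Assumption: `length` and `names` are available on the root query type.
QUERY = "{ length names }"
```

With the service running, `run_query(QUERY)` should return the decoded response as a dictionary; `build_request` can be inspected without any server.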

Configuration

Graphique uses Starlette's config: variables are set in the environment or in a .env file. These config variables are used as input to ParquetDataset.

  • COLUMNS = None
  • DEBUG = False
  • DICTIONARIES = None
  • INDEX = None
  • MMAP = True
  • PARQUET_PATH
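To illustrate how the variables above and their defaults fit together, here is a rough sketch that reads them from a mapping of strings. It uses only the standard library rather than Starlette's config, so it is not how Graphique itself loads settings. The comma-separated format for column lists, the accepted boolean spellings, and the name `load_settings` are all assumptions made for the example.

```python
import os


def load_settings(environ=None):
    """Read the documented variables from a mapping of strings (default: os.environ)."""
    environ = os.environ if environ is None else environ

    def as_bool(name, default):
        value = environ.get(name)
        if value is None:
            return default
        lowered = value.strip().lower()
        if lowered in ("true", "1"):
            return True
        if lowered in ("false", "0"):
            return False
        raise ValueError(f"{name} must be a boolean, got {value!r}")

    def as_names(name):
        # Assumption: lists of column names are written comma-separated.
        value = environ.get(name)
        if value is None:
            return None
        return [item.strip() for item in value.split(",") if item.strip()]

    return {
        "COLUMNS": as_names("COLUMNS"),
        "DEBUG": as_bool("DEBUG", False),
        "DICTIONARIES": as_names("DICTIONARIES"),
        "INDEX": as_names("INDEX"),
        "MMAP": as_bool("MMAP", True),
        # Treated as required here because no default is listed above.
        "PARQUET_PATH": environ["PARQUET_PATH"],
    }
```

For example, `load_settings({"PARQUET_PATH": "data.parquet", "INDEX": "state, county"})` yields an `INDEX` of `["state", "county"]` and leaves the other variables at their defaults; the path and column names are placeholders.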

Queries

A Table is the primary interface. It has fields for filtering, sorting, and grouping.

"""a column-oriented table"""
type Table {
  """number of rows"""
  length: Long!

  """column names"""
  names: [String!]!

  """fields for each column"""
  columns: Columns!

  """
  Return column of any type by name.
  This is typically only needed for aliased columns added by `apply` or `Groups.aggregate`.
  If the column is in the schema, `columns` can be used instead.
  """
  column(name: String!): Column!

  """Return scalar values at index."""
  row(index: Long! = 0): Row!

  """Return table slice."""
  slice(offset: Long! = 0, length: Long): Table!

  """
  Return tables grouped by columns, with stable ordering.
  `length` is the maximum number of tables to return.
  `count` filters and sorts tables based on the number of rows within each table.
  """
  group(by: [String!]!, reverse: Boolean! = false, length: Long, count: CountQuery): Groups!

  """
  Return table of first or last occurrences grouped by columns, with stable ordering.
  Optionally include counts in an aliased column.
  Faster than `group` when only scalars are needed.
  """
  unique(by: [String!]!, reverse: Boolean! = false, count: String! = ""): Table!

  """Return table slice sorted by specified columns."""
  sort(by: [String!]!, reverse: Boolean! = false, length: Long): Table!

  """Return table with minimum values per column."""
  min(by: [String!]!): Table!

  """Return table with maximum values per column."""
  max(by: [String!]!): Table!

  """
  Return table with rows which match all (by default) queries.
  `invert` optionally excludes matching rows.
  `reduce` is the binary operator to combine filters; within a column all predicates must match.
  """
  filter(query: Filters!, invert: Boolean! = false, reduce: Operator! = AND): Table!

  """
  Return view of table with functions applied across columns.
  If no alias is provided, the column is replaced and should be of the same type.
  If an alias is provided, a column is added and may be referenced in the `column` interface,
  and in the `by` arguments of grouping and sorting.
  """
  apply(...): Table!
}
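The semantics of `filter` and `unique` described in the schema can be modelled in plain Python. This is only a sketch of the documented behavior on a dict of lists, not Graphique's implementation, which relies on pyarrow. The predicate callables stand in for the `Filters` input type, whose shape is not shown here; only `AND` and `OR` are modelled for `reduce`; the row order returned when `reverse` is set is an assumption; and the column names are hypothetical.

```python
def _length(table):
    """Number of rows in a column-oriented table (dict of equal-length lists)."""
    return len(next(iter(table.values()), []))


def _take(table, indices):
    """Select rows by position from every column."""
    return {name: [values[i] for i in indices] for name, values in table.items()}


def filter_rows(table, query, invert=False, reduce="AND"):
    """Keep rows matching the per-column predicates in `query`.

    Within a column all predicates must match; `reduce` combines the columns.
    """
    combine = {"AND": all, "OR": any}[reduce]
    keep = []
    for i in range(_length(table)):
        per_column = [
            all(pred(table[name][i]) for pred in preds)
            for name, preds in query.items()
        ]
        if combine(per_column) != invert:  # `invert` excludes matching rows
            keep.append(i)
    return _take(table, keep)


def unique(table, by, reverse=False, count=""):
    """Keep the first (or last, if `reverse`) occurrence of each key in `by`."""
    indices = range(_length(table))
    if reverse:
        indices = reversed(indices)
    occurrences, counts = {}, {}
    for i in indices:
        key = tuple(table[name][i] for name in by)
        occurrences.setdefault(key, i)  # dicts keep insertion order
        counts[key] = counts.get(key, 0) + 1
    result = _take(table, list(occurrences.values()))
    if count:  # optional aliased column holding the group sizes
        result[count] = [counts[key] for key in occurrences]
    return result


# Hypothetical sample data.
table = {
    "state": ["CA", "NY", "CA", "TX"],
    "zipcode": [90001, 10001, 94102, 73301],
}
```

Here `unique(table, by=["state"], count="n")` keeps one row per state and adds the column `n` with the counts 2, 1, and 1. Filtering with `{"state": [lambda s: s == "CA"], "zipcode": [lambda z: z > 90001]}` keeps one row under `AND` and two rows under `OR`.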

Performance

Graphique relies on native pyarrow routines wherever possible. Otherwise it falls back to using NumPy, with zero-copy views. Graphique also has custom optimizations for grouping, dictionary-encoded arrays, and chunked arrays.

Specifying an INDEX of columns indicates that the table is sorted by those columns, and enables a binary search interface.

  """
  Return table with matching values for compound `index`.
  Queries must be a prefix of the `index`.
  Only one non-equal query is allowed, and applied last.
  """
  search(...): Table!
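The restrictions on `search` follow from how binary search works on a compound sort key. Equality on the leading columns narrows the table to one contiguous range, and only within that range is the next column guaranteed to be sorted, so a single range query can be applied last. Below is a minimal model using the standard `bisect` module on a sorted list of tuples. It is not Graphique's implementation; the names `KeyView` and `search`, the inclusive bounds, and the sample keys are assumptions for illustration.

```python
from bisect import bisect_left, bisect_right


class KeyView:
    """Lazy sequence applying `func` to each key, so bisect stays O(log n)."""

    def __init__(self, keys, func):
        self.keys, self.func = keys, func

    def __len__(self):
        return len(self.keys)

    def __getitem__(self, i):
        return self.func(self.keys[i])


def search(keys, equals=(), lower=None, upper=None):
    """Return (start, stop) of the rows matching a prefix of a sorted compound index.

    keys: sorted list of tuples holding the index columns of each row, in order.
    equals: values for the leading index columns, matched exactly.
    lower, upper: optional inclusive bounds on the next index column, applied last.
    """
    prefix = tuple(equals)
    n = len(prefix)
    # Equality queries: the rows sharing the prefix form one contiguous range.
    prefixes = KeyView(keys, lambda key: key[:n])
    start = bisect_left(prefixes, prefix)
    stop = bisect_right(prefixes, prefix, start)
    # One non-equal query: the next column is sorted only inside [start, stop).
    column = KeyView(keys, lambda key: key[n])
    if lower is not None:
        start = bisect_left(column, lower, start, stop)
    if upper is not None:
        stop = bisect_right(column, upper, start, stop)
    return start, stop


# Hypothetical table sorted by the compound index (state, zipcode).
keys = [("CA", 90001), ("CA", 94102), ("NY", 10001), ("TX", 73301), ("TX", 75001)]
```

In this model the result is a pair of row offsets, so the matching rows form a slice of the sorted table. For instance, `search(keys, equals=["CA"], lower=94000)` returns `(1, 2)`.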

Installation

% pip install graphique

Dependencies

  • pyarrow >=3
  • strawberry-graphql >=0.42
  • uvicorn (or other ASGI server)
  • pytz (optional timestamp support)

Tests

100% branch coverage.

% pytest [--cov]

Changes

0.3

  • pyarrow >=3 required
  • any and all fields
  • String column split field

0.2

  • ListColumn and StructColumn types
  • Groups type with aggregate field
  • group and unique optimized
  • pyarrow >=2 required
  • Statistical fields: mode, stddev, variance
  • is_in, min, and max optimized
