Skip to main content

Query Layer for Google Cloud Bigtable

Project description

BigtableQL

BigtableQL provides a SQL Query Layer for Google Cloud Bigtable.

Use Cases

Cloud Bigtable is Google's fully managed NoSQL Big Data database service. Each table contains rows and columns. Each row/column intersection can contain multiple cells. Each cell contains a unique timestamped version of the data for that row and column. Thus Bigtable is often used to store time series data.

BigtableQL provides a SQL query layer to run aggregation query on Bigtable.

Quick Start

Using the weather balloon example data shown in Single-timestamp unserialized schema design

Row key                         pressure    temperature humidity    altitude
us-west2#3698#2021-03-05-1200   94558       9.6         61          612
us-west2#3698#2021-03-05-1201   94122       9.7         62          611
us-west2#3698#2021-03-05-1202   95992       9.5         58          602
us-west2#3698#2021-03-05-1203   96025       9.5         66          598
us-west2#3698#2021-03-05-1204   96021       9.6         63          624

we are able to calculate average pressure of the period by

from bigtableql.client import Client
# config follows offical python bigtable client
client = Client(config)

client.register_table(
    "weather_balloons",
    instance_id="INSTANCE_ID",
    column_families={
        "measurements": {
            "only_read_latest": True,
            "columns": {
                "pressure": int,
                "temperature": str,
                "humidity": int,
                "altitude": int
            }
        }
    }
)

client.query("measurements", """
SELECT avg(pressure) FROM weather_balloons
WHERE
  "_row_key" BETWEEN 'us-west2#3698#2021-03-05-1200' AND 'us-west2#3698#2021-03-05-1204'
""")

Or with row key decomposition

client.register_table(
    "weather_balloons",
    instance_id="INSTANCE_ID",
    column_families={
        "measurements": {
            "only_read_latest": True,
            "columns": {
                "pressure": int,
                "temperature": str,
                "humidity": int,
                "altitude": int
            }
        }
    },
    row_key_identifiers=["location", "balloon_id", "event_minute"],
    row_key_separator="#"
)

client.query("measurements", """
SELECT balloon_id, avg(pressure) FROM weather_balloons
WHERE
  location = 'us-west2'
  AND balloon_id IN ('3698', '3700')
  AND event_minute BETWEEN '2021-03-05-1200' AND '2021-03-05-1204'
GROUP BY 1
""")

The output of query is list of pyarrow.RecordBatch. It can be easily convert to python dictionary (to_pydict) and pandas dataframe (to_pandas).

Alternative

  1. Google BigQuery external data source

However, as of 2022-01, it

  • only supports "us-central1" and "europe-west1" region
  • only supports query with "rowkey"
  • by default can run up to 4 concurrent queries against Bigtable external data source

Roadmap

SQL

  • ✅ Insert Into
  • ✅ Select *
  • ✅ Select column(s)
  • ✅ Filter (WHERE): "=", "IN", "BETWEEN", ">", ">=", "<", "<="
  • ✅ GROUP BY
  • ✅ ORDER BY
  • ✅ HAVING
  • ✅ LIMIT
  • ✅ Aggregate (e.g. avg, sum, count)
  • ✅ AND
  • ✅ Alias
  • ✅ Cast
  • ✅ Common Math Functions
  • Common Date/Time Functions
  • OR ???
  • Join ???

General

  • ✅ Partition Pruning
  • ✅ Projection pushdown
  • ✅ Predicate push down (Value range)
  • Limit Pushdown ???

Limitation

  • for row key encoding, only string is supported
  • for single/composite row key, identifiers supports "=" and "IN". Additionally, last identifier also supports "BETWEEN".
  • for qualifiers, only string and integer (64bit BigEndian encoding) value are supported
  • subqueries and common table expressions are not supported

Technical Details

BigtableQL depends on

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

BigtableQL-0.1.6.tar.gz (11.3 kB view details)

Uploaded Source

Built Distribution

BigtableQL-0.1.6-py3-none-any.whl (13.6 kB view details)

Uploaded Python 3

File details

Details for the file BigtableQL-0.1.6.tar.gz.

File metadata

  • Download URL: BigtableQL-0.1.6.tar.gz
  • Upload date:
  • Size: 11.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.12 CPython/3.8.12 Linux/5.11.0-1028-azure

File hashes

Hashes for BigtableQL-0.1.6.tar.gz
Algorithm Hash digest
SHA256 cf543c8fdfe004ccaa7770b9aefb775adea5bf0b20146635cefc528c5ad34c87
MD5 e6e8e9ed3950e790fa8f7cc8ce45aa6a
BLAKE2b-256 e4f6ad71e0ea18b980c801b7a72a4080ead9bc8ed1adf5fb692530eba5b0d16a

See more details on using hashes here.

File details

Details for the file BigtableQL-0.1.6-py3-none-any.whl.

File metadata

  • Download URL: BigtableQL-0.1.6-py3-none-any.whl
  • Upload date:
  • Size: 13.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.12 CPython/3.8.12 Linux/5.11.0-1028-azure

File hashes

Hashes for BigtableQL-0.1.6-py3-none-any.whl
Algorithm Hash digest
SHA256 4782a10c23f9af27bc5b1c09bfd4944f21cdf0b4a68196b0eaeaddbea85b6e8a
MD5 6b0a19bb8b620a5a0cba0bf39b181999
BLAKE2b-256 cd97f93de88bb2de8ba58ec3e76fe3a9b0eb3feb7ea88a4c9694b7a6ec25c7a5

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page