Skip to main content

Query Layer for Google Cloud Bigtable

Project description

BigQL

BigQL provides a SQL Query Layer for Google Cloud Bigtable.

Use Cases

Cloud Bigtable is Google's fully managed NoSQL Big Data database service. Each table contains rows and columns. Each row/column intersection can contain multiple cells. Each cell contains a unique timestamped version of the data for that row and column. Thus Bigtable is often used to store time series data.

BigQL provides a SQL query layer to run aggregation query on Bigtable.

Quick Start

pip install bigql

Using the weather balloon example data shown in Single-timestamp unserialized schema design

Row key                         pressure    temperature humidity    altitude
us-west2#3698#2021-03-05-1200   94558       9.6         61          612
us-west2#3698#2021-03-05-1201   94122       9.7         62          611
us-west2#3698#2021-03-05-1202   95992       9.5         58          602
us-west2#3698#2021-03-05-1203   96025       9.5         66          598
us-west2#3698#2021-03-05-1204   96021       9.6         63          624

After initialize the client

from bigql.client import Client
# config follows offical python bigtable client
client = Client(config)

client.register_table(
    "weather_balloons",
    instance_id="INSTANCE_ID",
    column_families={
        "measurements": {
            "only_read_latest": True,
            "columns": {
                "pressure": int,
                "temperature": str,
                "humidity": int,
                "altitude": int
            }
        }
    }
)

we are able to calculate average pressure of the period by

client.query("measurements", """
SELECT avg(pressure) FROM weather_balloons
WHERE
  "_row_key" BETWEEN 'us-west2#3698#2021-03-05-1200' AND 'us-west2#3698#2021-03-05-1204'
""")

Or with row key decomposition

client.register_table(
    xxx,
    row_key_identifiers=["location", "balloon_id", "event_minute"],
    row_key_separator="#"
)

client.query("measurements", """
SELECT balloon_id, avg(pressure) FROM weather_balloons
WHERE
  location = 'us-west2'
  AND balloon_id IN ('3698', '3700')
  AND event_minute BETWEEN '2021-03-05-1200' AND '2021-03-05-1204'
GROUP BY 1
""")

The output of query is list of pyarrow.RecordBatch. It can be easily convert to python dictionary (to_pydict) and pandas dataframe (to_pandas).

Group by Time

Each cell in Bigtable have a timestamp. SELECT "_timestamp" will return a float number, represent seconds since Unix epoch. Following is an example to select 5 minutes interval

SELECT to_timestamp_seconds(cast(floor("_timestamp" / 600) * 600 as bigint)) as interval

Alternative

  1. Google BigQuery external data source

However, as of 2022-01, it

  • only supports "us-central1" and "europe-west1" region
  • only supports query with "rowkey"
  • by default can run up to 4 concurrent queries against Bigtable external data source

Roadmap

SQL

  • ✅ Insert Into
  • ✅ Select *
  • ✅ Select column(s)
  • ✅ Filter (WHERE): "=", "IN", "BETWEEN", ">", ">=", "<", "<="
  • ✅ GROUP BY
  • ✅ ORDER BY
  • ✅ HAVING
  • ✅ LIMIT
  • ✅ Aggregate (e.g. avg, sum, count)
  • ✅ AND
  • ✅ Alias
  • ✅ Cast
  • ✅ Common Math Functions
  • Common Date/Time Functions
  • OR ???
  • Join ???

General

  • ✅ Partition Pruning
  • ✅ Projection pushdown
  • ❌ Predicate push down (Value range and Value regex)
    • not work well, because its filter works on all cells, not only predicate column
  • Limit Pushdown ???

Limitation

  • for row key encoding, only string is supported
  • for single/composite row key, identifiers supports "=" and "IN". Additionally, last identifier also supports "BETWEEN".
  • for qualifiers, only string and integer (64bit BigEndian encoding) value are supported
  • subqueries and common table expressions are not supported

Technical Details

BigQL depends on

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

BigQL-0.1.11.tar.gz (12.0 kB view details)

Uploaded Source

Built Distribution

BigQL-0.1.11-py3-none-any.whl (13.8 kB view details)

Uploaded Python 3

File details

Details for the file BigQL-0.1.11.tar.gz.

File metadata

  • Download URL: BigQL-0.1.11.tar.gz
  • Upload date:
  • Size: 12.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.13 CPython/3.8.12 Linux/5.11.0-1028-azure

File hashes

Hashes for BigQL-0.1.11.tar.gz
Algorithm Hash digest
SHA256 08053eb82fbb276bf04fec4c846f530ea3613da036c9fd9b67cb74cf95abf9ea
MD5 79050d064c25e42e18e60b1cefd2b759
BLAKE2b-256 fbea8291c67001c68f9211548a7f96868541365c98bacb59fb59d8e9ff4f9ad8

See more details on using hashes here.

File details

Details for the file BigQL-0.1.11-py3-none-any.whl.

File metadata

  • Download URL: BigQL-0.1.11-py3-none-any.whl
  • Upload date:
  • Size: 13.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.13 CPython/3.8.12 Linux/5.11.0-1028-azure

File hashes

Hashes for BigQL-0.1.11-py3-none-any.whl
Algorithm Hash digest
SHA256 d756d81b09aeae9fa52181f9f75cf79fd4b8f899880e0b918b516e47d21e4300
MD5 2b56970518b2f47814cec669a3b84d75
BLAKE2b-256 25df0bba1782841f1b92c4fb947985b03e4bd92c9693e53174946dc73a104457

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page