Query Layer for Google Cloud Bigtable
Project description
BigtableQL
BigtableQL provides a SQL Query Layer for Google Cloud Bigtable.
Use Cases
Cloud Bigtable is Google's fully managed NoSQL Big Data database service. Each table contains rows and columns. Each row/column intersection can contain multiple cells. Each cell contains a unique timestamped version of the data for that row and column. Thus Bigtable is often used to store time series data.
BigtableQL provides a SQL query layer to run aggregation query on Bigtable.
Quick Start
Using the weather balloon example data shown in Single-timestamp unserialized schema design
Row key pressure temperature humidity altitude
us-west2#3698#2021-03-05-1200 94558 9.6 61 612
us-west2#3698#2021-03-05-1201 94122 9.7 62 611
us-west2#3698#2021-03-05-1202 95992 9.5 58 602
us-west2#3698#2021-03-05-1203 96025 9.5 66 598
us-west2#3698#2021-03-05-1204 96021 9.6 63 624
we are able to calculate average pressure of the period by
import bigtableql
# config follows offical python bigtable client
client = bigtableql.Client(config)
client.register_table(
"weather_balloons",
instance_id="INSTANCE_ID",
column_families={
"measurements": {
"only_read_latest": True,
"columns": {
"pressure": int,
"temperature": str,
"humidity": int,
"altitude": int
}
}
}
)
client.query("measurements", """
SELECT avg(pressure) FROM weather_balloons
WHERE
"_row_key" BETWEEN 'us-west2#3698#2021-03-05-1200' AND 'us-west2#3698#2021-03-05-1204'
""")
Or with row key decomposition
client.register_table(
"weather_balloons",
instance_id="INSTANCE_ID",
column_families={
"measurements": {
"only_read_latest": True,
"columns": {
"pressure": int,
"temperature": str,
"humidity": int,
"altitude": int
}
}
},
row_key_identifiers=["location", "balloon_id", "event_minute"],
row_key_separator="#"
)
client.query("measurements", """
SELECT balloon_id, avg(pressure) FROM weather_balloons
WHERE
location = 'us-west2'
AND balloon_id IN ('3698', '3700')
AND event_minute BETWEEN '2021-03-05-1200' AND '2021-03-05-1204'
GROUP BY 1
""")
The output of query
is list of pyarrow.RecordBatch. It can be easily convert to python dictionary (to_pydict
) and pandas dataframe (to_pandas
).
Alternative
However, as of 2022-01, it
- only supports "us-central1" and "europe-west1" region
- only supports query with "rowkey"
- by default can run up to 4 concurrent queries against Bigtable external data source
Roadmap
SQL
- ✅ Filter (WHERE): "=", "IN", "BETWEEN", ">", ">=", "<", "<="
- ✅ GROUP BY
- ✅ ORDER BY
- ✅ HAVING
- ✅ LIMIT
- ✅ Aggregate (e.g. avg, sum, count)
- ✅ AND
- Alias
- Cast
- Common Math Functions
- Common Date/Time Functions
- OR ???
- Join ???
- Insert ???
General
- ✅ Partition Pruning
- ✅ Projection pushdown
- Predicate push down (only Value range is possible)
- Limit Pushdown
Limitation
- for row key encoding, only string is supported
- for single/composite row key, identifiers supports "=" and "IN". Additionally, last identifier also supports "BETWEEN".
- for qualifiers, only string and integer (64bit BigEndian encoding) value are supported
- subqueries and common table expressions are not supported
Technical Details
BigtableQL depends on
- sqloxide and sqlparser-rs: SQL parser
- python-bigtable: offical python bigtable client
- datafusion-python: in memory query engine
- pyarrow: in memory columnar store / dataframe
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file BigtableQL-0.1.1.tar.gz
.
File metadata
- Download URL: BigtableQL-0.1.1.tar.gz
- Upload date:
- Size: 9.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.1.12 CPython/3.8.9 Darwin/21.2.0
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 74db32961a24dcab6cba7eef74da64318e769a4c5539d5cc0aa05e11411ec972 |
|
MD5 | 6008db4a211d9225bb98f49764c65213 |
|
BLAKE2b-256 | 44398f832f0eac96a63b0a83c65c579b75a66b095bc40e875ca9683108268add |
File details
Details for the file BigtableQL-0.1.1-py3-none-any.whl
.
File metadata
- Download URL: BigtableQL-0.1.1-py3-none-any.whl
- Upload date:
- Size: 9.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.1.12 CPython/3.8.9 Darwin/21.2.0
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 7ab839cdfc09dfc9d519e11f063eb9f703ef6032dd3aee59af0efc7de6660189 |
|
MD5 | 596ad56ab2c9df4c92917f8b04fddb8e |
|
BLAKE2b-256 | 06c70715c68d8f25a59ff5178ffb067c54c8734508e9022e7cd43b1551e8f93a |