
BigQuery ML Utils

BigQuery ML (BQML) lets you create and run machine learning models in BigQuery using standard SQL queries. The BigQuery ML Utils library is an integrated suite of machine learning tools for building and using BigQuery ML models.

Installation

Install this library in a virtualenv using pip. virtualenv is a tool to create isolated Python environments. The basic problem it addresses is one of dependencies and versions, and indirectly permissions.

With virtualenv, it's possible to install this library without needing system install permissions, and without clashing with the installed system dependencies.

Mac/Linux

    pip install virtualenv
    virtualenv <your-env>
    source <your-env>/bin/activate
    <your-env>/bin/pip install bigquery-ml-utils

Windows

    pip install virtualenv
    virtualenv <your-env>
    <your-env>\Scripts\activate
    <your-env>\Scripts\pip.exe install bigquery-ml-utils
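To confirm that the interpreter you are running belongs to the activated environment (for example, before the `pip install` step), you can compare `sys.prefix` with `sys.base_prefix`. This is a standard-library check and is not part of bigquery-ml-utils; `in_virtualenv` is a hypothetical helper name:

```python
import sys


def in_virtualenv() -> bool:
    """Return True if the running interpreter belongs to a virtual environment."""
    # venv and virtualenv >= 20 point sys.prefix at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    # Older virtualenv releases (< 20) set sys.real_prefix instead.
    return hasattr(sys, "real_prefix") or sys.prefix != base_prefix


if __name__ == "__main__":
    print("virtualenv active:", in_virtualenv())
```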

Overview

Inference

Transform Predictor

The Transform Predictor feeds input data into a BQML model trained with a TRANSFORM clause and performs both preprocessing of the input and postprocessing of the output. Its first argument is a SavedModel that represents the TRANSFORM clause used for feature preprocessing. Its second argument is a SavedModel or an XGBoost Booster that represents the model logic.
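The order of operations the Transform Predictor performs can be pictured with plain Python functions. This is a hypothetical sketch, not the library's API: `make_transform_predictor`, `transform_fn`, `model_fn`, and `postprocess_fn` are illustrative names standing in for the TRANSFORM SavedModel, the model SavedModel or XGBoost Booster, and the output postprocessing.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in: a row is a dict of feature name -> value.
Row = Dict[str, float]


def make_transform_predictor(
    transform_fn: Callable[[Row], Row],
    model_fn: Callable[[Row], float],
    postprocess_fn: Callable[[float], object],
) -> Callable[[List[Row]], List[object]]:
    """Chain preprocessing, model inference, and postprocessing (illustrative only)."""

    def predict(rows: List[Row]) -> List[object]:
        results = []
        for row in rows:
            features = transform_fn(row)  # role of the TRANSFORM SavedModel
            raw = model_fn(features)  # role of the model SavedModel or Booster
            results.append(postprocess_fn(raw))
        return results

    return predict


# Toy example: the "TRANSFORM" scales a feature, the "model" is linear,
# and postprocessing rounds the score.
predict = make_transform_predictor(
    transform_fn=lambda r: {"x_scaled": r["x"] / 10.0},
    model_fn=lambda f: 2.0 * f["x_scaled"] + 1.0,
    postprocess_fn=lambda y: round(y, 2),
)
print(predict([{"x": 5.0}, {"x": 20.0}]))  # [2.0, 5.0]
```

Keeping the three stages as separate callables mirrors why the predictor takes the TRANSFORM model and the model logic as separate arguments: each can be swapped independently.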

XGBoost Predictor

The XGBoost Predictor feeds input data into a BQML XGBoost model and performs both preprocessing of the input and postprocessing of the output. Its first argument is an XGBoost Booster that represents the model logic; the remaining arguments are model assets.
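The asset-driven preprocessing and postprocessing can be illustrated in plain Python. This is a hypothetical sketch, not the XGBoost Predictor's implementation: it assumes an asset is a vocabulary with one category per line, that categorical inputs are label-encoded against it before reaching the Booster, and that the Booster's class probabilities are mapped back to a label afterwards. `load_vocabulary`, `encode_category`, and `decode_label` are illustrative names.

```python
import math
from typing import Dict, List, Sequence


def load_vocabulary(lines: Sequence[str]) -> Dict[str, int]:
    """Map each non-empty line (one category per line) to its zero-based index."""
    values = [line.strip() for line in lines if line.strip()]
    return {value: index for index, value in enumerate(values)}


def encode_category(value: str, vocab: Dict[str, int]) -> float:
    """Preprocessing: replace a category string with its vocabulary index."""
    # Assumption: unseen categories become NaN, which XGBoost treats as missing.
    return float(vocab[value]) if value in vocab else math.nan


def decode_label(probabilities: Sequence[float], labels: List[str]) -> str:
    """Postprocessing: return the label with the highest predicted probability."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return labels[best]


color_vocab = load_vocabulary(["red\n", "green\n", "blue\n"])
print(encode_category("green", color_vocab))  # 1.0
print(encode_category("purple", color_vocab))  # nan
print(decode_label([0.1, 0.7, 0.2], ["low", "medium", "high"]))  # medium
```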

TensorFlow Ops

BQML TensorFlow Custom Ops provides SQL functions (Date functions, Datetime functions, Time functions and Timestamp functions) that are not available in TensorFlow. Their implementation and behavior align with BigQuery. This is part of an effort to bridge the gap between the SQL community and the TensorFlow community. The following example returns the same result as TIMESTAMP_ADD(timestamp_expression, INTERVAL int64_expression date_part):

    >>> import tensorflow as tf
    >>> from bigquery_ml_utils.tensorflow_ops import timestamp_ops
    >>> timestamp = tf.constant(['2008-12-25 15:30:00+00', '2023-11-11 14:30:00+00'], dtype=tf.string)
    >>> interval = tf.constant([200, 300], dtype=tf.int64)
    >>> result = timestamp_ops.timestamp_add(timestamp, interval, 'MINUTE')
    >>> print(result)
    tf.Tensor([b'2008-12-25 18:50:00.0 +0000' b'2023-11-11 19:30:00.0 +0000'], shape=(2,), dtype=string)
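To sanity-check what the op computes, the same TIMESTAMP_ADD arithmetic can be reproduced with Python's standard `datetime` module. This is only a cross-check of the expected values, not how `timestamp_ops` is implemented; the helper `timestamp_add_minutes` is hypothetical and handles only the two-digit `+00` offset format used in the example.

```python
from datetime import datetime, timedelta


def timestamp_add_minutes(timestamp: str, minutes: int) -> str:
    """Plain-Python equivalent of TIMESTAMP_ADD(ts, INTERVAL minutes MINUTE).

    Assumes input like '2008-12-25 15:30:00+00' (a two-digit UTC offset).
    """
    # strptime's %z expects +HHMM, so pad the two-digit offset with '00' minutes.
    parsed = datetime.strptime(timestamp + "00", "%Y-%m-%d %H:%M:%S%z")
    return (parsed + timedelta(minutes=minutes)).strftime("%Y-%m-%d %H:%M:%S%z")


for ts, minutes in [("2008-12-25 15:30:00+00", 200), ("2023-11-11 14:30:00+00", 300)]:
    print(timestamp_add_minutes(ts, minutes))
# 2008-12-25 18:50:00+0000
# 2023-11-11 19:30:00+0000
```

The times match the tensor output above; only the string formatting differs.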

Note: Parsing time zones requires the /usr/share/zoneinfo directory, which might not be available on your OS. Install the tzdata package to provide it; for example, add the following to your Dockerfile:

    RUN apt-get update && DEBIAN_FRONTEND="noninteractive" \
        TZ="America/Los_Angeles" apt-get install -y tzdata
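Before running the time zone ops (for example, at container start-up), you can check whether the tz database is present. This is a plain standard-library sketch; `zoneinfo_available` is a hypothetical helper, and `America/Los_Angeles` is just the zone used in the Dockerfile example.

```python
import os

ZONEINFO_DIR = "/usr/share/zoneinfo"


def zoneinfo_available(zone: str = "America/Los_Angeles", root: str = ZONEINFO_DIR) -> bool:
    """Return True if the tz database file for `zone` exists under `root`."""
    # tzdata installs one compiled file per zone, e.g. <root>/America/Los_Angeles.
    return os.path.isfile(os.path.join(root, zone))


if not zoneinfo_available():
    print(f"{ZONEINFO_DIR} is missing or incomplete; install the tzdata package.")
```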

Model Generator

Text Embedding Model Generator

The Text Embedding Model Generator automatically loads a text embedding model from TensorFlow Hub and adds a signature so that the resulting model can be imported into BQML directly. Currently, the NNLM, SWIVEL, and BERT embedding models can be selected.

NNLM Text Embedding Model

The NNLM model is under 150 MB and is recommended for phrases, news, tweets, reviews, etc. NNLM does not carry any default signatures because it is designed to be used as a Keras layer; the Text Embedding Model Generator adds the signature for you.

SWIVEL Text Embedding Model

The SWIVEL model is under 150 MB and is recommended for phrases, news, tweets, reviews, etc. SWIVEL does not require preprocessing because the embedding model already satisfies the BQML imported model requirements. However, to align the signatures of NNLM, SWIVEL, and BERT, the Text Embedding Model Generator gives SWIVEL the same input label as the other two models.

BERT Text Embedding Model

The BERT model is about 200 MB and is recommended for phrases, news, tweets, reviews, paragraphs, etc. BERT does not carry any default signatures because it is designed to be used as a Keras layer. The Text Embedding Model Generator adds the signature and also integrates a text preprocessing layer for BERT.
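The size and use-case guidance for the three models can be collected into a small lookup, for example to pick a model name before generating it. This is a hypothetical helper built only from the figures above; `EMBEDDING_MODELS` and `candidate_models` are not part of bigquery-ml-utils, and because each list in the text ends in "etc.", the sets are examples rather than exhaustive rules.

```python
# Sizes and recommended inputs as described for NNLM, SWIVEL, and BERT.
SHORT_TEXT = {"phrases", "news", "tweets", "reviews"}

EMBEDDING_MODELS = {
    "NNLM": {"size": "< 150 MB", "inputs": SHORT_TEXT},
    "SWIVEL": {"size": "< 150 MB", "inputs": SHORT_TEXT},
    "BERT": {"size": "~200 MB", "inputs": SHORT_TEXT | {"paragraphs"}},
}


def candidate_models(text_kind: str) -> list:
    """Return, in alphabetical order, the models recommended for `text_kind`."""
    return sorted(
        name for name, info in EMBEDDING_MODELS.items() if text_kind in info["inputs"]
    )


print(candidate_models("paragraphs"))  # ['BERT']
print(candidate_models("tweets"))  # ['BERT', 'NNLM', 'SWIVEL']
```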

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files are available for this release.

Built Distributions

bigquery_ml_utils-1.3.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (8.1 MB)

Uploaded CPython 3.10 manylinux: glibc 2.17+ x86-64

bigquery_ml_utils-1.3.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (8.1 MB)

Uploaded CPython 3.9 manylinux: glibc 2.17+ x86-64

bigquery_ml_utils-1.3.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (8.1 MB)

Uploaded CPython 3.8 manylinux: glibc 2.17+ x86-64

bigquery_ml_utils-1.3.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (8.1 MB)

Uploaded CPython 3.7m manylinux: glibc 2.17+ x86-64

File details

Details for the file bigquery_ml_utils-1.3.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for bigquery_ml_utils-1.3.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 0f330cf16b701072ca5b0fa37847ebf95225542f5064e4ff1c63462b9b2f01a1
MD5 8bd21c6fb65a74a63f4940067453b631
BLAKE2b-256 4e1cd77a3ca33bba6f20adc33f7d03c376cf93e3b9d53532681dc157c82dc2da

See more details on using hashes here.

File details

Details for the file bigquery_ml_utils-1.3.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for bigquery_ml_utils-1.3.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 27606f4d9b750758ae89719eb543ed65a3c05b0fdde4b10b5ccb7f1e15860fae
MD5 77261d6d3b2a020ad042200c2ff1c0e9
BLAKE2b-256 bfbb44059c759634d0c411d87443b75d346ba3bd1093ca1c42b4083a98aa566a


File details

Details for the file bigquery_ml_utils-1.3.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for bigquery_ml_utils-1.3.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 098fd79b6d28415ea88176dbaf64bc6860109e702f95795473d3fcc8718fd571
MD5 9c5b6c62be088fb5b5f060efd50da908
BLAKE2b-256 4f8bc2b1dfcb93e260f05ead09f5025b6d9b314dcd7dbcac63f91f64560967ce


File details

Details for the file bigquery_ml_utils-1.3.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for bigquery_ml_utils-1.3.0-cp37-cp37m-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 46d3d46d0f339c0f4bb329bd4cd6929ffeb64c621477b5577caa13db98e39aaa
MD5 fb32a9d4229ba7409214a7615483baee
BLAKE2b-256 cb21a248dfb5ba79facd69a1ebd4d8c15ff8897e8c9de2ab395e91705f598204

