
A fluent API for Google Cloud Python Client

Project description

Google Cloud Fluent Client


This is a lightweight wrapper on top of the Google Cloud Platform Python SDK client libraries. It provides a fluent-style API for calling their methods. The motivation: the GCP Storage and BigQuery libraries take many parameters, and most of them can safely be left at their default values.
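As a general illustration of the fluent (method-chaining) style this library adopts, here is a minimal, purely hypothetical sketch: each setter returns `self`, so calls read as a pipeline. None of these class or method names come from gfluent itself.

```python
# Hypothetical sketch of the fluent style: setters return `self`
# so configuration reads as a single chained pipeline.
class Loader:
    def __init__(self, project: str):
        self._project = project
        self._table = None
        self._mode = "WRITE_APPEND"  # a sensible default, as gfluent does

    def table(self, name: str) -> "Loader":
        self._table = name
        return self

    def mode(self, mode: str) -> "Loader":
        self._mode = mode
        return self

    def load(self) -> str:
        # A real implementation would call the GCP SDK here.
        return f"{self._mode} into {self._table} ({self._project})"

print(Loader("my-project").table("sales.products").load())
# → WRITE_APPEND into sales.products (my-project)
```

The benefit is that rarely-changed parameters (write mode, file format, etc.) stay out of sight until you actually need to override them.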

This wrapper is suitable for Data Engineers who want to quickly build simple data pipelines on top of GCP BigQuery and Storage. Here are two examples.

Build Data Pipeline on BigQuery

Suppose you (a Data Engineer) are asked to:

  • load multiple JSON files from your local drive to GCS

  • load those files into a BigQuery staging table

  • run a query that joins the staging table with existing tables, and store the result in another table

Here is the source code to accomplish the task:

from gfluent import BQ, GCS

project_id = "here-is-your-project-id"
bucket_name = "my-bucket"
dataset = "sales"
table_name = "products"
prefix = "import"
local_path = "/user/tom/products/" # there are many *.json files in this directory

# upload files to the GCS bucket
(
    GCS(project_id)
    .local(path=local_path, suffix=".json")
    .bucket(bucket_name)
    .prefix(prefix)
    .upload()
)
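Under the hood, a call like `.local(path=..., suffix=...)` presumably needs to collect the matching files before uploading them. A rough, hypothetical sketch of that step with `pathlib` (not gfluent's actual code):

```python
from pathlib import Path

def collect_files(path: str, suffix: str = ".json") -> list[str]:
    """Hypothetical helper: list the files directly under `path`
    whose names end with `suffix`, sorted for a stable order."""
    return sorted(str(p) for p in Path(path).glob(f"*{suffix}"))
```

Each collected file would then be uploaded under the configured bucket and prefix, e.g. `gs://my-bucket/import/products1.json`.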

# if you need to create the dataset
BQ(project_id).create_dataset(dataset, location="US")

# load data to BigQuery table

uri = f"gs://{bucket_name}/{prefix}/*.json"
number_of_rows = (
    BQ(project_id)
    .table(f"{dataset}.{table_name}")
    .mode("WRITE_APPEND")               # optional; this is the default mode
    .create_mode("CREATE_IF_NEEDED")    # optional; this is the default create mode
    .format("NEWLINE_DELIMITED_JSON")   # optional; this is the default format
    .gcs(uri).load(location="US")
)

print(f"{number_of_rows} rows are loaded")
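The wildcard URI above is plain pattern matching on object names. As a quick sanity check (pure Python, no GCP calls), `fnmatch` shows which uploaded object names a pattern like `import/*.json` would cover; the object names here are hypothetical:

```python
from fnmatch import fnmatch

# Hypothetical object names as they might appear after the upload above.
objects = ["import/a.json", "import/b.json", "import/readme.txt"]
pattern = "import/*.json"

matched = [name for name in objects if fnmatch(name, pattern)]
print(matched)  # → ['import/a.json', 'import/b.json']
```

Note that BigQuery's own wildcard semantics for `gs://` URIs may differ in detail; this is only an intuition for why the `.txt` file is excluded from the load.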


# run a query

final_table = "sales_summary"

sql = """
    SELECT t1.col1, t2.col2, t2.col3
    FROM
        sales.products t1
    JOIN
        other.category t2
    ON  t1.prod_id = t2.prod_id
"""

number_of_rows = (
    BQ(project_id)
    .table(f"{dataset}.{final_table}")
    .sql(sql)
    .create_mode("CREATE_NEVER")    # required; we don't want to create a new table
    .query()
)

print(f"{number_of_rows} rows are appended")


# now let's query the new table

rows = (
    BQ(project_id)
    .sql(f"select col1, col2 from {dataset}.{final_table} limit 10")
    .query()
)

for row in rows:
    print(row.col1, row.col2)

Loading data from Spreadsheet to BigQuery

import os
from gfluent import Sheet, BQ

project_id = "your-project-id"
sheet_id = "your-google-sheet-id"

# assume the data is on the sheet `data` and range is `A1:B4`
sheet = Sheet(
    os.getenv("GOOGLE_APPLICATION_CREDENTIALS")
).sheet_id(sheet_id).worksheet("data!A1:B4")

bq = BQ(project=project_id).table("target_dataset.table")

sheet.bq(bq).load(location="EU")

Documents

Here is the documentation; please also refer to the test cases for more real-world examples.

Installation

Install from PyPI:

pip install -U gfluent

Or build and install from source:

pip install -r requirements-dev.txt
poetry build
pip install dist/gfluent-<version>.tar.gz

Contribution

Any kind of contribution is welcome, including reporting bugs, adding features, or improving documentation. Please note that the integration tests use a real GCP project, and you may not have permission to set up the test data.
