A fluent API for Google Cloud Python Client

Google Cloud Fluent Client


This is a lightweight wrapper on top of the Google Cloud Platform Python SDK client libraries. It provides a fluent-style interface for calling their methods. The motivation is that the GCP Storage and BigQuery libraries take many parameters, most of which can safely be left at their default values.
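
To illustrate what "fluent style with defaults" means, here is a minimal sketch of the pattern in plain Python. The `LoadJob` class and its methods are hypothetical and are not gfluent's API; the default values are borrowed from the BigQuery example in this document.

```python
class LoadJob:
    """Hypothetical builder illustrating fluent-style chaining (not gfluent's API)."""

    def __init__(self, project):
        self.project = project
        # Sensible defaults, so callers only set what they need to change.
        self._settings = {
            "mode": "WRITE_APPEND",
            "create_mode": "CREATE_IF_NEEDED",
            "format": "NEWLINE_DELIMITED_JSON",
        }

    def table(self, name):
        self._settings["table"] = name
        return self  # returning self is what makes chaining possible

    def mode(self, value):
        self._settings["mode"] = value
        return self

    def describe(self):
        """Terminal call: return the collected settings instead of running a job."""
        return dict(self._settings, project=self.project)


# Only the table and the write mode are set; everything else keeps its default.
job = LoadJob("my-project").table("sales.products").mode("WRITE_TRUNCATE")
print(job.describe())
```

Each setter returns the object itself, so a whole configuration reads as one expression, and anything left unset falls back to its default.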

This wrapper is suitable for Data Engineers who want to quickly create simple data pipelines based on GCP BigQuery and Storage. Here are two examples.

Build Data Pipeline on BigQuery

You (a Data Engineer) are asked to:

  • load multiple JSON files from your local drive to GCS

  • import those files to a BigQuery staging table

  • run another query based on the staging table by joining existing tables, and store the result to another table

To accomplish the task, here is the source code:

from gfluent import BQ, GCS

project_id = "here-is-your-project-id"
bucket_name = "my-bucket"
dataset = "sales"
table_name = "products"
prefix = "import"
local_path = "/user/tom/products/" # there are many *.json files in this directory

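# upload files to GCS bucket
# the calls other than .local() are assumed here; check them against the gfluent documentation
(
    GCS(project_id)
    .local(path=local_path, suffix=".json")
    .bucket(bucket_name)
    .prefix(prefix)
    .upload()
)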

# if you need to create the dataset
BQ(project_id).create_dataset(dataset, location="US")

# load data to BigQuery table

uri = f"gs://{bucket_name}/{prefix}/*.json"
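number_of_rows = (
    BQ(project_id)
    .table(f"{dataset}.{table_name}")
    .mode("WRITE_APPEND")               # optional, this is the default mode
    .create_mode("CREATE_IF_NEEDED")    # optional, this is the default mode
    .format("NEWLINE_DELIMITED_JSON")   # optional, this is the default format
    .gcs(uri)                           # assumed call; check the gfluent documentation
    .load(location="US")                # assumed call; check the gfluent documentation
)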

print(f"{number_of_rows} rows are loaded")

# run a query

final_table = "sales_summary"

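sql = """
    select t1.col1, t2.col2, t2.col3
    FROM
        sales.products t1
    JOIN
        other.category t2
    ON  t1.prod_id = t2.prod_id
"""

number_of_rows = (
    BQ(project_id)
    .table(f"{dataset}.{final_table}")
    .sql(sql)
    .create_mode("CREATE_NEVER")    # required: we do not want to create a new table
    .query()                        # assumed call; check the gfluent documentation
)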

print(f"{number_of_rows} rows are appended")

# now let's query the new table

rows = (
    BQ(project_id)
    .sql(f"select col1, col2 from {dataset}.{final_table} limit 10")
    .query()    # assumed call; check the gfluent documentation
)

for row in rows:
    print(row.col1, row.col2)

Loading data from Spreadsheet to BigQuery

import os
from gfluent import Sheet, BQ

project_id = 'your project id'
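sheet_id = 'your Google sheet id'

# assume the data is on the sheet `data` and range is `A1:B4`
# the Sheet constructor argument and the sheet_id/worksheet/bq/load calls are
# assumed here; check them against the gfluent documentation
sheet = Sheet(
    os.getenv("GOOGLE_APPLICATION_CREDENTIALS")
).sheet_id(sheet_id).worksheet("data!A1:B4")

bq = BQ(project=project_id).table("target_dataset.table")

sheet.bq(bq).load(location="EU")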


See the documentation for details, and refer to the test cases for more real examples.


Install from PyPI:

pip install -U gfluent

Or build and install from source code:

pip install -r requirements-dev.txt
poetry build
pip install dist/gfluent-<version>.tar.gz
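
After either install route, one way to confirm which version ended up in the environment is to query the package metadata. This is a minimal sketch using only the standard library (Python 3.8+); `installed_version` is a hypothetical helper name, not something gfluent provides.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version string if gfluent is installed, otherwise None.
print(installed_version("gfluent"))
```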


Contributions of any kind are welcome, including bug reports, new features, and documentation improvements. Please note that the integration tests use a real GCP project, so you may not have permission to set up the test data.

