Yet Another WorkLoad - manage scheduled queries [currently] on BigQuery

YAWL - Yet Another WorkLoad

1.0. Intro

YAWL (Yet Another WorkLoad) is a tool that helps you organize your scheduled queries. For now, it supports BigQuery only. If you work with scheduled queries, this tool is for you.

It aims to manage how your repo is organized and to let you automate updating your queries on the BigQuery Data Transfer Service.

2.0. Installing YAWL

Just run pip install yawl and that's it! YAWL is published on PyPI.

3.0. Using YAWL

Let's say you have a scheduled query named test_query101 that runs on the schedule every mon,wed 09:00, populates the table myproject.mydataset.revenue_per_users, and is defined by a SQL statement such as:

SELECT username, SUM(revenue) AS revenue
FROM some_project.some_dataset.some_table
GROUP BY username

Things are going well, but then you find that the results also need the user's e-mail from the same query. Now you'd have a query like this:

SELECT username, user_email, SUM(revenue) AS revenue
FROM some_project.some_dataset.some_table
GROUP BY username, user_email

If you don't have anything connected to your Data Transfer Service, you'll have to either:

  1. Manually open the scheduled query in the UI and change how it should behave; or
  2. Deploy test_query101 again programmatically, just to find out that BigQuery now has two scheduled queries named test_query101.

Another possible problem is that you can't have a nice CI/CD process this way, one where teammates review your code and it is deployed automatically once approved.
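To make the duplicate problem concrete, here is a minimal sketch that uses an in-memory stand-in for the transfer service. The FakeTransferService class and its create method are hypothetical, written only for illustration; they are not the BigQuery client API. The one assumption, taken from the description above, is that every create call registers a new config under a fresh ID, so display names are not unique.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class FakeTransferService:
    """Hypothetical in-memory stand-in for a scheduled-query service."""

    configs: dict = field(default_factory=dict)
    _ids: itertools.count = field(default_factory=lambda: itertools.count(1))

    def create(self, display_name: str, sql: str) -> str:
        # Every create call gets a fresh ID; the display name is not a unique key.
        config_id = f"config-{next(self._ids)}"
        self.configs[config_id] = {"display_name": display_name, "sql": sql}
        return config_id


service = FakeTransferService()
service.create(
    "test_query101",
    "SELECT username, SUM(revenue) AS revenue "
    "FROM some_project.some_dataset.some_table GROUP BY username",
)
# Redeploying the edited query under the same display name adds a second entry.
service.create(
    "test_query101",
    "SELECT username, user_email, SUM(revenue) AS revenue "
    "FROM some_project.some_dataset.some_table GROUP BY username, user_email",
)

duplicates = [
    config for config in service.configs.values()
    if config["display_name"] == "test_query101"
]
print(len(duplicates))
```

In this toy model, the only way to avoid the second entry is to look up the existing config by display name before deciding whether to create or update.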

Now, in order to use YAWL, you have two things to consider:

  1. Creating the steps
step_1 = BigQueryWorkflowStep(
    sql="./sql_files/example.sql",
    dest_table="google_cloud_project_id.transfer_test.table_1",
    squeduled_query_name="test_query101",
    schedule="every mon,wed 09:00",
)
step_2 = BigQueryWorkflowStep(
    sql="./sql_files/example.sql",
    dest_table="google_cloud_project_id.transfer_test.table_2",
    squeduled_query_name="test_query_102",
    schedule="every tue,thu 10:00",
)
  2. Creating the queue
with queue() as q:
    q.add(step_1).add(step_2).process()

And that's it! The process method is in charge of pushing your queries directly to the BigQuery Data Transfer Service. Note that the sql argument accepts either a SQL statement or a path to a SQL file.
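One way an argument can accept either a SQL statement or a file path is sketched below. The resolve_sql helper is hypothetical and is not part of YAWL; it simply assumes that a value naming an existing file should be read from disk, and that anything else is already SQL text.

```python
import tempfile
from pathlib import Path


def resolve_sql(sql: str) -> str:
    """Return SQL text, reading it from disk when `sql` names an existing file."""
    try:
        candidate = Path(sql)
        if candidate.is_file():
            return candidate.read_text(encoding="utf-8")
    except (OSError, ValueError):
        # A long SQL statement may not be a valid file name on some systems;
        # in that case treat the value as SQL text.
        pass
    return sql


# A plain statement is returned unchanged.
inline_sql = resolve_sql("SELECT username FROM some_table")

# A path to an existing file is replaced by the file's contents.
with tempfile.TemporaryDirectory() as tmp_dir:
    sql_file = Path(tmp_dir) / "example.sql"
    sql_file.write_text("SELECT 1", encoding="utf-8")
    file_sql = resolve_sql(str(sql_file))

print(inline_sql)
print(file_sql)
```

A side effect of this design is that a mistyped path is silently treated as SQL, so a stricter variant might reject values ending in .sql when the file does not exist.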

Another cool thing is that if you change a SQL file, say, to update how a query should behave, and you just want to keep the same scheduled query display name, well, you can! This lets git maintain the history of your queries, so if anything goes wrong you can roll back to an older commit.
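The idea of redeploying under the same display name can be sketched as an upsert keyed by that name. Everything in the block below (Step, FakeQueue, fake_queue, and the dict used as a registry) is hypothetical and only mimics the shape of the usage shown earlier; it is not YAWL's implementation, and this README does not say how YAWL matches an existing scheduled query.

```python
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class Step:
    """Hypothetical step: a display name plus the SQL it should run."""

    name: str
    sql: str


@dataclass
class FakeQueue:
    """Hypothetical queue that upserts its steps into an in-memory registry."""

    registry: dict
    steps: list = field(default_factory=list)

    def add(self, step: Step) -> "FakeQueue":
        self.steps.append(step)
        return self  # returning self is what allows chained .add() calls

    def process(self) -> None:
        for step in self.steps:
            # Keyed by display name: an existing entry is overwritten, not duplicated.
            self.registry[step.name] = step.sql
        self.steps.clear()


@contextmanager
def fake_queue(registry: dict):
    yield FakeQueue(registry)


deployed = {}

# First deployment of the query.
with fake_queue(deployed) as q:
    q.add(Step("test_query101", "SELECT username, SUM(revenue) AS revenue ...")).process()

# Second deployment after editing the SQL: same display name, new definition.
with fake_queue(deployed) as q:
    q.add(Step("test_query101", "SELECT username, user_email, SUM(revenue) AS revenue ...")).process()

print(len(deployed))
```

Because the registry is keyed by display name, the second run replaces the stored SQL instead of adding a second test_query101, which is the behavior that makes a git-driven deploy safe to repeat.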
