Yet Another WorkLoad - manage scheduled queries (currently on BigQuery only)
Project description
YAWL - Yet Another Workload
1.0. Intro
YAWL - Yet Another WorkLoad is a tool to help you better organize your queries on BigQuery (currently the only supported backend). If you work with scheduled queries, this tool is for you.
It helps you manage your repository organization and automate the process of updating your queries on the BigQuery Data Transfer Service.
2.0. Installing YAWL
YAWL is published on PyPI, so you just have to run:

```shell
pip install yawl
```

and that's it!
3.0. Using YAWL
Let's say you have a scheduled query test_query101 that runs on a schedule such as every mon,wed 09:00, writes to a table such as myproject.mydataset.revenue_per_users, and is defined by a SQL statement such as:
```sql
SELECT username, SUM(revenue) AS revenue
FROM some_project.some_dataset.some_table
GROUP BY username
```
Things are going fine, but then you find that you also need the user's e-mail in the same query in order to generate the results. Now you'd have a query like this:
```sql
SELECT username, user_email, SUM(revenue) AS revenue
FROM some_project.some_dataset.some_table
GROUP BY username, user_email
```
If you don't have anything connected to your Data Transfer Service, you'll need to either:
- Manually edit the scheduled query in the UI to change how it should behave; or
- Try to deploy test_query101 programmatically again, only to find out that BigQuery now has two queries named test_query101.

Another problem is that this setup doesn't allow a proper CI/CD process, one where teammates review your code and it is deployed automatically once approved.
Now, to use YAWL, there are two things to do:
- Create the steps:
```python
step_1 = BigQueryWorkflowStep(
    sql="./sql_files/example.sql",
    dest_table="google_cloud_project_id.transfer_test.table_1",
    squeduled_query_name="test_query101",
    schedule="every mon,wed 09:00",
)

step_2 = BigQueryWorkflowStep(
    sql="./sql_files/example.sql",
    dest_table="google_cloud_project_id.transfer_test.table_2",
    squeduled_query_name="test_query_102",
    schedule="every tue,thu 10:00",
)
```
- Create the queue:
```python
with queue() as q:
    q.add(step_1).add(step_2).process()
```
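The fluent add(...).add(...).process() chain works because add returns the queue itself. YAWL's internals aren't shown on this page, so the following is only an illustrative sketch of the pattern (the StepQueue class is made up, not YAWL's actual code):

```python
class StepQueue:
    """Minimal sketch of a chainable step queue (illustrative only)."""

    def __init__(self):
        self.steps = []

    def add(self, step):
        self.steps.append(step)
        return self  # returning self is what enables method chaining

    def process(self):
        # In YAWL this step would push each query to the Data Transfer
        # Service; here we just report how many steps were handled.
        return len(self.steps)


q = StepQueue()
assert q.add("step_1").add("step_2").process() == 2
```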
And that's it! The process method is in charge of pushing your queries directly into the BigQuery Data Transfer Service. Note that the sql argument accepts either a SQL statement or a path to a SQL file.
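How might a tool distinguish an inline SQL statement from a file path? YAWL's actual logic isn't documented here, so this is just a hedged sketch of one reasonable approach (the resolve_sql helper is hypothetical): treat the value as a path only when it points at an existing .sql file, and pass everything else through as inline SQL.

```python
from pathlib import Path


def resolve_sql(sql: str) -> str:
    """Return the SQL text, reading it from disk when `sql` is a file path.

    Hypothetical helper for illustration; YAWL's real implementation may differ.
    """
    candidate = Path(sql)
    # Only treat the value as a path when it names a real .sql file on disk.
    if candidate.suffix == ".sql" and candidate.is_file():
        return candidate.read_text()
    return sql  # otherwise, assume an inline SQL statement


# Inline statements pass through unchanged:
print(resolve_sql("SELECT 1"))  # SELECT 1
```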
Another nice thing: if you change a SQL file, say, to update how a query behaves, you can keep the same scheduled query display name. This way git maintains your query history, so if anything goes wrong you can roll back to an older commit.
Project details
Release history
Download files
Source Distribution
Built Distribution
File details
Details for the file yawl-0.1.1.tar.gz.
File metadata
- Download URL: yawl-0.1.1.tar.gz
- Upload date:
- Size: 8.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.1.7 CPython/3.8.12 Linux/5.11.0-1028-azure
File hashes
Algorithm | Hash digest
---|---
SHA256 | abd9055dbdfb804051982aa37d8bd632f205bb7a87c3a534b062595d81c62653
MD5 | e0c9faa2f85d18210464ad4321ce9a14
BLAKE2b-256 | 1cacc738157ac946cdfe19a64ca3e56ead9366dc2dc10911feb570f7dd00d13b
File details
Details for the file yawl-0.1.1-py3-none-any.whl.
File metadata
- Download URL: yawl-0.1.1-py3-none-any.whl
- Upload date:
- Size: 10.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.1.7 CPython/3.8.12 Linux/5.11.0-1028-azure
File hashes
Algorithm | Hash digest
---|---
SHA256 | 31f30bbe84458bcb1247c6aa10bfdb7943e1ff8a3f15af9aeabfe3e8b32d07f1
MD5 | 504b328c723fe475a63e911560b2767d
BLAKE2b-256 | 6ad513d2e2be7da3c1ad1637e723786e706528a85c162573a77b84adf360e691