This package contains utility functions for Prefect and Snowflake
orchestration-utilities
This repository holds the utility modules that are essential for ETL operations. It is published as a package and imported by the ETL flows.
The package is used in Prefect flows and with Snowflake as part of the ETL operations.
Installation
Install the package from PyPI:
pip install orchestration-utils
Inside this package
1. aws.py
This module contains the functions used to interact with AWS services, for example S3.
2. copy_into_s3
This module contains the functions that copy data from a Snowflake stage (S3 bucket) into a Snowflake table.
It uses the etl_operations module for schema drift handling and query execution.
It works best with stages that are well partitioned, for example when the data in the S3 bucket is partitioned by date, year, month, etc.
It does not perform well when the data in the S3 bucket is not partitioned.
Example: If the data is dropped under a single folder without any partitioning and that folder holds many files, the copy operation will take a long time to complete.
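To illustrate the partitioning point, here is a minimal sketch that builds one stage path per day, so that each copy only has to scan a small prefix. The stage name, the year=/month=/day= folder layout, and the function name partition_prefixes are assumptions made for illustration; they are not part of this package.

```python
from datetime import date, timedelta


def partition_prefixes(stage: str, start: date, end: date) -> list[str]:
    """Return one stage path per day between start and end (inclusive).

    Assumes a hypothetical year=/month=/day= folder layout in the S3 bucket.
    """
    if end < start:
        raise ValueError("end must not be before start")
    prefixes = []
    for offset in range((end - start).days + 1):
        d = start + timedelta(days=offset)
        prefixes.append(
            f"@{stage}/year={d.year}/month={d.month:02d}/day={d.day:02d}/"
        )
    return prefixes


if __name__ == "__main__":
    # Print the daily prefixes for a three-day window.
    for prefix in partition_prefixes("my_stage", date(2024, 1, 30), date(2024, 2, 1)):
        print(prefix)
```

With an unpartitioned bucket there is only a single prefix to point at, so every copy has to list all files in that folder.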
Class/Groups:
- CopyIntoTable: This class contains the functions that copy data from the Snowflake stage (S3 bucket) into the Snowflake table.
- copy_into_snowflake_table: This is the main function used to copy data from the Snowflake stage (S3 bucket) into the Snowflake table. It accepts the parameter force, which forces the copy operation to run even if the data is already present in the table. The default value of the force parameter is False.
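The effect of a force flag can be sketched with Snowflake's FORCE copy option, which loads files even if they were loaded before. This is only an illustration: how copy_into_snowflake_table actually builds its query is not documented here, and the helper name build_copy_statement and its arguments are hypothetical.

```python
def build_copy_statement(table: str, stage_path: str, force: bool = False) -> str:
    """Build a Snowflake COPY INTO statement as a string.

    Hypothetical helper, not this package's API. The table name and stage
    path are interpolated as-is, so they must come from trusted input.
    """
    statement = f"COPY INTO {table} FROM {stage_path}"
    if force:
        # FORCE = TRUE reloads files regardless of the stage's load history.
        statement += " FORCE = TRUE"
    return statement


if __name__ == "__main__":
    print(build_copy_statement("analytics.raw.events", "@my_stage/year=2024/"))
    print(build_copy_statement("analytics.raw.events", "@my_stage/year=2024/", force=True))
```

Keeping False as the default matches the behaviour described above: a repeated run does not reload files unless the caller asks for it.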
3. etl_control.py
This module contains the functions that interact with Snowflake and store the states of the flows in the database.
- This module accepts the connection (connection_creds) parameter, whose default value is snowflake-prefect-user, along with the pipeline name and environment name.
- The pipeline name and environment name are used to store the states of the flows in the database, for example when a flow is started, completed, or failed.
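The idea of storing flow states keyed by pipeline and environment can be sketched as follows. SQLite stands in for Snowflake so the example runs anywhere; the table name flow_states, its columns, and all function names are assumptions and do not reflect the real schema used by this module.

```python
import sqlite3
from datetime import datetime, timezone


def open_state_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open a SQLite database (a stand-in for Snowflake) with a state table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS flow_states "
        "(pipeline TEXT, environment TEXT, state TEXT, recorded_at TEXT)"
    )
    return conn


def record_state(conn: sqlite3.Connection, pipeline: str, environment: str, state: str) -> None:
    """Append one state change (e.g. started, completed, failed) for a flow."""
    conn.execute(
        "INSERT INTO flow_states VALUES (?, ?, ?, ?)",
        (pipeline, environment, state, datetime.now(timezone.utc).isoformat()),
    )
    conn.commit()


def latest_state(conn: sqlite3.Connection, pipeline: str, environment: str):
    """Return the most recently recorded state, or None if there is none."""
    row = conn.execute(
        "SELECT state FROM flow_states WHERE pipeline = ? AND environment = ? "
        "ORDER BY rowid DESC LIMIT 1",
        (pipeline, environment),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    store = open_state_store()
    record_state(store, "daily_orders", "dev", "started")
    record_state(store, "daily_orders", "dev", "completed")
    print(latest_state(store, "daily_orders", "dev"))
```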
4. etl_operations.py
This module contains the functions that perform the ETL operations on either the destination table or the source table.
Class/Groups:
- CreateConnections: This class is used to create the connections to the databases. The connections are created using the connection credentials and warehouse name.
- SnowflakeDestination: This class contains all the load types and the functions that are used to load the data into the Snowflake tables. It accepts the connection credentials (default: snowflake-prefect-user), warehouse name (default: loading), database name, and environment name (default: dev).
- DataFrameHadler: This class contains the functions that convert DataFrame columns to the relevant data types.
- SchemaDriftHandler: This class contains the functions that are used to handle schema drift in the destination table.
- SnowflakeSource: This class contains the functions that are used to extract the data from the Snowflake tables.
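Schema drift handling generally means detecting columns that exist in the incoming data but not in the destination table, then adding them. The sketch below shows that idea only; it is not SchemaDriftHandler's implementation, and the function names and the column-to-type dictionaries are hypothetical.

```python
def missing_columns(source: dict[str, str], destination: dict[str, str]) -> dict[str, str]:
    """Return the source columns that the destination lacks.

    Both arguments map column name -> SQL type. Names are compared
    case-insensitively.
    """
    existing = {name.upper() for name in destination}
    return {
        name: dtype for name, dtype in source.items() if name.upper() not in existing
    }


def add_column_statements(table: str, new_columns: dict[str, str]) -> list[str]:
    """Build one ALTER TABLE ... ADD COLUMN statement per missing column."""
    return [
        f"ALTER TABLE {table} ADD COLUMN {name} {dtype}"
        for name, dtype in new_columns.items()
    ]


if __name__ == "__main__":
    source_schema = {"id": "NUMBER", "email": "VARCHAR"}
    destination_schema = {"ID": "NUMBER"}
    drift = missing_columns(source_schema, destination_schema)
    for statement in add_column_statements("users", drift):
        print(statement)
```

This sketch covers added columns only; type changes and dropped columns would need separate handling.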
5. notifications.py
This module contains the functions that send notifications to Slack. The webhook blocks must be created in Prefect before notifications can be sent.
Class/Groups:
- SlackWebhooksNotification: This class is used to send notifications to Slack. It accepts the webhook name and the message to send.
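For context, a Slack incoming webhook accepts a JSON body with a text field. The sketch below builds such a request with the standard library only. It is not how SlackWebhooksNotification works internally: that class takes the name of a Prefect webhook block, whereas this example takes a raw URL, and build_slack_request is a hypothetical helper.

```python
import json
import urllib.request


def build_slack_request(webhook_url: str, message: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for a Slack incoming webhook."""
    payload = json.dumps({"text": message}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder URL; replace it with a real webhook URL before sending.
    request = build_slack_request("https://example.com/hook", "Flow daily_orders failed")
    print(request.get_method(), request.full_url)
    # To actually send it: urllib.request.urlopen(request)
```

Storing the webhook in a Prefect block, as this package does, keeps the URL out of the flow code.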
6. queries.py
This module contains the queries that are used to perform the ETL operations in the Snowflake tables. It is referenced by the etl_control and etl_operations modules.
How to build the package locally
Install the dependencies in your virtual environment.
pip install -r requirements-dev.txt
Build the dist folder, where the .whl and .tar.gz files are created:
make build
This creates the dist folder containing two files:
- orchestration_utils-0.0.0.tar.gz
- orchestration_utils-0.0.0-py3-none-any.whl
The .whl file is the installable wheel. Install it with the pip install dist/orchestration_utils-0.0.0-py3-none-any.whl command.
How to deploy
Deploy the package to PyPI using GitHub Actions. There are two workflows: one deploys to dev and the other deploys to production.
1. Dev/Manual Release to TestPyPI
- Click on Run workflow
- Select the branch that contains your changes
- The changes will be reflected in TestPyPI
2. Prod Release to PyPI
- Click on Run workflow
- Select the main branch only
- The changes will be reflected in PyPI