Kedro plugin with Snowflake / Snowpark support
Kedro Snowflake Pipelines plugin
About
This plugin lets you run full Kedro pipelines in Snowflake. It currently supports:
- a Kedro starter, to get you up to speed fast
- automatically creating Snowflake Stored Procedures from Kedro nodes (using the Snowpark SDK)
- translating a Kedro pipeline into a Snowflake tasks graph
- running a Kedro pipeline fully within Snowflake, without any external system
- using Kedro's official SnowparkTableDataSet
- automatically storing intermediate data as Transient Tables (if Snowpark DataFrames are used)
- (New!) MLflow integration with Snowflake, with example usage in the Snowflights Kedro starter
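To illustrate the idea behind translating a pipeline into a tasks graph, here is a minimal, plugin-independent sketch. It does not use the kedro-snowflake API; the `NODES` mapping, the dataset names, and both helper functions are hypothetical. It only shows how task dependencies can be derived from node inputs and outputs and then ordered topologically, assuming Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: node name -> (input datasets, output datasets).
# This is an illustration only, not the plugin's internal data model.
NODES = {
    "preprocess": ({"raw_flights"}, {"clean_flights"}),
    "train": ({"clean_flights"}, {"model"}),
    "evaluate": ({"model", "clean_flights"}, {"metrics"}),
}


def task_dependencies(nodes):
    """Map each node to the set of nodes that produce its inputs."""
    producers = {
        out: name for name, (_, outputs) in nodes.items() for out in outputs
    }
    return {
        name: {producers[i] for i in inputs if i in producers}
        for name, (inputs, _) in nodes.items()
    }


def task_order(nodes):
    """Return node names so that every node follows its predecessors."""
    return list(TopologicalSorter(task_dependencies(nodes)).static_order())


deps = task_dependencies(NODES)
order = task_order(NODES)
print(deps)
print(order)
```

In a task graph, each entry in `deps` would correspond to the predecessor tasks a given task has to wait for; datasets with no producer (such as `raw_flights` here) are treated as external inputs.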
Documentation
For detailed documentation refer to https://kedro-snowflake.readthedocs.io/
Usage
With starter
- Install the plugin
pip install "kedro-snowflake>=0.1.0"
- Create new project with our Kedro starter ❄️ Snowflights 🚀:
kedro new --starter=snowflights --checkout=master
And answer the interactive prompts ⬇️

    Project Name
    ============
    Please enter a human readable name for your new project.
    Spaces, hyphens, and underscores are allowed.
     [Snowflights]:

    Snowflake Account
    =================
    Please enter the name of your Snowflake account.
    This is the part of the URL before .snowflakecomputing.com
     []: abc-123

    Snowflake User
    ==============
    Please enter the name of your Snowflake user.
     []: user2137

    Snowflake Warehouse
    ===================
    Please enter the name of your Snowflake warehouse.
     []: compute-wh

    Snowflake Database
    ==================
    Please enter the name of your Snowflake database.
     [DEMO]:

    Snowflake Schema
    ================
    Please enter the name of your Snowflake schema.
     [DEMO]:

    Snowflake Password Environment Variable
    =======================================
    Please enter the name of the environment variable that contains your Snowflake password.
    Alternatively, you can re-configure the plugin later to use Kedros credentials.yml
     [SNOWFLAKE_PASSWORD]:

    Pipeline Name Used As A Snowflake Task Prefix
    =============================================
     [default]:

    Enable Mlflow Integration (See Documentation For The Configuration Instructions)
    ================================================================================
     [False]:

    The project name 'Snowflights' has been applied to:
    - The project title in /tmp/snowflights/README.md
    - The folder created for your project in /tmp/snowflights
    - The project's python package in /tmp/snowflights/src/snowflights
- Run the project
cd snowflights
kedro snowflake run --wait-for-completion
In existing Kedro project
- Install the plugin
pip install "kedro-snowflake>=0.1.0"
- Initialize the plugin
kedro snowflake init <ACCOUNT> <USER> <PASSWORD_FROM_ENV> <DATABASE> <SCHEMA> <WAREHOUSE>
- Run the project
kedro snowflake run --wait-for-completion
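Both flows read the Snowflake password from an environment variable, so it can help to check that the variable is set before running. The snippet below is a small, hypothetical pre-flight check, not part of the plugin; `missing_env_vars` is an illustrative helper, and `SNOWFLAKE_PASSWORD` is simply the default name suggested by the starter prompt (use whatever name you configured).

```python
import os


def missing_env_vars(names, environ=None):
    """Return the names from `names` that are unset or empty in `environ`."""
    environ = os.environ if environ is None else environ
    return [name for name in names if not environ.get(name)]


# Assumption: the password variable keeps the starter's default name.
required = ["SNOWFLAKE_PASSWORD"]
missing = missing_env_vars(required)
if missing:
    print("Set these environment variables before running:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```

Passing a plain dictionary as `environ` makes the helper easy to exercise without touching the real environment.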
Kedro pipeline in Snowflake Tasks
Execution: