A Jupyter magics extension designed for seamless interaction with Snowflake
cloudy_sql
Snowflake Support
cloudy_sql is an IPython Magics library and pandas extension that currently supports:
- a Magics function that allows users to easily execute SQL Queries in Snowflake
- writing to an existing Snowflake table from a pandas dataframe
- creating a new Snowflake table from a pandas dataframe
How To Use cloudy_sql
Installation
pip install cloudy-sql
Configuration
After installation, open a Jupyter notebook and run the following cell:
%load_ext cloudy_sql
After you run the cell, a configuration file will be created in your HOME directory.
The path to the configuration file is: $HOME/.cloudy_sql/configuration_profiles.yml
For Windows users: use $USERPROFILE instead of $HOME.
The configuration file is a YAML file with the following format:
profiles:
  snowflake:
    user: <your snowflake username>
    pass: <your snowflake password>
    acct: <your snowflake account>
    role: <your snowflake role>
    warehouse: <your snowflake warehouse>
    database: <your snowflake database>
    schema: <your snowflake schema>
The user, pass, acct, database, and schema values should all be filled in with your desired Snowflake credentials and connection details. The variables in this file serve as default arguments when calling a cloudy_sql method. role and warehouse can be filled in as well, but they are optional arguments when connecting to Snowflake.
API
The intent has been to keep the API as simple as possible by minimally extending the pandas and IPython Magics APIs.
Optional Arguments
There are two ways to pass optional arguments into a method:
- The configuration file
- Directly pass in the arguments when calling the method
The variables saved in the configuration file serve as default arguments for the methods to use.
However, you can tell a method to use different credentials by passing in arguments directly; the method will then use the passed-in arguments instead of the defaults saved in configuration_profiles.yml.
For example, if the database variable is saved in configuration_profiles.yml as database_1, but database = database_2 is passed directly into the method, the method uses database_2 instead of database_1. If no database argument is passed in, the method falls back to database_1 because it is the default. Passed-in arguments always take priority over the default variables saved in configuration_profiles.yml.
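The precedence rule above can be sketched as a simple merge. Note that resolve_arguments is a hypothetical helper written for illustration, not part of the cloudy_sql API:

```python
def resolve_arguments(config_defaults, passed_args):
    """Merge configuration-file defaults with directly passed arguments.

    Any argument passed as None falls back to the configuration-file
    default; any non-None passed argument takes priority.
    """
    merged = dict(config_defaults)
    for name, value in passed_args.items():
        if value is not None:
            merged[name] = value
    return merged

# Defaults as they might appear in configuration_profiles.yml
defaults = {"database": "database_1", "schema": "public"}

resolved = resolve_arguments(defaults, {"database": "database_2", "schema": None})
print(resolved["database"])  # database_2: the passed-in value wins
print(resolved["schema"])    # public: falls back to the default
```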
IPython Magics
%%sql_to_snowflake
IPython cell magic that seamlessly connects to Snowflake, runs a SQL query, and returns the result as a pandas DataFrame.
%%sql_to_snowflake [<destination_var>] [--params <params>]
[--username <snowflake_username>] [--password <snowflake_password>]
[--account <snowflake_account>] [--role <snowflake_role>]
[--warehouse <snowflake_warehouse>]
<SQL query>
Parameters
* <destination_var> (Optional [IPython line argument]):
Variable to store the query results. If none is given, the magic will return
the first 10 rows of the pandas DataFrame if applicable.
* --params <params> (Optional [IPython line argument]):
Parameters to be used in the SQL Query. Params must be passed in as a
dictionary string in the format {"param_name": "param_value"} or reference a
dictionary string defined in a previous cell. The use of the parameter in the
query should be indicated with {{param_name}}.
* --username <username> (Optional [IPython line argument]):
If provided, the called method will connect to Snowflake with this username
instead of the default in the configuration file.
* --password <password> (Optional [IPython line argument]):
If provided, the called method will connect to Snowflake with this password
instead of the default in the configuration file.
* --account <account> (Optional [IPython line argument]):
If provided, the called method will connect to Snowflake with this account
instead of the default in the configuration file.
* --role <role> (Optional [IPython line argument]):
If provided, the called method will connect to Snowflake with this role
instead of the default in the configuration file.
* --warehouse <warehouse> (Optional [IPython line argument]):
If provided, the called method will use this warehouse instead of the
default in the configuration file.
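To illustrate the {{param_name}} placeholder convention described for --params, here is a minimal sketch of the kind of substitution involved. The exact templating cloudy_sql performs is not documented here, and render_query is a hypothetical helper, not part of the library:

```python
def render_query(query, params):
    # Replace each {{name}} placeholder with the quoted value from the
    # params dict. Real SQL parameter binding should be preferred over
    # plain string substitution in production code.
    for name, value in params.items():
        query = query.replace("{{" + name + "}}", "'" + str(value) + "'")
    return query

parameters = {"firstname": "Michael", "orders": "2"}
sql = "SELECT * FROM db.schema.table WHERE FIRST_NAME = {{firstname}} AND NUMBER_OF_ORDERS = {{orders}}"
print(render_query(sql, parameters))
```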
write_snowflake
pd.DataFrame.cloudy_sql.write_snowflake(table: str,
database: str = None,
schema: str = None,
overwrite: bool = False,
username: str = None,
password: str = None,
account: str = None,
role: str = None,
warehouse: str = None
)
This method writes a pandas DataFrame to a Snowflake table and reports success. It works whether or not the target table already exists: if the table you provide does not exist, the method creates a new Snowflake table and writes to it; if the table already exists, the DataFrame data is appended to it by default. To replace the existing table with the pandas DataFrame instead, set overwrite = True when calling the method. Default database and schema values can be configured in the configuration file, or passed in directly when calling the method; passed-in values are used instead of the configuration-file defaults.
This method is intended to be used in tandem with %%sql_to_snowflake: use the magic to run a SQL query that returns a pandas DataFrame, then transform the DataFrame and write it to a Snowflake table with this method.
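The create/append/replace behavior described above amounts to the following decision, sketched here as a hypothetical helper (not part of the cloudy_sql API):

```python
def choose_write_mode(table_exists, overwrite=False):
    # New table: create it. Existing table: replace it when
    # overwrite=True, otherwise append the DataFrame rows.
    if not table_exists:
        return "create"
    return "replace" if overwrite else "append"

print(choose_write_mode(False))                  # create
print(choose_write_mode(True))                   # append (the default)
print(choose_write_mode(True, overwrite=True))   # replace
```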
Examples
Using the %%sql_to_snowflake magic and write_snowflake method
In [1]: %load_ext cloudy_sql
In [2]: %%sql_to_snowflake df
SELECT * from db.schema.table
Query successfully ran and results were stored to the 'df' destination variable.
In [3]: df.head()
Out[3]:
CUSTOMER_ID FIRST_NAME LAST_NAME FIRST_ORDER_DATE MOST_RECENT_ORDER_DATE NUMBER_OF_ORDERS
0 1 Michael P. 2018-01-01 2018-02-10 2
1 2 Shawn M. 2018-01-11 2018-01-11 1
2 3 Kathleen P. 2018-01-02 2018-03-11 3
3 4 Jimmy C. None None 0
4 5 Katherine R. None None 0
In [4]: df.cloudy_sql.write_snowflake(table="test_cloudy_sql")
Successfully wrote to the test_cloudy_sql Snowflake table
In [5]: %%sql_to_snowflake
drop table if exists db.schema.test_cloudy_sql
Successfully ran SQL Query in Snowflake
In [6]: %close_connection
The above example runs a SQL query with %%sql_to_snowflake and saves the results as a pandas DataFrame by passing in the destination variable df. The example then writes df to a Snowflake table (In [4]) and drops the created table (In [5]). Finally, the connection is closed by calling the %close_connection magic (In [6]).
Using the %%sql_to_snowflake magic with the --params inline argument
In [1]: %load_ext cloudy_sql
In [2]: parameters = {'firstname': 'Michael', 'orders': '2'}
In [3]: %%sql_to_snowflake --params $parameters
SELECT * from db.schema.table
WHERE FIRST_NAME = {{firstname}} and NUMBER_OF_ORDERS = {{orders}}
Out[3]:
CUSTOMER_ID FIRST_NAME LAST_NAME FIRST_ORDER_DATE MOST_RECENT_ORDER_DATE NUMBER_OF_ORDERS
0 1 Michael P. 2018-01-01 2018-02-10 2
The above example runs a SQL query with passed-in variables. Each variable is used directly in the SQL query by placing its name inside {{ }}. The dictionary string parameters is passed in when the magic is called by including the --params inline argument and placing a $ before the variable name to reference the dictionary string created in the previous cell. No destination variable is specified, so the magic prints the resulting pandas DataFrame.
Using the %%sql_to_snowflake magic with a dictionary string passed directly to --params
In [1]: %load_ext cloudy_sql
In [2]: %%sql_to_snowflake --params {'firstname': 'Michael', 'orders': '2'}
SELECT * from db.schema.table
WHERE FIRST_NAME = {{firstname}} and NUMBER_OF_ORDERS = {{orders}}
Out[2]:
CUSTOMER_ID FIRST_NAME LAST_NAME FIRST_ORDER_DATE MOST_RECENT_ORDER_DATE NUMBER_OF_ORDERS
0 1 Michael P. 2018-01-01 2018-02-10 2
This example passes the dictionary string directly to the --params inline argument, instead of referencing a variable defined in a previous cell.