A set of helpers to extend snowpark functionality

These details have not been verified by PyPI

Project links

Homepage

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Project description

snowpark-extensions-py

Snowpark by itself is a powerful library, but still some utility functions can always help.

Currently provided extensions:

SessionBuilder extensions

Name	Description
SessionBuilder.from_snowsql	can read the information from the snowsql config file by default at ~/snowsql/config or at a given location
SessionBuilder.env	reads settings from SNOW_xxx or SNOWSQL_xxx variables
SessionBuilder.appName	Sets a query tag with the given appName
SessionBuilder.append_tag	Appends a new tag to the existing query tag

You can the create your session like:

from snowflake.snowpark import Session
import snowpark_extensions
new_session = Session.builder.from_snowsql().appName("app1").create()

from snowflake.snowpark import Session
import snowpark_extensions
new_session = Session.builder.env().appName("app1").create()

The appName can use to setup a query_tag like APPNAME=tag;execution_id=guid which can then be used to track job actions with a query like

You can then use a query like: To see all executions from an app or

select *
from table(information_schema.query_history())
whery query_tag like '%APPNAME=tag%'
order by start_time desc;

To see the executions for a particular execution:

select *
from table(information_schema.query_history())
whery query_tag like '%APPNAME=tag;execution_id=guid%'
order by start_time desc;

Column Extensions

Name	Description
Column.getItem	An expression that gets an item at position ordinal out of a list, or gets an item by key out of a dict.

DataFrame Extensions

Name	Description
DataFrame.dtypes	returns the list of datatypes in the DataFrame
DataFrame.map	provides an equivalent for the map function for example `df.map(func,input_types=[StringType(),StringType()],output_types=[StringType(),IntegerType()],to_row=True)`
DataFrame.simple_map	if a simple lambda like `lambda x: x.col1 + x.col2` is used this functions can be used like `df.simple_map(lambda x: x.col1 + x.col2)`
DataFrame.groupby.applyInPandas	Maps each group of the current DataFrame using a pandas udf and returns the result as a DataFrame.
DataFrame.replace	extends replace to allow using a regex

Examples

map and simple_map

from snowflake.snowpark import Session
from snowflake.snowpark.types import *
import snowpark_extensions

session = Session.builder.from_snowsql().appName("app1").getOrCreate()
  
data = [('James','Smith','M',30),('Anna','Rose','F',41),('Robert','Williams','M',62)]
columns = ["firstname","lastname","gender","salary"]
df = session.createDataFrame(data=data, schema = columns)
df.show()


#
#--------------------------------------------------
#|"FIRSTNAME"  |"LASTNAME"  |"GENDER"  |"SALARY"  |
#--------------------------------------------------
#|James        |Smith       |M         |30        |
#|Anna         |Rose        |F         |41        |
#|Robert       |Williams    |M         |62        |
#--------------------------------------------------




# using map with a lamda, the to_row indicates that the code will pass a row as x to the lambda
# if you have a lambda like lambda x,y,z you can use to_row=False
df2=df.map(lambda x: 
        (x[0]+","+x[1],x[2],x[3]*2),
        output_types=[StringType(),StringType(),IntegerType()],to_row=True)
df2.show()

#
#-----------------------------------
#|"C_1"            |"C_2"  |"C_3"  |
#-----------------------------------
#|James,Smith      |M      |60     |
#|Anna,Rose        |F      |82     |
#|Robert,Williams  |M      |124    |
#-----------------------------------
#

# for simple lambda
# simple map will just pass the same dataframe to the function
# this approach is faster
df2 = df.simple_map(lambda x: (x[0]+","+x[1],x[2],x[3]*2))
df2.toDF(["name","gender","new_salary"]).show()

#---------------------------------------------
#|"NAME"           |"GENDER"  |"NEW_SALARY"  |
#---------------------------------------------
#|James,Smith      |M         |60            |
#|Anna,Rose        |F         |82            |
#|Robert,Williams  |M         |124           |
#---------------------------------------------
#

replace with support for regex

df = session.createDataFrame([('bat',1,'abc'),('foo',2,'bar'),('bait',3,'xyz')],['A','C','B'])
# already supported replace
df.replace(to_replace=1, value=100).show()
# replace with regex
df.replace(to_replace=r'^ba.$', value='new',regex=True).show()

Functions Extensions

Name	Description
functions.array_sort	sorts the input array in ascending order or descending order. The elements of the input array must be orderable. Null elements will be placed at the end of the returned array.
functions.unix_timestamp	returns the UNIX timestamp of current time.
functions.from_unixtimestamp	can be used to convert UNIX time to Snowflake timestamp
functions.format_number	formats numbers using the specified number of decimal places
functions.reverse	returns a reversed string
functions.explode	returns a new row for each element in the given array
functions.date_add	returns the date that is n days days after
functions.date_sub	returns the date that is n days before
functions.regexp_extract	Extract a specific group matched by a regex, from the specified string column.

Examples:

array_sort

from snowflake.snowpark import Session, DataFrame
from snowflake.snowpark.functions import col, lit
from snowflake.snowpark import functions as F
import snowpark_extensions

session = Session.builder.from_snowsql().getOrCreate()
df = session.createDataFrame([([2, 1, None, 3],),([1],),([],)], ['data'])
df.select(F.array_sort(df.data)).show()

-------------------------------------------
|"ARRAY_SORT(""DATA"", TRUE :: BOOLEAN)"  |
-------------------------------------------
|[                                        |
|  2,                                     |
|  1,                                     |
|  3,                                     |
|  null                                   |
|]                                        |
|[]                                       |
|[                                        |
|  1                                      |
|]                                        |
-------------------------------------------

df.select(F.array_sort(df.data, asc=False)).show()

--------------------------------------------
|"ARRAY_SORT(""DATA"", FALSE :: BOOLEAN)"  |
--------------------------------------------
|[                                         |
|  1                                       |
|]                                         |
|[]                                        |
|[                                         |
|  null,                                   |
|  2,                                      |
|  1,                                      |
|  3                                       |
|]                                         |
--------------------------------------------

regexp_extract

session = Session.builder.from_snowsql().create()

df = session.createDataFrame([('100-200',)], ['str']) res = df.select(F.regexp_extract('str',r'(\d+)-(\d+)',1).alias('d')).collect() print(str(res))

[Row(D='1')]

df = session.createDataFrame([['id_20_30', 10], ['id_40_50', 30]], ['id', 'age']) df.show()

--------------------

|"ID" |"AGE" |

--------------------

|id_20_30 |10 |

|id_40_50 |30 |

--------------------

df.select(F.regexp_extract('id', r'(\d+)', 1)).show()

------------------------------------------------------

|"COALESCE(REGEXP_SUBSTR(""ID"", '(\D+)', 1, 1,... |

------------------------------------------------------

|20 |

|40 |

------------------------------------------------------

df.select(F.regexp_extract('id', r'(\d+)_(\d+)', 2)).show()

------------------------------------------------------

|"COALESCE(REGEXP_SUBSTR(""ID"", '(\D+)_(\D+)'... |

------------------------------------------------------

|30 |

|50 |

------------------------------------------------------

utilities

Name	Description
utils.map_to_python_type	maps from DataType to python type

Usage:

just import it at the top of your file and it will automatically extend your snowpark package

Project details

These details have not been verified by PyPI

Project links

Homepage

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Release history Release notifications | RSS feed

0.0.39

Apr 28, 2024

0.0.38

Apr 5, 2024

0.0.37

Jan 14, 2024

0.0.36

Dec 15, 2023

0.0.35

Oct 18, 2023

0.0.34

Oct 17, 2023

0.0.33

Oct 17, 2023

0.0.32

Jul 19, 2023

0.0.31

Jul 15, 2023

0.0.30

Jul 5, 2023

0.0.26

Feb 9, 2023

0.0.25

Feb 6, 2023

0.0.24

Feb 5, 2023

0.0.23

Feb 2, 2023

0.0.22

Feb 2, 2023

0.0.21

Jan 30, 2023

0.0.20

Jan 27, 2023

0.0.19

Jan 26, 2023

0.0.17

Jan 26, 2023

0.0.16

Jan 25, 2023

0.0.15

Jan 4, 2023

0.0.14

Dec 27, 2022

0.0.12

Dec 24, 2022

0.0.11

Dec 22, 2022

0.0.10

Dec 19, 2022

0.0.9

Dec 14, 2022

0.0.8

Dec 14, 2022

0.0.6

Dec 10, 2022

0.0.5

Dec 8, 2022

This version

0.0.4

Dec 1, 2022

0.0.2

Nov 29, 2022

0.0.1

Nov 18, 2022

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

snowpark_extensions-0.0.4.tar.gz (11.1 kB view hashes)

Uploaded Dec 1, 2022 Source

Built Distribution

snowpark_extensions-0.0.4-py3-none-any.whl (10.5 kB view hashes)

Uploaded Dec 1, 2022 Python 3

Hashes for snowpark_extensions-0.0.4.tar.gz

Hashes for snowpark_extensions-0.0.4.tar.gz
Algorithm	Hash digest
SHA256	`dd340e8d55e7f8fa0313ee4873ff0b7814cd678135a6ec979e9df53891d97534`
MD5	`55eea51d8a6133839799fd062ef29c7f`
BLAKE2b-256	`70f76a73fa098ad0ca2c125fe25a567dd86dd7275aa064032444c80fa8e401fa`

Hashes for snowpark_extensions-0.0.4-py3-none-any.whl

Hashes for snowpark_extensions-0.0.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f97f1e922d90086e8753be63b8a776f03aa5f2a016eb2c009f0af9474258252c`
MD5	`6d477d2270c59827d455a3a128ce417f`
BLAKE2b-256	`c8279001ee65d02b8d39e9bac5d47847aac851977f7cc7dd0bc984e462f27c13`

snowpark-extensions 0.0.4

Navigation

Verified details

Maintainers

Unverified details

Project links

GitHub Statistics

Meta

Project description

snowpark-extensions-py

Currently provided extensions:

SessionBuilder extensions

Column Extensions

DataFrame Extensions

Examples

map and simple_map

replace with support for regex

Functions Extensions

Examples:

array_sort

regexp_extract

[Row(D='1')]

--------------------

|"ID" |"AGE" |

--------------------

|id_20_30 |10 |

|id_40_50 |30 |

--------------------

------------------------------------------------------

|"COALESCE(REGEXP_SUBSTR(""ID"", '(\D+)', 1, 1,... |

------------------------------------------------------

|20 |

|40 |

------------------------------------------------------

------------------------------------------------------

|"COALESCE(REGEXP_SUBSTR(""ID"", '(\D+)_(\D+)'... |

------------------------------------------------------

|30 |

|50 |

------------------------------------------------------

utilities

Usage:

Project details

Verified details

Maintainers

Unverified details

Project links

GitHub Statistics

Meta

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution