
Snowflake Snowpark for Python

Project description

Snowflake Snowpark Python and Snowpark pandas APIs

License: Apache-2.0

The Snowpark library provides intuitive APIs for querying and processing data in a data pipeline. Using this library, you can build applications that process data in Snowflake without having to move data to the system where your application code runs.

Source code | Snowpark Python developer guide | Snowpark Python API reference | Snowpark pandas developer guide | Snowpark pandas API reference | Product documentation | Samples

Getting started

Have your Snowflake account ready

If you don't have a Snowflake account yet, you can sign up for a 30-day free trial account.

Create a Python virtual environment

You can use miniconda, anaconda, or virtualenv to create a Python 3.9, 3.10, 3.11, 3.12, or 3.13 virtual environment.

For Snowpark pandas, only Python 3.9, 3.10, or 3.11 is supported.
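If you want a script to fail fast on an unsupported interpreter, the version ranges above can be checked with a few lines of standard-library Python. This is a minimal sketch: the helper name `is_supported_python` is hypothetical (it is not part of Snowpark), and the version sets are copied from this README, so they may lag behind future releases.

```python
import sys

# Minor versions listed as supported in this README (may change in later releases).
SNOWPARK_PYTHON_VERSIONS = {(3, 9), (3, 10), (3, 11), (3, 12), (3, 13)}
SNOWPARK_PANDAS_VERSIONS = {(3, 9), (3, 10), (3, 11)}


def is_supported_python(version=None, snowpark_pandas=False):
    """Return True if (major, minor) is in the supported set.

    `version` defaults to the running interpreter; any sequence whose first
    two items are major and minor (e.g. sys.version_info) is accepted.
    """
    if version is None:
        version = sys.version_info
    major_minor = (version[0], version[1])
    allowed = SNOWPARK_PANDAS_VERSIONS if snowpark_pandas else SNOWPARK_PYTHON_VERSIONS
    return major_minor in allowed


if __name__ == "__main__":
    v = sys.version_info
    print(f"Python {v[0]}.{v[1]}")
    print("Snowpark Python supported:", is_supported_python())
    print("Snowpark pandas supported:", is_supported_python(snowpark_pandas=True))
```

Taking the version as a parameter keeps the check easy to test without switching interpreters.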

If you plan to use UDFs, creating a local conda environment with the Snowflake conda channel is recommended for the best experience.

Install the library to the Python virtual environment

pip install snowflake-snowpark-python

The Snowpark pandas API provides a familiar interface for pandas users to query and process data directly in Snowflake. To use it, install the library with the [modin] extra, which also installs modin in the same environment:

pip install "snowflake-snowpark-python[modin]"

Create a session and use the Snowpark Python API

from snowflake.snowpark import Session

connection_parameters = {
  "account": "<your snowflake account>",
  "user": "<your snowflake user>",
  "password": "<your snowflake password>",
  "role": "<snowflake user role>",
  "warehouse": "<snowflake warehouse>",
  "database": "<snowflake database>",
  "schema": "<snowflake schema>"
}

session = Session.builder.configs(connection_parameters).create()
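# Create a Snowpark DataFrame from input data
df = session.create_dataframe([[1, 2], [3, 4]], schema=["a", "b"])
# Keep only the rows where column "a" is greater than 1 (evaluated lazily)
df = df.filter(df.a > 1)
# collect() runs the query and returns the rows as a list of Row objects
result = df.collect()
# show() runs the query and prints the first rows (10 by default)
df.show()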

# -------------
# |"A"  |"B"  |
# -------------
# |3    |4    |
# -------------
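Hard-coding a password in source code, as in the example above, is convenient for a first test but easy to leak. A minimal sketch, assuming password authentication and hypothetical `SNOWFLAKE_*` environment variable names (choose whatever naming your team uses), builds the same dictionary from the environment instead. The helper name `connection_parameters_from_env` is also hypothetical.

```python
import os

# Hypothetical environment variable names mapped to connection parameter keys.
_ENV_KEYS = {
    "account": "SNOWFLAKE_ACCOUNT",
    "user": "SNOWFLAKE_USER",
    "password": "SNOWFLAKE_PASSWORD",
    "role": "SNOWFLAKE_ROLE",
    "warehouse": "SNOWFLAKE_WAREHOUSE",
    "database": "SNOWFLAKE_DATABASE",
    "schema": "SNOWFLAKE_SCHEMA",
}
# Assumes password authentication; other authenticators need different keys.
_REQUIRED = ("account", "user", "password")


def connection_parameters_from_env(environ=None):
    """Build a connection-parameters dict from environment variables.

    Unset optional keys are omitted; missing required keys raise KeyError.
    """
    environ = os.environ if environ is None else environ
    params = {key: environ[var] for key, var in _ENV_KEYS.items() if var in environ}
    missing = [key for key in _REQUIRED if key not in params]
    if missing:
        raise KeyError(f"missing connection parameters: {', '.join(missing)}")
    return params
```

You could then create the session with `Session.builder.configs(connection_parameters_from_env()).create()`. Accepting an `environ` mapping keeps the helper testable without touching the real environment.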

Create a session and use the Snowpark pandas API

import modin.pandas as pd
# Importing the plugin enables the Snowpark pandas backend for modin
import snowflake.snowpark.modin.plugin
from snowflake.snowpark import Session

CONNECTION_PARAMETERS = {
    'account': '<myaccount>',
    'user': '<myuser>',
    'password': '<mypassword>',
    'role': '<myrole>',
    'database': '<mydatabase>',
    'schema': '<myschema>',
    'warehouse': '<mywarehouse>',
}
session = Session.builder.configs(CONNECTION_PARAMETERS).create()

# Create a Snowpark pandas dataframe from input data
df = pd.DataFrame([['a', 2.0, 1],['b', 4.0, 2],['c', 6.0, None]], columns=["COL_STR", "COL_FLOAT", "COL_INT"])
df
#   COL_STR  COL_FLOAT  COL_INT
# 0       a        2.0      1.0
# 1       b        4.0      2.0
# 2       c        6.0      NaN

df.shape
# (3, 3)

df.head(2)
#   COL_STR  COL_FLOAT  COL_INT
# 0       a        2.0        1
# 1       b        4.0        2

df.dropna(subset=["COL_INT"], inplace=True)

df
#   COL_STR  COL_FLOAT  COL_INT
# 0       a        2.0        1
# 1       b        4.0        2

df.shape
# (2, 3)

df.head(2)
#   COL_STR  COL_FLOAT  COL_INT
# 0       a        2.0        1
# 1       b        4.0        2

# Save the result back to Snowflake with a row_pos column.
df.reset_index(drop=True).to_snowflake('pandas_test2', index=True, index_label=['row_pos'])

Samples

The Snowpark Python developer guide, Snowpark Python API reference, Snowpark pandas developer guide, and Snowpark pandas API reference contain basic sample code. Snowflake-Labs has more curated demos.

Logging

Configure the logging level of the snowflake.snowpark logger to see Snowpark Python API logs. Because Snowpark uses the Snowflake Python Connector, you may also want to configure the snowflake.connector logger when an error originates in the connector. For instance:

import logging
for logger_name in ('snowflake.snowpark', 'snowflake.connector'):
    logger = logging.getLogger(logger_name)
    logger.setLevel(logging.DEBUG)
    ch = logging.StreamHandler()
    ch.setLevel(logging.DEBUG)
    ch.setFormatter(logging.Formatter('%(asctime)s - %(threadName)s %(filename)s:%(lineno)d - %(funcName)s() - %(levelname)s - %(message)s'))
    logger.addHandler(ch)
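Each run of the snippet above attaches a new StreamHandler, so executing it twice (for example, by re-running a notebook cell) prints every log record twice. A sketch that stays idempotent tags the handler it adds and reuses it on later calls. The function name and the `_snowflake_debug_handler` marker attribute are hypothetical conventions of this example, not part of Snowpark or the connector.

```python
import logging

LOG_FORMAT = ('%(asctime)s - %(threadName)s %(filename)s:%(lineno)d'
              ' - %(funcName)s() - %(levelname)s - %(message)s')


def configure_snowflake_logging(level=logging.DEBUG,
                                logger_names=('snowflake.snowpark', 'snowflake.connector')):
    """Attach at most one StreamHandler per logger, even if called repeatedly."""
    for logger_name in logger_names:
        logger = logging.getLogger(logger_name)
        logger.setLevel(level)
        # Reuse the handler this function added earlier instead of stacking
        # duplicates, which would print every record multiple times.
        handler = next((h for h in logger.handlers
                        if getattr(h, "_snowflake_debug_handler", False)), None)
        if handler is None:
            handler = logging.StreamHandler()
            handler._snowflake_debug_handler = True  # marker for this example only
            handler.setFormatter(logging.Formatter(LOG_FORMAT))
            logger.addHandler(handler)
        handler.setLevel(level)
    return [logging.getLogger(name) for name in logger_names]
```

Calling `configure_snowflake_logging(logging.INFO)` later lowers the verbosity without adding another handler.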

Reading and writing to pandas DataFrame

The Snowpark Python API supports reading from and writing to pandas DataFrames through the DataFrame.to_pandas and Session.write_pandas methods.

To use these operations, ensure that pandas is installed in the same environment. You can install pandas alongside Snowpark Python by executing the following command:

pip install "snowflake-snowpark-python[pandas]"

Once pandas is installed, you can convert between a Snowpark DataFrame and pandas DataFrame as follows:

df = session.create_dataframe([[1, 2], [3, 4]], schema=["a", "b"])
# Convert Snowpark DataFrame to pandas DataFrame
pandas_df = df.to_pandas() 
# Write pandas DataFrame to a Snowflake table and return Snowpark DataFrame
snowpark_df = session.write_pandas(pandas_df, "new_table", auto_create_table=True)

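The Snowpark pandas API also supports converting to a native pandas DataFrame:

import modin.pandas as pd
# Importing the plugin enables the Snowpark pandas backend for modin
import snowflake.snowpark.modin.plugin
# Assumes an active Snowpark session, created as in the Snowpark pandas example above
df = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "b"])
# Convert Snowpark pandas DataFrame to pandas DataFrame
pandas_df = df.to_pandas()

Note that the Snowpark pandas commands above work when Snowpark is installed with the [modin] extra; the additional [pandas] extra is not required.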

Verifying Package Signatures

To ensure the authenticity and integrity of the Python package, follow the steps below to verify the package signature using cosign.

Steps to verify the signature:

  • Install cosign:
  • Download the package file from a repository such as PyPI:
  • Download the signature files from the release tag (replace the version number with the version you are verifying):
  • Verify signature:
    # replace the version number with the version you are verifying
    ./cosign verify-blob snowflake_snowpark_python-1.22.1-py3-none-any.whl  \
    --certificate snowflake_snowpark_python-1.22.1-py3-none-any.whl.crt \
    --certificate-identity https://github.com/snowflakedb/snowpark-python/.github/workflows/python-publish.yml@refs/tags/v1.22.1 \
    --certificate-oidc-issuer https://token.actions.githubusercontent.com \
    --signature snowflake_snowpark_python-1.22.1-py3-none-any.whl.sig
    Verified OK
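Because the file names and the certificate identity all embed the release version, it can help to generate the command rather than edit it by hand. A minimal sketch in Python, assuming that releases other than 1.22.1 follow the same naming pattern shown above; the helper name `cosign_verify_command` is hypothetical.

```python
import shlex

REPO = "https://github.com/snowflakedb/snowpark-python"
WORKFLOW = ".github/workflows/python-publish.yml"


def cosign_verify_command(version, cosign="./cosign"):
    """Return the cosign verify-blob argument list for a release such as "1.22.1"."""
    wheel = f"snowflake_snowpark_python-{version}-py3-none-any.whl"
    return [
        cosign, "verify-blob", wheel,
        "--certificate", f"{wheel}.crt",
        "--certificate-identity", f"{REPO}/{WORKFLOW}@refs/tags/v{version}",
        "--certificate-oidc-issuer", "https://token.actions.githubusercontent.com",
        "--signature", f"{wheel}.sig",
    ]


if __name__ == "__main__":
    # Print a shell-ready command for the version you downloaded.
    print(shlex.join(cosign_verify_command("1.22.1")))
```

The returned list can also be passed to `subprocess.run(cmd, check=True)`, which raises `CalledProcessError` if cosign exits with a non-zero status.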
    

Contributing

Please refer to CONTRIBUTING.md.



Download files


Source Distribution

snowflake_snowpark_python-1.46.0.tar.gz (1.7 MB)

Built Distribution


snowflake_snowpark_python-1.46.0-py3-none-any.whl (1.8 MB)

File details

Details for the file snowflake_snowpark_python-1.46.0.tar.gz.


File hashes

Hashes for snowflake_snowpark_python-1.46.0.tar.gz
Algorithm Hash digest
SHA256 1020cb0860d6a850982c9e2fc8eb5dc2d2ca55873acce603e243c27889793ac7
MD5 cea133dc1a9457bc0b11b3b3630ccafc
BLAKE2b-256 66d3ed20282f9165ef368facff8300bc19d4dda793bffa14f4e6f08ed873e0c1


File details

Details for the file snowflake_snowpark_python-1.46.0-py3-none-any.whl.


File hashes

Hashes for snowflake_snowpark_python-1.46.0-py3-none-any.whl
Algorithm Hash digest
SHA256 f92e01a0eede94d8496ac6cc5c762cbcd9cda2c4b52d97c7816b20ef04afbd74
MD5 86a7a5ce4fbc3908386c2f2a1266498e
BLAKE2b-256 76ff69e8bb87532da5b71c344daae8ebe34cd21a4ceee449dd67ad595414d4f2
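To check a downloaded file against a SHA256 digest listed above, you can hash it locally with Python's standard library. A minimal sketch; the helper names `sha256_of` and `matches_published_hash` are hypothetical.

```python
import hashlib
import hmac


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a local file with a published SHA256 digest (case-insensitive)."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```

For example, `matches_published_hash("snowflake_snowpark_python-1.46.0-py3-none-any.whl", "f92e01a0eede94d8496ac6cc5c762cbcd9cda2c4b52d97c7816b20ef04afbd74")` should return True for an unmodified wheel. pip can also enforce digests at install time through its hash-checking mode (`--require-hashes`).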

