Skip to main content

Summit Public Schools Snowflake warehouse

Project description

spswarehouse

Prerequisites

  • Anaconda & Python 3
  • Jupyter Notebook

Installation

  • To install, run: pip install spswarehouse
    • This can be done from Anaconda Prompt from the Start Menu.
  • Locate the install directory by running: pip show pip | grep "Location:" | cut -d " " -f2
    • If this doesn't work, run pip show pip, then look at the line "Location:".

The files referred to in this README are in <install-directory>/spswarehouse/.

Set up dependencies

  • Change to the spswarehouse directory
    • cd <install-directory>\spswarehouse
    • The default for Anaconda3 is cd Anaconda3\Lib\site-packages\spswarehouse
  • Run: pip install -r requirements.txt

You can exit the Anaconda Prompt; the next step is more easily done in the File Explorer.

Set up credentials

The default directory where this module is installed is Users\<your name>\Anaconda3\Lib\site-packages\spswarehouse. Your credentials are in the spswarehouse subdirectory.

  • Copy the credentials.py.template file to credentials.py.

Snowflake

This allows you to access the Snowflake data warehouse.

  • Fill in your Snowflake user and password credentials between quotation marks.

Google Sheets

This allows you to access your Google spreadsheets.

  • Get the private_key for the Google Service account from your team.
  • In credentials.py, under google_config and service-account, fill in the private_key between quotation marks.
  • The first time you import the GoogleSheets module, the service account's email address will be printed, you will share any spreadsheets you want to access with that email address.

Usage

Snowflake

Your Snowflake connection is configured in credentials.py (see above).

Snowflake access is implemented in by Warehouse. You can:

  • Read data using read_sql()
  • Reflect a table using reflect_table()
  • Run a SQL command using execute()

Table & column name tab-completion

When you run import spswarehouse, some tab-completion for table and column names is automatically set up.

The format is:

spswarehouse.<schema_name>.<table name>.c_<column name>

To reduce load time, tab-completion is automatically set up for only a few schemas when spswarehouseis imported.

If the schema you're using isn't tab-completing you can manually import it.

For example, to enable tab-competion for the schema schoolmint, run:

from spswarehouse.table_names import *

initialize_schema_object(SchoolMint)
schoolmint = SchoolMint()

Uploading data

The table_utils module implements uploading data to the Snowflake warehouse.

The data sources you can upload from are:

  • pandas.DataFrame dataframe
  • CSV file csv_filename
  • Google Sheet google_sheet

The two major methods are create_table_stmt and upload_to_warehouse. Both support the above data sources as optional arguments:

  • dataframe
  • csv_filename
  • google_sheet

From Jupyter Notebook, open snowflake-upload-example.ipynb for a basic example.

Column types

create_table_stmt() will try to guess column types when given a DataFrame, CSV file, or Google Sheet.

If you want to explicitly name and type your columns, you can pass in the columns argument instead.

See the documentation for guess_col_types() for best practices for types.

Google Sheets

Make sure you've set up credentials.py first and shared your spreadsheet with the Google service account email. You can also get the email by running:

GoogleSheets.get_google_service_account_email()

The Info Team service account e-mail is jupyter-sheets@sps-warehouse.iam.gserviceaccount.com

GoogleSheets is really an instance of gspread.Client, so you use the entire gspread Python API.

Accessing data

From Jupyter Notebook, open and run googlesheets-example.ipynb for a basic example on loading a spreadsheet and reading sheet data into pandas.DataFrame.

Uploading to warehouse

From Jupyter Notebook open and run snowflake-upload-example.ipynb for a basic example on uploading Google Sheet data to the Snowflake warehouse.

Column types

Developer notes

Google service account key

This lets us use the Google Sheets API to access sheet data. It only has to be done once and added to credentials.py.template.

  • Use an existing Google Developer project, or create a new one: https://console.cloud.google.com
  • Enable the Google Sheets API
    • Go to API & Services for the project, then Libraries.
    • Search for "Google Sheets" and select the result.
    • Click Enable.
  • Create the OAuth client credentials
    • Go to API & Services for the project, then Credentials.
    • Under Create credentials, select Service account key
    • Choose an existing service account or create a new one to associate this key with.
    • Create the key and download the key as a JSON file.
  • Copy OAuth client credentials to credentials.py.template in google_client under service-account.
  • Delete the private_key and leave just the quotation marks when you check in credentials.template.py.
  • You will need to distribute the private key securely so it can be added to credentials.py.

PyPI

We use PyPI to distribute the spswarehouse module and Test PyPI for testing.

The spswarehouse project is here.

Set up

Create PyPI and Test PyPI accounts to test and upload packages.

Packaging

See https://packaging.python.org/tutorials/packaging-projects/ for an overview and walk-through of PyPI packaging.

Specifics for spswarehouse:

  • Only build the sdist package. Otherwise, credentials.py and potentially passwords will get distributed in the binary distribution.
  • If you need to include non-Python files, add them to MANIFEST.in.

Testing

  • Update version number in setup.py.
  • Create the package: python setup.py sdist
  • Upload to Test PyPI: python -m twine upload --repository-url https://test.pypi.org/legacy/ dist/*
  • Install on local machine to test: pip install -i https://test.pypi.org/simple/

Pushing a new package

Make sure all of your changes are checked into the GitHub repository and your local repository is up-to-date before you do this.

The steps are the same as in the above section, omitting the test.pypi URLs.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

spswarehouse-hs-test-0.0.3.tar.gz (9.7 kB view hashes)

Uploaded Source

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page