# Summit Public Schools Snowflake warehouse
- Anaconda & Python 3
- Jupyter Notebook
- To install, run: `pip install spswarehouse`
- This can be done from the Anaconda Prompt, launched from the Start Menu.
- Locate the install directory by running: `pip show pip | grep "Location:" | cut -d " " -f2`
- If this doesn't work, run `pip show pip`, then look at the line starting with "Location:".
The files referred to in this README are in that install directory.
## Set up dependencies
- Change to the directory where `spswarehouse` is installed. The default for Anaconda3 is `Users\<your name>\Anaconda3\Lib\site-packages\spswarehouse`.
- Run `pip install -r requirements.txt`.
- Exit the Anaconda Prompt; the next step is more easily done in the File Explorer.
## Set up credentials
The default directory where this module is installed is `Users\<your name>\Anaconda3\Lib\site-packages\spswarehouse`. Your credentials go in the `credentials.py` file in that directory.

- Copy the provided credentials template in that directory to `credentials.py`. This allows you to access the Snowflake data warehouse.
- Fill in your Snowflake credentials between the quotation marks.
The following allows you to access your Google spreadsheets:

- Get the `private_key` for the Google Service account from your team.
- In the service-account section of `credentials.py`, fill in the `private_key` between the quotation marks.
- The first time you use the `GoogleSheets` module, the service account's email address will be printed; share any spreadsheets you want to access with that email address.
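As a rough illustration, `credentials.py` might be shaped like the sketch below. The field names here are assumptions for illustration only; the actual names come from the template shipped in the install directory, so copy those rather than these.

```python
# Hypothetical sketch of credentials.py -- field names are illustrative,
# not the actual spswarehouse template. Fill in values between the quotes.
snowflake = {
    "user": "<your username>",
    "password": "<your password>",
    "account": "<account identifier>",
}

google_service_account = {
    "client_email": "<service account email>",
    # Paste the private_key you got from your team between the quotes.
    "private_key": "<private key>",
}
```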
Your Snowflake connection is configured in `credentials.py` (see above).

Snowflake access is implemented by `Warehouse`. You can:
- Read data
- Reflect a table
- Run a SQL command
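The general pattern `Warehouse` wraps can be sketched with the standard library's `sqlite3` standing in for the Snowflake connection. The class and method names below are illustrative only, not the actual `spswarehouse` API:

```python
import sqlite3


class MiniWarehouse:
    """Toy stand-in for Warehouse: runs SQL commands and reads results."""

    def __init__(self, conn):
        self.conn = conn

    def execute(self, sql):
        # Run a SQL command (DDL or DML).
        self.conn.execute(sql)

    def read_sql(self, sql):
        # Read data: return all rows produced by a query.
        return self.conn.execute(sql).fetchall()


wh = MiniWarehouse(sqlite3.connect(":memory:"))
wh.execute("CREATE TABLE students (name TEXT, grade INTEGER)")
wh.execute("INSERT INTO students VALUES ('Ada', 11)")
rows = wh.read_sql("SELECT name, grade FROM students")
# rows == [('Ada', 11)]
```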
## Table & column name tab-completion
When you run `import spswarehouse`, some tab-completion for table and column names is automatically set up. The format is:

`spswarehouse.<schema_name>.<table_name>.c_<column_name>`

To reduce load time, tab-completion is automatically set up for only a few schemas when `spswarehouse` is imported. If the schema you're using isn't tab-completing, you can manually import it. For example, to enable tab-completion for the schema `SchoolMint`:

```python
from spswarehouse.table_names import *

initialize_schema_object(SchoolMint)
schoolmint = SchoolMint()
```
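The attribute pattern behind this kind of tab-completion can be sketched in a few lines. This is an illustration of the idea only, not the actual `spswarehouse` implementation; the class and schema contents are made up:

```python
# Illustrative sketch: expose tables as schema attributes and columns as
# c_<column_name> attributes, so a REPL or notebook can tab-complete them.
class Table:
    def __init__(self, name, columns):
        self.name = name
        for col in columns:
            # Typing `table.c_` then Tab lists the column attributes.
            setattr(self, "c_" + col, col)


class Schema:
    def __init__(self, name, tables):
        self.name = name
        for table_name, cols in tables.items():
            setattr(self, table_name, Table(table_name, cols))


schoolmint = Schema("schoolmint", {"applications": ["student_id", "status"]})
print(schoolmint.applications.c_student_id)  # -> student_id
```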
The `table_utils` module implements uploading data to the Snowflake warehouse.
The data sources you can upload from are:
- CSV file
- Google Sheet
The two major methods are `create_table_stmt` and `upload_to_warehouse`. Both support the above data sources as optional arguments.
From Jupyter Notebook, open `snowflake-upload-example.ipynb` for a basic example.
`create_table_stmt()` will try to guess column types when given a DataFrame, CSV file, or Google Sheet. If you want to explicitly name and type your columns, you can pass the `columns` argument instead. See the documentation for `guess_col_types()` for best practices on types.
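The kind of column-type guessing described above can be sketched as follows. This is a simplified illustration in the spirit of `guess_col_types()`; the real implementation and its type choices may differ:

```python
# Sketch: infer a SQL type per column from string values, then build a
# CREATE TABLE statement. Illustrative only, not the spswarehouse code.
def guess_type(values):
    def is_int(v):
        try:
            int(v)
            return True
        except ValueError:
            return False

    def is_float(v):
        try:
            float(v)
            return True
        except ValueError:
            return False

    if all(is_int(v) for v in values):
        return "INT"
    if all(is_float(v) for v in values):
        return "FLOAT"
    return "VARCHAR"


def create_table_stmt(table, header, rows):
    cols = []
    for i, name in enumerate(header):
        col_values = [row[i] for row in rows]
        cols.append(f"{name} {guess_type(col_values)}")
    return f"CREATE TABLE {table} ({', '.join(cols)})"


stmt = create_table_stmt(
    "students",
    ["name", "gpa", "grade"],
    [["Ada", "3.9", "11"], ["Grace", "4.0", "12"]],
)
# stmt == "CREATE TABLE students (name VARCHAR, gpa FLOAT, grade INT)"
```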
Make sure you've set up `credentials.py` first and shared your spreadsheet with the Google service account email. You can also get the email by using the `GoogleSheets` module for the first time, which prints the service account's email address (see above).
The Info Team service account e-mail is
`GoogleSheets` is really an instance of `gspread.Client`, so you can use the entire `gspread` Python API.
From Jupyter Notebook, open and run `googlesheets-example.ipynb` for a basic example of loading a spreadsheet and reading sheet data.
## Uploading to warehouse
From Jupyter Notebook, open and run `snowflake-upload-example.ipynb` for a basic example of uploading Google Sheet data to the Snowflake warehouse.
## Google service account key
This lets us use the Google Sheets API to access sheet data. It only has to be done once; the resulting key is then added to `credentials.py`.
- Use an existing Google Developer project, or create a new one: https://console.cloud.google.com
- Enable the Google Sheets API:
  - Go to API & Services for the project, then Libraries.
  - Search for "Google Sheets" and select the result.
  - Click Enable.
- Create the OAuth client credentials:
  - Go to API & Services for the project, then Credentials.
  - Under Create credentials, select Service account key.
  - Choose an existing service account, or create a new one to associate this key with.
  - Create the key and download it as a JSON file.
- Copy the OAuth client credentials into `credentials.py`.
- Delete the `private_key` and leave just the quotation marks when you check `credentials.py` in.
- You will need to distribute the private key securely so it can be added to each user's `credentials.py`.
The `spswarehouse` project is here.
Create PyPI and Test PyPI accounts to test and upload packages.
See https://packaging.python.org/tutorials/packaging-projects/ for an overview and walk-through of PyPI packaging.
- Only build the source distribution; otherwise `credentials.py`, and potentially passwords, will get distributed in a binary distribution.
- If you need to include non-Python files, add them to `MANIFEST.in`.
- Update the version number in `setup.py`.
- Create the package: `python setup.py sdist`
- Upload to Test PyPI: `python -m twine upload --repository-url https://test.pypi.org/legacy/ dist/*`
- Install on local machine to test: `pip install -i https://test.pypi.org/simple/`
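Non-Python files are typically listed in a `MANIFEST.in` at the project root. A minimal sketch, assuming the package ships its requirements file and the example notebooks this README mentions (adjust entries to the actual package contents):

```
include requirements.txt
include *.ipynb
```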
## Pushing a new package
Make sure all of your changes are checked into the GitHub repository and your local repository is up-to-date before you do this.
The steps are the same as in the above section, omitting the Test PyPI-specific URLs.