
Convenience wrappers for connecting to AWS S3 and Redshift

Project description

Whooper

Author:

Nick Buker

Introduction:

Whooper is a small collection of utility functions for accessing AWS S3 and AWS Redshift. The goal of Whooper is to be a simple, robust package that eases data workflows. It is not intended to handle every possible need (for example, credential management is largely left to the user), but it is designed to streamline common tasks.

Table of contents:

Installing Whooper:

Setting up credentials for Whooper:

How to use Whooper:

Testing:

Installing Whooper:

Whooper can be installed via pip. As always, use of a project-level virtual environment is recommended.

Whooper requires Python >= 3.6.

$ pip install whooper

Setting up credentials for Whooper:

Redshift:

Whooper is designed to ingest your Redshift credentials from an environment variable in the format below. This approach gives the user freedom to handle credentials in a number of ways. As always, best practices are advised: your credentials should never be placed in your project's code or in files such as a Dockerfile or .env file. Instead, you may wish to place them in your .bash_profile locally or take advantage of a key management service such as the one offered by AWS.

'host=my_hostname database=my_database user=my_user password=my_password port=1234'
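
For example, when working locally you might export the variable from your .bash_profile. The variable name REDSHIFT_CREDS is an assumption chosen to match the examples below, and the values are placeholders:

export REDSHIFT_CREDS='host=my_hostname database=my_database user=my_user password=my_password port=1234'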

S3:

If the user is running locally, their home directory should contain a .aws/ directory with a credentials file. The credentials file should look similar to the example below, where the profile name is in brackets. Note that the specific values and region may vary. If the user is running on an EC2 instance, permission to access S3 is handled by the IAM role for the instance.

[default]
aws_access_key_id=MYAWSACCESSKEY
aws_secret_access_key=MYAWSSECRETACCESS
aws_session_token="long_string_of_random_characters=="
aws_security_token="another_string_of_random_characters=="
region=us-west-2

Note the profile name in brackets. If the profile name differs in your credentials file, you will likely need to pass it to the S3 functions as an argument.

How to use Whooper:

Redshift:

Importing whooper Redshift functions:

from whooper import read_sql, redshift_execute_sql, redshift_get_conn

Reading a SQL script into Python as a string:

sql = read_sql(sql_filename='../sql/my_script.sql')

Executing a SQL query that does not return data:

redshift_execute_sql(
    sql=sql,
    env_var='REDSHIFT_CREDS',
    return_data=False,
    return_dict=False)

Executing a SQL query that returns data as a list of tuples and column names as a list of strings:

data, columns = redshift_execute_sql(
    sql=sql,
    env_var='REDSHIFT_CREDS',
    return_data=True,
    return_dict=False)

Executing a SQL query that returns data as a dict for easy ingestion into a pandas DataFrame:

import pandas as pd

df = pd.DataFrame(**redshift_execute_sql(
    sql=sql,
    env_var='REDSHIFT_CREDS',
    return_data=True,
    return_dict=True))

Creating a connection object that can be manipulated directly by experienced users:

conn = redshift_get_conn(env_var='REDSHIFT_CREDS')
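
A minimal sketch of working with the connection directly, assuming it follows the standard Python DB-API (cursor/execute/close); the schema and table names are placeholders:

cursor = conn.cursor()
cursor.execute('select count(*) from my_schema.my_table;')
row_count = cursor.fetchone()[0]  # first column of the single returned row
cursor.close()
conn.close()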

S3:

Importing S3 functions:

from whooper import s3_download, s3_upload, s3_delete, create_session, s3_get_bucket

Downloading a single file from S3:

s3_download(
    bucket='my_bucket',
    s3_filepath='tmp/my_file.csv',
    local_filepath='../data/my_file.csv')

Downloading with a profile name:

s3_download(
    bucket='my_bucket',
    profile_name='my-profile-name',
    s3_filepath='tmp/my_file.csv',
    local_filepath='../data/my_file.csv')

Downloading a list of files from S3 (will not download contents of subdirectories):

s3_download(
    bucket='my_bucket',
    s3_filepath=['tmp/my_file1.csv', 'tmp/my_file2.csv', 'img.png'],
    local_filepath=['../data/my_file1.csv', '../data/my_file2.csv', '../img.png'])

Downloading files matching a pattern from S3 (will not download contents of subdirectories):

s3_download(
    bucket='my_bucket',
    s3_filepath='tmp/*.csv',
    local_filepath='../data/')

Downloading all files in a directory from S3 (will not download contents of subdirectories):

s3_download(
    bucket='my_bucket',
    s3_filepath='tmp/*',
    local_filepath='../data/')

Uploading a single file to S3:

s3_upload(
    bucket='my_bucket',
    local_filepath='../data/my_file.csv',
    s3_filepath='tmp/my_file.csv')

Uploading with a profile name:

s3_upload(
    bucket='my_bucket',
    profile_name='my-profile-name',
    local_filepath='../data/my_file.csv',
    s3_filepath='tmp/my_file.csv')

Uploading a list of files to S3 (will not upload contents of subdirectories):

s3_upload(
    bucket='my_bucket',
    local_filepath=['../data/my_file1.csv', '../data/my_file2.csv', '../img.png'],
    s3_filepath=['tmp/my_file1.csv', 'tmp/my_file2.csv', 'img.png'])

Uploading files matching a pattern to S3 (will not upload contents of subdirectories):

s3_upload(
    bucket='my_bucket',
    local_filepath='../data/*.csv',
    s3_filepath='tmp/')

Uploading all files in a directory to S3 (will not upload contents of subdirectories):

s3_upload(
    bucket='my_bucket',
    local_filepath='../data/*',
    s3_filepath='tmp/')

Deleting a single file in S3:

resp = s3_delete(bucket='my_bucket', s3_filepath='tmp/my_file.csv')

Deleting with a profile name:

resp = s3_delete(
    bucket='my_bucket',
    profile_name='my-profile-name',
    s3_filepath='tmp/my_file.csv')

Deleting a list of files in S3:

resp = s3_delete(
    bucket='my_bucket',
    s3_filepath=['tmp/my_file1.csv', 'tmp/my_file2.csv', 'img.png'])

Deleting files matching a pattern in S3:

resp = s3_delete(bucket='my_bucket', s3_filepath='tmp/*.csv')

Deleting all files in a directory in S3:

resp = s3_delete(bucket='my_bucket', s3_filepath='tmp/*')

Creating a bucket object that can be manipulated directly by experienced users:

bucket = s3_get_bucket(
    bucket='my_bucket',
    profile_name='default',
    region_name='us-west-2')
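
If the returned object is a standard boto3 Bucket resource (an assumption; check the function's docstring), it can be used with the usual boto3 collection methods, for example to list keys under a prefix:

# iterate over objects whose keys start with 'tmp/'
for obj in bucket.objects.filter(Prefix='tmp/'):
    print(obj.key, obj.size)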

Boto3:

Importing boto3 functions:

from whooper import boto_get_creds, boto_create_session

Retrieving boto3 credentials as a string for use in COPY and UNLOAD SQL statements:

creds = boto_get_creds(
    profile_name='default',
    region_name='us-west-2',
    session=None)

Creating a boto3 session object that can be manipulated directly by experienced users:

session = boto_create_session(profile_name='default', region_name='us-west-2')
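
Assuming boto_create_session returns a standard boto3 Session, it can be used like any other, for example to create an S3 client:

s3_client = session.client('s3')
# list objects under a prefix using the regular boto3 client API
response = s3_client.list_objects_v2(Bucket='my_bucket', Prefix='tmp/')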

Transferring data between Redshift and S3:

Transferring data from Redshift to S3 using an UNLOAD statement (see Redshift UNLOAD documentation for more information):

from whooper import boto_get_creds, redshift_execute_sql


creds = boto_get_creds(
    profile_name='default',
    region_name='us-west-2',
    session=None)

sql = f'''

    unload (
        'select
            col1
            ,col2
        from
            my_schema.my_table'
    )
    to
        's3://mybucket/unload/my_table/'
    credentials
        '{creds}'
    parallel off header gzip allowoverwrite;
'''

redshift_execute_sql(
    sql=sql,
    env_var='REDSHIFT_CREDS',
    return_data=False,
    return_dict=False)

Transferring data from S3 to Redshift using a COPY statement (see Redshift COPY documentation for more information):

from whooper import boto_get_creds, redshift_execute_sql


creds = boto_get_creds(
    profile_name='default',
    region_name='us-west-2',
    session=None)

sql = f'''

    copy
        my_schema.my_table
    from
        's3://mybucket/unload/my_table/'
    credentials
        '{creds}'
    ignoreheader 1 gzip;
'''

redshift_execute_sql(
    sql=sql,
    env_var='REDSHIFT_CREDS',
    return_data=False,
    return_dict=False)

Testing:

For those interested in contributing to Whooper or forking and editing the project, pytest is the testing framework used. To run the tests, create a virtual environment, install the contents of dev-requirements.txt, and run the following command from the root directory of the project. The testing scripts can be found in the test/ directory.

$ pytest
