A simple library that runs queries on Redshift!

redshift-query

Introduction

This is a very simple library that obtains temporary credentials for a cluster via the redshift:GetClusterCredentials API call, connects to the cluster, and runs the provided SQL statements; once done, it closes the connection and returns the results.

This is useful when you want to run queries from a CLI, in response to events (for example on AWS Lambda), or on a regular basis via AWS Glue Python Shell jobs.
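
For orientation, the flow described above looks roughly like the following. This is a minimal sketch, assuming boto3 and the pure-Python pg8000 driver; the library's actual driver and internals may differ:

import boto3
import pg8000

# Ask Redshift for temporary credentials for this user and database
# (requires the redshift:GetClusterCredentials IAM permission).
redshift = boto3.client('redshift')
creds = redshift.get_cluster_credentials(
    DbUser='master',
    DbName='test',
    ClusterIdentifier='test',
)

# Connect with the temporary credentials, run the statements, then close.
conn = pg8000.connect(
    host='test.j242tj1qkjuz.eu-west-1.redshift.amazonaws.com',
    port=5439,
    database='test',
    user=creds['DbUser'],
    password=creds['DbPassword'],
    ssl_context=True,
)
try:
    cursor = conn.cursor()
    cursor.execute('select 1;')
    print(cursor.fetchall())
finally:
    conn.close()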

Usage

redshift_query can be used either as a library or as its own standalone script.

It requires the following parameters, supplied either as environment variables or as keys in the parameters dict passed to the function:

  • DB_NAME: The database name. You can also provide this via db_name parameter to the function.
  • DB_USER: The user to request credentials from. You can optionally provide this via db_user parameter to the function.
  • CLUSTER_HOST: The host name of the redshift cluster. You can optionally provide this via cluster_host parameter to the function.
  • CLUSTER_ID: The id of the cluster. You can optionally provide this via cluster_id parameter to the function.
  • SQL_STATEMENTS: The SQL statements to run. You can optionally provide this via the sql_statements parameter to the function. This parameter is formatted (via string.format) with the event object passed to the function, so the SQL statement can be based on the event that triggered your function. This is useful when your Lambda is called by an event source (S3, for example); see the sketch after this list.
  • REDSHIFT_QUERY_LOG_LEVEL: Set to ERROR by default, which logs nothing; errors are not logged but bubble up instead so they crash the script. If set to INFO, it logs the result of each query; if set to DEBUG, it logs everything that happens, which is useful for debugging where it gets stuck.
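
To illustrate the string.format behavior, here is a short sketch of how a template could expand against an S3 event. The template and event below are examples, not the library's fixed behavior:

# Example only: SQL_STATEMENTS is expanded with the event via str.format.
# Note that str.format field keys are written without quotes.
sql_template = (
    "copy customer\n"
    "  from 's3://{Records[0][s3][bucket][name]}/{Records[0][s3][object][key]}'\n"
    "iam_role 'arn:aws:iam::0123456789012:role/MyRedshiftRole'\n"
    "manifest"
)

event = {'Records': [{'s3': {'bucket': {'name': 'mybucket'},
                             'object': {'key': 'cust.manifest'}}}]}

print(sql_template.format(**event))
# copy customer
#   from 's3://mybucket/cust.manifest'
# iam_role 'arn:aws:iam::0123456789012:role/MyRedshiftRole'
# manifest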

Deploying via an AWS Glue job

You can use the redshift-query-glue-job application from the AWS Serverless Application Repository.

Deploying via AWS SAM & Lambda

Here's an example that copies data into a table when new manifests are written to S3:

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Resources:
  Function:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: ./redshift_query/redshift_query.py
      Handler: redshift_query.query
      Runtime: python3.8
      VpcConfig:
        SecurityGroupIds:
          - 'sg-12312312'
        SubnetIds:
          - 'sub-123123'
      Policies:
        - Version: '2012-10-17'
          Statement:
            - Effect: "Allow"
              Action: "redshift:GetClusterCredentials"
              Resource: "arn:aws:redshift:eu-west-1:123123123:dbuser:test/master" # https://docs.aws.amazon.com/redshift/latest/mgmt/generating-iam-credentials-role-permissions.html
      Events:
        S3:
          Type: S3
          Properties:
            Bucket: mybucket
            Events: s3:ObjectCreated:*
            Filter:
              S3Key:
                Rules:
                  - Name: suffix
                    Value: .manifest

      Environment:
        Variables:
          CLUSTER_ID: 'test'
          CLUSTER_HOST: 'test.j242tj1qkjuz.eu-west-1.redshift.amazonaws.com'
          DB_NAME: 'test'
          DB_USER: 'master'
          SQL_STATEMENTS: |-
            copy customer
              from 's3://{Records[0][s3][bucket][name]}/{Records[0][s3][object][key]}'
            iam_role 'arn:aws:iam::0123456789012:role/MyRedshiftRole'
            manifest
          REDSHIFT_QUERY_LOG_LEVEL: DEBUG

Usage as a Library

To use the library directly, you can simply provide the parameters in snake_case format:

import redshift_query

results = redshift_query.query({
    'db_name': 'test',
    'db_user': 'master',
    'cluster_host': 'test.j242tj1qkjuz.eu-west-1.redshift.amazonaws.com',
    'cluster_id': 'test',
    'sql_statements': '''
        copy customer
            from 's3://mybucket/cust.manifest'
            iam_role 'arn:aws:iam::0123456789012:role/MyRedshiftRole'
            manifest;
        select * from customer;
     '''
})

# query returns one result set per statement, in the order they ran
[copy_results, select_results] = results

print('select results', select_results)

If you call redshift_query.query multiple times in your code, you can use redshift_query.set_config to set the static configuration once:

import redshift_query

redshift_query.set_config({
    'db_name': 'test',
    'db_user': 'master',
    'cluster_host': 'test.j242tj1qkjuz.eu-west-1.redshift.amazonaws.com',
    'cluster_id': 'test'
})

redshift_query.query({
    'sql_statements': '''
        copy customer
            from 's3://mybucket/cust.manifest'
            iam_role 'arn:aws:iam::0123456789012:role/MyRedshiftRole'
            manifest;
     '''
})

Optionally, you can also provide boto_session if you want to assume a role or provide your own credentials:

import redshift_query
import boto3

redshift_query.query({
    'boto_session': boto3.session.Session(profile_name='production'),
    'sql_statements': 'select 1;'
})
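
For example, here is a sketch of building a session from explicitly assumed-role credentials; the role ARN and session name below are placeholders:

import boto3
import redshift_query

# Assume a role and build a boto3 session from its temporary credentials.
# The role ARN below is a placeholder, not part of the library.
sts = boto3.client('sts')
creds = sts.assume_role(
    RoleArn='arn:aws:iam::123456789012:role/RedshiftQueryRole',
    RoleSessionName='redshift-query',
)['Credentials']

session = boto3.session.Session(
    aws_access_key_id=creds['AccessKeyId'],
    aws_secret_access_key=creds['SecretAccessKey'],
    aws_session_token=creds['SessionToken'],
)

redshift_query.query({
    'boto_session': session,
    'sql_statements': 'select 1;',
})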

As a CLI

$ DB_USER=neshat DB_NAME=test CLUSTER_ID=test CLUSTER_HOST=test.3j232jjqji21.eu-west-1.redshift.amazonaws.com SQL_STATEMENTS='select 1;' REDSHIFT_QUERY_LOG_LEVEL=DEBUG redshift_query
DEBUG:redshift_query:Passed Event: {}
DEBUG:redshift_query:Passed this_config with added defaults: {'db_name': 'test', 'db_user': 'neshat', 'cluster_host': 'test.3j232jjqji21.eu-west-1.redshift.amazonaws.com', 'cluster_id': 'test', 'sql_statements': 'select 1;', 'boto_session': None}
DEBUG:redshift_query:Received Credentials
DEBUG:redshift_query:Connected
DEBUG:redshift_query:Running select 1
INFO:redshift_query:Statement "?column?
--------
       1
(1 row)" result:

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.
