Redshift interface library

redshift_connector is the Amazon Redshift connector for Python. Easy integration with pandas and numpy, as well as support for numerous Amazon Redshift-specific features, helps you get the most out of your data.

Supported Amazon Redshift features include:

  • IAM authentication

  • Identity provider (IdP) authentication

  • Redshift specific data types

This pure Python connector implements the Python Database API Specification 2.0 (PEP 249).
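
As a quick sanity check, the standard DB-API module globals can be inspected; this is a minimal sketch assuming the connector exports the globals that PEP 249 requires of compliant drivers:

import redshift_connector

# PEP 249 requires compliant drivers to expose these module globals.
# paramstyle tells you which placeholder syntax execute() expects.
print(redshift_connector.apilevel)    # expected: '2.0'
print(redshift_connector.paramstyle)  # e.g. 'format' -> %s placeholders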

Getting Started

| Source | Installation Command                             |
|--------|--------------------------------------------------|
| PyPi   | pip install redshift_connector                   |
| Conda  | conda install -c conda-forge redshift_connector  |

Additionally, you may install from source by cloning this repository.

$ git clone https://github.com/aws/amazon-redshift-python-driver.git
$ cd amazon-redshift-python-driver
$ pip install .

Basic Example

import redshift_connector

# Connects to the Redshift cluster using database credentials
conn = redshift_connector.connect(
    host='examplecluster.abc123xyz789.us-west-1.redshift.amazonaws.com',
    database='dev',
    user='awsuser',
    password='my_password'
)

cursor: redshift_connector.Cursor = conn.cursor()
cursor.execute("create Temp table book(bookname varchar,author‎ varchar)")
cursor.executemany("insert into book (bookname, author‎) values (%s, %s)",
                    [
                        ('One Hundred Years of Solitude', 'Gabriel García Márquez'),
                        ('A Brief History of Time', 'Stephen Hawking')
                    ]
                  )
cursor.execute("select * from book")

result: tuple = cursor.fetchall()
print(result)
>> (['One Hundred Years of Solitude', 'Gabriel García Márquez'], ['A Brief History of Time', 'Stephen Hawking'])
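
Note that DB-API connections are not autocommit by default, so changes should be committed explicitly and the connection closed once work is finished. A minimal sketch using the standard PEP 249 methods (the later sections assume the connection is still open):

# Persist the inserted rows; connections are not autocommit
# by default under PEP 249.
conn.commit()

# When the session is finished, release resources.
cursor.close()
conn.close()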

Example using IAM Credentials

IAM credentials can be supplied directly to connect(...) using an AWS profile as shown below:

import redshift_connector

# Connects to Redshift cluster using IAM credentials from default profile defined in ~/.aws/credentials
conn = redshift_connector.connect(
    iam=True,
    database='dev',
    db_user='awsuser',
    password='',
    user='',
    cluster_identifier='examplecluster',
    profile='default'
)
# ~/.aws/credentials
[default]
aws_access_key_id="my_aws_access_key_id"
aws_secret_access_key="my_aws_secret_access_key"
aws_session_token="my_aws_session_token"

# ~/.aws/config
[default]
region=us-west-2

If a region is not provided in ~/.aws/config or you would like to override its value, region may be passed to connect(...).

Alternatively, IAM credentials can be supplied directly to connect(...) using AWS credentials as shown below:

import redshift_connector

# Connects to Redshift cluster using IAM credentials supplied directly to connect(...)
conn = redshift_connector.connect(
    iam=True,
    database='dev',
    db_user='awsuser',
    password='',
    user='',
    cluster_identifier='examplecluster',
    access_key_id="my_aws_access_key_id",
    secret_access_key="my_aws_secret_access_key",
    session_token="my_aws_session_token",
    region="us-east-2"
)
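
To keep secrets out of source code, the same keyword arguments can instead be read from the environment. A minimal sketch using the conventional AWS environment variable names (the variable names are an assumption about your deployment, not something the connector requires):

import os
import redshift_connector

# Hypothetical setup: credentials supplied via the conventional
# AWS environment variables rather than hard-coded literals.
conn = redshift_connector.connect(
    iam=True,
    database='dev',
    db_user='awsuser',
    password='',
    user='',
    cluster_identifier='examplecluster',
    access_key_id=os.environ['AWS_ACCESS_KEY_ID'],
    secret_access_key=os.environ['AWS_SECRET_ACCESS_KEY'],
    session_token=os.environ.get('AWS_SESSION_TOKEN'),
    region='us-east-2'
)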

Integration with pandas

import pandas
cursor.execute("create Temp table book(bookname varchar,author‎ varchar)")
cursor.executemany("insert into book (bookname, author‎) values (%s, %s)",
                   [
                       ('One Hundred Years of Solitude', 'Gabriel García Márquez'),
                       ('A Brief History of Time', 'Stephen Hawking')

                   ])
cursor.execute("select * from book")
result: pandas.DataFrame = cursor.fetch_dataframe()
print(result)
>>                         bookname                 author
>> 0  One Hundred Years of Solitude  Gabriel García Márquez
>> 1        A Brief History of Time         Stephen Hawking
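
Data can also flow the other way: recent releases of the connector provide cursor.write_dataframe(...) for inserting a pandas DataFrame into an existing table. A short sketch, assuming your installed version includes that method:

import pandas

# Insert the rows of a DataFrame into the existing temp table;
# column order must match the table definition.
df = pandas.DataFrame(
    [('The Selfish Gene', 'Richard Dawkins')],
    columns=['bookname', 'author']
)
cursor.write_dataframe(df, 'book')
cursor.execute('select * from book')
print(cursor.fetch_dataframe())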

Integration with numpy

import numpy
cursor.execute("select * from book")

result: numpy.ndarray = cursor.fetch_numpy_array()
print(result)
>> [['One Hundred Years of Solitude' 'Gabriel García Márquez']
>>  ['A Brief History of Time' 'Stephen Hawking']]

Query using functions

cursor.execute("SELECT CURRENT_TIMESTAMP")
print(cursor.fetchone())
>> [datetime.datetime(2020, 10, 26, 23, 3, 54, 756497, tzinfo=datetime.timezone.utc)]
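
User-supplied values should be bound through placeholders rather than interpolated into the SQL string; with the format paramstyle the driver substitutes %s markers safely. A short example against the book table created above (expected output follows the fetchall format shown earlier):

cursor.execute("select bookname from book where author = %s",
               ('Stephen Hawking',))
print(cursor.fetchall())
>> (['A Brief History of Time'],)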

Connection Parameters

| Name | Description | Default Value | Required |
|------|-------------|---------------|----------|
| database | String. The name of the database to connect to | - | Yes |
| user | String. The username to use for authentication | - | Yes |
| password | String. The password to use for authentication | - | Yes |
| host | String. The hostname of the Amazon Redshift cluster | - | Yes |
| port | Int. The port number of the Amazon Redshift cluster | 5439 | No |
| ssl | Bool. If SSL is enabled | True | No |
| iam | Bool. If IAM authentication is enabled | False | No |
| sslmode | String. The security of the connection to Amazon Redshift. 'verify-ca' and 'verify-full' are supported. | 'verify-ca' | No |
| idp_response_timeout | Int. The timeout for retrieving the SAML assertion from the IdP | 120 | No |
| listen_port | Int. The port the IdP will send the SAML assertion to | 7890 | No |
| max_prepared_statements | Int. The maximum number of prepared statements that can be open at once | 1000 | No |
| idp_tenant | String. The IdP tenant | None | No |
| credentials_provider | String. The IdP that will be used for authenticating with Amazon Redshift. 'OktaCredentialsProvider', 'AzureCredentialsProvider', 'BrowserAzureCredentialsProvider', 'PingCredentialsProvider', 'BrowserSamlCredentialsProvider', and 'AdfsCredentialsProvider' are supported | None | No |
| cluster_identifier | String. The cluster identifier of the Amazon Redshift cluster | None | No |
| ssl_insecure | Bool. Specifies whether the IdP host's server certificate will be verified | True | No |
| db_user | String. The user ID to use with Amazon Redshift | None | No |
| db_groups | String. A comma-separated list of existing database group names that the DbUser joins for the current session | None | No |
| auto_create | Bool. Indicates whether the user should be created if they do not exist | False | No |
| allow_db_user_override | Bool. True specifies that the driver uses the DbUser value from the SAML assertion, while False indicates that the value in the DbUser connection parameter is used | False | No |
| login_url | String. The SSO URL for the IdP | None | No |
| preferred_role | String. The IAM role preferred for the current connection | None | No |
| client_secret | String. The client secret from the Azure IdP | None | No |
| client_id | String. The client id from the Azure IdP | None | No |
| region | String. The AWS region where the cluster is located | None | No |
| app_name | String. The name of the IdP application used for authentication | None | No |
| access_key_id | String. The access key for the IAM role or IAM user configured for IAM database authentication | None | No |
| secret_access_key | String. The secret access key for the IAM role or IAM user configured for IAM database authentication | None | No |
| session_token | String. The session token for the IAM role or IAM user configured for IAM database authentication. Required only when temporary AWS credentials are being used. | None | No |
| profile | String. The name of a profile in an AWS credentials file that contains AWS credentials. | None | No |
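
As an example of combining these options, the SSL-related parameters above can be used to require full certificate and hostname verification. A sketch using only parameters from the table; the host and credentials are placeholders:

import redshift_connector

# Require an encrypted connection and verify both the certificate
# chain and the hostname against the server certificate.
conn = redshift_connector.connect(
    host='examplecluster.abc123xyz789.us-west-1.redshift.amazonaws.com',
    port=5439,
    database='dev',
    user='awsuser',
    password='my_password',
    ssl=True,
    sslmode='verify-full'
)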

Logging

redshift_connector uses logging to provide detailed error messages regarding IdP authentication. A do-nothing log handler is enabled by default so as to prevent logs from being written to sys.stderr.

Enable logging in your application to view logs output by redshift_connector, as described in the documentation for the Python logging module.
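
For example, to surface the connector's IdP diagnostics during development (this sketch assumes the package logs under its own module name, which is the usual logging convention):

import logging

# Attach a real handler so connector logs reach stderr, then raise
# the level on the (assumed) package logger to see IdP details.
logging.basicConfig(level=logging.DEBUG)
logging.getLogger('redshift_connector').setLevel(logging.DEBUG)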

Getting Help

Contributing

We look forward to collaborating with you! Please read through CONTRIBUTING before submitting any issues or pull requests.

Running Tests

You can run all unit tests with pytest test/unit. Integration tests require credentials for an Amazon Redshift cluster, as well as IdP attributes, to be provided in test/config.ini.

Additional Resources
