Skip to main content

Redshift interface library

Project description

redshift_connector is the Amazon Redshift connector for Python. Easy integration with pandas and numpy, as well as support for numerous Amazon Redshift specific features help you get the most out of your data

Supported Amazon Redshift features include:

  • IAM authentication

  • Identity provider (IdP) authentication

  • Redshift specific data types

This pure Python connector implements Python Database API Specification 2.0.

Getting Started

The easiest way to get started with redshift_connector is via pip

pip install redshift_connector

For pandas and numpy support, install as shown below:

pip install 'redshift_connector[full]'

Note: redshift_connector requires Python >= 3.5

You can install from source by cloning this repository.

$ git clone https://github.com/aws/amazon-redshift-python-driver.git
$ cd redshift_connector
$ pip install .

Basic Example

import redshift_connector

# Connects to Redshift cluster using AWS credentials
conn = redshift_connector.connect(
    host='examplecluster.abc123xyz789.us-west-1.redshift.amazonaws.com',
    database='dev',
    user='awsuser',
    password='my_password'
 )

cursor: redshift_connector.Cursor = conn.cursor()
cursor.execute("create Temp table book(bookname varchar,author‎ varchar)")
cursor.executemany("insert into book (bookname, author‎) values (%s, %s)",
                    [
                        ('One Hundred Years of Solitude', 'Gabriel García Márquez'),
                        ('A Brief History of Time', 'Stephen Hawking')
                    ]
                  )
cursor.execute("select * from book")

result: tuple = cursor.fetchall()
print(result)
>> (['One Hundred Years of Solitude', 'Gabriel García Márquez'], ['A Brief History of Time', 'Stephen Hawking'])

Integration with pandas

import pandas
cursor.execute("create Temp table book(bookname varchar,author‎ varchar)")
cursor.executemany("insert into book (bookname, author‎) values (%s, %s)",
                   [
                       ('One Hundred Years of Solitude', 'Gabriel García Márquez'),
                       ('A Brief History of Time', 'Stephen Hawking')

                   ])
cursor.execute("select * from book")
result: pandas.DataFrame = cursor.fetch_dataframe()
print(result)
>>                         bookname                 author
>> 0  One Hundred Years of Solitude  Gabriel García Márquez
>> 1        A Brief History of Time         Stephen Hawking

Integration with numpy

import numpy
cursor.execute("select * from book")

result: numpy.ndarray = cursor.fetch_numpy_array()
print(result)
>> [['One Hundred Years of Solitude' 'Gabriel García Márquez']
>>  ['A Brief History of Time' 'Stephen Hawking']]

Query using functions

cursor.execute("SELECT CURRENT_TIMESTAMP")
print(cursor.fetchone())
>> [datetime.datetime(2020, 10, 26, 23, 3, 54, 756497, tzinfo=datetime.timezone.utc)]

Connection Parameters

Name

Description

Default Value

Required

database

String. The name of the database to connect to

Yes

user

String. The username to use for authentication

Yes

password

String. The password to use for authentication

Yes

host

String. The hostname of Amazon Redshift cluster

Yes

port

Int. The port number of the Amazon Redshift cluster

5439

No

ssl

Bool. If SSL is enabled

True

No

iam

Bool. If IAM Authentication is enabled

False

No

sslmode

String. The security of the connection to Amazon Redshift. ‘verify-ca’ and ‘verify-full’ are supported.

‘verify-ca’

No

idp_response_timeout

Int. The timeout for retrieving SAML assertion from IdP

120

No

listen_port | Int. The listen port IdP will send the SAML assertion to

7890

No

log_level

Int. The level of logging enabled, increasing in granularity (values [0,4] are valid)

0

No

log_path

String. The file path to the log file

‘driver.log’

No

max_prepared_statements

Int. The maximum number of prepared statements that can be open at once

1000

No

idp_tenant

String. The IdP tenant

None

No

credentials_provider

String. The IdP that will be used for authenticating with Amazon Redshift. ‘OktaCredentialsProvider’, ‘AzureCredentialsProvider’, ‘BrowserAzureCredentialsProvider’, ‘PingCredentialsProvider’, ‘BrowserSamlCredentialsProvider’, and ‘AdfsCredentialsProvider’ are supported

None

No

cluster_identifier

String. The cluster identifier of the Amazon Redshift Cluster

None

No

db_user

String. The user ID to use with Amazon Redshift

None

No

db_groups

String. A comma-separated list of existing database group names that the DbUser joins for the current session

None

No

auto_create

Bool. Indicates whether the user should be created if they do not exist

False

No

allow_db_user_override

Bool. True specifies the driver uses the DbUser value from the SAML assertion while False indicates the value in the DbUser connection parameter is used

False

No

login_url

String. The SSO Url for the IdP

None

No

preferred_role

String. The IAM role preferred for the current connection

None

No

client_secret

String. The client secret from Azure IdP

None

No

client_id

String. The client id from Azure IdP

None

No

region

String. The AWS region where the cluster is located

None

No

app_name

String. The name of the IdP application used for authentication

None

No

Getting Help

Contributing

We look forward to collaborating with you! Please read through CONTRIBUTING before submitting any issues or pull requests.

Running Tests

You can run tests by using pytest test/unit. This will run all unit tests. Integration tests require providing credentials for an Amazon Redshift cluster as well as IdP attributes in test/config.ini.

Additional Resources

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distribution

redshift_connector-2.0.711-py3-none-any.whl (72.7 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page