Redshift interface library

Project description

redshift_connector is the Amazon Redshift connector for Python. Easy integration with pandas and numpy, together with support for numerous Amazon Redshift-specific features, helps you get the most out of your data.

Supported Amazon Redshift features include:

  • IAM authentication

  • Identity provider (IdP) authentication

  • Redshift-specific data types

This pure Python connector implements Python Database API Specification 2.0.

Getting Started

The easiest way to get started with redshift_connector is via pip:

pip install redshift_connector

Note: redshift_connector requires Python >= 3.5

You can install from source by cloning this repository. Assuming that you have Python and virtualenv installed, set up your environment and install the required dependencies like this:

$ git clone https://github.com/aws/amazon-redshift-python-driver.git
$ cd amazon-redshift-python-driver
$ virtualenv venv
$ . venv/bin/activate
$ python -m pip install -r requirements.txt
$ python -m pip install -e .
$ python -m pip install redshift_connector
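
To confirm that the install succeeded without opening a connection, one option is to ask Python whether the package can be found on the import path. This is a minimal sketch using only the standard library; `is_installed` is a hypothetical helper name and is not part of redshift_connector.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if the top-level package can be found on the import path."""
    # find_spec returns None when a top-level module cannot be located.
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # Expected to print True once the pip install above has succeeded.
    print(is_installed("redshift_connector"))
```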

Basic Example

import redshift_connector

# Connects to the Redshift cluster using database credentials (user and password)
conn = redshift_connector.connect(
    host='examplecluster.abc123xyz789.us-west-1.redshift.amazonaws.com',
    port=5439,
    database='dev',
    user='awsuser',
    password='my_password'
)

cursor: redshift_connector.Cursor = conn.cursor()
cursor.execute("create Temp table book(bookname varchar,author varchar)")
cursor.executemany("insert into book (bookname, author) values (%s, %s)",
                    [
                        ('One Hundred Years of Solitude', 'Gabriel García Márquez'),
                        ('A Brief History of Time', 'Stephen Hawking')
                    ]
                  )
cursor.execute("select * from book")

result: tuple = cursor.fetchall()
print(result)
>> (['One Hundred Years of Solitude', 'Gabriel García Márquez'], ['A Brief History of Time', 'Stephen Hawking'])
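
Because the connector implements DB-API 2.0, the basic example can be wrapped in a small helper that always closes its cursor and labels each value with its column name. The sketch below is an illustration, not part of the library: `fetch_dicts` is a hypothetical name, and the bytes handling is an assumption that some driver versions may report column names as bytes in `cursor.description`.

```python
from contextlib import closing


def fetch_dicts(conn, sql):
    """Run `sql` on a DB-API 2.0 connection and return the rows as dicts."""
    # closing() calls cursor.close() even if execute() raises.
    with closing(conn.cursor()) as cursor:
        cursor.execute(sql)
        # Each description entry is a sequence whose first item is the column name.
        columns = [
            col[0].decode() if isinstance(col[0], bytes) else col[0]
            for col in cursor.description
        ]
        return [dict(zip(columns, row)) for row in cursor.fetchall()]
```

With the `conn` object from the example above, `fetch_dicts(conn, "select * from book")` would return one dict per book, keyed by `bookname` and `author`.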

Integration with pandas

import pandas
cursor.execute("create Temp table book(bookname varchar,author varchar)")
cursor.executemany("insert into book (bookname, author) values (%s, %s)",
                   [
                       ('One Hundred Years of Solitude', 'Gabriel García Márquez'),
                       ('A Brief History of Time', 'Stephen Hawking')
                   ])
cursor.execute("select * from book")
result: pandas.DataFrame = cursor.fetch_dataframe()
print(result)
>>                         bookname                 author
>> 0  One Hundred Years of Solitude  Gabriel García Márquez
>> 1        A Brief History of Time         Stephen Hawking

Integration with numpy

import numpy
cursor.execute("select * from book")

result: numpy.ndarray = cursor.fetch_numpy_array()
print(result)
>> [['One Hundred Years of Solitude' 'Gabriel García Márquez']
>>  ['A Brief History of Time' 'Stephen Hawking']]

Query using functions

cursor.execute("SELECT CURRENT_TIMESTAMP")
print(cursor.fetchone())
>> [datetime.datetime(2020, 10, 26, 23, 3, 54, 756497, tzinfo=datetime.timezone.utc)]
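
The row returned above holds a single timezone-aware datetime. As a small sketch of working with that value using only the standard library, the snippet below unpacks it, renders it as ISO 8601 text, and converts it to a fixed UTC-7 offset. The literal row is copied from the output above, and the choice of UTC-7 is an arbitrary assumption for illustration.

```python
import datetime

# The row shown above: a one-element list holding a timezone-aware datetime.
row = [datetime.datetime(2020, 10, 26, 23, 3, 54, 756497, tzinfo=datetime.timezone.utc)]

(current_ts,) = row  # unpack the single column

# ISO 8601 text keeps the UTC offset explicit.
iso_text = current_ts.isoformat()

# Convert to a fixed UTC-7 offset for display (arbitrary example zone).
utc_minus_7 = datetime.timezone(datetime.timedelta(hours=-7))
local_text = current_ts.astimezone(utc_minus_7).isoformat()

print(iso_text)    # 2020-10-26T23:03:54.756497+00:00
print(local_text)  # 2020-10-26T16:03:54.756497-07:00
```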

Connection Parameters

| Name | Description | Default Value | Required |
|---|---|---|---|
| database | String. The name of the database to connect to | | Yes |
| user | String. The username to use for authentication | | Yes |
| password | String. The password to use for authentication | | Yes |
| host | String. The hostname of Amazon Redshift cluster | | Yes |
| port | Int. The port number of the Amazon Redshift cluster | 5439 | No |
| ssl | Bool. If SSL is enabled | True | No |
| iam | Bool. If IAM Authentication is enabled | False | No |
| sslmode | String. The security of the connection to Amazon Redshift. ‘verify-ca’ and ‘verify-full’ are supported. | ‘verify-ca’ | No |
| idp_response_timeout | Int. The timeout for retrieving SAML assertion from IdP | 120 | No |
| idp_port | Int. The listen port IdP will send the SAML assertion to | 7890 | No |
| log_level | Int. The level of logging enabled, increasing in granularity (values [0,4] are valid) | 0 | No |
| log_path | String. The file path to the log file | ‘driver.log’ | No |
| max_prepared_statements | Int. The maximum number of prepared statements that can be open at once | 1000 | No |
| idp_tenant | String. The IdP tenant | None | No |
| credential_provider | String. The IdP that will be used for authenticating with Amazon Redshift. ‘OktaCredentialsProvider’, ‘AzureCredentialsProvider’, ‘BrowserAzureCredentialsProvider’, ‘PingCredentialsProvider’, ‘BrowserSamlCredentialsProvider’, and ‘AdfsCredentialsProvider’ are supported. | None | No |
| cluster_identifier | String. The cluster identifier of the Amazon Redshift Cluster | None | No |
| db_user | String. The user ID to use with Amazon Redshift | None | No |
| login_url | String. The SSO Url for the IdP | None | No |
| preferred_role | String. The IAM role preferred for the current connection | None | No |
| client_secret | String. The client secret from Azure IdP | None | No |
| client_id | String. The client id from Azure IdP | None | No |
| region | String. The AWS region where the cluster is located | None | No |
| app_name | String. The name of the IdP application used for authentication. | None | No |
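
The constraints listed for the parameters above (required names, supported `sslmode` values, the `log_level` range, and the supported credential providers) can be checked before calling `redshift_connector.connect`. The following is a minimal sketch derived only from that list; `check_connect_kwargs` is a hypothetical helper, and the driver's own validation may differ.

```python
# Values copied from the parameter descriptions above.
REQUIRED = ("database", "user", "password", "host")
SUPPORTED_SSLMODES = {"verify-ca", "verify-full"}
SUPPORTED_PROVIDERS = {
    "OktaCredentialsProvider",
    "AzureCredentialsProvider",
    "BrowserAzureCredentialsProvider",
    "PingCredentialsProvider",
    "BrowserSamlCredentialsProvider",
    "AdfsCredentialsProvider",
}


def check_connect_kwargs(kwargs):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    for name in REQUIRED:
        if not kwargs.get(name):
            problems.append("missing required parameter: " + name)

    # Fall back to the documented defaults when a parameter is not supplied.
    sslmode = kwargs.get("sslmode", "verify-ca")
    if sslmode not in SUPPORTED_SSLMODES:
        problems.append("unsupported sslmode: " + repr(sslmode))

    log_level = kwargs.get("log_level", 0)
    if not 0 <= log_level <= 4:
        problems.append("log_level must be between 0 and 4")

    provider = kwargs.get("credential_provider")
    if provider is not None and provider not in SUPPORTED_PROVIDERS:
        problems.append("unsupported credential_provider: " + repr(provider))
    return problems
```

A caller might run `check_connect_kwargs(params)` first and pass `**params` to `redshift_connector.connect` only when the returned list is empty.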

Getting Help

Contributing

We look forward to collaborating with you! Please read through CONTRIBUTING before submitting any issues or pull requests.

Running Tests

Run the unit tests with pytest test/unit. Integration tests additionally require credentials for an Amazon Redshift cluster, as well as IdP attributes, to be provided in test/config.ini.
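
Code that only talks to the connector through DB-API calls can be unit tested without a cluster by substituting a mock connection. The sketch below illustrates the idea in pytest style; `count_books` and the test function are hypothetical and are not taken from this repository's test suite.

```python
from unittest import mock


def count_books(conn):
    """Code under test: count rows in the `book` table via DB-API calls."""
    cursor = conn.cursor()
    cursor.execute("select count(*) from book")
    (count,) = cursor.fetchone()
    return count


def test_count_books_uses_cursor():
    # Stand-in for a redshift_connector connection; no cluster is contacted.
    conn = mock.MagicMock()
    conn.cursor.return_value.fetchone.return_value = [2]

    assert count_books(conn) == 2
    conn.cursor.return_value.execute.assert_called_once_with(
        "select count(*) from book"
    )
```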

Additional Resources

Download files


Source Distributions

No source distribution files available for this release.

Built Distribution

redshift_connector-2.0.405-py3-none-any.whl (70.1 kB view details)

Uploaded Python 3

File details

Details for the file redshift_connector-2.0.405-py3-none-any.whl.

File metadata

  • Download URL: redshift_connector-2.0.405-py3-none-any.whl
  • Upload date:
  • Size: 70.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.24.0 setuptools/50.2.0 requests-toolbelt/0.9.1 tqdm/4.51.0 CPython/3.7.7

File hashes

Hashes for redshift_connector-2.0.405-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 77b26d82ddb7fb4a324814638cd3b1cee4efbe1443868f773ed90a6f6176c48c |
| MD5 | 443609c578ce2461e69a13c8e9a1f653 |
| BLAKE2b-256 | 3a3c4fcf81a3ab8e75431b4938e13f83f69f774b567e41263e41e4abf71f9e4f |
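
A downloaded wheel can be compared against the SHA256 digest listed above before it is installed. This is a minimal sketch using the standard library's hashlib; `sha256_matches` is a hypothetical helper name, and the file path passed to it is whatever location the wheel was saved to.

```python
import hashlib


def sha256_matches(path, expected_hex):
    """Return True if the file at `path` has the given SHA256 hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.lower()
```

For example, `sha256_matches("redshift_connector-2.0.405-py3-none-any.whl", "77b26d82ddb7fb4a324814638cd3b1cee4efbe1443868f773ed90a6f6176c48c")` should return True for an unmodified copy of the file described above.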
