
A package for database session management

Project description

Data Warehouse Connector

A Python module to establish connections to various data warehouses like Snowflake, Redshift, and PostgreSQL.

Overview

This README provides instructions for installing, setting up, and using the Data Warehouse Connector module, so that users can easily establish connections to their data warehouses.
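
Installation

The package is published on PyPI, so it can be installed with pip:

pip install datawarehouse_connector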


Data Warehouses

Snowflake

Requires db_source, user, password, role, warehouse, database, and host.

from datawarehouse_connector.connector import get_session

# Replace the placeholder values with your Snowflake account details
snowflake_creds = {
    "db_source": "snowflake",
    "user": "user",
    "password": "password",
    "role": "role",
    "warehouse": "warehouse",
    "database": "database",
    "host": "host",
}

session = get_session(snowflake_creds)
session.close()

Whitelisting

If a network policy is activated in the Snowflake account and incoming IPs are restricted, you need to whitelist the Step Functions IP.

Please follow the steps below:

  1. Navigate to the Admin -> Security section by clicking on "Admin" in the left navigation panel.

  2. Switch to Network Rules and create a new rule by clicking the + Network Rule button:
     a. Name: SFN_RULE
     b. Choose Type: IPv4 and Mode: Ingress
     c. Under Identifiers, add the IP 18.210.244.167

  3. Switch to Network Policy and create a new policy by clicking the + Network Policy button:
     a. Name: SFN_POLICY
     b. Under the Allowed section, select SFN_RULE in the Select Rule dropdown, then click the Create button to create the policy.
     c. Click the dotted icon (...) at the end of the policy name and click Activate to activate the policy.

  4. Navigate back to the worksheet and replace the placeholder with the Step Functions public IP address:

    ALTER NETWORK POLICY SFN_POLICY SET ALLOWED_IP_LIST=('18.210.244.167')
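
To verify that the rule and policy are in place, you can inspect them from the same worksheet with standard Snowflake commands (SFN_POLICY is the policy created above):

    SHOW NETWORK POLICIES;
    DESCRIBE NETWORK POLICY SFN_POLICY;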

Redshift

Requires db_source, user, password, database, and host.

### Direct Connection (Redshift in Public Subnet)

from datawarehouse_connector.connector import get_session

# Database connection configuration
redshift_creds = {
    "db_source": "redshift",
    "user": "user",
    "password": "password",
    "database": "database",
    "host": "host",
}

session = get_session(redshift_creds)
session.close()

### SSH Tunnel Connection (Redshift in Private Subnet)

#### Purpose

Connect to Amazon Redshift through an SSH tunnel when the cluster is in a private subnet and the connecting instance (e.g., an EC2 host) is in a public subnet. You need to provide the SSH details and the private key content to establish the tunnel.

#### Prerequisites

  • SSH Host: the hostname or IP address of the EC2 instance that has access to the Redshift cluster.
  • SSH Username: the username used to log into the EC2 instance.
  • SSH PEM File Content: the content of the PEM file (private key) used to authenticate the SSH connection.

redshift_creds = {
    "db_source": "redshift",
    "user": "user",
    "password": "password",
    "database": "database",
    "host": "host",
    "ssh_host": "ssh_host",
    "ssh_username": "ssh_username",
    "ssh_pem_file_content": "ssh_pem_file_content",
}

session = get_session(redshift_creds)
session.close()
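
For reference, an SSH tunnel of this kind can be opened with the third-party sshtunnel and paramiko libraries. The sketch below illustrates the mechanism only; it is not the module's actual implementation, and the host names, port 5439, and RSA key type are placeholder assumptions:

import io

import paramiko
from sshtunnel import SSHTunnelForwarder

# Load the private key from the PEM file content (placeholder string)
private_key = paramiko.RSAKey.from_private_key(io.StringIO("ssh_pem_file_content"))

# Forward a local port through the EC2 bastion to the private Redshift endpoint
tunnel = SSHTunnelForwarder(
    ("ssh_host", 22),
    ssh_username="ssh_username",
    ssh_pkey=private_key,
    remote_bind_address=("redshift-cluster-endpoint", 5439),  # default Redshift port
)
tunnel.start()

# Clients then connect to 127.0.0.1:tunnel.local_bind_port instead of the cluster directly
print(tunnel.local_bind_port)

tunnel.stop()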

PostgreSQL

Requires db_source, user, password, database, and host.

from datawarehouse_connector.connector import get_session

postgresql_creds = {
    "db_source": "postgresql",
    "user": "user",
    "password": "password",
    "database": "database",
    "host": "host",
}

session = get_session(postgresql_creds)
session.close()

Handling Connections

Once the session is established, you can interact with your data warehouse using SQLAlchemy's ORM capabilities.
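
For example, assuming get_session returns a standard SQLAlchemy session, a simple round-trip query looks like this (the SELECT 1 statement is just an illustration):

from sqlalchemy import text

from datawarehouse_connector.connector import get_session

session = get_session(postgresql_creds)  # any of the credential dicts shown above
try:
    # Execute a trivial query to confirm the session is usable
    result = session.execute(text("SELECT 1"))
    print(result.scalar())
finally:
    # Always release the connection when done
    session.close()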


Troubleshooting

Common Issues

  • Invalid Credentials: ensure that the user and password values are correct.
  • Host Unreachable: verify the host address and network connectivity.
  • Unsupported Data Source: check that db_source is one of the supported values (snowflake, redshift, postgresql).

Error Handling

The get_session function prints exceptions to help identify issues during the connection process. Ensure that the provided connection details are accurate and that the data warehouse is accessible.
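
Because the exact failure behavior is not documented, a defensive pattern is to guard the call yourself; a minimal sketch, assuming failures surface as ordinary exceptions:

from datawarehouse_connector.connector import get_session

try:
    session = get_session(snowflake_creds)
except Exception as exc:  # exact exception types are not documented
    print(f"Connection failed: {exc}")
else:
    try:
        ...  # run your queries here
    finally:
        session.close()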


Conclusion

This module simplifies the process of connecting to various data warehouses. Follow the setup instructions carefully, and refer to the examples for guidance on using the get_session function. For further assistance, check the documentation or raise an issue on the project's GitHub repository.




Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

datawarehouse_connector-0.1.3.tar.gz (8.2 kB, source)

Built Distribution

datawarehouse_connector-0.1.3-py3-none-any.whl (8.9 kB, Python 3)

File details

Details for the file datawarehouse_connector-0.1.3.tar.gz.


File hashes

Hashes for datawarehouse_connector-0.1.3.tar.gz:

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 33c94c1ce594a698a7d0b31d3828b29b3760439bb862f2f9977e59c648e650b6 |
| MD5 | 9cb4d6160f36f13251c4ad6a2b040647 |
| BLAKE2b-256 | d39c1b902ac9b9293a838459efd7a352470761822d03603f317f9ba24fb0733b |


File details

Details for the file datawarehouse_connector-0.1.3-py3-none-any.whl.


File hashes

Hashes for datawarehouse_connector-0.1.3-py3-none-any.whl:

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 21f1c40e937edc0627ece1b63aa6f48066aba6d217b692239598f1e279145004 |
| MD5 | 5941caea8c2f4749a50fe325cc727243 |
| BLAKE2b-256 | 4d30e884a3571e6cca03c88631d00d937d19298a30fb5608b5be9d1668ef929a |

