Helper class to connect to Redshift, Snowflake, DynamoDB and S3

Project description

db_utils library

Introduction

db-utils is a Python package that standardizes interactions with various types of databases. It is a collection of modules that lowers the barrier to viewing, extracting, and analyzing data from a variety of sources, including:

  • Redshift
  • Snowflake
  • Postgres
  • MySQL
  • SQL Server
  • SQLite
  • S3
  • DynamoDB

Docker

A one-line command starts a Docker container with db-utils installed; it will run on any system that has Docker. A Jupyter notebook server opens on port 8888; just copy and paste the URL from the terminal into your preferred browser.

docker run -p 8888:8888 hannaj06/db-utils:latest

docker-compose

Below is an example docker-compose.yaml file. With this configuration, Jupyter notebook changes persist in the location defined by <local_notebooks>. A sample of the .databases.conf file, which is bind-mounted into the container, follows.

docker-compose.yaml

version: "2.1"
services:
  dbutils_jupyter:
    image: hannaj06/db-utils
    ports:
      - 8888:8888
    volumes:
      - ${HOME}/.databases.conf:/root/.databases.conf
      - <local_notebooks>:/notebooks

.databases.conf

[redshift]
host=<redshift_host>
user=<user>
password=<redshift_password>
port=<port>
database=<db>

[s3]
aws_access_key_id=<access_id>
aws_secret_access_key=<secret_access>
region=<aws_region>
default_bucket=<default_bucket>

Installation in a local environment

  • sudo apt-get update

Required system packages:

  • sudo apt-get install python3-dev (Ubuntu)
  • sudo apt-get install g++ (Ubuntu)
  • sudo apt-get install libpq-dev (Ubuntu)
  • sudo apt-get install unixodbc-dev (Ubuntu)
  • brew install postgresql (macOS)

Then install the package:

pip install db_utils

pg_connect class (previously DBUtil)

A database connection class to interact with Postgres or Redshift

Basic Usage:

  • create database configuration file
  • example below is called .databases.conf
    [redshift_example]
    host=redshift.example.com
    user=test_user
    password=password
    port=5439
    database=test_db

    >>> from db_utils.pg_connect import pg_connect
    >>>
    >>> db = pg_connect('redshift_example', '.databases.conf')
    >>> db.get_arr_from_query('select * from test', pprint=True)
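
Results can also be pulled into a pandas DataFrame. This is a minimal sketch that assumes get_df_from_query (shown later in the sqlite_connect example) is also exposed by pg_connect:

    >>> from db_utils.pg_connect import pg_connect
    >>>
    >>> db = pg_connect('redshift_example', '.databases.conf')
    >>> # get_df_from_query is assumed to mirror the sqlite_connect method shown below
    >>> df = db.get_df_from_query('select * from test')
    >>> df.head()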

snowflake_connect class

A database connection class to interact with Snowflake. A usage sketch follows the configuration below.

Basic Usage:

  • create database configuration file
  • example below is called .databases.conf
    [snowflake]
    account=abc123.us-east-1
    host=abc123.us-east-1.snowflakecomputing.com
    user=test_user
    password=password
    port=443
    database=test_db
    aws_access_key_id=<key_id>
    aws_secret_access_key=<secret_key>
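
A minimal usage sketch, assuming the query interface mirrors pg_connect (get_arr_from_query is an assumption for this class, not confirmed by this page):

    >>> from db_utils.snowflake_connect import snowflake_connect
    >>>
    >>> db = snowflake_connect('snowflake', '.databases.conf')
    >>> # get_arr_from_query is assumed here to match the pg_connect example above
    >>> db.get_arr_from_query('select * from test', pprint=True)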

snowflake_s3 class

A child class of snowflake_connect used to retrieve large datasets in small chunks

Basic Usage:

  • create database configuration file
  • example below is called .databases.conf
  • note the additional fields required
    [snowflake]
    account=abc123.us-east-1
    host=abc123.us-east-1.snowflakecomputing.com
    user=test_user
    password=password
    port=443
    database=test_db
    aws_access_key_id=<key_id>
    aws_secret_access_key=<secret_key>
    default_bucket=<default_bucket>

example) Loading a large dataset into memory in chunks

    >>> from db_utils.snowflake_connect import snowflake_s3
    >>>
    >>> file_format = '''
    ... TYPE = CSV
    ... COMPRESSION = NONE
    ... '''
    >>>
    >>> with snowflake_s3('snowflake', '.databases.conf') as db:
    ...     db.cursor('SELECT * FROM example_large_table', file_format=file_format, pprint=True)
    ...     while True:
    ...         file = db.fetch(contents=True)
    ...         if file:
    ...             for row in file:
    ...                 print(row)
    ...         else:
    ...             break


sqlite_connect class

A database connection class to interact with SQLite


    >>> from db_utils.sqlite_connect import sqlite_connect
    >>>
    >>> db = sqlite_connect('test.db')
    >>> db.get_df_from_query('select * from test_table', pprint=True)

s3_connect class

Connection library for interacting with S3

Basic Usage:

  • add s3 section to .databases.conf file (created in previous example)
    [s3]
    aws_access_key_id=<key_id>
    aws_secret_access_key=<secret_key>
    default_bucket=<bucket>


    >>> from db_utils.s3_connect import s3_connect
    >>>
    >>> s3 = s3_connect('.databases.conf', 's3')
    >>> s3.list_keys(prefix='examples')

example) grab a file from S3 into memory as a StringIO object

    >>> from db_utils.s3_connect import s3_connect
    >>>
    >>> s3 = s3_connect('.databases.conf', 's3')
    >>> contents = s3.get_contents('example_file', stringIO=True)
    >>> contents.read()

sql_server_connect class

Requirements:

  • an ODBC driver for SQL Server (e.g. ODBC Driver 17 for SQL Server) and unixODBC (see the system packages above)

Basic Usage:

  • add sql server section to .databases.conf file (created in previous example)
    [sql_server]
    driver=ODBC Driver 17 for SQL Server
    server=127.0.0.1
    user=bill
    password=gates
    database=master

    >>> from db_utils.sql_server_connect import sql_server_connect
    >>> db = sql_server_connect('sql_server', '.databases.conf')
    >>>
    >>> db.get_arr_from_query('''SELECT * FROM SYSOBJECTS''', pprint=True)

dynamodb_connect class

Connection library for interacting with DynamoDB
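
No usage example is provided for this class. As a loose sketch only, assuming the constructor follows the same (section, config file) pattern as pg_connect and that a [dynamodb] section holds the AWS credentials (both assumptions, not confirmed by this page):

    >>> from db_utils.dynamodb_connect import dynamodb_connect
    >>>
    >>> # constructor arguments are assumed to mirror pg_connect; check the class for the actual signature
    >>> db = dynamodb_connect('dynamodb', '.databases.conf')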

timer class

Helper class to time long-running processes

Basic Usage:

>>> from db_utils.timer import timer
>>>
>>> t = timer()
>>> t.lap('s')
5.469961
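
For example, to time a query end to end (the 's' argument presumably selects seconds, as in the output above; the pg_connect call reuses the configuration from the earlier example):

>>> from db_utils.timer import timer
>>> from db_utils.pg_connect import pg_connect
>>>
>>> t = timer()
>>> db = pg_connect('redshift_example', '.databases.conf')
>>> db.get_arr_from_query('select * from test')
>>> t.lap('s')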

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

db_utils-0.4.6.tar.gz (16.9 kB)

Uploaded Source

Built Distribution

db_utils-0.4.6-py3-none-any.whl (24.1 kB)

Uploaded Python 3

File details

Details for the file db_utils-0.4.6.tar.gz.

File metadata

  • Download URL: db_utils-0.4.6.tar.gz
  • Upload date:
  • Size: 16.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.32.2 CPython/3.6.8

File hashes

Hashes for db_utils-0.4.6.tar.gz
Algorithm Hash digest
SHA256 825531a57d5a1bd76d9c096f27a388b1ab03e1de870510fbf2eea2234223c8ee
MD5 9efcc717b9f71e943cce509b62fa330b
BLAKE2b-256 085459af67000f5e00c9e8511afac1ac309e5353b6d68d2413a52f6657ed9f0b

See more details on using hashes here.

File details

Details for the file db_utils-0.4.6-py3-none-any.whl.

File metadata

  • Download URL: db_utils-0.4.6-py3-none-any.whl
  • Upload date:
  • Size: 24.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.0.1 requests-toolbelt/0.9.1 tqdm/4.32.2 CPython/3.6.8

File hashes

Hashes for db_utils-0.4.6-py3-none-any.whl
Algorithm Hash digest
SHA256 efec1806a5406be6af804229dab406ae8acacce211c723006f147c2c3ca38151
MD5 f6a0906dfd6996423569baf2b16702de
BLAKE2b-256 2b76b3298e6092e536aeffec0dba91b4501643967dd4c469c9dd068b7d831878

See more details on using hashes here.
