Helper class to connect to Redshift, Snowflake, DynamoDB and S3
db_utils library
Introduction
db-utils is a Python package that standardizes interactions with various types of databases. It is a collection of modules that lowers the barrier to viewing, extracting, and analyzing data from various sources, including:
- Redshift
- Snowflake
- Postgres
- MySQL
- SQL Server
- SQLite
- S3
- DynamoDB
Docker
A one-line command starts a Docker container with db-utils installed. This will run on any system that is running Docker. A Jupyter notebook server will open on port 8888; just copy and paste the URL from the terminal into your preferred browser.
docker run -p 8888:8888 hannaj06/db-utils:latest
docker-compose
Below is an example docker-compose.yaml file. With this configuration, Jupyter notebook changes persist in the location defined by <local_notebooks>. Also below is a sample of the .databases.conf file, which is bind-mounted into the container.
docker-compose.yaml
version: "2.1"
services:
  dbutils_jupyter:
    image: hannaj06/db-utils
    ports:
      - 8888:8888
    volumes:
      - ${HOME}/.databases.conf:/root/.databases.conf
      - <local_notebooks>:/notebooks
.databases.conf
[redshift]
host=<redshift_host>
user=<user>
password=<redshift_password>
port=<port>
database=<db>
[s3]
aws_access_key_id=<access_id>
aws_secret_access_key=<secret_access>
region=<aws_region>
default_bucket=<default_bucket>
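The sections above follow standard INI syntax, so the connect classes presumably load them with Python's built-in configparser. A minimal sketch of how such a file is read (the credential values here are illustrative, not real):

```python
import configparser

# An illustrative .databases.conf with the same section layout as above;
# every value is made up for the example.
SAMPLE_CONF = """\
[redshift]
host=redshift.example.com
user=test_user
password=secret
port=5439
database=test_db
"""

config = configparser.ConfigParser()
config.read_string(SAMPLE_CONF)

# Each connect class can then pull its section by name.
redshift = config["redshift"]
print(redshift["host"], redshift.getint("port"))  # redshift.example.com 5439
```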
Installation on local environment
Required system packages:
- sudo apt-get update (Ubuntu)
- sudo apt-get install python3-dev (Ubuntu)
- sudo apt-get install g++ (Ubuntu)
- sudo apt-get install libpq-dev (Ubuntu)
- sudo apt-get install unixodbc-dev (Ubuntu)
- brew install postgresql (MacOS)
pip install db_utils
pg_connect class (previously DBUtil)
A database connection class to interact with Postgres or Redshift
Basic Usage:
- create database configuration file
- example below is called .databases.conf
[redshift_example]
host=redshift.example.com
user=test_user
password=password
port=5439
database=test_db
>>> from db_utils.pg_connect import pg_connect
>>>
>>> db = pg_connect('redshift_example', '.databases.conf')
>>> db.get_arr_from_query('select * from test', pprint=True)
snowflake_connect class
A database connection class to interact with Snowflake
Basic Usage:
- create database configuration file
- example below is called .databases.conf
[snowflake]
account=abc123.us-east-1
host=abc123.us-east-1.snowflakecomputing.com
user=test_user
password=password
port=443
database=test_db
aws_access_key_id=<key_id>
aws_secret_access_key=<secret_key>
snowflake_s3 class
A child class of snowflake_connect used to retrieve large datasets in small chunks
Basic Usage:
- create database configuration file
- example below is called .databases.conf
- note the additional fields required
[snowflake]
account=abc123.us-east-1
host=abc123.us-east-1.snowflakecomputing.com
user=test_user
password=password
port=443
database=test_db
aws_access_key_id=<key_id>
aws_secret_access_key=<secret_key>
default_bucket=<default_bucket>
example) Loading a large data set into memory in chunks
>>> from db_utils.snowflake_connect import snowflake_s3
>>> import os
>>>
>>> file_format = '''
TYPE = CSV
COMPRESSION = NONE
'''
>>>
>>>
>>> with snowflake_s3('snowflake', '.databases.conf') as db:
...     db.cursor('SELECT * FROM example_large_table', file_format=file_format, pprint=True)
...
...     while True:
...         file = db.fetch(contents=True)
...
...         if file:
...             for row in file:
...                 print(row)
...
...         else:
...             break
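The fetch-until-empty loop above can also be wrapped in a generator so callers simply iterate over rows. A sketch with a stand-in fetcher (FakeFetcher is hypothetical, standing in for the real snowflake_s3 object only to make the sketch runnable):

```python
def iter_rows(db):
    """Yield rows from chunked fetches until the source is exhausted."""
    while True:
        chunk = db.fetch(contents=True)
        if not chunk:
            break
        for row in chunk:
            yield row

class FakeFetcher:
    """Stand-in for snowflake_s3: serves two chunks, then signals exhaustion."""
    def __init__(self):
        self.chunks = [["a", "b"], ["c"]]

    def fetch(self, contents=True):
        return self.chunks.pop(0) if self.chunks else None

rows = list(iter_rows(FakeFetcher()))
print(rows)  # ['a', 'b', 'c']
```

Wrapping the loop this way keeps the chunking logic in one place while callers get a flat iterator.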
sqlite_connect class
A database connection class to interact with SQLite
>>> from db_utils.sqlite_connect import sqlite_connect
>>>
>>> db = sqlite_connect('test.db')
>>> db.get_df_from_query('select * from test_table', pprint=True)
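Since sqlite3 ships with the Python standard library, the query path under a wrapper like this presumably reduces to execute-then-fetch. A self-contained sketch using an in-memory database (the table and rows are illustrative):

```python
import sqlite3

# In-memory database standing in for test.db.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test_table (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO test_table VALUES (?, ?)", [(1, "alpha"), (2, "beta")])

# What a get_*_from_query helper boils down to: execute, then fetch all rows.
rows = conn.execute("SELECT * FROM test_table ORDER BY id").fetchall()
print(rows)  # [(1, 'alpha'), (2, 'beta')]
conn.close()
```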
s3_connect class
Connection library for interacting with S3
Basic Usage:
- add s3 section to .databases.conf file (created in previous example)
[s3]
aws_access_key_id=<key_id>
aws_secret_access_key=<secret_key>
default_bucket=<bucket>
>>> from db_utils.s3_connect import s3_connect
>>>
>>> s3 = s3_connect('.databases.conf', 's3')
>>> s3.list_keys(prefix='examples')
example) Grab a file from S3 into memory as a StringIO object
>>> from db_utils.s3_connect import s3_connect
>>>
>>> s3 = s3_connect('.databases.conf', 's3')
>>> s3.get_contents('example_file', stringIO=True)
>>> s3.read()
sql_server_connect class
Requirements:
- sql server drivers - https://docs.microsoft.com/en-us/sql/connect/odbc/linux-mac/installing-the-microsoft-odbc-driver-for-sql-server?view=sql-server-2017
Basic Usage:
- add sql server section to .databases.conf file (created in previous example)
[sql_server]
driver=ODBC Driver 17 for SQL Server
server=127.0.0.1
user=bill
password=gates
database=master
>>> from db_utils.sql_server_connect import sql_server_connect
>>> db = sql_server_connect('sql_server', '.databases.conf')
>>>
>>> db.get_arr_from_query('''SELECT * FROM SYSOBJECTS''', pprint=True)
dynamodb_connect class
Connection library for interacting with DynamoDB
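No usage example is published for this class; by analogy with the other connect classes, it presumably wraps a boto3 client configured from the .databases.conf [s3]-style credentials. A dict-backed stand-in showing the put/get shape such a wrapper would expose (all names here are hypothetical, not the library's actual API):

```python
class FakeDynamoTable:
    """Dict-backed stand-in for a DynamoDB table, keyed on a hash key."""
    def __init__(self, hash_key):
        self.hash_key = hash_key
        self.items = {}

    def put_item(self, item):
        # Store the item under its hash-key value, as DynamoDB would.
        self.items[item[self.hash_key]] = item

    def get_item(self, key):
        # Return the stored item, or None when the key is absent.
        return self.items.get(key)

table = FakeDynamoTable(hash_key="user_id")
table.put_item({"user_id": "u1", "name": "test_user"})
print(table.get_item("u1"))  # {'user_id': 'u1', 'name': 'test_user'}
```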
timer class
Helper class to time long-running processes
Basic Usage:
>>> from db_utils.timer import timer
>>>
>>> t = timer()
>>> t.lap('s')
5.469961
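A timer with this interface can be built on time.perf_counter. A minimal sketch, assuming lap('s') returns seconds elapsed since construction (as the output above suggests; the 'ms' unit is an illustrative extension):

```python
import time

class Timer:
    """Minimal stopwatch: lap('s') returns seconds elapsed since start."""
    def __init__(self):
        self.start = time.perf_counter()

    def lap(self, unit="s"):
        elapsed = time.perf_counter() - self.start
        return elapsed * 1000 if unit == "ms" else elapsed

t = Timer()
time.sleep(0.01)
print(round(t.lap("s"), 3))
```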