
redshift_tool - Elegant data load from Pandas to Redshift

1.Overview

redshift_tool is a Python package for loading a pandas DataFrame into a Redshift table. It simplifies bulk uploads, where the usual procedure is to generate CSV files, upload them to an S3 bucket, and then run a COPY command on the server; the package wraps all of those steps in encapsulated functions. There are two methods of data copy:

a). Append:- Simply copies the data, appending it to the end of the existing data in a Redshift table.
b). Upsert:- Updates old records that match the provided upsert key(s) and copies new records into the table.

redshift_tool is implemented purely in Python.
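
For context, here is a minimal sketch of the append flow the package automates, assuming boto3 and psycopg2 as the underlying clients. The function name and all credentials below are illustrative, not redshift_tool's internals:

import io
import boto3
import psycopg2

def manual_append(df, bucket, key, table, conn_params, s3_creds):
    # 1. Serialize the DataFrame to an in-memory CSV.
    buf = io.StringIO()
    df.to_csv(buf, index=False, header=False)
    # 2. Stage the CSV file in the S3 bucket.
    boto3.client('s3').put_object(Bucket=bucket, Key=key, Body=buf.getvalue())
    # 3. Ask Redshift to bulk-load the staged file with COPY.
    with psycopg2.connect(**conn_params) as conn, conn.cursor() as cur:
        cur.execute(
            "COPY %s FROM 's3://%s/%s' "
            "CREDENTIALS 'aws_access_key_id=%s;aws_secret_access_key=%s' CSV;"
            % (table, bucket, key, s3_creds['accesskey'], s3_creds['secretkey'])
        )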

2.Installation

To install the library, use the command below:
$ pip install redshift_tool

Note

During installation of the package, verify that all required dependencies installed successfully; if any did not, install them manually one by one.
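
A quick way to confirm the imports work is a loop like the following; the dependency list below is an assumption, so consult the package metadata for the authoritative set:

# Sanity check that common dependencies import cleanly.
# The exact dependency set is an assumption; see the package metadata.
for pkg in ('redshift_tool', 'pandas', 'boto3', 'psycopg2'):
    try:
        __import__(pkg)
        print('%s: OK' % pkg)
    except ImportError as exc:
        print('%s: MISSING (%s)' % (pkg, exc))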

3.Usage Guidelines

Usage:
>>> import redshift_tool
>>> redshift_tool.query(data,method,redshift_auth=None,s3_auth=None,schema=None,table=None,
                        primarykey=None,sortkey=None,distkey=None,upsertkey=None)

a). data:- Accepts any pandas DataFrame.

>>> data=df
b). method:- There are two methods of writing a pandas DataFrame, as defined above: 'append' or 'upsert'.
>>> method='append' or
>>> method='upsert'
c). redshift_auth:- To write data into Redshift, a connection must be established. This parameter takes the connection credentials (5439 below is Redshift's default port; substitute your cluster's port).
>>> redshift_auth={'db':'database_name','port':5439,'user':'user','pswd':'password','host':'host'}
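
Before a large load it can help to verify these credentials independently. A minimal connectivity check, assuming the psycopg2 driver (redshift_tool's internal driver is not documented here):

# Minimal Redshift connectivity check; psycopg2 is an assumption,
# and all credentials are placeholders. 5439 is Redshift's default port.
import psycopg2

conn = psycopg2.connect(dbname='database_name', host='host', port=5439,
                        user='user', password='password')
cur = conn.cursor()
cur.execute('SELECT 1;')
print(cur.fetchone())  # (1,) confirms the connection works
conn.close()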
d). s3_auth:- AWS S3 is used to stage the data and enhance the performance of the copy operation. This parameter takes the AWS S3 credentials and the S3 bucket name; the bucket is where files are placed temporarily before being copied into Redshift tables.
>>> s3_auth={'accesskey':'aws_access_key','secretkey':'aws_secret_key','bucket':'s3_bucket_name'}
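
The S3 credentials and bucket can be verified the same way with boto3 (a sketch; the credentials and bucket name are placeholders):

# Verify that the S3 credentials and bucket are reachable before loading.
import boto3

s3 = boto3.client('s3', aws_access_key_id='aws_access_key',
                  aws_secret_access_key='aws_secret_key')
s3.head_bucket(Bucket='s3_bucket_name')  # raises ClientError if unreachable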
e). schema:- The schema name, if any, associated with your database in which the table will be created.
>>> schema='Schema_name'
f). table:- Target table name to which the DataFrame is written.
  1. If the target table already exists, the function copies/upserts data into the existing table.
  2. If the target table does not exist, a user can proceed in either of two ways.

2.1. Use a SQL CREATE statement to create the target table manually.

2.2. Let this library create the target table from the input DataFrame's columns and datatypes; before using the command, make sure all the column names and dtypes of the DataFrame are set properly (see the dtype sketch below).

>>> table='table_name'
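
For example, to prepare the DataFrame for automatic table creation, the column dtypes can be set explicitly beforehand (the column names and values below are illustrative):

# Set column names and dtypes explicitly so the generated table schema
# matches expectations; all columns here are illustrative.
import pandas

df = pandas.DataFrame({'id': ['1', '2'], 'amount': ['9.5', '3.2'],
                       'created_at': ['2019-01-01', '2019-01-02']})
df = df.astype({'id': 'int64', 'amount': 'float64'})
df['created_at'] = pandas.to_datetime(df['created_at'])
print(df.dtypes)  # verify before passing df to redshift_tool.query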
g). primarykey:- A primary key is a special relational database table column (or combination of columns) designated to uniquely identify each table record.

If, when the library creates the table, a column needs to be defined as the primary key, pass the column name in a tuple in this parameter (note the trailing comma required for a single-element tuple):

>>> primarykey=('Primary_Key',) or
>>> primarykey=('Primary_Key1','Primary_Key2')
h). sortkey:- A sort key is a column (or set of columns) used to sort the data. It can be a single key or multiple keys.

If, when the library creates the table, a column needs to be defined as a sort key, pass the column name in a tuple in this parameter:

>>> sortkey=('sort_key',) or
>>> sortkey=('sort_key1','sort_key2')
i). distkey (default: EVEN):- A distribution key is a column used to determine how rows are distributed across the available Redshift slices for parallel processing.
>>> distkey=('distribution_key',)
j). upsertkey:- For the upsert method of data loading, pass the upsert key(s) by which old records are updated and new records are added. It is also passed as a tuple.
>>> upsertkey=('upsertkey',) or
>>> upsertkey=('upsertkey1','upsertkey2')
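
For reference, the upsert this parameter drives is the classic Redshift staging-table pattern: delete target rows that match the upsert key, then insert the staged rows. A sketch of that pattern with illustrative table names (redshift_tool's exact statements may differ):

# Classic Redshift staging-table upsert; table names are illustrative
# and this may differ from redshift_tool's exact SQL.
import psycopg2

conn = psycopg2.connect(dbname='database_name', host='host', port=5439,
                        user='user', password='password')
with conn, conn.cursor() as cur:
    # Remove old rows that share an upsert key with the staged data ...
    cur.execute("DELETE FROM target_table USING staging_table "
                "WHERE target_table.upsertkey = staging_table.upsertkey;")
    # ... then insert everything from the staging table.
    cur.execute("INSERT INTO target_table SELECT * FROM staging_table;")
conn.close()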

4.Examples

Append or Copy data without primarykey, sortkey, distributionkey

Eg.
>>> import pandas
>>> import redshift_tool
>>> df=pandas.DataFrame()
>>> redshift_tool.query(data=df,method='append',
    redshift_auth={'db':'database_name','port':5439,'user':'user','pswd':'password','host':'host'},
    s3_auth={'accesskey':'aws_access_key','secretkey':'aws_secret_key','bucket':'s3_bucket_name'},
    schema='schema_name',table='redshift_table_name')

Append or Copy data with primarykey, sortkey, distributionkey

Eg.
>>> import pandas
>>> import redshift_tool
>>> df=pandas.DataFrame()
>>> redshift_tool.query(data=df,method='append',
    redshift_auth={'db':'database_name','port':5439,'user':'user','pswd':'password','host':'host'},
    s3_auth={'accesskey':'aws_access_key','secretkey':'aws_secret_key','bucket':'s3_bucket_name'},
    schema='schema_name',table='redshift_table_name',primarykey=('primarykey',),
    sortkey=('sortkey',),distkey=('distributionkey',))

Upsert data without primarykey, sortkey, distributionkey

Eg.
>>> import pandas
>>> import redshift_tool
>>> df=pandas.DataFrame()
>>> redshift_tool.query(data=df,method='upsert',
    redshift_auth={'db':'database_name','port':5439,'user':'user','pswd':'password','host':'host'},
    s3_auth={'accesskey':'aws_access_key','secretkey':'aws_secret_key','bucket':'s3_bucket_name'},
    schema='schema_name',table='redshift_table_name',upsertkey=('upsertkey',))

Upsert data with primarykey, sortkey, distributionkey

Eg.
>>> import pandas
>>> import redshift_tool
>>> df=pandas.DataFrame()
>>> redshift_tool.query(data=df,method='upsert',
    redshift_auth={'db':'database_name','port':5439,'user':'user','pswd':'password','host':'host'},
    s3_auth={'accesskey':'aws_access_key','secretkey':'aws_secret_key','bucket':'s3_bucket_name'},
    schema='schema_name',table='redshift_table_name',primarykey=('primarykey',),
    sortkey=('sortkey',),distkey=('distributionkey',),upsertkey=('upsertkey',))
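
After any of these loads, the result can be spot-checked by reading a row count back; a small sketch using pandas.read_sql over a psycopg2 connection (both are assumptions, not part of the package):

# Spot-check the load by counting rows in the target table.
# pandas.read_sql and psycopg2 are assumptions; credentials are placeholders.
import pandas
import psycopg2

conn = psycopg2.connect(dbname='database_name', host='host', port=5439,
                        user='user', password='password')
print(pandas.read_sql('SELECT COUNT(*) FROM redshift_table_name;', conn))
conn.close()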

5.Support

Operating systems: Linux / OSX / Windows
Python versions: 2, 2.7, 3, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7

6.References

Many thanks to the developers of the packages this library depends on.

