Elegant data load from Pandas to Redshift
Project description
redshift_tool - Elegant data load from Pandas to Redshift
1. Overview
redshift_tool is a Python package for loading a pandas DataFrame into a Redshift table. It simplifies bulk uploads: the usual procedure of writing the data out as CSV files, uploading them to an S3 bucket, and then running a COPY command on the cluster is wrapped in encapsulated functions. Two copy methods are supported: append and upsert.
redshift_tool is implemented purely in Python.
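For context, the manual flow that the package wraps looks roughly like the sketch below. This is only an illustration, not part of the package's API; it assumes boto3 and psycopg2 as the S3 and Redshift clients, and every credential, bucket, schema, and table name is a placeholder.

>>> import io, boto3, psycopg2, pandas
>>> df = pandas.DataFrame({'id': [1, 2], 'name': ['a', 'b']})
>>> buf = io.StringIO()                          # 1. dump the frame to CSV in memory
>>> df.to_csv(buf, index=False, header=False)
>>> s3 = boto3.client('s3', aws_access_key_id='aws_access_key', aws_secret_access_key='aws_secret_key')
>>> s3.put_object(Bucket='s3_bucket_name', Key='staging/data.csv', Body=buf.getvalue())   # 2. stage the CSV in S3
>>> conn = psycopg2.connect(dbname='database_name', user='user', password='password', host='host', port=5439)
>>> cur = conn.cursor()
>>> cur.execute("COPY schema_name.table_name FROM 's3://s3_bucket_name/staging/data.csv' "
...             "CREDENTIALS 'aws_access_key_id=aws_access_key;aws_secret_access_key=aws_secret_key' CSV")   # 3. Redshift pulls the staged file
>>> conn.commit()
>>> conn.close()

redshift_tool performs these steps for you through a single call.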
2. Installation
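The package is published on PyPI under the name shown in the distribution files below, so it can be installed with pip:

pip install redshift_tool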
3. Usage Guidelines
- Usage command
>>> import redshift_tool
>>> redshift_tool.query(data, method, redshift_auth=None, s3_auth=None, schema=None, table=None, primarykey=None, sortkey=None, distkey=None, upsertkey=None)
a). data: any pandas DataFrame.
>>> data = df
b). method: either 'append' or 'upsert'.
>>> method='append' or >>> method='upsert'
c). redshift_auth: a dict with the Redshift connection credentials.
>>> redshift_auth={'db':'database_name','port':port,'user':'user','pswd':'password','host':'host'}
d). s3_auth: a dict with the AWS access key, secret key, and the S3 bucket used for staging.
>>> s3_auth={'accesskey':'aws_access_key','secretkey':'aws_secret_key','bucket':'s3_bucket_name'}
e). schema: the target Redshift schema name.
>>> schema='schema_name'
If the target table already exists, the function copies/upserts the data into that existing table. If the target table does not exist, the user can proceed in one of two ways:
2.1. Use a SQL CREATE statement to create the target table manually.
2.2. Let this library create the target table from the column names and data types of the input pandas DataFrame; before using the command, make sure all column names and dtypes of the DataFrame are set properly (see the sketch below).
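For instance, column names and dtypes can be fixed up explicitly before calling the function; the column names and types here are purely illustrative:

>>> import pandas
>>> df = df.rename(columns={'ID': 'id', 'Created At': 'created_at'})   # Redshift-friendly column names
>>> df['id'] = df['id'].astype('int64')                                # becomes an integer column
>>> df['created_at'] = pandas.to_datetime(df['created_at'])            # becomes a timestamp column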
f). table: the target Redshift table name.
>>> table='table_name'
g). primarykey: if the library is creating the table and a column (or columns) should be defined as the primary key, pass the column name(s) in a tuple in this parameter.
>>> primarykey=('Primary_Key',) or >>> primarykey=('Primary_Key1','Primary_Key2')
h). sortkey: if the library is creating the table and a column (or columns) should be defined as a sort key, pass the column name(s) in a tuple in this parameter.
>>> sortkey=('sort_key',) or >>> sortkey=('sort_key1','sort_key2')
i). distkey: likewise, to define a column as the distribution key, pass the column name in a tuple.
>>> distkey=('distribution_key',)
j). upsertkey: for method='upsert', the key column(s) used to match existing rows, passed as a tuple.
>>> upsertkey=('upsertkey',) or >>> upsertkey=('upsertkey1','upsertkey2')
4. Examples
Append or copy data without primarykey, sortkey, or distribution key
- Eg.
>>> import redshift_tool
>>> import pandas
>>> df = pandas.DataFrame()
>>> redshift_tool.query(data=df, method='append',
...                     redshift_auth={'db':'database_name','port':port,'user':'user','pswd':'password','host':'host'},
...                     s3_auth={'accesskey':'aws_access_key','secretkey':'aws_secret_key','bucket':'s3_bucket_name'},
...                     schema='schema_name', table='redshift_table_name')
Append or copy data with primarykey, sortkey, and distribution key
- Eg.
>>> import redshift_tool
>>> import pandas
>>> df = pandas.DataFrame()
>>> redshift_tool.query(data=df, method='append',
...                     redshift_auth={'db':'database_name','port':port,'user':'user','pswd':'password','host':'host'},
...                     s3_auth={'accesskey':'aws_access_key','secretkey':'aws_secret_key','bucket':'s3_bucket_name'},
...                     schema='schema_name', table='redshift_table_name',
...                     primarykey=('primarykey',), sortkey=('sortkey',), distkey=('distributionkey',))
Upsert data without primarykey, sortkey, or distribution key
- Eg.
>>> import redshift_tool
>>> import pandas
>>> df = pandas.DataFrame()
>>> redshift_tool.query(data=df, method='upsert',
...                     redshift_auth={'db':'database_name','port':port,'user':'user','pswd':'password','host':'host'},
...                     s3_auth={'accesskey':'aws_access_key','secretkey':'aws_secret_key','bucket':'s3_bucket_name'},
...                     schema='schema_name', table='redshift_table_name', upsertkey=('upsertkey',))
Upsert data with primarykey, sortkey, and distribution key
- Eg.
>>> import redshift_tool
>>> import pandas
>>> df = pandas.DataFrame()
>>> redshift_tool.query(data=df, method='upsert',
...                     redshift_auth={'db':'database_name','port':port,'user':'user','pswd':'password','host':'host'},
...                     s3_auth={'accesskey':'aws_access_key','secretkey':'aws_secret_key','bucket':'s3_bucket_name'},
...                     schema='schema_name', table='redshift_table_name',
...                     primarykey=('primarykey',), sortkey=('sortkey',), distkey=('distributionkey',), upsertkey=('upsertkey',))
5. Support
- Operating System: Linux / OSX / Windows
- Python Version: 2, 2.7, 3, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7
6. References
Download files
Source Distribution: redshift_tool-0.4.tar.gz
Built Distribution: redshift_tool-0.4-py3-none-any.whl
File details
Details for the file redshift_tool-0.4.tar.gz.
File metadata
- Download URL: redshift_tool-0.4.tar.gz
- Upload date:
- Size: 5.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.21.0 setuptools/40.6.3 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/3.7.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | 115d9b4f73a3f0bd979fb6ea8f06a796ed9cbd03c905295c7bc12becdac7fb83
MD5 | ec83b36f1b34d29a3c19a63a2952e022
BLAKE2b-256 | 3295e2a68463b8a4e6e0cfef02a18ba832cbc3df55afafe022dea8730f0b7306
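To confirm that a downloaded archive matches the published digest, a quick check along these lines can be used (assuming the file sits in the current working directory):

>>> import hashlib
>>> hashlib.sha256(open('redshift_tool-0.4.tar.gz', 'rb').read()).hexdigest()
'115d9b4f73a3f0bd979fb6ea8f06a796ed9cbd03c905295c7bc12becdac7fb83'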
File details
Details for the file redshift_tool-0.4-py3-none-any.whl.
File metadata
- Download URL: redshift_tool-0.4-py3-none-any.whl
- Upload date:
- Size: 6.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.21.0 setuptools/40.6.3 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/3.7.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | ef70b4d7220f94181ed97800d1ae1369037f9deae110b349c85828de15749e9c
MD5 | cb6f4ee8e89783e24ad5615015ab201f
BLAKE2b-256 | 27b7a5001d029e3dfcb3e1270be1627bdf45f4845ed65e92b22718e6da0dc181