A tool for exporting Pandas dataframes to Redshift tables
Project description
Pandas2Redshift
This is a utility library for uploading a Pandas DataFrame to Amazon Redshift table, utilizing AWS S3 for temporary storage.
Features
- Upload a Pandas DataFrame to a Redshift Table
- Uses the
COPY
command, using S3 as a middleware for fast inserts on Redshift - Can create the table for you based on a Dict containing the datatypes or generates it automatically based on the pandas datatypes of the dataframe
Installation
Install the package using pip:
pip install pandas2redshift
Usage
Insert Data into Redshift
Insert data from a DataFrame into a Redshift table:
import pandas as pd
from sqlalchemy import create_engine
import pandas2redshift as p2r
data = pd.DataFrame({'col1': [1, 2], 'col2': ['a', 'b']})
engine = create_engine('redshift+psycopg2://user:password@host:port/dbname')
with engine.connect() as conn:
p2r.insert(
data=data,
table_name='my_table',
schema='public',
conn=conn,
aws_access_key='YOUR_AWS_ACCESS_KEY',
aws_secret_key='YOUR_AWS_SECRET_KEY',
aws_bucket_name='YOUR_S3_BUCKET_NAME',
)
You can enhance the functionality of the insert
function with several optional arguments:
ensure_exists (bool, optional)
: Checks if the schema and table you are inserting data into exist in the database. If they do not exist, it creates them. Defaults toFalse
.truncate_table (bool, optional)
: When set toTrue
, truncates the target table before inserting the data. Defaults to False.table_data_types (Dict[str, str], optional)
: A dictionary specifying column names and their data types for table creation. If not provided, it infers the data types based on pandas dtypes and the mapping defined in thepandas_to_redshift_datatypes
function. Defaults toNone
.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
pandas2redshift-0.0.4.tar.gz
(4.2 kB
view details)
Built Distribution
File details
Details for the file pandas2redshift-0.0.4.tar.gz
.
File metadata
- Download URL: pandas2redshift-0.0.4.tar.gz
- Upload date:
- Size: 4.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.10.13
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | d889b6d4712295ec4fbaa3145db4ccba080c546451e83503981a0839456f3e5f |
|
MD5 | 295b8b750f5a08b6b08a88a6bd91479f |
|
BLAKE2b-256 | bd3ca41242a2893cb5620aa99c06b6eb3c9e5889698404b1f2cab2a88ee6c00a |
File details
Details for the file pandas2redshift-0.0.4-py3-none-any.whl
.
File metadata
- Download URL: pandas2redshift-0.0.4-py3-none-any.whl
- Upload date:
- Size: 4.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.10.13
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | e5cadb14c8b9b90d6458036ca7b30a6757b7f18f1c0f3c2b217fe5ee7aebe3c4 |
|
MD5 | 9586a2e3a94462ae68cfdbf57968f566 |
|
BLAKE2b-256 | 3095d949782b978903fdc17c647860e473a315f8e13a1449741ca686783e6eac |