A tool for exporting Pandas dataframes to Redshift tables
Project description
Pandas2Redshift
This is a utility library for uploading a Pandas DataFrame to Amazon Redshift table, utilizing AWS S3 for temporary storage.
Features
- Upload a Pandas DataFrame to a Redshift Table
- Uses the
COPY
command, using S3 as a middleware for fast inserts on Redshift - Can create the table for you based on a Dict containing the datatypes or generates it automatically based on the pandas datatypes of the dataframe
Installation
Install the package using pip:
pip install pandas2redshift
Usage
Insert Data into Redshift
Insert data from a DataFrame into a Redshift table:
import pandas as pd
from sqlalchemy import create_engine
import pandas2redshift as p2r
data = pd.DataFrame({'col1': [1, 2], 'col2': ['a', 'b']})
engine = create_engine('redshift+psycopg2://user:password@host:port/dbname')
with engine.connect() as conn:
p2r.insert(
data=data,
table_name='my_table',
schema='public',
conn=conn,
aws_access_key='YOUR_AWS_ACCESS_KEY',
aws_secret_key='YOUR_AWS_SECRET_KEY',
aws_bucket_name='YOUR_S3_BUCKET_NAME',
)
IMPORTANT: If your data does not appear in the target table after executing the example above, it may be because SQLAlchemy is not committing your operations to Redshift. To resolve this issue, create the engine as follows:
engine = (
create_engine('redshift+psycopg2://user:password@host:port/dbname')
.execution_options(
autocommit=True, isolation_level="AUTOCOMMIT"
)
)
You can enhance the functionality of the insert
function with several optional arguments:
ensure_exists (bool, optional)
: Checks if the schema and table you are inserting data into exist in the database. If they do not exist, it creates them. Defaults toFalse
.truncate_table (bool, optional)
: When set toTrue
, truncates the target table before inserting the data. Defaults to False.table_data_types (Dict[str, str], optional)
: A dictionary specifying column names and their data types for table creation. If not provided, it infers the data types based on pandas dtypes and the mapping defined in thepandas_to_redshift_datatypes
function. Defaults toNone
.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file pandas2redshift-0.0.5.tar.gz
.
File metadata
- Download URL: pandas2redshift-0.0.5.tar.gz
- Upload date:
- Size: 4.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.9
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 416b6025ffbc4f661990ac9226a588349b0c001a9c49c9a9885a8a7d05c0f751 |
|
MD5 | 88d1e8eb12d571baf306b5be305163f4 |
|
BLAKE2b-256 | 0bc8fea23ec48e5a1ecbdecff0c940a7cb62b8fed84befd7953500d0cd1b3b92 |
File details
Details for the file pandas2redshift-0.0.5-py3-none-any.whl
.
File metadata
- Download URL: pandas2redshift-0.0.5-py3-none-any.whl
- Upload date:
- Size: 4.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.9
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 3d488460af9ae0ee265b8cbb5c00a9b3f5e0c87082db081c4074e15003b32007 |
|
MD5 | 7d7552b2a4cfaf4d6a507dd7d6b752e5 |
|
BLAKE2b-256 | f7fccac3c43db8e8487c64868eceb204f6bc0b96e7f868420614667e0b8665f8 |