A tool for exporting Pandas dataframes to Redshift tables

Project description

Pandas2Redshift

This is a utility library for uploading a Pandas DataFrame to an Amazon Redshift table, using AWS S3 as temporary storage.

Features

  • Upload a Pandas DataFrame to a Redshift table
  • Uses the Redshift COPY command, with S3 as intermediate storage, for fast bulk inserts
  • Can create the target table for you, either from a dict of column data types that you supply or automatically from the DataFrame's pandas dtypes

Installation

Install the package using pip:

pip install pandas2redshift

Usage

Insert Data into Redshift

Insert data from a DataFrame into a Redshift table:

import pandas as pd
from sqlalchemy import create_engine
import pandas2redshift as p2r

data = pd.DataFrame({'col1': [1, 2], 'col2': ['a', 'b']})
engine = create_engine('redshift+psycopg2://user:password@host:port/dbname')

with engine.connect() as conn:
    p2r.insert(
        data=data,
        table_name='my_table',
        schema='public',
        conn=conn,
        aws_access_key='YOUR_AWS_ACCESS_KEY',
        aws_secret_key='YOUR_AWS_SECRET_KEY',
        aws_bucket_name='YOUR_S3_BUCKET_NAME',
    )

IMPORTANT: If your data does not appear in the target table after executing the example above, it may be because SQLAlchemy is not committing your operations to Redshift. To resolve this issue, create the engine as follows:

engine = (
    create_engine('redshift+psycopg2://user:password@host:port/dbname')
    .execution_options(
        autocommit=True, isolation_level="AUTOCOMMIT"
    )
)

The insert function also accepts several optional arguments:

  • ensure_exists (bool, optional): Checks whether the target schema and table exist in the database and creates them if they do not. Defaults to False.
  • truncate_table (bool, optional): When set to True, truncates the target table before inserting the data. Defaults to False.
  • table_data_types (Dict[str, str], optional): A dictionary mapping column names to the data types used when creating the table. If not provided, the data types are inferred from the pandas dtypes using the mapping defined in the pandas_to_redshift_datatypes function. Defaults to None.
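For example, a minimal sketch combining these options (the connection details are the same placeholders used above, and the Redshift type names in table_data_types are illustrative, not required values):

import pandas as pd
from sqlalchemy import create_engine
import pandas2redshift as p2r

data = pd.DataFrame({'col1': [1, 2], 'col2': ['a', 'b']})
engine = create_engine('redshift+psycopg2://user:password@host:port/dbname')

with engine.connect() as conn:
    p2r.insert(
        data=data,
        table_name='my_table',
        schema='public',
        conn=conn,
        aws_access_key='YOUR_AWS_ACCESS_KEY',
        aws_secret_key='YOUR_AWS_SECRET_KEY',
        aws_bucket_name='YOUR_S3_BUCKET_NAME',
        ensure_exists=True,    # create the schema and table if they do not exist
        truncate_table=True,   # empty the table before loading
        table_data_types={     # explicit column types (illustrative values)
            'col1': 'INTEGER',
            'col2': 'VARCHAR(10)',
        },
    )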

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pandas2redshift-0.0.5.tar.gz (4.2 kB)

Uploaded Source

Built Distribution

pandas2redshift-0.0.5-py3-none-any.whl (4.6 kB)

Uploaded Python 3

File details

Details for the file pandas2redshift-0.0.5.tar.gz.

File metadata

  • Download URL: pandas2redshift-0.0.5.tar.gz
  • Upload date:
  • Size: 4.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.9

File hashes

Hashes for pandas2redshift-0.0.5.tar.gz

  • SHA256: 416b6025ffbc4f661990ac9226a588349b0c001a9c49c9a9885a8a7d05c0f751
  • MD5: 88d1e8eb12d571baf306b5be305163f4
  • BLAKE2b-256: 0bc8fea23ec48e5a1ecbdecff0c940a7cb62b8fed84befd7953500d0cd1b3b92

See more details on using hashes here.
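As an illustration, a minimal Python sketch (assuming the sdist has already been downloaded into the current directory) that compares the file against the SHA256 digest listed above:

import hashlib

# Compare the downloaded archive against the SHA256 digest published above.
expected = "416b6025ffbc4f661990ac9226a588349b0c001a9c49c9a9885a8a7d05c0f751"
with open("pandas2redshift-0.0.5.tar.gz", "rb") as f:
    actual = hashlib.sha256(f.read()).hexdigest()
print("OK" if actual == expected else "hash mismatch")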

File details

Details for the file pandas2redshift-0.0.5-py3-none-any.whl.

File hashes

Hashes for pandas2redshift-0.0.5-py3-none-any.whl

  • SHA256: 3d488460af9ae0ee265b8cbb5c00a9b3f5e0c87082db081c4074e15003b32007
  • MD5: 7d7552b2a4cfaf4d6a507dd7d6b752e5
  • BLAKE2b-256: f7fccac3c43db8e8487c64868eceb204f6bc0b96e7f868420614667e0b8665f8

See more details on using hashes here.
