Skip to main content

A tool for exporting Pandas dataframes to Redshift tables

Project description

Pandas2Redshift

This is a utility library for uploading a Pandas DataFrame to Amazon Redshift table, utilizing AWS S3 for temporary storage.

Features

  • Upload a Pandas DataFrame to a Redshift Table
  • Uses the COPY command, using S3 as a middleware for fast inserts on Redshift
  • Can create the table for you based on a Dict containing the datatypes or generates it automatically based on the pandas datatypes of the dataframe

Installation

Install the package using pip:

pip install pandas2redshift

Usage

Insert Data into Redshift

Insert data from a DataFrame into a Redshift table:

import pandas as pd
from sqlalchemy import create_engine
import pandas2redshift as p2r

data = pd.DataFrame({'col1': [1, 2], 'col2': ['a', 'b']})
engine = create_engine('redshift+psycopg2://user:password@host:port/dbname')

with engine.connect() as conn:
    p2r.insert(
        data=data,
        table_name='my_table',
        schema='public',
        conn=conn,
        aws_access_key='YOUR_AWS_ACCESS_KEY',
        aws_secret_key='YOUR_AWS_SECRET_KEY',
        aws_bucket_name='YOUR_S3_BUCKET_NAME',
    )

You can enhance the functionality of the insert function with several optional arguments:

  • ensure_exists (bool, optional): Checks if the schema and table you are inserting data into exist in the database. If they do not exist, it creates them. Defaults to False.
  • truncate_table (bool, optional): When set to True, truncates the target table before inserting the data. Defaults to False.
  • table_data_types (Dict[str, str], optional): A dictionary specifying column names and their data types for table creation. If not provided, it infers the data types based on pandas dtypes and the mapping defined in the pandas_to_redshift_datatypes function. Defaults to None.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pandas2redshift-0.0.4.tar.gz (4.2 kB view details)

Uploaded Source

Built Distribution

pandas2redshift-0.0.4-py3-none-any.whl (4.4 kB view details)

Uploaded Python 3

File details

Details for the file pandas2redshift-0.0.4.tar.gz.

File metadata

  • Download URL: pandas2redshift-0.0.4.tar.gz
  • Upload date:
  • Size: 4.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.10.13

File hashes

Hashes for pandas2redshift-0.0.4.tar.gz
Algorithm Hash digest
SHA256 d889b6d4712295ec4fbaa3145db4ccba080c546451e83503981a0839456f3e5f
MD5 295b8b750f5a08b6b08a88a6bd91479f
BLAKE2b-256 bd3ca41242a2893cb5620aa99c06b6eb3c9e5889698404b1f2cab2a88ee6c00a

See more details on using hashes here.

File details

Details for the file pandas2redshift-0.0.4-py3-none-any.whl.

File metadata

File hashes

Hashes for pandas2redshift-0.0.4-py3-none-any.whl
Algorithm Hash digest
SHA256 e5cadb14c8b9b90d6458036ca7b30a6757b7f18f1c0f3c2b217fe5ee7aebe3c4
MD5 9586a2e3a94462ae68cfdbf57968f566
BLAKE2b-256 3095d949782b978903fdc17c647860e473a315f8e13a1449741ca686783e6eac

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page