Skip to main content

A tool for exporting Pandas dataframes to Redshift tables

Reason this release was yanked:

Do not use this version, it doesn't work

Project description

Pandas2Redshift

This is a utility library for uploading a Pandas DataFrame to Amazon Redshift table, utilizing AWS S3 for temporary storage.

Features

  • Upload a Pandas DataFrame to a Redshift Table
  • Uses the COPY command, using S3 as a middleware for fast inserts on Redshift
  • Can create the table for you based on a Dict containing the datatypes or generates it automatically based on the pandas datatypes of the dataframe

Installation

Install the package using pip:

pip install pandas2redshift

Usage

Insert Data into Redshift

Insert data from a DataFrame into a Redshift table:

import pandas as pd
from sqlalchemy import create_engine
import pandas2redshift as p2r

data = pd.DataFrame({'col1': [1, 2], 'col2': ['a', 'b']})
engine = create_engine('redshift+psycopg2://user:password@host:port/dbname')

with engine.connect() as conn:
    p2r.insert(
        data=data,
        table_name='my_table',
        schema='public',
        conn=conn,
        aws_access_key='YOUR_AWS_ACCESS_KEY',
        aws_secret_key='YOUR_AWS_SECRET_KEY',
        aws_bucket_name='YOUR_S3_BUCKET_NAME',
    )

You can enhance the functionality of the insert function with several optional arguments:

  • ensure_exists (bool, optional): Checks if the schema and table you are inserting data into exist in the database. If they do not exist, it creates them. Defaults to False.
  • truncate_table (bool, optional): When set to True, truncates the target table before inserting the data. Defaults to False.
  • table_data_types (Dict[str, str], optional): A dictionary specifying column names and their data types for table creation. If not provided, it infers the data types based on pandas dtypes and the mapping defined in the pandas_to_redshift_datatypes function. Defaults to None.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pandas2redshift-0.0.3.tar.gz (4.2 kB view details)

Uploaded Source

Built Distribution

pandas2redshift-0.0.3-py3-none-any.whl (4.4 kB view details)

Uploaded Python 3

File details

Details for the file pandas2redshift-0.0.3.tar.gz.

File metadata

  • Download URL: pandas2redshift-0.0.3.tar.gz
  • Upload date:
  • Size: 4.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.10.13

File hashes

Hashes for pandas2redshift-0.0.3.tar.gz
Algorithm Hash digest
SHA256 0d55f1bfcb527771c639afa7a58dcbae845da26f4046c9a5bbc59f0365ffe1e9
MD5 82e478b6e355309ba475ef3a6a228490
BLAKE2b-256 57e084c5f9cf1676491490f6604194c33658c9c53132abc32cd94867c44a943e

See more details on using hashes here.

File details

Details for the file pandas2redshift-0.0.3-py3-none-any.whl.

File metadata

File hashes

Hashes for pandas2redshift-0.0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 d91b5aa5765d71747411f04da0098f334d2c4d5b03b08c19b2e849fb4976e56f
MD5 a0b7c26ab0a9505ea4dc70ce46f926e3
BLAKE2b-256 0fd525642182c3b70698329705c5006f57f0a2b2a25a6fd08d5f7047dc641e85

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page