Skip to main content

Postgres insert update with pandas DataFrames.

Project description

CircleCI codecov

pangres

Postgres upsert with pandas DataFrames (ON CONFLICT DO NOTHING or ON CONFLICT DO UPDATE) with some additional optional features:

  1. Create columns in DataFrame to upsert that do not yet exist in the postgres database
  2. Alter column data types in postgres for empty columns that do not match the data types of the DataFrame to upsert.

IMPORTANT

Characters "(", ")" and "%" may cause issues in column names. The issue seems to be directly related to psycopg2 (Python library for interacting with PostgreSQL databases). There is an option in the main function pangres.pg_upsert to remove those characters automatically (set clean_column_names to True), see Usage.

Notes

This is a library I was using in production in private with very good results and decided to publish.

Ideally such features will be integrated into pandas since there is already a PR on the way) and I would like to give the option to add columns via another PR.

In the meantime pangres is here and I think the data type alteration for empty columns is probably not something for pandas.

There is also pandabase which does almost the same thing but my implementation is different.

Btw big thanks to pandabase and the sql part of pandas which helped a lot.

Installation

  1. Clone the repository
git clone https://github.com/ThibTrip/pangres.git
  1. Pip install the cloned repository. Your current working directory should be the directory of the cloned repository (where you see the file "setup.py").
pip install .

Usage

The index of the given DataFrame is used as primary key when creating a table using pandas_pg_upsert. Further details in the docstring of the function pg_upsert.

import pandas as pd
from sqlalchemy import create_engine
from pangres import pg_upsert

# configure schema, table_name and engine
schema = 'tests'
table_name = 'pg_upsert_test'
engine = create_engine('postgresql://user:password@localhost:5432/mydatabase')

# create/get data
df = pd.DataFrame({'profileid':[0,1],
                    'favorite_fruit':['banana','apple']})
df.set_index('profileid', inplace = True)

# create or update table
# if_exists = 'upsert_overwrite' makes a ON CONFLICT DO UPDATE
# if_exists = 'upsert_keep' makes a ON CONFLICT DO NOTHING
# this option does not matter for table creation
pg_upsert(engine=engine,
          df=df,
          schema=schema,
          table_name=table_name,
          if_exists='upsert_overwrite',
          # next option will remove ")", "(" and "%"
          # if those characters are present in the column names
          # as those characters may cause issues with psycopg2
          clean_column_names=True)

Contributing

Pull requests/issues are welcome.

Testing

Clone pangres then set your curent working directory to the root of the cloned repository folder.

Then use these commands:

# 1) Create and activate the build environment
conda env create -f environment.yml
conda activate pangres-dev
# 2) Install pangres in editable mode (changes are reflected upon reimporting)
pip install -e .
# 3) Run pytest (--cov=./pangres shows coverage only for pangres)
pytest pangres --cov=./pangres

Project details


Release history Release notifications

This version

1.1

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Files for pangres, version 1.1
Filename, size File type Python version Upload date Hashes
Filename, size pangres-1.1.tar.gz (10.1 kB) File type Source Python version None Upload date Hashes View hashes

Supported by

Elastic Elastic Search Pingdom Pingdom Monitoring Google Google BigQuery Sentry Sentry Error logging AWS AWS Cloud computing DataDog DataDog Monitoring Fastly Fastly CDN DigiCert DigiCert EV certificate StatusPage StatusPage Status page