Update, Upsert, and Merge from Python dataframes to SQL Server and Azure SQL database.
Project description
mssql_dataframe
A data engineering package for Python pandas dataframes and Microsoft Transact-SQL. It provides more advanced methods for writting dataframes including update, merge, upsert.
- Update: updates records in SQL table
- Upsert: insert or update records in SQL table
- Merge: update, insert, or delete records in SQL table
These more advanced methods are designed to provide more funcationality than is offered by the pandas.DataFrame.to_sql method. pandas.DataFrame.to_sql offers 3 options if the SQL table already exists with the parameter if_exists={'fail', 'replace', 'append'}
.
See QUICKSTART for a full overview of functionality.
Initialization and Sample SQL Table
import env
import pandas as pd
from mssql_dataframe import SQLServer
# connect to database using pyodbc
sql = SQLServer(database=env.database, server=env.server)
# create a demonstration dataframe
df = pd.DataFrame({
'ColumnA': ['1','2','3','4','5'],
'ColumnB': ['a .','b!',' c','d','e'],
}, index=pd.Index([0, 1, 2, 3, 4], name='PK'))
# create the table using a dataframe
df = sql.create.table_from_dataframe(
table_name='##mssql_dataframe_readme',
dataframe = df,
primary_key = 'index'
)
Update
Records in an SQL table are updated by simply providing a dataframe. By default a match on the SQL table's primary key is required for a record to be updated.
# update records for index 0 & 1
update_df = pd.DataFrame({
'ColumnA': ['11','22'],
'ColumnB': ['A','B'],
}, index=pd.Index([0, 1], name='PK'))
# update data in the SQL table
update_df = sql.write.update('##mssql_dataframe_readme', update_df)
Merge
Records can be inserted/updated/deleted by providing a dataframe to the merge method. Again the primary key in the SQL table is used by default.
- dataframe column value doesn't match SQL column value -> insert record into SQL
- dataframe column value matches SQL column value -> update record in SQL
- SQL column value not in dataframe column -> delete record in SQL
# update existing record for index 0
# insert new record for index 5
# delete missing records for index 1,2,3,4
merge_df = pd.DataFrame({
'ColumnA': ['11','6'],
'ColumnB': ['aa','f'],
}, index=pd.Index([0, 6], name='PK'))
# merge data in the SQL table
merged_df = sql.write.merge('##mssql_dataframe_readme', merge_df)
Upsert
The merge method can be restricted to not delete records in SQL by specifying the upsert flag. Records in SQL are then only inserted or updated.
# update existing record for index 0
# insert new record for index 7
# records not in the dataframe but in SQL won't be deleted
upsert_df = pd.DataFrame({
'ColumnA': ['11','7'],
'ColumnB': ['AA','g'],
}, index=pd.Index([0, 7], name='PK'))
sql.write.merge('##mssql_dataframe_readme', upsert_df, upsert=True)
Installation
pip install mssql-dataframe
Dependancies
pandas: Python DataFrames.
pyodbc: ODBC driver used for executing Transact-SQL statements.
Contributing
See CONTRIBUTING.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for mssql_dataframe-1.5.0-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | c968f3a96ec59650a1207fcb8a538003522d62340b2dca66d9850aac4c45ca86 |
|
MD5 | 41abb9c5c848ddb3216cb14d753313a5 |
|
BLAKE2b-256 | 76490086d6bc4056196fa6e00bae36f48397868bc29b2fa4538840a3fedcf37e |