Update, Upsert, and Merge from Python dataframes to SQL Server and Azure SQL database.
Project description
mssql_dataframe
A data engineering package for Python pandas dataframes and Microsoft Transact-SQL. It provides more advanced methods for writting dataframes including update, merge, upsert.
- Update: updates records in SQL table
- Upsert: insert or update records in SQL table
- Merge: update, insert, or delete records in SQL table
These more advanced methods are designed to provide more funcationality than is offered by the pandas.DataFrame.to_sql method. pandas.DataFrame.to_sql offers 3 options if the SQL table already exists with the parameter if_exists={'fail', 'replace', 'append'}
.
See QUICKSTART for a full overview of functionality.
Initialization and Sample SQL Table
import env
import pandas as pd
from mssql_dataframe import SQLServer
# connect to database using pyodbc
sql = SQLServer(database=env.database, server=env.server)
Update
Records in an SQL table are updated by simply providing a dataframe. By default a match on the SQL table's primary key is required for a record to be updated.
# create demo SQL table
df = sql.create.table(
table_name = '##mssql_update',
columns = {'Column1': 'VARCHAR(10)', 'Column2': 'TINYINT', 'PK': 'CHAR(1)'},
primary_key_column = 'PK'
)
# create a demo dataframe
df = pd.DataFrame({
'Column1': ['A_Initial', 'B_Initial'],
'Column2': [1, 2],
}, index=pd.Index(['A', 'B'], name='PK'))
# perform an initial insert
sql.write.insert('##mssql_update', df)
# update records
update_df = pd.DataFrame({
'Column1': ['A_Updated'],
}, index=pd.Index(['A'], name='PK'))
# update data in the SQL table
update_df = sql.write.update('##mssql_update', update_df)
# validate the result
result = sql.read.table('##mssql_update')
assert result.at['A', 'Column1'] == 'A_Updated'
assert result.at['A', 'Column2'] == 1
assert result.at['B', 'Column1'] == 'B_Initial'
assert result.at['B', 'Column2'] == 2
Merge
Records can be inserted/updated/deleted by providing a dataframe to the merge method. Again the primary key in the SQL table is used by default.
- dataframe column value doesn't match SQL column value -> insert record into SQL
- dataframe column value matches SQL column value -> update record in SQL
- SQL column value not in dataframe column -> delete record in SQL
# create demo SQL table
df = sql.create.table(
table_name = '##mssql_merge',
columns = {'Column1': 'VARCHAR(10)', 'Column2': 'TINYINT', 'PK': 'CHAR(1)'},
primary_key_column = 'PK'
)
# create a demo dataframe
df = pd.DataFrame({
'Column1': ['A_Initial', 'B_Initial'],
'Column2': [1, 2],
}, index=pd.Index(['A', 'B'], name='PK'))
# perform an initial insert
sql.write.insert('##mssql_merge', df)
# perform merge
sql.write.merge(
'##mssql_merge',
pd.DataFrame.from_records([
{'Column1': 'C_New', 'Column2': 3, 'PK': 'C'},
{'Column1': 'B_Updated', 'Column2': 0, 'PK': 'B'},
]).set_index('PK')
)
# validate the results
result = sql.read.table('##mssql_merge')
assert 'A' not in result.index
assert result.at['C', 'Column1'] == 'C_New'
assert result.at['B', 'Column1'] == 'B_Updated'
assert result.at['B', 'Column2'] == 0
Upsert
The merge method can be restricted to not delete records in SQL by specifying the upsert flag. Records in SQL are then only inserted or updated.
# create demo SQL table
df = sql.create.table(
table_name = '##mssql_upsert',
columns = {'Column1': 'VARCHAR(10)', 'Column2': 'TINYINT', 'PK': 'CHAR(1)'},
primary_key_column = 'PK'
)
# create a demo dataframe
df = pd.DataFrame({
'Column1': ['A_Initial', 'B_Initial'],
'Column2': [1, 2],
}, index=pd.Index(['A', 'B'], name='PK'))
# perform an initial insert
sql.write.insert('##mssql_upsert', df)
# perform upsert
sql.write.merge(
'##mssql_upsert',
pd.DataFrame.from_records([
{'Column1': 'C_New', 'Column2': 3, 'PK': 'C'},
{'Column1': 'B_Updated', 'Column2': 0, 'PK': 'B'},
]).set_index('PK'),
upsert = True
)
# validate the results
result = sql.read.table('##mssql_upsert')
assert result.at['A', 'Column1'] == 'A_Initial'
assert result.at['A', 'Column2'] == 1
assert result.at['C', 'Column1'] == 'C_New'
assert result.at['B', 'Column1'] == 'B_Updated'
assert result.at['B', 'Column2'] == 0
Installation
pip install mssql-dataframe
Dependancies
pandas: Python DataFrames.
pyodbc: ODBC driver used for executing Transact-SQL statements.
Contributing
See CONTRIBUTING.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for mssql_dataframe-3.2.0-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | ac8f1168c9e46f91857dcb65f9d4138dda5541a4ac44286be42c1e4d61253c42 |
|
MD5 | 4be62b4163fd08251dbceaa2b8a97347 |
|
BLAKE2b-256 | d6f8f397400225bf23ebd1bed859fec52076c1f06a4d002649b6a58361b54dd1 |