Update, Upsert, and Merge from Python dataframes to SQL Server and Azure SQL database.
Project description
mssql_dataframe
A data engineering package for Python pandas dataframes and Microsoft Transact-SQL. It provides more advanced methods for writting dataframes including update, merge, upsert.
- Update: updates records in SQL table
- Upsert: insert or update records in SQL table
- Merge: update, insert, or delete records in SQL table
These more advanced methods are designed to provide more funcationality than is offered by the pandas.DataFrame.to_sql method. pandas.DataFrame.to_sql offers 3 options if the SQL table already exists with the parameter if_exists={'fail', 'replace', 'append'}
.
See QUICKSTART for a full overview of functionality.
Initialization and Sample SQL Table
import env
import pandas as pd
from mssql_dataframe import SQLServer
# connect to database using pyodbc
sql = SQLServer(database=env.database, server=env.server)
Update
Records in an SQL table are updated by simply providing a dataframe. By default a match on the SQL table's primary key is required for a record to be updated.
# create demo SQL table
df = sql.create.table(
table_name = '##mssql_update',
columns = {'Column1': 'VARCHAR(10)', 'Column2': 'TINYINT', 'PK': 'CHAR(1)'},
primary_key_column = 'PK'
)
# create a demo dataframe
df = pd.DataFrame({
'Column1': ['A_Initial', 'B_Initial'],
'Column2': [1, 2],
}, index=pd.Index(['A', 'B'], name='PK'))
# perform an initial insert
sql.write.insert('##mssql_update', df)
# update records
update_df = pd.DataFrame({
'Column1': ['A_Updated'],
}, index=pd.Index(['A'], name='PK'))
# update data in the SQL table
update_df = sql.write.update('##mssql_update', update_df)
# validate the result
result = sql.read.table('##mssql_update')
assert result.at['A', 'Column1'] == 'A_Updated'
assert result.at['A', 'Column2'] == 1
assert result.at['B', 'Column1'] == 'B_Initial'
assert result.at['B', 'Column2'] == 2
Merge
Records can be inserted/updated/deleted by providing a dataframe to the merge method. Again the primary key in the SQL table is used by default.
- dataframe column value doesn't match SQL column value -> insert record into SQL
- dataframe column value matches SQL column value -> update record in SQL
- SQL column value not in dataframe column -> delete record in SQL
# create demo SQL table
df = sql.create.table(
table_name = '##mssql_merge',
columns = {'Column1': 'VARCHAR(10)', 'Column2': 'TINYINT', 'PK': 'CHAR(1)'},
primary_key_column = 'PK'
)
# create a demo dataframe
df = pd.DataFrame({
'Column1': ['A_Initial', 'B_Initial'],
'Column2': [1, 2],
}, index=pd.Index(['A', 'B'], name='PK'))
# perform an initial insert
sql.write.insert('##mssql_merge', df)
# perform merge
sql.write.merge(
'##mssql_merge',
pd.DataFrame.from_records([
{'Column1': 'C_New', 'Column2': 3, 'PK': 'C'},
{'Column1': 'B_Updated', 'Column2': 0, 'PK': 'B'},
]).set_index('PK')
)
# validate the results
result = sql.read.table('##mssql_merge')
assert 'A' not in result.index
assert result.at['C', 'Column1'] == 'C_New'
assert result.at['B', 'Column1'] == 'B_Updated'
assert result.at['B', 'Column2'] == 0
Upsert
The merge method can be restricted to not delete records in SQL by specifying the upsert flag. Records in SQL are then only inserted or updated.
# create demo SQL table
df = sql.create.table(
table_name = '##mssql_upsert',
columns = {'Column1': 'VARCHAR(10)', 'Column2': 'TINYINT', 'PK': 'CHAR(1)'},
primary_key_column = 'PK'
)
# create a demo dataframe
df = pd.DataFrame({
'Column1': ['A_Initial', 'B_Initial'],
'Column2': [1, 2],
}, index=pd.Index(['A', 'B'], name='PK'))
# perform an initial insert
sql.write.insert('##mssql_upsert', df)
# perform upsert
sql.write.merge(
'##mssql_upsert',
pd.DataFrame.from_records([
{'Column1': 'C_New', 'Column2': 3, 'PK': 'C'},
{'Column1': 'B_Updated', 'Column2': 0, 'PK': 'B'},
]).set_index('PK'),
upsert = True
)
# validate the results
result = sql.read.table('##mssql_upsert')
assert result.at['A', 'Column1'] == 'A_Initial'
assert result.at['A', 'Column2'] == 1
assert result.at['C', 'Column1'] == 'C_New'
assert result.at['B', 'Column1'] == 'B_Updated'
assert result.at['B', 'Column2'] == 0
Installation
pip install mssql-dataframe
Dependancies
pandas: Python DataFrames.
pyodbc: ODBC driver used for executing Transact-SQL statements.
Contributing
See CONTRIBUTING.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file mssql_dataframe-3.2.0.tar.gz
.
File metadata
- Download URL: mssql_dataframe-3.2.0.tar.gz
- Upload date:
- Size: 30.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.10.11
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | e56247321f1ac0640e1fbb9f6b0d428af715e41b92fda39e484cdc8547403c30 |
|
MD5 | 559fb0710069314526efbaaa2bb9b7e5 |
|
BLAKE2b-256 | 56ed14ae951a5a58eadefb374bb4415f0cf266fc92c9b89b1bcbbde1f229ba01 |
File details
Details for the file mssql_dataframe-3.2.0-py3-none-any.whl
.
File metadata
- Download URL: mssql_dataframe-3.2.0-py3-none-any.whl
- Upload date:
- Size: 36.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.10.11
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | ac8f1168c9e46f91857dcb65f9d4138dda5541a4ac44286be42c1e4d61253c42 |
|
MD5 | 4be62b4163fd08251dbceaa2b8a97347 |
|
BLAKE2b-256 | d6f8f397400225bf23ebd1bed859fec52076c1f06a4d002649b6a58361b54dd1 |