Update, Upsert, and Merge from Python dataframes to SQL Server and Azure SQL database.
Project description
mssql_dataframe
Provides efficient mechanisms for updating and merging data into Transact-SQL tables from Python dataframes. This is accomplished by utilizing the fast_executemany feature of pyodbc to quickly insert into a source SQL temporary table, and then updating/merging into a target SQL table from that temporary table. Without taking into account network speed, 100,000 records can be updated/merged in a few seconds.
In practice this module may be useful for updating models, web scraping, or general data engineering tasks.
Samples
See EXAMPLES.md for full examples.
Initialization
import pandas as pd
from mssql_dataframe.connect import connect
from mssql_dataframe.collection import SQLServer
# # connect to database using pyodbc
db = connect(database_name='master', server_name='localhost')
# # initialize the main package
sql = SQLServer(db, adjust_sql_objects=True)
Updating SQL Table
UPDATE an SQL table using primary keys or other columns.
# UPDATE using dataframe's index and the SQL primary key
write.update('SomeSQLTable', dataframe[['ColumnA']])
# UPDATE using other columns by specifying match_columns
write.update('SomeSQLTable', dataframe[['ColumnA','ColumnB','ColumnC']],
match_columns=['ColumnB','ColumnC']
)
Merging/Upsert SQL Table
MERGE (insert/update/delete) an SQL table using primary keys or other columns. Can also perform a simplier UPSERT action.
## MERGE using dataframe's index and the SQL primary key
write.merge('SomeSQLTable', dataframe[['ColumnA','ColumnB']])
## MERGE using another column
write.merge('SomeSQLTable', dataframe[['ColumnA','ColumnB','ColumnC']],
match_columns=['ColumnC']
)
## UPSERT (if exists update, otherwise insert)
write.merge('SomeSQLTable', dataframe[['ColumnA']], delete_unmatched=False)
SQL Object Creation and Modification
SQL objects will be created/modified as needed if the main class is initialized with adjust_sql_objects=True
.
- Tables will be created if they do not exist.
- Column size will increase if needed, for example from TINYINT to BIGINT or VARCHAR(5) to VARCHAR(10).
Certain actions won't be taken even with adjust_sql_objects=True
to preserve integrity.
- A column won't change from NOT NULL automatically.
- Column data type won't change from number like (INT, NUMERIC, etc.) to character like (VARCHAR).
Internal time tracking columns will be added (in server time) where applicable if include_timestamps=True
, even if adjust_sql_objects=False
.
_time_insert
: a new record was inserted_time_update
: an existing record was updated
Dependancies
pandas: Python DataFrames.
pyodbc: ODBC driver used for executing Transact-SQL statements.
Installation
pip install mssql-dataframe
Contributing
See CONTRIBUTING.md
See Also
A similiar project is pangres, but doesn't include SQL Server / Transact-SQL. The primary motivation for creating a new project is differences in Transact-SQL syntax, specifically MERGE in T-SQL vs UPSERT in other SQL flavors.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for mssql_dataframe-1.0.6-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 7f622dde2a14f7376d3422033b9eef500fdc393be9e7cdb0e5cb2b518299ab3e |
|
MD5 | 098e0fa41e689e6219bd347aa08b2bb6 |
|
BLAKE2b-256 | a4c25d640c6ee932bb5a552716100a5cbb5e4ac0d0f8cb2c589b63d60d97d544 |