Skip to main content

Update, Upsert, and Merge from Python dataframes to SQL Server and Azure SQL database.

Project description

mssql_dataframe

Tests Status Coverage Status PyPI

Open in Visual Studio Code

Provides efficient mechanisms for updating and merging data into Transact-SQL tables from Python dataframes. This is accomplished by utilizing the fast_executemany feature of pyodbc to quickly insert into a source SQL temporary table, and then updating/merging into a target SQL table from that temporary table. Without taking into account network speed, 100,000 records can be updated/merged in a few seconds.

In practice this module may be useful for updating models, web scraping, or general data engineering tasks.

Samples

See EXAMPLES.md for full examples.

Initialization

import pandas as pd

from mssql_dataframe.connect import connect
from mssql_dataframe.collection import SQLServer

# # connect to database using pyodbc
db = connect(database_name='master', server_name='localhost')
# # initialize the main package
sql = SQLServer(db, adjust_sql_objects=True)

Updating SQL Table

UPDATE an SQL table using primary keys or other columns.

# UPDATE using dataframe's index and the SQL primary key
write.update('SomeSQLTable', dataframe[['ColumnA']])
# UPDATE using other columns by specifying match_columns
write.update('SomeSQLTable', dataframe[['ColumnA','ColumnB','ColumnC']], 
    match_columns=['ColumnB','ColumnC']
)

Merging/Upsert SQL Table

MERGE (insert/update/delete) an SQL table using primary keys or other columns. Can also perform a simplier UPSERT action.

## MERGE using dataframe's index and the SQL primary key
write.merge('SomeSQLTable', dataframe[['ColumnA','ColumnB']])
## MERGE using another column
write.merge('SomeSQLTable', dataframe[['ColumnA','ColumnB','ColumnC']], 
    match_columns=['ColumnC']
)
## UPSERT (if exists update, otherwise insert)
write.merge('SomeSQLTable', dataframe[['ColumnA']], delete_unmatched=False)

SQL Object Creation and Modification

SQL objects will be created/modified as needed if the main class is initialized with adjust_sql_objects=True.

  1. Tables will be created if they do not exist.
  2. Column size will increase if needed, for example from TINYINT to BIGINT or VARCHAR(5) to VARCHAR(10).

Certain actions won't be taken even with adjust_sql_objects=True to preserve integrity.

  1. A column won't change from NOT NULL automatically.
  2. Column data type won't change from number like (INT, NUMERIC, etc.) to character like (VARCHAR).

Internal time tracking columns will be added (in server time) where applicable if include_timestamps=True, even if adjust_sql_objects=False.

  1. _time_insert: a new record was inserted
  2. _time_update: an existing record was updated

Dependancies

pandas: Python DataFrames.

pyodbc: ODBC driver used for executing Transact-SQL statements.

Installation

pip install mssql-dataframe

Contributing

See CONTRIBUTING.md

See Also

A similiar project is pangres, but doesn't include SQL Server / Transact-SQL. The primary motivation for creating a new project is differences in Transact-SQL syntax, specifically MERGE in T-SQL vs UPSERT in other SQL flavors.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

mssql_dataframe-1.0.6.tar.gz (22.5 kB view hashes)

Uploaded Source

Built Distribution

mssql_dataframe-1.0.6-py3-none-any.whl (48.8 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page