An improved way of uploading pandas DataFrames to MS SQL server

Project description

fast_to_sql

Introduction

fast_to_sql is an improved way to upload pandas DataFrames to MS SQL Server. The method borrows an idea from here and turns it into a usable function. The function takes advantage of MS SQL Server's multi-row insert ability, which can lead to much faster uploads: in one test, uploading a 10,000-row, 5-column DataFrame with pd.to_sql() took 517.97s, while uploading the same DataFrame with fast_to_sql took only 5.45s.

The function also automatically preserves the datatypes integer, float, string, boolean, and datetime64[ns], converting them to the SQL datatypes int, float, varchar(255), bit, and datetime respectively. Custom data types can also be set for a subset or all of the columns (see Usage).
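
As an illustration of the mapping described above (not the package's internal code), the dtype-to-SQL conversion can be thought of as a simple lookup. The dictionary and helper names below are hypothetical:

import pandas as pd

# Illustrative sketch only: roughly how pandas dtypes map to SQL Server types.
# PANDAS_TO_SQL and guess_sql_types are hypothetical, not part of the fast_to_sql API.
PANDAS_TO_SQL = {
    "int64": "int",
    "float64": "float",
    "object": "varchar(255)",
    "bool": "bit",
    "datetime64[ns]": "datetime",
}

def guess_sql_types(df: pd.DataFrame) -> dict:
    """Return a column -> SQL type mapping, defaulting to varchar(255)."""
    return {col: PANDAS_TO_SQL.get(str(dtype), "varchar(255)")
            for col, dtype in df.dtypes.items()}

example = pd.DataFrame({"a": [1, 2], "b": [1.5, 2.5], "c": ["x", "y"]})
print(guess_sql_types(example))  # {'a': 'int', 'b': 'float', 'c': 'varchar(255)'}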

Installation

pip install fast_to_sql

Requirements

  • Written for Python 3.6+
  • Requires pandas, sqlalchemy, datetime

Example

from sqlalchemy import create_engine
from fast_to_sql import fast_to_sql as fts

# Create a SQLAlchemy engine for the desired server/database
sqluser = r"DOMAIN\USER"
sqlpass = "Password"
server = "some_server"
db = "some_DB"

engine = create_engine("mssql+pyodbc://{}:{}@{}/{}?driver=SQL+Server&trusted_connection=true"
                       .format(sqluser, sqlpass, server, db))

df = SomePandasDF  # your pandas DataFrame

# Run main function
fts.to_sql_fast(df, 'DFName', engine, if_exists='append', series=False, custom={'column1': 'varchar(500)'}, temp=False)

Usage

Main function

fts.to_sql_fast(df, name, engine, if_exists='append', series=False, custom=None, temp=False)
  • df: pandas DataFrame to upload
  • name: String of desired name for the table in SQL server
  • engine: A SQL alchemy engine
  • if_exists: Option for what to do if a table with the specified name already exists in the database. If the table does not exist, a new one will be created. By default this option is set to 'append'
    • 'append': Appends the dataframe to the table if it already exists in SQL server.
    • 'fail': Purposely raises a FailError if the table already exists in SQL server.
    • 'replace': Drops the old table with the specified name and creates a new one. Be careful with this option; it will completely delete any existing table with the specified name in SQL server.
  • series: By default this is set to False. Set to True if the DataFrame is a series (only has one column).
  • custom: A dictionary with one or more of the uploaded column names as keys and a valid SQL data type as each value. This overrides the default data type the function assigns to the column.
    • Example: {'ColumnName':'varchar(1000)'}
  • temp: Either True if creating a local temporary table, or False (default) if not. If set to True, the temporary table will be dropped after the connection is closed (see the sketch after this list).
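
Putting the custom and temp options together, a call might look like the following sketch. The column names and SQL types here are placeholders, not part of any real schema:

# Override the inferred types for two columns and load into a local temp table.
# 'StagingTable', 'comments', and 'amount' are placeholder names.
fts.to_sql_fast(
    df,
    'StagingTable',
    engine,
    if_exists='replace',
    custom={'comments': 'varchar(1000)', 'amount': 'decimal(18,2)'},
    temp=True,  # table is dropped after the connection is closed
)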

Caveats

  • This has only been tested with Microsoft SQL Server 2016 and pyodbc. It may not work for other SQL databases.
  • The larger the DataFrame, the smaller the speed improvement you are likely to see. A 100-column, 500,000-row table may still take a while to upload, because a multi-row insert can handle at most 1,000 rows per statement (a rough sketch of the batching is shown below).
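
To see why the 1,000-row limit matters, here is a rough sketch of batching rows into multi-row INSERT statements. This is for illustration only and is not fast_to_sql's actual implementation:

# Rough sketch of multi-row INSERT batching, for illustration only.
def insert_statements(table, columns, rows, batch_size=1000):
    """Yield multi-row INSERT statements, at most batch_size rows each."""
    col_list = ", ".join(columns)
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        values = ", ".join(
            "(" + ", ".join("'{}'".format(v) for v in row) + ")" for row in batch
        )
        yield "INSERT INTO {} ({}) VALUES {}".format(table, col_list, values)

# A 500,000-row table still needs 500 separate INSERT statements.
rows = [(i, "name{}".format(i)) for i in range(500000)]
print(sum(1 for _ in insert_statements("MyTable", ["id", "name"], rows)))  # 500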

Credits

  • This package is based on an excellent article from here

