databarge | simple ETL tools for SQL Server

About

Among other things, this package includes:

  1. A class to create a connection to SQL Server.
  2. A class to transfer data from one SQL Server to another.
  3. Functionality to perform basic data transformations.
  4. Classes to transfer data between SQL Server and text files.
  5. Functions to execute code as well as trap and log errors.
  6. Functions to execute local SQL files.
  7. Functions to update dimension tables.

Key notes:

  1. Data is transferred in chunks of 10,000 rows to optimise memory usage.
  2. Enabling logging produces a file that records each chunk of data transferred.
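
This chunked approach keeps memory usage flat regardless of table size. A minimal sketch of the pattern with pandas and SQLAlchemy (illustrative only, not databarge's actual internals; the function and parameter names here are hypothetical):

# illustrative sketch of chunked transfer with pandas + SQLAlchemy
# (an assumed pattern, not databarge's actual implementation)
import pandas as pd

def transfer_in_chunks(source_engine, destination_engine, sql, table, chunksize=10000):
    # read_sql with a chunksize returns an iterator of DataFrames,
    # so only one chunk is held in memory at a time
    for chunk in pd.read_sql(sql, source_engine, chunksize=chunksize):
        chunk.to_sql(table, destination_engine, if_exists='append', index=False)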

Dependencies

Python 3.8 is required for this package.

Required Packages

Package     Version   License
pandas      1.3.5     BSD License
SQLAlchemy  1.4.27    MIT License
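
To install databarge (which pulls in these dependencies) from PyPI:

pip install databarge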

Config

Create and populate a local config_params.ini file

Windows authentication example:

[SOMEUSERFRIENDLYSERVERNAME]
platform=sql_server
server=SOMESERVERNAME
database=SOMEDATABASE
authentication=windows

Server authentication example:

[SOMEOTHERUSERFRIENDLYSERVERNAME]
platform=sql_server
server=SOMESERVERNAME
database=SOMEDATABASE
authentication=server
uid=SOMEGENERICUSERNAME
pwd=SOMEGENERICPASSWORD
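
Under the hood, a SqlServerConnection presumably turns these entries into a SQLAlchemy engine. A minimal sketch of reading such a file with configparser (an assumption for illustration; the ODBC-based URL below is one common option and not necessarily what databarge builds):

# illustrative: read config_params.ini and build a SQLAlchemy engine
# (assumed pattern; databarge's actual connection string may differ)
import configparser
import sqlalchemy

config = configparser.ConfigParser()
config.read(r'xxx\config_params.ini')

section = config['SOMEUSERFRIENDLYSERVERNAME']
if section['authentication'] == 'windows':
    # windows authentication: trusted connection, no credentials in the URL
    url = (f"mssql+pyodbc://{section['server']}/{section['database']}"
           "?driver=ODBC+Driver+17+for+SQL+Server&trusted_connection=yes")
else:
    # server authentication: uses the uid/pwd keys
    url = (f"mssql+pyodbc://{section['uid']}:{section['pwd']}"
           f"@{section['server']}/{section['database']}"
           "?driver=ODBC+Driver+17+for+SQL+Server")

engine = sqlalchemy.create_engine(url)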

ETL

Define the parameters and create the connections

# import modules
import sqlalchemy

# import objects
from databarge import SqlServerConnection, Etl

# define mandatory generic variables
config_params_path = r'xxx\config_params.ini'

# define optional generic variables
log_path = r'xxx\log.log'

# make connections
source_connection = SqlServerConnection('MSSQLSVRA', config_params_path)
destination_connection = SqlServerConnection('MSSQLSVRB', config_params_path)

Create an ETL class

Define the arguments and create the class:

# define positional etl class variables
source_sql = r'''SELECT * FROM TESTDB.dbo.tbl_test'''
destination_database = 'TESTDB'
destination_table = 'tbl_test'

# define optional etl class variables
xforms = [
    "df['test_id'] = df['test_id'].astype(str)"
    , "df['test_value'] = df['test_quantity'] * df['test_rate']"
    , "df = df.drop(['test_quantity','test_rate'], axis = 1, inplace=True)"
    ]
dtypes = {'test_text':sqlalchemy.types.NVARCHAR(length=100)}
# destination_schema = 'someschema'

# create etl class
etl_1 = Etl(source_sql, destination_database, destination_table, source_connection, destination_connection
    , xforms = xforms
    , dtypes = dtypes
    # , destination_schema=destination_schema
    , log_path=log_path)
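
Each xform is a string of Python that operates on a DataFrame named df, so it is presumably executed against each chunk in flight. A minimal sketch of that pattern (an assumption about the mechanism, not confirmed by this description):

# illustrative: apply transformation strings to a chunk
# (assumed mechanism; databarge may implement this differently)
def apply_xforms(df, xforms):
    scope = {'df': df}
    for xform in xforms:
        # each string is executed with the chunk bound to the name 'df'
        exec(xform, {}, scope)
    return scope['df']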

Create other ETL classes and put them all in a list

# create other etl classes as required

# create a list of etl classes
xfers = [
    etl_1
    # , etl_2
    ]

Execute the ETL classes

# iterate through the etl list and execute the etl classes
for xf in xfers:
    
    # either drop or truncate the destination table
    xf.drop_table()
    # xf.truncate_table()
    
    # transfer the data
    xf.transfer_data()
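
Feature 5 above mentions helpers to execute code and trap and log errors; those helpers are not shown in this description, but a generic guard around the loop using the standard library might look like this (a sketch, not databarge's API):

# illustrative: trap and log failures per transfer with the standard library
# (databarge's own error-trapping helpers are not shown here)
import logging

logging.basicConfig(filename=log_path, level=logging.INFO)

for xf in xfers:
    try:
        xf.drop_table()
        xf.transfer_data()
    except Exception:
        # log the full traceback and continue with the remaining transfers
        logging.exception('transfer failed')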

Disclaimer

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
