A package to execute Copy and Stored Procedure tasks on data
Project description
DataPipelineExecutor
This package contains a pipeline which executes two tasks: copying data from a source (Kusto cluster DB) to a sink (SQL Server DB), and executing a simple SQL stored procedure on the copied data. Both tasks are fully configurable, and the pipeline has the following key features:
- Configurable restartability.
- Logging: in addition to console output, you can pass a dedicated file path as a parameter if you wish to persist the logs. Otherwise, a logger.log file is created in the same directory and all logs are stored there until deleted.
Installation
Use pip to install the package:
pip install DataPipelineExecutor
Run the following commands to install dependencies:
pip install numpy
pip install pandas
pip install azure-kusto-data
pip install azure-kusto-ingest
pip install pyodbc
pip install papermill
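The same dependencies can also be installed in a single command:
pip install numpy pandas azure-kusto-data azure-kusto-ingest pyodbc papermill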
Usage
To execute the pipeline, run the following:
from DataPipelineExecutor import main
main('config.txt', 'logger.log')
# OR
main('config.txt')
Note that the logger file parameter is optional; if none is passed, a logger.log file will automatically be created in the same directory.
Configuration
Three config files are required for this pipeline to run:
- primary_config: This file contains the parameters needed for task execution (refer to the format below). Its path is the first argument passed to main().
- source_config: This file contains the parameters needed to establish a connection to the source server. Its path is passed as a parameter in the primary_config file.
- sink_config: This file contains the parameters needed to establish a connection to the sink server. Its path is passed as a parameter in the primary_config file.
Format for the primary_config:
[Watermark]
Table_Name =
Column1 =
Column2 =
watermark_col_basis_name =
DateTime =
[CopyData]
SourceType =
SinkType =
SourceConfig =
SinkConfig =
SQLTable =
KustoTable =
Query =
BatchSize =
[TaskSet]
Sequence =
[StoredProcedure]
ProcedureName =
ParameterName =
ParameterType =
TargetColumn =
TargetValue =
[Notebook]
Path =
OutputPath =
Param1 =
Param2 =
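Since these files use an INI-style key = value layout, they can be read with Python's standard configparser. Below is a minimal sketch of loading a primary_config; the file name config.txt comes from the Usage example above, but whether the package itself parses files this way internally is an assumption:

# Minimal sketch: inspecting a primary_config with configparser.
import configparser

config = configparser.ConfigParser()
config.read('config.txt')

copy = config['CopyData']
print(copy['SourceType'], '->', copy['SinkType'])

# SourceConfig and SinkConfig hold the paths to the other two config files.
source_config_path = copy['SourceConfig']
sink_config_path = copy['SinkConfig']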
Format for source_config:
[Kusto]
Cluster =
Database =
ClientID =
ClientSecret =
AuthorityID =
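For orientation, these [Kusto] parameters map onto a standard azure-kusto-data connection. The sketch below is illustrative only; the file name source_config.txt and the query are hypothetical, and the package's internal calls may differ:

import configparser
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

src = configparser.ConfigParser()
src.read('source_config.txt')  # hypothetical file name
k = src['Kusto']

# AAD application (client ID / secret / tenant) authentication.
kcsb = KustoConnectionStringBuilder.with_aad_application_key_authentication(
    k['Cluster'], k['ClientID'], k['ClientSecret'], k['AuthorityID'])
client = KustoClient(kcsb)
response = client.execute(k['Database'], 'SampleTable | take 10')  # hypothetical query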
Format for sink_config:
[SQL]
Server =
Database =
Username =
Password =
Driver =
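Similarly, the [SQL] parameters correspond to a standard pyodbc connection string. Again a sketch under the same assumptions (sink_config.txt is a hypothetical file name):

import configparser
import pyodbc

sink = configparser.ConfigParser()
sink.read('sink_config.txt')  # hypothetical file name
s = sink['SQL']

# Build an ODBC connection string; the driver name is wrapped in braces,
# e.g. {ODBC Driver 17 for SQL Server}.
conn = pyodbc.connect(
    f"DRIVER={{{s['Driver']}}};SERVER={s['Server']};"
    f"DATABASE={s['Database']};UID={s['Username']};PWD={s['Password']}")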
Download files
Source Distribution: DataPipelineExecutor-0.0.5.tar.gz
Built Distribution: DataPipelineExecutor-0.0.5-py3-none-any.whl
Hashes for DataPipelineExecutor-0.0.5.tar.gz
Algorithm | Hash digest
---|---
SHA256 | bd7409a6d8f47362c29f4efd5c60a77495be4b49cef4db95219fd0a38c370c2b
MD5 | 2562326a912b165aaef19c4647546759
BLAKE2b-256 | e337368dc668eb14410ed71be90bff0d6fb0c64e9c46a9461da973bbae65a3ca
Hashes for DataPipelineExecutor-0.0.5-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 5296f04af1d8244211807abf5b7b0f604019becc2a6085991c93056d06e1fd81
MD5 | 51dd5e1dd11b9b3bc8600181dd7d6e86
BLAKE2b-256 | aa60b59dc9fc2d0b23e6e71bb019e0dcb756a11a4814e1c499caa825fae6eaad