Python Utils for NorthGravity platform tasks
NORTH GRAVITY PYTHON UTILS
This document describes the North Gravity Python Utils package, which lets users call the NG platform tools and the most common helper functions from their Python scripts and task repositories.
The Python Utils can be used within:
- a single Python script that is run by the Python Runner task within a pipeline in the NG application
- a single Jupyter Notebook that is run by the Jupyter Runner task within a pipeline in the NG application
- an ensemble of Python scripts that are part of a container, for a Task created by the user, used in a pipeline in the NG application
NG_Utils covers the most frequently used functions for data and file handling.
The scope of NG_Utils:
- Data Handler - splitting data based on train/test labels, handling dates in datasets
- File Handler - downloading/uploading model-specific datasets from/to the data lake, handling dataset formats
- Back Test - preparing a short or extended backtest based on model results
How to install and set up the package:
Install
pip3 install northgravity_utils==0.0.3
As the library is available from pip, a specific version can be installed within a Python Task by adding this line to requirements.txt:
northgravity_utils==0.0.3
The package relies on the requests library, so it must also be installed in the project, either by listing it in requirements.txt or by running:
pip3 install requests==2.27.1
Environment Variables
The package reads its configuration from environment variables. They are necessary for the Northgravity SDK package, which Northgravity Utils uses internally. The environment variables are provided automatically when a script runs within a pipeline (as a Task or within the Python/Jupyter Runners). When running the script locally, users must set them in the project themselves.
Mandatory environment variables to set:
- LOGIN → login received from NG
- PASSWORD → password to log in. Credentials are used to generate the token so that each request is authenticated.
- NG_API_ENDPOINT → the URL of the NG platform API (by default, the URL is set to https://api.northgravity.com)
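When running locally, a quick pre-flight check can confirm that the variables listed above are present before the package is used. The sketch below is only an illustration: the helper name missing_env_vars is hypothetical and is not part of northgravity_utils.

```python
import os

# Variable names taken from the list above.
REQUIRED_VARS = ("LOGIN", "PASSWORD", "NG_API_ENDPOINT")


def missing_env_vars(env=None, required=REQUIRED_VARS):
    """Return the required variable names that are unset or empty (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


# Check the real process environment and report anything that still has to be set.
missing = missing_env_vars()
if missing:
    print("Set these variables before running locally:", ", ".join(missing))
```

Passing a plain dictionary instead of os.environ makes the check easy to try without touching real credentials.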
Data Handler
Import northgravity_utils and load DataHandler, FileHandler and BackTest
import northgravity_utils as ng_u
dth = ng_u.DataHandler()
dfh = ng_u.FileHandler()
dbt = ng_u.BackTest()
Train/Test/Val
Splits the input datasets (Features and Target) based on the 'Split_Labes' column.
# Train/Validation/Test
tvt_dict = dth.train_test_split(features_df, target_df, val=True)
# Train/Test
tt_dict = dth.train_test_split(features_df, target_df, val=False)
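To make the label-based split concrete, the sketch below reproduces the idea in plain pandas. It is not the library's implementation: the helper split_by_label, the label values 'Train', 'Val' and 'Test', and the assumption that the label column lives in the features frame are all illustrative guesses. The returned dictionary maps each label to a (features, target) tuple, which matches how tvt_dict['Val'][0] is used in this document.

```python
import pandas as pd


def split_by_label(features_df, target_df, label_col="Split_Labes", val=True):
    """Split features/target by a label column (hypothetical helper, assumed label values)."""
    labels = ["Train", "Val", "Test"] if val else ["Train", "Test"]
    out = {}
    for label in labels:
        mask = features_df[label_col] == label
        # Drop the label column from the features and select the same rows of the target.
        out[label] = (features_df.loc[mask].drop(columns=[label_col]), target_df.loc[mask])
    return out


# Toy data: four rows, labelled Train/Train/Val/Test.
features_df = pd.DataFrame(
    {"x": [1.0, 2.0, 3.0, 4.0], "Split_Labes": ["Train", "Train", "Val", "Test"]}
)
target_df = pd.DataFrame({"y": [10.0, 20.0, 30.0, 40.0]})

parts = split_by_label(features_df, target_df, val=True)
X_val, y_val = parts["Val"]
```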
Get date column
The function detects DateTime columns (even if one is set as the index), parses them to the datetime64[ns] format, and returns them as a pd.Series (if one column was detected) or a pd.DataFrame (if more than one DateTime column was detected).
# Get date column
X_dataset = tvt_dict['Val'][0]
date_col = dth.get_date_col(X_dataset)
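The detection logic described above can be approximated with pandas type checks. This is a sketch under assumptions, not the code behind get_date_col: the helper find_date_columns is hypothetical, and it only recognises columns that already have a datetime dtype (it does not try to parse strings).

```python
import pandas as pd


def find_date_columns(df):
    """Return datetime columns (including a datetime index) as a Series or DataFrame (hypothetical helper)."""
    frame = df.copy()
    # Promote a datetime index to a regular column so that it is detected too.
    if pd.api.types.is_datetime64_any_dtype(frame.index):
        frame = frame.reset_index()
    date_cols = [c for c in frame.columns if pd.api.types.is_datetime64_any_dtype(frame[c])]
    if not date_cols:
        return None
    # One detected column -> Series, several -> DataFrame.
    return frame[date_cols[0]] if len(date_cols) == 1 else frame[date_cols]


# A frame whose only dates are in the index.
indexed = pd.DataFrame({"price": [1.0, 2.0]}, index=pd.to_datetime(["2024-01-01", "2024-01-02"]))
single = find_date_columns(indexed)

# A frame with two datetime columns.
two_cols = pd.DataFrame(
    {"start": pd.to_datetime(["2024-01-01"]), "end": pd.to_datetime(["2024-02-01"]), "v": [1]}
)
several = find_date_columns(two_cols)
```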
Shift the dates by the specified period
The function shifts the dates in the passed pd.DataFrame or pd.Series forward by period steps of the given frequency freq.
# Business Days
data_shifted_bd = dth.date_shift(date_col, period=2, freq='B')
# Week
data_shifted_w = dth.date_shift(date_col, period=1, freq='W')
# Quarter
data_shifted_q = dth.date_shift(date_col, period=1, freq='Q')
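The internals of date_shift are not documented here, so the following is only one plausible reading of "shift by period at frequency freq", expressed with standard pandas offset objects. Mapping 'B', 'W' and 'Q' to BDay, Week and QuarterEnd is an assumption; the library may roll dates differently.

```python
import pandas as pd

dates = pd.DatetimeIndex(["2024-01-05"])  # a Friday

# Two business days ahead: the weekend is skipped.
shifted_bd = dates + pd.offsets.BDay(2)
# One calendar week ahead.
shifted_w = dates + pd.offsets.Week(1)
# Roll forward to the next quarter end.
shifted_q = dates + pd.offsets.QuarterEnd(1)

print(shifted_bd[0].date(), shifted_w[0].date(), shifted_q[0].date())
```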
Shift the holiday dates based on uploaded calendar
The function moves the dates in the passed pd.DataFrame/pd.Series/pd.DatetimeIndex to the following business day whenever they fall on a holiday according to the holiday_list.
# hol_list - holiday calendar list
# date_col - column with dates
data_shifted_calendar = dth.holiday_shift(date_col, hol_list)
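The rule "move a holiday to the following business day" can be sketched with the standard library alone. The helper next_business_day is hypothetical, and treating Saturdays and Sundays as non-business days is an assumption about what holiday_shift does.

```python
from datetime import date, timedelta


def next_business_day(day, holidays):
    """Move `day` forward until it is neither a weekend day nor a holiday (hypothetical helper)."""
    holidays = set(holidays)
    while day.weekday() >= 5 or day in holidays:  # 5 = Saturday, 6 = Sunday
        day += timedelta(days=1)
    return day


hol_list = [date(2024, 12, 25), date(2024, 12, 26)]
dates = [date(2024, 12, 24), date(2024, 12, 25), date(2024, 12, 28)]

# Ordinary business days stay as they are; holidays and weekends roll forward.
shifted = [next_business_day(d, hol_list) for d in dates]
```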
Prepare index for forecast and test datasets
The function gets the frequency of the target time index and shifts the index by the period value, to prepare the test and forecast datasets.
# Index for forecast and test
idx = dth.prep_prediction_date(target_df.index, period=3)
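The "infer the frequency, then shift" idea can be illustrated with pandas built-ins. This is a sketch, not the code behind prep_prediction_date: the helper shift_index_by_inferred_freq is hypothetical, and it assumes a regular DatetimeIndex with at least three entries so that pd.infer_freq can succeed.

```python
import pandas as pd


def shift_index_by_inferred_freq(index, period):
    """Infer the index frequency and shift every timestamp by `period` steps (hypothetical helper)."""
    freq = pd.infer_freq(index)
    if freq is None:
        raise ValueError("Could not infer a regular frequency from the index")
    return index.shift(period, freq=freq)


# Five consecutive days, shifted three days ahead.
target_index = pd.date_range("2024-01-01", periods=5, freq="D")
idx = shift_index_by_inferred_freq(target_index, period=3)
```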
Prepare forecast date in DataPrep required format
The function prepares the forecast date in the format required by DataPrep, based on the passed pd.DataFrame/pd.Series/pd.DatetimeIndex. The input prep_target_idx should be the same as the one passed to the test dataset (already preprocessed by the prep_prediction_date function).
# Date in DataPrep required format
# idx - dates
date_prep = dth.forecast_date_format(idx)
Model's score - regression
The function calculates the regression metrics by comparing predicted and actual values. Mean Squared Error, Root Mean Squared Error, Mean Absolute Percentage Error and Mean Absolute Error are calculated with sklearn.metrics.
# Regression
actual = df_test_reg['Real']
predicted = df_test_reg['Predicted']
print(dth.get_scores_regression(actual, predicted))
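For reference, the four regression metrics named above reduce to a few lines of arithmetic. The sketch below uses only the standard library rather than sklearn.metrics, and the helper regression_scores is hypothetical; the keys and output layout of get_scores_regression may differ.

```python
import math


def regression_scores(actual, predicted):
    """Compute MSE, RMSE, MAPE and MAE from two equal-length sequences (hypothetical helper)."""
    errors = [p - a for a, p in zip(actual, predicted)]
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # MAPE is returned as a fraction (0.05 means 5 %); it is undefined if an actual value is zero.
    mape = sum(abs(e) / abs(a) for e, a in zip(errors, actual)) / n
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAPE": mape, "MAE": mae}


scores = regression_scores(actual=[100.0, 200.0, 400.0], predicted=[110.0, 190.0, 400.0])
```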
Model's score - classification
The function calculates the classification metrics by comparing predicted and actual values. Accuracy, Precision, Recall and F1 are calculated with sklearn.metrics.
# Classification
actual = df_test_class['Real']
predicted = df_test_class['Predicted']
print(dth.get_scores_classification(actual, predicted))
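The classification metrics can likewise be written out by hand for the binary case. This is a standard-library sketch, not a call into sklearn.metrics; the helper classification_scores and the choice of 1 as the positive class are assumptions.

```python
def classification_scores(actual, predicted, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels (hypothetical helper)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    accuracy = sum(1 for a, p in pairs if a == p) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"Accuracy": accuracy, "Precision": precision, "Recall": recall, "F1": f1}


scores_cls = classification_scores(actual=[1, 0, 1, 1, 0], predicted=[1, 0, 0, 1, 1])
```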
FileHandler
Download features and target datasets
Functions to download the 'Features Dataset' and 'Target Dataset' from the DataLake and convert them into pandas DataFrames.
# Features
features_df = dfh.features_download()
# Target
target_df = dfh.target_download()
Upload general models output
Functions to upload the model output to the specified group on the DataLake.
dfh.forecast_upload(df_for, group_name, model_out_name)
dfh.test_upload(df_test, group_name, model_out_name)
dfh.drivers_upload(df_drv, group_name, model_out_name)
Convert NCSV file to Horizontal type file
Function to convert NCSV type files into the horizontal type.
# Convert NCSV file to Horizontal type file
print('NCSV type file: {}'.format(df_ncsv.tail(10)))
n2h = dfh.ncsv2horizontal(df_ncsv)
print(n2h.tail(10))
print('NCSV converted to Horizontal type file')
n2h.to_csv(output_path + '/ncsv2horizontal_1.csv')
BackTests
Short BackTest
Function that creates a short backtest.
short_bt = dbt.short_back_test(df=df_to_backtest, open_col='Price', trading_signal_col='Signal')
print('Short BackTest: {}'.format(short_bt))
Long BackTest
Function that creates a long backtest.
long_bt = dbt.long_back_test(df=df_to_backtest, number_of_contracts=100, open_col='Price', trading_signal_col='Signal')
print('Long BackTest: {}'.format(long_bt))
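This document does not describe how the backtests are calculated, so the sketch below is not the library's method. It shows one naive interpretation of a signal-driven backtest: hold the signalled position from one price to the next and sum the profit and loss. The helper naive_signal_pnl and the signal encoding (1 = long, -1 = short, 0 = flat) are assumptions made for illustration.

```python
def naive_signal_pnl(prices, signals, number_of_contracts=1):
    """Sum the P&L of holding signals[t] from prices[t] to prices[t + 1] (illustrative only)."""
    pnl = 0.0
    for t in range(len(prices) - 1):
        pnl += signals[t] * (prices[t + 1] - prices[t]) * number_of_contracts
    return pnl


prices = [100.0, 102.0, 101.0, 105.0]
signals = [1, 1, -1, 0]  # assumed encoding: 1 = long, -1 = short, 0 = flat

# The last signal is never traded because there is no following price.
total = naive_signal_pnl(prices, signals, number_of_contracts=100)
```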
Who do I talk to?
- Admin: NorthGravity info@northgravity.com