Python Utils for NorthGravity platform tasks

Project description

NORTH GRAVITY PYTHON UTILS

This document describes the North Gravity Python Utils package, which lets users call the NG platform tools and the most common helper functions from their Python scripts and task repositories.

The Python Utils can be used within:

  • a single Python script that is run by the Python Runner task within a pipeline in the NG application

  • a single Jupyter Notebook that is run by the Jupyter Runner task within a pipeline in the NG application

  • a set of Python scripts that are part of a container, for a Task created by the user, used in a pipeline in the NG application

NG_Utils covers the most frequently used functions for data and file handling.

The scope of the NG_Utils:

  • Data Handler - splitting data based on train/test labels, handling dates in datasets

  • File Handler - downloading/uploading model-specific datasets from/to the data lake, handling dataset formats

  • Back Test - preparing a short or extended backtest based on model results

How to install and set the package:

Install

pip3 install northgravity_utils==0.0.3

Because the library is available from pip, a specific version can be installed in a Python Task by adding this line to requirements.txt:

northgravity_utils==0.0.3

The package relies on the requests library, so the user must also add it to the project's requirements.txt file or install it directly:

pip3 install requests==2.27.1

Environment Variables

The package reads its configuration from environment variables. They are necessary for the Northgravity SDK package, which Northgravity Utils uses. The variables are provided automatically when a script runs within a pipeline (as a Task or within the Python/Jupyter Runners). To run a script locally, users must set them in the project themselves.

Mandatory environment variables to set:

  • LOGIN → login received from NG

  • PASSWORD → password to log in. Credentials are used to generate the token so that each request is authenticated.

  • NG_API_ENDPOINT → the URL to the NG platform API (by default, the url is set to https://api.northgravity.com)
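For a local run, the variables above can be checked before the package is imported. The following is a minimal sketch using only the standard library; the helper name missing_env_vars is hypothetical and is not part of northgravity_utils.

```python
import os

# Variable names taken from the list above.
REQUIRED_VARS = ("LOGIN", "PASSWORD", "NG_API_ENDPOINT")


def missing_env_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


# The endpoint has a documented default, so it can be filled in locally
# without overwriting a value that is already set.
os.environ.setdefault("NG_API_ENDPOINT", "https://api.northgravity.com")

missing = missing_env_vars()
if missing:
    print("Set these variables before running locally:", ", ".join(missing))
```

LOGIN and PASSWORD are deliberately not given defaults here, so that credentials stay out of the source code.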


Data Handler

Import northgravity_utils and load DataHandler, FileHandler and BackTest

import northgravity_utils as ng_u

dth = ng_u.DataHandler()
dfh = ng_u.FileHandler()
dbt = ng_u.BackTest()

Train/Test/Val

Split the input datasets (Features and Target) based on the 'Split_Labes' column.

# Train/Validation/Test
tvt_dict = dth.train_test_split(features_df, target_df, val=True)

# Train/Test
tt_dict = dth.train_test_split(features_df, target_df, val=False)
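To illustrate the idea of a label-based split, here is a plain-Python sketch that groups rows by a label column. It is not the library's implementation. It assumes that the label is stored on the feature rows, that the label values are 'Train', 'Val' and 'Test', and that each group is returned as a (features, target) pair, which mirrors how tvt_dict['Val'][0] is indexed in this document. The function name split_by_label is hypothetical.

```python
from collections import defaultdict


def split_by_label(feature_rows, target_values, label_key="Split_Labes"):
    """Group (features, target) pairs by the label stored in each feature row."""
    groups = defaultdict(lambda: ([], []))
    for row, target in zip(feature_rows, target_values):
        label = row[label_key]
        # Keep every column except the label itself as a model feature.
        features = {k: v for k, v in row.items() if k != label_key}
        groups[label][0].append(features)
        groups[label][1].append(target)
    return dict(groups)


# Illustrative data only.
rows = [
    {"x": 1.0, "Split_Labes": "Train"},
    {"x": 2.0, "Split_Labes": "Train"},
    {"x": 3.0, "Split_Labes": "Val"},
    {"x": 4.0, "Split_Labes": "Test"},
]
targets = [10.0, 20.0, 30.0, 40.0]

parts = split_by_label(rows, targets)
X_val, y_val = parts["Val"]
print(X_val, y_val)
```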

Get date column

The function detects DateTime columns (even if one is set as the index), parses them to datetime64[ns] format, and returns them as a pd.Series (if one column was detected) or a pd.DataFrame (if more than one DateTime column was detected).

# Get date column
X_dataset = tvt_dict['Val'][0]
date_col = dth.get_date_col(X_dataset)

Shift the dates by the specified period

The function changes the dates in the passed pd.DataFrame or pd.Series to the following date at the given frequency.

# Business Days
data_shifted_bd = dth.date_shift(date_col, period=2, freq='B')

# Week
data_shifted_w = dth.date_shift(date_col, period=1, freq='W')

# Quarter
data_shifted_q = dth.date_shift(date_col, period=1, freq='Q')
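The business-day case can be pictured with a small standard-library sketch. It only illustrates what a shift of two business days means for individual dates; how date_shift handles each frequency internally is not described here, and the function name shift_business_days is hypothetical.

```python
from datetime import date, timedelta


def shift_business_days(day, period):
    """Move `day` forward by `period` business days (Monday to Friday)."""
    shifted = day
    remaining = period
    while remaining > 0:
        shifted += timedelta(days=1)
        if shifted.weekday() < 5:  # 0-4 are Monday to Friday
            remaining -= 1
    return shifted


dates = [date(2022, 3, 3), date(2022, 3, 4)]  # a Thursday and a Friday
shifted = [shift_business_days(d, 2) for d in dates]
print(shifted)  # Monday 2022-03-07 and Tuesday 2022-03-08
```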

Shift the holiday dates based on uploaded calendar

The function moves dates in the passed pd.DataFrame/pd.Series/pd.DatetimeIndex to the following business day when they fall on a holiday in the supplied holiday list.

# hol_list - holiday calendar list
# date_col - column with dates
data_shifted_calendar = dth.holiday_shift(date_col, hol_list)
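As a rough illustration of holiday shifting, the sketch below rolls a date forward until it is neither a weekend day nor a listed holiday. Skipping weekends is an assumption made here so that the result is always a business day; the function name next_business_day is hypothetical and the holiday dates are example values.

```python
from datetime import date, timedelta


def next_business_day(day, holidays):
    """Roll `day` forward until it is neither a weekend day nor a holiday."""
    holidays = set(holidays)
    while day.weekday() >= 5 or day in holidays:
        day += timedelta(days=1)
    return day


# Example calendar: Good Friday and Easter Monday 2022.
hol_list = [date(2022, 4, 15), date(2022, 4, 18)]
date_col = [date(2022, 4, 14), date(2022, 4, 15)]

shifted = [next_business_day(d, hol_list) for d in date_col]
print(shifted)  # 2022-04-14 is unchanged, 2022-04-15 moves to 2022-04-19
```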

Prepare index for forecast and test datasets

Gets the frequency of the target time index and shifts the index by the period value to prepare the test and forecast datasets.

# Index for forecast and test
idx = dth.prep_prediction_date(target_df.index, period=3)
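The idea of inferring a frequency and shifting by a number of periods can be sketched with the standard library. This simplified version assumes a fixed step between dates (for example daily or weekly) and takes the most common gap as the frequency; it does not handle calendar frequencies such as months or business days, and shift_index is a hypothetical name.

```python
from collections import Counter
from datetime import date


def shift_index(dates, period):
    """Shift every date by `period` steps of the most common gap between dates."""
    gaps = Counter(later - earlier for earlier, later in zip(dates, dates[1:]))
    step = gaps.most_common(1)[0][0]
    return [d + period * step for d in dates]


weekly = [date(2022, 1, 3), date(2022, 1, 10), date(2022, 1, 17)]
forecast_idx = shift_index(weekly, period=3)
print(forecast_idx)  # each date moved forward by three weeks
```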

Prepare forecast date in DataPrep required format

The function prepares the forecast date in the format required by DataPrep, based on the passed pd.DataFrame/pd.Series/pd.DatetimeIndex. The input prep_target_idx should be the same as the one passed to the test dataset (already preprocessed by the prep_prediction_date function).

# Date in DataPrep required format
# idx - dates
date_prep = dth.forecast_date_format(idx)

Model's score - regression

The function calculates regression metrics by comparing predicted and actual values. sklearn.metrics is used to calculate Mean Squared Error, Root Mean Squared Error, Mean Absolute Percentage Error and Mean Absolute Error.

# Regression
actual = df_test_reg['Real']
predicted = df_test_reg['Predicted']
print(dth.get_scores_regression(actual, predicted))
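For reference, the four metrics named above can be written out in plain Python. This is a sketch of the standard definitions, not the package's code; the dictionary keys and the name regression_scores are hypothetical. MAPE is returned as a fraction, and the sketch assumes that no actual value is zero.

```python
import math


def regression_scores(actual, predicted):
    """Compute MSE, RMSE, MAPE (as a fraction) and MAE for paired values."""
    errors = [a - p for a, p in zip(actual, predicted)]
    n = len(errors)
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # Assumes every actual value is non-zero.
    mape = sum(abs(e / a) for e, a in zip(errors, actual)) / n
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAPE": mape, "MAE": mae}


# Illustrative values only.
scores = regression_scores([100.0, 200.0, 400.0], [110.0, 190.0, 400.0])
print(scores)
```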

Model's score - classification

The function calculates classification metrics by comparing predicted and actual values. sklearn.metrics is used to calculate Accuracy, Precision, Recall and F1.

# Classification
actual = df_test_class['Real']
predicted = df_test_class['Predicted']
print(dth.get_scores_classification(actual, predicted))
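The classification metrics can likewise be spelled out for the binary case. The sketch below follows the standard definitions and is not the package's code; the name classification_scores, the dictionary keys and the positive argument are hypothetical, and multi-class averaging is not covered.

```python
def classification_scores(actual, predicted, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    correct = sum(1 for a, p in pairs if a == p)

    # Fall back to 0.0 when a denominator would be zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "Accuracy": correct / len(pairs),
        "Precision": precision,
        "Recall": recall,
        "F1": f1,
    }


# Illustrative labels only.
scores = classification_scores([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
print(scores)
```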

FileHandler

Download features and target datasets

Function to download the 'Features Dataset'/'Target Dataset' from the DataLake and convert it into a pandas DataFrame.

# Features
features_df = dfh.features_download()

# Target
target_df = dfh.target_download()

Upload general models output

Functions to upload the model output to the specified group on the DataLake.

dfh.forecast_upload(df_for, group_name, model_out_name)
dfh.test_upload(df_test, group_name, model_out_name)
dfh.drivers_upload(df_drv, group_name, model_out_name)

Convert NCSV file to Horizontal type file

Function to convert NCSV type files into the horizontal type.

# Convert NCSV file to Horizontal type file
print('NCSV type file: {}'.format(df_ncsv.tail(10)))
n2h = dfh.ncsv2horizontal(df_ncsv)
print(n2h.tail(10))
print('NCSV converted to Horizontal type file')
n2h.to_csv(output_path + '/ncsv2horizontal_1.csv')

BackTests

Short BackTest

Function that creates a short backtest.

short_bt = dbt.short_back_test(df=df_to_backtest, open_col='Price', trading_signal_col='Signal')
print('Short BackTest: {}'.format(short_bt))

Long BackTest

Function that creates a long backtest.

long_bt = dbt.long_back_test(df=df_to_backtest, number_of_contracts=100, open_col='Price', trading_signal_col='Signal')
print('Long BackTest: {}'.format(long_bt))

Who do I talk to?
