Statistical toolkit aimed to help statisticians, data analysts, data scientists, bankers and other professionals to analyze financial data
Project description
Package ‘AFR’
Statistical toolkit aimed to help statisticians, data analysts, data scientists, bankers and other professionals to analyze financial data.
It was designed by the team of the Agency of the Republic of Kazakhstan for Regulation and Development of Financial Market (ARDFM).
AFR toolkit offers functions to upload, preliminary check, analyze data and regressions and interpret results.
Authors:
Timur Abilkassymov, the Advisor to the Chairperson of the ARDFM.
Alua Makhmetova, chief specialist of the Department of Banking Analytics and Stress Testing of the ARDFM.
Contact:
Alua Makhmetova
Copyright:
The Agency of the Republic of Kazakhstan for Regulation and Development of Financial Market. The link
Datasets
AFR has built-in datasets named macroKZ and finratKZ that were gathered by the ARDFM team. More details below.
finratKZ dataset
Dataset finratKZ was gathered during the supervisory procedure of the banking sector of Kazakhstan. ARDFM team analyzed financial statements of the corporate borrowers and calculated financial ratios.
The data was collected during regular supervisory asset quality review(AQR) procedure. During the AQR corporate borrowers were classified as default and standard (IFRS stage 1).
Dataset contains following data :
- Default - Dummy variable where 0 - standard(IFRS stage 1) borrower, 1 - default borrower
- Rev_gr - Revenue growth rate
- EBITDA_gr - EBITDA growth rate
- Cap_gr - Capital growth rate
- CR - Current ratio
- QR - Quick ratio
- Cash_ratio - Cash ratio
- WC_cycle - Working capital cycle
- DTA - Debt-to-assets
- DTE -Debt-to-equity
- LR - Leverage ratio (Total assets/Total equity)
- EBITDA_debt - EBITDA-to-debt
- IC - Interest coverage (Income statement)
- CTI - Cash-to-income
- IC_CF - Interest coverage (Cash flow statement)
- DCR - Debt coverage ratio (Cash flow from operations/Total debt)
- CFR - Cash flow to revenue
- CRA - Cash return on assets (Cash flow from operations/Total assets)
- CRE - Cash return on equity (Cash flow from operations/Total equity)
- ROA - Return on assets
- ROE - Return on equity
- NPM - Net profit margin
- GPM - Gross profit margin
- OPM - Operating profit margin
- RecT - Receivables turnover
- InvT - Inventory turnover
- PayT - Payables turnover
- TA - Total assets turnover
- FA - Fixed assets turnover
- WC - Working capital turnover
Example :
.. code-block:: python
import pandas as pd
finrat = pd.read_csv('./finratKZ.csv')
Reference : The Agency of the Republic of Kazakhstan for Regulation and Development of Financial Market.
macroKZ dataset
The dataset was gathered by the ARDFM based on Kazakhstan' official and public data from the Bureau of National Statistics.
The dataset contains 50 historic macroeconomic and 10 hypothetical financial data over 50 quarters of 2010-2022 period.
The macroKZ dataset will be updated periodically as the official statistical information is released.
Dataset contains following data :
- real_gdp Real GDP
- GDD_Agr_R Real gross value added Agriculture
- GDD_Min_R Real gross value added Mining
- GDD_Man_R Real gross value added Manufacture
- GDD_Elc_R Real gross value added Electricity
- GDD_Con_R Real gross value added Construction
- GDD_Trd_R Real gross value added Trade
- GDD_Trn_R Real gross value added Transportation
- GDD_Inf_R Real gross value added Information
- GDD_Est_R Real gross value added for Real estate
- GDD_R Real gross value added
- GDP_DEF GDP deflator
- Rincpop_q Real population average monthly income
- Rexppop_q Real population average monthly expenses
- Rwage_q Real population average monthly wage
- imp Import
- exp Export
- cpi Inflation
- realest_resed_prim Real price for estate in primary market
- realest_resed_sec Real price for estate in secondary market
- realest_comm Real price for commercial estate
- index_stock_weighted Change in stock value for traded companies
- ntrade_Agr Change in stock value for non-traded companies Agriculture
- ntrade_Min Change in stock value for non-traded companies Mining
- ntrade_Man Change in stock value for non-traded companies Manufacture
- ntrade_Elc Change in stock value for non-traded companies Electricity
- ntrade_Con Change in stock value for non-traded companies Construction
- ntrade_Trd Change in stock value for non-traded companies Trade
- ntrade_Trn Change in stock value for non-traded companies Transportation
- ntrade_Inf Change in stock value for non-traded companies Information
- fed_fund_rate Federal Funds Rate
- govsec_rate_kzt_3m Return on government securities in KZT, 3 m
- govsec_rate_kzt_1y Return on government securities in KZT, 1 year
- govsec_rate_kzt_7y Return on government securities in KZT, 7 years
- govsec_rate_kzt_10y Return on government securities in KZT, 10 years
- tonia_rate TONIA
- rate_kzt_mort_0y_1y Weighted average mortgage lending rate for new loans, less than a year
- rate_kzt_mort_1y_iy Weighted average mortgage lending rate for new loans, more than a year
- rate_kzt_corp_0y_1y Weighted average mortgage lending rate for new loans to non-financial organizations in KZT, less than a year
- rate_usd_corp_0y_1y Weighted average mortgage lending rate for new loans to non-financial organizations in CKB, less than a year
- rate_kzt_corp_1y_iy Weighted average mortgage lending rate for new loans to non-financial organizations in KZT, more than a year
- rate_usd_corp_1y_iy Weighted average mortgage lending rate for new loans to non-financial organizations in CKB, more than a year
- rate_kzt_indv_0y_1y Weighted average mortgage lending rate for consumer loans in KZT, less than a year
- rate_kzt_indv_1y_iy Weighted average mortgage lending rate for consumer loans in KZT, less than a year
- usdkzt USD KZT exchange rate
- eurkzt EUR KZT exchange rate
- rurkzt RUB KZT exchange rate
- poil Price for Brent
- realest_resed_prim_rus Real price for estate in primary market in Russia
- realest_resed_sec_rus Real price for estate in secondary market in Russia
- cred_portfolio credit portfolio
- coef_k1 k1 prudential coefficient
- coef_k3 k3 prudential coefficient
- provisions provisions
- percent_margin percent margin
- com_inc commissionary income
- com_exp commissionary expenses
- oper_inc operational income
- oth_inc other income
- DR default rate
Example :
.. code-block:: python
import pandas as pd
macro = pd.read_csv('./macroKZ.csv')
Reference : The Agency of the Republic of Kazakhstan for Regulation and Development of Financial Market.
Functions
- adjr2_score
Performs calculation of Adjusted R squared (Adj R2) for the given model.
Arguments:
model: OLS model
Result:
result: Adj R2 metrics.
Example:
print(adjr2_score(model))
- aic_score
Performs calculation of Akaike Information criteria(AIC) for the given model.
Arguments:
model: OLS model
Result:
result: AIC metrics.
Example:
print(aic_score(model))
- bic_score
Performs calculation of Akaike Information criteria(AIC) for the given model.
Arguments:
model: OLS model
Result:
result: AIC metrics.
Example:
print(bic_score(model))
- check_betas
Performs all possible subsets regression analysis for the given X and y with with an option to select criterion. Possible criteria are r2, AIC, BIC.
Arguments:
X: The predictor variables.
y: The response variable
criteria (str): The information criteria based on which
intercept (bool): Logical; whether to include intercept term in the models.
Result:
pandas DataFrame: A table of subsets number, predictor variables and beta coefficients for each subset model.
Example:
X = macro[['poil', 'cpi', 'usdkzt', 'GDP_DEF', 'exp', 'tonia_rate']]
y = macro['real_gdp']
check_betas(X, y, criterion = 'bic', intercept = False)
- checkdata
Preliminary check of dataset for missing values, numeric format, outliers.
Arguments:
dataset: name of the CSV file with a dataset for analysis for preliminary check
Result:
str: A conclusion of the preliminary analysis.
Example:
import pandas as pd
macro = pd.read_csv('./load/macroKZ.csv')
checkdata(macro)
- corsel
Correlation matrix for a dataset with an option to set a correlation threshold and option to correlation value as a number or boolean True/False.
Arguments:
data: pandas DataFrame or path to CSV file with a dataset for analysis for preliminary check
thrs (float): correlation threshold numeric value to use for filtering. Default is 0.65
value_type (str): type of correlation value as a "numeric" or "boolean" value. Default representation is numeric.
Result:
pd.DataFrame or boolean : A pair of value name and correlation of the correlation matrix based on the threshold. Type of data varies in accordance with a chosen value_type parameter.
Example:
data = macro[['poil', 'cpi', 'usdkzt', 'GDP_DEF', 'exp', 'GDD_Agr_R', 'rurkzt', 'tonia_rate', 'cred_portfolio', 'fed_fund_rate']]
corsel(data, thrs = 0.65, value_type = "boolean")
- dec_plot
The function depicts decomposition of regressors as a stacked barplot.
Arguments:
model: OLS linear regression model.
data(pandas.DataFrame): A dataset based on which the model was built.
Result:
plot : matplotlib figure.
Example:
model = ols('real_gdp ~ poil + cpi + usdkzt + imp', data = macro).fit()
dec_plot(model, macro)
- opt_size
Calculation of the number of observations necessary to generate the regression for a given number of regressors.
Arguments:
model: OLS linear regression model.
Result:
size (int) : Number of observations necessary to generate the model.
Example:
model = ols('real_gdp ~ poil + cpi + usdkzt + imp', data = macro).fit()
opt_size(model)
- pt_multi
Estimates the multi-year probability of default using the Pluto and Tasche model.
Arguments:
portfolio_distribution (numpy array): The distribution of the portfolio loss rate.
num_defaults (numpy array): The number of defaults observed in each portfolio over the num_years.
conf_level (float): The confidence level for the estimate.
num_years (int): Number of years of observations.
Result:
The estimated multi-year probability of default for each portfolio.
Example:
portfolio_distribution = np.array([10,20,30,40,10,20])
num_defaults = np.array([1, 2, 1, 0, 3, 2])
conf_level = 0.99
num_years = 3
pt_multi(portfolio_distribution, num_defaults, conf_level, num_years)
- pt_one
Estimates the one-year probability of default using the Pluto and Tasche model.
Arguments:
portfolio_dist (numpy array): The distribution of the portfolio loss rate.
num_defaults (numpy array): The number of defaults observed in each portfolio.
conf_level (float): The confidence level for the estimate.
Result:
The estimated one-year probability of default for each portfolio.
Example:
portfolio_distribution = np.array([10,20,30,40,10,20])
num_defaults = np.array([1, 2, 1, 0, 3, 2])
conf_level = 0.99
pt_one(portfolio_distribution, num_defaults, conf_level)
- reg_test
Tests for detecting violation of Gauss-Markov assumptions.
Arguments:
model : OLS linear regression model.
Result:
results (dict) : A dictionary containing the results of the four tests.
Example:
model = ols('real_gdp ~ poil + cpi + usdkzt + imp', data = macro).fit()
reg_test(model)
- regsel_f
Allows to select the best model based on stepwise forward regression analysis with all possible models. The best model is chosen according to the specified scoring.
Arguments:
X: independent/predictor variable(-s)
y: dependent/response variable
data (pandas.DataFrame): dataset
p_value (float): Variables with p-value less than {p_value} will enter into the model.
scoring (str): Statistical metrics used to estimate the best model. The default value is R squared, r2.
Other possible options for {scoring} are: 'aic' , 'bic', 'sbic', 'accuracy', 'r2', 'adjr2', 'explained_variance' and other.
Result:
results: The best model and the plot of the coefficients.
Example:
X = macro[['poil', 'cpi', 'usdkzt', 'GDP_DEF', 'exp', 'tonia_rate']]
y = macro['real_gdp']
regsel_f(X, y, macro, scoring = 'aic')
- sbic_score
Performs calculation of Schwarz Bayesian Information criteria(SBIC) for the given model.
Arguments:
model: OLS model
Result:
result: SBIC metrics.
Example:
model = ols('real_gdp ~ poil + cpi + usdkzt + imp', data = macro).fit()
print(sbic_score(model))
- vif_reg
Calculates the variation inflation factors of all predictors in regression models.
Arguments:
model: OLS linear regression model.
Result:
vif (pandas DataFrame): A Dataframe containing the vIF score for each independent factor.
Example:
model = ols('real_gdp ~ poil + cpi + usdkzt + imp', data = macro).fit()
vif_reg(model)
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.