
Given a true beta vector and a predicted beta vector, computes a set of metrics such as true positive rate, recall, etc.

Project description

metrics_computation

Do you usually deal with high-dimensional regression datasets and need to compute a bunch of metrics for the beta coefficients of your model?

Then metrics_computation is your package. We assume here that beta coefficients that are different from 0 are positives, while coefficients that are equal to zero are negatives. This package can easily compute the following metrics (a minimal sketch reproducing their formulas is shown after the list):

  • true_positive_rate: True positive rate, sensitivity or recall = true positive / real positive
  • false_negative_rate: False negative rate = false negative / real positive (observe that TPR + FNR = 1)
  • true_negative_rate: True negative rate or specificity = true negative / real negative
  • false_positive_rate: False positive rate = false positive / real negative (TNR + FPR = 1)
  • precision: Precision = true positive / (true positive + false positive)
  • f_score: F-score = 2*(precision*recall)/(precision+recall)
  • correct_selection_rate: Correct selection rate = (true positive + true negative) / total number
  • count_number_positives: Count the number of positives in both the predicted and the real beta arrays
  • store_beta: Sparse storage, as index-value pairs, of both the predicted and the real beta arrays
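
For reference, these definitions can be reproduced with a few lines of NumPy. This is only an illustrative sketch of the formulas above, not the package's implementation, and it assumes both betas are 1-D NumPy arrays of the same length:

import numpy as np

def sketch_metrics(predicted_beta, true_beta):
    # Non-zero coefficients are positives, zero coefficients are negatives
    pred_pos = predicted_beta != 0
    real_pos = true_beta != 0
    tp = np.sum(pred_pos & real_pos)    # true positives
    fp = np.sum(pred_pos & ~real_pos)   # false positives
    tn = np.sum(~pred_pos & ~real_pos)  # true negatives
    fn = np.sum(~pred_pos & real_pos)   # false negatives
    tpr = tp / (tp + fn)                # recall / sensitivity
    tnr = tn / (tn + fp)                # specificity
    precision = tp / (tp + fp)
    return {'true_positive_rate': tpr,
            'false_negative_rate': 1 - tpr,
            'true_negative_rate': tnr,
            'false_positive_rate': 1 - tnr,
            'precision': precision,
            'f_score': 2 * precision * tpr / (precision + tpr),
            'correct_selection_rate': (tp + tn) / true_beta.size}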

Usage example

We first generate a synthetic dataset using the data_generation package available here.

import numpy as np
import data_generation as dgen

data_equal = dgen.EqualGroupSize(n_obs=5200, ro=0.2, error_distribution='student_t', 
                                 e_df=5, random_state=1, group_size=10, non_zero_groups=7, 
                                 non_zero_coef=5, num_groups=15)

x, y, beta, group_index = data_equal.data_generation().values()
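
As a quick sanity check (assuming x, y and beta are returned as NumPy arrays), we can confirm the dimensions of the generated data described below:

print(x.shape)                 # expected: (5200, 150)
print(np.count_nonzero(beta))  # expected: 35 non-zero coefficients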

This generates a synthetic dataset that includes 5200 observations (of which 100 will later be used for training, 100 for validation and the remaining 5000 for testing) and 15x10=150 variables, among which 7x5=35 are different from zero and 115 are equal to zero. Once we have the dataset, we obtain a prediction using a penalized regression model from the asgl package.

import asgl
lambda1 = 10.0 ** np.arange(-3, 1.51, 0.1)  # Grid of candidate penalization parameter values
tvt_alasso = asgl.TVT(model='lm', penalization='alasso', lambda1=lambda1, parallel=True,
                      weight_technique='pca_pct', variability_pct=0.9, error_type='MSE', 
                      random_state=1, train_size=100, validate_size=100)
alasso_result = tvt_alasso.train_validate_test(x=x, y=y)

alasso_prediction_error = alasso_result['test_error']
alasso_betas = alasso_result['optimal_betas'][1:] # Remove intercept
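
After removing the intercept, alasso_betas should contain one coefficient per variable, aligned with the true beta (assuming optimal_betas holds the intercept in its first position, as the slicing above implies):

print(len(alasso_betas))  # expected: 150, same length as beta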

Now we have the true beta and the alasso_betas obtained using an adaptive lasso model. Let's compute the metrics available in metrics_computation and see how good our model is:

import metrics_computation as mc
metrics_object = mc.MetricsComputation(metrics=['true_positive_rate', 'false_negative_rate', 'true_negative_rate',
                                                'false_positive_rate', 'precision', 'f_score',
                                                'correct_selection_rate', 'beta_error', 'count_number_positives',
                                                'store_beta'])
metrics = metrics_object.fit(predicted_beta=alasso_betas, true_beta=beta)

And the results obtained are:

print(metrics)
{'true_positive_rate': 1.0,
 'false_negative_rate': 0.0,
 'true_negative_rate': 0.5043478260869565,
 'false_positive_rate': 0.4956521739130435,
 'precision': 0.3804347826086957,
 'f_score': 0.5511811023622047,
 'correct_selection_rate': 0.62,
 'beta_error': 3.3391816482267616,
 'count_number_positives': {'number_positives_predicted_beta': 92,
  'number_positives_true_beta': 35},
 'store_beta': {'predicted_beta': {'index_non_zero_predicted_beta': array([  0,   1,   2,   3,   4,   5,   7,  10,  11,  12,  13,  14,  17,
           20,  21,  22,  23,  24,  25,  30,  31,  32,  33,  34,  35,  36,
           37,  38,  39,  40,  41,  42,  43,  44,  45,  46,  47,  49,  50,
           51,  52,  53,  54,  56,  57,  59,  60,  61,  62,  63,  64,  65,
           66,  67,  69,  71,  76,  77,  78,  79,  80,  85,  86,  87,  88,
           90,  91,  95,  97, 102, 103, 107, 108, 112, 114, 115, 117, 119,
          124, 125, 126, 129, 130, 133, 135, 137, 139, 140, 142, 145, 147,
          148]),
   'value_non_zero_predicted_beta': array([ 0.79339995,  1.87522975,  3.13763985,  3.61334788,  4.51621746,
           0.31125623,  0.54875971,  0.87955032,  1.81445199,  2.31443026,
           4.10932758,  5.42481166,  0.41256546,  1.12469476,  1.58801805,
           2.94509453,  3.13194602,  5.03623722,  0.13976405,  1.01355276,
           2.17043622,  2.66209677,  4.31500118,  5.35810521, -0.04337581,
          -0.10444598,  0.14715516,  0.11823591,  0.39685933,  1.16270464,
           0.66003076,  2.67933492,  3.11439734,  4.99887818,  0.4943931 ,
           0.79173587, -0.15166737,  0.01333736,  1.3140838 ,  2.29604342,
           2.42913034,  4.28719575,  4.68948029, -0.19252128,  0.09013478,
           0.0471931 ,  0.39352183,  2.59684326,  3.2423719 ,  3.70960104,
           4.47649098,  0.26081206,  0.01503026, -0.01503932,  0.18010438,
           0.3148776 , -0.19312873,  0.15733734, -0.2337122 , -0.06246009,
           0.43419795,  0.38280452,  0.00820053, -0.01194388,  0.1596238 ,
          -0.00966318,  0.00836905,  0.07707487,  0.00829384, -0.6871737 ,
           0.00859946,  0.25547944,  0.03750622,  0.20261963, -0.07859221,
          -0.05090997, -0.03631807, -0.23408105, -0.24707326, -0.10432976,
           0.01815028, -0.38785341,  0.19690068, -0.53279602, -0.1147006 ,
           0.10160788,  0.08420481, -0.34810438,  0.39998714,  0.18704431,
           0.21155179, -0.34328259])},
  'true_beta': {'index_non_zero_true_beta': array([ 0,  1,  2,  3,  4, 10, 11, 12, 13, 14, 20, 21, 22, 23, 24, 30, 31,
          32, 33, 34, 40, 41, 42, 43, 44, 50, 51, 52, 53, 54, 60, 61, 62, 63,
          64]),
   'value_non_zero_true_beta': array([1., 2., 3., 4., 5., 1., 2., 3., 4., 5., 1., 2., 3., 4., 5., 1., 2.,
          3., 4., 5., 1., 2., 3., 4., 5., 1., 2., 3., 4., 5., 1., 2., 3., 4.,
          5.])}}}
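
The counts reported under count_number_positives make these values easy to verify by hand: the adaptive lasso selects 92 variables, all 35 truly non-zero coefficients are among them (hence the perfect recall), and the remaining 57 selections are false positives out of the 115 real negatives. A short check, reproducing the formulas listed above:

tp, fp = 35, 92 - 35    # all 35 true non-zeros are recovered, 57 false positives
tn, fn = 115 - fp, 0    # 115 real negatives in total
print(tp / (tp + fn))   # true_positive_rate = 1.0
print(fp / (fp + tn))   # false_positive_rate = 0.4956...
print(tp / (tp + fp))   # precision = 0.3804...
print((tp + tn) / 150)  # correct_selection_rate = 0.62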
