Skip to main content

Given a True beta vector and a predicted beta vector, computes a set of metrics such as true positive rate, recall, etc

Project description

metrics_computation

Do you usually deal with high dimensional regression datasets and you need to compute a bunch of metrics for the beta coefficients of your model?

Then metrics_computation is your package. We assume here that beta coefficients that are different than 0 are positive, while coefficients that are equal to zero are negative. This package can easily compute:

  • true_positive_rate: True positive rate, sensitivity or recall = true positive / real positive
  • false_negative_rate: False negative rate = false negative / real positive (observe thar TPR + FNR = 1)
  • true_negative_rate: True negative rate or specificity = true negative / real negative
  • false_positive_rate: False positive rate = false positive / real negative (TNR + FPR = 1)
  • precision: Precision = true positive / (true positive + false positive)
  • f_score: F-score = 2*(precision*recall)/(precision+recall)
  • correct_selection_rate: Correct selection rate = (true positive + true negative) / total number
  • count_number_positives: Count the number of positives in both the predicted and the real beta array
  • store_beta: Sparse storage as an index-value pair of both predicted and real beta array

Usage example

We first generate a synthetic dataset using the data_generation package available here.

import numpy as np
import data_generation as dgen

data_equal = dgen.EqualGroupSize(n_obs=5200, ro=0.2, error_distribution='student_t', 
                                 e_df=5, random_state=1, group_size=10, non_zero_groups=7, 
                                 non_zero_coef=5, num_groups=15)

x, y, beta, group_index = data_equal.data_generation().values()

This generates a synthetic dataset that includes 5000 observations and 15x10=150 variables, among which 7x5=35 are different than zero and 115 are zero. Once we have the dataset, we obtain a prediction using a penalized regression model from the asgl package.

import asgl
lambda1 = 10.0 ** np.arange(-3, 1.51, 0.1)
tvt_alasso = asgl.TVT(model='lm', penalization='alasso', lambda1=lambda1, parallel=True,
                      weight_technique='pca_pct', variability_pct=0.9, error_type='MSE', 
                      random_state=1, train_size=100, validate_size=100)
alasso_result = tvt_alasso.train_validate_test(x=x, y=y)

alasso_prediction_error = alasso_result['test_error']
alasso_betas = alasso_result['optimal_betas'][1:] # Remove intercept

Now we have the true_betas and the alasso_betas obtained using an adaptive lasso model. Lets compute the metrics available in metrics_computation and see how good is our model:

import metrics_computation as mc
metrics_object = mc.MetricsComputation(metrics=['true_positive_rate', 'false_negative_rate', 'true_negative_rate',
                                                'false_positive_rate', 'precision', 'f_score',
                                                'correct_selection_rate', 'beta_error', 'count_number_positives',
                                                'store_beta'])
metrics = metrics_object.fit(predicted_beta=alasso_betas, true_beta=beta)

And the results obtained are:

print(metrics)
{'true_positive_rate': 1.0,
 'false_negative_rate': 0.0,
 'true_negative_rate': 0.5043478260869565,
 'false_positive_rate': 0.4956521739130435,
 'precision': 0.3804347826086957,
 'f_score': 0.5511811023622047,
 'correct_selection_rate': 0.62,
 'beta_error': 3.3391816482267616,
 'count_number_positives': {'number_positives_predicted_beta': 92,
  'number_positives_true_beta': 35},
 'store_beta': {'predicted_beta': {'index_non_zero_predicted_beta': array([  0,   1,   2,   3,   4,   5,   7,  10,  11,  12,  13,  14,  17,
           20,  21,  22,  23,  24,  25,  30,  31,  32,  33,  34,  35,  36,
           37,  38,  39,  40,  41,  42,  43,  44,  45,  46,  47,  49,  50,
           51,  52,  53,  54,  56,  57,  59,  60,  61,  62,  63,  64,  65,
           66,  67,  69,  71,  76,  77,  78,  79,  80,  85,  86,  87,  88,
           90,  91,  95,  97, 102, 103, 107, 108, 112, 114, 115, 117, 119,
          124, 125, 126, 129, 130, 133, 135, 137, 139, 140, 142, 145, 147,
          148]),
   'value_non_zero_predicted_beta': array([ 0.79339995,  1.87522975,  3.13763985,  3.61334788,  4.51621746,
           0.31125623,  0.54875971,  0.87955032,  1.81445199,  2.31443026,
           4.10932758,  5.42481166,  0.41256546,  1.12469476,  1.58801805,
           2.94509453,  3.13194602,  5.03623722,  0.13976405,  1.01355276,
           2.17043622,  2.66209677,  4.31500118,  5.35810521, -0.04337581,
          -0.10444598,  0.14715516,  0.11823591,  0.39685933,  1.16270464,
           0.66003076,  2.67933492,  3.11439734,  4.99887818,  0.4943931 ,
           0.79173587, -0.15166737,  0.01333736,  1.3140838 ,  2.29604342,
           2.42913034,  4.28719575,  4.68948029, -0.19252128,  0.09013478,
           0.0471931 ,  0.39352183,  2.59684326,  3.2423719 ,  3.70960104,
           4.47649098,  0.26081206,  0.01503026, -0.01503932,  0.18010438,
           0.3148776 , -0.19312873,  0.15733734, -0.2337122 , -0.06246009,
           0.43419795,  0.38280452,  0.00820053, -0.01194388,  0.1596238 ,
          -0.00966318,  0.00836905,  0.07707487,  0.00829384, -0.6871737 ,
           0.00859946,  0.25547944,  0.03750622,  0.20261963, -0.07859221,
          -0.05090997, -0.03631807, -0.23408105, -0.24707326, -0.10432976,
           0.01815028, -0.38785341,  0.19690068, -0.53279602, -0.1147006 ,
           0.10160788,  0.08420481, -0.34810438,  0.39998714,  0.18704431,
           0.21155179, -0.34328259])},
  'true_beta': {'index_non_zero_true_beta': array([ 0,  1,  2,  3,  4, 10, 11, 12, 13, 14, 20, 21, 22, 23, 24, 30, 31,
          32, 33, 34, 40, 41, 42, 43, 44, 50, 51, 52, 53, 54, 60, 61, 62, 63,
          64]),
   'value_non_zero_true_beta': array([1., 2., 3., 4., 5., 1., 2., 3., 4., 5., 1., 2., 3., 4., 5., 1., 2.,
          3., 4., 5., 1., 2., 3., 4., 5., 1., 2., 3., 4., 5., 1., 2., 3., 4.,
          5.])}}}

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

metrics_computation-0.0.2.tar.gz (7.0 kB view details)

Uploaded Source

File details

Details for the file metrics_computation-0.0.2.tar.gz.

File metadata

  • Download URL: metrics_computation-0.0.2.tar.gz
  • Upload date:
  • Size: 7.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.4.0.post20200518 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.8.3

File hashes

Hashes for metrics_computation-0.0.2.tar.gz
Algorithm Hash digest
SHA256 3fb623c90d0059c1b070b427499a82fa5449c552c861752f0cc2ae10191a8b92
MD5 0ab45f779488af50a3432eda14102c2d
BLAKE2b-256 8f65c292cf890073e8b9846ee01430f30037bf35b014b42d7b30134726f309e6

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page