
Use the Value Difference Metric to find the distance between points described by categorical features.

Project description

vdm3

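The Value Difference Metric (VDM) was introduced in 1986 to provide an appropriate distance function for symbolic (categorical) attributes. It is based on the idea that the goal of finding the distance is to find the right class. Two values of an attribute are therefore considered close when they give similar conditional probabilities P(c | x), the probability of class c given that the attribute takes the value x.

The distance is then calculated from the differences between these conditional probabilities, using for instance the Euclidean distance or the Manhattan distance.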
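The idea can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the vdm3 implementation: the helper names (conditional_probs, attribute_vdm, point_distance) are hypothetical, and the way per-attribute terms are combined (sum, then q-th root) is one common choice, so the numbers it produces need not match the package's output.

```python
from collections import Counter


def conditional_probs(values, labels):
    """Estimate P(class | value) for one categorical attribute by counting."""
    value_counts = Counter(values)
    joint_counts = Counter(zip(values, labels))
    classes = sorted(set(labels))
    return {
        v: {c: joint_counts[(v, c)] / n for c in classes}
        for v, n in value_counts.items()
    }


def attribute_vdm(probs, v1, v2, q=2):
    """Sum over classes of |P(c | v1) - P(c | v2)| ** q for one attribute."""
    return sum(abs(probs[v1][c] - probs[v2][c]) ** q for c in probs[v1])


def point_distance(tables, p1, p2, q=2):
    """Combine per-attribute terms (assumption: sum, then q-th root).

    q=2 gives a Euclidean-style distance, q=1 a Manhattan-style distance.
    """
    total = sum(attribute_vdm(t, a, b, q) for t, a, b in zip(tables, p1, p2))
    return total ** (1.0 / q)


# Two attributes taken from the usage example in this README.
gender = ['F', 'F', 'F', 'M', 'F', 'F', 'F', 'F', 'M', 'F']
citizen = ['US', 'US', 'US', 'US', 'US', 'ELNC', 'US', 'US', 'US', 'US']
labels = [0, 0, 1, 0, 0, 0, 0, 0, 0, 1]

tables = [conditional_probs(gender, labels), conditional_probs(citizen, labels)]

# P(1 | F) = 2/8 and P(1 | M) = 0/2, so the q=2 term for Gender is
# 0.25**2 + 0.25**2 = 0.125.
gender_term = attribute_vdm(tables[0], 'F', 'M', q=2)

dist = point_distance(tables, ['F', 'ELNC'], ['M', 'US'])
same = point_distance(tables, ['F', 'ELNC'], ['F', 'ELNC'])
```

Here same is 0.0 because every per-attribute term vanishes for identical points, and swapping the two points leaves the distance unchanged.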

Install

pip install vdm3

Parameters

ValueDifferenceMetric(X=X, y=y)
  • X: ndarray, DataFrame, or Series holding the categorical features
  • y: tuple, list, ndarray, or Series holding the class label for each row of X

Usage

Consider the following example:

>>> import numpy as np
>>> import pandas as pd
>>> from vdm3 import ValueDifferenceMetric  # import path assumed from the package name

>>> columns = {
...     'Gender': ['F','F','F','M','F','F','F','F','M','F'],
...     'Marital': ['UN','S','M','M','S','M','M','S','D','M'],
...     'Lead': ['REF','INTINT','REF','INTINT','RADIO','REF','INTER','PPC','PPC','RADIO'],
...     'PrevEd': ['SOMECOLL','SOMECOLL','ASSOC','BACH','BACH','ASSOC','UN','SOMECOLL','BACH','SOMECOLL'],
...     'Citizen': ['US','US','US','US','US','ELNC','US','US','US','US']
... }

>>> X = pd.DataFrame(columns)
>>> y = np.array([0,0,1,0,0,0,0,0,0,1])

Create an instance from the data and fit it:

>>> case = ValueDifferenceMetric(X=X,y=y)
>>> case.vdm_pairs_fit()

Get the VDM distance between two points:

>>> point1 = ['F','D','INTER','ASSOC','ELNC']
>>> point2 = ['M', 'S', 'PPC', 'SOMECOLL', 'US']

>>> case.get_points_distance(point1=point1, point2=point2)
0.5905636562630361

The distance is 0 when the two points are the same:

>>> case.get_points_distance(point1=point1, point2=point1)
0.0

Attributes

  • all_pairs
    • all VDM distance pairs computed from the X and y given to the class instance.

Methods

  • get_cond_prob(x=x,y=y)
    • returns a dictionary containing the conditional probabilities for an input x array and y array.
  • vdm(x=x,y=y)
    • returns a dictionary containing all the VDM pairs and their respective conditional probabilities for an input x array and y array.
  • vdm_pairs_fit()
    • fits the VDM on the X and y given to the class instance.
  • get_points_distance(point1=point1,point2=point2)
    • returns the distance between two points, using the conditional probabilities learned by vdm_pairs_fit().


Files for vdm3, version 0.1.7: vdm3-0.1.7.tar.gz (source distribution, 3.6 kB)
