
Information gain utilities


info_gain

Implementation of the information gain algorithm. There is some debate about how the information gain metric should be defined: whether it should be based on the Kullback-Leibler divergence or on mutual information. This implementation uses the information gain calculation defined below:
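The two candidate definitions are closely related. The mutual information between the class label Y and an attribute A can itself be written as the Kullback-Leibler divergence of the joint distribution p(y, v) from the product of the marginals p(y)p(v). With empirical frequencies, it equals the entropy-based IG(Ex, a) = H(Ex) - H(Ex | a) that this package adopts. The minimal sketch below computes the KL form from counts. It is an illustration, not the package's code, and the function name `mutual_information` is hypothetical. Logarithms are taken in base 2, so results are in bits.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def mutual_information(ex: Sequence[Hashable], a: Sequence[Hashable]) -> float:
    """Empirical I(Y; A) in bits, computed as KL(p(y, v) || p(y) * p(v)).

    ex[i] is the class label of example i and a[i] its value for the attribute.
    """
    if len(ex) != len(a):
        raise ValueError("ex and a must have the same length")
    n = len(ex)
    joint = Counter(zip(ex, a))   # counts of (label, value) pairs
    label_counts = Counter(ex)    # marginal counts of labels
    value_counts = Counter(a)     # marginal counts of attribute values
    # KL divergence of the joint from the product of marginals. Only observed
    # pairs contribute, since 0 * log(0 / q) is taken to be 0.
    # p(y, v) / (p(y) p(v)) simplifies to c * n / (count(y) * count(v)).
    return sum(
        (c / n) * math.log2(c * n / (label_counts[y] * value_counts[v]))
        for (y, v), c in joint.items()
    )


fruit = [True, True, True, True, False]
colour = ["green", "green", "red", "red", "purple"]
print(mutual_information(fruit, colour))  # about 0.722 bits
```

If the class labels and the attribute are independent in the sample, every ratio inside the logarithm is 1, and the result is 0.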

Information gain definitions

Information gain calculation

Definition from information gain calculation (retrieved 2018-07-13). Let Attr be the set of all attributes and Ex the set of all training examples. For x in Ex, value(x, a) denotes the value of example x for attribute a in Attr. H denotes the entropy of the class labels of a set of examples. The function values(a) denotes the set of all possible values of attribute a in Attr. The information gain for an attribute a in Attr is defined as follows:

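$$IG(Ex, a) = H(Ex) - \sum_{v \in values(a)} \frac{|\{x \in Ex \mid value(x, a) = v\}|}{|Ex|} \cdot H(\{x \in Ex \mid value(x, a) = v\})$$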
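The formula translates almost directly into Python. The following is a minimal sketch, not the package's source. The helper names `entropy` and `info_gain` are used here for illustration only, and measuring entropy in bits (log base 2) is an assumption about the units.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy, in bits, of the empirical distribution of labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(ex: Sequence[Hashable], a: Sequence[Hashable]) -> float:
    """IG(Ex, a) = H(Ex) - sum_v |Ex_v| / |Ex| * H(Ex_v).

    ex[i] is the class label of example i and a[i] its value for attribute a.
    """
    if len(ex) != len(a):
        raise ValueError("ex and a must have the same length")
    n = len(ex)
    # Partition the labels by attribute value: Ex_v = {x in Ex | value(x, a) = v}.
    groups: Dict[Hashable, List[Hashable]] = {}
    for label, value in zip(ex, a):
        groups.setdefault(value, []).append(label)
    # Weighted entropy remaining after splitting on a.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(ex) - remainder
```

An attribute that perfectly separates the classes leaves a remainder of zero, so its gain equals H(Ex). An attribute with a single value leaves the remainder equal to H(Ex), so its gain is zero.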

Intrinsic value calculation

Definition from information gain calculation (retrieved 2018-07-13).

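$$IV(Ex, a) = -\sum_{v \in values(a)} \frac{|\{x \in Ex \mid value(x, a) = v\}|}{|Ex|} \cdot \log_2\left(\frac{|\{x \in Ex \mid value(x, a) = v\}|}{|Ex|}\right)$$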
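Intrinsic value depends only on how the examples are spread across the values of a. It is therefore the entropy of the attribute's own value distribution. The minimal sketch below works in bits. The function name mirrors the package's `intrinsic_value`, but the body is an illustration and not its source.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def intrinsic_value(ex: Sequence[Hashable], a: Sequence[Hashable]) -> float:
    """IV(Ex, a) = -sum_v |Ex_v| / |Ex| * log2(|Ex_v| / |Ex|), in bits.

    Only the attribute values are used; ex is taken to mirror the (Ex, a)
    argument order and to check that both sequences have the same length.
    """
    if len(ex) != len(a):
        raise ValueError("ex and a must have the same length")
    n = len(a)
    # |Ex_v| is the number of examples whose attribute value is v.
    return -sum((c / n) * math.log2(c / n) for c in Counter(a).values())
```

Attributes with many distinct values have a large IV. For example, if every one of n examples has a different value, IV is log2(n).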

Information gain ratio calculation

Definition from information gain calculation (retrieved 2018-07-13).

$$IGR(Ex, a) = \frac{IG(Ex, a)}{IV(Ex, a)}$$
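The ratio divides the information gain by the intrinsic value. This penalises attributes that split the examples into many small groups. The self-contained sketch below repeats both computations in bits. The definition does not say how to handle IV(Ex, a) = 0, which happens for a single-valued attribute and makes the ratio 0/0. Returning 0.0 in that case is a hypothetical convention and not necessarily what the package does.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def _entropy(values: Sequence[Hashable]) -> float:
    """Shannon entropy, in bits, of the empirical distribution of values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def info_gain_ratio(ex: Sequence[Hashable], a: Sequence[Hashable]) -> float:
    """IGR(Ex, a) = IG(Ex, a) / IV(Ex, a)."""
    if len(ex) != len(a):
        raise ValueError("ex and a must have the same length")
    n = len(ex)
    # Partition the labels by attribute value to compute IG.
    groups: Dict[Hashable, List[Hashable]] = {}
    for label, value in zip(ex, a):
        groups.setdefault(value, []).append(label)
    ig = _entropy(ex) - sum(len(g) / n * _entropy(g) for g in groups.values())
    # IV is the entropy of the attribute's own value distribution.
    iv = _entropy(a)
    if iv == 0:
        # Hypothetical convention: a single-valued attribute gives 0 / 0,
        # which is reported here as 0.0.
        return 0.0
    return ig / iv
```

The log base cancels in the ratio, so IGR is the same whichever base IG and IV are computed in.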

Installation

To install the package via pip use:

pip install info_gain

To clone the package from the git repository use:

git clone https://github.com/Thijsvanede/info_gain.git

Usage

Import the info_gain module with:

from info_gain import info_gain

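The imported module supports three functions. In each, Ex is the sequence of class labels of the examples, and a is the sequence of the corresponding values of one attribute (as in the example below):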

  • info_gain.info_gain(Ex, a) to compute the information gain.
  • info_gain.intrinsic_value(Ex, a) to compute the intrinsic value.
  • info_gain.info_gain_ratio(Ex, a) to compute the information gain ratio.

Example

from info_gain import info_gain

# Example: does the colour of an item indicate whether it is a fruit or a vegetable?
produce = ['apple', 'apple', 'apple', 'strawberry', 'eggplant']
fruit   = [ True  ,  True  ,  True  ,  True       ,  False    ]
colour  = ['green', 'green', 'red'  , 'red'       , 'purple'  ]

ig  = info_gain.info_gain(fruit, colour)
iv  = info_gain.intrinsic_value(fruit, colour)
igr = info_gain.info_gain_ratio(fruit, colour)

print(ig, iv, igr)
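As a hand check, the example's values can be worked out directly, assuming logarithms in base 2. Every colour group is pure: green contains two fruits, red contains two fruits, and purple contains one vegetable. The weighted conditional entropy is therefore zero, and the gain equals the entropy of the labels. If the package uses a different logarithm base, IG and IV are both scaled by the same constant. The printed IG and IV may then differ from the figures below, but IGR stays the same.

```latex
\begin{aligned}
H(Ex) &= -\left(\tfrac{4}{5}\log_2\tfrac{4}{5} + \tfrac{1}{5}\log_2\tfrac{1}{5}\right) \approx 0.722 \\
IG(Ex, colour) &= H(Ex) - \left(\tfrac{2}{5}\cdot 0 + \tfrac{2}{5}\cdot 0 + \tfrac{1}{5}\cdot 0\right) \approx 0.722 \\
IV(Ex, colour) &= -\left(2\cdot\tfrac{2}{5}\log_2\tfrac{2}{5} + \tfrac{1}{5}\log_2\tfrac{1}{5}\right) \approx 1.522 \\
IGR(Ex, colour) &= \frac{IG(Ex, colour)}{IV(Ex, colour)} \approx \frac{0.722}{1.522} \approx 0.474
\end{aligned}
```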



Download files

Source Distribution

info_gain-1.0.tar.gz (2.7 kB)

Built Distribution

info_gain-1.0-py3-none-any.whl (3.0 kB, Python 3)
