Information gain utilities

# info_gain

Implementation of the information gain algorithm. There is some debate about how the information gain metric should be defined: as the Kullback-Leibler divergence or as the mutual information. This implementation uses the information gain calculation as defined below.

## Information gain definitions

### Information gain calculation

Definition from information gain calculation (retrieved 2018-07-13). Let `Attr` be the set of all attributes and `Ex` the set of all training examples. `value(x, a)`, with `x` in `Ex`, denotes the value of a specific example `x` for attribute `a` in `Attr`, and `H` denotes the entropy. The `values(a)` function denotes the set of all possible values of attribute `a` in `Attr`. The information gain for an attribute `a` in `Attr` is defined as follows:

$$
\mathrm{IG}(Ex, a) = H(Ex) - \sum_{v \in \mathrm{values}(a)} \frac{|\{x \in Ex \mid \mathrm{value}(x, a) = v\}|}{|Ex|} \cdot H(\{x \in Ex \mid \mathrm{value}(x, a) = v\})
$$

### Intrinsic value calculation

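Definition from information gain calculation (retrieved 2018-07-13). The intrinsic value for an attribute `a` in `Attr` is defined as follows:

$$
\mathrm{IV}(Ex, a) = -\sum_{v \in \mathrm{values}(a)} \frac{|\{x \in Ex \mid \mathrm{value}(x, a) = v\}|}{|Ex|} \cdot \log_2 \frac{|\{x \in Ex \mid \mathrm{value}(x, a) = v\}|}{|Ex|}
$$

### Information gain ratio calculation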

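Definition from information gain calculation (retrieved 2018-07-13). The information gain ratio is the information gain divided by the intrinsic value:

$$
\mathrm{IGR}(Ex, a) = \frac{\mathrm{IG}(Ex, a)}{\mathrm{IV}(Ex, a)}
$$

## Installation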

To install the package via pip use:

```
pip install info_gain
```

To clone the package from the git repository use:

```
git clone https://github.com/Thijsvanede/info_gain.git
```

## Usage

Import the `info_gain` module with:

```python
from info_gain import info_gain
```

The imported module supports three methods:

• `info_gain.info_gain(Ex, a)` to compute the information gain.
• `info_gain.intrinsic_value(Ex, a)` to compute the intrinsic value.
• `info_gain.info_gain_ratio(Ex, a)` to compute the information gain ratio.

### Example

```python
from info_gain import info_gain

# Example of colour to indicate whether something is a fruit or a vegetable
produce = ['apple', 'apple', 'apple', 'strawberry', 'eggplant']
fruit   = [ True  ,  True  ,  True  ,  True       ,  False    ]
colour  = ['green', 'green', 'red'  , 'red'       , 'purple'  ]

ig  = info_gain.info_gain(fruit, colour)
iv  = info_gain.intrinsic_value(fruit, colour)
igr = info_gain.info_gain_ratio(fruit, colour)

print(ig, iv, igr)
```
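To cross-check the numbers by hand, the three definitions can be sketched in plain Python. This is a minimal reference sketch and not the package's own code: the helper names (`entropy`, `reference_info_gain`, and so on) are hypothetical, the logarithm is assumed to be base 2, and, following the example above, the first argument is assumed to be the list of class labels and the second the list of attribute values. The package may use a different logarithm base, in which case the information gain and intrinsic value differ by a constant factor while the ratio stays the same.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (base 2, an assumption) of the empirical distribution."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def reference_info_gain(labels, attribute):
    """H(labels) minus the weighted entropy of labels within each attribute value."""
    total = len(labels)
    groups = {}
    for label, value in zip(labels, attribute):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def reference_intrinsic_value(attribute):
    """Entropy of the attribute's own value distribution."""
    return entropy(attribute)


def reference_info_gain_ratio(labels, attribute):
    """Information gain divided by intrinsic value.

    Raises ZeroDivisionError if the attribute takes only one value.
    """
    return reference_info_gain(labels, attribute) / reference_intrinsic_value(attribute)


# Same data as the example above
fruit = [True, True, True, True, False]
colour = ['green', 'green', 'red', 'red', 'purple']

ig = reference_info_gain(fruit, colour)
iv = reference_intrinsic_value(colour)
igr = reference_info_gain_ratio(fruit, colour)
print(round(ig, 4), round(iv, 4), round(igr, 4))  # 0.7219 1.5219 0.4744
```

Every colour group here contains only fruit or only non-fruit, so the weighted entropy term is zero and the information gain equals the entropy of `fruit` itself.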

## Project details

This version: 1.0.1 (previous release: 1.0).