A Python package for calculating information gain.
informationGain Library
This module helps you calculate Information Gain for both categorical and continuous features.
Class
infoGain.calculate(data, target, criteria="gini", fIndex=True)
Parameters
| Parameter | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | Dataset in pandas DataFrame format | Required |
| target | String | Output column (the target variable) | Required |
| fIndex | Boolean | If True, assumes the first column is an index and skips it | True |
| criteria | String | Splitting criteria: "gini" or "entropy" | "gini" |
Description
- Calculates Information Gain for each feature/column in the dataset with respect to the target column.
- Handles both categorical and continuous features:
  - For categorical features, splits data by unique values.
  - For continuous features, finds the best threshold (value) that gives maximum Information Gain.
- Supports Gini Index or Entropy as the impurity metric.
- Returns a DataFrame with:
  - Feature name
  - Best threshold (or 'none' for categorical)
  - Corresponding Information Gain
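
The steps above can be sketched in plain Python. This is a minimal illustration, not the package's actual implementation: the helper names (`impurity`, `weighted_child_impurity`, `categorical_gain`, `best_threshold`) are hypothetical, and it assumes candidate thresholds are the midpoints between consecutive sorted unique values, which the description does not specify.

```python
import math
from collections import Counter


def impurity(labels, criteria="gini"):
    """Gini index or Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    probs = [count / n for count in Counter(labels).values()]
    if criteria == "entropy":
        return -sum(p * math.log2(p) for p in probs)
    return 1.0 - sum(p * p for p in probs)


def weighted_child_impurity(groups, criteria):
    """Size-weighted average impurity of the label groups produced by a split."""
    total = sum(len(g) for g in groups)
    return sum(len(g) / total * impurity(g, criteria) for g in groups)


def categorical_gain(values, labels, criteria="gini"):
    """Gain from splitting on every unique value of a categorical feature."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    children = weighted_child_impurity(list(groups.values()), criteria)
    return impurity(labels, criteria) - children


def best_threshold(values, labels, criteria="gini"):
    """Assumed strategy: try midpoints between sorted unique values.

    Returns (threshold, gain); threshold is None if no split improves purity.
    """
    parent = impurity(labels, criteria)
    unique = sorted(set(values))
    best = (None, 0.0)
    for low, high in zip(unique, unique[1:]):
        t = (low + high) / 2
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        gain = parent - weighted_child_impurity([left, right], criteria)
        if gain > best[1]:
            best = (t, gain)
    return best
```

In this sketch, a categorical column would go through `categorical_gain` and a numeric column through `best_threshold`, with `criteria` selecting the impurity measure in both cases.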
Returns
A pandas.DataFrame with columns:
- 'feature': Name of the feature
- 'threshold': Best threshold value (for continuous features), or 'none'
- 'infogain': Calculated Information Gain
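
Because the result is an ordinary DataFrame, it can be ranked and filtered with standard pandas calls. The sketch below does not call the package; it builds a frame by hand with the three documented columns, using the values from this page's example output, and assumes that 'none' is stored as a plain string in the 'threshold' column.

```python
import pandas as pd

# Hand-built stand-in for the frame returned by calculate(); values are
# taken from this page's example output.
result = pd.DataFrame({
    "feature": ["outlook", "temp", "humidity", "wind"],
    "threshold": ["none", 72.5, 80.0, "none"],
    "infogain": [0.246750, 0.029223, 0.151836, 0.048127],
})

# Rank features from most to least informative.
ranked = result.sort_values("infogain", ascending=False).reset_index(drop=True)

# Keep only continuous features, i.e. rows that carry a numeric threshold
# (assumes categorical rows hold the string 'none').
continuous = result[result["threshold"] != "none"]

print(ranked)
print(continuous)
```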
Example Usage
```python
from informationGain import infoGain
import pandas as pd

# Load dataset
data = pd.read_csv('your_dataset.csv')

# Initialize
ig = infoGain()

# Calculate Information Gain
result = ig.calculate(data, target='Output', fIndex=True)
print(result)

# Example Output:
#     feature threshold  infogain
# 0   outlook      none  0.246750
# 1      temp      72.5  0.029223
# 2  humidity      80.0  0.151836
# 3      wind      none  0.048127
```
Download files
Source Distribution
informationgain-3.0.0.tar.gz (4.0 kB)
Built Distribution
informationgain-3.0.0-py3-none-any.whl (4.4 kB)
File details
Details for the file informationgain-3.0.0.tar.gz.
File metadata
- Download URL: informationgain-3.0.0.tar.gz
- Upload date:
- Size: 4.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b5ea63e8cd9e0bc520f9594471a549dd2f86d36ea729101f330beb78b852c576 |
| MD5 | 6e209250cfdfacb5a6f6a4474a3a38e5 |
| BLAKE2b-256 | d9d5caac1d5dd1f7f869946b99ea67cc74818752c04cc12867463f66d7facb3d |
File details
Details for the file informationgain-3.0.0-py3-none-any.whl.
File metadata
- Download URL: informationgain-3.0.0-py3-none-any.whl
- Upload date:
- Size: 4.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b08d2c3d3b237b2a2209ffda47812518e6c233e19ddab380b63862023a53b919 |
| MD5 | f6a12d6b89bfdedb8eca62f72a40d79b |
| BLAKE2b-256 | 5a56ec1fc448b80b7a61b1561f52b271ccf191148839f2523ae48311902dff14 |
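
A downloaded file can be checked against the SHA256 digests listed on this page with Python's standard `hashlib` module. This is a minimal sketch; the helper names `sha256_of` and `matches_published_hash` are hypothetical and not part of the package.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, calling `matches_published_hash` with the local path of the downloaded wheel and the SHA256 value from its table above should return `True` for an intact download.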