A Python package for calculating information gain.

Project description

informationGain Library

This module calculates the Information Gain of every feature in a dataset with respect to a target column, for both categorical and continuous features.
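For reference, these are the textbook definitions of the two impurity measures and of Information Gain. The package does not document its exact formulas (for example, the logarithm base used for entropy), so treat this as background rather than as a description of its implementation. Here p_k is the proportion of class k in a set of rows S, and I(·) stands for whichever impurity measure is selected.

```latex
% p_k: proportion of class k in S; I(.) is either impurity measure below
\mathrm{Gini}(S) = 1 - \sum_{k} p_k^{2}
\qquad
H(S) = -\sum_{k} p_k \log_2 p_k

% Categorical feature A: one child S_v per unique value v of A
\mathrm{IG}(S, A) = I(S) - \sum_{v} \frac{|S_v|}{|S|}\, I(S_v)

% Continuous feature A: binary split at threshold t; the best t is kept
\mathrm{IG}(S, A, t) = I(S) - \frac{|S_{\le t}|}{|S|}\, I(S_{\le t}) - \frac{|S_{> t}|}{|S|}\, I(S_{> t}),
\qquad t^{*} = \arg\max_{t} \mathrm{IG}(S, A, t)
```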


Class

infoGain.calculate(data, target, criteria="gini", fIndex=True)


Parameters

Parameter   Type        Description                                                   Default
data        DataFrame   Dataset as a pandas DataFrame                                 Required
target      String      Name of the output column (the target variable)               Required
criteria    String      Impurity measure used for splitting: "gini" or "entropy"      "gini"
fIndex      Boolean     If True, treats the first column as an index and skips it     True

Description

  • Calculates the Information Gain of each feature (column) in the dataset with respect to the target column.
  • Handles both categorical and continuous features:
    • Categorical features: the data is split on each unique value.
    • Continuous features: the threshold that yields the maximum Information Gain is selected.
  • Uses either the Gini index or entropy as the impurity measure.
  • Returns a DataFrame with:
    • Feature name
    • Best threshold (or 'none' for categorical features)
    • Corresponding Information Gain
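The behavior described above can be sketched with pandas and NumPy. This is not the package's source code: every function name here (gini, entropy, gain_categorical, best_threshold, information_gain_table) is hypothetical, and the sketch assumes two common conventions the package may not follow: numeric dtypes are treated as continuous, and candidate thresholds are midpoints between consecutive distinct values. Only the parameter names criteria and fIndex and the output columns come from the documentation.

```python
import numpy as np
import pandas as pd


def gini(labels: pd.Series) -> float:
    """Gini impurity: 1 - sum(p_k^2) over the class proportions p_k."""
    p = labels.value_counts(normalize=True).to_numpy()
    return float(1.0 - np.sum(p ** 2))


def entropy(labels: pd.Series) -> float:
    """Shannon entropy in bits: -sum(p_k * log2(p_k))."""
    p = labels.value_counts(normalize=True).to_numpy()  # only classes present, so p_k > 0
    return float(-np.sum(p * np.log2(p)))


IMPURITY = {"gini": gini, "entropy": entropy}


def gain_categorical(x: pd.Series, y: pd.Series, criteria: str = "gini") -> float:
    """Parent impurity minus the size-weighted impurity of one child per unique value of x."""
    impurity = IMPURITY[criteria]
    weighted = sum(len(y_g) / len(y) * impurity(y_g) for _, y_g in y.groupby(x))
    return impurity(y) - weighted


def best_threshold(x: pd.Series, y: pd.Series, criteria: str = "gini") -> tuple[float, float]:
    """Scan binary splits x <= t, using midpoints between consecutive distinct values (assumption)."""
    impurity = IMPURITY[criteria]
    values = np.sort(x.unique())
    if len(values) < 2:
        return float("nan"), 0.0  # a constant column cannot be split
    parent = impurity(y)
    best_t, best_gain = float("nan"), -np.inf
    for t in (values[:-1] + values[1:]) / 2:
        left, right = y[x <= t], y[x > t]
        child = (len(left) * impurity(left) + len(right) * impurity(right)) / len(y)
        if parent - child > best_gain:
            best_t, best_gain = float(t), parent - child
    return best_t, float(best_gain)


def information_gain_table(data: pd.DataFrame, target: str,
                           criteria: str = "gini", fIndex: bool = True) -> pd.DataFrame:
    """Hypothetical stand-in for infoGain.calculate: one row per feature column."""
    features = data.columns[1:] if fIndex else data.columns  # fIndex: skip the first column
    y = data[target]
    rows = []
    for col in features:
        if col == target:
            continue
        if pd.api.types.is_numeric_dtype(data[col]):  # assumption: numeric means continuous
            t, g = best_threshold(data[col], y, criteria)
        else:
            t, g = "none", gain_categorical(data[col], y, criteria)
        rows.append({"feature": col, "threshold": t, "infogain": g})
    return pd.DataFrame(rows, columns=["feature", "threshold", "infogain"])


# Usage on a tiny made-up dataset whose first column is an index
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "outlook": ["sunny", "sunny", "rain", "rain"],
    "temp": [85, 80, 70, 65],
    "Output": ["no", "no", "yes", "yes"],
})
print(information_gain_table(df, target="Output"))
```

Splitting continuous features into two groups (rather than one group per distinct value) keeps a feature with many unique numbers from looking artificially informative.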

Returns

A pandas.DataFrame with columns:

  • 'feature': Name of the feature
  • 'threshold': Best threshold value for continuous features, or 'none' for categorical features
  • 'infogain': Calculated Information Gain

Example Usage

from informationGain import infoGain
import pandas as pd

# Load dataset
data = pd.read_csv('your_dataset.csv')

# Initialize
ig = infoGain()

# Calculate Information Gain
result = ig.calculate(data, target='Output', fIndex=True)

print(result)
# Example Output:
#     feature  threshold   infogain
# 0   outlook      none    0.246750
# 1      temp      72.5    0.029223
# 2  humidity      80.0    0.151836
# 3      wind      none    0.048127
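Because the result is an ordinary pandas DataFrame with 'feature', 'threshold' and 'infogain' columns, standard pandas operations apply to it. A minimal sketch of ranking features, using a hand-built DataFrame that copies the example output above as a stand-in for result so the snippet runs without the package installed:

```python
import pandas as pd

# Stand-in for the DataFrame returned by ig.calculate(...), copied from the example output
result = pd.DataFrame({
    "feature": ["outlook", "temp", "humidity", "wind"],
    "threshold": ["none", 72.5, 80.0, "none"],
    "infogain": [0.246750, 0.029223, 0.151836, 0.048127],
})

# Rank features from most to least informative and keep the top two
ranked = result.sort_values("infogain", ascending=False).reset_index(drop=True)
top_features = ranked["feature"].head(2).tolist()
print(ranked)
print(top_features)  # ['outlook', 'humidity']
```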

Download files

Source Distribution

informationgain-3.0.0.tar.gz (4.0 kB)

Uploaded Source

Built Distribution

informationgain-3.0.0-py3-none-any.whl (4.4 kB)

Uploaded Python 3

File details

Details for the file informationgain-3.0.0.tar.gz.

File metadata

  • Download URL: informationgain-3.0.0.tar.gz
  • Size: 4.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.10

File hashes

Hashes for informationgain-3.0.0.tar.gz
Algorithm Hash digest
SHA256 b5ea63e8cd9e0bc520f9594471a549dd2f86d36ea729101f330beb78b852c576
MD5 6e209250cfdfacb5a6f6a4474a3a38e5
BLAKE2b-256 d9d5caac1d5dd1f7f869946b99ea67cc74818752c04cc12867463f66d7facb3d
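A downloaded file can be checked against the published SHA256 digest with Python's standard hashlib module. The sketch below is a minimal illustration; the helper names sha256_of and verify_download are hypothetical, and the file path in the usage comment assumes the sdist was saved to the current directory. pip can also enforce digests itself through hash-checking mode (--require-hashes together with --hash entries in a requirements file).

```python
import hashlib
from pathlib import Path

# SHA256 digest published for the source distribution (from the table above)
EXPECTED_SDIST_SHA256 = "b5ea63e8cd9e0bc520f9594471a549dd2f86d36ea729101f330beb78b852c576"


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash the file in chunks so large downloads are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected_hex: str) -> bool:
    """Return True if the file's SHA256 matches the published digest (case-insensitive)."""
    return sha256_of(path) == expected_hex.lower()


# Usage, assuming the sdist was saved to the current directory:
# verify_download(Path("informationgain-3.0.0.tar.gz"), EXPECTED_SDIST_SHA256)
```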

File details

Details for the file informationgain-3.0.0-py3-none-any.whl.

File hashes

Hashes for informationgain-3.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 b08d2c3d3b237b2a2209ffda47812518e6c233e19ddab380b63862023a53b919
MD5 f6a12d6b89bfdedb8eca62f72a40d79b
BLAKE2b-256 5a56ec1fc448b80b7a61b1561f52b271ccf191148839f2523ae48311902dff14
