Project description
ID3 Decision Tree Algorithm
ID3 is a machine learning decision tree algorithm. This package can build the model using one of two attribute-selection methods: Information Gain and the Gini Index. The current release implements only Information Gain; a future update will add the Gini Index.
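For readers unfamiliar with the selection criterion, here is a minimal, self-contained sketch of how information gain is computed for one feature. This is plain Python for illustration only (the helper names `entropy` and `information_gain_of` are mine, not this package's API):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())

def information_gain_of(feature_values, labels):
    """Entropy reduction achieved by splitting `labels` on `feature_values`."""
    total = len(labels)
    remainder = 0.0
    for v in set(feature_values):
        subset = [lab for f, lab in zip(feature_values, labels) if f == v]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

# Toy vote data: the feature perfectly predicts the label,
# so the gain equals the full label entropy (1 bit).
votes = [1, 1, 0, 0]
party = [1, 1, 0, 0]
print(information_gain_of(votes, party))  # 1.0
```

ID3 picks, at each node, the feature with the highest information gain and recurses on the resulting subsets.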
Installation
Install directly from PyPI:
pip install classic-ID3-DecisionTree
Or clone the repository and install:
python3 setup.py install
Parameters
* X_train
The training-set array containing the features.
* y_train
The training-set array containing the outcomes.
* dataset
The entire dataset.
Attributes
* information_gain(X_train, y_train)
Initialise the Information Gain class with the training set.
* add_features(dataset, result_col_name)
Add the features to the model by passing the dataset; the model fetches the column names as features. The second parameter is the name of the outcome column.
* decision_tree()
Build the decision tree.
* predict(X_test)
Predict the test-set results.
Documentation
1. Install the package
pip install classic-ID3-DecisionTree
2. Import the library
from classic_ID3_DecisionTree import information_gain
3. Create an object for Information Gain class
ig = information_gain(X_train, y_train)
4. Add Column Features to the model
ig.add_features(dataset, result_col_name)
5. Build the Decision Tree Model
ig.decision_tree()
6. Predict the Test Set Results
y_pred = ig.predict(X_test)
Example Code
0. Download the dataset
Download the house-votes-84 dataset (the `house-votes-84.csv` file used below).
1. Import the dataset and Preprocess
```python
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

dataset = pd.read_csv('house-votes-84.csv')
rawdataset = pd.read_csv('house-votes-84.csv')

# Encode the categorical values as integers
party = {'republican': 0, 'democrat': 1}
vote = {'y': 1, 'n': 0, '?': 0}
for col in dataset.columns:
    if col != 'party':
        dataset[col] = dataset[col].map(vote)
dataset['party'] = dataset['party'].map(party)

X = dataset.iloc[:, 1:17].values
y = dataset.iloc[:, 0].values

from sklearn.model_selection import KFold
kf = KFold(n_splits=5)
for train_index, test_index in kf.split(X, y):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
```
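Note that the loop above reassigns `X_train`/`X_test` on every iteration, so the model that follows is effectively trained on the last fold only. For intuition, what `KFold` produces can be sketched in plain Python (the helper `kfold_indices` below is illustrative, not part of sklearn or this package):

```python
def kfold_indices(n, n_splits=5):
    """Yield (train_idx, test_idx) pairs over range(n), mimicking
    sklearn's KFold(n_splits) with contiguous, unshuffled folds."""
    fold_sizes = [n // n_splits + (1 if i < n % n_splits else 0)
                  for i in range(n_splits)]
    indices = list(range(n))
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

for train_idx, test_idx in kfold_indices(10, n_splits=5):
    print(len(train_idx), len(test_idx))  # 8 2, five times
```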
2. Use the ID3 Library
```python
from classic_ID3_DecisionTree import information_gain

ig = information_gain(X_train, y_train)
ig.add_features(dataset, 'party')
print(ig.features)
ig.decision_tree()
y_pred = ig.predict(X_test)
```
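The example stops at prediction; to check the result you can compare `y_pred` against `y_test` with a simple accuracy score. A plain-Python sketch (the `accuracy` helper is mine, not part of this package):

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    matches = sum(t == p for t, p in zip(y_true, y_pred))
    return matches / len(y_true)

print(accuracy([0, 1, 1, 0], [0, 1, 0, 0]))  # 0.75
```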
Footnotes
You can find the code at my Github.
Hashes for classic_ID3_DecisionTree-1.0.0.tar.gz

Algorithm | Hash digest
---|---
SHA256 | 834b1c6889437537a79716565ddf1a6b871ee926fa31d8bd4dc5c9c49925b649
MD5 | 39c95777c95c7cc0355ca4d701d255da
BLAKE2b-256 | 34cbf644c5af3c11fa26c72aaa2e40b1f2c43150558c3f7506c20a7d701eba57

Hashes for classic_ID3_DecisionTree-1.0.0-py3-none-any.whl

Algorithm | Hash digest
---|---
SHA256 | bdb79b1797f329fd631ec5cffe9758717398756e6d1036d0cab2732db49117d6
MD5 | 81d4b851fc44222af37db27574da755a
BLAKE2b-256 | f70debdce8730da22224f8a67e30ddd16eab8ffed27eec6b259bd65affc2a66c