
Library for C4.5 Decision Tree Algorithm

Project description

C4.5 Decision Tree Classifier

Introduction

What is C4.5?

C4.5 is an algorithm, developed by Ross Quinlan, for generating decision trees. It is an extension of Quinlan's earlier ID3 algorithm. Because the decision trees generated by C4.5 can be used for classification, C4.5 is often referred to as a statistical classifier. Like ID3, C4.5 builds decision trees from a set of training data using the concept of information entropy. The training data is a set S = {s1, s2, s3, ..., sn} of already classified samples. Each sample si is a tuple (xi, ci), where xi is a vector of attribute values and ci is the class of the sample. The algorithm recursively splits S into subsets Si using the attribute ai that gives the largest information gain, that is, the largest reduction in entropy from S to the resulting subsets (C4.5 normalizes this gain, as described under Top attributes). Splitting stops when a predefined termination criterion is met, for example when all samples in a subset Si belong to the same class, or when all samples in Si have identical attribute values. The resulting tree classifies a new, unseen sample x by traversing the tree from the root to a leaf and assigning x the class of that leaf.
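Entropy and information gain can be illustrated with a few lines of plain Python. This is a minimal sketch, assuming categorical attributes stored in dicts; the function names and the toy data are hypothetical and are not part of this library.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return sum(-(c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy of the whole set minus the weighted entropy of the
    subsets obtained by splitting on the categorical attribute `attr`."""
    n = len(labels)
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attr], []).append(label)
    remainder = sum(len(s) / n * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder


# Hypothetical toy data: one attribute dict per sample, plus its class.
rows = [
    {"outlook": "sunny", "windy": False},
    {"outlook": "sunny", "windy": True},
    {"outlook": "rain", "windy": False},
    {"outlook": "rain", "windy": True},
]
labels = ["no", "no", "yes", "yes"]

print(information_gain(rows, labels, "outlook"))  # 1.0 (perfect split)
print(information_gain(rows, labels, "windy"))    # 0.0 (no information)
```

Splitting on outlook produces two pure subsets, so the gain equals the full entropy of S (1 bit); splitting on windy leaves each subset as mixed as S was, so the gain is zero.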

Top attributes

To choose the top (splitting) attribute at each node, this library uses the gain ratio instead of information gain. Gain ratio is a modification of information gain that reduces its bias toward attributes with a large number of distinct values. Gain ratio is defined as:

GainRatio(S, A) = Gain(S, A) / SplitInfo(S, A)

where

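SplitInfo(S, A) = - sum over v in Values(A) of ( |Sv| / |S| ) * log2 ( |Sv| / |S| )

and Sv is the subset of S for which attribute A has value v. Gain(S, A) is the ordinary information gain, Entropy(S) - sum over v in Values(A) of ( |Sv| / |S| ) * Entropy(Sv).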
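The formulas above can be sketched directly in Python. This is an illustrative sketch, not this library's code: gain_ratio is a hypothetical helper, and returning 0.0 when SplitInfo is zero (an attribute with a single value) is an assumption, since the README does not say how that case is handled. Note that SplitInfo(S, A) is simply the entropy of the attribute's own value distribution.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of any list of discrete values."""
    n = len(values)
    return sum(-(c / n) * log2(c / n) for c in Counter(values).values())


def gain_ratio(attr_values, labels):
    """GainRatio(S, A) = Gain(S, A) / SplitInfo(S, A) for one categorical attribute."""
    n = len(labels)
    # Group class labels by attribute value: these are the subsets Sv.
    groups = {}
    for v, label in zip(attr_values, labels):
        groups.setdefault(v, []).append(label)
    gain = entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())
    # SplitInfo(S, A): entropy of the attribute values themselves.
    split_info = entropy(attr_values)
    # Assumption: a single-valued attribute (SplitInfo = 0) scores 0.
    return gain / split_info if split_info > 0 else 0.0


labels = ["yes", "yes", "no", "no"]
outlook = ["sunny", "sunny", "rain", "rain"]
sample_id = ["a", "b", "c", "d"]  # unique per sample, like a row ID

# Both attributes have Gain = 1.0, but the ID-like attribute is penalized.
print(gain_ratio(outlook, labels))    # 1.0
print(gain_ratio(sample_id, labels))  # 0.5
```

Both attributes separate the classes perfectly, so plain information gain cannot tell them apart. The ID-like attribute has four distinct values, giving SplitInfo = log2(4) = 2, which halves its score; this is the bias reduction described above.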

Requirements

  • Python 3.6 or above
  • pandas
  • NumPy

Implementation

Installation

pip install c45-decision-tree

Train Model

To train a model, prepare the data as a pandas DataFrame. Then call the fit method with two parameters: the first is the feature data and the second is the target. For example:

from C45 import C45Classifier
import pandas as pd

data = pd.read_csv('data.csv')
X = data.drop(['target'], axis=1)
y = data['target']

model = C45Classifier()
model.fit(X, y)
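The README does not document how fit builds the tree internally. The sketch below shows the recursive procedure from the introduction, using gain ratio to pick each split, under simplifying assumptions: categorical attributes only, no missing values, and no pruning (full C4.5 also handles continuous attributes, missing values, and pruning). The names build_tree and gain_ratio, and the nested-dict tree format, are hypothetical and are not part of the c45-decision-tree API.

```python
from collections import Counter
from math import log2


def entropy(values):
    n = len(values)
    return sum(-(c / n) * log2(c / n) for c in Counter(values).values())


def gain_ratio(rows, labels, attr):
    n = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    gain = entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = entropy([row[attr] for row in rows])
    return gain / split_info if split_info > 0 else 0.0


def build_tree(rows, labels, attributes):
    """Return a class label (leaf) or {"attribute": a, "branches": {value: subtree}}."""
    majority = Counter(labels).most_common(1)[0][0]
    # Stop: the node is pure, or there is nothing left to split on.
    if len(set(labels)) == 1 or not attributes:
        return majority
    best = max(attributes, key=lambda a: gain_ratio(rows, labels, a))
    # Stop: no remaining attribute separates the samples any further.
    if gain_ratio(rows, labels, best) <= 0:
        return majority
    remaining = [a for a in attributes if a != best]
    branches = {}
    # dict.fromkeys keeps values in first-seen order, so the output is deterministic.
    for value in dict.fromkeys(row[best] for row in rows):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        branches[value] = build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx], remaining
        )
    return {"attribute": best, "branches": branches}


rows = [
    {"outlook": "sunny", "windy": False},
    {"outlook": "sunny", "windy": True},
    {"outlook": "overcast", "windy": False},
    {"outlook": "overcast", "windy": True},
    {"outlook": "rain", "windy": False},
    {"outlook": "rain", "windy": True},
]
labels = ["no", "no", "yes", "yes", "yes", "no"]

tree = build_tree(rows, labels, ["outlook", "windy"])
print(tree["attribute"])                         # outlook
print(tree["branches"]["rain"]["attribute"])     # windy
```

Outlook wins the root split because it produces two pure subsets; only the rain branch needs a second split, on windy.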

Predict

To make predictions, call the predict method with one parameter, the data to classify. For example:

data_test = pd.read_csv('data_test.csv')
model.predict(data_test)
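Prediction is the root-to-leaf traversal described in the introduction. The sketch below is illustrative only: it assumes a hypothetical nested-dict tree, where an internal node is {"attribute": name, "branches": {value: subtree}} and a leaf is a class label. The classify function is not part of this library, and falling back to a default label for attribute values never seen during training is an assumption; the README does not say how predict handles them.

```python
def classify(tree, sample, default=None):
    """Follow branches from the root until a leaf (a plain class label) is reached."""
    node = tree
    while isinstance(node, dict):
        value = sample.get(node["attribute"])
        if value not in node["branches"]:
            # Value not seen during training: fall back to a default label.
            return default
        node = node["branches"][value]
    return node


tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": "no",
        "overcast": "yes",
        "rain": {"attribute": "windy", "branches": {False: "yes", True: "no"}},
    },
}

samples = [
    {"outlook": "sunny", "windy": False},
    {"outlook": "rain", "windy": False},
    {"outlook": "rain", "windy": True},
    {"outlook": "fog", "windy": True},  # "fog" never appears in the tree
]
predictions = [classify(tree, s, default="unknown") for s in samples]
print(predictions)  # ['no', 'yes', 'no', 'unknown']
```

To apply a function like this to a pandas DataFrame, each row can be converted to a dict with data_test.to_dict(orient="records").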

Evaluate

To evaluate a model, call the evaluate method with two parameters: the first is the feature data and the second is the target. For example:

data_test = pd.read_csv('data_test.csv')
X_test = data_test.drop(['target'], axis=1)
y_test = data_test['target']
model.evaluate(X_test, y_test)
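The README does not say which metrics evaluate reports. As an illustration only, the sketch below computes accuracy (the fraction of correctly classified samples) and counts of (true class, predicted class) pairs. The helpers accuracy and confusion_counts are hypothetical, and using them on this library's output assumes predict returns one label per input row.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the true class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def confusion_counts(y_true, y_pred):
    """Count how often each (true class, predicted class) pair occurs."""
    return Counter(zip(y_true, y_pred))


y_true = ["yes", "no", "yes", "no", "yes"]
y_pred = ["yes", "no", "no", "no", "yes"]

print(accuracy(y_true, y_pred))                         # 0.8
print(confusion_counts(y_true, y_pred)[("yes", "no")])  # 1
```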

Summary Model

To get a summary of the model, call the summary method. For example:

model.summary()
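The README does not describe what summary reports. Size statistics are a common way to summarize a decision tree, and the sketch below computes the depth, the number of internal (decision) nodes, and the number of leaves. It assumes the same hypothetical nested-dict tree format as the sketches above; tree_stats is not part of this library.

```python
def tree_stats(node, depth=0):
    """Return (max_depth, internal_nodes, leaves) for a nested-dict tree."""
    if not isinstance(node, dict):
        return depth, 0, 1  # a leaf is just a class label
    max_depth, internal, leaves = depth, 1, 0
    for child in node["branches"].values():
        child_depth, child_internal, child_leaves = tree_stats(child, depth + 1)
        max_depth = max(max_depth, child_depth)
        internal += child_internal
        leaves += child_leaves
    return max_depth, internal, leaves


tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": "no",
        "overcast": "yes",
        "rain": {"attribute": "windy", "branches": {False: "yes", True: "no"}},
    },
}
print(tree_stats(tree))  # (2, 2, 4)
```

Depth here counts edges on the longest root-to-leaf path.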

Save and Load Model

To save and load a model, you can use the pickle module. For example:

import pickle

with open('model.pkl', 'wb') as f:
    pickle.dump(model, f)

with open('model.pkl', 'rb') as f:
    model = pickle.load(f)
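The save/load round trip can be checked end to end with the standard library alone. In this sketch a types.SimpleNamespace holding a hypothetical tree stands in for a fitted C45Classifier, and the file goes into a temporary directory. Two points apply to the real model as well: pickle stores classes by module and name, so the library must be installed when the file is loaded, and pickle files should only be loaded from trusted sources, because unpickling can execute arbitrary code.

```python
import pickle
import tempfile
from pathlib import Path
from types import SimpleNamespace

# Stand-in for a fitted model; any picklable object behaves the same way.
model = SimpleNamespace(
    tree={"attribute": "outlook", "branches": {"sunny": "no", "overcast": "yes"}}
)

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "model.pkl"
    with open(path, "wb") as f:
        pickle.dump(model, f)
    # Load inside a with-block too, so the file handle is closed promptly.
    with open(path, "rb") as f:
        restored = pickle.load(f)

print(restored.tree == model.tree)  # True
```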

Draw Tree

To draw the tree, use the graphviz library. Both the graphviz Python package and the Graphviz software it uses for rendering must be installed on your computer. For example:

import graphviz
model.generate_tree_diagram(graphviz, "File Name")

Example output: [image: Example Tree]
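The README does not show how generate_tree_diagram builds its diagram. To illustrate the idea, the sketch below converts a hypothetical nested-dict tree into Graphviz DOT source using only the standard library: attributes become boxes, class labels become ellipses, and each edge is labeled with an attribute value. The to_dot function and the tree format are assumptions, not this library's API.

```python
import itertools


def _quote(text):
    """Wrap a value in double quotes, escaping embedded quotes for DOT."""
    return '"' + str(text).replace('"', '\\"') + '"'


def to_dot(tree):
    """Return Graphviz DOT source for a nested-dict decision tree."""
    lines = ["digraph Tree {"]
    ids = itertools.count()

    def add(node):
        node_id = f"n{next(ids)}"
        if isinstance(node, dict):
            lines.append(f"  {node_id} [label={_quote(node['attribute'])}, shape=box];")
            for value, child in node["branches"].items():
                child_id = add(child)
                lines.append(f"  {node_id} -> {child_id} [label={_quote(value)}];")
        else:
            lines.append(f"  {node_id} [label={_quote(node)}, shape=ellipse];")
        return node_id

    add(tree)
    lines.append("}")
    return "\n".join(lines)


tree = {"attribute": "outlook", "branches": {"sunny": "no", "overcast": "yes"}}
print(to_dot(tree))
```

With the graphviz Python package installed, text like this could be rendered with graphviz.Source(dot_text).render("tree", format="png"), which calls the Graphviz dot executable.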

Write Rules

To write out the decision rules of the tree, call the write_rules method. For example:

model.write_rules()
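A decision tree can be read as a set of rules, one per root-to-leaf path, where the conditions along the path form the IF part and the leaf's class forms the THEN part. The README does not specify the format write_rules produces, so the sketch below is only an illustration on a hypothetical nested-dict tree; extract_rules is not part of this library.

```python
def extract_rules(node, conditions=()):
    """Yield one 'IF ... THEN ...' rule per root-to-leaf path."""
    if not isinstance(node, dict):
        premise = " AND ".join(f"{attr} = {value}" for attr, value in conditions)
        yield f"IF {premise} THEN class = {node}" if premise else f"class = {node}"
        return
    for value, child in node["branches"].items():
        yield from extract_rules(child, conditions + ((node["attribute"], value),))


tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": "no",
        "overcast": "yes",
        "rain": {"attribute": "windy", "branches": {False: "yes", True: "no"}},
    },
}

for rule in extract_rules(tree):
    print(rule)
# IF outlook = sunny THEN class = no
# IF outlook = overcast THEN class = yes
# IF outlook = rain AND windy = False THEN class = yes
# IF outlook = rain AND windy = True THEN class = no
```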

Download files


Source Distribution

c45-decision-tree-1.0.2.tar.gz (5.7 kB)

Uploaded Source

Built Distribution

c45_decision_tree-1.0.2-py3-none-any.whl (5.8 kB)

Uploaded Python 3

File details

Details for the file c45-decision-tree-1.0.2.tar.gz.

File metadata

  • Download URL: c45-decision-tree-1.0.2.tar.gz
  • Upload date:
  • Size: 5.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.0

File hashes

Hashes for c45-decision-tree-1.0.2.tar.gz
Algorithm    Hash digest
SHA256       13ce758e81318008be864892c360a73a7c9c8c999953d71c1c7ab45ed336df7e
MD5          b2fab4ca3c8a27532e97a9f64d98b835
BLAKE2b-256  c62c381798654c98ec6d3a729c9c3e37d0a4ffeecec35e3d74b009bfc230c839


File details

Details for the file c45_decision_tree-1.0.2-py3-none-any.whl.

File metadata

File hashes

Hashes for c45_decision_tree-1.0.2-py3-none-any.whl
Algorithm    Hash digest
SHA256       52746e8b8db41261029cd61f116a056eaf3d0abf3ced2f40fe2652ecc7952919
MD5          8fd826b11340de595ebb433f0b219280
BLAKE2b-256  cc7d61c03e1d1d11729e031f822757e0ea56980bf305059bf038f8fc66dd25a5

