
A Python module for constructing a decision tree from multidimensional training data and for using the decision tree for classifying unlabeled data

Project description

Version 1.7.1 fixes a bug triggered by certain comment words in a training data file. This version also includes additional safety checks that are useful for catching errors and inconsistencies in large training data files that do not lend themselves to manual checking for correctness. As an example, the new version makes sure that the number of values you declare in each sample record matches the number of features declared at the beginning of the training data file.
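The kind of consistency check described above can be sketched in a few lines. This is only an illustration, not the module's own code: the function name `check_record_lengths` is hypothetical, and the sketch assumes the feature names and the sample records have already been read into Python lists (the on-disk file format is not shown here).

```python
def check_record_lengths(feature_names, records):
    """Return (record_index, value_count) for each record with the wrong number of values."""
    expected = len(feature_names)
    return [(i, len(rec)) for i, rec in enumerate(records) if len(rec) != expected]

# Hypothetical in-memory data; the feature names are taken from the usage example.
features = ["exercising", "smoking", "fatIntake", "videoAddiction"]
records = [
    ["never", "heavy", "heavy", "heavy"],
    ["regularly", "never", "low"],  # deliberately one value short
]

problems = check_record_lengths(features, records)
for index, found in problems:
    print("record %d has %d values, expected %d" % (index, found, len(features)))
```

Reporting every offending record, instead of stopping at the first one, tends to be more useful for large files that cannot be checked by hand.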

As for the purpose of the module: if your training data is arranged as a table in a text file, you only need to supply the name of that file and the module does the rest. A decision tree classifier consists of feature tests arranged in the form of a tree. The feature test at the root node is the one expected to maximally disambiguate the possible class labels for an unlabeled data record. From the root node hangs a set of child nodes, one for each value of the feature tested at the root. At each child node, the feature test selected is the one that is most class-discriminative given that the root test has already been applied and its value observed. This process continues until you reach the leaf nodes of the tree. A leaf node corresponds either to the maximum depth desired for the decision tree or to the case where no features remain to be tested.
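To make the idea of a "most class-discriminative" feature test concrete, here is a minimal sketch that picks a root feature by minimizing the weighted entropy of the class labels after the split. Entropy is one common criterion and is an assumption here; the description above does not state which measure the module uses. The helper names (`entropy`, `best_feature`), the toy samples, and the class labels are all made up for illustration.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def best_feature(samples, labels, features):
    """Pick the feature whose values leave the least class-label entropy.

    samples: list of dicts mapping feature name to value
    labels:  class label of each sample, in the same order
    """
    def weighted_entropy(feature):
        # Group the class labels by the value this feature takes.
        groups = defaultdict(list)
        for sample, label in zip(samples, labels):
            groups[sample[feature]].append(label)
        # Average the per-group entropy, weighted by group size.
        return sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return min(features, key=weighted_entropy)

# Toy data (hypothetical): 'smoking' separates the classes perfectly,
# 'exercising' tells us nothing about the class.
samples = [
    {"smoking": "heavy", "exercising": "never"},
    {"smoking": "heavy", "exercising": "regularly"},
    {"smoking": "never", "exercising": "never"},
    {"smoking": "never", "exercising": "regularly"},
]
labels = ["sick", "sick", "healthy", "healthy"]

root = best_feature(samples, labels, ["exercising", "smoking"])
print("feature tested at the root:", root)
```

Applying the same selection recursively to the subset of samples under each child node, with the already-used feature removed, yields the tree structure described above.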

Typical usage syntax:

    dt = DecisionTree( training_datafile = "training.dat" )
    dt.get_training_data()
    dt.show_training_data()
    root_node = dt.construct_decision_tree_classifier()
    root_node.display_decision_tree(" ")
    test_sample = ['exercising=>never', 'smoking=>heavy',
                   'fatIntake=>heavy', 'videoAddiction=>heavy']
    classification = dt.classify(root_node, test_sample)
    print "Classification: ", classification
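Each entry of the test sample is a string of the form feature=>value. If you want to inspect or build such samples programmatically, a small helper like the following may be convenient. It is a hypothetical utility and not part of the module's API; it simply splits each string at the first "=>".

```python
def parse_sample(sample):
    """Convert ['feature=>value', ...] strings into a {feature: value} dict."""
    pairs = (item.split("=>", 1) for item in sample)
    return {name.strip(): value.strip() for name, value in pairs}

test_sample = ['exercising=>never', 'smoking=>heavy',
               'fatIntake=>heavy', 'videoAddiction=>heavy']

parsed = parse_sample(test_sample)
print(parsed["smoking"])
```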
