
A Python module for decision-tree based classification of multidimensional data

Project description

Version 3.2.1 fixes a bug in one of the functions that calculate probabilities.

Version 3.2.0 adds boosting capability to the decision tree module.

Version 3.0 adds bagging capability to the decision tree module. If your training dataset is large enough, you can now construct multiple decision trees and base the final classification on a majority vote over all the trees. This can average out the noise in the classification process.
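The majority-vote step can be sketched in a few lines of Python. This is an illustration, not the module's bagging code: each trained tree is represented by a hypothetical stand-in function that maps a sample to a class label, and `majority_vote` is a hypothetical helper name. The way the module divides the training data among the trees is not described here, so the sketch leaves that step out.

```python
from collections import Counter


def majority_vote(trees, sample):
    """Return the label predicted by the largest number of trees.

    Ties go to the label that was voted for first, because
    Counter.most_common() keeps first-encountered order for equal counts.
    """
    votes = [tree(sample) for tree in trees]
    label, _count = Counter(votes).most_common(1)[0]
    return label


# Three toy "trees" (hypothetical stand-ins) that disagree on one sample.
trees = [
    lambda s: "A" if s["g2"] > 4.0 else "B",
    lambda s: "A" if s["age"] > 60.0 else "B",
    lambda s: "B",
]
print(majority_vote(trees, {"g2": 4.2, "age": 55.0}))  # B
```

With an odd number of trees and two classes a tie cannot occur; with more classes or an even number of trees, the tie-breaking rule in the docstring decides.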

Version 2.3 adds the ability to introspect about the classification decisions made at the nodes of the decision tree.

As for the purpose of the module: assuming your training data is in a CSV file, you only need to supply the name of that file to the module, and it does the rest of the work needed to classify a new data sample.

A decision tree classifier consists of feature tests arranged in the form of a tree. The feature test at the root node is the one expected to best disambiguate the possible class labels for a new data record, that is, to most reduce the uncertainty (entropy) about the class label. From the root node hangs one child node for each possible outcome of the root's feature test. The same disambiguation rule is applied recursively at the child nodes until the leaf nodes are reached. A node becomes a leaf either when the tree has reached the maximum desired depth or when no further feature test at that node would yield any gain.
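The recursive procedure described above can be sketched for purely symbolic features. This is a simplified illustration, not the module's implementation: it handles no numeric features, scores each candidate feature test by the average class entropy it leaves behind (weighted by branch size), and turns a node into a leaf at a maximum depth or when the entropy reduction of the best test is at most `min_gain`. The names `build_tree`, `classify` and `min_gain` and the toy data are hypothetical; `max_depth` only loosely mirrors `max_depth_desired`, and how the module applies its own `entropy_threshold` is not stated here.

```python
import math
from collections import Counter
from typing import Dict, List, Union

Record = Dict[str, str]
Tree = Union[str, dict]  # a leaf is a class label; an internal node is a dict


def entropy(labels: List[str]) -> float:
    """Shannon entropy (in bits) of the class-label distribution."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def split_entropy(rows: List[Record], labels: List[str], feature: str) -> float:
    """Average class entropy left after testing `feature`, weighted by branch size."""
    branches: Dict[str, List[str]] = {}
    for row, label in zip(rows, labels):
        branches.setdefault(row[feature], []).append(label)
    n = len(labels)
    return sum(len(b) / n * entropy(b) for b in branches.values())


def build_tree(rows: List[Record], labels: List[str], features: List[str],
               max_depth: int, min_gain: float = 0.01, depth: int = 0) -> Tree:
    majority = Counter(labels).most_common(1)[0][0]
    # Leaf: the depth limit is reached or no feature tests remain.
    if depth >= max_depth or not features:
        return majority
    # Pick the test that leaves the least class entropy behind.
    best = min(features, key=lambda f: split_entropy(rows, labels, f))
    # Leaf: even the best test gains too little (this also covers pure nodes).
    if entropy(labels) - split_entropy(rows, labels, best) <= min_gain:
        return majority
    node = {"feature": best, "children": {}, "default": majority}
    remaining = [f for f in features if f != best]
    # One child per observed outcome of the chosen test, built recursively.
    for value in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        node["children"][value] = build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx],
            remaining, max_depth, min_gain, depth + 1)
    return node


def classify(tree: Tree, sample: Record) -> str:
    """Follow the feature tests from the root down to a leaf.

    An outcome never seen in training falls back to that node's majority label.
    """
    while isinstance(tree, dict):
        tree = tree["children"].get(sample[tree["feature"]], tree["default"])
    return tree


rows = [{"outlook": "sunny", "windy": "no"}, {"outlook": "sunny", "windy": "yes"},
        {"outlook": "rain", "windy": "no"}, {"outlook": "rain", "windy": "yes"}]
labels = ["play", "stay", "play", "stay"]
tree = build_tree(rows, labels, ["outlook", "windy"], max_depth=3)
print(tree["feature"])                                      # windy
print(classify(tree, {"outlook": "rain", "windy": "yes"}))  # stay
```

In the toy data, testing `windy` separates the two classes perfectly while `outlook` tells nothing, so `windy` ends up at the root.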

Typical usage syntax:

import DecisionTree

training_datafile = "stage3cancer.csv"

# Column 2 of the CSV holds the class label; columns 3 through 8 hold the features.
dt = DecisionTree.DecisionTree(
        training_datafile = training_datafile,
        csv_class_column_index = 2,
        csv_columns_for_features = [3,4,5,6,7,8],
        entropy_threshold = 0.01,
        max_depth_desired = 8,
        symbolic_to_numeric_cardinality_threshold = 10,
     )

# Read the training data and estimate the probabilities the tree builder needs.
dt.get_training_data()
dt.calculate_first_order_probabilities()
dt.calculate_class_priors()
dt.show_training_data()

# Build the decision tree and display it.
root_node = dt.construct_decision_tree_classifier()
root_node.display_decision_tree("   ")

# A new record is given as a list of 'feature = value' strings.
test_sample = ['g2 = 4.2',
               'grade = 2.3',
               'gleason = 4',
               'eet = 1.7',
               'age = 55.0',
               'ploidy = diploid']
classification = dt.classify(root_node, test_sample)
print("Classification: " + str(classification))
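The test sample above is a list of strings of the form 'feature = value'. When a new record starts out as a Python dictionary, a pair of small helpers can convert between the two forms. Both helpers, `make_test_sample` and `parse_test_sample`, are hypothetical and not part of the module; values are kept as plain strings, and how the module itself interprets them is not covered here.

```python
def make_test_sample(features):
    """Turn a {feature_name: value} mapping into 'name = value' strings."""
    return ["%s = %s" % (name, value) for name, value in features.items()]


def parse_test_sample(sample):
    """Turn 'name = value' strings back into a {name: value_string} dict."""
    parsed = {}
    for entry in sample:
        # Split on the first '=' only, then trim the surrounding spaces.
        name, value = entry.split("=", 1)
        parsed[name.strip()] = value.strip()
    return parsed


record = {"g2": 4.2, "grade": 2.3, "gleason": 4,
          "eet": 1.7, "age": 55.0, "ploidy": "diploid"}
test_sample = make_test_sample(record)
print(test_sample)
print(parse_test_sample(test_sample)["ploidy"])  # diploid
```

On Python 3.7 or later, dictionaries keep insertion order, so `make_test_sample(record)` reproduces the feature order of the usage example.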



Download files


Source Distribution

DecisionTree-3.2.1.tar.gz (283.5 kB)

File hashes

Hashes for DecisionTree-3.2.1.tar.gz:

Algorithm      Hash digest
SHA256         07bb088c88f41ab2233061e7e3817eb5c133f2c460b5f692977d3600ea6db909
MD5            ac07a01d420716d42c7396ff56b050d9
BLAKE2b-256    fec94e8252dbd1cb7821ad4b57c76d1403ae7ddb7bd054c4e559aa43f35fa8a3
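To check a downloaded archive against the published SHA256 digest, Python's standard `hashlib` module is enough. The sketch below assumes the archive was saved under its original name in the current directory; `sha256_of_file` and `verify_download` are hypothetical helper names.

```python
import hashlib
import os

EXPECTED_SHA256 = "07bb088c88f41ab2233061e7e3817eb5c133f2c460b5f692977d3600ea6db909"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 digest matches the expected one."""
    return sha256_of_file(path) == expected.lower()


archive = "DecisionTree-3.2.1.tar.gz"
if os.path.exists(archive):
    print("SHA256 OK" if verify_download(archive) else "SHA256 MISMATCH")
```

pip can also enforce this at install time through its hash-checking mode: a requirements line such as `DecisionTree==3.2.1 --hash=sha256:` followed by the SHA256 digest listed above, installed with `--require-hashes`.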

