A Python module for decision-tree based classification of multidimensional data

Project Description

Consult the module API page at

https://engineering.purdue.edu/kak/distDT/DecisionTree-3.4.3.html

for all information related to this module, including the latest changes to the code. That page lists all of the module functionality you can invoke in your own code. It also describes in detail how to use the boosting and bagging capabilities of the module, as well as the RandomizedTreesForBigData class introduced in Version 3.3.0. Recent changes to the module allow you to tackle needle-in-a-haystack and big-data classification problems. The needle-in-a-haystack metaphor applies when your training data is overwhelmingly dominated by just one class.

As for the basic purpose of the module: assuming you have placed your training data in a CSV file, you supply the name of the file to the module, and it constructs a decision tree that you can then use to classify a new data sample. A decision tree classifier consists of feature tests arranged in the form of a tree. The feature test associated with the root node is the one that can be expected to maximally disambiguate the possible class labels for a new data record. From the root node hangs a child node for each possible outcome of the feature test at the root. This rule of maximal class-label disambiguation is applied recursively at the child nodes until you reach the leaf nodes. A leaf node corresponds either to the maximum depth desired for the decision tree or to the case where a further feature test at the node would yield no gain.
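
The recursion described above can be illustrated with a minimal, self-contained sketch. This is not the module's implementation: it handles symbolic features only, it uses average class entropy as one common way to measure "disambiguation", and every name in it (entropy, best_feature, build_tree, classify, and the toy records) is hypothetical. The max_depth and entropy_threshold arguments merely echo the constructor parameters shown in the usage example; how the module applies them internally is an assumption here.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def best_feature(records, labels, features):
    """Pick the feature whose test leaves the lowest average class entropy."""
    def entropy_after_split(feature):
        groups = {}
        for record, label in zip(records, labels):
            groups.setdefault(record[feature], []).append(label)
        return sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return min(features, key=entropy_after_split)

def build_tree(records, labels, features, depth=0,
               max_depth=8, entropy_threshold=0.01):
    """Return a nested dict of feature tests; a leaf is a majority class label."""
    # Stop at the desired depth, when no features remain, or when the
    # labels at this node are already (nearly) unambiguous.
    if depth >= max_depth or not features or entropy(labels) < entropy_threshold:
        return Counter(labels).most_common(1)[0][0]
    feature = best_feature(records, labels, features)
    remaining = [f for f in features if f != feature]
    children = {}
    # One child node per observed outcome of the chosen feature test.
    for value in set(r[feature] for r in records):
        pairs = [(r, l) for r, l in zip(records, labels) if r[feature] == value]
        children[value] = build_tree([r for r, _ in pairs],
                                     [l for _, l in pairs],
                                     remaining, depth + 1,
                                     max_depth, entropy_threshold)
    return {"feature": feature, "children": children}

def classify(tree, record):
    """Walk from the root to a leaf by following the record's feature values."""
    while isinstance(tree, dict):
        tree = tree["children"][record[tree["feature"]]]
    return tree

# Made-up toy data: "color" determines the class, "shape" does not.
records = [
    {"color": "red",  "shape": "round"},
    {"color": "red",  "shape": "square"},
    {"color": "blue", "shape": "round"},
    {"color": "blue", "shape": "square"},
]
labels = ["A", "A", "B", "B"]

tree = build_tree(records, labels, ["color", "shape"])
print(tree["feature"])
print(classify(tree, {"color": "red", "shape": "square"}))
```

In this sketch a feature value never seen during training raises a KeyError in classify; a real classifier needs a policy for that case, and numeric features such as those in the usage example would need threshold-based tests rather than one child per value.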

Typical usage syntax:

import DecisionTree

# CSV file holding the training data
training_datafile = "stage3cancer.csv"

dt = DecisionTree.DecisionTree(
    training_datafile = training_datafile,
    csv_class_column_index = 2,
    csv_columns_for_features = [3,4,5,6,7,8],
    entropy_threshold = 0.01,
    max_depth_desired = 8,
    symbolic_to_numeric_cardinality_threshold = 10,
)

# Load the data and compute the probabilities needed for tree construction
dt.get_training_data()
dt.calculate_first_order_probabilities()
dt.calculate_class_priors()
dt.show_training_data()

# Build and display the decision tree
root_node = dt.construct_decision_tree_classifier()
root_node.display_decision_tree("   ")

# Classify a new sample, given as a list of 'feature = value' strings
test_sample  = ['g2 = 4.2',
                'grade = 2.3',
                'gleason = 4',
                'eet = 1.7',
                'age = 55.0',
                'ploidy = diploid']
classification = dt.classify(root_node, test_sample)
print("Classification: ", classification)

Release History

3.4.3 (this version)
3.4.2
3.4.1
3.4.0
3.3.2
3.3.1
3.3.0
3.2.4
3.2.3
3.2.2
3.2.1
3.2.0
3.0.1
3.0
2.3.4
2.3.3
2.3.2
2.3.1
2.3
2.2.6
2.2.5
2.2.4
2.2.3
2.2.2
2.2.1
2.2
2.1
2.0
1.7.1
1.7
1.6.1
1.6
1.5
1.0

Download Files

Download the file for your platform.

DecisionTree-3.4.3.tar.gz (335.3 kB), source distribution, uploaded May 14, 2016
