
A Decision Tree Classifier.

Project description

First iteration of a decision tree classifier that can handle string variables.

Overview

  • This is an implementation of a Decision Tree Classifier for both numerical and categorical features. It comprises several classes:

    • DecisionNodeNumerical: Represents a numerical decision node, which holds a feature name, threshold, left and right children, info gain, and null direction.
    • DecisionNodeCategorical: Represents a categorical decision node, which holds a feature name, categories, children, info gain, and null category.
    • LeafNode: Represents a leaf node in the decision tree, which holds the final class value, the number of samples, entropy, and Gini impurity.
    • DecisionTreeClassifier: Main class that implements the decision tree classifier. It has methods to fit the data, predict the class of unseen samples, calculate information gain, and split the data based on the best feature and threshold.
  • The fit method builds the decision tree by recursively finding the best split for each node and splitting the data accordingly. The get_best_split method chooses that split by finding the feature and threshold that maximize the information gain. The predict method traverses the decision tree for each input sample and returns the class label of the leaf node it reaches. The class also provides methods for calculating information gain, entropy, and Gini impurity.

  • The tree can be built with a specified maximum depth and a minimum number of samples per leaf. The classifier also handles missing values in the input data by sending them in a specified null direction (numerical nodes) or to a specified null category (categorical nodes).
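
The impurity measures and the split search described above can be illustrated with a minimal, standard-library-only sketch. The function names here (entropy, gini, information_gain, best_numeric_threshold) are hypothetical and are not taken from the package; trying midpoints between sorted distinct values as candidate thresholds is also an assumption, and the package's own get_best_split may work differently.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gini(labels):
    """Gini impurity of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def information_gain(parent, children, impurity=entropy):
    """Parent impurity minus the size-weighted impurity of the child groups."""
    total = len(parent)
    weighted = sum(len(c) / total * impurity(c) for c in children if c)
    return impurity(parent) - weighted


def best_numeric_threshold(values, labels, min_sample_leaf=1):
    """Return (threshold, gain) for the best split of one numerical feature.

    Candidate thresholds are midpoints between sorted distinct values
    (an assumption of this sketch). Returns (None, 0.0) when no candidate
    leaves at least min_sample_leaf samples on both sides.
    """
    distinct = sorted(set(values))
    best_threshold, best_gain = None, 0.0
    for low, high in zip(distinct, distinct[1:]):
        threshold = (low + high) / 2
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        if len(left) < min_sample_leaf or len(right) < min_sample_leaf:
            continue
        gain = information_gain(labels, [left, right])
        if gain > best_gain:
            best_threshold, best_gain = threshold, gain
    return best_threshold, best_gain
```

A full tree builder would run a search like this for every feature at every node and stop recursing once the maximum depth is reached or no split satisfies the minimum leaf size.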

Example Usage

  • Load the categorical data set you want to test into a dataframe, then create a Decision Tree Classifier:

    • classifier = DecisionTreeClassifier(max_depth, min_sample_leaf)
  • Fit the tree with training data:

    • classifier.fit(training_dataframe, target_column_name)
  • See a visual of the tree:

    • classifier.show_tree()
  • Predict the target values for the testing dataframe, with the target column removed:

    • classifier.predict(testing_dataframe_without_target)
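
For intuition about what a prediction involves, the traversal through numerical and categorical decision nodes, including the null direction and null category, might look roughly like the sketch below. Everything in it is an assumption made for illustration: the class names Leaf, NumericalNode and CategoricalNode, their fields, the predict_one helper, the use of plain dictionaries for rows, and the use of None to mark a missing value. It does not reproduce the package's own node classes.

```python
from dataclasses import dataclass
from typing import Any, Dict, Union


@dataclass
class Leaf:
    value: Any  # final class label returned by a prediction


@dataclass
class NumericalNode:
    feature: str
    threshold: float
    left: "Node"   # followed when row[feature] <= threshold
    right: "Node"  # followed when row[feature] > threshold
    null_direction: str = "left"  # branch used when the value is missing


@dataclass
class CategoricalNode:
    feature: str
    children: Dict[Any, "Node"]  # one child per category
    null_category: Any = None    # category used when the value is missing


Node = Union[Leaf, NumericalNode, CategoricalNode]


def predict_one(node, row):
    """Walk from the root to a leaf for one sample given as a dict."""
    while not isinstance(node, Leaf):
        value = row.get(node.feature)  # None stands in for a missing value
        if isinstance(node, NumericalNode):
            if value is None:
                go_left = node.null_direction == "left"
            else:
                go_left = value <= node.threshold
            node = node.left if go_left else node.right
        else:
            if value is None:
                value = node.null_category
            node = node.children[value]
    return node.value


# A small hand-built tree: split on age, then on color for older samples.
tree = NumericalNode(
    feature="age",
    threshold=30.0,
    left=Leaf("no"),
    right=CategoricalNode(
        feature="color",
        children={"red": Leaf("yes"), "blue": Leaf("no")},
        null_category="blue",
    ),
    null_direction="right",
)

predictions = [
    predict_one(tree, {"age": 25, "color": "red"}),
    predict_one(tree, {"age": 40, "color": "red"}),
    predict_one(tree, {"age": None, "color": None}),
]
```

In this sketch a category that never appeared among the node's children raises a KeyError; a real implementation would need a policy for that case.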


