A library for authenticating AI course submissions at IUST
Project description
Multi-Node Categorical Decision Tree Classifier
Introduction
This project implements a multi-node categorical decision tree classifier that is compatible with scikit-learn. Unlike binary decision trees, this classifier is designed to work with categorical features and can have multiple branches at each node. The implementation is based on the MultiNodeCategoricalDecisionTree class, which inherits from scikit-learn's BaseEstimator and ClassifierMixin.
The purpose of this assignment is to give students hands-on experience in implementing a decision tree algorithm from scratch while maintaining compatibility with a popular machine learning library.
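Once the class is complete, it should behave like any other scikit-learn estimator. Below is a minimal usage sketch, assuming the class can be imported from the installed package; the import path and constructor arguments are assumptions and may differ in your assignment skeleton.

```python
import numpy as np
from sklearn.model_selection import cross_val_score

from iust_ai import MultiNodeCategoricalDecisionTree  # import path assumed

# Toy categorical dataset: each feature takes string category values.
X = np.array([["sunny", "hot"], ["rainy", "mild"],
              ["sunny", "mild"], ["rainy", "hot"]])
y = np.array([0, 1, 0, 1])

clf = MultiNodeCategoricalDecisionTree(max_depth=3)  # constructor args assumed
clf.fit(X, y)
print(clf.predict(X))

# Because the class inherits from BaseEstimator and ClassifierMixin,
# scikit-learn utilities such as cross-validation work out of the box
# once fit/predict are implemented.
print(cross_val_score(clf, X, y, cv=2))
```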
Setting Up the Environment
To set up the project environment and install the necessary libraries, follow these steps:
1. Ensure you have Python 3.7 or higher installed on your system.
2. Open a terminal and navigate to the project directory.
3. Create a virtual environment:
   python -m venv venv
4. Activate the virtual environment:
   - On Windows: venv\Scripts\activate
   - On macOS and Linux: source venv/bin/activate
5. Install the required libraries:
   pip install numpy pandas scikit-learn jupyter notebook nbconvert
6. To run the example script as a Jupyter notebook, first convert it to notebook format (nbconvert itself only reads .ipynb files, so one option is jupytext: pip install jupytext), then execute it:
   jupytext --to notebook decision_tree_example.py
   jupyter nbconvert --to notebook --execute decision_tree_example.ipynb
Now you're ready to start working on the project!
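A quick way to confirm the environment is ready is to import the core libraries:

```python
# Sanity check: these imports should succeed inside the activated venv.
import numpy, pandas, sklearn
print(numpy.__version__, pandas.__version__, sklearn.__version__)
```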
How to Complete the Class
To complete the MultiNodeCategoricalDecisionTree class, you need to implement several key methods. Here's a guide on how to approach each method:
1. _build_tree(self, X: np.ndarray, y: np.ndarray, depth: int = 0) -> Dict[str, Any]:
   - This is the core method that recursively builds the decision tree.
   - Implement the following logic:
     a. Check if the maximum depth has been reached or if the number of samples is less than min_samples_split.
     b. If either condition is true, create a leaf node with the majority class.
     c. If not, find the best split using the _best_split method.
     d. Create a decision node with the best feature and split point.
     e. Split the data and recursively build subtrees for each split.
   - Return a dictionary representing the node structure.
2. _best_split(self, X: np.ndarray, y: np.ndarray) -> Dict[str, Any]:
   - Implement the logic to find the best feature and split point for a given node.
   - For each feature:
     a. Find the unique values in the feature.
     b. For each unique value, calculate the information gain or Gini impurity.
     c. Keep track of the split that results in the highest information gain.
   - Return a dictionary containing the best feature, split point, and related information (a minimal sketch of this logic appears after this list).
3. _calculate_feature_importances(self) -> np.ndarray:
   - Traverse the tree and calculate feature importances based on the reduction in impurity at each split.
   - Normalize the importances so they sum to 1.
   - Return an array of feature importances.
4. _predict_single(self, x: np.ndarray) -> Any:
   - Implement the logic to traverse the tree for a single sample and return the predicted class.
   - Start at the root node and follow the appropriate branch based on the feature values until reaching a leaf node.
   - Return the majority class of the leaf node.
5. _predict_proba_single(self, x: np.ndarray) -> np.ndarray:
   - Similar to _predict_single, but instead of returning the majority class, return the class probabilities.
   - The probabilities should be based on the distribution of classes in the leaf node.
   - Return an array of probabilities for each class.
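The methods above share a few building blocks: an impurity measure, a gain-based split search, and a tree traversal. The sketch below illustrates that shared logic as free functions; the helper names and the node-dictionary keys (is_leaf, children, majority_class) are illustrative assumptions, not part of the provided skeleton.

```python
import numpy as np
from typing import Any, Dict, List


def gini_impurity(y: np.ndarray) -> float:
    """Gini impurity 1 - sum(p_k^2) over the class proportions p_k."""
    if y.size == 0:
        return 0.0
    _, counts = np.unique(y, return_counts=True)
    p = counts / y.size
    return 1.0 - float(np.sum(p ** 2))


def information_gain(y: np.ndarray, groups: List[np.ndarray]) -> float:
    """Impurity reduction achieved by partitioning y into the given groups."""
    weighted = sum(g.size / y.size * gini_impurity(g) for g in groups)
    return gini_impurity(y) - weighted


def best_split(X: np.ndarray, y: np.ndarray) -> Dict[str, Any]:
    """Pick the feature whose multi-way categorical split maximizes gain."""
    best = {"feature": None, "gain": -np.inf, "values": None}
    for feature in range(X.shape[1]):
        values = np.unique(X[:, feature])
        if values.size < 2:  # a single category cannot split the node
            continue
        groups = [y[X[:, feature] == v] for v in values]
        gain = information_gain(y, groups)
        if gain > best["gain"]:
            best = {"feature": feature, "gain": gain, "values": values}
    return best


def predict_single(node: Dict[str, Any], x: np.ndarray) -> Any:
    """Walk from the root, following the branch that matches x's category."""
    while not node["is_leaf"]:
        child = node["children"].get(x[node["feature"]])
        if child is None:  # category unseen in training: back off to majority
            return node["majority_class"]
        node = child
    return node["majority_class"]
```

Inside _build_tree, a decision node would then store the chosen feature plus one child per category value; _calculate_feature_importances can accumulate each node's sample-weighted gain per feature and divide by the total so the importances sum to 1; and _predict_proba_single follows the same traversal but returns the leaf's class distribution instead of its majority class.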
Additional Tips:
- Use numpy operations for efficiency whenever possible.
- Make sure to handle edge cases, such as empty nodes or features with only one unique value.
- Consider adding helper methods for calculating impurity (e.g., Gini impurity or entropy) and for splitting the data.
- Test your implementation thoroughly with different datasets and compare results with scikit-learn's DecisionTreeClassifier (a comparison sketch follows below).
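One way to set up such a comparison is sketched below. It ordinal-encodes the categorical inputs for scikit-learn's tree, since DecisionTreeClassifier requires numeric features; expect similar but not identical scores, because binary and multi-way splits partition the data differently.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

# Small synthetic categorical dataset (repeated so cross-validation is meaningful).
X_cat = np.array([["sunny", "hot"], ["rainy", "mild"],
                  ["overcast", "mild"], ["rainy", "hot"]] * 10)
y = np.array([0, 1, 1, 0] * 10)

# Baseline: scikit-learn's binary tree on ordinal-encoded features.
X_enc = OrdinalEncoder().fit_transform(X_cat)
baseline = cross_val_score(DecisionTreeClassifier(max_depth=3), X_enc, y, cv=5)
print("sklearn baseline accuracy:", baseline.mean())

# Your classifier can consume the raw categories directly, e.g.:
# scores = cross_val_score(MultiNodeCategoricalDecisionTree(max_depth=3), X_cat, y, cv=5)
```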
By completing these methods, you will have a fully functional multi-node categorical decision tree classifier that can be used with scikit-learn's cross-validation and evaluation tools.
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
iust_ai-0.1.0.tar.gz (4.3 kB)
Built Distribution
iust_ai-0.1.0-py3-none-any.whl (5.5 kB)
File details
Details for the file iust_ai-0.1.0.tar.gz.
File metadata
- Download URL: iust_ai-0.1.0.tar.gz
- Upload date:
- Size: 4.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.3 CPython/3.12.3 Linux/6.8.0-45-generic
File hashes
Algorithm | Hash digest
---|---
SHA256 | 19210cb9aa0cda88f9d43dba590a09e506f474cc41de9c93bea3381368d7642b
MD5 | 68812378998ecf04c9e36b0f28462e69
BLAKE2b-256 | 3e2a376dd9c5925f063c33fdf553421d994f97246b2ce8d760f18d0eb9b79d84
File details
Details for the file iust_ai-0.1.0-py3-none-any.whl.
File metadata
- Download URL: iust_ai-0.1.0-py3-none-any.whl
- Upload date:
- Size: 5.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.3 CPython/3.12.3 Linux/6.8.0-45-generic
File hashes
Algorithm | Hash digest
---|---
SHA256 | b8ffd041e07b36d5336d0013c8f6e32a3fd070118162123e80e7a429832ddaa6
MD5 | 82a99ad80a37eea4d48e86e542b6327e
BLAKE2b-256 | 8075f3759fb4edaca118eaf64fd241a454c8bcc5644b140061802dd5f30703ea