decisiontree-lite

A custom implementation of a decision tree classifier.

These details have not been verified by PyPI

Project links

Homepage

Project description

Decision Tree Implementation from Scratch

Overview

This package provides a Python implementation of a decision tree algorithm from scratch, along with training and evaluation utilities. This lightweight implementation has performance comparable to scikit-learn's implementation.

Files

DecisionTree.py: Contains the core classes and functions for building and using the decision tree model.
utils.py: Provides helper functions for loading datasets, splitting data, calculating evaluation metrics, and comparing custom and scikit-learn implementations.
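
The comparison utilities are described as reporting time taken and memory usage. The sketch below shows one way such measurements can be made with the standard library only. It is not the code in utils.py; the helper name measure is hypothetical, and tracemalloc only accounts for allocations that go through Python's memory tracking.

```python
import time
import tracemalloc


def measure(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed_seconds, peak_bytes).

    Hypothetical helper for illustration; not part of decisiontree-lite.
    """
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = fn(*args, **kwargs)
        elapsed = time.perf_counter() - start
        # get_traced_memory() returns (current, peak) in bytes
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, elapsed, peak


# Stand-in workload: sorting a reversed range.
result, elapsed, peak = measure(sorted, range(100_000, 0, -1))
print(f"elapsed: {elapsed:.4f} s, peak memory: {peak / 1024:.1f} KiB")
```

To compare two classifiers, the same helper could wrap each model's fit and predict calls, for example measure(clf.fit, X_train, y_train).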

Key Features

Custom implementation of a decision tree algorithm.
Supports both Gini impurity and entropy criteria for splitting.
Options to control maximum depth and minimum samples for splitting.
Functions for loading and processing various inbuilt datasets.
Comparison with scikit-learn's DecisionTreeClassifier in terms of accuracy, precision, recall, F1-score, time taken, and memory usage.
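
For readers unfamiliar with the two splitting criteria, here is a minimal sketch of how Gini impurity and entropy are commonly computed for a list of class labels, and how a candidate split can be scored by the weighted impurity of its two sides. The function names are hypothetical and the package's internal implementation may differ.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 - sum(p_k ** 2) over the class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits: sum(-p_k * log2(p_k)) over the class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return sum(-(count / n) * math.log2(count / n) for count in Counter(labels).values())


def split_impurity(left, right, impurity=gini):
    """Size-weighted impurity of a candidate split; lower means a purer split.

    Assumes at least one of the two sides is non-empty.
    """
    n = len(left) + len(right)
    return (len(left) * impurity(left) + len(right) * impurity(right)) / n


print(gini([0, 0, 1, 1]))                  # 0.5
print(entropy([0, 0, 1, 1]))               # 1.0
print(split_impurity([0, 0], [1, 1]))      # 0.0 (both sides are pure)
```

A tree builder would typically evaluate split_impurity for each candidate threshold and keep the split with the lowest value.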

Usage

Make sure the labels are numeric (for example, by applying a label encoder).
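
If the labels are strings, they can be mapped to integer codes before training. The following is a small standard-library sketch of that step; encode_labels is a hypothetical helper, not a function provided by this package.

```python
def encode_labels(labels):
    """Map each distinct label to an integer code, assigned in sorted order.

    Returns (codes, classes), where classes[code] is the original label.
    """
    classes = sorted(set(labels))
    index = {label: code for code, label in enumerate(classes)}
    return [index[label] for label in labels], classes


y_raw = ["setosa", "virginica", "setosa", "versicolor"]
y_encoded, classes = encode_labels(y_raw)
print(y_encoded)  # [0, 2, 0, 1]
print(classes)    # ['setosa', 'versicolor', 'virginica']
```

scikit-learn's LabelEncoder (fit_transform) produces an equivalent mapping if that library is already installed.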

from decisiontree_lite.decisiontree import DecisionTree
from decisiontree_lite.utils import load_data, train_test, model_metrics

# Use one of the built-in datasets ('iris', 'wine', 'rice', 'raisin') or an external dataset of your choice
X, y = load_data('iris')

# Perform the train/test split
test_split_size = 0.3
X_train, X_test, y_train, y_test = train_test(X, y, test_split_size)

# Fit the tree with its default hyperparameters and predict on the test set
clf = DecisionTree()
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Compute and print the evaluation metrics
accuracy, precision, recall, f1score = model_metrics(y_test, y_pred)
print("Accuracy: {},\nPrecision: {},\nRecall: {},\nF1-score: {}".format(accuracy, precision, recall, f1score))
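
To make the four reported numbers concrete, here is a minimal sketch of how accuracy, precision, recall, and F1-score can be computed for a multi-class problem. It assumes macro averaging (an unweighted mean over classes); the source does not state which averaging model_metrics uses, so the function name macro_metrics and that choice are assumptions.

```python
def macro_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall, f1), macro-averaged over classes.

    Illustrative only; the averaging used by the package is not documented here.
    """
    pairs = list(zip(y_true, y_pred))
    classes = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)  # true positives
        fp = sum(t != c and p == c for t, p in pairs)  # false positives
        fn = sum(t == c and p != c for t, p in pairs)  # false negatives
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(classes)
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


acc, prec, rec, f1 = macro_metrics([0, 0, 1, 1], [0, 1, 1, 1])
print(acc, rec)  # 0.75 0.75
```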

Customization

Adjust the max_depth, min_samples_split, and criteria hyperparameters of DecisionTree to tune the model.

max_depth: The maximum depth of the tree. Example values: [1, 2, 4, 8, 16, 32, 64, 100].
min_samples_split: The minimum number of samples required to split an internal node. Example values: [2, 50, 100].
criteria: The impurity measure used to choose splits. Allowed values: ['gini', 'entropy'].

clf = DecisionTree(max_depth=8, min_samples_split=2, criteria='gini')
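
The values listed for each hyperparameter can be combined into a simple search grid. The sketch below only enumerates the combinations using the standard library; parameter_grid is a hypothetical helper, and fitting and scoring each combination is left to the caller.

```python
from itertools import product

# Candidate values copied from the hyperparameter descriptions above.
MAX_DEPTHS = [1, 2, 4, 8, 16, 32, 64, 100]
MIN_SAMPLES_SPLITS = [2, 50, 100]
CRITERIA = ["gini", "entropy"]


def parameter_grid():
    """Yield one keyword-argument dict per hyperparameter combination."""
    for depth, min_split, criterion in product(MAX_DEPTHS, MIN_SAMPLES_SPLITS, CRITERIA):
        yield {
            "max_depth": depth,
            "min_samples_split": min_split,
            "criteria": criterion,
        }


grid = list(parameter_grid())
print(len(grid))  # 48 combinations (8 * 3 * 2)
print(grid[0])    # {'max_depth': 1, 'min_samples_split': 2, 'criteria': 'gini'}
```

Each dictionary can be passed as DecisionTree(**params), with the resulting model scored on a held-out split to pick the best combination.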

Additional Notes

The code is structured for clarity and modularity.
Comments are included to explain key concepts and steps.
Feel free to contribute to this project or use it for your own learning and experimentation!

Project details

Release history

0.0.2: Feb 15, 2024
0.0.1: Feb 15, 2024 (this version)
0.0.0: Feb 15, 2024

Download files

Source Distribution

decisiontree_lite-0.0.1.tar.gz (5.7 kB)

Uploaded Feb 15, 2024 Source

Built Distribution

decisiontree_lite-0.0.1-py3-none-any.whl (6.5 kB)

Uploaded Feb 15, 2024 Python 3

Hashes for decisiontree_lite-0.0.1.tar.gz
Algorithm	Hash digest
SHA256	`0fb77e5e91dc1591ffb5faeb6a8bd3d2b127c7adf94c130cf1ffb4aec0ec0ae2`
MD5	`24ce2ce580c86e062f0dd27d74e42edd`
BLAKE2b-256	`9b9f968829b50d9253727c857d6a160b636b18f3ee9d5b2d2402b869d852af2f`

Hashes for decisiontree_lite-0.0.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`092d731d6610bdb8d7b7f0da19fbd9e01f628e1f8cf64bd48ae082ac6082574e`
MD5	`1c980ad8c60f5b38b1d790d1b85997d7`
BLAKE2b-256	`fffbbe8e6473ff9ee7236bbe5f1270eedc83824815f745016e46849d1666d140`