# XDT (Exact Decision Tree) 🚀

A high-performance decision tree classifier with exact split optimization, histogram-based candidate selection, and multi-class support. XDT combines the speed of histogram-based methods with the accuracy of exact split computation.
## 🎯 Key Features
- Exact Split Computation: Guaranteed optimal splits with O(n) bucket optimization for integer-like features
- Histogram-Based Candidate Selection: Fast feature evaluation using quantile-based binning
- Multi-Class Support: Native support for multi-class classification with Gini impurity
- Numba Acceleration: JIT-compiled prediction with optional parallel processing
- Memory Efficient: BFS tree layout for optimal cache performance
- Adaptive Algorithms: Automatically chooses between O(n) bucket sort and O(n log n) sorting based on data characteristics
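Splits are scored with multi-class Gini impurity. As a quick refresher (this is the standard criterion, not XDT's internal code), Gini impurity of a node is `1 - Σ p_k²` over the class proportions `p_k`:

```python
import numpy as np

def gini_impurity(y, n_classes):
    """Multi-class Gini impurity: 1 - sum_k p_k^2."""
    counts = np.bincount(y, minlength=n_classes)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

# A pure node has impurity 0; a balanced 2-class node has 0.5
print(gini_impurity(np.array([0, 0, 0, 0]), 2))  # 0.0
print(gini_impurity(np.array([0, 0, 1, 1]), 2))  # 0.5
```

A split is chosen to minimize the sample-weighted Gini impurity of the two resulting child nodes.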
## 🚀 Performance
XDT delivers superior performance through:
- Exact splits: No approximation errors from histogram binning
- Smart candidate selection: Variance bounds to focus on promising features
- Optimized algorithms: O(n) bucket sort for integer-like features, O(n log n) for continuous
- Parallel prediction: Multi-threaded inference for large datasets
- Memory locality: BFS tree traversal for better cache performance
## 📦 Installation

```bash
pip install xdtclassifier
```
## 🔧 Quick Start

### Basic Usage

```python
from xdt import XDTClassifier
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Generate sample data
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train classifier
clf = XDTClassifier()
clf.fit(X_train, y_train)

# Make predictions
predictions = clf.predict(X_test)
probabilities = clf.predict_proba(X_test)

print(f"Accuracy: {(predictions == y_test).mean():.3f}")
```
### Advanced Configuration

```python
# Binary classification optimized settings
binary_clf = XDTClassifier(
    max_depth=10,
    min_samples_split=20,
    n_bins=192,
    min_gain_threshold=1e-8,
    max_exact_refinements_binary=96,
    use_parallel_prediction=True,
    parallel_threshold=256,
)

# Multi-class optimized settings
multiclass_clf = XDTClassifier(
    max_depth=14,
    min_samples_split=10,
    n_bins=256,
    min_gain_threshold=1e-9,
    max_exact_refinements=16,
    use_parallel_prediction=True,
    parallel_threshold=1000,
)
```
### Algorithm Statistics

```python
# View detailed algorithm statistics
clf.print_algorithm_stats()
```
Output:

```text
🚀 XDT MULTI-CLASS STATISTICS (Total Splits: 127)
Classes: 3 ([0 1 2])
================================================================================
✅ Histogram-Based Splits: 89 (70.1%)
✅ Exact refinements evaluated: 234 (184.3%)
🚀 Exact Splits: 38 (29.9%)
✅ Avg Candidates/Split: 12.4

🚀 OPTIMIZATION METHOD BREAKDOWN:
   - O(n) Bucket Sort: 28 (73.7%) - Integer-like features
   - O(n log n) Sorting: 10 (26.3%) - Continuous features

🏆 ALGORITHM COMPONENTS:
✅ Quantile-based binning (XDT core)
✅ Histogram-based candidate selection (XDT core)
🚀 Exact split computation
✅ Multi-class variance bounds
✅ XDT

⚡ PERFORMANCE: 73.7% O(n) optimization - Excellent for integer-like data
```
## 🔧 Parameters

### Main Parameters

| Parameter | Default | Description |
|---|---|---|
| `max_depth` | 10 | Maximum tree depth |
| `min_samples_split` | 20 | Minimum samples required to split |
| `min_samples_leaf` | 1 | Minimum samples required in a leaf |
| `n_bins` | 192 | Number of histogram bins for feature discretization |
| `min_gain_threshold` | 1e-8 | Minimum information gain required for splitting |
| `random_state` | 42 | Random seed for reproducibility |
### Performance Parameters

| Parameter | Default | Description |
|---|---|---|
| `use_parallel_prediction` | True | Enable parallel prediction |
| `parallel_threshold` | 256 | Minimum samples for parallel prediction |
| `max_exact_refinements` | 16 | Max exact refinements per node (multi-class) |
| `max_exact_refinements_binary` | 96 | Max exact refinements per node (binary) |
| `map_labels` | True | Map internal IDs back to original labels |
## 🏗️ Algorithm Details

### Two-Phase Split Selection
- Phase 1 - Histogram Evaluation: Fast candidate screening using quantile-based bins
- Phase 2 - Exact Refinement: Exact split computation on top candidates
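The two-phase idea can be illustrated with a simplified single-feature sketch (hypothetical helper names, not XDT's actual internals): screen quantile-based candidate thresholds cheaply, then run an exact scan only around the best survivors.

```python
import numpy as np

def weighted_gini(x, y, thr):
    """Sample-weighted Gini impurity of the partition x <= thr vs x > thr."""
    left, right = y[x <= thr], y[x > thr]
    if len(left) == 0 or len(right) == 0:
        return np.inf
    def g(labels):
        p = np.bincount(labels) / len(labels)
        return 1.0 - np.sum(p ** 2)
    n = len(y)
    return len(left) / n * g(left) + len(right) / n * g(right)

def best_split_two_phase(x, y, n_bins=16, n_refine=3):
    # Phase 1: screen candidate thresholds taken from quantile bin edges
    cands = np.unique(np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1]))
    scores = np.array([weighted_gini(x, y, t) for t in cands])
    top = cands[np.argsort(scores)[:n_refine]]

    # Phase 2: exact refinement - evaluate the sample midpoints
    # closest to each surviving candidate
    xs = np.unique(x)
    mids = (xs[:-1] + xs[1:]) / 2
    best_thr, best_score = None, np.inf
    for t in top:
        for m in mids[np.argsort(np.abs(mids - t))[:4]]:
            s = weighted_gini(x, y, m)
            if s < best_score:
                best_thr, best_score = m, s
    return best_thr, best_score
```

On a cleanly separable feature this recovers the exact optimal threshold while only scoring a handful of candidates, which is the payoff the two phases aim for.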
### Optimization Strategies
- Integer-like Features: O(n) bucket sort for discrete values
- Continuous Features: O(n log n) sorting for exact splits
- Variance Bounds: Early pruning of unpromising features
- Adaptive Thresholds: Binary vs multi-class specific tuning
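For integer-like features with a small value range, exact class counts per distinct value can be accumulated in a single pass and then swept left to right, avoiding the O(n log n) sort entirely. A hedged sketch of that O(n) bucket idea (illustrative only, not XDT's implementation):

```python
import numpy as np

def exact_split_bucket(x, y, n_classes):
    """O(n) exact split for small-range integer features:
    count classes per value, then sweep thresholds in value order."""
    lo, hi = int(x.min()), int(x.max())
    counts = np.zeros((hi - lo + 1, n_classes), dtype=np.int64)
    for xi, yi in zip(x, y):            # one O(n) counting pass
        counts[int(xi) - lo, yi] += 1

    total = counts.sum(axis=0)
    n = total.sum()
    left = np.zeros(n_classes, dtype=np.int64)
    best_thr, best = None, np.inf
    for v in range(hi - lo):            # sweep distinct values
        left += counts[v]
        nl = left.sum()
        if nl == 0 or nl == n:
            continue
        right = total - left
        gl = 1.0 - np.sum((left / nl) ** 2)
        gr = 1.0 - np.sum((right / (n - nl)) ** 2)
        score = nl / n * gl + (n - nl) / n * gr
        if score < best:
            best_thr, best = lo + v + 0.5, score
    return best_thr, best
```

Because the sweep reuses running class counts, every threshold is scored exactly without re-partitioning the data, so the whole search stays linear in the number of samples plus the value range.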
### Memory Layout
- BFS Tree Storage: Better cache locality during prediction
- Contiguous Arrays: Optimized for Numba JIT compilation
- Buffer Reuse: Minimize memory allocations during training
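A flat array layout makes prediction a tight loop over contiguous memory, which is also the shape Numba compiles well. An illustrative sketch (array names are hypothetical, not XDT's internal fields):

```python
import numpy as np

# Flat arrays for a 3-node tree: the root splits feature 0 at 0.5;
# both children are leaves (left_child == -1 marks a leaf).
feature     = np.array([0, -1, -1])
threshold   = np.array([0.5, 0.0, 0.0])
left_child  = np.array([1, -1, -1])   # in BFS order, a node's children
right_child = np.array([2, -1, -1])   # sit at small, nearby indices
leaf_value  = np.array([-1, 0, 1])    # predicted class at each leaf

def predict_one(x):
    node = 0
    while left_child[node] != -1:               # descend until a leaf
        if x[feature[node]] <= threshold[node]:
            node = left_child[node]
        else:
            node = right_child[node]
    return leaf_value[node]

print(predict_one(np.array([0.2])), predict_one(np.array([0.9])))  # 0 1
```

Storing nodes breadth-first keeps the frequently visited upper levels of the tree adjacent in memory, which is where the cache-locality benefit comes from.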
## 📊 Benchmarks
XDT consistently outperforms standard decision trees:
- Accuracy: Higher due to exact split computation
- Speed: Competitive training, faster prediction via Numba
- Memory: Efficient BFS layout reduces cache misses
- Scalability: Parallel prediction for large datasets
## 🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
## 📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
## 🔗 Links
- PyPI: https://pypi.org/project/xdtclassifier/
- GitHub: https://github.com/nqmn/xdt
- Documentation: https://github.com/nqmn/xdt#readme
- Issues: https://github.com/nqmn/xdt/issues
## 📈 Citation

If you use XDT in your research, please cite:

```bibtex
@software{xdt_classifier,
  title={XDT: Exact Decision Tree Classifier},
  author={mohdadil},
  url={https://github.com/nqmn/xdt},
  version={1.0.0},
  year={2025}
}
```
## File details

### xdtclassifier-1.0.1.tar.gz

- Size: 24.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.3

| Algorithm | Hash digest |
|---|---|
| SHA256 | `74b878fe20ab069ab18c530114e0e3547be37524e700e91e6dcee8d325080ee1` |
| MD5 | `0e150eaf4a6f6a882ce3b953ab841ad9` |
| BLAKE2b-256 | `2d0004894377d2b4a7c3235481a3d036ef5540e26c1b540915128425f081618d` |
### xdtclassifier-1.0.1-py3-none-any.whl

- Size: 19.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.3

| Algorithm | Hash digest |
|---|---|
| SHA256 | `301800f0d4fdbfcd14b8e538cef58e0ffcceec3a9d318a9c80f32a1fbbce39e1` |
| MD5 | `3ac487b76bbe81e828fde85de95f3c23` |
| BLAKE2b-256 | `73b97f36ddf0f871cb54c0cb9ba3df5af8cd82916fdd52165f2b4540063b279e` |