
XDT (Exact Decision Tree) 🚀

Python 3.8+ · MIT License

A high-performance decision tree classifier with exact split optimization, histogram-based candidate selection, and multi-class support. XDT combines the speed of histogram-based methods with the accuracy of exact split computation.

🎯 Key Features

  • Exact Split Computation: Guaranteed optimal splits with O(n) bucket optimization for integer-like features
  • Histogram-Based Candidate Selection: Fast feature evaluation using quantile-based binning
  • Multi-Class Support: Native support for multi-class classification with Gini impurity
  • Numba Acceleration: JIT-compiled prediction with optional parallel processing
  • Memory Efficient: BFS tree layout for optimal cache performance
  • Adaptive Algorithms: Automatically chooses between O(n) bucket sort and O(n log n) sorting based on data characteristics
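As a point of reference, the Gini impurity criterion behind the multi-class splits can be sketched in a few lines of NumPy. This is a generic illustration of the formula, not XDT's internal implementation:

```python
import numpy as np

def gini_impurity(y, n_classes):
    """Gini impurity of a label array: 1 - sum_k p_k^2."""
    p = np.bincount(y, minlength=n_classes) / len(y)
    return 1.0 - np.sum(p ** 2)

def gini_gain(y_parent, y_left, y_right, n_classes):
    """Impurity decrease achieved by splitting parent into left/right."""
    n = len(y_parent)
    weighted = (len(y_left) / n) * gini_impurity(y_left, n_classes) \
             + (len(y_right) / n) * gini_impurity(y_right, n_classes)
    return gini_impurity(y_parent, n_classes) - weighted
```

A split that cleanly separates two classes scores the maximum possible gain for that node; splits whose gain falls below `min_gain_threshold` are rejected.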

🚀 Performance

XDT delivers superior performance through:

  • Exact splits: No approximation errors from histogram binning
  • Smart candidate selection: Variance bounds to focus on promising features
  • Optimized algorithms: O(n) bucket sort for integer-like features, O(n log n) for continuous
  • Parallel prediction: Multi-threaded inference for large datasets
  • Memory locality: BFS tree traversal for better cache performance

📦 Installation

pip install xdtclassifier

🔧 Quick Start

Basic Usage

from xdt import XDTClassifier
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Generate sample data
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Create and train classifier
clf = XDTClassifier()
clf.fit(X_train, y_train)

# Make predictions
predictions = clf.predict(X_test)
probabilities = clf.predict_proba(X_test)

print(f"Accuracy: {(predictions == y_test).mean():.3f}")

Advanced Configuration

# Binary classification optimized settings
binary_clf = XDTClassifier(
    max_depth=10,
    min_samples_split=20,
    n_bins=192,
    min_gain_threshold=1e-8,
    max_exact_refinements_binary=96,
    use_parallel_prediction=True,
    parallel_threshold=256
)

# Multi-class optimized settings  
multiclass_clf = XDTClassifier(
    max_depth=14,
    min_samples_split=10,
    n_bins=256,
    min_gain_threshold=1e-9,
    max_exact_refinements=16,
    use_parallel_prediction=True,
    parallel_threshold=1000
)

Algorithm Statistics

# View detailed algorithm statistics
clf.print_algorithm_stats()

Output:

🚀 XDT MULTI-CLASS STATISTICS (Total Splits: 127)
   Classes: 3 ([0 1 2])
================================================================================
✅ Histogram-Based Splits: 89 (70.1%)
✅ Exact refinements evaluated: 234 (184.3%)
🚀 Exact Splits: 38 (29.9%)
✅ Avg Candidates/Split: 12.4

🚀 OPTIMIZATION METHOD BREAKDOWN:
  - O(n) Bucket Sort: 28 (73.7%) - Integer-like features
  - O(n log n) Sorting: 10 (26.3%) - Continuous features

🏆 ALGORITHM COMPONENTS:
  ✅ Quantile-based binning (XDT core)
  ✅ Histogram-based candidate selection (XDT core)
  🚀 Exact split computation
  ✅ Multi-class variance bounds
  ✅ XDT

⚡ PERFORMANCE: 73.7% O(n) optimization - Excellent for integer-like data

🔧 Parameters

Main Parameters

| Parameter | Default | Description |
|---|---|---|
| `max_depth` | 10 | Maximum tree depth |
| `min_samples_split` | 20 | Minimum samples required to split |
| `min_samples_leaf` | 1 | Minimum samples required in a leaf |
| `n_bins` | 192 | Number of histogram bins for feature discretization |
| `min_gain_threshold` | 1e-8 | Minimum information gain required for splitting |
| `random_state` | 42 | Random seed for reproducibility |

Performance Parameters

| Parameter | Default | Description |
|---|---|---|
| `use_parallel_prediction` | True | Enable parallel prediction |
| `parallel_threshold` | 256 | Minimum samples for parallel prediction |
| `max_exact_refinements` | 16 | Max exact refinements per node (multi-class) |
| `max_exact_refinements_binary` | 96 | Max exact refinements per node (binary) |
| `map_labels` | True | Map internal IDs back to original labels |

🏗️ Algorithm Details

Two-Phase Split Selection

  1. Phase 1 - Histogram Evaluation: Fast candidate screening using quantile-based bins
  2. Phase 2 - Exact Refinement: Exact split computation on top candidates
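Phase 1 can be sketched as follows: bin one feature at its quantiles, then scan those candidate thresholds with a Gini-gain score. The bin count, tie handling, and scoring details below are illustrative assumptions, not XDT's exact code:

```python
import numpy as np

def gini(y, n_classes):
    """Gini impurity: 1 - sum_k p_k^2."""
    p = np.bincount(y, minlength=n_classes) / len(y)
    return 1.0 - np.sum(p ** 2)

def quantile_bin_edges(x, n_bins):
    """Candidate thresholds at evenly spaced quantiles of one feature."""
    qs = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]   # interior quantiles
    return np.unique(np.quantile(x, qs))           # drop duplicate edges

def best_histogram_split(x, y, n_classes, n_bins=32):
    """Phase-1 screen: scan quantile thresholds, return (threshold, gain)."""
    n = len(y)
    parent = gini(y, n_classes)
    best_t, best_gain = None, 0.0
    for t in quantile_bin_edges(x, n_bins):
        left = x <= t
        nl = left.sum()
        if nl == 0 or nl == n:                     # degenerate split
            continue
        gain = parent - (nl / n) * gini(y[left], n_classes) \
                      - ((n - nl) / n) * gini(y[~left], n_classes)
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain
```

The features whose histogram scores rank highest would then be passed to phase 2 for exact refinement on the raw (unbinned) values.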

Optimization Strategies

  • Integer-like Features: O(n) bucket sort for discrete values
  • Continuous Features: O(n log n) sorting for exact splits
  • Variance Bounds: Early pruning of unpromising features
  • Adaptive Thresholds: Binary vs multi-class specific tuning
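The O(n) bucket idea for integer-like features can be sketched like this: accumulate per-value class counts in a single pass, then prefix-scan the (small) value range to find the exact best threshold without ever sorting the samples. A simplified illustration under those assumptions, not XDT's implementation:

```python
import numpy as np

def exact_split_bucket(x, y, n_classes):
    """Exact best threshold for an integer-valued feature in O(n + range).

    One counting pass builds per-value class histograms; a prefix scan
    over the value range then evaluates every possible threshold exactly.
    """
    x = x.astype(np.int64)
    vmin, vmax = x.min(), x.max()
    counts = np.zeros((vmax - vmin + 1, n_classes))
    for xi, yi in zip(x - vmin, y):                # O(n) counting pass
        counts[xi, yi] += 1
    total = counts.sum(axis=0)
    n = total.sum()
    parent = 1.0 - np.sum((total / n) ** 2)
    left = np.zeros(n_classes)
    best_t, best_gain = None, 0.0
    for v in range(counts.shape[0] - 1):           # threshold between v and v+1
        left += counts[v]
        nl = left.sum()
        if nl == 0 or nl == n:
            continue
        right = total - left
        g_left = 1.0 - np.sum((left / nl) ** 2)
        g_right = 1.0 - np.sum((right / (n - nl)) ** 2)
        gain = parent - (nl / n) * g_left - ((n - nl) / n) * g_right
        if gain > best_gain:
            best_t, best_gain = vmin + v + 0.5, gain
    return best_t, best_gain
```

For continuous features this shortcut does not apply, which is why a sort-based O(n log n) scan is used there instead.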

Memory Layout

  • BFS Tree Storage: Better cache locality during prediction
  • Contiguous Arrays: Optimized for Numba JIT compilation
  • Buffer Reuse: Minimize memory allocations during training
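A BFS layout can be illustrated with flat arrays in which node `i`'s children live at `2*i + 1` and `2*i + 2` (an implicit-heap convention assumed here for illustration; XDT may instead store explicit child indices):

```python
import numpy as np

# Flat BFS (implicit heap) layout: node i's children sit at 2*i+1 and 2*i+2,
# so prediction walks contiguous arrays instead of chasing pointers.
# feature[i] < 0 marks a leaf; value[i] is then the predicted class.
feature   = np.array([0, -1, 1, -1, -1, -1, -1])       # split feature per node
threshold = np.array([0.5, 0.0, 1.5, 0.0, 0.0, 0.0, 0.0])  # split threshold
value     = np.array([-1, 0, -1, 0, 0, 1, 2])          # class stored at leaves

def predict_one(x):
    """Descend from the root until a leaf is reached."""
    i = 0
    while feature[i] >= 0:
        i = 2 * i + 1 if x[feature[i]] <= threshold[i] else 2 * i + 2
    return value[i]
```

Because the three arrays are contiguous and the descent is a tight loop over them, this representation is the kind of structure that compiles well under Numba's JIT.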

📊 Benchmarks

XDT consistently outperforms standard decision trees:

  • Accuracy: Higher due to exact split computation
  • Speed: Competitive training, faster prediction via Numba
  • Memory: Efficient BFS layout reduces cache misses
  • Scalability: Parallel prediction for large datasets

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

📈 Citation

If you use XDT in your research, please cite:

@software{xdt_classifier,
  title={XDT: Exact Decision Tree Classifier},
  author={mohdadil},
  url={https://github.com/nqmn/xdt},
  version={1.0.0},
  year={2025}
}

Download files

Source distribution: xdtclassifier-1.0.1.tar.gz (24.1 kB), uploaded via twine/5.1.1 on CPython/3.12.3.

  • SHA256: 74b878fe20ab069ab18c530114e0e3547be37524e700e91e6dcee8d325080ee1
  • MD5: 0e150eaf4a6f6a882ce3b953ab841ad9
  • BLAKE2b-256: 2d0004894377d2b4a7c3235481a3d036ef5540e26c1b540915128425f081618d

Built distribution: xdtclassifier-1.0.1-py3-none-any.whl (19.5 kB, Python 3), uploaded via twine/5.1.1 on CPython/3.12.3.

  • SHA256: 301800f0d4fdbfcd14b8e538cef58e0ffcceec3a9d318a9c80f32a1fbbce39e1
  • MD5: 3ac487b76bbe81e828fde85de95f3c23
  • BLAKE2b-256: 73b97f36ddf0f871cb54c0cb9ba3df5af8cd82916fdd52165f2b4540063b279e
