OC1 Oblique Decision Tree - Implementation of Murthy et al. (AAAI-1992)
Project description
OC1: Oblique Classifier 1
A Python implementation of the OC1 oblique decision tree algorithm as described in:
"OC1: A randomized algorithm for building oblique decision trees"
Sreerama K. Murthy, Simon Kasif, Steven Salzberg, and Richard Beigel
AAAI-1992
Overview
OC1 (Oblique Classifier 1) builds decision trees using oblique hyperplanes (linear combinations of attributes) rather than axis-parallel splits. This often results in smaller and more accurate trees than traditional methods like C4.5 or CART, especially for datasets with diagonal or complex decision boundaries.
Key Features
- Oblique Splits: Uses hyperplanes of the form
∑(aᵢxᵢ) + a_{d+1} = 0 - Hill-Climbing Optimization: Sequential coefficient perturbation with local search
- Randomization: Multiple random restarts to escape local minima
- Impurity Measures: Sum Minority (SM) and Max Minority (MM)
- Pruning: Impurity-based and Reduced Error Pruning (REP)
- Evaluation Tools: Cross-validation, confusion matrix, classification report
- Visualization: Decision boundary and hyperplane plotting (requires matplotlib)
- Export Methods: JSON, dictionary, and DOT format for tree visualization
Team Members
| GitHub Username | Name |
|---|---|
| HxRJILI | RJILI Houssam |
| Kim8x-srscb | Fatima-Ezzahrae AKEBLI |
| Yasseriads | Yasser |
Installation
Prerequisites
- Python 3.8 or higher
- NumPy
Install from Source
# Clone the repository
git clone https://github.com/HxRJILI/Oblique-Classifier-1.git
cd Oblique-Classifier-1
# Create virtual environment (recommended)
python -m venv .venv
.venv\Scripts\activate # Windows
# source .venv/bin/activate # Linux/Mac
# Install in development mode
pip install -e .
# Or install dependencies directly
pip install numpy
Install Development Dependencies
pip install -r requirements-dev.txt
Quick Start
import numpy as np
from oc1 import ObliqueDecisionTree
# Create sample data
X = np.array([
[0, 0], [0.2, 0.1], [0.1, 0.2], # Class 0
[1, 1], [0.8, 0.9], [0.9, 0.8], # Class 1
])
y = np.array([0, 0, 0, 1, 1, 1])
# Train oblique decision tree
tree = ObliqueDecisionTree(
max_depth=5,
impurity_measure="sm", # Sum Minority
n_restarts=5, # Random restarts
random_state=42,
)
tree.fit(X, y)
# Make predictions
predictions = tree.predict(X)
print(f"Predictions: {predictions}")
# Get accuracy
accuracy = tree.score(X, y)
print(f"Accuracy: {accuracy:.2%}")
# View tree structure
print(tree.print_tree())
API Reference
ObliqueDecisionTree
from oc1 import ObliqueDecisionTree
tree = ObliqueDecisionTree(
max_depth=None, # Maximum tree depth (None = unlimited)
min_samples_leaf=1, # Minimum samples per leaf
min_samples_split=2, # Minimum samples to split
impurity_measure="sm", # "sm" (Sum Minority) or "mm" (Max Minority)
max_iterations=100, # Hill-climbing iterations per node
n_restarts=5, # Random restarts (1 = deterministic)
random_state=None, # Random seed for reproducibility
impurity_threshold=0.0, # Stop splitting threshold
verbose=False, # Enable verbose logging
log_file=None, # Optional log file path
)
Methods
| Method | Description |
|---|---|
fit(X, y) |
Train the decision tree |
predict(X) |
Predict class labels |
predict_proba(X) |
Predict class probabilities |
score(X, y) |
Calculate classification accuracy |
get_depth() |
Get maximum tree depth |
get_n_leaves() |
Count number of leaf nodes |
get_hyperplanes() |
Get all splitting hyperplanes |
print_tree() |
Get string representation of tree |
prune(method, ...) |
Prune the tree |
to_dict() |
Export tree as dictionary |
to_json() |
Export tree as JSON string |
to_dot() |
Export tree in DOT format |
feature_importances_ |
Get feature importance scores |
Synthetic Datasets
from oc1.data import (
make_diagonal_dataset, # 45° diagonal boundary
make_xor_dataset, # XOR problem
make_oblique_classification, # Custom angle boundary
make_multiclass_oblique, # Multi-class sectors
make_3d_oblique, # 3D oblique plane
)
# Example
X, y = make_diagonal_dataset(n_samples=100, random_state=42)
Evaluation Tools
from oc1.evaluation import (
train_test_split, # Split data into train/test sets
cross_validate, # K-fold cross-validation
confusion_matrix, # Classification confusion matrix
classification_report, # Precision, recall, F1-score
)
Visualization (requires matplotlib)
from oc1.visualization import (
plot_decision_boundary_2d, # Plot 2D decision regions
plot_hyperplanes_2d, # Plot all hyperplanes
)
Usage Examples
Pruning
from oc1 import ObliqueDecisionTree
from oc1.evaluation import train_test_split
from oc1.data import make_diagonal_dataset
# Generate data
X, y = make_diagonal_dataset(n_samples=200, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2)
# Build tree
tree = ObliqueDecisionTree(max_depth=10, random_state=42)
tree.fit(X_train, y_train)
# Prune using Reduced Error Pruning
tree.prune(method="rep", X_val=X_val, y_val=y_val)
# Or prune by impurity threshold
tree.prune(method="impurity", impurity_threshold=2.0)
Cross-Validation
from oc1 import ObliqueDecisionTree
from oc1.evaluation import cross_validate
from oc1.data import make_diagonal_dataset
X, y = make_diagonal_dataset(n_samples=200, random_state=42)
tree = ObliqueDecisionTree(max_depth=5, random_state=42)
# 5-fold cross-validation
results = cross_validate(tree, X, y, cv=5, random_state=42)
print(f"Mean accuracy: {results['test_score'].mean():.3f}")
print(f"Std accuracy: {results['test_score'].std():.3f}")
Logging
from oc1 import ObliqueDecisionTree
from oc1.data import make_diagonal_dataset
X, y = make_diagonal_dataset(n_samples=100, random_state=42)
# Enable verbose logging
tree = ObliqueDecisionTree(max_depth=5, verbose=True, random_state=42)
tree.fit(X, y)
# Or log to file
tree = ObliqueDecisionTree(
max_depth=5,
log_file="tree_construction.log",
random_state=42
)
tree.fit(X, y)
# Get log summary
summary = tree.logger.get_log_summary()
print(f"Nodes created: {summary['nodes_created']}")
Visualization
from oc1 import ObliqueDecisionTree
from oc1.visualization import plot_decision_boundary_2d, plot_hyperplanes_2d
from oc1.data import make_diagonal_dataset
import matplotlib.pyplot as plt
X, y = make_diagonal_dataset(n_samples=100, random_state=42)
tree = ObliqueDecisionTree(max_depth=5, random_state=42)
tree.fit(X, y)
# Plot decision boundary
plot_decision_boundary_2d(tree, X, y)
plt.show()
# Plot hyperplanes
plot_hyperplanes_2d(tree, X)
plt.show()
Export Tree
from oc1 import ObliqueDecisionTree
import json
tree = ObliqueDecisionTree(max_depth=3, random_state=42)
tree.fit(X, y)
# Export as dictionary
tree_dict = tree.to_dict()
# Export as JSON
tree_json = tree.to_json(indent=2)
with open("tree.json", "w") as f:
f.write(tree_json)
# Export as DOT (for Graphviz)
dot_string = tree.to_dot()
with open("tree.dot", "w") as f:
f.write(dot_string)
Project Structure
Oblique-Classifier-1/
├── oc1/
│ ├── __init__.py # Package exports
│ ├── core/
│ │ ├── node.py # ObliqueTreeNode class
│ │ ├── tree.py # ObliqueDecisionTree classifier
│ │ ├── splits.py # Impurity measures, partitioning
│ │ ├── hill_climb.py # Coefficient perturbation
│ │ └── logging.py # TreeConstructionLogger
│ ├── data/
│ │ └── datasets.py # Synthetic test datasets
│ ├── evaluation/
│ │ └── __init__.py # Evaluation tools
│ ├── visualization/
│ │ └── __init__.py # Plotting utilities
│ └── tests/
│ ├── task1_tests/ # Core tree construction tests
│ ├── task2_tests/ # Randomization tests
│ └── task3_tests/ # Pruning & evaluation tests
├── examples/
│ ├── task2_demo.py # Task 2 demonstration
│ └── task3_demo.py # Task 3 demonstration
├── requirements.txt # Runtime dependencies
├── requirements-dev.txt # Development dependencies
├── setup.py # Package installation
└── TESTING_GUIDE.md # Testing documentation
Running Tests
# Run all tests (236 tests)
pytest oc1/tests/ -v
# Run specific task tests
pytest oc1/tests/task1_tests/ -v # Core tree construction
pytest oc1/tests/task2_tests/ -v # Randomization
pytest oc1/tests/task3_tests/ -v # Pruning & evaluation
# Run with coverage report
pytest oc1/tests/ --cov=oc1 --cov-report=html
# Run specific test file
pytest oc1/tests/task1_tests/test_tree.py -v
Paper Fidelity
This implementation follows the original OC1 paper exactly:
Task 1: Core Tree Construction
| Feature | Paper Section | Implementation |
|---|---|---|
Hyperplane equation ∑(aᵢxᵢ) + a_{d+1} = 0 |
Section 2 | evaluate_hyperplane() |
| Partition rule (V > 0 → left) | Section 2 | partition_data() |
| Sum Minority impurity | Section 2.4 | calculate_impurity() |
| Max Minority impurity | Section 2.4 | calculate_impurity() |
Perturbation formula (Eq. 1): Uⱼ = aₘxⱼᵐ - Vⱼ/xⱼᵐ |
Section 2.2 | compute_u_values() |
| Sequential hill-climbing | Section 2.1 | hill_climb() |
Task 2: Randomization
| Feature | Paper Section | Implementation |
|---|---|---|
| Random hyperplane initialization | Section 2.3 | initialize_hyperplane(method="random") |
| K random trials | Section 2.3 | n_restarts parameter |
| Multi-coefficient perturbation | Section 2 | perturb_multiple_coefficients() |
| Random perturbation order | Section 2 | use_random_perturbation_order |
Task 3: Pruning & Evaluation
| Feature | Paper Section | Implementation |
|---|---|---|
| Impurity-based pruning | Section 2.4 | prune(method="impurity") |
| Reduced Error Pruning | Section 2.4 | prune(method="rep") |
| Stopping criteria | Section 2.4 | max_depth, min_samples_leaf, impurity_threshold |
Why Oblique Trees?
Axis-parallel decision trees (like CART or C4.5) can only create splits perpendicular to feature axes. For datasets with diagonal decision boundaries, this requires many splits to approximate the true boundary.
Oblique trees use hyperplanes that can be oriented at any angle, often resulting in:
- ✅ Smaller trees (fewer nodes)
- ✅ Better accuracy on oblique data
- ✅ More interpretable decision boundaries
Axis-Parallel Tree (many splits) Oblique Tree (one split)
| \
-----+----- \
| \
-----+----- vs \
| \
-----+----- \
References
@inproceedings{murthy1992oc1,
title={OC1: A randomized algorithm for building oblique decision trees},
author={Murthy, Sreerama K and Kasif, Simon and Salzberg, Steven and Beigel, Richard},
booktitle={Proceedings of the National Conference on Artificial Intelligence (AAAI)},
pages={322--327},
year={1992}
}
License
This project is licensed under the MIT License - see the LICENSE file for details.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file oblique_classifier_1-0.3.0.tar.gz.
File metadata
- Download URL: oblique_classifier_1-0.3.0.tar.gz
- Upload date:
- Size: 38.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.9
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
06f4d5bd793b5442ea9f89b4bdb6dc02a9192dbcdee54d93bc0bf0d6eeea425b
|
|
| MD5 |
963daf390b04b32dcdf8f3f9201dcb4b
|
|
| BLAKE2b-256 |
54131eb254f9626fca19b99ebe10ee446854277e4fd52975f3c1b2d8a838f657
|
File details
Details for the file oblique_classifier_1-0.3.0-py3-none-any.whl.
File metadata
- Download URL: oblique_classifier_1-0.3.0-py3-none-any.whl
- Upload date:
- Size: 41.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.9
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
445231a61597807a53905598de6f8bf5e9f63390df53be95d146f3475edf842c
|
|
| MD5 |
600670080c968b4fe401681e1bb4adb8
|
|
| BLAKE2b-256 |
6074622955b7d821afa9fd4368cda4a37ac02d74f27c3df12888c11d03b5b94e
|