
Decision tree classifier with centroid-asymmetry split criterion

asymtree

A scikit-learn compatible decision tree classifier that rewards splits where one side's centroid lies far from the splitting boundary, a property called centroid asymmetry here.

Motivation

Standard decision trees choose splits purely on impurity (Gini or entropy). A split that perfectly separates the classes with both centroids equidistant from the boundary scores the same as one where nearly all of one class is packed tightly on one side. asymtree makes this asymmetry an explicit objective, so you can tune how strongly the tree favors interpretable, one-sided splits.

Mathematical background

For a candidate split on feature k at threshold t, with left samples
L = {x : x_k ≤ t} and right samples R = {x : x_k > t}, define:

asymmetry(k, t) = max(t − μ_L, μ_R − t) / (x_k_max − x_k_min)

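where μ_L and μ_R are the feature-k means of the left and right samples. Dividing by the feature range x_k_max − x_k_min makes scores comparable across features. Both t − μ_L and μ_R − t are non-negative and neither can exceed the range, so the score always lies in [0, 1].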
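To make the definition concrete, here is a minimal NumPy sketch that evaluates the formula for a single feature column and threshold. The function name `asymmetry` is hypothetical and is not part of the asymtree API. The sketch also assumes the range is taken over the samples passed in, which the text above does not specify.

```python
import numpy as np

def asymmetry(x_k, t):
    """Centroid asymmetry of the split x_k <= t (illustrative helper, not asymtree API)."""
    x_k = np.asarray(x_k, dtype=float)
    left, right = x_k[x_k <= t], x_k[x_k > t]
    if left.size == 0 or right.size == 0:
        raise ValueError("threshold must leave samples on both sides")
    # Assumption: the range is computed over the samples being split.
    feature_range = x_k.max() - x_k.min()
    return max(t - left.mean(), right.mean() - t) / feature_range

# Balanced split: both centroids sit 1.0 away from the boundary.
balanced = asymmetry([1.0, 2.0, 3.0, 4.0], t=2.5)

# One-sided split: the right centroid lies far from the boundary.
one_sided = asymmetry([0.0, 0.1, 0.2, 10.0], t=0.3)

print(balanced, one_sided)
```

The balanced split scores about 0.33 and the one-sided split about 0.97. In both cases the two sides are cleanly separated by the threshold, so a criterion based on impurity alone would not tell them apart.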

Two combination strategies

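Additive: score the split as a weighted sum, so a large enough λ can favor a more asymmetric split over a purer one:

score(k, t) = ΔGini(k, t) + λ · asymmetry(k, t)

Lexicographic: among all splits whose Gini improvement is within ε of the best, pick the one with the highest asymmetry. Asymmetry acts only as a tiebreaker, so choosing it never costs more than ε of Gini improvement at a node.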
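The difference between the two strategies is easiest to see on a handful of candidate splits. The sketch below is illustrative only: `pick_additive` and `pick_lexicographic` are hypothetical helpers, and the Gini improvements and asymmetry scores are made-up numbers, not output from asymtree.

```python
import numpy as np

def pick_additive(gini_gain, asym, lam):
    """Index of the candidate maximising gini_gain + lam * asym."""
    return int(np.argmax(np.asarray(gini_gain) + lam * np.asarray(asym)))

def pick_lexicographic(gini_gain, asym, eps):
    """Index of the most asymmetric candidate within eps of the best Gini gain."""
    gini_gain = np.asarray(gini_gain, dtype=float)
    in_band = gini_gain >= gini_gain.max() - eps
    # Candidates outside the band can never win.
    return int(np.argmax(np.where(in_band, asym, -np.inf)))

# Three hypothetical candidate splits at one node.
gini_gain = np.array([0.30, 0.299, 0.20])
asym = np.array([0.10, 0.60, 0.95])

additive_small = pick_additive(gini_gain, asym, lam=0.1)      # near-best purity wins
additive_large = pick_additive(gini_gain, asym, lam=0.5)      # asymmetry outweighs purity
lexico_wide = pick_lexicographic(gini_gain, asym, eps=1e-2)   # two candidates in the band
lexico_tight = pick_lexicographic(gini_gain, asym, eps=1e-4)  # only the best Gini remains

print(additive_small, additive_large, lexico_wide, lexico_tight)
```

With λ = 0.5 the additive rule selects the third candidate even though it gives up a third of the Gini improvement. The lexicographic rule never leaves the ε band, whatever the asymmetry scores are.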

Efficient implementation

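μ_L and μ_R are derived from running sums that are updated as the threshold scan moves left to right over the sorted feature values, so the asymmetry term adds only O(1) work per candidate threshold. The scan stays O(n) per feature, the same as the standard split search, with no extra pass over the data.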
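The running-sum idea can be sketched in plain NumPy as follows. This is not the package's Cython code. The helper name `scan_asymmetry` is hypothetical, and placing thresholds at midpoints between adjacent distinct values is an assumption borrowed from common decision tree implementations. The sort costs O(n log n), and the scan itself is a single O(n) pass.

```python
import numpy as np

def scan_asymmetry(x_k):
    """Asymmetry score for every candidate threshold, in one pass over sorted values."""
    xs = np.sort(np.asarray(x_k, dtype=float))
    n = xs.size
    if n < 2:
        return np.array([]), np.array([])
    total = xs.sum()
    feature_range = xs[-1] - xs[0]
    thresholds, scores = [], []
    left_sum = 0.0
    for i in range(n - 1):
        left_sum += xs[i]                # sample i moves from the right side to the left
        if xs[i] == xs[i + 1]:
            continue                     # no threshold fits between equal values
        t = 0.5 * (xs[i] + xs[i + 1])    # assumption: midpoint thresholds
        mu_left = left_sum / (i + 1)
        mu_right = (total - left_sum) / (n - i - 1)
        thresholds.append(t)
        scores.append(max(t - mu_left, mu_right - t) / feature_range)
    return np.array(thresholds), np.array(scores)

thresholds, scores = scan_asymmetry([0.0, 0.1, 0.2, 10.0])
print(thresholds, scores)
```

Each step updates one sum and performs a constant number of arithmetic operations, so the asymmetry score can be computed alongside the impurity counts in the same loop.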

Installation

pip install asymtree

Requirements: Python ≥ 3.9, scikit-learn ≥ 1.4, numpy ≥ 1.21.
A C compiler and Cython ≥ 3.0 are needed to build from source.

Quick start

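from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

from asymtree import AsymmetryDecisionTreeClassifier

# Example data; any classification dataset works
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Additive mode: Gini improvement + 0.5 × asymmetry
clf = AsymmetryDecisionTreeClassifier(
    max_depth=4,
    lambda_=0.5,
    lexicographic=False,
    random_state=42,
)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))

# Lexicographic mode: purity first, asymmetry breaks ties within eps_impurity
clf_lexico = AsymmetryDecisionTreeClassifier(
    max_depth=4,
    eps_impurity=1e-3,
    lexicographic=True,
    random_state=42,
)
clf_lexico.fit(X_train, y_train)
print(clf_lexico.score(X_test, y_test))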

Parameters

Parameter          Type                      Default  Description
lambda_            float                     1.0      Weight of the asymmetry term (additive mode only).
eps_impurity       float                     1e-4     Tolerance band for lexicographic tiebreaking.
lexicographic      bool                      False    If True, use lexicographic mode.
max_depth          int | None                None     Maximum tree depth.
min_samples_split  int                       2        Minimum samples to split a node.
min_samples_leaf   int                       1        Minimum samples in a leaf.
max_features       int | float | str | None  None     Number of features to consider per split.
random_state       int | None                None     Random seed.

All other DecisionTreeClassifier parameters are forwarded unchanged.

Compatibility

AsymmetryDecisionTreeClassifier is a drop-in replacement for sklearn.tree.DecisionTreeClassifier and works with the standard scikit-learn utilities, including cross-validation, pipelines, clone, GridSearchCV, and plot_tree.

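from sklearn.model_selection import GridSearchCV

from asymtree import AsymmetryDecisionTreeClassifier

# Tune tree depth and asymmetry weight together (X_train, y_train as in Quick start)
param_grid = {"max_depth": [3, 4, 5], "lambda_": [0.1, 0.5, 1.0]}
gs = GridSearchCV(AsymmetryDecisionTreeClassifier(), param_grid, cv=5)
gs.fit(X_train, y_train)
print(gs.best_params_)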

Running the tests

pip install pytest
pytest tests/

License

MIT
