Fit, score, and embed nested-dichotomy (ND) trees for multi-class classification.

ndscape

A nested dichotomy reduces a C-class problem to a tree of binary splits (e.g. {0,1,2} vs {3,4}, then {0} vs {1,2}, ...). ndscape lets you fit one, score it, or place a whole population of candidate trees in a 2-D "tree-space" to see how a property (accuracy, variance, ...) varies across tree structures.

Install

pip install ndscape
pip install "ndscape[spatial]"   # adds Moran's I support (esda, libpysal)
pip install "ndscape[plot]"      # adds plotting (matplotlib, bokeh)

Quickstart

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import ndscape as nds

X, y = load_iris(return_X_y=True)
classes = sorted(set(y))
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

nd = nds.fit(X_train, y_train, classes=classes, base="lr")
nd.predict(X_test)
nd.score(X_test, y_test)   # {"accuracy": ..., "logloss": ...}

classes is the list of class labels in your y (sorted(set(y)) works for integer or string labels). nds.fit samples one ND tree automatically; pass tree=... to use a specific one (see "A tree is..." below).
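To make "samples one ND tree" concrete, here is a minimal sketch of one way to draw a random tree in the (left, right) format described further down. The helper name sample_tree and the shuffle-and-cut scheme are assumptions for illustration; ndscape's own sampler may use a different distribution, and this one is not uniform over all trees.

```python
import random

def sample_tree(classes, rng=None):
    """Return one random nested dichotomy as a list of (left, right) tuples."""
    rng = rng or random.Random()
    splits = []

    def split(group):
        if len(group) < 2:
            return  # a single class is a leaf, nothing left to split
        shuffled = list(group)
        rng.shuffle(shuffled)
        cut = rng.randint(1, len(shuffled) - 1)  # keeps both sides non-empty
        left = tuple(sorted(shuffled[:cut]))
        right = tuple(sorted(shuffled[cut:]))
        splits.append((left, right))  # parent split is recorded before its children
        split(left)
        split(right)

    split(tuple(classes))
    return splits

tree = sample_tree([0, 1, 2, 3], random.Random(0))
```

Any tree over C classes has exactly C - 1 splits, which is a quick sanity check on the result.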

Use cases

You have a dataset and a binary classifier.

base can be the string "lr" or "decisiontree", or your own unfitted scikit-learn estimator — a fresh clone of it is fit at every split.

from sklearn.svm import SVC

nd = nds.fit(X_train, y_train, classes=classes, base=SVC(probability=True, kernel="linear"))
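Fitting a clone "at every split" means each split gets its own binary problem. A minimal sketch of how such a problem can be built from the multi-class data is shown below; the function name split_training_data and the 0 = left, 1 = right target convention are assumptions, not ndscape's documented behavior.

```python
def split_training_data(X, y, left, right):
    """Keep only rows whose label belongs to this split; target is 0 for left, 1 for right."""
    left, right = set(left), set(right)
    X_sub, t_sub = [], []
    for row, label in zip(X, y):
        if label in left:
            X_sub.append(row)
            t_sub.append(0)
        elif label in right:
            X_sub.append(row)
            t_sub.append(1)
        # rows from classes outside this split are dropped
    return X_sub, t_sub

# Toy data: four samples, three classes; the split separates class 1 from class 2.
X = [[0.1], [0.2], [0.3], [0.4]]
y = [0, 1, 2, 1]
X_sub, t_sub = split_training_data(X, y, left=(1,), right=(2,))
```

Splits near the leaves see only a fraction of the training rows, so a base estimator that needs many samples may behave differently deep in the tree than at the root.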

You have a train/test split and want a score.

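# tree is a list of (left, right) splits; this one covers the three iris classes
tree = [((0,), (1, 2)), ((1,), (2,))]
nd = nds.ND(tree, classes).fit(X_train, y_train, base="lr")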
nd.score(X_test, y_test)   # {"accuracy": ..., "logloss": ...}
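For reference, the two reported metrics can be computed from predicted class probabilities as sketched below. This assumes the usual definitions (argmax accuracy, mean negative natural-log likelihood with clipping); whether ndscape clips or normalizes in exactly this way is not stated in this README. The probabilities are made-up values for illustration only.

```python
import math

def accuracy(y_true, proba, classes):
    """Fraction of samples whose highest-probability class matches the true label."""
    predicted = [classes[max(range(len(p)), key=p.__getitem__)] for p in proba]
    return sum(t == p for t, p in zip(y_true, predicted)) / len(y_true)

def log_loss(y_true, proba, classes, eps=1e-15):
    """Mean negative log of the probability assigned to the true class."""
    index = {c: i for i, c in enumerate(classes)}
    total = 0.0
    for t, p in zip(y_true, proba):
        total -= math.log(min(max(p[index[t]], eps), 1.0))  # clip to avoid log(0)
    return total / len(y_true)

classes = [0, 1, 2]
y_true = [0, 1, 2]
proba = [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3], [0.2, 0.5, 0.3]]  # illustrative only
acc = accuracy(y_true, proba, classes)
ll = log_loss(y_true, proba, classes)
```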

You already trained the per-split models yourself.

# models in the same order as tree, or a {(left, right): model} dict — either works
nd = nds.ND.from_trained(tree, classes, models=[fitted_model_1, fitted_model_2, ...])
nd.predict_proba(X_test)
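In a nested dichotomy, the probability of a class is the product of the binary probabilities along the path from the root to that class. The sketch below shows that combination for a single sample. The function name and the p_right dictionary (probability of the right branch at each split) are hypothetical, and the numbers are illustrative; this is not a description of ndscape internals.

```python
def nd_class_probabilities(tree, classes, p_right):
    """Combine per-split binary probabilities into one probability per class.

    tree    : list of (left, right) tuples of class labels
    p_right : {(left, right): P(sample goes right at this split)}
    """
    probs = {}
    for c in classes:
        p = 1.0
        for left, right in tree:
            if c in right:
                p *= p_right[(left, right)]
            elif c in left:
                p *= 1.0 - p_right[(left, right)]
            # splits that do not contain c are off its path and are skipped
        probs[c] = p
    return probs

tree = [((0, 1), (2, 3)), ((0,), (1,)), ((2,), (3,))]
p_right = {
    ((0, 1), (2, 3)): 0.4,
    ((0,), (1,)): 0.25,
    ((2,), (3,)): 0.5,
}
probs = nd_class_probabilities(tree, [0, 1, 2, 3], p_right)
```

Because every split divides its parent's probability mass between two children, the class probabilities sum to 1 without any extra normalization.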

You already scored a set of trees and want to see where they sit in tree-space.

trees, coords = nds.embed_trees(classes)
# my_scores: your own precomputed scores, assumed to be aligned with trees
nds.spatial_autocorrelation(my_scores, coords)   # {"I": ..., "p_sim": ...}
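The "I" in the result is Moran's I, which measures whether nearby points in tree-space have similar scores. A minimal sketch of the statistic itself follows, taking the neighbor weights as given. How ndscape derives weights from coords is not stated here, and the permutation-based p_sim value is not computed in this sketch.

```python
def morans_i(values, weights):
    """Moran's I for a list of values and a dict {(i, j): w_ij} of pairwise weights."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_total = sum(weights.values())
    # weighted cross-products of deviations between neighboring points
    cross = sum(w * dev[i] * dev[j] for (i, j), w in weights.items())
    denom = sum(d * d for d in dev)
    return (n / w_total) * (cross / denom)

# Four points on a line, each a neighbor of the adjacent ones (symmetric weights).
values = [1.0, 2.0, 3.0, 4.0]
weights = {(0, 1): 1, (1, 0): 1, (1, 2): 1, (2, 1): 1, (2, 3): 1, (3, 2): 1}
I = morans_i(values, weights)
```

Positive values mean similar scores cluster together; values near zero suggest no spatial pattern.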

You just want the whole picture: fit, score, and embed every candidate tree.

rows = nds.analyze(X_train, y_train, classes, X_test=X_test, y_test=y_test, base="lr")
# [{"tree": ..., "accuracy": ..., "logloss": ..., "coord": array([...])}, ...]

You want a picture of that tree-space.

nds.plot(rows, metric="accuracy", path="tree_space.png")        # static PNG/PDF
nds.plot_interactive(rows, metric="accuracy", path="tree_space.html")  # pan/zoom/hover

Both functions color each point by metric and mark the best tree with a black x. They require the ndscape[plot] extra.

The embedding is slow to recompute and you want to reuse it.

rows = nds.analyze(X_train, y_train, classes, cache="embedding.joblib")

The MDS step in embed_trees/analyze is the slow part. Pass a .joblib path as cache=. The first call computes the embedding and saves it to that path; later calls with the same path load it instead of recomputing.
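The load-or-compute pattern behind such a cache can be sketched in a few lines. This version uses pickle from the standard library instead of joblib so that it runs without extra dependencies; load_or_compute and slow_embedding are hypothetical names, and the coordinates are placeholders.

```python
import os
import pickle
import tempfile

def load_or_compute(path, compute):
    """Load a cached result from path if it exists, else compute and save it."""
    if os.path.exists(path):
        with open(path, "rb") as fh:
            return pickle.load(fh)
    result = compute()
    with open(path, "wb") as fh:
        pickle.dump(result, fh)
    return result

calls = []

def slow_embedding():
    calls.append(1)  # record how often the expensive step actually runs
    return [(0.0, 1.0), (1.0, 0.0)]  # placeholder coordinates

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "embedding.pkl")
    first = load_or_compute(path, slow_embedding)   # computes and saves
    second = load_or_compute(path, slow_embedding)  # loads from disk
```

A cache keyed only on the path cannot tell that the inputs changed, so with this pattern it is safest to use a different file per set of classes.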

A tree is a list of (left, right) tuples of class labels, e.g. [((0, 1), (2, 3)), ((0,), (1,)), ((2,), (3,))]. Use nds.all_trees(classes) (exhaustive, for small C) or nds.sample_trees(classes, N) (for larger C) to generate candidates.
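The number of distinct nested dichotomies over C classes is (2C - 3)!!, which is why exhaustive enumeration only suits small C. The sketch below computes that count and checks that a list of (left, right) tuples forms a complete tree over the given classes. Both helper names are hypothetical and not part of ndscape.

```python
def count_trees(n_classes):
    """Number of nested dichotomies over n_classes classes: (2C - 3)!!."""
    total = 1
    for k in range(3, 2 * n_classes - 2, 2):
        total *= k
    return total

def is_valid_tree(tree, classes):
    """True if tree splits the full class set all the way down to single classes."""
    by_group = {}
    for left, right in tree:
        l, r = frozenset(left), frozenset(right)
        if not l or not r or l & r:
            return False  # sides must be non-empty and disjoint
        if (l | r) in by_group:
            return False  # the same group is split twice
        by_group[l | r] = (l, r)
    pending = [frozenset(classes)]
    used = 0
    while pending:
        group = pending.pop()
        if len(group) == 1:
            continue  # leaf
        if group not in by_group:
            return False  # a multi-class group has no split
        used += 1
        pending.extend(by_group[group])
    return used == len(tree)  # no stray splits left over

example = [((0, 1), (2, 3)), ((0,), (1,)), ((2,), (3,))]
ok = is_valid_tree(example, [0, 1, 2, 3])
n_four = count_trees(4)
n_ten = count_trees(10)
```

With 4 classes there are 15 trees, but with 10 classes the count already exceeds 34 million, so sampling becomes the practical choice.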

base accepts "lr", "decisiontree", or any unfitted scikit-learn estimator with fit/predict_proba.
