ndscape

Fit, score, and embed nested-dichotomy (ND) trees for multi-class classification.

A nested dichotomy reduces a C-class problem to a tree of binary splits (e.g. {0,1,2} vs {3,4}, then {0} vs {1,2}, ...). ndscape lets you fit one, score it, or place a whole population of candidate trees in a 2-D "tree-space" to see how a property (accuracy, variance, ...) varies across tree structures.
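
To make the reduction concrete, here is a minimal sketch of how a nested dichotomy is commonly turned into class probabilities: each class's probability is the product of the binary probabilities along its path from the root. The helper `class_probabilities` and the `p_left` values are hypothetical and chosen for illustration. This is not taken from ndscape's source, which may combine splits differently.

```python
from typing import Dict, Sequence, Tuple

Split = Tuple[Tuple[int, ...], Tuple[int, ...]]


def class_probabilities(tree: Sequence[Split], p_left: Sequence[float]) -> Dict[int, float]:
    """Combine per-split P(left) values into one probability per class.

    tree   : list of (left, right) tuples of class labels, root split first.
    p_left : p_left[i] is the binary model's probability that the sample
             belongs to the left side of tree[i] (illustrative numbers here).
    """
    # The root split mentions every class; start each class at probability 1.
    root_left, root_right = tree[0]
    probs = {c: 1.0 for c in root_left + root_right}
    # Scale each class by the probability of the side it falls on at every split.
    for (left, right), p in zip(tree, p_left):
        for c in left:
            probs[c] *= p
        for c in right:
            probs[c] *= 1.0 - p
    return probs


# {0,1,2} vs {3,4}, then {0} vs {1,2}, then {1} vs {2}, then {3} vs {4}
tree = [((0, 1, 2), (3, 4)), ((0,), (1, 2)), ((1,), (2,)), ((3,), (4,))]
probs = class_probabilities(tree, p_left=[0.8, 0.5, 0.25, 0.9])
print(probs)  # class 0 gets 0.8 * 0.5, class 3 gets (1 - 0.8) * 0.9, and so on
```

Because every split divides its parent's probability mass between two sides, the class probabilities sum to 1 whenever each binary model outputs a valid probability.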

Install

pip install ndscape
pip install "ndscape[spatial]"   # adds Moran's I support (esda, libpysal); quotes keep shells like zsh from expanding the brackets

Quickstart

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
import ndscape as nds

X, y = load_iris(return_X_y=True)
classes = sorted(set(y))
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

nd = nds.fit(X_train, y_train, classes=classes, base="lr")
nd.predict(X_test)
nd.score(X_test, y_test)   # {"accuracy": ..., "logloss": ...}

classes is the list of class labels in your y (sorted(set(y)) works for integer or string labels). nds.fit samples one ND tree automatically; pass tree=... to use a specific one (see "A tree is..." below).
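
For readers unfamiliar with the two metrics in the score dictionary, the sketch below computes accuracy and log loss from per-sample class probabilities using their standard definitions. The function `accuracy_and_logloss`, the clipping constant `eps`, and the toy probabilities are assumptions for illustration; ndscape's own scoring code may differ in details such as clipping or the logarithm base.

```python
import math
from typing import Dict, Hashable, Sequence


def accuracy_and_logloss(
    y_true: Sequence[Hashable],
    proba: Sequence[Dict[Hashable, float]],
    eps: float = 1e-15,
) -> Dict[str, float]:
    """proba[i] maps each class label to its predicted probability for sample i."""
    correct = 0
    total_nll = 0.0
    for label, p in zip(y_true, proba):
        # Accuracy: the predicted class is the one with the highest probability.
        predicted = max(p, key=p.get)
        correct += predicted == label
        # Log loss: negative natural log of the probability given to the true class,
        # clipped away from 0 and 1 so the logarithm stays finite.
        clipped = min(max(p[label], eps), 1.0 - eps)
        total_nll -= math.log(clipped)
    n = len(y_true)
    return {"accuracy": correct / n, "logloss": total_nll / n}


# Toy data: three samples, three classes (values invented for the example).
y_true = [0, 1, 2]
proba = [
    {0: 0.7, 1: 0.2, 2: 0.1},
    {0: 0.5, 1: 0.25, 2: 0.25},
    {0: 0.1, 1: 0.1, 2: 0.8},
]
scores = accuracy_and_logloss(y_true, proba)
print(scores)
```

Higher accuracy is better, while lower log loss is better, so the two numbers should not be ranked in the same direction when comparing trees.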

Use cases

You have a dataset and a binary classifier.

base can be the string "lr" or "decisiontree", or your own unfitted scikit-learn estimator; a fresh clone of it is fitted at every split.

from sklearn.svm import SVC

nd = nds.fit(X_train, y_train, classes=classes, base=SVC(probability=True, kernel="linear"))

You have a train/test split and want a score.

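# tree: any valid ND tree over classes; written by hand here for the three iris classes
tree = [((0,), (1, 2)), ((1,), (2,))]
nd = nds.ND(tree, classes).fit(X_train, y_train, base="lr")
nd.score(X_test, y_test)   # {"accuracy": ..., "logloss": ...}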

You already trained the per-split models yourself.

# models: either a list in the same order as tree, or a {(left, right): model} dict
nd = nds.ND.from_trained(tree, classes, models=[fitted_model_1, fitted_model_2, ...])
nd.predict_proba(X_test)

You already scored a set of trees and want to see where they sit in tree-space.

trees, coords = nds.embed_trees(classes)
nds.spatial_autocorrelation(my_scores, coords)   # {"I": ..., "p_sim": ...}
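
The "I" in that result is Moran's I, which measures whether trees that sit close together in the embedding also have similar scores. The sketch below computes it from scratch with a permutation-based pseudo p-value. The k-nearest-neighbour binary weights, the one-sided p-value, and the helper names (`knn_neighbours`, `morans_i`, `permutation_p`) are assumptions made for this example; ndscape delegates to esda and libpysal, whose weighting scheme and p_sim convention may differ.

```python
import math
import random
from typing import List, Sequence, Tuple


def knn_neighbours(coords: Sequence[Tuple[float, float]], k: int) -> List[List[int]]:
    """For each point, the indices of its k nearest neighbours (binary weights)."""
    neighbours = []
    for i, a in enumerate(coords):
        others = sorted(
            (j for j in range(len(coords)) if j != i),
            key=lambda j: math.dist(a, coords[j]),
        )
        neighbours.append(others[:k])
    return neighbours


def morans_i(values: Sequence[float], neighbours: Sequence[Sequence[int]]) -> float:
    """I = (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2, with w_ij in {0, 1}."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(len(nb) for nb in neighbours)  # total weight S0
    cross = sum(z[i] * z[j] for i, nb in enumerate(neighbours) for j in nb)
    return (n / s0) * cross / sum(d * d for d in z)


def permutation_p(values, neighbours, n_perm: int = 999, seed: int = 0) -> float:
    """One-sided pseudo p-value: share of shuffles with I at least as large as observed."""
    rng = random.Random(seed)
    observed = morans_i(values, neighbours)
    shuffled = list(values)
    at_least = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(shuffled, neighbours) >= observed:
            at_least += 1
    return (at_least + 1) / (n_perm + 1)


# Six points on a line; scores are invented for the example.
coords = [(float(x), 0.0) for x in range(6)]
nb = knn_neighbours(coords, k=2)
i_clustered = morans_i([1, 1, 1, 5, 5, 5], nb)    # similar scores sit together
i_alternating = morans_i([1, 5, 1, 5, 1, 5], nb)  # neighbours disagree
p = permutation_p([1, 1, 1, 5, 5, 5], nb)
print(i_clustered, i_alternating, p)
```

A clearly positive I suggests the scored property varies smoothly across tree-space, while a value near zero suggests neighbouring trees tell you little about each other.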

You just want the whole picture: fit, score, and embed every candidate tree.

rows = nds.analyze(X_train, y_train, classes, X_test=X_test, y_test=y_test, base="lr")
# [{"tree": ..., "accuracy": ..., "logloss": ..., "coord": array([...])}, ...]

A tree is a list of (left, right) tuples of class labels, e.g. [((0, 1), (2, 3)), ((0,), (1,)), ((2,), (3,))]. Use nds.all_trees(classes) (exhaustive, for small C) or nds.sample_trees(classes, N) (for larger C) to generate candidates.
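
The reason exhaustive enumeration only suits small C is that the number of distinct nested dichotomies grows as the double factorial (2C-3)!!: 3 trees for 3 classes, 15 for 4, 105 for 5. The sketch below enumerates them in the (left, right) list format described above. `enumerate_trees` is a hypothetical stand-in written for illustration, not the implementation of nds.all_trees, which may order splits or orient left and right differently.

```python
from itertools import combinations
from typing import Iterator, List, Tuple

Split = Tuple[Tuple, Tuple]


def enumerate_trees(classes: Tuple) -> Iterator[List[Split]]:
    """Yield every nested dichotomy over `classes` as a list of (left, right) splits.

    Splits are listed root first, then the left subtree, then the right subtree.
    Class labels are assumed to be distinct.
    """
    if len(classes) < 2:
        yield []  # a single class needs no further split
        return
    first, rest = classes[0], classes[1:]
    # Keep `first` on the left so {A} vs {B} and {B} vs {A} are not counted twice.
    # k stops before len(rest) so the right side is never empty.
    for k in range(len(rest)):
        for extra in combinations(rest, k):
            left = (first,) + extra
            right = tuple(c for c in rest if c not in extra)
            for left_splits in enumerate_trees(left):
                for right_splits in enumerate_trees(right):
                    yield [(left, right)] + left_splits + right_splits


counts = {c: sum(1 for _ in enumerate_trees(tuple(range(c)))) for c in (3, 4, 5)}
print(counts)  # number of trees for 3, 4, and 5 classes
```

At 10 classes the same formula gives 17!! = 34,459,425 trees, which is where sampling becomes the practical choice.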

base accepts "lr", "decisiontree", or any unfitted scikit-learn estimator with fit/predict_proba.
