SheShe: Smart High-dimensional Edge Segmentation & Hyperboundary Explorer

These details have not been verified by PyPI

Project links

Homepage

Development Status
- 3 - Alpha
Intended Audience
- Science/Research
License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3
Topic
- Scientific/Engineering :: Artificial Intelligence

Project description

SheShe

Smart High-dimensional Edge Segmentation & Hyperboundary Explorer

Edge segmentation and hyperboundary exploration based on local maxima of the class probability (classification) or the predicted value (regression).

Installation

Requires Python >=3.9. Working inside a virtual environment is recommended. Install the latest release from PyPI:

pip install sheshe

Base dependencies: numpy, pandas, scikit-learn>=1.1, matplotlib

For a development environment with tests:

pip install -e ".[dev]"
PYTHONPATH=src pytest -q

Quick API

from sheshe import ModalBoundaryClustering

# classification
clf = ModalBoundaryClustering(
    base_estimator=None,           # default LogisticRegression
    task="classification",         # "classification" | "regression"
    base_2d_rays=24,
    direction="center_out",        # "center_out" | "outside_in"
    scan_radius_factor=3.0,
    scan_steps=24,
    random_state=0
)

# regression (example)
reg = ModalBoundaryClustering(task="regression")

Methods

fit(X, y)
predict(X)
predict_proba(X) → classification: per-class probabilities; regression: normalized value [0,1]
interpretability_summary(feature_names=None) → DataFrame with:
- Type: "centroid" | "inflection_point"
- Distance: radius from the center to the inflection point
- Category: class (or "NA" in regression)
- slope: df/dt at the inflection point
- real_value / norm_value
- coord_0..coord_{d-1} or feature names
plot_pairs(X, y=None, max_pairs=None) → 2D plots for all pair combinations
save(filepath) → save the model using joblib
ModalBoundaryClustering.load(filepath) → load a saved instance

How does it work?

Train/use a base model from sklearn (classification with predict_proba or regression with predict).
Find local maxima via gradient ascent with barriers at the domain boundaries.
From the maximum, trace rays (directions) on the hypersphere:
- 2D: base_2d_rays rays (24 by default)
- 3D: ~26 directions (coverage by spherical caps using Fibonacci sampling)
- >3D: mixture of a few global directions + 2D/3D subspaces
Along each ray, scan radially and compute the first inflection point according to direction:
- center_out: from the center outward
- outside_in: from the outside toward the center
Also record the slope (df/dt) at that point.
Connect the inflection points to form the boundary of the region with high probability/value.
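The ray and scan steps above can be sketched in plain Python. This is an illustrative sketch, not SheShe's implementation: the helper names (ray_directions_2d, fibonacci_sphere, first_inflection) are hypothetical, and the inflection point is assumed to be located by the first sign change of the discrete second difference along the ray.

```python
import math

def ray_directions_2d(n_rays=24):
    """Evenly spaced unit vectors on the circle."""
    return [(math.cos(2 * math.pi * k / n_rays), math.sin(2 * math.pi * k / n_rays))
            for k in range(n_rays)]

def fibonacci_sphere(n_dirs=26):
    """Approximately uniform unit vectors on the 3D sphere (Fibonacci lattice)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    dirs = []
    for k in range(n_dirs):
        z = 1.0 - 2.0 * (k + 0.5) / n_dirs          # z spread evenly over (-1, 1)
        r = math.sqrt(1.0 - z * z)                   # radius of the circle at height z
        dirs.append((r * math.cos(golden_angle * k), r * math.sin(golden_angle * k), z))
    return dirs

def first_inflection(f, center, direction, max_radius, steps=24, mode="center_out"):
    """Return (radius, slope) at the first curvature sign change along the ray, or None."""
    h = max_radius / steps
    ts = [h * i for i in range(steps + 1)]
    vals = [f([c + t * d for c, d in zip(center, direction)]) for t in ts]
    # curv[j] is the second difference centred on ts[j + 1]
    curv = [vals[i - 1] - 2.0 * vals[i] + vals[i + 1] for i in range(1, steps)]
    pairs = range(len(curv) - 1)
    if mode == "outside_in":
        pairs = reversed(pairs)                      # scan from the outer end inward
    for j in pairs:
        if curv[j] * curv[j + 1] < 0.0:
            radius = 0.5 * (ts[j + 1] + ts[j + 2])   # midpoint of the bracketing interval
            slope = (vals[j + 2] - vals[j + 1]) / h  # finite-difference df/dt
            return radius, slope
    return None

def bump(x):
    """Toy surface: a Gaussian bump whose radial inflection lies at radius 1."""
    return math.exp(-0.5 * sum(v * v for v in x))

direction = ray_directions_2d(24)[0]
radius, slope = first_inflection(bump, [0.0, 0.0], direction, max_radius=3.0)
print(radius, slope)
```

On the toy bump both scan directions find the same point because there is only one inflection along each ray; on a surface with several, center_out and outside_in would generally return different radii.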

Examples

Classification — Iris

import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sheshe import ModalBoundaryClustering

iris = load_iris()
X, y = iris.data, iris.target

sh = ModalBoundaryClustering(
    base_estimator=LogisticRegression(max_iter=1000),
    task="classification",
    base_2d_rays=8,
    random_state=0,
).fit(X, y)

print(sh.interpretability_summary(iris.feature_names).head())
sh.plot_pairs(X, y, max_pairs=3)   # generate the plots
plt.show()

Classification with pre-trained model

import matplotlib.pyplot as plt
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sheshe import ModalBoundaryClustering

wine = load_wine()
X, y = wine.data, wine.target

# Train a model independently
base_model = RandomForestClassifier(n_estimators=200, random_state=0)
base_model.fit(X, y)

# Use SheShe with that pre-fitted model
sh = ModalBoundaryClustering(
    base_estimator=base_model,
    task="classification",
    base_2d_rays=8,
    random_state=0,
).fit(X, y)

sh.plot_pairs(X, y, max_pairs=2)
plt.show()

Regression — Diabetes

import matplotlib.pyplot as plt
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sheshe import ModalBoundaryClustering

diab = load_diabetes()
X, y = diab.data, diab.target

sh = ModalBoundaryClustering(
    base_estimator=GradientBoostingRegressor(random_state=0),
    task="regression",
    base_2d_rays=8,
    random_state=0,
).fit(X, y)

print(sh.interpretability_summary(diab.feature_names).head())
sh.plot_pairs(X, max_pairs=3)
plt.show()

Saving figures

from pathlib import Path
import matplotlib.pyplot as plt

# after calling ``sh.plot_pairs(...)``
out_dir = Path("images")
out_dir.mkdir(exist_ok=True)
for i, fig_num in enumerate(plt.get_fignums()):
    plt.figure(fig_num)
    plt.savefig(out_dir / f"pair_{i}.png")
    plt.close(fig_num)  # close each figure once it has been saved

Plotting with pandas DataFrames

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sheshe import ModalBoundaryClustering

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)

sh = ModalBoundaryClustering().fit(df, iris.target)
sh.plot_pairs(df, iris.target, max_pairs=2)  # uses column names on the axes
plt.show()

Visualizing interpretability summary

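import matplotlib.pyplot as plt

# reuses ``sh`` and ``df`` from the previous example
summary = sh.interpretability_summary(df.columns)
centroids = summary[summary["Type"] == "centroid"]
# with feature names supplied, the coordinate columns carry those names
# instead of coord_0..coord_{d-1}
x_name, y_name = df.columns[:2]
plt.scatter(centroids[x_name], centroids[y_name], c=centroids["Category"])
plt.xlabel(x_name)
plt.ylabel(y_name)
plt.show()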

Save and load model

from pathlib import Path
from sklearn.datasets import load_iris
from sheshe import ModalBoundaryClustering

iris = load_iris()
X, y = iris.data, iris.target

sh = ModalBoundaryClustering().fit(X, y)
path = Path("sheshe_model.joblib")
sh.save(path)
sh2 = ModalBoundaryClustering.load(path)
print((sh.predict(X) == sh2.predict(X)).all())

For more complete examples, see the examples/ folder.

SubspaceScout

SubspaceScout helps discover informative feature subspaces (pairs, trios, ...) before running SheShe. It can work purely with mutual information or leverage optional models like LightGBM+SHAP or EBM to rank feature interactions.

from sheshe import SubspaceScout

scout = SubspaceScout(
    # model_method='lightgbm',    # default uses MI; LightGBM and SHAP are optional
    max_order=4,                # explore pairs, trios and quartets
    top_m=50,                   # limit to top 50 informative features
    base_pairs_limit=12,        # seed pairs for orders >=3
    beam_width=10,              # combos kept per layer
    extend_candidate_pool=16,   # random candidate features per parent
    branch_per_parent=4,        # extensions per parent
    marginal_gain_min=1e-3,     # minimum gain to accept
    max_eval_per_order=150,     # cap MI evaluations per order
    sample_size=4096,           # subsample size
    time_budget_s=None,         # e.g., 15.0 for 15 seconds
    task='classification',
    random_state=0,
)
subspaces = scout.fit(X, y)
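
To illustrate why scoring feature combinations jointly can matter, here is a minimal sketch of mutual-information ranking on discrete toy data. It is not SubspaceScout's implementation: mutual_information and rank_subspaces are hypothetical helpers, and the XOR-style data is constructed so that no single feature is informative while one pair determines the target.

```python
import math
from collections import Counter
from itertools import combinations

def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences of equal length."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())

def rank_subspaces(columns, y, order=2):
    """Score every feature combination of the given order by MI between its joint value and y."""
    scores = []
    for combo in combinations(range(len(columns)), order):
        joint_feature = list(zip(*(columns[j] for j in combo)))
        scores.append((combo, mutual_information(joint_feature, y)))
    return sorted(scores, key=lambda s: s[1], reverse=True)

# Toy data: y = a XOR b, while c is unrelated to y
a = [0, 0, 1, 1] * 24
b = [0, 1, 0, 1] * 24
c = [0, 0, 0, 0, 1, 1, 1, 1] * 12
y = [ai ^ bi for ai, bi in zip(a, b)]

ranked = rank_subspaces([a, b, c], y, order=2)
for combo, score in ranked:
    print(combo, round(score, 3))
```

In this toy case each feature alone has zero mutual information with y, so a purely univariate filter would miss the (a, b) pair; ranking combinations recovers it.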

Experiments and benchmark

The experiments comparing against unsupervised algorithms are located in the experiments/ folder. The script compare_unsupervised.py evaluates five different datasets, explores parameters of SheShe, KMeans and DBSCAN, and stores four metrics (ARI, homogeneity, completeness, v_measure) along with the execution time (runtime_sec).

python experiments/compare_unsupervised.py --runs 5
cat benchmark/unsupervised_results_summary.csv | head

Results are generated inside benchmark/ (per-run values and means in *_summary.csv).

For the manuscript we provide additional scripts in paper_experiments.py which perform supervised comparisons, ablation studies over base_2d_rays and direction, and sensitivity analyses w.r.t. dimensionality and Gaussian noise. Executing the script generates tables with all runs and a summary (*_summary.csv), as well as figures (*.png) under benchmark/:

python experiments/paper_experiments.py --runs 5

Key parameters

base_2d_rays → controls angular resolution in 2D (24 by default). 3D scales to ~26; d>3 uses subspaces.
direction → "center_out" | "outside_in" to locate the inflection point.
scan_radius_factor, scan_steps → size and resolution of the radial scan.
grad_* → hyperparameters of gradient ascent (rate, iterations, tolerances).
max_subspaces → max number of subspaces considered when d>3.
density_alpha / density_k → optional density penalty computed with an HNSW k‑NN search (via hnswlib) to keep centers inside the data cloud. The normalized value is multiplied by (density(x))**density_alpha; set density_alpha=0 to disable.
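
The density penalty can be illustrated with a small sketch. Only the multiplication by density(x)**density_alpha comes from the description above; the density estimate itself is an assumption (here 1 / (1 + mean distance to the k nearest points)), and a brute-force neighbour search stands in for the HNSW index. The helper names knn_density and penalized_value are hypothetical.

```python
import math

def knn_density(x, data, k=3):
    """Assumed density proxy in (0, 1]: 1 / (1 + mean distance to the k nearest points)."""
    dists = sorted(math.dist(x, p) for p in data)   # brute force instead of HNSW
    return 1.0 / (1.0 + sum(dists[:k]) / k)

def penalized_value(norm_value, x, data, density_alpha=1.0, density_k=3):
    """Multiply the normalized value by density(x) ** density_alpha."""
    return norm_value * knn_density(x, data, density_k) ** density_alpha

data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
inside, far = (0.05, 0.05), (5.0, 5.0)

print(penalized_value(0.8, inside, data))                  # mildly penalized
print(penalized_value(0.8, far, data))                     # strongly penalized
print(penalized_value(0.8, far, data, density_alpha=0.0))  # penalty disabled
```

With a proxy of this shape, a candidate center far from the data cloud loses most of its value, while density_alpha=0 leaves the value untouched.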

Performance tips

Defaults favour speed: base_2d_rays=24, scan_steps=24 and n_max_seeds=2.
The heuristic auto_rays_by_dim=True (default) reduces rays for high dimensional datasets:
- 25–64 features → base_2d_rays capped at 16.
- 65+ features → base_2d_rays capped at 12.
For 30D problems such as Breast Cancer this matches the recommended base_2d_rays=16.
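
The caps listed above can be written as a small function. This is a sketch of the rule as described here, not the library's code; the name effective_rays is hypothetical.

```python
def effective_rays(base_2d_rays, n_features, auto_rays_by_dim=True):
    """Apply the dimension-based cap on the number of 2D rays."""
    if not auto_rays_by_dim:
        return base_2d_rays
    if n_features >= 65:
        return min(base_2d_rays, 12)
    if n_features >= 25:
        return min(base_2d_rays, 16)
    return base_2d_rays

for d in (4, 30, 100):
    print(d, effective_rays(24, d))
```

Because the rule is a cap, a value that is already below the threshold (for example base_2d_rays=8) is left unchanged.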

Limitations

Depends on the surface produced by the base model (it can be rough for random forests).
In high dimension, the boundary is an approximation (subspaces).
Finds local maxima (does not guarantee the global one), mitigated with multiple seeds.
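
The local-maximum limitation and the multi-seed mitigation can be seen on a toy function with two peaks. The sketch below is illustrative only: gradient_ascent is a hypothetical helper using a numerical gradient, and the domain barrier is assumed to be a simple clip to the bounds, which may differ from what SheShe does internally.

```python
import math

def gradient_ascent(f, x0, lo, hi, lr=0.1, n_iter=500, eps=1e-5, tol=1e-8):
    """Gradient ascent with a central-difference gradient, clipped to the box [lo, hi]."""
    x = list(x0)
    for _ in range(n_iter):
        grad = []
        for i in range(len(x)):
            xp, xm = x[:], x[:]
            xp[i] += eps
            xm[i] -= eps
            grad.append((f(xp) - f(xm)) / (2.0 * eps))
        # take a step, then clip to the domain bounds (the assumed "barrier")
        new_x = [min(max(xi + lr * g, l), h) for xi, g, l, h in zip(x, grad, lo, hi)]
        converged = max(abs(a - b) for a, b in zip(new_x, x)) < tol
        x = new_x
        if converged:
            break
    return x

def two_peaks(x):
    """Local maximum near x = -2 (height about 1), global maximum near x = 2 (height about 2)."""
    return math.exp(-(x[0] + 2.0) ** 2) + 2.0 * math.exp(-(x[0] - 2.0) ** 2)

lo, hi = [-4.0], [4.0]
single = gradient_ascent(two_peaks, [-3.0], lo, hi)        # stuck at the local peak
seeds = [[-3.0], [-1.0], [1.0], [3.0]]
candidates = [gradient_ascent(two_peaks, s, lo, hi) for s in seeds]
best = max(candidates, key=two_peaks)                      # best maximum over all seeds
print(single, best)
```

A single seed on the left converges to the lower peak; keeping the best result over several seeds recovers the higher one here, although no finite set of seeds guarantees the global maximum in general.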

Contribute

Improvements are welcome. To propose changes:

Fork the repository and create a descriptive branch.

Install development dependencies and run the tests:

pip install -e ".[dev]"
PYTHONPATH=src pytest -q

Submit a pull request with a clear description of the change.

License

MIT

Release history

0.1.14 (Sep 6, 2025)
0.1.13 (Sep 5, 2025)
0.1.12 (Sep 5, 2025)
0.1.11 (Sep 4, 2025)
0.1.10 (Sep 2, 2025)
0.1.9 (Aug 31, 2025)
0.1.8 (Aug 30, 2025)
0.1.7 (Aug 29, 2025)
0.1.6 (Aug 26, 2025)
0.1.5 (Aug 26, 2025)
0.1.4 (Aug 19, 2025)
0.1.3 (Aug 17, 2025)
0.1.2 (Aug 15, 2025), this version
0.1.1 (Aug 15, 2025)
0.1.0 (Aug 12, 2025)

Download files

Source Distribution

sheshe-0.1.2.tar.gz (34.4 kB)

Uploaded Aug 15, 2025 Source

Built Distribution

sheshe-0.1.2-py3-none-any.whl (26.6 kB)

Uploaded Aug 15, 2025 Python 3

File details

Details for the file sheshe-0.1.2.tar.gz.

File metadata

Download URL: sheshe-0.1.2.tar.gz
Upload date: Aug 15, 2025
Size: 34.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.2

File hashes

Hashes for sheshe-0.1.2.tar.gz
Algorithm	Hash digest
SHA256	`acf5b97276d7ba44a1a07caaeb9d7e18c25010b8f55710458688c3de325ff4ec`
MD5	`fd6fc66a9cf6637f0a209cb420b60461`
BLAKE2b-256	`6e80585936cdbc8145aded6cd6529103720b2bb1e2f739631076200cebf2f77a`

File details

Details for the file sheshe-0.1.2-py3-none-any.whl.

File metadata

Download URL: sheshe-0.1.2-py3-none-any.whl
Upload date: Aug 15, 2025
Size: 26.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.2

File hashes

Hashes for sheshe-0.1.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`47c34f7ae1d1f80b71d293a1067ca6704a663fa0e2eb017088b6e565e8c450ce`
MD5	`35886b2272aaff5d125604e94a3b0dbf`
BLAKE2b-256	`ec7b4b342b157e61d2789c23b3b98170fe9c1f7e7b940f5ccfb0a7ef78a7f254`
