Shapley value transformations for behavioral data analysis

shapley_behaviors

Shapley value transformations for explainable behavioral data analysis.

Traditional clustering asks "Which samples are similar?" but not "Why do they cluster together?" Shapley behavioral transformations answer the "why" by decomposing statistical properties (variance, skewness, kurtosis, entropy) into individual sample contributions. Samples that cluster in behavioral space play the same statistical role in the dataset, which yields mechanistic and actionable insight into why they group.
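To make the decomposition concrete, here is a minimal sketch of a permutation-sampling Shapley estimator for a single feature, with population variance as the value function. The helper name `shapley_variance_1d`, the convention that coalitions of fewer than two samples have zero variance, and the plain (non-antithetic) sampling are illustrative assumptions, not the package's implementation.

```python
import numpy as np


def shapley_variance_1d(x, n_permutations=100, seed=None):
    """Monte Carlo Shapley values for each sample's share of the population
    variance of one feature (hypothetical helper, not the package API).

    Value function: v(S) = population variance of the samples in S, with
    v(S) = 0 when S has fewer than two samples (assumed convention).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    rng = np.random.default_rng(seed)
    phi = np.zeros(n)
    for _ in range(n_permutations):
        order = rng.permutation(n)
        s1 = 0.0    # running sum of the coalition's values
        s2 = 0.0    # running sum of squared values
        prev = 0.0  # v(coalition before the next sample joins)
        for k, i in enumerate(order, start=1):
            s1 += x[i]
            s2 += x[i] ** 2
            current = s2 / k - (s1 / k) ** 2 if k > 1 else 0.0
            phi[i] += current - prev  # marginal contribution of sample i
            prev = current
    return phi / n_permutations
```

Because each permutation's marginal contributions telescope to v(all samples), the estimates sum exactly to the feature's variance (the efficiency property). The running-sum update is O(1) per step but can lose precision for features with large means; Welford's algorithm is a more stable alternative.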

Implementation of the methodology from:

Liu, T., and Barnard, A. S. (2025). Understanding interpretable patterns of Shapley behaviours in materials data. Machine Learning: Engineering, 1, 015004. https://doi.org/10.1088/3049-4761/adaaf6

Features

- Decompose datasets into variance, skewness, kurtosis, and entropy behavioral spaces
- Parallel computation via joblib for large datasets
- Outlier detection directly in behavioral space
- Bundled interactive explorer scripts for Jupyter-based analysis with PCA plots, clustering statistics, and region-of-interest annotation
- Antithetic sampling for variance reduction in Monte Carlo estimation
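Antithetic sampling pairs each random permutation with a negatively correlated partner so that their estimation errors partly cancel. The sketch below assumes one common scheme, pairing each permutation with its reversal; the generator name `antithetic_permutations` is hypothetical, and the package's exact scheme is not documented here.

```python
import numpy as np


def antithetic_permutations(n, n_pairs, seed=None):
    """Yield 2 * n_pairs permutations of range(n): each random permutation
    is followed by its reversal (one common antithetic scheme; assumed here)."""
    rng = np.random.default_rng(seed)
    for _ in range(n_pairs):
        perm = rng.permutation(n)
        yield perm
        yield perm[::-1]  # sample at position p moves to position n - 1 - p


# Within each pair, a sample that joins the coalition early in one
# permutation joins late in the other, so its two positions sum to n - 1.
perms = list(antithetic_permutations(6, n_pairs=3, seed=0))
```

Feeding these permutations into a permutation-sampling estimator, in place of independent draws, is what turns it into an antithetic estimator.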

Installation

```bash
pip install shapley_behaviors
```

Quick Start

```python
import numpy as np
from shapley_behaviors import ShapleyBehaviors

X = np.random.randn(500, 20)  # (n_samples, n_features)

sb = ShapleyBehaviors(n_permutations=100, n_jobs=-1, random_state=42)

# Transform to a single behavioral space
Phi_variance = sb.transform(X, value_function='variance')

# Or compute all four spaces at once
behavioral_spaces = sb.transform_multiple(X)
# keys: 'variance', 'skewness', 'kurtosis', 'entropy'
```

Outlier Detection

```python
from shapley_behaviors import identify_outliers

outlier_indices, outlier_scores = identify_outliers(Phi_variance, threshold=2.5)
print(f"Detected {len(outlier_indices)} outliers")
```
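The README does not spell out how `identify_outliers` scores samples. As a hedged illustration of a `'zscore'`-style method, this sketch assumes each sample is scored by the Euclidean norm of its row in the behavioral matrix, standardized across samples, and flagged when the score exceeds `threshold`; `zscore_outliers` is a hypothetical name, not the package function.

```python
import numpy as np


def zscore_outliers(phi, threshold=2.5):
    """Hypothetical sketch: flag samples whose behavioral-space magnitude
    lies more than `threshold` standard deviations above the mean."""
    phi = np.asarray(phi, dtype=float)
    magnitude = np.linalg.norm(phi, axis=1)  # one score per sample (row)
    std = magnitude.std()
    if std == 0:
        scores = np.zeros_like(magnitude)  # all samples identical: no outliers
    else:
        scores = (magnitude - magnitude.mean()) / std
    indices = np.flatnonzero(scores > threshold)
    return indices, scores
```

Using the row norm treats large positive and large negative contributions alike; a space-specific rule (for example, only large positive kurtosis contributions) would be an equally plausible design.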

Understanding Behavioral Spaces

Each space answers a different question about the role of each sample in the dataset:

| Space | Positive values | Negative values | Use case |
|---|---|---|---|
| Variance | Stretchers: widen the distribution | Stabilizers: typical samples near the mean | Quality control, process instability |
| Skewness | Pull the distribution above the mean | Pull the distribution below the mean | Biased synthesis, directional drift |
| Kurtosis | Tail samples: rare extreme events | Core samples: predictable, well-behaved | Anomaly detection, reliability analysis |
| Entropy | High-information: rare, unique combinations | Low-information: common, redundant | Dataset curation, diversity quantification |
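Each behavioral space is the same Shapley decomposition applied to a different value function v(S), evaluated on a coalition S of samples. A sketch of plausible value functions for one feature follows; population moments, excess kurtosis, histogram-based Shannon entropy over fixed bin edges, and returning 0 for coalitions too small to define a moment are all assumptions, and the paper or package may define them differently.

```python
import numpy as np


def variance_value(x):
    """Population variance; 0 for coalitions of fewer than two samples."""
    x = np.asarray(x, dtype=float)
    return float(np.var(x)) if x.size > 1 else 0.0


def skewness_value(x):
    """Population skewness m3 / m2**1.5 (assumed 0 for tiny or constant sets)."""
    x = np.asarray(x, dtype=float)
    if x.size < 3:
        return 0.0
    d = x - x.mean()
    m2 = np.mean(d**2)
    return float(np.mean(d**3) / m2**1.5) if m2 > 0 else 0.0


def kurtosis_value(x):
    """Excess kurtosis m4 / m2**2 - 3 (assumed 0 for tiny or constant sets)."""
    x = np.asarray(x, dtype=float)
    if x.size < 4:
        return 0.0
    d = x - x.mean()
    m2 = np.mean(d**2)
    return float(np.mean(d**4) / m2**2 - 3.0) if m2 > 0 else 0.0


def entropy_value(x, bin_edges):
    """Shannon entropy (nats) of a histogram over fixed, shared bin edges."""
    x = np.asarray(x, dtype=float)
    if x.size == 0:
        return 0.0
    counts, _ = np.histogram(x, bins=bin_edges)
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log(p)))
```

Under `variance_value`, a positive Shapley value means the sample raises the coalition variance on average, which is the "Stretchers" reading in the table. Fixing `bin_edges` from the full dataset keeps entropy comparable across coalitions.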

Hopkins Statistic

The Hopkins statistic H measures clustering tendency in behavioral space:

| H value | Interpretation |
|---|---|
| > 0.7 | Strong clustering: samples group by behavior |
| ≈ 0.5 | Random distribution: no natural grouping |
| < 0.3 | Regular (uniform) distribution |
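For reference, a minimal NumPy sketch of the Hopkins statistic under the convention in the table (H near 1 for clustered data, near 0.5 for random data). The function name, the 10% sampling fraction, and the unpowered distance sums are assumptions; some formulations raise distances to the power d (the dimension), and the explorer script may use a different variant.

```python
import numpy as np


def hopkins(X, sample_frac=0.1, seed=None):
    """Hopkins statistic H = sum(u) / (sum(u) + sum(w)) (assumed variant)."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    m = max(1, int(sample_frac * n))
    rng = np.random.default_rng(seed)

    # u: distances from uniform random points in the data's bounding box
    # to their nearest real sample.
    lo, hi = X.min(axis=0), X.max(axis=0)
    probes = rng.uniform(lo, hi, size=(m, d))
    u = np.linalg.norm(probes[:, None, :] - X[None, :, :], axis=2).min(axis=1)

    # w: distances from m randomly chosen real samples to their nearest
    # *other* sample.
    idx = rng.choice(n, size=m, replace=False)
    dist = np.linalg.norm(X[idx][:, None, :] - X[None, :, :], axis=2)
    dist[np.arange(m), idx] = np.inf  # exclude each sample's distance to itself
    w = dist.min(axis=1)

    return float(u.sum() / (u.sum() + w.sum()))
```

The brute-force distance matrices cost O(m·n) memory; for large behavioral matrices a KD-tree such as `scipy.spatial.cKDTree` is the usual replacement.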

Convenience Functions

```python
from shapley_behaviors import (
    compute_shapley_variance,
    compute_shapley_skewness,
    compute_shapley_kurtosis,
    compute_shapley_entropy,
)

Phi = compute_shapley_variance(X, n_permutations=100, n_jobs=-1, random_state=42)
```

Explorer Scripts

The package bundles two standalone, Jupyter-compatible scripts: a behavioral space explorer for the full dataset and a behavioral region explorer for selected PCA regions. Copy them to your working directory:

```python
from shapley_behaviors import copy_scripts

copy_scripts(".")  # all scripts
copy_scripts("./analysis", scripts=["behavioral_space_explorer"])  # one script
```

Behavioral Space Explorer

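Full-dataset exploration: PCA plots, Hopkins statistics, and outlier detection. Define the configuration variables in a notebook cell, then run the script with `%run -i`, which executes it in the notebook's namespace so it can read them: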

```python
SEED = 42
N_PERMUTATIONS = 1000      # 100 for quick tests, 1000 for publication
N_JOBS = -1

DATASET_NAME = "mydata"
DATA_FILE = "mydata.csv"
ID_COLUMN = "sample_id"
DROP_COLUMNS = ["col_a", "col_b"]
LABEL_COLUMNS = ["target1", "target2", "category"]
OUTPUT_DIR = "behavioral_exploration"
SELECTED_FEATURES = ["feature1", "feature2"]  # optional highlight

%run -i behavioral_space_explorer.py
```

Outputs:

| File | Contents |
|---|---|
| `{name}_behavioral_spaces.npy` | All four behavioral transformations |
| `{name}_behave_{space}_{label}.png` | PCA plots colored by each label |
| `{name}_hopkins_statistics.csv` | Clustering tendency metrics |
| `{name}_clustering_statistics.csv` | Variance explained, pairwise distances |
| `{name}_outliers_{space}.csv` | Outlier samples per space |
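The PCA plots and the "variance explained" and "pairwise distances" statistics can be approximated with a plain SVD. This sketch uses hypothetical helper names and assumes the behavioral matrix is centred but not scaled before projection, with mean pairwise Euclidean distance measured in PC space; the script's exact preprocessing and statistics may differ.

```python
import numpy as np


def pca_2d(phi):
    """Project samples onto the first two principal components and return
    the fraction of total variance each explains (SVD of centred data)."""
    phi = np.asarray(phi, dtype=float)
    centred = phi - phi.mean(axis=0)
    _, s, vt = np.linalg.svd(centred, full_matrices=False)
    coords = centred @ vt[:2].T          # (n_samples, 2) PC1/PC2 scores
    explained = s**2 / np.sum(s**2)      # variance ratio per component
    return coords, explained[:2]


def mean_pairwise_distance(coords):
    """Mean Euclidean distance over all ordered pairs of distinct samples."""
    coords = np.asarray(coords, dtype=float)
    n = len(coords)
    if n < 2:
        return 0.0
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    return float(d.sum() / (n * (n - 1)))
```

Principal-component signs are arbitrary, so PC1/PC2 coordinates can flip between runs or tools; check the orientation before reusing fixed coordinate ranges.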

Behavioral Region Explorer

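Targeted analysis of user-defined regions of the PCA plots. `BEHAVIORAL_SPACES_FILE` points at the `.npy` file written by the Behavioral Space Explorer: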

```python
BEHAVIORAL_SPACES_FILE = "behavioral_exploration/mydata_behavioral_spaces.npy"
PLOT_MODE = "combined"  # or "separate"

USER_REGIONS = {
    "high_variance_cluster": {
        "space": "variance",
        "pc1_range": (0.3, 0.6),
        "pc2_range": (-0.2, 0.2),
        "description": "High variance contributors",
        "color": "red",
    },
}

%run -i behavioral_region_explorer.py
```
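Conceptually, each entry in `USER_REGIONS` selects the samples whose PCA coordinates fall inside a rectangle. A sketch assuming inclusive bounds and precomputed (PC1, PC2) coordinates; `select_region` and the toy coordinates are hypothetical.

```python
import numpy as np


def select_region(coords, pc1_range, pc2_range):
    """Indices of samples whose (PC1, PC2) lie in the closed rectangle
    pc1_range x pc2_range (inclusive bounds assumed)."""
    coords = np.asarray(coords, dtype=float)
    pc1, pc2 = coords[:, 0], coords[:, 1]
    mask = (
        (pc1 >= pc1_range[0]) & (pc1 <= pc1_range[1])
        & (pc2 >= pc2_range[0]) & (pc2 <= pc2_range[1])
    )
    return np.flatnonzero(mask)


# Toy example using the same keys as USER_REGIONS.
user_regions = {
    "high_variance_cluster": {"pc1_range": (0.3, 0.6), "pc2_range": (-0.2, 0.2)},
}
coords = np.array([[0.4, 0.0], [0.7, 0.0], [0.5, 0.3], [0.3, -0.2]])
members = {
    name: select_region(coords, spec["pc1_range"], spec["pc2_range"])
    for name, spec in user_regions.items()
}
```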

Parameters

| Parameter | Values | Notes |
|---|---|---|
| `n_permutations` | 50–100 (explore), 200–500 (standard), 1000+ (publication) | Higher = more accurate, slower |
| `n_jobs` | `-1` (all cores), `1` (debug), `N` (N cores) | Parallelises over features |
| `random_state` | any int | Set for reproducibility; uses antithetic sampling |

Runtime Estimates

| Dataset size | n_permutations | Estimated time |
|---|---|---|
| 500 samples | 100 | 2–5 min |
| 500 samples | 1000 | 15–30 min |
| 4000 samples | 100 | 20–30 min |
| 4000 samples | 1000 | 2–3 hours |

API Reference

```python
import numpy as np
from shapley_behaviors import ShapleyBehaviors, identify_outliers

X = np.random.randn(500, 20)  # (n_samples, n_features)

# Main class
sb = ShapleyBehaviors(n_permutations=100, n_jobs=-1, random_state=42)
Phi = sb.transform(X, value_function='variance', verbose=True)
spaces = sb.transform_multiple(X, value_functions=['variance', 'skewness', 'kurtosis', 'entropy'])

# Outlier detection
outlier_indices, outlier_scores = identify_outliers(Phi, threshold=3.0, method='zscore')
```

Troubleshooting

| Problem | Solution |
|---|---|
| `ImportError` | `pip install shapley_behaviors` |
| Long runtime | Reduce `n_permutations` to 100 for testing |
| Memory error | Reduce `n_jobs` or process data in batches |
| High additivity error warning | Increase `n_permutations` |
| H ≈ 0.5 (no clustering) | Data may lack natural behavioral groupings |
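The additivity (efficiency) property says that, for each feature, the Shapley values summed over all samples should equal v(all samples) minus v(empty set), which for the variance space is the feature's variance. A sketch of such a check, assuming v(empty set) = 0, population variance, and a relative-error metric; how the package computes and thresholds its warning is not documented here, and `additivity_error` is a hypothetical name.

```python
import numpy as np


def additivity_error(phi, X):
    """Per-feature relative gap between column sums of a variance-space
    Shapley matrix (n_samples, n_features) and each feature's variance."""
    phi = np.asarray(phi, dtype=float)
    X = np.asarray(X, dtype=float)
    target = X.var(axis=0)  # v(all samples) - v(empty set), with v(empty) = 0
    gap = np.abs(phi.sum(axis=0) - target)
    return gap / np.maximum(np.abs(target), 1e-12)  # guard constant features
```

Estimators that sample each sample's marginal contributions independently do not satisfy this identity exactly, which is why more permutations shrink the error.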

Citation

```bibtex
@article{liu2025shapley,
  author  = {Liu, Tommy and Barnard, Amanda S.},
  title   = {Understanding interpretable patterns of {Shapley} behaviours in materials data},
  journal = {Machine Learning: Engineering},
  volume  = {1},
  pages   = {015004},
  year    = {2025},
  doi     = {10.1088/3049-4761/adaaf6}
}
```

Links

Release history

0.1.2 (this version): May 5, 2026
0.1.1: Feb 9, 2026

Download files

Source Distribution

shapley_behaviors-0.1.2.tar.gz (31.0 kB)

Uploaded May 5, 2026 Source

Built Distribution

shapley_behaviors-0.1.2-py3-none-any.whl (29.3 kB)

Uploaded May 5, 2026 Python 3

File details

Details for the file shapley_behaviors-0.1.2.tar.gz.

File metadata

Download URL: shapley_behaviors-0.1.2.tar.gz
Upload date: May 5, 2026
Size: 31.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.3

File hashes

Hashes for shapley_behaviors-0.1.2.tar.gz
Algorithm	Hash digest
SHA256	`acd1449e2fa7e54502556d4f93f02df799ea163aff47286e1e56a058e8117424`
MD5	`3aa5d7231361a880beb3d4092b028d99`
BLAKE2b-256	`a87ac41d63df1624c6ca9704fdc21897c4866a202e7c180ff40504e415436759`


File details

Details for the file shapley_behaviors-0.1.2-py3-none-any.whl.

File metadata

Download URL: shapley_behaviors-0.1.2-py3-none-any.whl
Upload date: May 5, 2026
Size: 29.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.3

File hashes

Hashes for shapley_behaviors-0.1.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`9699d5108c33035879a252a12df6df54ccea5e849c514b2d26bf3ee00a47193a`
MD5	`81545bc061f1a070785e12838778f970`
BLAKE2b-256	`635f5a21a40bcb983026987ffd148f98754bd6d2be61f6e3811a534ce4c16d01`

