

shapley_behaviors

Shapley value transformations for explainable behavioral data analysis.

Overview

Traditional clustering asks "which samples are similar?" but not "why do they cluster together?"

Shapley behavioral transformations answer the "why" by decomposing statistical properties (variance, skewness, kurtosis, entropy) into individual sample contributions. Samples that cluster in behavioral space share the same statistical role in the dataset, providing mechanistic and actionable insights.
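
The decomposition itself is the classic permutation (Monte Carlo) estimate of Shapley values, with subsets of samples as coalitions and a statistic such as variance as the value function. A minimal numpy sketch for a 1-D variance decomposition (an illustration of the idea, not the package's implementation):

```python
import numpy as np

def shapley_variance(x, n_permutations=200, seed=0):
    """Monte Carlo Shapley decomposition of the variance of a 1-D sample.

    The value function is v(S) = variance of the subset S (0 when |S| < 2);
    each sample's Shapley value is its average marginal contribution to v
    over random orderings of the samples.
    """
    rng = np.random.default_rng(seed)
    n = len(x)
    phi = np.zeros(n)
    for _ in range(n_permutations):
        order = rng.permutation(n)
        prev = 0.0
        for k in range(n):
            v = x[order[:k + 1]].var() if k + 1 >= 2 else 0.0
            phi[order[k]] += v - prev   # marginal contribution of this sample
            prev = v
    return phi / n_permutations

x = np.array([0.1, -0.2, 0.05, 3.0, -0.1])  # one obvious extreme sample at 3.0
phi = shapley_variance(x)
# Efficiency property: the per-sample contributions sum to the total variance,
# and the extreme sample receives the largest positive share.
```

Samples with large positive values widen the spread ("stretchers"), while negative values mark stabilizers, matching the interpretation of the variance space described under "Understanding Behavioral Spaces".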

This package implements the methodology from:

Liu, T., and Barnard, A. S. (2025). Understanding interpretable patterns of Shapley behaviours in materials data. Machine Learning: Engineering, 1, 015004. https://doi.org/10.1088/3049-4761/adaaf6

Installation

pip install shapley_behaviors

Quick Start

import numpy as np
from shapley_behaviors import ShapleyBehaviors

# Load your data (n_samples, n_features)
X = np.random.randn(500, 20)

# Transform to behavioral spaces
sb = ShapleyBehaviors(n_permutations=100, n_jobs=-1, random_state=42)

Phi_variance = sb.transform(X, value_function='variance')
Phi_skewness = sb.transform(X, value_function='skewness')
Phi_kurtosis = sb.transform(X, value_function='kurtosis')
Phi_entropy = sb.transform(X, value_function='entropy')

# Or compute all at once
behavioral_spaces = sb.transform_multiple(X)

Outlier Detection

from shapley_behaviors import identify_outliers

outlier_indices, outlier_scores = identify_outliers(Phi_kurtosis, threshold=2.5)
print(f"Detected {len(outlier_indices)} outliers")

Getting the Explorer Scripts

The package includes standalone explorer scripts for comprehensive analysis with visualizations, statistics, and outlier detection. Copy them to your working directory:

from shapley_behaviors import copy_scripts

# Copy all scripts to current directory
copy_scripts(".")

# Or copy to a specific directory
copy_scripts("./analysis")

# Or copy only one script
copy_scripts(".", scripts=["behavioral_space_explorer"])

Behavioral Space Explorer

Configure and run in Jupyter:

# Configuration
SEED = 42
N_PERMUTATIONS = 1000      # 100 for quick tests, 1000 for publication
N_JOBS = -1                # -1 uses all CPU cores

DATASET_NAME = "mydata"
DATA_FILE = "mydata.csv"
ID_COLUMN = "sample_id"
DROP_COLUMNS = ["col_a", "col_b"]
LABEL_COLUMNS = ["target1", "target2", "category"]
OUTPUT_DIR = "behavioral_exploration"

# Optional: select specific features to highlight
SELECTED_FEATURES = ["feature1", "feature2", "feature3"]

# Run the explorer
%run -i behavioral_space_explorer.py

The explorer generates:

  • {name}_behavioral_spaces.npy - All behavioral transformations
  • {name}_behave_{space}_{label}.png - PCA plots colored by each label
  • {name}_hopkins_statistics.csv - Clustering tendency metrics
  • {name}_clustering_statistics.csv - Variance explained, pairwise distances
  • {name}_outliers_{space}.csv - Outlier samples for each space

Behavioral Region Explorer

For targeted analysis of specific regions in behavioral space:

# Basic configuration
DATASET_NAME = "mydata"
DATA_FILE = "mydata.csv"
ID_COLUMN = "sample_id"
DROP_COLUMNS = ["col_a", "col_b"]
LABEL_COLUMNS = ["target1", "target2", "category"]
OUTPUT_DIR = "behavioral_exploration"
BEHAVIORAL_SPACES_FILE = "behavioral_exploration/mydata_behavioral_spaces.npy"
PLOT_MODE = "combined"  # or "separate"

# Define regions of interest in PCA space
USER_REGIONS = {
    "high_variance_cluster": {
        "space": "variance",
        "pc1_range": (0.3, 0.6),
        "pc2_range": (-0.2, 0.2),
        "description": "High variance contributors",
        "color": "red"
    },
    "entropy_outliers": {
        "space": "entropy",
        "pc1_range": (-0.5, -0.2),
        "pc2_range": (0.1, 0.4),
        "description": "Low entropy samples",
        "color": "blue"
    }
}

# Run the region explorer
%run -i behavioral_region_explorer.py

Understanding Behavioral Spaces

Variance Space: Decomposes how each sample contributes to feature spread. Negative values indicate stabilizers (typical samples near the mean); positive values indicate stretchers (extreme samples that widen the distribution). Use case: quality control, identifying process instability.

Skewness Space: Decomposes how each sample contributes to distributional asymmetry. Negative values pull the distribution below the mean; positive values pull it above. Near-zero values maintain symmetry. Use case: detecting biased synthesis, directional process drift.

Kurtosis Space: Decomposes how each sample contributes to tail heaviness. Negative values indicate core samples (predictable, well-behaved). Positive values indicate tail samples (rare extreme events). Use case: risk assessment, anomaly detection, reliability analysis.

Entropy Space: Decomposes how each sample contributes to information content. Positive values indicate high-information samples (rare, unique feature combinations). Negative values indicate low-information samples (common, redundant). Use case: dataset curation, experimental design, diversity quantification.

Hopkins Statistic

The Hopkins statistic H measures clustering tendency:

  • H > 0.7: Strong clustering (samples group by behavior)
  • H ≈ 0.5: Random distribution (no natural grouping)
  • H < 0.3: Regular/uniform distribution
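
For reference, a common formulation of the Hopkins statistic can be sketched in a few lines of numpy (an illustrative implementation; the explorer scripts may compute it differently):

```python
import numpy as np

def hopkins(X, m=None, seed=0):
    """Hopkins statistic H for clustering tendency (one common formulation).

    Compares nearest-neighbour distances from m uniform random probes to the
    data (u_i) with distances from m real points to their nearest other data
    point (w_i): H = sum(u) / (sum(u) + sum(w)).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    m = m or max(1, n // 10)
    lo, hi = X.min(axis=0), X.max(axis=0)
    U = rng.uniform(lo, hi, size=(m, d))        # uniform probes in bounding box
    idx = rng.choice(n, size=m, replace=False)  # real-sample probes

    def nn_dist(P, data, exclude_self=False):
        D = np.linalg.norm(P[:, None, :] - data[None, :, :], axis=2)
        if exclude_self:
            D[D == 0] = np.inf  # drop the zero self-distance
        return D.min(axis=1)

    u = nn_dist(U, X)
    w = nn_dist(X[idx], X, exclude_self=True)
    return u.sum() / (u.sum() + w.sum())

# Two tight, well-separated blobs should give H close to 1
rng = np.random.default_rng(1)
blobs = np.vstack([rng.normal(0, 0.05, (100, 2)),
                   rng.normal(5, 0.05, (100, 2))])
H = hopkins(blobs)
```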

Parameter Selection

n_permutations: Controls Monte Carlo estimation accuracy.

  • 50-100: Quick exploration, debugging
  • 200-500: Standard analysis
  • 1000+: Publication, final results

n_jobs: Parallel processing for feature columns.

  • -1: Use all available CPU cores
  • 1: Single-threaded (for debugging)
  • N: Use N cores

random_state: Set for reproducibility. The implementation uses antithetic sampling for variance reduction.
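
Antithetic sampling pairs each random ordering with its reverse; the two marginal-contribution estimates are negatively correlated, so their average is less noisy than two independent permutations. A sketch of the pairing (the package's internal scheme may differ):

```python
import numpy as np

def antithetic_permutations(n, n_pairs, seed=0):
    """Generate sample orderings in antithetic pairs: each random
    permutation is followed by its reverse, which negatively correlates
    the two Shapley estimates and reduces Monte Carlo variance."""
    rng = np.random.default_rng(seed)
    perms = []
    for _ in range(n_pairs):
        p = rng.permutation(n)
        perms.append(p)
        perms.append(p[::-1])  # the antithetic partner
    return perms

perms = antithetic_permutations(5, 3)  # 3 pairs -> 6 orderings
```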

API Reference

ShapleyBehaviors class:

sb = ShapleyBehaviors(n_permutations=100, n_jobs=-1, random_state=42)
Phi = sb.transform(X, value_function='variance', verbose=True)
spaces = sb.transform_multiple(X, value_functions=['variance', 'skewness', 'kurtosis', 'entropy'])

identify_outliers function:

outlier_indices, outlier_scores = identify_outliers(Phi, threshold=3.0, method='zscore')
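
With method='zscore', the idea is to flag samples whose behavioral coordinates sit several standard deviations from the column means. An illustrative re-implementation (hypothetical function name; the package's own scoring may differ in detail):

```python
import numpy as np

def zscore_outliers(Phi, threshold=3.0):
    """Flag rows of a behavioral matrix whose largest absolute
    column-wise z-score exceeds `threshold`."""
    z = np.abs((Phi - Phi.mean(axis=0)) / Phi.std(axis=0))
    scores = z.max(axis=1)
    idx = np.where(scores > threshold)[0]
    return idx, scores[idx]

rng = np.random.default_rng(0)
Phi = rng.normal(size=(200, 4))
Phi[7] += 10.0  # plant one unmistakable outlier
idx, scores = zscore_outliers(Phi, threshold=3.0)
```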

Convenience functions:

from shapley_behaviors import (
    compute_shapley_variance,
    compute_shapley_skewness,
    compute_shapley_kurtosis,
    compute_shapley_entropy
)

Phi = compute_shapley_variance(X, n_permutations=100, n_jobs=-1, random_state=42)

Runtime Estimates

  • 500 samples, 100 permutations: 2-5 minutes
  • 500 samples, 1000 permutations: 15-30 minutes
  • 4000 samples, 100 permutations: 20-30 minutes
  • 4000 samples, 1000 permutations: 2-3 hours

Troubleshooting

ImportError: Ensure the package is installed with pip install shapley_behaviors

Long runtime: Reduce N_PERMUTATIONS to 100 for testing

Memory error: Reduce N_JOBS or process data in batches

High additivity error warning: Increase n_permutations

No clustering detected (H ≈ 0.5): Data may lack natural behavioral groupings

Citation

If you use this package, please cite:

Liu, T., and Barnard, A. S. (2025). Understanding interpretable patterns of
Shapley behaviours in materials data. Machine Learning: Engineering,
1, 015004. https://doi.org/10.1088/3049-4761/adaaf6

License

MIT License. See LICENSE file for details.

