SOAK splitting utility
SOAK: Same/Other/All K-fold Cross-Validation
SOAK estimates how similar the learnable patterns are across subsets of a dataset. It extends standard K-fold cross-validation: for each test subset and fold, a model is trained on data from the same subset ("Same"), from the other subsets ("Other"), or from all subsets ("All"). Comparing the resulting test errors gives a measure of pattern similarity between subsets.
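To make the three strategies concrete, here is a minimal, dependency-free sketch of how the train sets for one test subset and fold could be formed. It is an illustration under stated assumptions, not soakpy's implementation: the helper name `same_other_all` and the hand-assigned fold ids are hypothetical.

```python
def same_other_all(subsets, fold_ids, test_subset, test_fold):
    """Return (test_idx, train_idx_by_category) for one test subset and fold.

    Hypothetical helper for illustration only; not part of soakpy.
    """
    n = len(subsets)
    # Test set: rows of the chosen subset that fall in the chosen fold.
    test_idx = [i for i in range(n)
                if subsets[i] == test_subset and fold_ids[i] == test_fold]
    # Every train set excludes the test fold, in all subsets.
    candidates = [i for i in range(n) if fold_ids[i] != test_fold]
    train = {
        "same": [i for i in candidates if subsets[i] == test_subset],
        "other": [i for i in candidates if subsets[i] != test_subset],
        "all": candidates,
    }
    return test_idx, train


# Toy data: 8 rows with alternating subset labels and hand-assigned folds.
subsets = ["even", "odd"] * 4
fold_ids = [1, 1, 2, 2, 1, 1, 2, 2]

test_idx, train = same_other_all(subsets, fold_ids, "even", 1)
print("test :", test_idx)
for category, idx in train.items():
    print(f"{category:5s}:", idx)
```

In this sketch "all" is simply the union of "same" and "other", so any difference in test error between the three comes from which subsets contributed training rows.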
Usage
Low-level: SOAK split only
import numpy as np
import soakpy
# --- synthetic data: 13 samples (0..9, then 10, 12, 14) ---
# Append before reshaping: np.append flattens its input, so reshaping first has no effect.
X = np.append(np.arange(10), [10, 12, 14]).reshape(-1, 1)
y = X.ravel()
# Subset label per sample: 8 'even' and 5 'odd'
subset_vec = np.array(['even' if x % 2 == 0 else 'odd' for x in X.ravel()])
# --- iterate over the SOAK splits ---
for subset_value, category, fold_id, random_seed, train_idx_final, test_same_idx in soakpy.split(subset_vec, n_splits=2, n_random_seeds=2):
    print(f"test subset: {subset_value:6s} --- category: {category:6s} --- test fold: {fold_id}")
    print(f"y_test : {y[test_same_idx]}")
    print(f"y_train: {y[train_idx_final]}")
    print("-"*50)
Output:
test subset: even --- category: same --- test fold: 1
y_test : [ 0 2 4 10]
y_train: [ 6 8 12 14]
--------------------------------------------------
test subset: even --- category: other --- test fold: 1
y_test : [ 0 2 4 10]
y_train: [1 9]
--------------------------------------------------
test subset: even --- category: same-ds --- test fold: 1
y_test : [ 0 2 4 10]
y_train: [ 6 14]
--------------------------------------------------
test subset: even --- category: same-ds --- test fold: 1
y_test : [ 0 2 4 10]
y_train: [ 8 14]
--------------------------------------------------
test subset: even --- category: all --- test fold: 1
y_test : [ 0 2 4 10]
y_train: [ 1 6 8 9 12 14]
--------------------------------------------------
test subset: even --- category: all-ds --- test fold: 1
y_test : [ 0 2 4 10]
y_train: [ 1 14]
--------------------------------------------------
test subset: even --- category: all-ds --- test fold: 1
y_test : [ 0 2 4 10]
y_train: [12 14]
--------------------------------------------------
test subset: odd --- category: same --- test fold: 1
y_test : [3 5 7]
y_train: [1 9]
--------------------------------------------------
test subset: odd --- category: other --- test fold: 1
y_test : [3 5 7]
y_train: [ 6 8 12 14]
--------------------------------------------------
test subset: odd --- category: other-ds --- test fold: 1
y_test : [3 5 7]
y_train: [12 14]
--------------------------------------------------
test subset: odd --- category: other-ds --- test fold: 1
y_test : [3 5 7]
y_train: [ 8 14]
--------------------------------------------------
test subset: odd --- category: all --- test fold: 1
y_test : [3 5 7]
y_train: [ 1 6 8 9 12 14]
--------------------------------------------------
test subset: odd --- category: all-ds --- test fold: 1
y_test : [3 5 7]
y_train: [8 9]
--------------------------------------------------
test subset: odd --- category: all-ds --- test fold: 1
y_test : [3 5 7]
y_train: [ 8 14]
--------------------------------------------------
test subset: even --- category: same --- test fold: 2
y_test : [ 6 8 12 14]
y_train: [ 0 2 4 10]
--------------------------------------------------
test subset: even --- category: other --- test fold: 2
y_test : [ 6 8 12 14]
y_train: [3 5 7]
--------------------------------------------------
test subset: even --- category: same-ds --- test fold: 2
y_test : [ 6 8 12 14]
y_train: [0 2]
--------------------------------------------------
test subset: even --- category: same-ds --- test fold: 2
y_test : [ 6 8 12 14]
y_train: [ 4 10]
--------------------------------------------------
test subset: even --- category: all --- test fold: 2
y_test : [ 6 8 12 14]
y_train: [ 0 2 3 4 5 7 10]
--------------------------------------------------
test subset: even --- category: all-ds --- test fold: 2
y_test : [ 6 8 12 14]
y_train: [0 5]
--------------------------------------------------
test subset: even --- category: all-ds --- test fold: 2
y_test : [ 6 8 12 14]
y_train: [ 2 10]
--------------------------------------------------
test subset: odd --- category: same --- test fold: 2
y_test : [1 9]
y_train: [3 5 7]
--------------------------------------------------
test subset: odd --- category: other --- test fold: 2
y_test : [1 9]
y_train: [ 0 2 4 10]
--------------------------------------------------
test subset: odd --- category: other-ds --- test fold: 2
y_test : [1 9]
y_train: [0 4]
--------------------------------------------------
test subset: odd --- category: other-ds --- test fold: 2
y_test : [1 9]
y_train: [ 2 10]
--------------------------------------------------
test subset: odd --- category: all --- test fold: 2
y_test : [1 9]
y_train: [ 0 2 3 4 5 7 10]
--------------------------------------------------
test subset: odd --- category: all-ds --- test fold: 2
y_test : [1 9]
y_train: [2 7]
--------------------------------------------------
test subset: odd --- category: all-ds --- test fold: 2
y_test : [1 9]
y_train: [2 5]
--------------------------------------------------
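In the output above, every category ending in `-ds` has a train set of two samples drawn from the corresponding full train set, and each such category appears twice per fold, which matches `n_random_seeds=2`. The sketch below shows one way such a seeded downsample could be drawn. It is an assumption-laden illustration, not soakpy's code: the helper `downsample` is hypothetical, and the rule soakpy uses to choose the target size is not stated here, so the size is passed in explicitly.

```python
import random


def downsample(items, size, seed):
    """Draw `size` items without replacement, reproducibly for a given seed.

    Hypothetical helper for illustration only; not part of soakpy.
    """
    if size >= len(items):
        return sorted(items)
    rng = random.Random(seed)  # independent generator, so results depend only on `seed`
    return sorted(rng.sample(items, size))


# The "all" train targets from the even / fold 1 rows above.
all_train = [1, 6, 8, 9, 12, 14]

# One draw per seed, mirroring the two "all-ds" rows per fold.
draws = [downsample(all_train, size=2, seed=s) for s in (1, 2)]
for seed, draw in zip((1, 2), draws):
    print(f"seed {seed}: {draw}")
```

Repeating the draw over several seeds makes it possible to tell whether a gap between, say, "all" and "same" comes from the extra subsets or only from the larger train size.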
High-level: Analyze dataset and Visualize
import soakpy
import pandas as pd
# Load the example dataset (xz-compressed CSV)
df = pd.read_csv("https://github.com/lamtung16/soak_regression/raw/refs/heads/main/data/WorkersCompensation.csv.xz")
# Subsets are defined by the Gender column; the target is the claim cost
soak_obj = soakpy.SOAK(df=df, subset_col="Gender", target_col="UltimateIncurredClaimCost")
# Run the cross-validation for each listed model
soak_obj.analyze(model_list=["featureless", "tree"], n_splits=2, n_random_seeds=2, log_target=True)
# Plot the results for one test subset, model, and metric at a time
soak_obj.visualize(subset_value='M', model="tree", metric="rmse", figsize=(12, 2.5))
soak_obj.visualize(subset_value='F', model="featureless", metric="mae", figsize=(12, 2.5))
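The names `featureless`, `rmse`, `mae`, and `log_target` come from the calls above, but their exact definitions inside soakpy are not given here. As a rough guide to how such results are usually read, the sketch below assumes that a featureless model predicts the mean of the training targets, that `log_target=True` means errors are measured on a log-transformed target, and that the metrics are the standard root mean squared error and mean absolute error. The claim costs are made up, and base-10 logs are used only to keep the numbers readable.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up claim costs, for illustration only.
y_train = [100.0, 1000.0, 10000.0]
y_test = [10.0, 100000.0]

# Assumed meaning of log_target=True: work on the log scale.
log_train = [math.log10(v) for v in y_train]
log_test = [math.log10(v) for v in y_test]

# Assumed featureless baseline: ignore the inputs, predict the training mean.
baseline = sum(log_train) / len(log_train)
pred = [baseline] * len(log_test)

print(f"baseline prediction (log10 scale): {baseline:.3f}")
print(f"rmse: {rmse(log_test, pred):.3f}")
print(f"mae : {mae(log_test, pred):.3f}")
```

Under these assumptions, a model such as `tree` is only learning something useful when its error is clearly below the featureless baseline for the same train and test split.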
File details
Details for the file soakpy-0.0.53.tar.gz.
File metadata
- Download URL: soakpy-0.0.53.tar.gz
- Upload date:
- Size: 8.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 245f075f0b4fbd8eff6f215c2a9b241d192775ed508242bbf8fbb4866fb4e8ad |
| MD5 | ce0c2bee7195eec3898b18f88b72faa1 |
| BLAKE2b-256 | cea3067852868a84518233cb225dfcc5eba70f35a960a24bf494a3138d6a9921 |
File details
Details for the file soakpy-0.0.53-py3-none-any.whl.
File metadata
- Download URL: soakpy-0.0.53-py3-none-any.whl
- Upload date:
- Size: 8.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6f265c909b2a146d9c25bf8ee81096c6544e850e2b9c0aaf393e61e481610514 |
| MD5 | 1a22051bec12e0b67de3432dc3363610 |
| BLAKE2b-256 | 6025b51b529672cc24c52237f28202305cc19f8deb928184a134cb9570de47db |