SOAK splitting utility
SOAK: Same/Other/All K-fold Cross-Validation
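SOAK estimates how similar the learnable patterns are across different subsets of a dataset. It extends traditional K-fold cross-validation with three ways of choosing the train set for each test subset: "Same" (train on rows from the same subset as the test rows), "Other" (train on rows from the other subsets), and "All" (train on rows from every subset). Comparing test error across these train sets indicates how well patterns learned on one subset carry over to another.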
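To make the three strategies concrete, here is a minimal NumPy sketch of how Same/Other/All train sets could be built for each test subset and fold. It is an illustration under stated assumptions, not soakpy's implementation: the function name `same_other_all_splits` is hypothetical, and folds are assigned round-robin within each subset purely to keep the example deterministic.

```python
import numpy as np

def same_other_all_splits(subset_vec, n_splits=2):
    """Yield (test_subset, category, fold_id, train_idx, test_idx).

    Hypothetical helper for illustration. Fold assignment is round-robin
    within each subset, which is an assumption and may differ from soakpy.
    """
    subset_vec = np.asarray(subset_vec)
    subset_values = np.unique(subset_vec)

    # Assign a fold id to every row, separately within each subset.
    fold_vec = np.empty(len(subset_vec), dtype=int)
    for value in subset_values:
        idx = np.flatnonzero(subset_vec == value)
        fold_vec[idx] = np.arange(len(idx)) % n_splits

    for fold_id in range(n_splits):
        is_train_fold = fold_vec != fold_id
        for value in subset_values:
            is_same = subset_vec == value
            # Test rows: the held-out fold of the current subset.
            test_idx = np.flatnonzero(is_same & ~is_train_fold)
            train_sets = {
                "same": np.flatnonzero(is_same & is_train_fold),
                "other": np.flatnonzero(~is_same & is_train_fold),
                "all": np.flatnonzero(is_train_fold),
            }
            for category, train_idx in train_sets.items():
                yield value, category, fold_id, train_idx, test_idx

# Small demonstration on six rows with two subsets.
demo_subsets = np.array(["even", "odd", "even", "odd", "even", "odd"])
for value, category, fold_id, train_idx, test_idx in same_other_all_splits(demo_subsets):
    print(f"test subset: {value} | category: {category} | fold: {fold_id} "
          f"| train: {train_idx} | test: {test_idx}")
```

In this sketch the held-out fold is excluded from every train set, including "other" and "all", so no category ever trains on rows that belong to the current test fold.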
Usage
Low-level: SOAK split only
import numpy as np
import soakpy
# --- synthetic data: one feature column, target equal to the feature ---
X = np.append(np.arange(10), [10, 12, 14]).reshape(-1, 1)
y = X.ravel()
subset_vec = np.array(['even' if x % 2 == 0 else 'odd' for x in X.ravel()])
# --- iterate over the SOAK splits ---
for subset_value, category, fold_id, random_seed, train_idx_final, test_same_idx in soakpy.split(subset_vec, n_splits=2, n_random_seeds=2):
    print(f"test subset: {subset_value:6s} --- category: {category:6s} --- test fold: {fold_id}")
    print(f"y_test : {y[test_same_idx]}")
    print(f"y_train: {y[train_idx_final]}")
    print("-"*50)
Output:
test subset: even --- category: same --- test fold: 1
y_test : [ 0 2 4 10]
y_train: [ 6 8 12 14]
--------------------------------------------------
test subset: even --- category: other --- test fold: 1
y_test : [ 0 2 4 10]
y_train: [1 9]
--------------------------------------------------
test subset: even --- category: same-ds --- test fold: 1
y_test : [ 0 2 4 10]
y_train: [ 6 14]
--------------------------------------------------
test subset: even --- category: same-ds --- test fold: 1
y_test : [ 0 2 4 10]
y_train: [ 8 14]
--------------------------------------------------
test subset: even --- category: all --- test fold: 1
y_test : [ 0 2 4 10]
y_train: [ 1 6 8 9 12 14]
--------------------------------------------------
test subset: even --- category: all-ds --- test fold: 1
y_test : [ 0 2 4 10]
y_train: [ 1 14]
--------------------------------------------------
test subset: even --- category: all-ds --- test fold: 1
y_test : [ 0 2 4 10]
y_train: [12 14]
--------------------------------------------------
test subset: odd --- category: same --- test fold: 1
y_test : [3 5 7]
y_train: [1 9]
--------------------------------------------------
test subset: odd --- category: other --- test fold: 1
y_test : [3 5 7]
y_train: [ 6 8 12 14]
--------------------------------------------------
test subset: odd --- category: other-ds --- test fold: 1
y_test : [3 5 7]
y_train: [12 14]
--------------------------------------------------
test subset: odd --- category: other-ds --- test fold: 1
y_test : [3 5 7]
y_train: [ 8 14]
--------------------------------------------------
test subset: odd --- category: all --- test fold: 1
y_test : [3 5 7]
y_train: [ 1 6 8 9 12 14]
--------------------------------------------------
test subset: odd --- category: all-ds --- test fold: 1
y_test : [3 5 7]
y_train: [8 9]
--------------------------------------------------
test subset: odd --- category: all-ds --- test fold: 1
y_test : [3 5 7]
y_train: [ 8 14]
--------------------------------------------------
test subset: even --- category: same --- test fold: 2
y_test : [ 6 8 12 14]
y_train: [ 0 2 4 10]
--------------------------------------------------
test subset: even --- category: other --- test fold: 2
y_test : [ 6 8 12 14]
y_train: [3 5 7]
--------------------------------------------------
test subset: even --- category: same-ds --- test fold: 2
y_test : [ 6 8 12 14]
y_train: [0 2]
--------------------------------------------------
test subset: even --- category: same-ds --- test fold: 2
y_test : [ 6 8 12 14]
y_train: [ 4 10]
--------------------------------------------------
test subset: even --- category: all --- test fold: 2
y_test : [ 6 8 12 14]
y_train: [ 0 2 3 4 5 7 10]
--------------------------------------------------
test subset: even --- category: all-ds --- test fold: 2
y_test : [ 6 8 12 14]
y_train: [0 5]
--------------------------------------------------
test subset: even --- category: all-ds --- test fold: 2
y_test : [ 6 8 12 14]
y_train: [ 2 10]
--------------------------------------------------
test subset: odd --- category: same --- test fold: 2
y_test : [1 9]
y_train: [3 5 7]
--------------------------------------------------
test subset: odd --- category: other --- test fold: 2
y_test : [1 9]
y_train: [ 0 2 4 10]
--------------------------------------------------
test subset: odd --- category: other-ds --- test fold: 2
y_test : [1 9]
y_train: [0 4]
--------------------------------------------------
test subset: odd --- category: other-ds --- test fold: 2
y_test : [1 9]
y_train: [ 2 10]
--------------------------------------------------
test subset: odd --- category: all --- test fold: 2
y_test : [1 9]
y_train: [ 0 2 3 4 5 7 10]
--------------------------------------------------
test subset: odd --- category: all-ds --- test fold: 2
y_test : [1 9]
y_train: [2 7]
--------------------------------------------------
test subset: odd --- category: all-ds --- test fold: 2
y_test : [1 9]
y_train: [2 5]
--------------------------------------------------
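The `-ds` categories in the output above appear to be downsampled versions of the corresponding train sets, drawn once per random seed (two rows each in this run). The rule soakpy uses to choose the downsampled size is not stated here, so the sketch below takes the size as an explicit argument. The function `downsample` and its parameter `n_keep` are hypothetical names for illustration, not part of soakpy.

```python
import numpy as np

def downsample(train_idx, n_keep, seed):
    """Return a random subset of train_idx with at most n_keep rows.

    Hypothetical helper: sampling is without replacement and reproducible
    for a given seed. The result is sorted for readability.
    """
    train_idx = np.asarray(train_idx)
    if n_keep >= len(train_idx):
        return train_idx  # nothing to remove
    rng = np.random.default_rng(seed)
    return np.sort(rng.choice(train_idx, size=n_keep, replace=False))

# Train values of the "same" category for test subset "even", fold 1.
same_train = np.array([6, 8, 12, 14])
for seed in (1, 2):
    print(f"seed {seed}: {downsample(same_train, n_keep=2, seed=seed)}")
```

Comparing a full train set with its downsampled counterpart helps separate the effect of train set size from the effect of which subset the train rows come from.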
High-level: Analyze dataset and Visualize
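import soakpy
import pandas as pd
# Load the workers' compensation dataset (xz-compressed CSV)
df = pd.read_csv("https://github.com/lamtung16/soak_regression/raw/refs/heads/main/data/WorkersCompensation.csv.xz")
# Subsets are defined by the "Gender" column; the target is the claim cost
soak_obj = soakpy.SOAK(df=df, subset_col="Gender", target_col="UltimateIncurredClaimCost")
# Evaluate a featureless baseline and a tree model on every SOAK split
soak_obj.analyze(model_list=["featureless", "tree"], n_splits=2, n_random_seeds=2, log_target=True)
# Plot the RMSE of the tree model for test subset 'M'
soak_obj.visualize(subset_value='M', model="tree", metric="rmse", figsize=(12, 2.5))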
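To give a sense of what the "featureless" model and the "rmse" metric measure, here is a small NumPy sketch. It assumes that "featureless" means a baseline that ignores the features and always predicts the mean of the train target, and that `log_target=True` means the target is transformed with the natural logarithm before fitting and scoring. Both are assumptions about soakpy's behavior, and the function names are hypothetical.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def featureless_rmse(y_train, y_test, log_target=False):
    """Test RMSE of a baseline that always predicts the train mean.

    Assumption: with log_target=True, both targets are log-transformed
    (natural log) and the error is measured on the log scale.
    """
    y_train = np.asarray(y_train, dtype=float)
    y_test = np.asarray(y_test, dtype=float)
    if log_target:
        y_train, y_test = np.log(y_train), np.log(y_test)
    prediction = np.full(len(y_test), y_train.mean())
    return rmse(y_test, prediction)

# Made-up claim costs, only to show the call.
y_train = np.array([100.0, 1000.0, 10000.0])
y_test = np.array([500.0, 5000.0])
print(featureless_rmse(y_train, y_test, log_target=True))
```

A learned model such as the tree is only useful on a given split if its test error is below that of such a baseline trained on the same rows.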