SOAK splitting utility
SOAK: Same/Other/All K-fold Cross-Validation
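SOAK estimates how similar the patterns in different subsets of a dataset are. It extends standard K-fold cross-validation: for each subset and each test fold, the held-out fold of that subset is predicted by models trained on the "Same" subset, on the "Other" subsets, or on "All" subsets combined. If models trained on other subsets predict about as accurately as models trained on the same subset, the subsets share similar patterns. The splitter also yields downsampled variants (same-ds, other-ds, all-ds) with smaller training sets, one per random seed.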
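The Same/Other/All idea can be sketched in plain NumPy. This is a minimal illustration, not soakpy's implementation: the helper name `soak_index_sets`, the within-subset fold assignment, and the 1-based fold numbering are assumptions chosen to mirror the split output in the Usage section.

```python
import numpy as np


def soak_index_sets(subset_vec, n_splits, seed=0):
    """Hypothetical helper, not the soakpy API.

    Yields (subset_value, fold_id, train_sets, test_idx), where train_sets maps
    "same", "other" and "all" to index arrays. Folds are assigned within each
    subset, so fold k of every subset is held out together.
    """
    subset_vec = np.asarray(subset_vec)
    rng = np.random.default_rng(seed)
    fold_vec = np.empty(len(subset_vec), dtype=int)
    for value in np.unique(subset_vec):
        idx = np.flatnonzero(subset_vec == value)
        # Shuffle this subset's positions, then deal them round-robin into folds.
        fold_vec[rng.permutation(idx)] = np.arange(len(idx)) % n_splits
    for value in np.unique(subset_vec):
        is_same = subset_vec == value
        for fold in range(n_splits):
            is_train_fold = fold_vec != fold
            test_idx = np.flatnonzero(is_same & ~is_train_fold)
            train_sets = {
                "same": np.flatnonzero(is_same & is_train_fold),    # same subset, other folds
                "other": np.flatnonzero(~is_same & is_train_fold),  # other subsets, other folds
                "all": np.flatnonzero(is_train_fold),               # union of the two
            }
            yield value, fold + 1, train_sets, test_idx


# Usage on the same toy data as the low-level example.
y = np.append(np.arange(10), [10, 12, 14])
subset_vec = np.where(y % 2 == 0, "even", "odd")
for value, fold, train_sets, test_idx in soak_index_sets(subset_vec, n_splits=2):
    for category, train_idx in train_sets.items():
        print(f"{value:4s} fold {fold} {category:5s} test={y[test_idx]} train={y[train_idx]}")
```

Because the held-out fold is removed from every subset at once, the "all" training set is always the union of "same" and "other", which matches the pattern in the printed splits.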
Usage
Low-level: SOAK split only
import numpy as np
import soakpy
# --- synthetic data ---
X = np.append(np.arange(10), [10, 12, 14]).reshape(-1, 1)  # 13 samples, one feature
y = X.ravel()                                               # target equals the feature
subset_vec = np.where(y % 2 == 0, 'even', 'odd')           # subset label per sample
# --- Iterate over SOAK splits ---
for subset_value, category, fold_id, random_seed, train_idx_final, test_same_idx in soakpy.split(subset_vec, n_splits=2, n_random_seeds=2):
    print(f"test subset: {subset_value:6s} --- category: {category:6s} --- test fold: {fold_id}")
    print(f"y_test : {y[test_same_idx]}")
    print(f"y_train: {y[train_idx_final]}")
    print("-"*50)

Output:
test subset: even --- category: same --- test fold: 1
y_test : [ 0 2 4 10]
y_train: [ 6 8 12 14]
--------------------------------------------------
test subset: even --- category: other --- test fold: 1
y_test : [ 0 2 4 10]
y_train: [1 9]
--------------------------------------------------
test subset: even --- category: same-ds --- test fold: 1
y_test : [ 0 2 4 10]
y_train: [ 6 14]
--------------------------------------------------
test subset: even --- category: same-ds --- test fold: 1
y_test : [ 0 2 4 10]
y_train: [ 8 14]
--------------------------------------------------
test subset: even --- category: all --- test fold: 1
y_test : [ 0 2 4 10]
y_train: [ 1 6 8 9 12 14]
--------------------------------------------------
test subset: even --- category: all-ds --- test fold: 1
y_test : [ 0 2 4 10]
y_train: [ 1 14]
--------------------------------------------------
test subset: even --- category: all-ds --- test fold: 1
y_test : [ 0 2 4 10]
y_train: [12 14]
--------------------------------------------------
test subset: odd --- category: same --- test fold: 1
y_test : [3 5 7]
y_train: [1 9]
--------------------------------------------------
test subset: odd --- category: other --- test fold: 1
y_test : [3 5 7]
y_train: [ 6 8 12 14]
--------------------------------------------------
test subset: odd --- category: other-ds --- test fold: 1
y_test : [3 5 7]
y_train: [12 14]
--------------------------------------------------
test subset: odd --- category: other-ds --- test fold: 1
y_test : [3 5 7]
y_train: [ 8 14]
--------------------------------------------------
test subset: odd --- category: all --- test fold: 1
y_test : [3 5 7]
y_train: [ 1 6 8 9 12 14]
--------------------------------------------------
test subset: odd --- category: all-ds --- test fold: 1
y_test : [3 5 7]
y_train: [8 9]
--------------------------------------------------
test subset: odd --- category: all-ds --- test fold: 1
y_test : [3 5 7]
y_train: [ 8 14]
--------------------------------------------------
test subset: even --- category: same --- test fold: 2
y_test : [ 6 8 12 14]
y_train: [ 0 2 4 10]
--------------------------------------------------
test subset: even --- category: other --- test fold: 2
y_test : [ 6 8 12 14]
y_train: [3 5 7]
--------------------------------------------------
test subset: even --- category: same-ds --- test fold: 2
y_test : [ 6 8 12 14]
y_train: [0 2]
--------------------------------------------------
test subset: even --- category: same-ds --- test fold: 2
y_test : [ 6 8 12 14]
y_train: [ 4 10]
--------------------------------------------------
test subset: even --- category: all --- test fold: 2
y_test : [ 6 8 12 14]
y_train: [ 0 2 3 4 5 7 10]
--------------------------------------------------
test subset: even --- category: all-ds --- test fold: 2
y_test : [ 6 8 12 14]
y_train: [0 5]
--------------------------------------------------
test subset: even --- category: all-ds --- test fold: 2
y_test : [ 6 8 12 14]
y_train: [ 2 10]
--------------------------------------------------
test subset: odd --- category: same --- test fold: 2
y_test : [1 9]
y_train: [3 5 7]
--------------------------------------------------
test subset: odd --- category: other --- test fold: 2
y_test : [1 9]
y_train: [ 0 2 4 10]
--------------------------------------------------
test subset: odd --- category: other-ds --- test fold: 2
y_test : [1 9]
y_train: [0 4]
--------------------------------------------------
test subset: odd --- category: other-ds --- test fold: 2
y_test : [1 9]
y_train: [ 2 10]
--------------------------------------------------
test subset: odd --- category: all --- test fold: 2
y_test : [1 9]
y_train: [ 0 2 3 4 5 7 10]
--------------------------------------------------
test subset: odd --- category: all-ds --- test fold: 2
y_test : [1 9]
y_train: [2 7]
--------------------------------------------------
test subset: odd --- category: all-ds --- test fold: 2
y_test : [1 9]
y_train: [2 5]
--------------------------------------------------
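The `-ds` rows above appear twice per category, matching `n_random_seeds=2`, and each carries a smaller training set (2 samples throughout this example). The rule soakpy uses to pick that size is not stated here, so the following sketch takes the target size as a parameter; the helper `downsample` and its uniform sampling without replacement are assumptions for illustration.

```python
import numpy as np


def downsample(train_idx, size, n_random_seeds):
    """Hypothetical helper: one random subset of train_idx per seed.

    Draws `size` indices without replacement for each seed 0..n-1 and
    returns them sorted, mimicking the repeated `-ds` rows above.
    """
    train_idx = np.asarray(train_idx)
    if size > len(train_idx):
        raise ValueError("size must not exceed the number of training indices")
    draws = []
    for seed in range(n_random_seeds):
        rng = np.random.default_rng(seed)  # one reproducible draw per seed
        draws.append(np.sort(rng.choice(train_idx, size=size, replace=False)))
    return draws


# Example: shrink a 4-sample "same" training set toward a 2-sample "other" set.
same_train = np.array([6, 8, 12, 14])
other_train = np.array([1, 9])
target = min(len(same_train), len(other_train))  # hypothetical choice of target size
for seed, idx in enumerate(downsample(same_train, target, n_random_seeds=2)):
    print(f"seed {seed}: {idx}")
```

Drawing with a fixed seed per repetition keeps each downsampled split reproducible while still averaging over several random subsets.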
High-level: analyze a dataset and visualize the results
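import pandas as pd
import soakpy
# Workers' Compensation claims data (xz-compressed CSV)
df = pd.read_csv("https://github.com/lamtung16/soak_regression/raw/refs/heads/main/data/WorkersCompensation.csv.xz")
# Subsets are defined by Gender; the prediction target is UltimateIncurredClaimCost
soak_obj = soakpy.SOAK(df=df, subset_col="Gender", target_col="UltimateIncurredClaimCost")
# Evaluate the "featureless" and "tree" models over the SOAK splits
soak_obj.analyze(model_list=["featureless", "tree"], n_splits=2, n_random_seeds=2, log_target=True)
# Plot the tree model's RMSE for test subset 'M'
soak_obj.visualize(subset_value='M', model="tree", metric="rmse", figsize=(12, 2.5))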
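To see what a per-split score looks like, here is a minimal sketch of the `featureless` idea, assuming it means the usual baseline that ignores the inputs and predicts the mean training target. The helper `featureless_rmse` is hypothetical; the train and test values are taken from the even subset, test fold 1, in the low-level output.

```python
import numpy as np


def featureless_rmse(y, train_idx, test_idx):
    """Hypothetical helper: RMSE of a constant prediction equal to the training mean."""
    y = np.asarray(y, dtype=float)
    prediction = y[train_idx].mean()      # featureless model: a single constant
    residuals = y[test_idx] - prediction
    return float(np.sqrt(np.mean(residuals ** 2)))


# Toy data from the low-level example; values are unique, so look up positions by value.
y = np.append(np.arange(10), [10, 12, 14])
test_idx = np.flatnonzero(np.isin(y, [0, 2, 4, 10]))
train_sets = {
    "same": np.flatnonzero(np.isin(y, [6, 8, 12, 14])),
    "other": np.flatnonzero(np.isin(y, [1, 9])),
    "all": np.flatnonzero(np.isin(y, [1, 6, 8, 9, 12, 14])),
}
for category, train_idx in train_sets.items():
    print(f"{category:5s} RMSE: {featureless_rmse(y, train_idx, test_idx):.3f}")
```

On this toy split the "other" training set gives the lowest error because its mean (5) sits closest to the test values; that is a property of the toy data, not a general result.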
Download files
Source Distribution
soakpy-0.0.5.tar.gz (7.6 kB)
Built Distribution
soakpy-0.0.5-py3-none-any.whl (7.5 kB)
File details
Details for the file soakpy-0.0.5.tar.gz.
File metadata
- Download URL: soakpy-0.0.5.tar.gz
- Upload date:
- Size: 7.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 82b3ad5d79e93882ac0a4e18fd867cacfedcca0f87a445052fc81f42f0a4f8f3 |
| MD5 | 4b17091bffa1a61326067e274b4f7d5e |
| BLAKE2b-256 | 990be071912a6be4ebc2f85b0773ada5955db627ed1a4f36ead411c3969fcd43 |
File details
Details for the file soakpy-0.0.5-py3-none-any.whl.
File metadata
- Download URL: soakpy-0.0.5-py3-none-any.whl
- Upload date:
- Size: 7.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 97e7e58804a61ac854bbedf5623f6cfa30004e58f99eb64f3c9de927724a3bf2 |
| MD5 | d580b0d9d3b0fdd7dd17e2d0aad086df |
| BLAKE2b-256 | 030ef94a9927828cea00438d6e945bea19aa73d0f986a2893174f44bd0b3eb77 |
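The digests listed above can be checked after downloading. This is a minimal standard-library sketch, assuming the files sit in the current directory under their published names; `file_digests` and `EXPECTED_SHA256` are illustrative names. pip's `pip hash` command and `--require-hashes` mode offer a built-in alternative.

```python
import hashlib
from pathlib import Path


def file_digests(path, chunk_size=1 << 16):
    """Return SHA256, MD5 and BLAKE2b-256 hex digests of a file, read in chunks."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),  # 32 bytes = 256 bits
    }
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}


# SHA256 values copied from the tables above.
EXPECTED_SHA256 = {
    "soakpy-0.0.5.tar.gz": "82b3ad5d79e93882ac0a4e18fd867cacfedcca0f87a445052fc81f42f0a4f8f3",
    "soakpy-0.0.5-py3-none-any.whl": "97e7e58804a61ac854bbedf5623f6cfa30004e58f99eb64f3c9de927724a3bf2",
}

if __name__ == "__main__":
    for name, expected in EXPECTED_SHA256.items():
        path = Path(name)
        if path.exists():  # skip files that have not been downloaded
            ok = file_digests(path)["SHA256"] == expected
            print(f"{name}: {'OK' if ok else 'MISMATCH'}")
```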