Similarity-Based Stratified Splitting Algorithm
Similarity Stratified Split
Implementation of the Similarity-Based Stratified Splitting algorithm described in "Similarity Based Stratified Splitting: an approach to train better classifiers" (Farias, Ludermir and Bastos-Filho, 2020).
Overview
The paper's authors propose Similarity-Based Stratified Splitting (SBSS), a technique that uses both output-space (label) and input-space (feature) information to split a dataset. Splits are built with a similarity function over the samples so that similar samples are placed in different splits. According to the authors, this gives each split a better representation of the data during training and leads to a more realistic estimate of performance in real-world applications.
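To make the idea concrete, here is a minimal greedy sketch of one way to read "place similar samples in different splits". It is an illustration only, not the package's implementation and not necessarily the paper's exact procedure. The function names `similarity_stratified_folds` and `iter_splits`, the `seed` parameter, and the choice of Euclidean distance are all assumptions made for this example.

```python
import numpy as np


def similarity_stratified_folds(X, y, n_splits, seed=0):
    """Hypothetical sketch: give every sample a fold id so that, within each
    class, a sample and its nearest remaining neighbours land in different folds."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    folds = np.full(len(y), -1, dtype=int)

    for label in np.unique(y):
        # Work class by class so every fold keeps the label proportions.
        remaining = [int(i) for i in np.flatnonzero(y == label)]
        while remaining:
            # Pick a random pivot among the samples not yet assigned.
            pivot = remaining.pop(int(rng.integers(len(remaining))))
            group = [pivot]
            if remaining:
                # Euclidean distance from the pivot to every remaining sample.
                d = np.linalg.norm(X[remaining] - X[pivot], axis=1)
                nearest = np.argsort(d)[: n_splits - 1]
                group.extend(remaining[int(i)] for i in nearest)
                # Remove the chosen neighbours, highest position first.
                for i in sorted((int(i) for i in nearest), reverse=True):
                    remaining.pop(i)
            # One group member per fold; random order avoids biasing fold 0.
            folds[group] = rng.permutation(n_splits)[: len(group)]
    return folds


def iter_splits(folds, n_splits):
    """Yield (train_index, test_index) pairs from the fold assignment."""
    for k in range(n_splits):
        yield np.flatnonzero(folds != k), np.flatnonzero(folds == k)
```

Because each group of mutually close samples is spread one per fold, per-class fold sizes differ by at most one in this sketch, and near-duplicates of the same class do not end up in the same fold.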
Install
Local
git clone https://github.com/timothyckl/similarity-stratified-split.git
cd ./similarity-stratified-split
pip install -e .
Usage
import numpy as np
from scipy.spatial import distance
from sbss import SimilarityStratifiedSplit


def get_distances(x):
    # Pairwise Euclidean distances as a square (n_samples, n_samples) matrix
    distances = distance.squareform(distance.pdist(x, metric='euclidean'))
    return distances


# inputs are recommended to be normalized
X = np.random.rand(1000, 128)
y = np.random.randint(0, 10, (1000,))

n_splits = 3
s = SimilarityStratifiedSplit(n_splits, get_distances)

for train_index, test_index in s.split(X, y):
    print(f"Train indices: {train_index}\nTest indices: {test_index}")
    print("=" * 100)
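Since inputs are recommended to be normalized, one option is to fold the scaling into the distance function itself. The sketch below is an assumption-laden example: `get_normalized_distances` is a hypothetical helper, min-max scaling is just one possible normalization, and it assumes the splitter accepts any callable that maps an `(n_samples, n_features)` array to an `(n_samples, n_samples)` distance matrix, as the usage example above suggests.

```python
import numpy as np


def get_normalized_distances(x):
    """Hypothetical helper: min-max scale each feature to [0, 1], then
    return the square matrix of pairwise Euclidean distances."""
    x = np.asarray(x, dtype=float)
    lo = x.min(axis=0)
    span = x.max(axis=0) - lo
    span[span == 0] = 1.0  # constant features: avoid division by zero
    scaled = (x - lo) / span

    # ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b, computed without an
    # (n, n, d) intermediate array.
    sq = (scaled ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * scaled @ scaled.T
    distances = np.sqrt(np.maximum(d2, 0.0))  # clip tiny negative round-off
    np.fill_diagonal(distances, 0.0)
    return distances
```

Under the assumption stated above, this callable could be passed in place of `get_distances`, for example `SimilarityStratifiedSplit(n_splits, get_normalized_distances)`.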
References
- Farias, F., Ludermir, T. and Bastos-Filho, C. (2020) Similarity based stratified splitting: An approach to train better classifiers, arXiv.org. Available at: https://arxiv.org/abs/2010.06099 (Accessed: 27 November 2023).