Similarity-Based Stratified Splitting Algorithm

Similarity Stratified Split

An implementation of the Similarity-Based Stratified Splitting algorithm described in "Similarity Based Stratified Splitting: an approach to train better classifiers" (Farias, Ludermir and Bastos-Filho, 2020).

Overview

The authors propose Similarity-Based Stratified Splitting (SBSS), a technique that uses information from both the input space (features) and the output space (labels) to split a dataset. A similarity function between samples is used to place similar samples in different splits, so that each split covers the input space more evenly. According to the paper, this gives a better representation of the data during training and a more realistic estimate of performance in real-world applications.
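The idea of placing similar samples in different splits can be illustrated with a small NumPy sketch. This is a simplified illustration written for this description, not the procedure from the paper and not the code inside sbss; the function name similarity_stratified_folds, the random pivot selection, and the use of Euclidean distance are all assumptions made for the example.

```python
import numpy as np


def similarity_stratified_folds(X, y, n_splits, seed=0):
    """Return a fold id per sample (hypothetical helper, illustration only).

    Within each class, a random pivot is drawn and the n_splits samples
    closest to it (normally including the pivot itself) are spread over
    different folds, so near neighbours do not share a fold.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    folds = np.full(len(y), -1, dtype=int)

    for label in np.unique(y):
        remaining = list(np.flatnonzero(y == label))
        while remaining:
            pivot = remaining[rng.integers(len(remaining))]
            # Euclidean distance from the pivot to every unassigned sample
            # of the same class (assumed metric for this sketch).
            d = np.linalg.norm(X[remaining] - X[pivot], axis=1)
            group = [remaining[i] for i in np.argsort(d)[:n_splits]]
            # Each member of the group goes to a distinct fold.
            fold_ids = rng.permutation(n_splits)[:len(group)]
            for idx, fold in zip(group, fold_ids):
                folds[idx] = fold
            chosen = set(group)
            remaining = [i for i in remaining if i not in chosen]
    return folds


X_demo = np.random.default_rng(0).random((60, 4))
y_demo = np.repeat([0, 1, 2], 20)
fold_of = similarity_stratified_folds(X_demo, y_demo, n_splits=3)
```

Because every group contributes at most one sample to each fold, the per-class fold sizes in this sketch differ by at most one, which keeps the label stratification while also spreading neighbouring samples apart.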

Install

PyPI

pip install sbss

Local

git clone https://github.com/timothyckl/similarity-stratified-split.git
cd ./similarity-stratified-split
pip install -e .

Usage

import numpy as np
from scipy.spatial import distance
from sbss import SimilarityStratifiedSplit


def get_distances(x):
    # Pairwise Euclidean distances as a square (n_samples, n_samples) matrix
    return distance.squareform(distance.pdist(x, metric='euclidean'))


# inputs are recommended to be normalized
X = np.random.rand(1000, 128)          # 1000 samples, 128 features in [0, 1)
y = np.random.randint(0, 10, (1000,))  # integer class labels 0..9

n_splits = 3
s = SimilarityStratifiedSplit(n_splits, dist_func=get_distances)

for train_index, test_index in s.split(X, y):
    print(f"Train indices: {train_index}\nTest indices: {test_index}")
    print("=" * 100)
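The example above notes that inputs should be normalized and passes a custom distance function. As a hedged sketch, the helpers below show one way to min-max normalize features and to build an alternative pairwise distance matrix using NumPy only. The names min_max_normalize and cosine_distances are hypothetical, and it is an assumption (based only on get_distances above) that dist_func accepts any callable that maps an (n_samples, n_features) array to a square (n_samples, n_samples) matrix.

```python
import numpy as np


def min_max_normalize(x):
    """Scale each feature column to [0, 1]; constant columns map to 0."""
    x = np.asarray(x, dtype=float)
    lo = x.min(axis=0)
    span = x.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid division by zero for constant columns
    return (x - lo) / span


def cosine_distances(x):
    """Square matrix of pairwise cosine distances (1 - cosine similarity)."""
    x = np.asarray(x, dtype=float)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave all-zero rows as zero vectors
    unit = x / norms
    # Clip to remove tiny negative values caused by floating-point rounding.
    return np.clip(1.0 - unit @ unit.T, 0.0, 2.0)


X_raw = np.random.default_rng(0).random((100, 16)) * 50.0
X_norm = min_max_normalize(X_raw)
D = cosine_distances(X_norm)
```

If the assumption about dist_func holds, cosine_distances could be passed in place of get_distances, with X_norm used as the input to split.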

References

  • Farias, F., Ludermir, T. and Bastos-Filho, C. (2020) Similarity based stratified splitting: An approach to train better classifiers, arXiv.org. Available at: https://arxiv.org/abs/2010.06099 (Accessed: 27 November 2023).

Download files

Source Distribution

sbss-0.0.2.tar.gz (4.0 kB)

Uploaded Source

Built Distribution

sbss-0.0.2-py3-none-any.whl (2.9 kB)

Uploaded Python 3

File details

Details for the file sbss-0.0.2.tar.gz.

File metadata

  • Download URL: sbss-0.0.2.tar.gz
  • Upload date:
  • Size: 4.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for sbss-0.0.2.tar.gz

Algorithm    Hash digest
SHA256       3aea825b1e12aaa57c3c144129c8f2fb44a2fd7932a168f353cb45f78c061160
MD5          5eaabcf41aeb9a0bcf37f1e92f73c4c3
BLAKE2b-256  96942de1c0d2506117100919e11b623f5dcc5e93662847f6ae19714f1b7f2d0b

File details

Details for the file sbss-0.0.2-py3-none-any.whl.

File metadata

  • Download URL: sbss-0.0.2-py3-none-any.whl
  • Upload date:
  • Size: 2.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for sbss-0.0.2-py3-none-any.whl

Algorithm    Hash digest
SHA256       3c424f2e783235f415425be7f38e3917f97cc3fd05db16ee0b702bad46530791
MD5          4e93baac2f1983b1ace1faac8b207537
BLAKE2b-256  bdad803bbc3fa6c1311a17ff73cde185810cebce4d475cb8b4ffc4e3384d5cea
