
Similarity-Based Stratified Splitting Algorithm


Similarity Stratified Split

Implementation of the Similarity-Based Stratified Splitting algorithm described in the paper "Similarity Based Stratified Splitting: an approach to train better classifiers" (Farias, Ludermir and Bastos-Filho, 2020).

Overview

The authors propose a Similarity-Based Stratified Splitting (SBSS) technique, which uses information from both the input (feature) space and the output (label) space to split a dataset. A similarity function between samples is used to place similar samples in different splits. According to the authors, this gives the training phase a more representative view of the data and yields more realistic performance estimates in real-world applications.
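The idea in the overview can be sketched in plain NumPy. This is one plausible reading and should not be taken as the sbss package's code or as the paper's exact procedure. Within each class, it assumes a greedy grouping: an unassigned sample and its `n_splits - 1` nearest unassigned same-class neighbours are dealt out one per fold. Near-duplicates therefore land in different folds, and class proportions stay balanced. `sbss_fold_labels`, `sbss_split` and `euclidean_distances` are hypothetical names. `distance_fn` is assumed to follow the same contract as `get_distances` in the usage example: it takes a feature matrix and returns a square distance matrix.

```python
import numpy as np


def sbss_fold_labels(X, y, n_splits, distance_fn, seed=0):
    """Hypothetical sketch: return a fold id in [0, n_splits) for every sample.

    Assumed greedy procedure (not the sbss package's code): within each class,
    take an unassigned sample plus its n_splits - 1 nearest unassigned
    same-class neighbours and deal that group out one sample per fold.
    """
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X), np.asarray(y)
    folds = np.full(len(y), -1, dtype=int)
    offset = 0  # rotating start fold, so leftover samples do not all pile into fold 0
    for label in np.unique(y):
        idx = np.flatnonzero(y == label)      # global indices of this class
        dist = distance_fn(X[idx])            # square distance matrix within the class
        remaining = [int(i) for i in rng.permutation(len(idx))]
        while remaining:
            anchor, others = remaining[0], np.array(remaining[1:], dtype=int)
            # Nearest unassigned same-class neighbours of the anchor sample
            nearest = others[np.argsort(dist[anchor, others], kind="stable")]
            group = [anchor] + [int(i) for i in nearest[: n_splits - 1]]
            # Similar samples go to consecutive, and therefore different, folds
            for j, member in enumerate(group):
                folds[idx[member]] = (offset + j) % n_splits
            offset = (offset + len(group)) % n_splits
            taken = set(group)
            remaining = [i for i in remaining if i not in taken]
    return folds


def sbss_split(X, y, n_splits, distance_fn, seed=0):
    """Yield (train_index, test_index) pairs, one per fold."""
    folds = sbss_fold_labels(X, y, n_splits, distance_fn, seed)
    for k in range(n_splits):
        yield np.flatnonzero(folds != k), np.flatnonzero(folds == k)


def euclidean_distances(x):
    """Pairwise Euclidean distances between the rows of x."""
    diff = x[:, None, :] - x[None, :, :]
    return np.linalg.norm(diff, axis=-1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.random((30, 4))
    y = rng.integers(0, 3, 30)
    for train_index, test_index in sbss_split(X, y, 3, euclidean_distances):
        print(len(train_index), len(test_index))
```

Because each full group contributes exactly one sample to every fold, the per-class counts in any two folds differ by at most one. A plain stratified split guarantees the same balance but ignores similarity.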

Install

Local (editable install from a clone)

git clone https://github.com/timothyckl/similarity-stratified-split.git
cd ./similarity-stratified-split
pip install -e .

Usage

import numpy as np
from scipy.spatial import distance
from sbss import SimilarityStratifiedSplit

def get_distances(x):
    # Return the full (n_samples x n_samples) matrix of pairwise Euclidean distances
    distances = distance.squareform(distance.pdist(x, metric='euclidean'))
    return distances

# Inputs are recommended to be normalized
X = np.random.rand(1000, 128)              # 1000 samples, 128 features
y = np.random.randint(0, 10, (1000,))      # labels from 10 classes

n_splits = 3
s = SimilarityStratifiedSplit(n_splits, get_distances)

# Each iteration yields the train and test indices for one split
for train_index, test_index in s.split(X, y):
    print(f"Train indices: {train_index}\nTest indices: {test_index}")
    print("=" * 100)
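The usage example recommends normalized inputs and passes the distance function as a callable. The sketch below assumes, as that example suggests, that the callable receives the feature matrix and returns a square pairwise-distance matrix. It shows z-score standardization and a cosine-distance alternative in plain NumPy. `standardize` and `cosine_distances` are hypothetical helper names and are not part of sbss.

```python
import numpy as np


def standardize(X):
    """Z-score each feature column (hypothetical helper, not part of sbss)."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant columns become all zeros instead of dividing by zero
    return (X - mean) / std


def cosine_distances(x):
    """Square matrix of cosine distances (1 - cosine similarity) between rows."""
    x = np.asarray(x, dtype=float)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # all-zero rows get similarity 0 to every other row
    unit = x / norms
    d = 1.0 - np.clip(unit @ unit.T, -1.0, 1.0)
    np.fill_diagonal(d, 0.0)  # a sample is at distance 0 from itself
    return d
```

Under that assumption, these helpers would presumably be used as `SimilarityStratifiedSplit(n_splits, cosine_distances)` together with `s.split(standardize(X), y)`.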

References

  • Farias, F., Ludermir, T. and Bastos-Filho, C. (2020) Similarity based stratified splitting: An approach to train better classifiers, arXiv.org. Available at: https://arxiv.org/abs/2010.06099 (Accessed: 27 November 2023).



Download files


Source Distribution

sbss-0.0.1.tar.gz (3.4 kB)

Uploaded Source

Built Distribution

sbss-0.0.1-py3-none-any.whl (2.7 kB)

Uploaded Python 3

File details

Details for the file sbss-0.0.1.tar.gz.

File metadata

  • Download URL: sbss-0.0.1.tar.gz
  • Upload date:
  • Size: 3.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 colorama/0.4.4 importlib-metadata/4.6.4 keyring/23.5.0 pkginfo/1.8.2 readme-renderer/34.0 requests-toolbelt/0.9.1 requests/2.25.1 rfc3986/1.5.0 tqdm/4.57.0 urllib3/1.26.5 CPython/3.10.12

File hashes

Hashes for sbss-0.0.1.tar.gz
  • SHA256: 564aaece8139dbee07fcc4af75476b64eedb269472b8fae65064c88d8c8e51ca
  • MD5: ad80b7293852617c0b8200dafd7ca587
  • BLAKE2b-256: 13f0b803da9a1525671401caad46c6c116712b341488b59d764a61a7a637ff9a


File details

Details for the file sbss-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: sbss-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 2.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 colorama/0.4.4 importlib-metadata/4.6.4 keyring/23.5.0 pkginfo/1.8.2 readme-renderer/34.0 requests-toolbelt/0.9.1 requests/2.25.1 rfc3986/1.5.0 tqdm/4.57.0 urllib3/1.26.5 CPython/3.10.12

File hashes

Hashes for sbss-0.0.1-py3-none-any.whl
  • SHA256: 41d5409eb1ea4d2729d07679f679dec262b9981d0944eeecf7d7fe1c351fe543
  • MD5: c98ba5348934e8fe98a09b520b2f61b4
  • BLAKE2b-256: d65d4a911a9b7319d3e3c8bc66e47e36e3a2a510fb148a4151fa9f4178120d8b
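The file hashes listed above can be used to check that a downloaded distribution file is intact. The following is a minimal sketch using Python's standard `hashlib` module. `sha256_of` and `matches` are hypothetical helper names.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 16):
    """Stream a file through SHA-256 and return its lowercase hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in fixed-size chunks so large files are not loaded into memory at once
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches(path, expected_hex):
    """True if the file's SHA-256 digest equals the published one (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches("sbss-0.0.1-py3-none-any.whl", "41d5409eb1ea4d2729d07679f679dec262b9981d0944eeecf7d7fe1c351fe543")` should return `True` for an unmodified wheel. pip can also enforce this check in its hash-checking mode. To use it, add `--hash=sha256:...` entries to a requirements file and install with `pip install --require-hashes -r requirements.txt`.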

