A package for adaptive resampling of datasets using border detection.

Project description

This repository provides a single function that implements the border and core detection method studied in our paper:

The border and core detection function has the following signature:

def classify_border_and_core_points(X, y=None, p=2, close=100, percentile=60):
    """
    Classify points as 'border' or 'core' based on distance percentile, using efficient distance computation.
    
    Parameters:
    X : np.ndarray
        The dataset (n_samples, n_features).
    y : np.ndarray, optional
        The class labels for the dataset (n_samples,). If None, the function treats all data as one class.
    p : int
        The norm to use for distance calculation (default is Euclidean norm, p=2).
    close : int
        The number of closest points to consider for the distance calculation (default=100).
    percentile : float
        The threshold percentile for defining border points (default=60).
    
    Returns:
    result : dict or tuple
        If y is provided, returns a dictionary with class labels as keys and (border_points, core_points) as values.
        If y is None, returns a tuple (border_points, core_points).
    """
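The docstring only says that points are classified "based on distance percentile". The following minimal NumPy sketch illustrates that idea. It is not the package's implementation. The name sketch_border_core and the scoring rule are assumptions made for this example. Each point is scored by its mean Minkowski-p distance to its close nearest neighbors, and points scoring above the percentile-th percentile are labelled border.

```python
import numpy as np


def sketch_border_core(X, p=2, close=100, percentile=60):
    """Hypothetical illustration only; NOT the adaptive_resampling implementation.

    Assumed rule: score each point by its mean Minkowski-p distance to its
    `close` nearest neighbors, then call points whose score exceeds the
    `percentile`-th percentile "border" and the rest "core".
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    k = min(close, n - 1)  # a point cannot have more than n - 1 neighbors

    # Pairwise Minkowski-p distances, shape (n, n); O(n^2) memory
    diffs = np.abs(X[:, None, :] - X[None, :, :])
    dists = np.sum(diffs ** p, axis=-1) ** (1.0 / p)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbor

    # Mean distance to the k nearest neighbors (k smallest entries per row)
    nearest = np.partition(dists, k - 1, axis=1)[:, :k]
    score = nearest.mean(axis=1)

    # Points above the percentile threshold are labelled border
    threshold = np.percentile(score, percentile)
    is_border = score > threshold
    return X[is_border], X[~is_border]


# Usage on the same kind of data as the examples below
rng = np.random.default_rng(0)
X = rng.random((200, 2))
border, core = sketch_border_core(X, p=2, close=10, percentile=60)
print(border.shape[0], core.shape[0])
```

Under this assumed rule, percentile=60 labels roughly 40% of the points as border, and the highest scores fall in sparse regions and near the edge of the data cloud. The full pairwise matrix needs O(n²) memory. For larger datasets, a k-d tree such as scipy.spatial.KDTree would be the usual way to find nearest neighbors.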

Installation

The code can be installed as a Python package from PyPI:

pip install adaptive_resampling

Alternatively, it can be installed directly from the GitHub repository:

pip install git+https://github.com/ykahalan/adaptive_resampling.git

Example usage

To detect border and core points in a single class:

# Import NumPy and the function from the installed library
import numpy as np
from adaptive_resampling import classify_border_and_core_points

# Generate random data points for the example (1000 points in 2D space)
np.random.seed(42)  # fixed seed so the counts are reproducible
X = np.random.rand(1000, 2)

# Classify points into border and core points
border_points, core_points = classify_border_and_core_points(X, p=2, close=100, percentile=60)

print(f"Number of border points: {border_points.shape[0]}")
print(f"Number of core points: {core_points.shape[0]}")
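The function returns the points themselves rather than their row indices. You may need to know which rows of X were labelled border, for example to carry along other columns. In that case, a small helper such as the hypothetical rows_to_mask below can recover a boolean mask. It assumes the returned arrays contain exact copies of rows of X. If a row appears more than once in X, every copy is marked.

```python
import numpy as np


def rows_to_mask(X, subset):
    """Hypothetical helper, not part of adaptive_resampling.

    Returns a boolean mask over the rows of X marking rows that also appear
    in `subset`. Assumes `subset` holds exact copies of rows of X.
    """
    wanted = {tuple(row) for row in np.asarray(subset)}
    return np.array([tuple(row) in wanted for row in np.asarray(X)], dtype=bool)


# Stand-in data: pretend rows 1 and 3 were returned as border points
X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
border_points = X[[1, 3]]
mask = rows_to_mask(X, border_points)
border_idx = np.flatnonzero(mask)  # row indices of the border points in X
print(border_idx)
```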

To detect border and core points separately for each class:

# Import NumPy and the function from the installed library
import numpy as np
from adaptive_resampling import classify_border_and_core_points

# Generate random data for the example (1000 points in 2D space, 3 classes)
np.random.seed(42)
X = np.random.rand(1000, 2)
y = np.random.randint(0, 3, size=1000)  # 3 classes (0, 1, 2)

# Classify border and core points for each class
class_border_core = classify_border_and_core_points(X, y, p=2, close=100, percentile=60)

for cls, (border, core) in class_border_core.items():
    print(f"Class {cls}:")
    print(f"  Number of border points: {border.shape[0]}")
    print(f"  Number of core points: {core.shape[0]}")
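When y is given, the docstring says the result is a dict keyed by class label, with one (border_points, core_points) tuple per class. Assume each class is processed independently. The docstring implies this for y=None ("treats all data as one class") but does not state it for the labelled case. Under that assumption, the dispatch looks roughly like the hypothetical per_class helper below. Here halves is a deliberately trivial stand-in for the real border/core split.

```python
import numpy as np


def per_class(X, y, split_fn):
    """Hypothetical sketch of the labelled case; not the package's code.

    Applies `split_fn` (any function returning a (border, core) tuple) to the
    rows of each class separately and collects the results in a dict.
    """
    X = np.asarray(X)
    y = np.asarray(y)
    return {cls.item(): split_fn(X[y == cls]) for cls in np.unique(y)}


def halves(Xc):
    """Trivial stand-in split: first half 'border', second half 'core'."""
    mid = len(Xc) // 2
    return Xc[:mid], Xc[mid:]


X = np.arange(12, dtype=float).reshape(6, 2)
y = np.array([0, 1, 0, 1, 0, 1])
result = per_class(X, y, halves)
for cls, (border, core) in result.items():
    print(f"Class {cls}: {border.shape[0]} border, {core.shape[0]} core")
```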

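To oversample border points and undersample core points, as intended in the paper, combine the function with SMOTE and random undersampling from imbalanced-learn (pip install imbalanced-learn):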

# Import necessary libraries
import numpy as np
from adaptive_resampling import classify_border_and_core_points
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler
from collections import Counter

# Generate synthetic data (1000 samples, 2 features, 3 classes)
np.random.seed(42)
X = np.random.rand(1000, 2)
y = np.random.randint(0, 3, size=1000)  # Classes: 0, 1, 2

# Classify border and core points for each class
class_border_core = classify_border_and_core_points(X, y, p=2, close=100, percentile=60)

# Separate border and core points
border_points = []
border_labels = []
core_points = []
core_labels = []

for cls, (border, core) in class_border_core.items():
    border_points.append(border)
    border_labels.append(np.full(border.shape[0], cls))  # Store labels for border points
    core_points.append(core)
    core_labels.append(np.full(core.shape[0], cls))  # Store labels for core points

# Combine all border points and labels
X_border_all = np.vstack(border_points)
y_border_all = np.hstack(border_labels)

# Combine all core points and labels
X_core_all = np.vstack(core_points)
y_core_all = np.hstack(core_labels)

# Apply SMOTE to all border points across all classes
if len(np.unique(y_border_all)) > 1:  # Ensure multiple classes exist
    smote = SMOTE(sampling_strategy='auto', random_state=42)
    X_border_resampled, y_border_resampled = smote.fit_resample(X_border_all, y_border_all)
else:
    X_border_resampled, y_border_resampled = X_border_all, y_border_all  # Use original if SMOTE isn't possible

# Apply Random Undersampling (RUS) to all core points across all classes
if len(np.unique(y_core_all)) > 1:  # Ensure multiple classes exist
    rus = RandomUnderSampler(sampling_strategy='auto', random_state=42)
    X_core_resampled, y_core_resampled = rus.fit_resample(X_core_all, y_core_all)
else:
    X_core_resampled, y_core_resampled = X_core_all, y_core_all  # Use original if RUS isn't possible

# Combine resampled border and core points
X_resampled = np.vstack((X_border_resampled, X_core_resampled))
y_resampled = np.hstack((y_border_resampled, y_core_resampled))

# Display the class distribution before and after resampling
print(f"Original class distribution: {Counter(y)}")
print(f"Resampled class distribution: {Counter(y_resampled)}")
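Because y is drawn uniformly at random here, the three classes start out roughly balanced, so the two printed distributions will look similar. The resampled counts follow from imbalanced-learn's sampling_strategy='auto' defaults. For SMOTE, 'auto' means 'not majority': every border class is raised to the largest border-class count. For RandomUnderSampler, 'auto' means 'not minority': every core class is lowered to the smallest core-class count. The hypothetical helper expected_resampled_counts below performs just that arithmetic, and the label counts in it are made up for illustration. With SMOTE's default k_neighbors=5, each class that gets oversampled needs at least six border points.

```python
from collections import Counter


def expected_resampled_counts(border_labels, core_labels):
    """Hypothetical helper: predict per-class counts after SMOTE on border
    points and random undersampling on core points.

    Uses imbalanced-learn's meaning of sampling_strategy='auto':
    SMOTE brings every class up to the majority class's count, and
    RandomUnderSampler brings every class down to the minority class's count.
    """
    border = Counter(border_labels)
    core = Counter(core_labels)
    border_target = max(border.values(), default=0)
    core_target = min(core.values(), default=0)
    classes = set(border) | set(core)
    # A class with no border (or core) points gets nothing from that stage
    return {
        c: (border_target if c in border else 0) + (core_target if c in core else 0)
        for c in sorted(classes)
    }


# Illustrative, made-up label counts (not output from the package)
border_labels = [0] * 130 + [1] * 140 + [2] * 130
core_labels = [0] * 200 + [1] * 190 + [2] * 210
print(expected_resampled_counts(border_labels, core_labels))
# {0: 330, 1: 330, 2: 330}
```

The total after resampling therefore need not equal the original 1000 samples.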

Download files

Download the file for your platform.

Source Distribution

adaptive_resampling-0.1.1.tar.gz (4.5 kB)

Uploaded Source

Built Distribution

adaptive_resampling-0.1.1-py3-none-any.whl (5.2 kB)

Uploaded Python 3

File details

Details for the file adaptive_resampling-0.1.1.tar.gz.

File metadata

  • Download URL: adaptive_resampling-0.1.1.tar.gz
  • Upload date:
  • Size: 4.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.12.3

File hashes

Hashes for adaptive_resampling-0.1.1.tar.gz
Algorithm Hash digest
SHA256 c0d40ae5519c8a921901f4812c7f20c21f2cd8a5296d32c969d5d26423cfd5b1
MD5 70ef9c2e9a3d0df5b02a5fa74a5b10d6
BLAKE2b-256 33ac3ec3ee40efc72b88f4180c88173748feb5fad8d840f3c09a7e0b57b04d87

File details

Details for the file adaptive_resampling-0.1.1-py3-none-any.whl.

File metadata

File hashes

Hashes for adaptive_resampling-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 b036fd5d07cac052d925f88578d3dcefe2f634ced67b9cbd3259a49d39882d19
MD5 79b9557472bf772e687ded087d1b4e92
BLAKE2b-256 609864214164032ac64fdbe941261e4d7283a49525c413288e028d26f4fbf7e5
