A package for adaptive resampling of datasets using border detection.
Project description
This repo provides a single function that implements the border and core detection studied in our paper:
The border and core detection function has the following prototype.
def classify_border_and_core_points(X, y=None, p=2, close=100, percentile=60):
    """
    Classify points as 'border' or 'core' based on distance percentile, using efficient distance computation.

    Parameters:
    X : np.ndarray
        The dataset (n_samples, n_features).
    y : np.ndarray, optional
        The class labels for the dataset (n_samples,). If None, the function treats all data as one class.
    p : int
        The norm to use for distance calculation (default is Euclidean norm, p=2).
    close : int
        The number of closest points to consider for the distance calculation (default=100).
    percentile : float
        The threshold percentile for defining border points (default=60).

    Returns:
    result : dict or tuple
        If y is provided, returns a dictionary with class labels as keys and (border_points, core_points) as values.
        If y is None, returns a tuple (border_points, core_points).
    """
Installation
The code can be installed as a Python package from PyPI.
pip install adaptive_resampling
Alternatively, it can be installed directly from the GitHub repo as a package.
pip install git+https://github.com/ykahalan/adaptive_resampling.git
Example usage
To detect the core and border points of a single class:
# Import the function from the installed library
from adaptive_resampling import classify_border_and_core_points
# Generate random data points for the example (1000 points in 2D space)
import numpy as np
X = np.random.rand(1000, 2)
# Classify points into border and core points
border_points, core_points = classify_border_and_core_points(X, p=2, close=100, percentile=60)
print(f"Number of border points: {border_points.shape[0]}")
print(f"Number of core points: {core_points.shape[0]}")
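The docstring describes the split as based on a distance percentile but does not spell out the rule. One plausible reading, offered only as a sketch and not as the package's actual implementation, scores each point by its mean distance to its `close` nearest neighbors and labels the points whose score exceeds the `percentile` threshold as border. The name `sketch_border_core` and the scoring rule are assumptions made for illustration. The sketch covers only the single-class (`y=None`) case and uses brute-force pairwise distances rather than any efficient scheme.

```python
import numpy as np


def sketch_border_core(X, p=2, close=100, percentile=60):
    """Hypothetical sketch: split X into (border, core) by a neighbor-distance score."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if n < 2:
        raise ValueError("need at least two points to compute neighbor distances")
    # Pairwise Minkowski distances of order p (brute force, O(n^2) memory).
    diffs = X[:, None, :] - X[None, :, :]
    dists = np.linalg.norm(diffs, ord=p, axis=2)
    # Use at most n - 1 neighbors; column 0 of each sorted row is the point itself.
    k = min(close, n - 1)
    nearest = np.sort(dists, axis=1)[:, 1:k + 1]
    score = nearest.mean(axis=1)
    # Assumption: points whose score lies above the percentile threshold are "border".
    threshold = np.percentile(score, percentile)
    is_border = score > threshold
    return X[is_border], X[~is_border]


rng = np.random.default_rng(0)
X_demo = rng.random((200, 2))
border_demo, core_demo = sketch_border_core(X_demo, close=20, percentile=60)
print(border_demo.shape[0], core_demo.shape[0])
```

Under this reading, `percentile=60` leaves roughly 40% of the points in the border set, and raising `percentile` shrinks it. Whether the package uses the same direction and the same score is not confirmed by the description above.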
To detect the core and border points of each class:
# Import the function from the installed library
from adaptive_resampling import classify_border_and_core_points
# Generate random data points for the example (1000 points in 2D space, with 3 classes)
import numpy as np
np.random.seed(42)
X = np.random.rand(1000, 2)
y = np.random.randint(0, 3, size=1000) # 3 classes (0, 1, 2)
# Classify border and core points for each class
class_border_core = classify_border_and_core_points(X, y, p=2, close=100, percentile=60)
for cls, (border, core) in class_border_core.items():
    print(f"Class {cls}:")
    print(f" Number of border points: {border.shape[0]}")
    print(f" Number of core points: {core.shape[0]}")
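When labels are passed, the documented return value is a dictionary mapping each class label to a `(border_points, core_points)` pair. A small helper can reduce that dictionary to the share of each class that was labeled border, which is handy when tuning `percentile`. The helper name `border_fraction_per_class` is hypothetical, and the dictionary below is a stand-in with placeholder arrays rather than real output from the package.

```python
import numpy as np


def border_fraction_per_class(class_border_core):
    """Return {label: fraction of that class's points labeled border}."""
    fractions = {}
    for cls, (border, core) in class_border_core.items():
        total = border.shape[0] + core.shape[0]
        # Guard against an empty class to avoid dividing by zero.
        fractions[cls] = border.shape[0] / total if total else float("nan")
    return fractions


# Stand-in for the dictionary returned when y is provided (placeholder arrays).
example_result = {
    0: (np.zeros((4, 2)), np.zeros((6, 2))),
    1: (np.zeros((1, 2)), np.zeros((3, 2))),
}
fractions = border_fraction_per_class(example_result)
print(fractions)
```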
To oversample the border and undersample the core, as intended in the paper:
# Import necessary libraries
import numpy as np
from adaptive_resampling import classify_border_and_core_points
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler
from collections import Counter
# Generate synthetic data (1000 samples, 2 features, 3 classes)
np.random.seed(42)
X = np.random.rand(1000, 2)
y = np.random.randint(0, 3, size=1000) # Classes: 0, 1, 2
# Classify border and core points for each class
class_border_core = classify_border_and_core_points(X, y, p=2, close=100, percentile=60)
# Separate border and core points
border_points = []
border_labels = []
core_points = []
core_labels = []
for cls, (border, core) in class_border_core.items():
    border_points.append(border)
    border_labels.append(np.full(border.shape[0], cls))  # Store labels for border points
    core_points.append(core)
    core_labels.append(np.full(core.shape[0], cls))  # Store labels for core points
# Combine all border points and labels
X_border_all = np.vstack(border_points)
y_border_all = np.hstack(border_labels)
# Combine all core points and labels
X_core_all = np.vstack(core_points)
y_core_all = np.hstack(core_labels)
# Apply SMOTE to all border points across all classes
if len(np.unique(y_border_all)) > 1:  # Ensure multiple classes exist
    smote = SMOTE(sampling_strategy='auto', random_state=42)
    X_border_resampled, y_border_resampled = smote.fit_resample(X_border_all, y_border_all)
else:
    X_border_resampled, y_border_resampled = X_border_all, y_border_all  # Use original if SMOTE isn't possible
# Apply Random Undersampling (RUS) to all core points across all classes
if len(np.unique(y_core_all)) > 1:  # Ensure multiple classes exist
    rus = RandomUnderSampler(sampling_strategy='auto', random_state=42)
    X_core_resampled, y_core_resampled = rus.fit_resample(X_core_all, y_core_all)
else:
    X_core_resampled, y_core_resampled = X_core_all, y_core_all  # Use original if RUS isn't possible
# Combine resampled border and core points
X_resampled = np.vstack((X_border_resampled, X_core_resampled))
y_resampled = np.hstack((y_border_resampled, y_core_resampled))
# Display the class distribution before and after resampling
print(f"Original class distribution: {Counter(y)}")
print(f"Resampled class distribution: {Counter(y_resampled)}")
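The step in the example that separates and then stacks border and core points can be factored into one reusable function. The sketch below assumes only the documented dictionary shape, `{label: (border_points, core_points)}`. The name `stack_with_labels` is hypothetical, and the demo dictionary uses placeholder arrays instead of real package output.

```python
import numpy as np


def stack_with_labels(class_border_core):
    """Flatten {label: (border, core)} into (X_border, y_border, X_core, y_core)."""
    items = list(class_border_core.items())
    # Stack the point arrays of every class in dictionary order.
    X_border = np.vstack([border for _, (border, _) in items])
    X_core = np.vstack([core for _, (_, core) in items])
    # Rebuild one label per stacked row, in the same order.
    y_border = np.hstack([np.full(border.shape[0], cls) for cls, (border, _) in items])
    y_core = np.hstack([np.full(core.shape[0], cls) for cls, (_, core) in items])
    return X_border, y_border, X_core, y_core


# Placeholder input: class 0 has 2 border / 3 core rows, class 1 has 3 border / 1 core row.
demo = {
    0: (np.zeros((2, 2)), np.ones((3, 2))),
    1: (np.zeros((3, 2)), np.ones((1, 2))),
}
Xb, yb, Xc, yc = stack_with_labels(demo)
print(Xb.shape, yb.tolist(), Xc.shape, yc.tolist())
```

The four returned arrays can then be passed to the oversampler and undersampler. An empty dictionary is not handled here and would make `np.vstack` raise a `ValueError`.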