Resampling based on Sample Concatenation (Re-SC) algorithms for imbalanced learning
imblearn-resc
Re-SC (Resampling based on Sample Concatenation) algorithms for imbalanced learning.
This package is fully compatible with the scikit-learn and imbalanced-learn ecosystems. It addresses class imbalance by mapping data into a higher-dimensional ($2d$) concatenated feature space, then resampling the majority and minority classes with either density-weighted random sampling (ReSC) or K-Means clustering (KMeansReSC).
📦 Installation
You can install imblearn-resc directly from PyPI using pip:
pip install imblearn-resc
Requires Python >=3.11, scikit-learn >=1.4.0, and imbalanced-learn >=0.12.0.
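If an install misbehaves, it can help to confirm that the environment meets the minimum versions listed above. The following is a small sketch using only the standard library; the helper names `parse_release` and `check_environment` are made up for this illustration and are not part of imblearn-resc.

```python
import re
import sys
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the requirement line above.
MINIMUMS = {"scikit-learn": (1, 4, 0), "imbalanced-learn": (0, 12, 0)}


def parse_release(text):
    """Return the first three numeric components, e.g. '1.5.0rc1' -> (1, 5, 0)."""
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def check_environment():
    """Map each requirement to True (ok), False (too old), or None (not installed)."""
    report = {"python": sys.version_info[:2] >= (3, 11)}
    for name, minimum in MINIMUMS.items():
        try:
            report[name] = parse_release(version(name)) >= minimum
        except PackageNotFoundError:
            report[name] = None
    return report


print(check_environment())
```

Any entry that comes back as `False` or `None` points at a package to upgrade or install before using the samplers.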
🚀 Quick Start & Usage
Because Re-SC algorithms map your original features ($d$) into a concatenated feature space ($2d$), you must always pair the Sampler with the ReSCTransformer inside an imblearn Pipeline.
- The Sampler (ReSC or KMeansReSC) transforms the training data during .fit_resample().
- The Transformer (ReSCTransformer) leaves the training data untouched, but duplicates the test data features ($x \rightarrow [x, x]$) during .predict() so your classifier receives the correct dimensions.
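To make the $x \rightarrow [x, x]$ mapping concrete, here is a minimal NumPy sketch of the duplication step. It only illustrates the idea described above; `duplicate_features` is a hypothetical helper, not the package's ReSCTransformer.

```python
import numpy as np


def duplicate_features(X):
    """Map each row x of length d to the concatenation [x, x] of length 2d."""
    X = np.asarray(X)
    return np.hstack([X, X])


# Two test samples with d = 2 features each.
X_test = np.array([[1.0, 2.0], [3.0, 4.0]])
X_dup = duplicate_features(X_test)

print(X_dup.shape)  # (2, 4)
print(X_dup[0])     # [1. 2. 1. 2.]
```

A classifier trained on $2d$-dimensional resampled data would reject the original $d$-dimensional test rows, which is why the duplication has to happen before prediction.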
Example: Complete Pipeline
Here is a full, runnable example of how to use ReSC and KMeansReSC with a standard machine learning classifier.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
# 1. Import the pipeline from imbalanced-learn (NOT standard sklearn!)
from imblearn.pipeline import Pipeline
# 2. Import the Re-SC Samplers and Transformer
from imblearn_resc.oversampling import ReSC, KMeansReSC
from imblearn_resc.preprocessing import ReSCTransformer
# Generate a highly imbalanced dummy dataset (10% minority, 90% majority)
X, y = make_classification(
    n_classes=2, class_sep=2, weights=[0.1, 0.9],
    n_informative=3, n_redundant=1, flip_y=0,
    n_features=5, n_clusters_per_class=1,
    n_samples=1000, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# ==========================================
# Option A: Standard ReSC Pipeline
# ==========================================
pipeline_resc = Pipeline([
    ('sampler', ReSC(M=1.5, k=5, random_state=42)),
    ('transformer', ReSCTransformer()),  # <--- Mandatory!
    ('classifier', RandomForestClassifier(random_state=42))
])
# Train and Predict
pipeline_resc.fit(X_train, y_train)
y_pred_resc = pipeline_resc.predict(X_test)
print("ReSC Classification Report:")
print(classification_report(y_test, y_pred_resc))
# ==========================================
# Option B: KMeansReSC Pipeline
# ==========================================
pipeline_kmeans = Pipeline([
    ('sampler', KMeansReSC(M=1.5, num_candidates_to_test=5, random_state=42)),
    ('transformer', ReSCTransformer()),  # <--- Mandatory!
    ('classifier', RandomForestClassifier(random_state=42))
])
# Train and Predict
pipeline_kmeans.fit(X_train, y_train)
y_pred_kmeans = pipeline_kmeans.predict(X_test)
print("KMeansReSC Classification Report:")
print(classification_report(y_test, y_pred_kmeans))
🧠 Key Parameters
ReSC
- M (float, default=1.5): The maximum acceptable imbalance ratio threshold for the resulting dataset.
- k (int, default=5): Number of nearest neighbors used to calculate majority sample weights.
- alpha (float, default=0.05): Significance level for the Z-test used to compute the required statistical sample size.
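To get a feel for what a threshold of M=1.5 means, the sketch below computes an imbalance ratio for a label vector. It assumes the common definition (majority count divided by minority count); how the package measures the ratio internally is not stated here, and `imbalance_ratio` is a hypothetical helper.

```python
from collections import Counter


def imbalance_ratio(y):
    """Majority class count divided by minority class count (assumed definition)."""
    counts = Counter(y)
    return max(counts.values()) / min(counts.values())


# 900 majority labels and 100 minority labels.
y = [0] * 900 + [1] * 100
ir = imbalance_ratio(y)

M = 1.5
print(ir)      # 9.0
print(ir > M)  # True
```

Under this definition, a dataset with a ratio of 9.0 is far above the default threshold, so resampling would be expected to change the class counts substantially.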
KMeansReSC
- M (float, default=1.5): The maximum acceptable imbalance ratio threshold for the resulting dataset.
- num_candidates_to_test (int, default=5): How many 'k' values (clusters) to test during geometric tuning using the Silhouette Score.
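The general idea of choosing a cluster count by Silhouette Score can be sketched with plain scikit-learn. This is not the KMeansReSC implementation: the candidate range (2 upward), the helper `pick_k_by_silhouette`, and the toy data are all assumptions made for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score


def pick_k_by_silhouette(X, candidate_ks, random_state=0):
    """Fit K-Means for each candidate k and return the k with the highest silhouette."""
    scores = {}
    for k in candidate_ks:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = model.fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Toy data: three well-separated blobs in two dimensions.
X, _ = make_blobs(
    n_samples=300,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=0.5,
    random_state=0,
)

num_candidates_to_test = 5
candidate_ks = range(2, 2 + num_candidates_to_test)  # assumed range: k = 2..6
best_k, scores = pick_k_by_silhouette(X, candidate_ks)

print(best_k)
for k, score in scores.items():
    print(k, round(score, 3))
```

Testing more candidates costs one K-Means fit and one silhouette computation per value of k, so a larger num_candidates_to_test trades runtime for a wider search.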