Techniques for handling class overlapping with complexity measures
FairSample
Fair sampling for imbalanced datasets with 14+ resampling techniques and 40+ complexity measures.
Why FairSample?
Most imbalanced-learning packages provide only resampling techniques. FairSample adds complexity measures that help you understand why a dataset is difficult and which technique works best on it.
Installation
pip install fairsample
Quick Start
from fairsample import RFCL
from fairsample.complexity import ComplexityMeasures
import pandas as pd
# Load data
df = pd.read_csv('data.csv')
X = df.drop('target', axis=1)
y = df['target']
# Check complexity
cm = ComplexityMeasures(X, y)
complexity = cm.analyze_overlap()
print(f"Overlap (N3): {complexity['N3']:.4f}")
# Apply resampling
sampler = RFCL(random_state=42)
X_resampled, y_resampled = sampler.fit_resample(X, y)
# Use resampled data
from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier()
clf.fit(X_resampled, y_resampled)
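To make the N3 value printed in the Quick Start more concrete: in the complexity-measure literature, N3 is the leave-one-out error rate of a 1-nearest-neighbour classifier, so values near 0 mean neighbours mostly share a label and values near 1 mean heavy overlap. The sketch below is a plain-Python illustration of that definition, not FairSample's implementation. The function name n3_leave_one_out and the toy data are assumptions made for this example, and the library may differ in distance metric or tie handling.

```python
import math


def n3_leave_one_out(X, y):
    """Leave-one-out error rate of a 1-NN classifier (hypothetical helper)."""
    n = len(X)
    if n < 2:
        raise ValueError("N3 needs at least two samples")
    errors = 0
    for i in range(n):
        best_dist = math.inf
        best_label = None
        for j in range(n):
            if i == j:
                continue  # leave the query point itself out
            d = math.dist(X[i], X[j])  # Euclidean distance (Python 3.8+)
            if d < best_dist:
                best_dist = d
                best_label = y[j]
        if best_label != y[i]:
            errors += 1
    return errors / n


# Two well-separated clusters: each nearest neighbour has the same label.
X_sep = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0)]
y_sep = [0, 0, 1, 1]

# Alternating labels on a line: each nearest neighbour has the other label.
X_mix = [(0.0,), (1.0,), (2.0,), (3.0,)]
y_mix = [0, 1, 0, 1]

print(n3_leave_one_out(X_sep, y_sep))  # 0.0
print(n3_leave_one_out(X_mix, y_mix))  # 1.0
```

The double loop is quadratic in the number of samples, which is fine for illustration but slow for large datasets.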
Features
14+ Resampling Techniques:
- RFCL, NUS, URNS - Overlap-based undersampling
- SVDDWSMOTE, ODBOT, EHSO - Hybrid methods
- NBUS, KMeansUndersampling - Clustering-based (multiple variants)
- OSM - Comprehensive overlap handling
- RandomOverSampler, RandomUnderSampler - Baselines
40+ Complexity Measures:
- Feature Overlap: F1, F1v, F2, F3, F4, Input Noise
- Instance Overlap: N3, N4, kDN, CM, R-value, D3, SI, Borderline, Degree of Overlap
- Structural: N1, N2, T1, DBC, LSC, Clust, NSG, ICSV, ONB
- Multiresolution: Purity, Neighbourhood Separability, MRCA, C1, C2
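As an illustration of the feature-overlap family listed above, the classic form of F1 (Fisher's discriminant ratio) scores each feature by the squared difference of the two class means divided by the sum of the class variances, and takes the maximum over features. The sketch below follows that textbook definition for two classes only. It is not taken from FairSample: the function name fisher_ratio_per_feature is hypothetical, population variance is an arbitrary choice here, and some libraries report a transformed value such as 1 / (1 + max ratio) so that lower means simpler.

```python
from statistics import mean, pvariance


def fisher_ratio_per_feature(X, y):
    """Per-feature Fisher ratio for a two-class problem (hypothetical helper)."""
    labels = sorted(set(y))
    if len(labels) != 2:
        raise ValueError("this sketch handles exactly two classes")
    rows_a = [row for row, label in zip(X, y) if label == labels[0]]
    rows_b = [row for row, label in zip(X, y) if label == labels[1]]
    ratios = []
    for f in range(len(X[0])):
        col_a = [row[f] for row in rows_a]
        col_b = [row[f] for row in rows_b]
        num = (mean(col_a) - mean(col_b)) ** 2
        denom = pvariance(col_a) + pvariance(col_b)
        if denom == 0:
            # Both classes are constant on this feature: perfectly
            # separating if the means differ, useless otherwise.
            ratios.append(float("inf") if num > 0 else 0.0)
        else:
            ratios.append(num / denom)
    return ratios


# Feature 0 separates the classes; feature 1 is identically distributed.
X_toy = [(0.0, 1.0), (2.0, 3.0), (10.0, 1.0), (12.0, 3.0)]
y_toy = [0, 0, 1, 1]

ratios = fisher_ratio_per_feature(X_toy, y_toy)
print(ratios)       # [50.0, 0.0]
print(max(ratios))  # 50.0
```

A high maximum ratio means at least one feature separates the classes well on its own, which is the intuition behind treating F1 as a feature-overlap measure.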
Usage
Compare Multiple Techniques
from fairsample.utils import compare_techniques
results = compare_techniques(
X, y,
techniques=['RFCL', 'NUS', 'URNS'],
complexity_measures='basic'
)
print(results.sort_values('N3')) # Lower N3 = less overlap
Get All Complexity Measures
# All measures
all_measures = cm.get_all_complexity_measures(measures='all')
# By category
feature_measures = cm.get_all_complexity_measures(measures='feature')
# Specific measures
selected = cm.get_all_complexity_measures(measures=['N3', 'F1', 'N1'])
Compare Before/After
from fairsample.complexity import compare_pre_post_overlap
X_resampled, y_resampled = sampler.fit_resample(X, y)
comparison = compare_pre_post_overlap(X, y, X_resampled, y_resampled)
print(comparison['improvements'])
API
All techniques follow the scikit-learn-style fit_resample convention also used by imbalanced-learn:
sampler = RFCL(random_state=42)
X_resampled, y_resampled = sampler.fit_resample(X, y)
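The fit_resample pattern shown above can be sketched with a minimal random undersampler written in plain Python. This is only an illustration of the interface shape (a constructor taking random_state, and a fit_resample method returning the resampled X and y). The class SimpleRandomUnderSampler is hypothetical and is not part of FairSample, whose RandomUnderSampler may behave differently.

```python
import random
from collections import Counter, defaultdict


class SimpleRandomUnderSampler:
    """Hypothetical minimal sampler exposing fit_resample(X, y)."""

    def __init__(self, random_state=None):
        self.random_state = random_state

    def fit_resample(self, X, y):
        rng = random.Random(self.random_state)
        # Group row indices by class label.
        indices_by_class = defaultdict(list)
        for idx, label in enumerate(y):
            indices_by_class[label].append(idx)
        # Cut every class down to the size of the smallest class.
        target = min(len(ix) for ix in indices_by_class.values())
        kept = []
        for ix in indices_by_class.values():
            kept.extend(rng.sample(ix, target))
        kept.sort()  # keep the original row order
        return [X[i] for i in kept], [y[i] for i in kept]


# Toy data: 8 majority rows (label 0) and 2 minority rows (label 1).
X_demo = [[float(i)] for i in range(10)]
y_demo = [0] * 8 + [1] * 2

sampler = SimpleRandomUnderSampler(random_state=42)
X_res, y_res = sampler.fit_resample(X_demo, y_demo)
print(Counter(y_res))  # both classes now have 2 rows
```

Because any object with this method signature can be swapped in, code written against one sampler keeps working when another technique is substituted.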
Requirements
Python 3.8+ with numpy, scikit-learn, scipy, pandas, matplotlib, seaborn
Contributing
Contributions welcome! Submit a PR or open an issue.
License
MIT License
Citation
@software{fairsample,
author = {Mohd Uwaish},
title = {FairSample: Techniques for handling class overlapping problems},
year = {2026},
url = {https://github.com/mohdUwaish59/fairsample}
}
Download files
File details
Details for the file fairsample-1.0.0.tar.gz.
File metadata
- Download URL: fairsample-1.0.0.tar.gz
- Upload date:
- Size: 43.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fa6864d322657530860c3291aa426b442e69745a95af488b1e1b1b3af4a28dfe |
| MD5 | 57d990a1d9e8e1f16a172d7dbf62192a |
| BLAKE2b-256 | b6a076dd16c7a8ac1fd2d93cb0402226df3f793cb90085ab41247bc1a328811d |
File details
Details for the file fairsample-1.0.0-py3-none-any.whl.
File metadata
- Download URL: fairsample-1.0.0-py3-none-any.whl
- Upload date:
- Size: 37.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2eb289b19b504c429548e83e521d9f619eb900ef73b7c36ee4e16282c1e8f291 |
| MD5 | 78549cfbf4bade80a2b6e6cb18bb4d3b |
| BLAKE2b-256 | 6ebfee5f11d9c2948411a890b0081b1086553ac05d442dd06c70ff87d85743fe |