Techniques for handling class overlapping with complexity measures

FairSample

Fair sampling for imbalanced datasets with 14+ resampling techniques and 40+ complexity measures.

Python 3.8+ License: MIT

Why FairSample?

Most imbalanced-learning packages provide only resampling techniques. FairSample also provides complexity measures that help you understand why a dataset is difficult and which technique is likely to suit it.

Installation

pip install fairsample

Quick Start

from fairsample import RFCL
from fairsample.complexity import ComplexityMeasures
import pandas as pd

# Load data
df = pd.read_csv('data.csv')
X = df.drop('target', axis=1)
y = df['target']

# Check complexity
cm = ComplexityMeasures(X, y)
complexity = cm.analyze_overlap()
print(f"Overlap (N3): {complexity['N3']:.4f}")

# Apply resampling
sampler = RFCL(random_state=42)
X_resampled, y_resampled = sampler.fit_resample(X, y)

# Train a classifier on the resampled data
from sklearn.ensemble import RandomForestClassifier
clf = RandomForestClassifier(random_state=42)
clf.fit(X_resampled, y_resampled)

Features

14+ Resampling Techniques:

  • RFCL, NUS, URNS - Overlap-based undersampling
  • SVDDWSMOTE, ODBOT, EHSO - Hybrid methods
  • NBUS, KMeansUndersampling - Clustering-based (multiple variants)
  • OSM - Comprehensive overlap handling
  • RandomOverSampler, RandomUnderSampler - Baselines
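
To make the baseline entries concrete, here is a minimal NumPy sketch of random oversampling. It is an illustration of the general idea only, not FairSample's RandomOverSampler; the function name `random_oversample` and the demo arrays are assumptions made for this example.

```python
import numpy as np


def random_oversample(X, y, random_state=None):
    """Duplicate randomly chosen rows until every class matches the largest one.

    Hypothetical helper for illustration; not part of FairSample.
    """
    X = np.asarray(X)
    y = np.asarray(y)
    rng = np.random.default_rng(random_state)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    keep = []
    for cls, count in zip(classes, counts):
        idx = np.flatnonzero(y == cls)
        # Draw (with replacement) only the extra rows this class still needs.
        extra = rng.choice(idx, size=target - count, replace=True)
        keep.append(np.concatenate([idx, extra]))
    keep = np.concatenate(keep)
    return X[keep], y[keep]


# Toy data: 8 majority rows, 2 minority rows.
X_demo = np.arange(20, dtype=float).reshape(10, 2)
y_demo = np.array([0] * 8 + [1] * 2)
X_over, y_over = random_oversample(X_demo, y_demo, random_state=0)
over_counts = np.bincount(y_over)
print(over_counts)
```

Baselines like this ignore where samples sit relative to the class boundary, which is the gap the overlap-based techniques in the list aim to address.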

40+ Complexity Measures:

  • Feature Overlap: F1, F1v, F2, F3, F4, Input Noise
  • Instance Overlap: N3, N4, kDN, CM, R-value, D3, SI, Borderline, Degree of Overlap
  • Structural: N1, N2, T1, DBC, LSC, Clust, NSG, ICSV, ONB
  • Multiresolution: Purity, Neighbourhood Separability, MRCA, C1, C2
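
As an illustration of what an instance-overlap measure captures, N3 is commonly defined in the data-complexity literature as the leave-one-out error rate of a 1-nearest-neighbour classifier. The sketch below computes that quantity with scikit-learn; the function name `n3_leave_one_out` is an assumption for this example, and FairSample's own N3 may differ in details such as distance metric or normalisation.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors


def n3_leave_one_out(X, y):
    """Fraction of points whose nearest other point has a different label.

    Hypothetical helper for illustration; not FairSample's implementation.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    nn = NearestNeighbors(n_neighbors=1).fit(X)
    # Calling kneighbors() without arguments queries the training points
    # themselves and excludes each point from its own neighbour list.
    idx = nn.kneighbors(return_distance=False)[:, 0]
    return float(np.mean(y[idx] != y))


# Two well-separated groups: every nearest neighbour shares the label.
X_sep = np.array([[0.0], [0.1], [10.0], [10.1]])
y_sep = np.array([0, 0, 1, 1])
n3_sep = n3_leave_one_out(X_sep, y_sep)

# Alternating labels on a line: every nearest neighbour disagrees.
X_mix = np.array([[0.0], [1.0], [2.0], [3.0]])
y_mix = np.array([0, 1, 0, 1])
n3_mix = n3_leave_one_out(X_mix, y_mix)

print(n3_sep, n3_mix)
```

Values near 0 suggest little instance-level overlap, while values near 1 suggest heavily interleaved classes.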

Usage

Compare Multiple Techniques

from fairsample.utils import compare_techniques

results = compare_techniques(
    X, y,
    techniques=['RFCL', 'NUS', 'URNS'],
    complexity_measures='basic'
)
print(results.sort_values('N3'))  # Lower N3 = less overlap

Get All Complexity Measures

# All measures (cm is the ComplexityMeasures instance from Quick Start)
all_measures = cm.get_all_complexity_measures(measures='all')

# By category
feature_measures = cm.get_all_complexity_measures(measures='feature')

# Specific measures
selected = cm.get_all_complexity_measures(measures=['N3', 'F1', 'N1'])

Compare Before/After

from fairsample.complexity import compare_pre_post_overlap

# sampler, X and y are the objects created in Quick Start
X_resampled, y_resampled = sampler.fit_resample(X, y)
comparison = compare_pre_post_overlap(X, y, X_resampled, y_resampled)
print(comparison['improvements'])

API

All techniques follow scikit-learn's estimator conventions and expose the fit_resample(X, y) method familiar from imbalanced-learn:

sampler = RFCL(random_state=42)
X_resampled, y_resampled = sampler.fit_resample(X, y)
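
Because the interface is just a constructor plus fit_resample, any object with that method can be slotted into the same workflow. The sketch below uses a hypothetical stand-in class, `MajorityDownsampler` (not part of FairSample), and synthetic data from scikit-learn to show one common pattern: resample only the training split and evaluate on untouched test data. Swapping in RFCL should work the same way, assuming it accepts NumPy arrays.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split


class MajorityDownsampler:
    """Hypothetical stand-in sampler exposing fit_resample; not from FairSample."""

    def __init__(self, random_state=None):
        self.random_state = random_state

    def fit_resample(self, X, y):
        X = np.asarray(X)
        y = np.asarray(y)
        rng = np.random.default_rng(self.random_state)
        classes, counts = np.unique(y, return_counts=True)
        n_keep = counts.min()
        # Keep an equally sized random subset of every class.
        keep = np.concatenate([
            rng.choice(np.flatnonzero(y == cls), size=n_keep, replace=False)
            for cls in classes
        ])
        return X[keep], y[keep]


# Synthetic imbalanced data (roughly 90% / 10%).
X_all, y_all = make_classification(
    n_samples=600, weights=[0.9, 0.1], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X_all, y_all, test_size=0.25, stratify=y_all, random_state=0
)

# Resample the training split only; the test split keeps its original balance.
sampler_demo = MajorityDownsampler(random_state=42)
X_bal, y_bal = sampler_demo.fit_resample(X_train, y_train)

clf_demo = RandomForestClassifier(random_state=0).fit(X_bal, y_bal)
score = balanced_accuracy_score(y_test, clf_demo.predict(X_test))
print(f"Balanced accuracy on untouched test data: {score:.3f}")
```

Keeping the test split unresampled means the reported score reflects the class distribution the model would actually face.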

Requirements

Python 3.8+ with numpy, scikit-learn, scipy, pandas, matplotlib, seaborn

Contributing

Contributions welcome! Submit a PR or open an issue.

License

MIT License

Citation

@software{fairsample,
  author = {Mohd Uwaish},
  title = {FairSample: Techniques for handling class overlapping problems},
  year = {2026},
  url = {https://github.com/mohdUwaish59/fairsample}
}
