NHOP metric, geometry-preserving seed selection, and circular oversampling (GVM-CO, LRE-CO, LS-CO)

circover

NHOP metric, geometry-preserving seed selection, and circular oversampling for imbalanced classification.

From the thesis: "From Distributional Similarity to Causal Imbalance: NHOP, Circular Oversampling, and a Controlled Degradation Study" — Parsa Hajiannejad, Università degli Studi di Milano, 2025.

Install

pip install circover

Quick start

import circover as cc

# NHOP: measure how faithfully synthetic data reproduces the original distribution
# (X_original and X_synthetic are placeholders for your own feature arrays)
nhop = cc.NHOP(n_bins=30)
nhop.score(X_original, X_synthetic)               # scalar in [0, 1]
nhop.score_per_feature(X_original, X_synthetic)   # per-feature array
nhop.tv_per_feature(X_original, X_synthetic)      # TV distance = 1 - NHOP

# Geometry-preserving seed selection
selector = cc.GeometricSeedSelector(n_seeds=20, random_state=42)
seed_indices, score = selector.select(X_minority)

# Circular oversamplers — drop-in replacements for SMOTE
from imblearn.pipeline import Pipeline
from sklearn.ensemble import RandomForestClassifier

pipe = Pipeline([
    ("over", cc.GVMCO(random_state=42)),   # or LRECO, LSCO
    ("clf",  RandomForestClassifier()),
])
pipe.fit(X_train, y_train)
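
To make the NHOP calls above more concrete, here is a minimal NumPy sketch of a normalised histogram overlap. It is an illustration under stated assumptions, not the package's implementation: the function names `histogram_overlap_per_feature` and `histogram_overlap` are hypothetical, the bin edges are assumed to span the combined range of both samples, and the scalar score is assumed to be the mean over features. The relation TV = 1 - overlap holds for any pair of normalised histograms on shared bins.

```python
import numpy as np


def histogram_overlap_per_feature(X_a, X_b, n_bins=30):
    """Per-feature overlap of normalised histograms on shared bin edges.

    Hypothetical helper: the binning used by circover may differ.
    """
    X_a = np.asarray(X_a, dtype=float)
    X_b = np.asarray(X_b, dtype=float)
    overlaps = np.empty(X_a.shape[1])
    for j in range(X_a.shape[1]):
        lo = min(X_a[:, j].min(), X_b[:, j].min())
        hi = max(X_a[:, j].max(), X_b[:, j].max())
        if hi == lo:
            # Both samples are constant and equal on this feature.
            overlaps[j] = 1.0
            continue
        edges = np.linspace(lo, hi, n_bins + 1)
        p, _ = np.histogram(X_a[:, j], bins=edges)
        q, _ = np.histogram(X_b[:, j], bins=edges)
        p = p / p.sum()  # normalise counts to probabilities
        q = q / q.sum()
        overlaps[j] = np.minimum(p, q).sum()  # shared probability mass
    return overlaps


def histogram_overlap(X_a, X_b, n_bins=30):
    """Scalar score in [0, 1]; averaging over features is an assumption."""
    return float(histogram_overlap_per_feature(X_a, X_b, n_bins).mean())


rng = np.random.default_rng(0)
X_original = rng.normal(size=(500, 3))
X_shifted = X_original + 5.0  # deliberately poor "synthetic" data

same = histogram_overlap(X_original, X_original)
far = histogram_overlap(X_original, X_shifted)
tv_per_feature = 1.0 - histogram_overlap_per_feature(X_original, X_shifted)
```

Comparing a sample with itself gives the maximum score, while the shifted copy scores much lower and has a correspondingly large per-feature TV distance.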

Algorithms

| Class | Algorithm | Description |
|---|---|---|
| NHOP | | Normalised Histogram Overlap Percentage metric |
| GeometricSeedSelector | Alg. 2 | Geometry-preserving seed selection (NHOP + AGTP + JSD + Z) |
| GVMCO | Alg. 1 | Gravity-biased Von Mises Circular Oversampling |
| LRECO | Alg. 2 | Local Region Estimation Circular Oversampling (Voronoi-constrained) |
| LSCO | Alg. 3 | Layered Segmental Circular Oversampling |

All oversamplers are compatible with imbalanced-learn pipelines and scikit-learn cross-validation.

Key parameters

cc.GVMCO(
    n_clusters=5,       # K-Means clusters on minority class
    k_neighbors=5,      # k-NN graph for circle formation
    kappa_max=4.0,      # max Von Mises concentration
    use_pca=True,       # False = native-dimension mode
    random_state=42,
)

cc.NHOP(n_bins=30)      # histogram bins B (default 30, stable range: 20-50)
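
The role of a von Mises concentration such as `kappa_max` can be illustrated with a small 2-D sketch. This is not the GVM-CO algorithm: the gravity bias, K-Means clustering, and k-NN circle formation are not reproduced here, and the function `sample_on_circle` and its arguments are hypothetical. It only assumes a circle whose diameter joins two minority points, with angles drawn by NumPy's `Generator.vonmises`.

```python
import numpy as np


def sample_on_circle(p, q, n_samples, kappa=4.0, rng=None):
    """Draw 2-D points on the circle whose diameter is the segment p-q.

    Angles follow a von Mises distribution centred on the direction of p,
    so a larger kappa concentrates the samples near p.
    Hypothetical helper, for illustration only.
    """
    rng = np.random.default_rng(rng)
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    centre = (p + q) / 2.0
    radius = np.linalg.norm(p - q) / 2.0
    # Angle of p as seen from the circle centre.
    mu = np.arctan2(p[1] - centre[1], p[0] - centre[0])
    theta = rng.vonmises(mu, kappa, size=n_samples)
    return centre + radius * np.column_stack([np.cos(theta), np.sin(theta)])


p, q = np.array([0.0, 0.0]), np.array([2.0, 0.0])
points = sample_on_circle(p, q, n_samples=200, kappa=4.0, rng=42)

# Every sample lies on the circle; most of them sit closer to p than to q.
dist_to_centre = np.linalg.norm(points - (p + q) / 2.0, axis=1)
mean_dist_p = np.linalg.norm(points - p, axis=1).mean()
mean_dist_q = np.linalg.norm(points - q, axis=1).mean()
```

With a small `kappa` the angles spread almost uniformly around the circle; raising it pulls the synthetic points toward the anchor point, which is one way to read a cap like `kappa_max`.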
