Skip to main content

Supervised clustering

Project description

rh

Supervised clustering

To install: pip install rh

Overview

The rh package provides tools for supervised clustering, integrating label information into the clustering process. This is particularly useful when you have labeled data and you want to ensure that the clusters are formed considering these labels. The package includes classes and functions that extend the functionality of scikit-learn's clustering algorithms, allowing for more nuanced clustering strategies that can be tailored based on the characteristics of the data and the labels.

Main Features

  • SupervisedKMeans: A class that performs K-Means clustering in a supervised manner, where the number of clusters per class can vary based on the distribution of data points among the classes.
  • SeperateClassKMeans: A class that fits a separate KMeans clustering to each class found in the dataset, with options to adjust the number of clusters per class based on different strategies such as volume (number of points) or inertia (within-cluster sum of squares).
  • Utility Functions: Functions like _choose_class_weights and _choose_distribution_according_to_weights help in determining the number of clusters for each class based on specified criteria.

Installation

You can install the rh package using pip:

pip install rh

Usage Examples

SupervisedKMeans

from rh import SupervisedKMeans
import numpy as np

# Example data
X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])
y = np.array([0, 0, 0, 1, 1, 1])

# Initialize and fit the model
model = SupervisedKMeans(n_clusters=2)
model.fit(X, y)

# Predict new data
print(model.predict(np.array([[1, 1], [10, 3]])))

SeperateClassKMeans

from rh import SeperateClassKMeans
import numpy as np

# Example data
X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])
y = np.array([0, 0, 0, 1, 1, 1])

# Initialize and fit the model
model = SeperateClassKMeans(n_clusters=2, method='volume')
model.fit(X, y)

# Access cluster centers
print(model.cluster_centers_)

Documentation

Classes

  • SupervisedKMeans: Clusters data by first fitting a KMeans model to the most frequent classes until the number of clusters is exhausted. It then assigns new data points to these clusters or to the nearest cluster if the class was not seen during training.
  • SeperateClassKMeans: Fits a separate KMeans model to each class in the dataset. The number of clusters for each class can be specified or automatically determined based on the method chosen (volume or inertia).

Functions

  • y_idx_dict(y): Creates a dictionary mapping each unique label in y to the indices of samples with that label.
  • kmeans_per_y_dict(X, y, y_idx=None): Fits a KMeans model to the data points in X corresponding to each unique label in y.
  • _choose_class_weights(X, y, n_clusters, method, clusterer): Determines the number of clusters for each class based on the specified method.
  • _choose_distribution_according_to_weights(weights, total_int_to_distribute): Distributes a total integer amount proportionally across items based on their weights.

Contributing

Contributions to the rh package are welcome. Please ensure that any pull requests or issues are relevant to supervised clustering enhancements or bug fixes.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

rh-0.0.5.tar.gz (9.4 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

rh-0.0.5-py3-none-any.whl (9.7 kB view details)

Uploaded Python 3

File details

Details for the file rh-0.0.5.tar.gz.

File metadata

  • Download URL: rh-0.0.5.tar.gz
  • Upload date:
  • Size: 9.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.13

File hashes

Hashes for rh-0.0.5.tar.gz
Algorithm Hash digest
SHA256 f3ddcf6cfd58b2ed2eb54f7aaadc56b75c45623d20d47319af99b91cc1e06344
MD5 fe855da82f2acc2c468d4c49eded2b18
BLAKE2b-256 c46112f9998627393bdc032588da3b2d9f27100e2cc34dbd36b5bc3c8f6d428d

See more details on using hashes here.

File details

Details for the file rh-0.0.5-py3-none-any.whl.

File metadata

  • Download URL: rh-0.0.5-py3-none-any.whl
  • Upload date:
  • Size: 9.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.13

File hashes

Hashes for rh-0.0.5-py3-none-any.whl
Algorithm Hash digest
SHA256 422e21cefe4caa5cbd87cfd7793ac816f1836c41129b94d3f6a1773a79ac8b9a
MD5 b46177e26d26c5a64012b3336bf05c8c
BLAKE2b-256 12d14c478dc45ff71d94397756b9c6e33eb1347b71e1d422ec3d43874b48e74b

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page