A simple package for merging one-dimensional data with unsupervised methods

Unsupervised Merge

A simple Python package for one-dimensional data clustering, implementing various clustering algorithms including traditional and novel approaches.

Installation

Install the package using pip:

pip install usmerge

Features

This package provides multiple one-dimensional clustering methods:

  • Equal Width Binning (equal_wid_merge)
  • Equal Frequency Binning (equal_fre_merge)
  • K-means Clustering (kmeans_merge)
  • SOM-K Clustering (som_k_merge)
  • Fuzzy C-Means (fcm_merge)
  • Kernel Density Based (kernel_density_merge)
  • Information Theoretic (information_merge)
  • Gaussian Mixture (gaussian_mixture_merge)
  • Hierarchical Density (hierarchical_density_merge)

Usage

Data Format

The package accepts various input formats:

  • pandas Series/DataFrame
  • numpy array
  • Python list/tuple
  • Any iterable of numbers
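
As an illustration of what accepting these formats can involve, the sketch below coerces each of them to a flat float array with NumPy. The helper name `to_1d_array` is hypothetical and is not part of usmerge; the package's own input handling may differ.

```python
import numpy as np

def to_1d_array(data):
    """Hypothetical helper (not part of usmerge): coerce input to a flat float array."""
    # Lists, tuples, numpy arrays and pandas Series/DataFrames all have a length
    # and can be passed to np.asarray directly. Generic iterables such as
    # generators are materialized into a list first.
    if not hasattr(data, "__len__"):
        data = list(data)
    arr = np.asarray(data, dtype=float).ravel()
    if arr.size == 0:
        raise ValueError("data must contain at least one value")
    return arr

print(to_1d_array([1, 2, 3]))                    # list
print(to_1d_array((0.5, 1.5)))                   # tuple
print(to_1d_array(x / 2 for x in range(4)))      # generator
```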

Basic Usage Examples

  1. Equal Width Binning:
from usmerge import equal_wid_merge
labels, edges = equal_wid_merge(data, n=3)
  2. Equal Frequency Binning:
from usmerge import equal_fre_merge
labels, edges = equal_fre_merge(data, n=3)
  3. K-means Clustering:
from usmerge import kmeans_merge
labels, edges = kmeans_merge(data, n=3, max_iter=100)
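
To make the difference between the two binning strategies concrete, here is a minimal NumPy sketch of the general techniques. It is not the usmerge implementation, and the function names are hypothetical; it assumes bin boundaries that include the minimum and maximum of the data.

```python
import numpy as np

def equal_width_edges(values, n):
    """Boundaries of n bins that each span the same value range."""
    values = np.asarray(values, dtype=float)
    return np.linspace(values.min(), values.max(), n + 1)

def equal_frequency_edges(values, n):
    """Boundaries of n bins that each hold roughly the same number of points."""
    values = np.asarray(values, dtype=float)
    return np.quantile(values, np.linspace(0, 1, n + 1))

def assign_labels(values, edges):
    """Label each value with the index (0..n-1) of the bin it falls into."""
    values = np.asarray(values, dtype=float)
    # Use only the interior edges so the maximum lands in the last bin.
    return np.digitize(values, edges[1:-1])

data = [1, 2, 3, 4, 5, 6, 7, 8, 100]
width_edges = equal_width_edges(data, 3)
freq_edges = equal_frequency_edges(data, 3)
print(assign_labels(data, width_edges))  # [0 0 0 0 0 0 0 0 2]
print(assign_labels(data, freq_edges))   # [0 0 0 1 1 1 2 2 2]
```

With the outlier 100 in the data, equal-width binning leaves the middle bin empty, while equal-frequency binning still places three points in each bin.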

Advanced Usage

  1. SOM-K Clustering:
from usmerge import som_k_merge
labels, edges = som_k_merge(data, n=3, sigma=0.5, learning_rate=0.5, epochs=1000)
  2. Fuzzy C-Means:
from usmerge import fcm_merge
labels, edges = fcm_merge(data, n=3, m=2.0, max_iter=100, epsilon=1e-6)
  3. Kernel Density Based:
from usmerge import kernel_density_merge
labels, edges = kernel_density_merge(data, n=3, bandwidth=None)
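
For intuition about the kernel density approach, the sketch below shows one common way to derive boundaries from a density estimate: evaluate a Gaussian KDE on a grid and split at its local minima. This is an assumption about the general idea, not the algorithm usmerge uses; the function name and the `grid_size` parameter are hypothetical, and the bandwidth must be given explicitly here.

```python
import numpy as np

def kde_valley_boundaries(values, bandwidth, grid_size=512):
    """Hypothetical sketch: boundaries at local minima of a Gaussian KDE."""
    values = np.asarray(values, dtype=float)
    grid = np.linspace(values.min(), values.max(), grid_size)
    # Gaussian kernel density estimate evaluated at every grid point.
    z = (grid[:, None] - values[None, :]) / bandwidth
    density = np.exp(-0.5 * z**2).sum(axis=1)
    density /= len(values) * bandwidth * np.sqrt(2 * np.pi)
    # A grid point is a valley if it is strictly lower than both neighbours.
    interior = density[1:-1]
    is_valley = (interior < density[:-2]) & (interior < density[2:])
    return grid[1:-1][is_valley]

# Three evenly spaced synthetic groups centred at 0, 5 and 10.
data = np.concatenate([
    np.linspace(-0.5, 0.5, 20),
    np.linspace(4.5, 5.5, 20),
    np.linspace(9.5, 10.5, 20),
])
boundaries = kde_valley_boundaries(data, bandwidth=0.5)
print(boundaries)  # two values, close to 2.5 and 7.5
```

A smaller bandwidth follows the data more closely and tends to produce more valleys; a larger one smooths neighbouring groups together.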

Return Values

All clustering methods return two values:

  • labels: List of cluster labels for each data point
  • edges: List of cluster boundaries
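
The labels can be combined with the original data to inspect each cluster. The sketch below uses hand-written labels in place of a real clustering call and assumes integer labels; `summarize_clusters` is a hypothetical helper, not a usmerge function.

```python
import numpy as np

def summarize_clusters(data, labels):
    """Hypothetical helper: size and value range of each cluster."""
    data = np.asarray(data, dtype=float)
    labels = np.asarray(labels)
    summary = {}
    for label in np.unique(labels):
        members = data[labels == label]
        summary[int(label)] = {
            "count": int(members.size),
            "min": float(members.min()),
            "max": float(members.max()),
        }
    return summary

data = [0.1, 0.2, 5.0, 5.1, 9.9]
labels = [0, 0, 1, 1, 2]  # illustrative labels, standing in for a clustering result
for label, stats in summarize_clusters(data, labels).items():
    print(label, stats)
```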

Example Analysis

import numpy as np
import matplotlib.pyplot as plt
from usmerge import som_k_merge, fcm_merge, kmeans_merge, hierarchical_density_merge

# Generate synthetic data with three clear clusters
np.random.seed(42)
data = np.concatenate([
    np.random.normal(0, 0.3, 50),    # First cluster
    np.random.normal(5, 0.4, 50),    # Second cluster
    np.random.normal(10, 0.3, 50)    # Third cluster
])

# Compare different clustering methods
methods = {
    'SOM-K': som_k_merge(data, n=3, sigma=0.5, learning_rate=0.5, epochs=1000),
    'FCM': fcm_merge(data, n=3, m=2.0, max_iter=100),
    'K-means': kmeans_merge(data, n=3),
    'Hierarchical Density': hierarchical_density_merge(data, n=3)
}

# Visualize results
plt.figure(figsize=(15, 4))
for i, (name, (labels, edges)) in enumerate(methods.items(), 1):
    plt.subplot(1, 4, i)
    plt.scatter(data, np.zeros_like(data), c=labels, cmap='viridis')
    plt.title(f'{name} Clustering')
    # Plot cluster boundaries
    for edge in edges:
        plt.axvline(x=edge, color='r', linestyle='--', alpha=0.5)
    plt.ylim(-0.5, 0.5)

plt.tight_layout()
plt.show()

Parameters Guide

Each clustering method has its own set of parameters:

  • SOM-K: sigma (neighborhood size), learning_rate (update step size), epochs (number of training iterations)
  • FCM: m (fuzziness exponent), max_iter (maximum iterations), epsilon (convergence threshold)
  • Kernel Density: bandwidth (kernel width)
  • Information Theoretic: alpha (compression-accuracy trade-off)
  • Gaussian Mixture: max_iter (maximum iterations), epsilon (convergence threshold)
  • Hierarchical Density: min_cluster_size (minimum points per cluster)
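
To show what a parameter such as the FCM fuzziness m controls, the sketch below computes the standard fuzzy c-means membership update for fixed cluster centers. It illustrates the textbook formula only and is not taken from the usmerge source; `fcm_memberships` is a hypothetical name.

```python
import numpy as np

def fcm_memberships(values, centers, m=2.0):
    """Textbook FCM membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1))."""
    values = np.asarray(values, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # Distance of every point to every center, shape (n_points, n_centers).
    dist = np.abs(values[:, None] - centers[None, :])
    dist = np.maximum(dist, 1e-12)  # guard against division by zero
    # ratio[i, j, k] = (d_ij / d_ik) ** (2 / (m - 1))
    ratio = (dist[:, :, None] / dist[:, None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=2)

centers = [0.0, 10.0]
point = [2.0]
for m in (1.5, 2.0, 4.0):
    # Each row holds the point's membership in every cluster and sums to 1.
    print(m, fcm_memberships(point, centers, m=m))
```

As m grows, the membership of the point in its nearest cluster drops and the assignment becomes softer; values of m close to 1 approach a hard assignment.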

Contributing

Feel free to contribute to this project by submitting issues or pull requests.

License

MIT License
