A simple package for merging one-dimensional data with unsupervised methods

Unsupervised Merge

A simple Python package for clustering one-dimensional data. It implements a range of algorithms, from classic binning and k-means to density-based and information-theoretic methods.

Installation

Install the package using pip:

pip install usmerge

Features

This package provides multiple one-dimensional clustering methods:

  • Equal Width Binning (equal_wid_merge)
  • Equal Frequency Binning (equal_fre_merge)
  • K-means Clustering (kmeans_merge)
  • SOM-K Clustering (som_k_merge)
  • Fuzzy C-Means (fcm_merge)
  • Kernel Density Based (kernel_density_merge)
  • Information Theoretic (information_merge)
  • Gaussian Mixture (gaussian_mixture_merge)
  • Hierarchical Density (hierarchical_density_merge)
  • Jenks Natural Breaks (jenks_breaks_merge)
  • Quantile-based (quantile_merge)
  • DBSCAN (dbscan_1d_merge)

Usage

Data Format

The package accepts various input formats:

  • pandas Series/DataFrame
  • numpy array
  • Python list/tuple
  • Any iterable of numbers
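
All of the formats listed above are iterables of numbers, so they can be reduced to one flat list before clustering. A minimal sketch of that idea follows; `to_float_list` is a hypothetical helper written for illustration and is not part of usmerge, whose own input handling may differ.

```python
from collections.abc import Iterable


def to_float_list(values):
    """Flatten an iterable of numbers into a plain list of floats.

    Hypothetical helper for illustration only; not a usmerge function.
    """
    if isinstance(values, (str, bytes)) or not isinstance(values, Iterable):
        raise TypeError("expected an iterable of numbers")
    return [float(v) for v in values]


print(to_float_list([1, 2, 3]))                 # list
print(to_float_list((0.5, 1.5)))                # tuple
print(to_float_list(x * 2 for x in range(3)))   # generator
```

Note that iterating over a pandas DataFrame yields its column labels rather than its values, so with a helper like this one a single column would have to be selected first.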

Basic Usage Examples

  1. Equal Width Binning:
from usmerge import equal_wid_merge
labels, edges = equal_wid_merge(data, n=3)
  2. Equal Frequency Binning:
from usmerge import equal_fre_merge
labels, edges = equal_fre_merge(data, n=3)
  3. K-means Clustering:
from usmerge import kmeans_merge
labels, edges = kmeans_merge(data, n=3, max_iter=100)
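
To show how the first two methods differ, here is a minimal pure-Python sketch of equal-width and equal-frequency binning. It is an illustration of the general techniques, not the usmerge implementation: the function names are hypothetical, and the choice to include the minimum and maximum in the returned edges is an assumption.

```python
def equal_width_edges(data, n):
    """Return n + 1 edges that split [min, max] into n equal-width bins."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / n
    return [lo + i * width for i in range(n)] + [hi]


def equal_frequency_edges(data, n):
    """Return n + 1 edges so each bin holds roughly len(data) / n points."""
    ordered = sorted(data)
    size = len(ordered)
    # Inner edges sit at the values found at 1/n, 2/n, ... of the sorted data.
    inner = [ordered[(i * size) // n] for i in range(1, n)]
    return [ordered[0]] + inner + [ordered[-1]]


data = [1, 2, 3, 4, 5, 6, 7, 8, 100]   # one outlier at 100
print(equal_width_edges(data, 3))       # [1.0, 34.0, 67.0, 100]
print(equal_frequency_edges(data, 3))   # [1, 4, 7, 100]
```

With the outlier present, equal-width binning puts eight of the nine points into the first bin, while equal-frequency binning keeps three points per bin.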

Advanced Usage

  1. SOM-K Clustering:
from usmerge import som_k_merge
labels, edges = som_k_merge(data, n=3, sigma=0.5, learning_rate=0.5, epochs=1000)
  2. Fuzzy C-Means:
from usmerge import fcm_merge
labels, edges = fcm_merge(data, n=3, m=2.0, max_iter=100, epsilon=1e-6)
  3. Kernel Density Based:
from usmerge import kernel_density_merge
labels, edges = kernel_density_merge(data, n=3, bandwidth=None)
  4. Jenks Natural Breaks:
from usmerge import jenks_breaks_merge
labels, edges = jenks_breaks_merge(data, n=3)
  5. Quantile-based Clustering:
from usmerge import quantile_merge
labels, edges = quantile_merge(data, n=3)
  6. DBSCAN Clustering:
from usmerge import dbscan_1d_merge
labels, edges = dbscan_1d_merge(data, n=3, min_samples=3)
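
To make the roles of `m`, `max_iter` and `epsilon` in the Fuzzy C-Means call concrete, the following is a textbook fuzzy c-means loop for 1-D data in pure Python. It is a sketch of the standard algorithm, not usmerge's code: the function name `fcm_1d`, the evenly spaced starting centers, and the return value are all assumptions made for illustration.

```python
def fcm_1d(data, n, m=2.0, max_iter=100, epsilon=1e-6):
    """Return (centers, memberships) from a basic 1-D fuzzy c-means loop.

    m must be greater than 1; larger values give softer memberships.
    """
    lo, hi = min(data), max(data)
    # Assumed initialization: spread the centers evenly over the data range.
    centers = [lo + (hi - lo) * (k + 0.5) / n for k in range(n)]
    power = 2.0 / (m - 1.0)

    for _ in range(max_iter):
        # Membership of each point in each cluster, from relative distances.
        memberships = []
        for x in data:
            dists = [abs(x - c) for c in centers]
            if 0.0 in dists:
                # The point sits exactly on a center: full membership there.
                hits = [1.0 if d == 0.0 else 0.0 for d in dists]
                row = [h / sum(hits) for h in hits]
            else:
                row = [1.0 / sum((d / other) ** power for other in dists)
                       for d in dists]
            memberships.append(row)

        # New centers are means weighted by membership raised to the power m.
        new_centers = []
        for k in range(n):
            weights = [row[k] ** m for row in memberships]
            total = sum(w * x for w, x in zip(weights, data))
            new_centers.append(total / sum(weights))

        # Stop once no center moved by epsilon or more.
        shift = max(abs(a - b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < epsilon:
            break

    return centers, memberships


data = [0.0, 0.2, 0.1, 5.0, 5.1, 4.9, 10.0, 10.2, 9.9]
centers, memberships = fcm_1d(data, n=3)
# Hard labels: pick the cluster with the highest membership for each point.
labels = [row.index(max(row)) for row in memberships]
print([round(c, 2) for c in centers])
print(labels)
```

Each row of `memberships` sums to 1, which is what distinguishes fuzzy c-means from k-means: a point between two clusters is shared between them instead of being assigned to exactly one.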

Return Values

All clustering methods return two values:

  • labels: List of cluster labels for each data point
  • edges: List of cluster boundaries
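
One practical use of `edges` is to assign new values to the existing clusters without re-running the algorithm. The sketch below assumes that `edges` is a sorted list that includes both outer ends and that a value lying exactly on a boundary belongs to the upper cluster; the source does not specify either convention, and `label_from_edges` is a hypothetical helper, not a usmerge function.

```python
from bisect import bisect_right


def label_from_edges(value, edges):
    """Return the 0-based bin index of value for sorted edges [e0, ..., en]."""
    inner = edges[1:-1]   # boundaries between neighbouring clusters
    return bisect_right(inner, value)


edges = [0.0, 2.5, 7.5, 10.0]   # hypothetical boundaries for three clusters
print([label_from_edges(v, edges) for v in [0.3, 2.5, 6.0, 9.9]])   # [0, 1, 1, 2]
```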

Example Analysis

import numpy as np
import matplotlib.pyplot as plt
from usmerge import som_k_merge, fcm_merge, kmeans_merge, hierarchical_density_merge, dbscan_1d_merge

# Generate synthetic data with three clear clusters
np.random.seed(42)
data = np.concatenate([
    np.random.normal(0, 0.3, 50),    # First cluster
    np.random.normal(5, 0.4, 50),    # Second cluster
    np.random.normal(10, 0.3, 50)    # Third cluster
])

# Compare different clustering methods
methods = {
    'SOM-K': som_k_merge(data, n=3, sigma=0.5, learning_rate=0.5, epochs=1000),
    'FCM': fcm_merge(data, n=3, m=2.0, max_iter=100),
    'K-means': kmeans_merge(data, n=3),
    'DBSCAN': dbscan_1d_merge(data, n=3, min_samples=3),
    'Hierarchical Density': hierarchical_density_merge(data, n=3)
}

# Visualize results
plt.figure(figsize=(15, 5))
for i, (name, (labels, edges)) in enumerate(methods.items(), 1):
    plt.subplot(1, 5, i)
    plt.scatter(data, np.zeros_like(data), c=labels, cmap='viridis')
    plt.title(f'{name} Clustering')
    # Plot cluster boundaries
    for edge in edges:
        plt.axvline(x=edge, color='r', linestyle='--', alpha=0.5)
    plt.ylim(-0.5, 0.5)

plt.tight_layout()
plt.show()
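
Alongside the plots, a numeric summary makes it easier to compare methods. The following sketch counts the points in each cluster and reports each cluster's value range. `summarize_clusters` is a hypothetical helper, and the data and labels are made-up stand-ins for the `(labels, edges)` output of any method above.

```python
from collections import defaultdict


def summarize_clusters(data, labels):
    """Return {label: (count, min, max)} for each cluster label."""
    groups = defaultdict(list)
    for value, label in zip(data, labels):
        groups[label].append(value)
    return {
        label: (len(values), min(values), max(values))
        for label, values in sorted(groups.items())
    }


# Hypothetical data and labels, for illustration only.
data = [0.1, 0.3, 5.2, 4.8, 9.9, 10.1]
labels = [0, 0, 1, 1, 2, 2]
for label, (count, lo, hi) in summarize_clusters(data, labels).items():
    print(f"cluster {label}: {count} points in [{lo}, {hi}]")
```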

Parameters Guide

Each clustering method has its own set of parameters:

  • SOM-K: sigma (neighborhood size), learning_rate (update step size), epochs (number of training iterations)
  • FCM: m (fuzziness exponent), max_iter (maximum iterations), epsilon (convergence threshold)
  • Kernel Density: bandwidth (kernel width)
  • Information Theoretic: alpha (compression-accuracy trade-off)
  • Gaussian Mixture: max_iter (maximum iterations), epsilon (convergence threshold)
  • Hierarchical Density: min_cluster_size (minimum points per cluster)
  • Jenks Natural Breaks: requires only the number of clusters (n)
  • Quantile-based: requires only the number of clusters (n)
  • DBSCAN: n (target number of clusters), eps (optional neighborhood size), min_samples (minimum points in cluster), max_iter (maximum iterations for eps adjustment)
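
To illustrate how `eps` and `min_samples` interact, here is a pure-Python sketch of plain DBSCAN specialized to 1-D data. It is not the usmerge implementation: the function name `dbscan_1d` is hypothetical, `eps` is fixed by the caller, and the adjustment of `eps` toward a target number of clusters mentioned above is left out.

```python
from bisect import bisect_left, bisect_right


def dbscan_1d(data, eps, min_samples):
    """Label 1-D points with DBSCAN; noise points get the label -1."""
    order = sorted(range(len(data)), key=lambda i: data[i])
    xs = [data[i] for i in order]

    # A point is a core point if at least min_samples points (itself
    # included) lie within eps of it.
    is_core = [
        bisect_right(xs, x + eps) - bisect_left(xs, x - eps) >= min_samples
        for x in xs
    ]

    # In sorted order, consecutive core points share a cluster exactly
    # when the gap between them is at most eps.
    sorted_labels = [-1] * len(xs)
    cluster, last_core = -1, None
    for k, x in enumerate(xs):
        if is_core[k]:
            if last_core is None or x - xs[last_core] > eps:
                cluster += 1
            sorted_labels[k] = cluster
            last_core = k

    # Border points join the nearest core point's cluster if it is within eps.
    core_idx = [k for k in range(len(xs)) if is_core[k]]
    core_xs = [xs[k] for k in core_idx]
    for k, x in enumerate(xs):
        if is_core[k] or not core_xs:
            continue
        j = bisect_left(core_xs, x)
        near = [c for c in (j - 1, j) if 0 <= c < len(core_xs)]
        best = min(near, key=lambda c: abs(core_xs[c] - x))
        if abs(core_xs[best] - x) <= eps:
            sorted_labels[k] = sorted_labels[core_idx[best]]

    # Restore the original input order.
    labels = [-1] * len(data)
    for k, i in enumerate(order):
        labels[i] = sorted_labels[k]
    return labels


data = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 20.0]
print(dbscan_1d(data, eps=0.5, min_samples=3))   # [0, 0, 0, 1, 1, 1, -1]
```

The isolated point at 20.0 has no neighbors within `eps`, so it is reported as noise. Raising `min_samples` above 3 in this example would turn every point into noise, since no point would have enough neighbors to count as a core point.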

Contributing

Feel free to contribute to this project by submitting issues or pull requests.

License

MIT License

