A simple package for merging one-dimensional data using unsupervised methods
Unsupervised Merge
A simple Python package for clustering one-dimensional data. It implements both traditional and novel clustering algorithms.
Installation
Install the package using pip:
pip install usmerge
Features
This package provides multiple one-dimensional clustering methods:
- Equal Width Binning (equal_wid_merge)
- Equal Frequency Binning (equal_fre_merge)
- K-means Clustering (kmeans_merge)
- SOM-K Clustering (som_k_merge)
- Fuzzy C-Means (fcm_merge)
- Kernel Density Based (kernel_density_merge)
- Information Theoretic (information_merge)
- Gaussian Mixture (gaussian_mixture_merge)
- Hierarchical Density (hierarchical_density_merge)
- Jenks Natural Breaks (jenks_breaks_merge)
- Quantile-based (quantile_merge)
- DBSCAN (dbscan_1d_merge)
Usage
Data Format
The package accepts various input formats:
- pandas Series/DataFrame
- numpy array
- Python list/tuple
- Any iterable of numbers
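To make the list above concrete, here is a minimal sketch of how such inputs could be reduced to one flat sequence of floats. The helper name `to_float_list` is hypothetical and is not part of usmerge; how the package normalizes its input internally is not documented here.

```python
def to_float_list(data):
    """Flatten array-like input into a plain list of floats (illustrative only)."""
    # pandas Series/DataFrame objects expose to_numpy()
    if hasattr(data, "to_numpy"):
        data = data.to_numpy()
    # numpy arrays expose ravel(), which flattens to one dimension
    if hasattr(data, "ravel"):
        data = data.ravel()
    # Lists, tuples, generators and other iterables are consumed directly
    return [float(x) for x in data]


print(to_float_list((0.5, 2)))                    # [0.5, 2.0]
print(to_float_list(x * x for x in range(3)))     # [0.0, 1.0, 4.0]
```

The duck-typed checks keep the sketch free of hard dependencies on pandas or numpy.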
Basic Usage Examples
- Equal Width Binning:
from usmerge import equal_wid_merge
labels, edges = equal_wid_merge(data, n=3)
- Equal Frequency Binning:
from usmerge import equal_fre_merge
labels, edges = equal_fre_merge(data, n=3)
- K-means Clustering:
from usmerge import kmeans_merge
labels, edges = kmeans_merge(data, n=3, max_iter=100)
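The two binning methods at the top of this list differ only in where they place boundaries. The sketch below illustrates the general techniques in plain Python; it is not the package's code, the function names are hypothetical, and it assumes the boundaries include the minimum and maximum of the data, which usmerge may or may not do.

```python
def equal_width_edges(values, n):
    """Split the range [min, max] into n intervals of identical width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n
    return [lo + i * width for i in range(n + 1)]


def equal_frequency_edges(values, n):
    """Place boundaries so each interval holds roughly the same number of points."""
    xs = sorted(values)
    # Boundary i sits at the value whose rank is i/n of the way through the data
    return [xs[min(i * len(xs) // n, len(xs) - 1)] for i in range(n + 1)]


skewed = [1, 2, 3, 4, 100, 200]
print(equal_width_edges([0, 3, 9], 3))       # [0.0, 3.0, 6.0, 9.0]
print(equal_frequency_edges(skewed, 3))      # [1, 3, 100, 200]
```

On skewed data, equal-width boundaries leave most points in the first interval, while equal-frequency boundaries follow the data's ranks.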
Advanced Usage
- SOM-K Clustering:
from usmerge import som_k_merge
labels, edges = som_k_merge(data, n=3, sigma=0.5, learning_rate=0.5, epochs=1000)
- Fuzzy C-Means:
from usmerge import fcm_merge
labels, edges = fcm_merge(data, n=3, m=2.0, max_iter=100, epsilon=1e-6)
- Kernel Density Based:
from usmerge import kernel_density_merge
labels, edges = kernel_density_merge(data, n=3, bandwidth=None)
- Jenks Natural Breaks:
from usmerge import jenks_breaks_merge
labels, edges = jenks_breaks_merge(data, n=3)
- Quantile-based Clustering:
from usmerge import quantile_merge
labels, edges = quantile_merge(data, n=3)
- DBSCAN Clustering:
from usmerge import dbscan_1d_merge
labels, edges = dbscan_1d_merge(data, n=3, min_samples=3)
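For the Fuzzy C-Means call above, `m` controls how soft the cluster assignments are. The following sketch applies the standard fuzzy c-means membership formula to a single point; the function name `fcm_memberships` is hypothetical, and whether `fcm_merge` computes memberships exactly this way is an assumption.

```python
def fcm_memberships(x, centers, m=2.0):
    """Standard fuzzy c-means membership of point x in each center (requires m > 1)."""
    dists = [abs(x - c) for c in centers]
    # A point lying exactly on one or more centers is shared among those centers only
    hits = [d == 0.0 for d in dists]
    if any(hits):
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    # u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1))
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]


# The same point becomes less firmly attached to its nearest center as m grows
print(fcm_memberships(1.0, [0.0, 4.0], m=2.0))
print(fcm_memberships(1.0, [0.0, 4.0], m=3.0))
```

Memberships always sum to 1, and values of `m` close to 1 approach the hard assignments of K-means.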
Return Values
All clustering methods return two values:
- labels: List of cluster labels for each data point
- edges: List of cluster boundaries
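One way to see how the two return values relate is to recompute labels from boundaries. This is a sketch under assumptions: it treats the boundaries as a sorted list of interior cut points (without the data's minimum and maximum), which may differ from what usmerge actually returns, and `labels_from_edges` is a hypothetical helper.

```python
from bisect import bisect_right


def labels_from_edges(values, inner_edges):
    """Assign each value the index of the interval it falls in."""
    # bisect_right counts how many boundaries are <= the value,
    # so a value equal to a boundary goes to the upper interval
    return [bisect_right(inner_edges, v) for v in values]


print(labels_from_edges([0.1, 2.5, 5.0, 9.9], [2.5, 7.5]))  # [0, 1, 1, 2]
```

A helper like this is handy for applying boundaries learned on one sample to new data.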
Example Analysis
import numpy as np
import matplotlib.pyplot as plt
from usmerge import som_k_merge, fcm_merge, kmeans_merge, hierarchical_density_merge, dbscan_1d_merge
# Generate synthetic data with three clear clusters
np.random.seed(42)
data = np.concatenate([
    np.random.normal(0, 0.3, 50),   # First cluster
    np.random.normal(5, 0.4, 50),   # Second cluster
    np.random.normal(10, 0.3, 50)   # Third cluster
])
# Compare different clustering methods
methods = {
    'SOM-K': som_k_merge(data, n=3, sigma=0.5, learning_rate=0.5, epochs=1000),
    'FCM': fcm_merge(data, n=3, m=2.0, max_iter=100),
    'K-means': kmeans_merge(data, n=3),
    'DBSCAN': dbscan_1d_merge(data, n=3, min_samples=3),
    'Hierarchical Density': hierarchical_density_merge(data, n=3)
}
# Visualize results, one subplot per method
plt.figure(figsize=(15, 5))
for i, (name, (labels, edges)) in enumerate(methods.items(), 1):
    plt.subplot(1, 5, i)
    plt.scatter(data, np.zeros_like(data), c=labels, cmap='viridis')
    plt.title(f'{name} Clustering')
    # Plot cluster boundaries
    for edge in edges:
        plt.axvline(x=edge, color='r', linestyle='--', alpha=0.5)
    plt.ylim(-0.5, 0.5)
plt.tight_layout()
plt.show()
Parameters Guide
Each clustering method has its own set of parameters:
- SOM-K: sigma (neighborhood size), learning_rate (learning rate), epochs (iterations)
- FCM: m (fuzziness), max_iter, epsilon (convergence threshold)
- Kernel Density: bandwidth (kernel width)
- Information Theoretic: alpha (compression-accuracy trade-off)
- Gaussian Mixture: max_iter, epsilon (convergence threshold)
- Hierarchical Density: min_cluster_size (minimum points per cluster)
- Jenks Natural Breaks: Only requires number of clusters
- Quantile-based: Only requires number of clusters
- DBSCAN: n (target number of clusters), eps (optional neighborhood size), min_samples (minimum points in cluster), max_iter (maximum iterations for eps adjustment)
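The DBSCAN entry pairs a target cluster count with an iteration limit for adjusting eps. One plausible reading is a search over eps until the requested number of clusters appears. The sketch below illustrates that idea only: it is not the package's algorithm, both function names are hypothetical, and it uses plain gap-based splitting, a simplification in which every point counts as a core point, so `min_samples` is ignored.

```python
def gap_clusters(sorted_values, eps):
    """Label sorted 1-D values, starting a new cluster wherever a gap exceeds eps."""
    labels, current = [], 0
    for i, v in enumerate(sorted_values):
        if i > 0 and v - sorted_values[i - 1] > eps:
            current += 1
        labels.append(current)
    return labels


def find_eps(values, n, max_iter=50):
    """Bisect on eps until gap_clusters yields n clusters or max_iter is reached."""
    xs = sorted(values)
    lo, hi = 0.0, xs[-1] - xs[0]
    eps = hi
    for _ in range(max_iter):
        eps = (lo + hi) / 2
        k = gap_clusters(xs, eps)[-1] + 1
        if k == n:
            break
        if k > n:   # too many clusters, so eps is too small
            lo = eps
        else:       # too few clusters, so eps is too large
            hi = eps
    return eps


values = [0, 0.1, 0.2, 5, 5.1, 5.2, 10, 10.1, 10.2]
eps = find_eps(values, n=3)
print(gap_clusters(sorted(values), eps))  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

Bisection works here because the number of clusters never increases as eps grows. Some cluster counts can be unreachable (for example when several gaps are equal), in which case the loop simply stops after `max_iter` steps.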
Contributing
Feel free to contribute to this project by submitting issues or pull requests.
License
MIT License
File details
Details for the file usmerge-0.2.1.tar.gz.
File metadata
- Download URL: usmerge-0.2.1.tar.gz
- Upload date:
- Size: 13.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.9.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 893fc3d08a7f75ed682027829c5741b27e049cda2857f465d078377b8a3c3c1d |
| MD5 | 4d901a349965f4afcc0ecd217b1d309a |
| BLAKE2b-256 | 40ce7351a633651c31ba33339dc31cb0bb26cd66536ea3c72a9de8183d541b7a |
File details
Details for the file usmerge-0.2.1-py3-none-any.whl.
File metadata
- Download URL: usmerge-0.2.1-py3-none-any.whl
- Upload date:
- Size: 10.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.9.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | cb1856c69b610be0767758f85a2d6ff25ff0a2c125b7815d5d7b85f01ca96539 |
| MD5 | 12f80002bcf8f6fcddc2adc5e897a5a3 |
| BLAKE2b-256 | f588d8c58ebdf577e97c2372253b7b6ce41a486d158b22ca27e3a7230c9eab0a |