Universal clustering based on dialectical materialism

DRUHG

DRUHG - Dialectical Reflection Universal Hierarchical Grouping (друг, Russian for "friend").
Performs density-based clustering and builds a minimum spanning tree.
Requires no tuning parameters; the only real parameter is the metric.
The user can filter clusters by size with limit1 and limit2.
To get the genuine result and genuine outliers, set limit1 to 1 and limit2 to the sample size.
The fix_outliers parameter labels outliers with their closest clusters via the mstree edges.
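
To make the idea of attaching outliers through tree edges concrete, here is a minimal sketch. It is not druhg's implementation: the helper name attach_outliers, the (u, v, weight) edge format, and the use of -1 as the outlier label are all assumptions made for illustration.

```python
def attach_outliers(labels, edges, outlier=-1):
    """Give each outlier the label of a labeled neighbor in the tree.

    labels: list of cluster labels, with `outlier` marking unlabeled points.
    edges:  list of (u, v, weight) tuples describing a spanning tree.
    Hypothetical helper for illustration only, not part of druhg.
    """
    labels = list(labels)  # work on a copy
    ordered = sorted(edges, key=lambda e: e[2])  # shortest edges first
    changed = True
    while changed:
        changed = False
        for u, v, _ in ordered:
            if labels[u] == outlier and labels[v] != outlier:
                labels[u] = labels[v]
                changed = True
            elif labels[v] == outlier and labels[u] != outlier:
                labels[v] = labels[u]
                changed = True
    return labels


# Point 2 is an outlier sitting between two clusters; its edge to
# point 1 (weight 2.0) is shorter than its edge to point 3 (weight 5.0).
edges = [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 5.0), (3, 4, 1.0)]
fixed = attach_outliers([0, 0, -1, 1, 1], edges)
print(fixed)  # [0, 0, 0, 1, 1]
```

Processing the shortest edges first means an outlier with labeled neighbors on both sides joins the nearer one.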

Basic Concept

There are some optional tuning parameters, but the algorithm itself requires none and is universal.
It works by applying the universal rule of society: treat others how you want to be treated.
The core of the algorithm is to rank each subject's closest subjective similarities and amalgamate them accordingly.
The max_ranking parameter controls the balance between precision and performance; above some value, the precision and the result no longer change.
The algorithm parameter can be set to 'slow' to further improve precision.
The relationship between two objects sets two local densities and distorts the distance between them.
That dialectical distance is the reflection: one object adjusts its density to fit its counterpart.
This allows all of the relationships to be arranged into a minimum spanning tree.
Mutual closeness is preferred.
At the start, unconnected objects amalgamate into the Universal, and these contradictions define which amalgamation is a cluster.
An amalgamation has to reflect in the other to emerge as a cluster. The more sizeable the adversary, the more probable the change.
Once formed, a big cluster resists outliers. This makes the algorithm well suited to outlier detection.
A cluster is a set of mutually close reflections.
The philosophy of dialectical materialism was used to come up with this universal solution.
You can read more about it in this work (in Russian), which covers:
- triad Quality-Quantity-Measure (distance-rank-memberships)
- triad Singular-Particular-Universal (subject-cluster-dataset)
- and more
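
The ranking idea described above can be illustrated with a small sketch. This is only a toy showing how neighbor ranks can be asymmetric and why mutual closeness matters; it is not the DRUHG algorithm, and the names neighbor_ranks and mutual_rank are hypothetical.

```python
import math


def neighbor_ranks(points):
    """For each point i, map every other point j to its rank
    in i's list of neighbors (1 = closest)."""
    ranks = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        ranks.append({j: r for r, j in enumerate(others, start=1)})
    return ranks


def mutual_rank(ranks, i, j):
    """Symmetric closeness: the worse of the two one-sided ranks."""
    return max(ranks[i][j], ranks[j][i])


# Three points close together and one far away on the same line.
points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (10.0, 0.0)]
ranks = neighbor_ranks(points)

print(ranks[3][2])               # 1: the far point sees point 2 as its closest
print(ranks[2][3])               # 3: but point 2 ranks the far point last
print(mutual_rank(ranks, 0, 1))  # 1: points 0 and 1 are each other's closest
print(mutual_rank(ranks, 2, 3))  # 3: the relation is one-sided, not mutual
```

The far point treats point 2 as its nearest neighbor, but point 2 does not return the favor, which is the kind of one-sided relationship that marks an outlier.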

How to use DRUHG

import sklearn.datasets as datasets
import druhg

iris = datasets.load_iris()
XX = iris['data']

clusterer = druhg.DRUHG(max_ranking=50)
labels = clusterer.fit(XX).labels_

This builds the tree and labels the points. You can then adjust the clusters by relabeling.

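from sklearn.metrics import adjusted_rand_score

# Relabel using the clusterer fitted above, restricting cluster sizes
labels = clusterer.relabel(limit1=1, limit2=len(XX)/2, fix_outliers=1)
ari = adjusted_rand_score(iris['target'], labels)
print('iris ari', ari)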

This relabels the clusters, restricting their size.
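
As a rough picture of what a size window on clusters means, the sketch below filters a flat list of labels. It ignores the tree entirely, so it should not be read as how relabel behaves; the helper filter_by_size, the inclusive bounds, and the -1 outlier label are assumptions.

```python
from collections import Counter


def filter_by_size(labels, limit1, limit2, outlier=-1):
    """Keep labels of clusters whose size lies in [limit1, limit2];
    mark every other point as an outlier.
    Hypothetical helper for illustration only, not part of druhg."""
    sizes = Counter(l for l in labels if l != outlier)
    return [
        l if l != outlier and limit1 <= sizes[l] <= limit2 else outlier
        for l in labels
    ]


labels = [0, 0, 0, 1, 2, 2, 2, 2, 2, 2]

# Cluster 1 (size 1) is too small and cluster 2 (size 6) is too large.
print(filter_by_size(labels, 2, 5))
# [0, 0, 0, -1, -1, -1, -1, -1, -1, -1]

# With limit1=1 and limit2=sample size, nothing is filtered out.
print(filter_by_size(labels, 1, len(labels)))
# [0, 0, 0, 1, 2, 2, 2, 2, 2, 2]
```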

from druhg import DRUHG
import matplotlib.pyplot as plt
import pandas as pd, numpy as np

XX = pd.read_csv('chameleon.csv', sep='\t', header=None)
XX = np.array(XX)
clusterer = DRUHG(max_ranking=200)
clusterer.fit(XX)

plt.figure(figsize=(30,16))
clusterer.minimum_spanning_tree_.plot(node_size=200)

This draws the mstree with druhg-edges.

Performance

It can be slow on highly structured data.
The max_ranking parameter can be decreased for better performance.

Installing

Install from PyPI, presuming you have an up-to-date pip:

pip install druhg

Running the Tests

The package tests can be run after installation using the command:

pytest -s druhg

or

python -m pytest -s druhg

The tests may fail :-D

Python Version

The druhg library supports both Python 2 and Python 3.

Contributing

We welcome contributions in any form! Assistance with documentation, particularly expanding tutorials, is always welcome. To contribute, please fork the project, make your changes, and submit a pull request. We will do our best to work through any issues with you and get your code merged into the main branch.

Licensing

The druhg package is 3-clause BSD licensed.
