Hierarchical clustering using minimax linkage.

Project description

License: MIT

Pyprotoclust is an implementation of representative hierarchical clustering using minimax linkage.

The original algorithm is from Hierarchical Clustering With Prototypes via Minimax Linkage by Jacob Bien and Robert Tibshirani.
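As a rough illustration of what minimax linkage measures, the sketch below computes it by brute force in plain Python. It follows the definition in the paper (the linkage between two clusters is the smallest radius, over all candidate centers in their union, that covers every point in the union), but it is not how Pyprotoclust computes it; the function names `minimax_radius` and `minimax_linkage` are made up for this example.

```python
def minimax_radius(dist, members):
    """Return (radius, prototype) for the points listed in `members`.

    For each candidate center x, take the largest distance from x to any
    member; the radius is the smallest such value, and the prototype is
    the center that attains it (the first one found if there is a tie).
    """
    best_radius, best_proto = float("inf"), None
    for x in members:
        r = max(dist[x][y] for y in members)
        if r < best_radius:
            best_radius, best_proto = r, x
    return best_radius, best_proto


def minimax_linkage(dist, g, h):
    """Minimax linkage between clusters g and h: the radius of their union."""
    return minimax_radius(dist, list(g) + list(h))


# Four points on a line at positions 0, 1, 2 and 10 (toy data)
positions = [0.0, 1.0, 2.0, 10.0]
dist = [[abs(a - b) for b in positions] for a in positions]

radius, proto = minimax_linkage(dist, [0, 1], [2])
print(radius, proto)  # 1.0 1: the middle point covers {0, 1, 2} within distance 1
```

The prototype is always one of the original data points, which is what makes the resulting clusters easy to summarize by a representative member.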

Pyprotoclust takes a distance matrix as input. It returns a linkage matrix encoding the hierarchical clustering, together with a list of the prototypes associated with each cluster in the hierarchy. This lets a user work with the existing tools in the SciPy hierarchical clustering module.

Installation:

pip install pyprotoclust

Usage:

from pyprotoclust import protoclust
import numpy as np
import scipy as sp
import scipy.cluster.hierarchy
import scipy.spatial.distance

# Generate two-dimensional toy data
n = 60
np.random.seed(4)
params = [{'mean': [-7, 0], 'cov': [[1, 1], [1, 5]]},
          {'mean': [1, -1], 'cov': [[5, 0], [0, 1]]},
          {'mean': [3, 7], 'cov': [[1, 0], [0, 1]]}]
data = np.vstack([np.random.multivariate_normal(p['mean'], p['cov'], n) for p in params])
X = sp.spatial.distance.squareform(sp.spatial.distance.pdist(data))

# Produce a hierarchical clustering using minimax linkage
Z, prototypes = protoclust(X)

# Generate clusters at a set cut_height using scipy's hierarchy module
cut_height = 7
T = sp.cluster.hierarchy.fcluster(Z, cut_height, criterion='distance')
L,M = sp.cluster.hierarchy.leaders(Z, T)

# Get the prototypes associated with the generated clusters
P = data[prototypes[L]]

The previous example produces a linkage matrix Z and prototypes P that can be used to produce dendrograms and other plots of the data.

A dendrogram of the hierarchical clustering example.

A dendrogram of the hierarchical clustering example with a dashed line at the example cut height.
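A dendrogram with a dashed cut line like the one described above could be drawn roughly as follows. This is a sketch, not the script used for the project's figures: `plot_dendrogram_with_cut` is a hypothetical helper, and the demo builds a stand-in linkage matrix with SciPy's complete linkage so the block runs without Pyprotoclust installed. In practice the `Z` returned by `protoclust` would be passed instead.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import pdist


def plot_dendrogram_with_cut(Z, cut_height, ax=None):
    """Draw the dendrogram for linkage matrix Z with a dashed line at cut_height."""
    if ax is None:
        _, ax = plt.subplots(figsize=(8, 4))
    # Links below the cut are colored per cluster; links above share one color
    result = dendrogram(Z, ax=ax, color_threshold=cut_height, no_labels=True)
    ax.axhline(cut_height, linestyle="--", color="black")
    ax.set_ylabel("linkage height")
    return ax, result


# Stand-in data and linkage (assumption: complete linkage replaces protoclust here)
rng = np.random.default_rng(0)
points = rng.normal(size=(20, 2))
Z_demo = linkage(pdist(points), method="complete")

ax, result = plot_dendrogram_with_cut(Z_demo, cut_height=2.0)
# ax.figure.savefig("dendrogram.png") would write the figure to disk
```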

A scatter plot of the hierarchical clustering example.

A scatter plot of the example with circles centered at prototypes drawn with radii equal to the top-level linkage heights of each cluster.
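The circle radii mentioned above can be read off the linkage matrix. The sketch below assumes the standard SciPy convention that node id `n + i` corresponds to row `i` of `Z` and that column 2 holds the merge height; `cluster_heights` is a hypothetical helper, not part of Pyprotoclust. Singleton clusters, whose leader is an original observation, get height 0.

```python
import numpy as np


def cluster_heights(Z, leaders):
    """Return the top-level linkage height of each flat cluster.

    Z is a SciPy-style linkage matrix and `leaders` holds the root node ids
    of the flat clusters, such as the first array returned by
    scipy.cluster.hierarchy.leaders.
    """
    Z = np.asarray(Z, dtype=float)
    n = Z.shape[0] + 1                      # number of original observations
    leaders = np.asarray(leaders, dtype=int)
    heights = np.zeros(len(leaders))        # singletons keep height 0
    merged = leaders >= n                   # ids >= n are internal nodes
    heights[merged] = Z[leaders[merged] - n, 2]
    return heights


# Hand-written linkage for three points: 0 and 1 merge at height 1.0 (node 3),
# then node 3 and point 2 merge at height 2.5 (node 4)
Z_toy = np.array([[0, 1, 1.0, 2],
                  [2, 3, 2.5, 3]])
print(cluster_heights(Z_toy, [3, 2]))  # heights 1.0 and 0.0
```

With the variables from the usage example, `cluster_heights(Z, L)` would give one radius per row of `P`. Under minimax linkage, a cluster's height is the largest distance from its prototype to any member, so each circle encloses every point of its cluster.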

Download files

Source Distribution

pyprotoclust-0.1.0.tar.gz (112.1 kB)

Uploaded Source

Built Distributions

pyprotoclust-0.1.0-cp38-cp38-win_amd64.whl (190.5 kB)

Uploaded CPython 3.8 Windows x86-64

pyprotoclust-0.1.0-cp38-cp38-manylinux2014_x86_64.whl (925.1 kB)

Uploaded CPython 3.8

pyprotoclust-0.1.0-cp37-cp37m-win_amd64.whl (189.6 kB)

Uploaded CPython 3.7m Windows x86-64

pyprotoclust-0.1.0-cp37-cp37m-manylinux2014_x86_64.whl (925.1 kB)

Uploaded CPython 3.7m

File details

Details for the file pyprotoclust-0.1.0.tar.gz.

File metadata

  • Download URL: pyprotoclust-0.1.0.tar.gz
  • Upload date:
  • Size: 112.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.0.5 CPython/3.7.6 Linux/4.15.0-99-generic

File hashes

Hashes for pyprotoclust-0.1.0.tar.gz
Algorithm Hash digest
SHA256 76918486c1cba1242e1aee0af985d687fb61aeb330745152e0a19cc947c00e5a
MD5 7c242e2f0e11f23ebf5a353e4d1897ed
BLAKE2b-256 0cac89a407807a369d8e4a8dd38c12711f00ce6e160f5d5223f326818f54f5ff

See more details on using hashes here.

File details

Details for the file pyprotoclust-0.1.0-cp38-cp38-win_amd64.whl.

File metadata

  • Download URL: pyprotoclust-0.1.0-cp38-cp38-win_amd64.whl
  • Upload date:
  • Size: 190.5 kB
  • Tags: CPython 3.8, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.13 CPython/3.7.6 Linux/5.4.0-109-generic

File hashes

Hashes for pyprotoclust-0.1.0-cp38-cp38-win_amd64.whl
Algorithm Hash digest
SHA256 6181aa0f3ef352299c9f372ca2f303847f48965dd7487d3c1dcb219351915aad
MD5 5884a2302394e75883734a29ac1f8c76
BLAKE2b-256 bdfd33e7dd20d2f1d9162029ec7494a4bc6732acb73f9fbe7d322133a515808f

File details

Details for the file pyprotoclust-0.1.0-cp38-cp38-manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for pyprotoclust-0.1.0-cp38-cp38-manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 3143f356f0d65a3197b8e8e928809d9ab73aef867435ffaed4820d65dbbc35fc
MD5 8fe36a1df2ff33794c646c7e679cc99a
BLAKE2b-256 73864299a992b4ef0df9fa39a7bd4d90969dff6290090e791474519f42bf3621

File details

Details for the file pyprotoclust-0.1.0-cp37-cp37m-win_amd64.whl.

File metadata

  • Download URL: pyprotoclust-0.1.0-cp37-cp37m-win_amd64.whl
  • Upload date:
  • Size: 189.6 kB
  • Tags: CPython 3.7m, Windows x86-64
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.13 CPython/3.7.6 Linux/5.4.0-109-generic

File hashes

Hashes for pyprotoclust-0.1.0-cp37-cp37m-win_amd64.whl
Algorithm Hash digest
SHA256 964e40ba8b5efc188732bb6b9414ce4c2d40bcfab27cb5e789f94527aaea4bf2
MD5 8ce1f68f7ad78f9b0ab89170b4501753
BLAKE2b-256 4e347be904bf2aec3ff9d4b47d29c2d30d33bef72c871a3bfc865cf39c46b92b

File details

Details for the file pyprotoclust-0.1.0-cp37-cp37m-manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for pyprotoclust-0.1.0-cp37-cp37m-manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 426683173cbd8e7332a2680583cb258a07c8e9e6782786bbbead5e242cbf1494
MD5 884f15d875108eb67b694c0cfcfcb159
BLAKE2b-256 3b87f562735830920f319d74b5b5573acf8cf21f9e90dae83e683c3d667a4045
