
A center-based hierarchical clustering framework in three steps: compute a Similarity, build a Hierarchy, and extract a clustering using a Partitioning method.

Similarity-Hierarchical-Partitioning (SHiP) Clustering Framework

This repository is the official implementation of the Similarity-Hierarchical-Partitioning (SHiP) clustering framework proposed in "Ultrametric Cluster Hierarchies: I Want 'em All!". The framework clusters data by combining similarity trees, $(k,z)$-hierarchies, and a choice of partitioning objective functions.

The project is implemented in C++, with Python bindings for use from Python.

Overview

The SHiP framework operates in three main stages:

  1. Similarity Tree Construction: A similarity tree is built for the given dataset. This tree represents the relationships and proximities between data points. Note that the tree constructed by default corresponds to the $k$-center hierarchy (Section 3 in the paper).
  2. $(k,z)$-Hierarchy Construction: Using the similarity tree, a $(k,z)$-hierarchy can be constructed. These hierarchies correspond to common center-based clustering methods, e.g., $k$-median or $k$-means (Section 4).
  3. Partitioning: Finally, the data is partitioned based on the constructed hierarchy and a user-selected partitioning objective function (Section 5).
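
For orientation, the $(k,z)$ objectives referred to in stage 2 are commonly written as below. This is the standard textbook form, given here as an assumption about notation; the paper's exact symbols may differ. $X$ denotes the dataset, $C$ a set of $k$ centers, and $d$ the underlying distance.

```latex
\[
  \mathrm{cost}_z(X, C) \;=\; \sum_{x \in X} \min_{c \in C} d(x, c)^z ,
  \qquad
  \mathrm{cost}_\infty(X, C) \;=\; \max_{x \in X} \min_{c \in C} d(x, c).
\]
```

With $z = 1$ this is the $k$-median cost and with $z = 2$ the $k$-means cost. The $k$-center cost is the limit of $\mathrm{cost}_z(X, C)^{1/z}$ as $z \to \infty$, which is why $k$-center is associated with $z = \infty$.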

Features

  • Similarity Trees: The package provides a set of similarity/ultrametric tree implementations:

    • DCTree [1]
    • HST [2]
    • CoverTree [3]
    • KDTree [3]
    • MeanSplitKDTree [3]
    • BallTree [3]
    • MeanSplitBallTree [3]
    • RPTree [3]
    • MaxRPTree [3]
    • UBTree [3]
    • RTree [3]
    • RStarTree [3]
    • XTree [3]
    • HilbertRTree [3]
    • RPlusTree [3]
    • RPlusPlusTree [3]
    • Or use LoadTree to load a precomputed tree
  • $(k,z)$-Hierarchies: The package supports all possible $(k,z)$-hierarchies, allowing flexibility in choosing the most suitable hierarchy for a given dataset.

    • $z = 0$ → $k$-center (in theory $z = \infty$; this implementation uses 0 to encode $\infty$)
    • $z = 1$ → $k$-median
    • $z = 2$ → $k$-means
    • ...
  • Partitioning Functions: A wide range of partitioning functions are available, enabling users to select the most appropriate function based on their specific needs:

    • K
    • Elbow
    • Threshold
    • ThresholdElbow
    • QCoverage
    • QCoverageElbow
    • QStem
    • QStemElbow
    • LcaNoiseElbow
    • LcaNoiseElbowNoTriangle
    • MedianOfElbows
    • MeanOfElbows
    • Stability
    • NormalizedStability
  • Customization: Users can customize the framework by selecting from the available similarity trees, $(k,z)$-hierarchies, and partitioning functions.

    • E.g., DCTree with $k$-means ($z=2$)-hierarchy and the Elbow partitioning method.

      from SHiP import SHiP
      
      # Build the `DCTree`
      ship = SHiP(data=data_points, treeType="DCTree")
      # Extract the clustering from the $k$-means hierarchy (z = 2) and the `Elbow` partitioning method
      labels = ship.fit_predict(hierarchy=2, partitioningMethod="Elbow")
      
    • Or to get clustering with exact $k$ clusters:

      from SHiP import SHiP
      
      ship = SHiP(data=data_points, treeType="DCTree")
      # Extract the clustering from the $k$-means hierarchy (z = 2) and get `k` clusters
      ship.k = k # Set k here
      labels = ship.fit_predict(hierarchy=2, partitioningMethod="K")
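
To make the role of $z$ concrete, the following is a minimal sketch in plain Python that does not use the SHiP package. `kz_cost` evaluates the standard $(k,z)$ cost for a fixed set of centers under Euclidean distance, and `hierarchy_param` maps a theoretical $z$ to the integer passed as `hierarchy`, following the convention listed above (0 stands for $\infty$). Both function names are hypothetical helpers for illustration, and restricting $z$ to positive integers is an assumption; SHiP itself works on tree-induced distances rather than Euclidean ones.

```python
import math


def kz_cost(points, centers, z):
    """Standard (k,z) cost of `points` for fixed `centers` (Euclidean distance).

    z = 1 gives the k-median cost, z = 2 the k-means cost,
    and z = math.inf the k-center cost (largest distance to a center).
    """
    # Distance from every point to its nearest center.
    dists = [min(math.dist(p, c) for c in centers) for p in points]
    if z == math.inf:
        return max(dists)
    return sum(d ** z for d in dists)


def hierarchy_param(z):
    """Map a theoretical z to the `hierarchy` integer (0 encodes infinity).

    Assumption: only positive integers and infinity are handled here.
    """
    if z == math.inf:
        return 0
    if isinstance(z, int) and not isinstance(z, bool) and z >= 1:
        return z
    raise ValueError(f"unsupported z: {z!r}")


if __name__ == "__main__":
    pts = [(0.0, 0.0), (3.0, 4.0), (0.0, 1.0)]
    ctr = [(0.0, 0.0)]
    print(kz_cost(pts, ctr, 1))         # 6.0  (k-median cost)
    print(kz_cost(pts, ctr, 2))         # 26.0 (k-means cost)
    print(kz_cost(pts, ctr, math.inf))  # 5.0  (k-center cost)
    print(hierarchy_param(math.inf))    # 0
```

The value returned by `hierarchy_param` is what would be passed as `hierarchy=...` to `fit_predict` in the examples above.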
      

Installation

Stable Version

The current stable version can be installed by the following command:
pip install SHiP-framework

Note that building the package requires a C++ compiler such as gcc. Therefore, in case of an installation error, make sure that:

  • Windows: Microsoft C++ Build Tools is installed
  • Linux/Mac: the Python development headers are installed (e.g., by running apt-get install python3-dev; the exact command may differ depending on the Linux distribution)

The error messages may look like this:

error: command 'gcc' failed: No such file or directory
Could not build wheels for SHiP-framework, which is required to install pyproject.toml-based projects
Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools"
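
If one of these errors appears, a quick way to check whether a compiler is visible to the build is to look it up on the PATH. The sketch below uses only the standard library. The helper name `find_compilers` and the list of candidate names are assumptions for illustration. On Windows, `cl` is usually only on the PATH inside a Visual Studio developer prompt, so a missing entry there is not conclusive.

```python
import shutil


def find_compilers(candidates=("gcc", "g++", "clang", "cl")):
    """Return a dict mapping each candidate name to its path, or None if not found."""
    return {name: shutil.which(name) for name in candidates}


if __name__ == "__main__":
    for name, path in find_compilers().items():
        print(f"{name}: {path or 'not found on PATH'}")
```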

Development Version

The current development version can be installed directly from git by executing:
sudo pip install git+https://github.com/pasiweber/SHiP-framework.git

Alternatively, clone the repository, go to the root directory and execute:
pip install .

Code Example

from SHiP import SHiP

ship = SHiP(data=data, treeType="DCTree")

# or to load a saved tree
ship = SHiP(data=data, treeType="LoadTree", config={"json_tree_filepath": "<file_path>"}) 
# or additionally specify the tree_type of the loaded tree by adding {"tree_type": "DCTree"}

ship.hierarchy = 0
ship.partitioningMethod = "K"
labels = ship.fit_predict()

# or in one line
labels = ship.fit_predict(hierarchy = 1, partitioningMethod = "Elbow")

# optional: save the current computed tree
json = ship.get_tree().to_json()
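
The serialized tree can be written to disk so that a later run can reload it through `LoadTree`. The following is a minimal sketch under two assumptions: `to_json()` returns either a JSON string or a JSON-serializable object, and `save_tree_json` is a hypothetical helper that is not part of the package. It returns a `config` dictionary using the `json_tree_filepath` key shown above.

```python
import json
from pathlib import Path


def save_tree_json(tree_json, path):
    """Write a serialized tree to `path` and return a config dict for `LoadTree`."""
    # Accept either an already-serialized string or a JSON-serializable object.
    text = tree_json if isinstance(tree_json, str) else json.dumps(tree_json)
    Path(path).write_text(text, encoding="utf-8")
    return {"json_tree_filepath": str(path)}


# Hypothetical usage with the objects from the example above:
# config = save_tree_json(ship.get_tree().to_json(), "tree.json")
# ship = SHiP(data=data, treeType="LoadTree", config=config)
```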

Results

Our framework achieves the following performance:

| Dataset | DC-0-Stab. | DC-1-MoE | DC-2-Elb. | CT-0-Stab. | CT-1-MoE | CT-2-Elb. | $k$-means | SCAR | Ward | AMD-DBSCAN | DPC |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Boxes | 90.1 | 99.3 | 97.9 | 2.6 | 42.1 ± 4.7 | 24.2 ± 1.6 | 93.5 ± 4.3 | 0.1 ± 0.1 | 95.8 | 63.9 | 25.9 |
| D31 | 79.7 | 42.7 | 82.9 | 46.5 ± 1.8 | 62.0 ± 5.4 | 67.7 ± 3.2 | 92.0 ± 2.7 | 41.7 ± 5.4 | 92.0 | 86.4 | 18.5 |
| airway | 38.0 | 65.9 | 58.8 | 0.8 | 18.2 ± 2.4 | 12.0 ± 1.4 | 39.9 ± 2.0 | -0.9 ± 0.5 | 43.7 | 31.7 | 65.1 |
| lactate | 41.0 | 41.0 | 67.5 | 0.1 | 4.1 ± 0.6 | 1.7 ± 0.2 | 28.6 ± 1.1 | 1.5 ± 1.0 | 27.7 | 71.5 | 0.0 |
| HAR | 30.0 | 46.9 | 52.8 | 14.7 ± 8.8 | 14.2 ± 4.7 | 9.6 ± 2.2 | 46.0 ± 4.5 | 5.5 ± 3.2 | 49.1 | 0.0 | 33.2 |
| letterrec. | 12.1 | 16.6 | 17.9 | 5.8 ± 0.2 | 7.2 ± 0.6 | 6.2 ± 0.3 | 12.9 ± 0.6 | 0.4 ± 0.1 | 14.7 ± 0.9 | 7.9 | 0.0 |
| PenDigits | 66.4 | 73.1 | 75.4 | 8.0 ± 0.8 | 12.0 ± 0.6 | 8.9 ± 0.5 | 55.3 ± 3.2 | 0.9 ± 0.3 | 55.2 | 55.6 | 28.8 ± 1.1 |
| COIL20 | 81.2 | 72.8 | 72.6 | 46.4 ± 4.4 | 46.6 ± 2.1 | 47.7 ± 2.0 | 58.2 ± 2.8 | 33.5 ± 2.0 | 68.6 | 39.2 | 35.9 ± 0.1 |
| COIL100 | 80.1 | 66.8 | 70.0 | 44.6 ± 4.2 | 46.6 ± 1.5 | 50.1 ± 1.2 | 56.1 ± 1.4 | 16.7 ± 0.8 | 61.4 | 14.2 | 0.2 |
| cmu_faces | 60.2 | 56.6 | 66.5 | 8.6 ± 3.1 | 37.1 ± 4.1 | 34.2 ± 2.1 | 53.2 ± 4.7 | 38.5 ± 2.9 | 61.6 | 0.7 | 0.6 |
| OptDigits | 55.3 | 77.0 | 77.0 | 40.9 ± 3.5 | 20.9 ± 2.3 | 18.1 ± 2.4 | 61.3 ± 6.6 | 14.4 ± 4.1 | 74.6 ± 2.4 | 63.2 | 0.0 |
| USPS | 33.7 | 29.3 | 29.3 | 12.0 ± 1.7 | 8.7 ± 1.0 | 11.2 ± 1.5 | 52.3 ± 1.7 | 2.9 ± 0.9 | 63.9 | 0.0 | 21.0 |
| MNIST | 19.7 | 41.7 | 46.0 | 11.1 ± 1.7 | 5.4 ± 0.6 | 5.4 ± 0.6 | 36.9 ± 1.0 | 1.3 ± 0.4 | 52.7 | 0.0 | - |

License

The project is licensed under the BSD 3-Clause License (see LICENSE.txt).

References

[1] Connecting the Dots -- Density-Connectivity Distance unifies DBSCAN, k-Center and Spectral Clustering
[2] HST+: An Efficient Index for Embedding Arbitrary Metric Spaces (Github)
[3] mlpack 4: a fast, header-only C++ machine learning library (Github)
