Center-based hierarchical clustering framework in three steps: compute a Similarity, build a Hierarchy, and extract a clustering using a Partitioning method.

Similarity-Hierarchical-Partitioning (SHiP) Clustering Framework

This repository is the official implementation of the Similarity-Hierarchical-Partitioning (SHiP) clustering framework proposed in Ultrametric Cluster Hierarchies: I Want `em All! This framework provides a comprehensive approach to clustering by leveraging similarity trees, $(k,z)$-hierarchies, and various partitioning objective functions.

The whole project is implemented in C++; Python bindings make it usable from Python.

Overview

The SHiP framework operates in three main stages:

  1. Similarity Tree Construction: A similarity tree is built for the given dataset. This tree represents the relationships and proximities between data points. Note that the default constructed tree corresponds to the $k$-center hierarchy (Section 3 in the paper).
  2. $(k,z)$-Hierarchy Construction: Using the similarity tree, a $(k,z)$-hierarchy can be constructed. These hierarchies correspond to common center-based clustering methods, e.g., $k$-median or $k$-means (Section 4).
  3. Partitioning: Finally, the data is partitioned based on the constructed hierarchy and a user-selected partitioning objective function (Section 5).
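
To make the notion of a similarity/ultrametric tree in stage 1 more concrete, here is a minimal sketch in plain Python. It does not use the SHiP API; the tree, the node names, and the helper functions are hypothetical and chosen only for illustration. It assumes the usual convention that the distance between two leaves is the value stored at their lowest common ancestor, and that values grow towards the root.

```python
from itertools import combinations

# Hypothetical toy tree: leaves a, b, c, d; internal nodes u, v, root.
# parent[node] gives the parent; value[node] is the distance stored at an
# internal node (values increase towards the root).
parent = {"a": "u", "b": "u", "c": "v", "u": "v", "d": "root", "v": "root"}
value = {"u": 1.0, "v": 2.5, "root": 4.0}


def ancestors(node):
    """Return the path from `node` up to the root, inclusive."""
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


def tree_distance(x, y):
    """Distance of two leaves = value stored at their lowest common ancestor."""
    if x == y:
        return 0.0
    seen = set(ancestors(x))
    for node in ancestors(y):
        if node in seen:
            return value[node]
    raise ValueError("nodes are not in the same tree")


leaves = ["a", "b", "c", "d"]
for x, y in combinations(leaves, 2):
    print(x, y, tree_distance(x, y))
```

Distances defined this way satisfy the strong triangle inequality $d(x,z) \le \max(d(x,y), d(y,z))$, which is what makes the tree an ultrametric.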

Features

  • Similarity Trees: The package provides a set of similarity/ultrametric tree implementations:

    • DCTree [1]
    • HST [2]
    • CoverTree [3]
    • KDTree [3]
    • MeanSplitKDTree [3]
    • BallTree [3]
    • MeanSplitBallTree [3]
    • RPTree [3]
    • MaxRPTree [3]
    • UBTree [3]
    • RTree [3]
    • RStarTree [3]
    • XTree [3]
    • HilbertRTree [3]
    • RPlusTree [3]
    • RPlusPlusTree [3]
    • Or use LoadTree to load a precomputed tree
  • $(k,z)$-Hierarchies: It supports all possible $(k,z)$-hierarchies, allowing flexibility in choosing the most suitable hierarchy for a given dataset.

    • $z = 0$ → $k$-center (in theory this corresponds to $z = ∞$; the implementation uses 0 to denote $∞$)
    • $z = 1$ → $k$-median
    • $z = 2$ → $k$-means
    • ...
  • Partitioning Functions: A wide range of partitioning functions are available, enabling users to select the most appropriate function based on their specific needs:

    • K
    • Elbow
    • Threshold
    • ThresholdElbow
    • QCoverage
    • QCoverageElbow
    • QStem
    • QStemElbow
    • LcaNoiseElbow
    • LcaNoiseElbowNoTriangle
    • MedianOfElbows
    • MeanOfElbows
    • Stability
    • NormalizedStability
  • Customization: Users can customize the framework by selecting from the available similarity trees, $(k,z)$-hierarchies, and partitioning functions.

    • E.g., DCTree with the $k$-means ($z=2$) hierarchy and the Elbow partitioning method:

      from SHiP import SHiP
      
      # Build the `DCTree`
      ship = SHiP(data=data_points, treeType="DCTree")
      # Extract the clustering from the $k$-means hierarchy ($z=2$) using the `Elbow` partitioning method
      labels = ship.fit_predict(hierarchy=2, partitioningMethod="Elbow")
      
    • Or, to get a clustering with exactly $k$ clusters:

      from SHiP import SHiP
      
      ship = SHiP(data=data_points, treeType="DCTree")
      # Extract the clustering from the $k$-means hierarchy ($z=2$) and get exactly `k` clusters
      ship.k = k # Set k here
      labels = ship.fit_predict(hierarchy=2, partitioningMethod="K")
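
The $(k,z)$ objectives listed under Features can be illustrated with a small, self-contained sketch. It is not the package's implementation: the function name `kz_cost` and the sample points are hypothetical, and Euclidean distance is assumed. It follows the convention stated above that $z = 0$ stands for $z = ∞$.

```python
import math


def kz_cost(points, centers, z):
    """Cost of assigning each point to its nearest center under the (k,z) objective.

    z = 1 sums distances (k-median), z = 2 sums squared distances (k-means),
    and z = 0 is treated as z = infinity, i.e. the maximum distance (k-center).
    """
    nearest = [min(math.dist(p, c) for c in centers) for p in points]
    if z == 0:
        return max(nearest)
    return sum(d ** z for d in nearest)


# Hypothetical example data: four points and two centers.
points = [(0.0, 0.0), (0.0, 3.0), (4.0, 0.0), (10.0, 0.0)]
centers = [(0.0, 0.0), (10.0, 0.0)]

print("k-center cost:", kz_cost(points, centers, z=0))
print("k-median cost:", kz_cost(points, centers, z=1))
print("k-means cost: ", kz_cost(points, centers, z=2))
```

Here the nearest-center distances are 0, 3, 4, and 0, so the three objectives evaluate to 4 (maximum), 7 (sum), and 25 (sum of squares), respectively.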
      

Installation

Stable Version

The current stable version can be installed by the following command:
pip install SHiP-framework

Note that a C++ compiler (e.g., gcc) is required for installation. In case of an installation error, make sure that:

  • Windows: Microsoft C++ Build Tools is installed
  • Linux/Mac: Python dev is installed (e.g., by running apt-get install python-dev; the exact command may differ depending on the Linux distribution)

The error messages may look like this:

error: command 'gcc' failed: No such file or directory
Could not build wheels for SHiP-framework, which is required to install pyproject.toml-based projects
Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools
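
When troubleshooting such errors, it can help to check what is available on the system. The following sketch is not part of the package; the function name `installation_report` and the list of compiler executables are assumptions made for illustration. It only uses the Python standard library and reports whether the `SHiP` module can be found and which of the listed compilers are on the `PATH`.

```python
import importlib.util
import shutil


def installation_report(module_name="SHiP", compilers=("gcc", "g++", "clang", "cl")):
    """Report whether the module is importable and which compilers are on PATH."""
    found = {name: shutil.which(name) for name in compilers}
    return {
        # find_spec returns None when the top-level module cannot be located
        "module_found": importlib.util.find_spec(module_name) is not None,
        # keep only the compilers that were actually found
        "compilers": {name: path for name, path in found.items() if path},
    }


report = installation_report()
print(report)
```

An empty `compilers` entry together with `module_found` being `False` suggests that a compiler needs to be installed before retrying the installation.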

Development Version

The current development version can be installed directly from git by executing:
pip install git+https://github.com/pasiweber/SHiP-framework.git

Alternatively, clone the repository, go to the root directory and execute:
pip install .

Code Example

from SHiP import SHiP

ship = SHiP(data=data, treeType="DCTree")

# or to load a saved tree
ship = SHiP(data=data, treeType="LoadTree", config={"json_tree_filepath": "<file_path>"}) 
# or additionally specify the tree_type of the loaded tree by adding {"tree_type": "DCTree"}

ship.hierarchy = 0
ship.partitioningMethod = "K"
labels = ship.fit_predict()

# or in one line
labels = ship.fit_predict(hierarchy = 1, partitioningMethod = "Elbow")

# optional: save the currently computed tree
# (named tree_json to avoid shadowing Python's standard json module)
tree_json = ship.get_tree().to_json()
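
The save and load steps in the example above can be tied together with two small helpers. This is a sketch under assumptions: it presumes that `to_json()` returns a string, and the helper names `save_tree_json` and `load_tree_config` are hypothetical. Only the config keys `json_tree_filepath` and `tree_type` are taken from the example.

```python
from pathlib import Path


def save_tree_json(tree_json, filepath):
    """Write a JSON string (assumed to be what `to_json()` returns) to disk."""
    path = Path(filepath)
    path.write_text(tree_json, encoding="utf-8")
    return path


def load_tree_config(filepath, tree_type=None):
    """Build the `config` dict for treeType="LoadTree" as used in the example."""
    config = {"json_tree_filepath": str(filepath)}
    if tree_type is not None:
        # optional: state the type of the stored tree, e.g. "DCTree"
        config["tree_type"] = tree_type
    return config


# Intended usage (requires the SHiP package, hence commented out):
# path = save_tree_json(ship.get_tree().to_json(), "tree.json")
# ship = SHiP(data=data, treeType="LoadTree", config=load_tree_config(path, "DCTree"))
```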

Results

Our framework achieves the following performance:

| Dataset | DC-0-Stab. | DC-1-MoE | DC-2-Elb. | CT-0-Stab. | CT-1-MoE | CT-2-Elb. | $k$-means | SCAR | Ward | AMD-DBSCAN | DPC |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Boxes | 90.1 | 99.3 | 97.9 | 2.6 | 42.1 ± 4.7 | 24.2 ± 1.6 | 93.5 ± 4.3 | 0.1 ± 0.1 | 95.8 | 63.9 | 25.9 |
| D31 | 79.7 | 42.7 | 82.9 | 46.5 ± 1.8 | 62.0 ± 5.4 | 67.7 ± 3.2 | 92.0 ± 2.7 | 41.7 ± 5.4 | 92.0 | 86.4 | 18.5 |
| airway | 38.0 | 65.9 | 58.8 | 0.8 | 18.2 ± 2.4 | 12.0 ± 1.4 | 39.9 ± 2.0 | -0.9 ± 0.5 | 43.7 | 31.7 | 65.1 |
| lactate | 41.0 | 41.0 | 67.5 | 0.1 | 4.1 ± 0.6 | 1.7 ± 0.2 | 28.6 ± 1.1 | 1.5 ± 1.0 | 27.7 | 71.5 | 0.0 |
| HAR | 30.0 | 46.9 | 52.8 | 14.7 ± 8.8 | 14.2 ± 4.7 | 9.6 ± 2.2 | 46.0 ± 4.5 | 5.5 ± 3.2 | 49.1 | 0.0 | 33.2 |
| letterrec. | 12.1 | 16.6 | 17.9 | 5.8 ± 0.2 | 7.2 ± 0.6 | 6.2 ± 0.3 | 12.9 ± 0.6 | 0.4 ± 0.1 | 14.7 ± 0.9 | 7.9 | 0.0 |
| PenDigits | 66.4 | 73.1 | 75.4 | 8.0 ± 0.8 | 12.0 ± 0.6 | 8.9 ± 0.5 | 55.3 ± 3.2 | 0.9 ± 0.3 | 55.2 | 55.6 | 28.8 ± 1.1 |
| COIL20 | 81.2 | 72.8 | 72.6 | 46.4 ± 4.4 | 46.6 ± 2.1 | 47.7 ± 2.0 | 58.2 ± 2.8 | 33.5 ± 2.0 | 68.6 | 39.2 | 35.9 ± 0.1 |
| COIL100 | 80.1 | 66.8 | 70.0 | 44.6 ± 4.2 | 46.6 ± 1.5 | 50.1 ± 1.2 | 56.1 ± 1.4 | 16.7 ± 0.8 | 61.4 | 14.2 | 0.2 |
| cmu_faces | 60.2 | 56.6 | 66.5 | 8.6 ± 3.1 | 37.1 ± 4.1 | 34.2 ± 2.1 | 53.2 ± 4.7 | 38.5 ± 2.9 | 61.6 | 0.7 | 0.6 |
| OptDigits | 55.3 | 77.0 | 77.0 | 40.9 ± 3.5 | 20.9 ± 2.3 | 18.1 ± 2.4 | 61.3 ± 6.6 | 14.4 ± 4.1 | 74.6 ± 2.4 | 63.2 | 0.0 |
| USPS | 33.7 | 29.3 | 29.3 | 12.0 ± 1.7 | 8.7 ± 1.0 | 11.2 ± 1.5 | 52.3 ± 1.7 | 2.9 ± 0.9 | 63.9 | 0.0 | 21.0 |
| MNIST | 19.7 | 41.7 | 46.0 | 11.1 ± 1.7 | 5.4 ± 0.6 | 5.4 ± 0.6 | 36.9 ± 1.0 | 1.3 ± 0.4 | 52.7 | 0.0 | - |

License

The project is licensed under the BSD 3-Clause License (see LICENSE.txt).

References

[1] Connecting the Dots -- Density-Connectivity Distance unifies DBSCAN, k-Center and Spectral Clustering
[2] HST+: An Efficient Index for Embedding Arbitrary Metric Spaces (Github)
[3] mlpack 4: a fast, header-only C++ machine learning library (Github)
