
Center-based hierarchical clustering framework with three steps: compute a Similarity, build a Hierarchy, and finally extract a clustering using a Partitioning method.


Similarity-Hierarchical-Partitioning (SHiP) Clustering Framework


This repository is the official implementation of the Similarity-Hierarchical-Partitioning (SHiP) clustering framework proposed in the paper "Ultrametric Cluster Hierarchies: I Want 'em All!". The framework provides a comprehensive approach to clustering by combining similarity trees, $(k,z)$-hierarchies, and various partitioning objective functions.

The whole project is implemented in C++; Python bindings make it usable from Python.

Overview

The SHiP framework operates in three main stages:

  1. Similarity Tree Construction: A similarity tree is built for the given dataset. This tree represents the relationships and proximities between data points. Note that the default constructed tree corresponds to the $k$-center hierarchy (Section 3 in the paper).
  2. $(k,z)$-Hierarchy Construction: Using the similarity tree, a $(k,z)$-hierarchy can be constructed. These hierarchies correspond to common center-based clustering methods such as $k$-median or $k$-means (Section 4).
  3. Partitioning: Finally, the data is partitioned based on the constructed hierarchy and a user-selected partitioning objective function (Section 5).
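To make the first and third stages more concrete, here is a toy sketch in plain Python. It is not the package's implementation: the `Node` class and the helpers `leaves`, `tree_distance`, `is_ultrametric`, and `cut_into_k` are hypothetical names used only for illustration. A tree with values on its internal nodes induces an ultrametric (the distance between two points is the value at their lowest common ancestor), and one simple way to obtain $k$ clusters is to repeatedly split the node with the largest value.

```python
from dataclasses import dataclass, field
from itertools import permutations
from typing import List, Optional


@dataclass(eq=False)  # compare nodes by identity, not by field values
class Node:
    value: float = 0.0                 # distance at which the subtrees meet
    children: List["Node"] = field(default_factory=list)
    point: Optional[int] = None        # set only for leaves


def leaves(node):
    """Return the point ids stored below `node`."""
    if not node.children:
        return [node.point]
    return [p for child in node.children for p in leaves(child)]


def tree_distance(node, a, b):
    """Ultrametric distance: value of the lowest common ancestor of a and b."""
    if a == b:
        return 0.0
    for child in node.children:
        below = leaves(child)
        if a in below and b in below:
            return tree_distance(child, a, b)
    return node.value


def is_ultrametric(root):
    """Check d(x, z) <= max(d(x, y), d(y, z)) for all triples of points."""
    pts = leaves(root)
    return all(
        tree_distance(root, x, z)
        <= max(tree_distance(root, x, y), tree_distance(root, y, z))
        for x, y, z in permutations(pts, 3)
    )


def cut_into_k(root, k):
    """Split the node with the largest value until k clusters exist."""
    clusters = [root]
    while len(clusters) < k:
        internal = [c for c in clusters if c.children]
        if not internal:
            break  # only single points left, cannot split further
        widest = max(internal, key=lambda c: c.value)
        clusters.remove(widest)
        clusters.extend(widest.children)
    return [sorted(leaves(c)) for c in clusters]


# Toy tree: points 0 and 1 meet at 1.0, points 2 and 3 at 2.0, everything at 5.0.
tree = Node(5.0, [
    Node(1.0, [Node(point=0), Node(point=1)]),
    Node(2.0, [Node(point=2), Node(point=3)]),
])

print(tree_distance(tree, 0, 2))
print(cut_into_k(tree, 3))
```

In the package itself, the tree is built from the data and the partitioning is chosen via `partitioningMethod`; the sketch only mirrors the idea on a hand-built tree.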

Features

  • Similarity Trees: The package provides a set of similarity/ultrametric tree implementations:

    • DCTree [1]
    • HST [2]
    • CoverTree [3]
    • KDTree [3]
    • MeanSplitKDTree [3]
    • BallTree [3]
    • MeanSplitBallTree [3]
    • RPTree [3]
    • MaxRPTree [3]
    • UBTree [3]
    • RTree [3]
    • RStarTree [3]
    • XTree [3]
    • HilbertRTree [3]
    • RPlusTree [3]
    • RPlusPlusTree [3]
    • Or use LoadTree to load a precomputed tree
  • $(k,z)$-Hierarchies: It supports all possible $(k,z)$-hierarchies, allowing flexibility in choosing the most suitable hierarchy for a given dataset.

    • $z = 0$ → $k$-center (in theory, $k$-center corresponds to $z = ∞$; this implementation encodes $∞$ as 0)
    • $z = 1$ → $k$-median
    • $z = 2$ → $k$-means
    • ...
  • Partitioning Functions: A wide range of partitioning functions are available, enabling users to select the most appropriate function based on their specific needs:

    • K
    • Elbow
    • Threshold
    • ThresholdElbow
    • QCoverage
    • QCoverageElbow
    • QStem
    • QStemElbow
    • LcaNoiseElbow
    • LcaNoiseElbowNoTriangle
    • MedianOfElbows
    • MeanOfElbows
    • Stability
    • NormalizedStability
  • Customization: Users can customize the framework by selecting from the available similarity trees, $(k,z)$-hierarchies, and partitioning functions.

    • E.g., DCTree with $k$-means ($z=2$)-hierarchy and the Elbow partitioning method.
      from SHiP import SHiP
      
      # Build the `DCTree`
      ship = SHiP(data=data_points, treeType="DCTree")
      # Extract the clustering from the $k$-means hierarchy ($z=2$) using the `Elbow` partitioning method
      labels = ship.fit_predict(hierarchy=2, partitioningMethod="Elbow")
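The role of $z$ in the $(k,z)$-hierarchies listed above can be illustrated with the usual $(k,z)$-clustering cost: each point contributes its distance to the nearest center raised to the power $z$, and $z = ∞$ reduces to the maximum distance ($k$-center). The following is a minimal sketch on 1-D data, not code from the package; `kz_cost` is a hypothetical helper, and it reuses the convention that `z = 0` stands for $∞$.

```python
def kz_cost(points, centers, z):
    """(k,z)-clustering cost of 1-D `points` for the given `centers`.

    z >= 1: sum of the z-th powers of the distances to the nearest center.
    z == 0: stands for z = infinity (k-center), i.e. the maximum distance.
    """
    nearest = [min(abs(p - c) for c in centers) for p in points]
    if z == 0:
        return max(nearest)
    return sum(d ** z for d in nearest)


points = [0.0, 1.0, 3.0, 10.0]
centers = [1.0, 10.0]

print(kz_cost(points, centers, 0))  # k-center objective
print(kz_cost(points, centers, 1))  # k-median objective
print(kz_cost(points, centers, 2))  # k-means objective
```

With these points and centers, the nearest-center distances are 1, 0, 2, and 0, so the three objectives weight the far point at 3.0 differently.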
      

Installation

Stable Version

The current stable version can be installed by the following command:
pip install SHiP-framework (coming soon)

Note that a C++ compiler (e.g., gcc) is required to build the package. In case of an installation error, make sure that:

  • Windows: Microsoft C++ Build Tools is installed
  • Linux/Mac: the Python development headers are installed (e.g., by running apt-get install python3-dev; the exact command may differ depending on the Linux distribution)

The error messages may look like this:

error: command 'gcc' failed: No such file or directory
Could not build wheels for SHiP-framework, which is required to install pyproject.toml-based projects
Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools"

Development Version

The current development version can be installed directly from git by executing:
pip install git+https://github.com/pasiweber/SHiP-framework.git

Alternatively, clone the repository, go to the root directory and execute:
pip install .
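After installing, one way to confirm that the package is visible to the current interpreter is to query its distribution metadata. The sketch below uses only the standard library; `installed_version` is a hypothetical helper name, and the distribution name is taken from the pip command above.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of `dist_name`, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if the package is installed, otherwise None.
print(installed_version("SHiP-framework"))
```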

Code Example

from SHiP import SHiP

ship = SHiP(data=data, treeType="DCTree")

# or load a saved tree
ship = SHiP(data=data, treeType="LoadTree", config={"json_tree_filepath": "<file_path>"})
# or additionally specify the tree type of the loaded tree by adding {"tree_type": "DCTree"}

ship.hierarchy = 0
ship.partitioningMethod = "K"
labels = ship.fit_predict()

# or in one line
labels = ship.fit_predict(hierarchy=1, partitioningMethod="Elbow")

# optional: export the current computed tree as JSON (e.g., to save it)
tree_json = ship.get_tree().to_json()
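The `Elbow` option used above selects the number of clusters automatically. As rough intuition only, a generic elbow heuristic looks at a decreasing cost curve (one cost per candidate number of clusters) and picks the point that lies farthest below the straight line joining the first and last cost. The sketch below is this generic heuristic, not necessarily what the package computes; `elbow_index` and the example costs are made up for illustration.

```python
def elbow_index(costs):
    """Index of the cost that lies farthest below the first-to-last chord."""
    n = len(costs)
    if n < 3:
        return 0
    first, last = costs[0], costs[-1]
    best, best_gap = 0, 0.0
    for i, cost in enumerate(costs):
        chord = first + (last - first) * i / (n - 1)  # chord height at position i
        gap = chord - cost                             # how far the curve dips below it
        if gap > best_gap:
            best, best_gap = i, gap
    return best


# Hypothetical costs for k = 1, 2, ..., 6 clusters.
costs = [100.0, 40.0, 10.0, 8.0, 7.0, 6.0]
k = elbow_index(costs) + 1  # index 0 corresponds to k = 1
print(k)
```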

Results

Our framework achieves the following performance:

| Dataset | DC-0-Stab. | DC-1-MoE | DC-2-Elb. | CT-0-Stab. | CT-1-MoE | CT-2-Elb. | $k$-means | SCAR | Ward | AMD-DBSCAN | DPC |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Boxes | 90.1 | 99.3 | 97.9 | 2.6 | 42.1 ± 4.7 | 24.2 ± 1.6 | 93.5 ± 4.3 | 0.1 ± 0.1 | 95.8 | 63.9 | 25.9 |
| D31 | 79.7 | 42.7 | 82.9 | 46.5 ± 1.8 | 62.0 ± 5.4 | 67.7 ± 3.2 | 92.0 ± 2.7 | 41.7 ± 5.4 | 92.0 | 86.4 | 18.5 |
| airway | 38.0 | 65.9 | 58.8 | 0.8 | 18.2 ± 2.4 | 12.0 ± 1.4 | 39.9 ± 2.0 | -0.9 ± 0.5 | 43.7 | 31.7 | 65.1 |
| lactate | 41.0 | 41.0 | 67.5 | 0.1 | 4.1 ± 0.6 | 1.7 ± 0.2 | 28.6 ± 1.1 | 1.5 ± 1.0 | 27.7 | 71.5 | 0.0 |
| HAR | 30.0 | 46.9 | 52.8 | 14.7 ± 8.8 | 14.2 ± 4.7 | 9.6 ± 2.2 | 46.0 ± 4.5 | 5.5 ± 3.2 | 49.1 | 0.0 | 33.2 |
| letterrec. | 12.1 | 16.6 | 17.9 | 5.8 ± 0.2 | 7.2 ± 0.6 | 6.2 ± 0.3 | 12.9 ± 0.6 | 0.4 ± 0.1 | 14.7 ± 0.9 | 7.9 | 0.0 |
| PenDigits | 66.4 | 73.1 | 75.4 | 8.0 ± 0.8 | 12.0 ± 0.6 | 8.9 ± 0.5 | 55.3 ± 3.2 | 0.9 ± 0.3 | 55.2 | 55.6 | 28.8 ± 1.1 |
| COIL20 | 81.2 | 72.8 | 72.6 | 46.4 ± 4.4 | 46.6 ± 2.1 | 47.7 ± 2.0 | 58.2 ± 2.8 | 33.5 ± 2.0 | 68.6 | 39.2 | 35.9 ± 0.1 |
| COIL100 | 80.1 | 66.8 | 70.0 | 44.6 ± 4.2 | 46.6 ± 1.5 | 50.1 ± 1.2 | 56.1 ± 1.4 | 16.7 ± 0.8 | 61.4 | 14.2 | 0.2 |
| cmu_faces | 60.2 | 56.6 | 66.5 | 8.6 ± 3.1 | 37.1 ± 4.1 | 34.2 ± 2.1 | 53.2 ± 4.7 | 38.5 ± 2.9 | 61.6 | 0.7 | 0.6 |
| OptDigits | 55.3 | 77.0 | 77.0 | 40.9 ± 3.5 | 20.9 ± 2.3 | 18.1 ± 2.4 | 61.3 ± 6.6 | 14.4 ± 4.1 | 74.6 ± 2.4 | 63.2 | 0.0 |
| USPS | 33.7 | 29.3 | 29.3 | 12.0 ± 1.7 | 8.7 ± 1.0 | 11.2 ± 1.5 | 52.3 ± 1.7 | 2.9 ± 0.9 | 63.9 | 0.0 | 21.0 |
| MNIST | 19.7 | 41.7 | 46.0 | 11.1 ± 1.7 | 5.4 ± 0.6 | 5.4 ± 0.6 | 36.9 ± 1.0 | 1.3 ± 0.4 | 52.7 | 0.0 | - |

License

The project is licensed under the BSD 3-Clause License (see LICENSE.txt).

References

[1] Connecting the Dots -- Density-Connectivity Distance unifies DBSCAN, k-Center and Spectral Clustering
[2] HST+: An Efficient Index for Embedding Arbitrary Metric Spaces (Github)
[3] mlpack 4: a fast, header-only C++ machine learning library (Github)
