A fast multi-core implementation of the PLSCAN clustering algorithm.

Persistent Leaves Spatial Clustering for Applications with Noise

This library provides a new clustering algorithm based on HDBSCAN*. The primary advantages of PLSCAN over the hdbscan and fast_hdbscan libraries are:

  • PLSCAN automatically finds the optimal minimum cluster size.
  • PLSCAN can easily use all available cores to speed up computation.
  • PLSCAN has much faster implementations of tree condensing and cluster extraction.
  • PLSCAN does not rely on JIT compilation.

To use PLSCAN, you only need to set the min_samples parameter. As in HDBSCAN*, this parameter controls how many neighbors are considered when measuring distances between points. Higher values of min_samples make the algorithm group points into larger, smoother clusters and usually result in fewer, more stable clusters.

import numpy as np
import matplotlib.pyplot as plt

from fast_plscan import PLSCAN

data = np.load("docs/data/data.npy")

clusterer = PLSCAN(
  min_samples=5,  # same meaning as in HDBSCAN
).fit(data)

plt.figure()
plt.scatter(
  *data.T, c=clusterer.labels_ % 10, s=5, alpha=0.5, 
  edgecolor="none", cmap="tab10", vmin=0, vmax=9
)
plt.axis("off")
plt.subplots_adjust(left=0, right=1, top=1, bottom=0)
plt.show()

[Figure: scatter plot of the data, colored by PLSCAN cluster label]
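Because min_samples is described as working the same way as in HDBSCAN, the sketch below shows how HDBSCAN* turns it into distances: each point's core distance is the distance to its min_samples-th nearest neighbor, and the mutual reachability distance between two points is the largest of their two core distances and their raw distance. This is a minimal NumPy/SciPy illustration of the HDBSCAN* definition, not fast_plscan's implementation. The function names are hypothetical, and counting each point as its own first neighbor follows scikit-learn's HDBSCAN convention, which PLSCAN may or may not share.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.spatial.distance import cdist


def core_distances(data: np.ndarray, min_samples: int) -> np.ndarray:
    """Distance from each point to its min_samples-th nearest point, itself included."""
    tree = cKDTree(data)
    # k=[min_samples] requests only the min_samples-th neighbor. The query point
    # itself is returned as the first neighbor (distance 0).
    dists, _ = tree.query(data, k=[min_samples])
    return dists[:, 0]


def mutual_reachability(data: np.ndarray, min_samples: int) -> np.ndarray:
    """Dense matrix of max(core(a), core(b), d(a, b)) over all point pairs."""
    core = core_distances(data, min_samples)
    pairwise = cdist(data, data)
    # np.maximum.outer(core, core)[i, j] == max(core[i], core[j])
    return np.maximum(pairwise, np.maximum.outer(core, core))
```

The k-th nearest neighbor distance can only grow as k grows, so raising min_samples increases every core distance and smooths the density estimate, which matches the larger, smoother clusters described above. The dense matrix needs O(n²) memory and is only meant for small examples.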

The algorithm builds a hierarchy of leaf clusters by varying the minimum cluster size: as this parameter changes, leaf clusters appear and disappear. For each minimum cluster size, the algorithm measures how long the leaf clusters persist, and it then selects the minimum cluster size with the highest total persistence, which gives the most stable clustering. You can visualize this hierarchy with the leaf_tree_ attribute, an alternative to HDBSCAN*'s condensed cluster tree.

clusterer.leaf_tree_.plot(leaf_separation=0.1)
plt.show()

[Figure: leaf tree plot]
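The selection step described above can be illustrated with a toy model. The sketch below assumes that each leaf cluster is summarized by the range of minimum cluster sizes over which it is a leaf, plus a single persistence value. It sums the persistence of the leaves present at each candidate size and returns the size with the largest total. The LeafCluster record and both functions are hypothetical simplifications, not the data structures or the persistence measure used by fast_plscan.

```python
from dataclasses import dataclass


@dataclass
class LeafCluster:
    """Hypothetical leaf-cluster record (not a fast_plscan type)."""
    min_size: int       # smallest minimum cluster size at which this cluster is a leaf
    max_size: int       # first minimum cluster size at which it is no longer a leaf
    persistence: float  # how long the cluster persists, in some distance-like unit


def total_persistence(leaves: list[LeafCluster], size: int) -> float:
    """Sum the persistence of all clusters that are leaves at `size`."""
    return sum(leaf.persistence for leaf in leaves if leaf.min_size <= size < leaf.max_size)


def best_min_cluster_size(leaves: list[LeafCluster]) -> int:
    """Return the minimum cluster size with the highest total leaf persistence."""
    # The total only increases where a leaf appears, so every maximum is
    # attained at some leaf's min_size; those are the only candidates needed.
    candidates = sorted({leaf.min_size for leaf in leaves})
    # max() returns the first (smallest) candidate when totals tie.
    return max(candidates, key=lambda size: total_persistence(leaves, size))
```

For example, with leaves (2, 10, 1.0), (2, 5, 0.5) and (5, 20, 2.0), the total is 1.5 at size 2 and 3.0 at size 5, so the sketch selects 5.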

You can also explore how the clustering changes at other notable values of the minimum cluster size. The cluster_layers method automatically finds the most persistent clusterings and returns, for each one, its minimum cluster size, cluster labels, and membership strengths.

layers = clusterer.cluster_layers(max_peaks=4)
for i, (size, labels, probs) in enumerate(layers):
  plt.subplot(2, 2, i + 1)
  plt.scatter(
    *data.T,
    c=labels % 10,
    alpha=np.maximum(0.1, probs),
    s=1,
    linewidth=0,
    cmap="tab10",
  )
  plt.title(f"min_cluster_size={int(size)}")
  plt.axis("off")
plt.subplots_adjust(left=0, right=1, top=1, bottom=0)
plt.show()

[Figure: cluster labels for the most persistent minimum cluster sizes]
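The max_peaks argument suggests that cluster_layers picks several local maxima of the total persistence across minimum cluster sizes. As an assumption about how such peaks could be chosen, and not a description of the library's internals, the sketch below finds the local maxima of a persistence curve with NumPy and returns the sizes of the highest ones. The function top_peaks and its inputs are hypothetical.

```python
import numpy as np


def top_peaks(sizes, persistence, max_peaks: int) -> np.ndarray:
    """Return up to `max_peaks` sizes at local maxima of `persistence`, highest first."""
    p = np.asarray(persistence, dtype=float)
    # Pad both ends with -inf so the first and last points can also be peaks.
    left = np.concatenate(([-np.inf], p[:-1]))
    right = np.concatenate((p[1:], [-np.inf]))
    # Strictly above the left neighbor, at least the right one: on a flat
    # plateau only the first point is marked as a peak.
    peak_idx = np.flatnonzero((p > left) & (p >= right))
    # Sort peaks by persistence, descending; the stable sort keeps ties in size order.
    order = peak_idx[np.argsort(-p[peak_idx], kind="stable")]
    return np.asarray(sizes)[order[:max_peaks]]
```

With sizes 2 to 8 and persistence values [1.0, 3.0, 2.0, 2.5, 4.0, 1.0, 1.5], the local maxima are at sizes 3, 6 and 8, and top_peaks(..., max_peaks=2) returns the sizes 6 and 3.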

Local development

The development workflow works best when the Python dependencies are pre-installed with pip (or an alternative):

pip install numpy scipy matplotlib scikit-learn scikit-build-core nanobind setuptools_scm

Building the package requires CMake and a C++23 compiler with OpenMP support. The OpenMP implementation must support user-defined reductions (introduced in OpenMP 4.0). On macOS and Windows, selecting the proper OpenMP version requires some additional configuration; see below. Assuming the compiler and OpenMP are available, the package can be compiled and installed with:

pip install --no-deps --no-build-isolation -ve .

To change the build type, add -C cmake.build-type=Debug or -C cmake.build-type=Release to the command.

scikit-build-core also has experimental support for automatically rebuilding editable installs (see their documentation):

pip install --no-deps --no-build-isolation -C editable.rebuild=true -ve .

Linux

It may be necessary to tell CMake which compiler to use. For example, to use g++-14 when it is not the system default, add the -C cmake.args="-DCMAKE_CXX_COMPILER=g++-14" option. The -C cmake.args=... option does not have to be repeated on rebuilds.

macOS

On macOS, install OpenMP using Homebrew:

brew install libomp

Then add the following line to your ~/.zshrc config file:

export OpenMP_ROOT=$(brew --prefix)/opt/libomp

or pass OpenMP_ROOT as a CMake argument:

pip install --no-deps --no-build-isolation \
  -C cmake.args="-DOpenMP_ROOT=$(brew --prefix)/opt/libomp" \
  -ve .

Windows

The default MSVC C++ compiler on Windows does not support a recent enough OpenMP version (its /openmp mode implements OpenMP 2.0, which has no user-defined reductions). In addition, the default PowerShell terminal on Windows is not configured for CMake to find the correct OpenMP version. Instead, use a Developer PowerShell configured for a 64-bit target architecture. To open such a terminal, run the following commands in a normal PowerShell terminal:

$vswhere = "${env:ProgramFiles(x86)}/Microsoft Visual Studio/Installer/vswhere.exe"
$iloc = & $vswhere -products * -latest -property installationpath
$devddl = "$iloc/Common7/Tools/Microsoft.VisualStudio.DevShell.dll"
Import-Module $devddl; Enter-VsDevShell -Arch amd64 -VsInstallPath $iloc -SkipAutomaticLocation

In addition, select Visual Studio's Clang toolset (ClangCL) using -C cmake.args="-T ClangCL" the first time the package is installed:

pip install --no-deps --no-build-isolation -C cmake.args="-T ClangCL" -ve . 

The -C cmake.args=... option does not have to be repeated on rebuilds.

You may need to install the Visual Studio Build Tools with the optional Clang compiler components enabled.

Citing

When using this work, please cite our (upcoming) preprint:

@article{bot2025plscan,
  title         = {Persistent Multiscale Density-based Clustering},
  author        = {Dani{\"{e}}l M. Bot and Leland McInnes and Jan Aerts},
  year          = {2025},
  month         = {12},
  archiveprefix = {arXiv},
  eprint        = {TODO},
  primaryclass  = {cs.CL}
}

Licensing

The fast-plscan package is released under the 3-Clause BSD license.

Download files

Download the file for your platform.

Source Distribution

fast_plscan-0.1.0.tar.gz (276.1 kB)

Uploaded Source

Built Distributions

fast_plscan-0.1.0-cp312-abi3-win_amd64.whl (645.4 kB)

Uploaded CPython 3.12+, Windows x86-64

fast_plscan-0.1.0-cp312-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (320.2 kB)

Uploaded CPython 3.12+, manylinux: glibc 2.27+ x86-64, manylinux: glibc 2.28+ x86-64

fast_plscan-0.1.0-cp312-abi3-macosx_15_0_arm64.whl (408.6 kB)

Uploaded CPython 3.12+, macOS 15.0+ ARM64

fast_plscan-0.1.0-cp311-cp311-win_amd64.whl (647.5 kB)

Uploaded CPython 3.11, Windows x86-64

fast_plscan-0.1.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (323.6 kB)

Uploaded CPython 3.11, manylinux: glibc 2.27+ x86-64, manylinux: glibc 2.28+ x86-64

fast_plscan-0.1.0-cp311-cp311-macosx_15_0_arm64.whl (408.8 kB)

Uploaded CPython 3.11, macOS 15.0+ ARM64

fast_plscan-0.1.0-cp310-cp310-win_amd64.whl (647.8 kB)

Uploaded CPython 3.10, Windows x86-64

fast_plscan-0.1.0-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl (323.9 kB)

Uploaded CPython 3.10, manylinux: glibc 2.27+ x86-64, manylinux: glibc 2.28+ x86-64

fast_plscan-0.1.0-cp310-cp310-macosx_15_0_arm64.whl (409.0 kB)

Uploaded CPython 3.10, macOS 15.0+ ARM64
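The wheel file names above follow the PEP 427 pattern {name}-{version}(-{build})?-{python tag}-{abi tag}-{platform tag}.whl, where a single tag field may list several values separated by dots. The minimal parser below is a hypothetical helper without validation that splits a name into these parts. For real use, the packaging library's packaging.utils.parse_wheel_filename performs the same split with full validation.

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    name: str
    version: str
    build: Optional[str]
    python_tags: list[str]
    abi_tags: list[str]
    platform_tags: list[str]


def parse_wheel_name(filename: str) -> WheelName:
    """Split a wheel file name into its PEP 427 fields (simplified, no validation)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 6:    # optional build tag present
        name, version, build, py, abi, plat = parts
    elif len(parts) == 5:  # no build tag
        name, version, py, abi, plat = parts
        build = None
    else:
        raise ValueError(f"unexpected wheel file name: {filename}")
    # Compressed tag sets such as "manylinux_2_27_x86_64.manylinux_2_28_x86_64"
    # use "." to separate the individual tags.
    return WheelName(name, version, build, py.split("."), abi.split("."), plat.split("."))
```

For the first Linux wheel this yields the Python tag cp312, the ABI tag abi3 and two manylinux platform tags. The abi3 tag marks a build against CPython's stable ABI, which is why one cp312 wheel per platform is listed as CPython 3.12+, while the cp311 and cp310 wheels each target a single Python version.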

File details

Details for the file fast_plscan-0.1.0.tar.gz.

File metadata

  • Download URL: fast_plscan-0.1.0.tar.gz
  • Size: 276.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for fast_plscan-0.1.0.tar.gz
Algorithm Hash digest
SHA256 853ba0a0378ca1b3a8152cef75a3ba57c571f7b6d4349df76420135cee9d824f
MD5 6ca7de68b284f98619cbe6dc80b68a2e
BLAKE2b-256 242f89a5344c220fc8d12bbcb26314cfff31755c82e57caa4b3fa4ccdadeaa0c
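To check a downloaded file against the SHA256 digests listed on this page, a minimal sketch using Python's standard hashlib module is shown below. The helper names sha256_of and verify are hypothetical. The file is read in chunks so that large files are not loaded into memory at once.

```python
import hashlib
from pathlib import Path
from typing import Union


def sha256_of(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps calling read() until it returns b"" at EOF.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Union[str, Path], expected_sha256: str) -> bool:
    """Compare a file's SHA256 digest with an expected hex string (case-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

For example, verify("fast_plscan-0.1.0.tar.gz", "853ba0a0378ca1b3a8152cef75a3ba57c571f7b6d4349df76420135cee9d824f") should return True for an intact download of the source distribution. pip can enforce the same check at install time in hash-checking mode, using --hash=sha256:<digest> entries in a requirements file.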

File details

Details for the file fast_plscan-0.1.0-cp312-abi3-win_amd64.whl.

File metadata

  • Download URL: fast_plscan-0.1.0-cp312-abi3-win_amd64.whl
  • Size: 645.4 kB
  • Tags: CPython 3.12+, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for fast_plscan-0.1.0-cp312-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 76c80d6fabf36840b3be81262a0492cbd55e0a8f1379efa317e55df09a64c7c8
MD5 ea6c9d055a595665f6ba11c88907fd25
BLAKE2b-256 c845d7196df7a82fdc91d0c8b38f0f47df29ddee8955ee361ec7b440b92e4919

Provenance

The following attestation bundles were made for fast_plscan-0.1.0-cp312-abi3-win_amd64.whl:

Publisher: release.yml on JelmerBot/fast_plscan

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file fast_plscan-0.1.0-cp312-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.

File hashes

Hashes for fast_plscan-0.1.0-cp312-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 7e0c901d16f2901460712440da1949382cf984cd19a246295007e7e75db72f69
MD5 0e669977b72ab2a5cb0d81fc4c41c735
BLAKE2b-256 8d3101252c711a13169303a8f2af0a25946222d07de48082d5b810678a4513e3

Provenance

The following attestation bundles were made for fast_plscan-0.1.0-cp312-abi3-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl:

Publisher: release.yml on JelmerBot/fast_plscan

File details

Details for the file fast_plscan-0.1.0-cp312-abi3-macosx_15_0_arm64.whl.

File hashes

Hashes for fast_plscan-0.1.0-cp312-abi3-macosx_15_0_arm64.whl
Algorithm Hash digest
SHA256 885673dc0fa46eeb686af8ba0a071135fda9653677d8c288b48e6ecc964818af
MD5 be3d9d346558b6450b38f2e4e2bcf175
BLAKE2b-256 89dc012719e888362d3b3cd8bd32624aeb5421473d7ea74d5dd0e7874731b43a

Provenance

The following attestation bundles were made for fast_plscan-0.1.0-cp312-abi3-macosx_15_0_arm64.whl:

Publisher: release.yml on JelmerBot/fast_plscan

File details

Details for the file fast_plscan-0.1.0-cp311-cp311-win_amd64.whl.

File metadata

  • Download URL: fast_plscan-0.1.0-cp311-cp311-win_amd64.whl
  • Size: 647.5 kB
  • Tags: CPython 3.11, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for fast_plscan-0.1.0-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 378f017b2dc75dac60fb84ab0a134f4e5132bbe38b52e083e7f66b3cc39b64da
MD5 c489686544b61685353b04885c025037
BLAKE2b-256 cdabff6971a453bcc8330839ed08df1f5ee27d0e9cfb02c1322b573ecdf7955b

Provenance

The following attestation bundles were made for fast_plscan-0.1.0-cp311-cp311-win_amd64.whl:

Publisher: release.yml on JelmerBot/fast_plscan

File details

Details for the file fast_plscan-0.1.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.

File hashes

Hashes for fast_plscan-0.1.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 269f42249fa10229eb1e0c7940a7697f0b69ef29f1156319195e07bf2ef557c7
MD5 ff5ac08ef7d50d9d77af22d3945c9aa1
BLAKE2b-256 7d8dbee58695f958c45cc034bb757f62f029e926e8bf40d3037df80bd01068af

Provenance

The following attestation bundles were made for fast_plscan-0.1.0-cp311-cp311-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl:

Publisher: release.yml on JelmerBot/fast_plscan

File details

Details for the file fast_plscan-0.1.0-cp311-cp311-macosx_15_0_arm64.whl.

File hashes

Hashes for fast_plscan-0.1.0-cp311-cp311-macosx_15_0_arm64.whl
Algorithm Hash digest
SHA256 df9780491472db543b9e9fef1f2d3bce80bc02ab5d56cb77fbb64008344176c9
MD5 4436359fcd258982cbf10ff388be7a6c
BLAKE2b-256 6dfbc6df1031c7fac2871a09eec8c4d9aad4e9c98b79c6a68112fd7920aee796

Provenance

The following attestation bundles were made for fast_plscan-0.1.0-cp311-cp311-macosx_15_0_arm64.whl:

Publisher: release.yml on JelmerBot/fast_plscan

File details

Details for the file fast_plscan-0.1.0-cp310-cp310-win_amd64.whl.

File metadata

  • Download URL: fast_plscan-0.1.0-cp310-cp310-win_amd64.whl
  • Size: 647.8 kB
  • Tags: CPython 3.10, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for fast_plscan-0.1.0-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 8ee191b04e9a5ea0892223b57931fed6bfbac6a20b94b5d26519a1e826d528e5
MD5 bd83087480eee9394395889ef0d8acea
BLAKE2b-256 3eea2936ecfe240b1e9ee043f57214adb661563b125b3903084826aeb81cb8bf

Provenance

The following attestation bundles were made for fast_plscan-0.1.0-cp310-cp310-win_amd64.whl:

Publisher: release.yml on JelmerBot/fast_plscan

File details

Details for the file fast_plscan-0.1.0-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.

File hashes

Hashes for fast_plscan-0.1.0-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 9106f0cd4df7d883bfc01224a76df9aab37abbc1eada856907f0abbbf420713e
MD5 e57f9c7bb75f964b282cc42b09cb5e16
BLAKE2b-256 3d8554c65f9854a65b9f324e99af3b58741e94b4b54ba6780508dafa1b53bb54

Provenance

The following attestation bundles were made for fast_plscan-0.1.0-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl:

Publisher: release.yml on JelmerBot/fast_plscan

File details

Details for the file fast_plscan-0.1.0-cp310-cp310-macosx_15_0_arm64.whl.

File hashes

Hashes for fast_plscan-0.1.0-cp310-cp310-macosx_15_0_arm64.whl
Algorithm Hash digest
SHA256 957b0f8e911638216e40be4a422b06ff39815e7364b80a5d60b5f58f0455ccec
MD5 784e76ca560932f16ed37112608848bd
BLAKE2b-256 c11c151c59597b4e179f41ccf6aa5aa0e96cec40c1c53612afbad162992d88b9

Provenance

The following attestation bundles were made for fast_plscan-0.1.0-cp310-cp310-macosx_15_0_arm64.whl:

Publisher: release.yml on JelmerBot/fast_plscan
