A fast multi-core implementation of the PLSCAN clustering algorithm.

Persistent Leaves Spatial Clustering for Applications with Noise

This library provides a new clustering algorithm based on HDBSCAN*. The primary advantages of PLSCAN over the hdbscan and fast_hdbscan libraries are:

  • PLSCAN automatically finds the optimal minimum cluster size.
  • PLSCAN can easily use all available cores to speed up computation.
  • PLSCAN has much faster implementations of tree condensing and cluster extraction.
  • PLSCAN does not rely on JIT compilation.

To use PLSCAN, you only need to set the min_samples parameter. This parameter controls how many neighbors are considered when measuring distances between points. Setting a higher value for min_samples makes the algorithm group points into larger, smoother clusters, and usually results in fewer, more stable clusters.

import numpy as np
import matplotlib.pyplot as plt

from fast_plscan import PLSCAN

data = np.load("docs/data/data.npy")

clusterer = PLSCAN(
  min_samples=5,  # same as in HDBSCAN
).fit(data)

plt.figure()
plt.scatter(
  *data.T, c=clusterer.labels_ % 10, s=5, alpha=0.5, 
  edgecolor="none", cmap="tab10", vmin=0, vmax=9
)
plt.axis("off")
plt.subplots_adjust(left=0, right=1, top=1, bottom=0)
plt.show()

[Figure: scatterplot]
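
To make the effect of min_samples more concrete, the sketch below computes the mutual reachability distance that HDBSCAN* uses to measure distances between points. It is a brute-force NumPy illustration, not the fast_plscan implementation; the function name mutual_reachability and the convention that a point does not count as its own neighbor are assumptions made for this example.

```python
import numpy as np


def mutual_reachability(points, min_samples):
    """Brute-force mutual reachability distances (illustrative helper)."""
    # Pairwise Euclidean distances between all points.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Core distance: distance to the min_samples-th nearest neighbor.
    # Assumption: the point itself (distance 0 at index 0) is not counted.
    core = np.sort(dist, axis=1)[:, min_samples]
    # Two points are at least as far apart as either of their core distances.
    return np.maximum(dist, np.maximum(core[:, None], core[None, :]))


# Three close points and one distant point on a line.
points = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [10.0, 0.0]])
mreach_1 = mutual_reachability(points, min_samples=1)
mreach_2 = mutual_reachability(points, min_samples=2)
print(mreach_1[0, 1], mreach_2[0, 1])
```

With these four points, raising min_samples from 1 to 2 increases the distance between the first two points from 1 to 2. Larger values push points in sparse regions further apart, which is why they tend to produce fewer, smoother clusters.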

The algorithm creates a hierarchy of leaf-clusters by changing the minimum cluster size. As this parameter varies, clusters appear or disappear. For each minimum cluster size, the algorithm measures how long these leaf-clusters persist. It then selects the minimum cluster size where the total persistence is highest, giving the most stable clustering. You can visualize this hierarchy using the leaf_tree_ attribute, which provides an alternative to HDBSCAN*'s condensed cluster tree.

clusterer.leaf_tree_.plot(leaf_separation=0.1)
plt.show()

[Figure: leaf tree]
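
The selection rule described above can be illustrated with a toy example. The sketch below is not the library's implementation: it assumes each leaf-cluster is summarized by the range of minimum cluster sizes over which it exists, and it uses the length of that range as the persistence measure. The intervals are made up for illustration.

```python
import numpy as np

# Hypothetical leaf-clusters: each row is (birth, death), the range of
# minimum cluster sizes over which that leaf-cluster exists.
leaves = np.array([
    [2, 10],
    [2, 6],
    [6, 30],
    [10, 30],
    [12, 18],
])


def total_persistence(leaves, sizes):
    """Sum the persistence of all leaf-clusters alive at each size."""
    birth, death = leaves[:, 0], leaves[:, 1]
    # Assumed persistence measure: length of the (birth, death) interval.
    persistence = death - birth
    # alive[i, j] is True when leaf j exists at minimum cluster size sizes[i].
    alive = (birth[None, :] <= sizes[:, None]) & (sizes[:, None] < death[None, :])
    return alive @ persistence


sizes = np.arange(2, 31)
totals = total_persistence(leaves, sizes)
best_size = sizes[np.argmax(totals)]
print(best_size, totals.max())
```

In this toy hierarchy the total persistence peaks at a minimum cluster size of 12, where three long-lived leaf-clusters coexist with a combined persistence of 50.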

You can also explore how the clustering changes for other important values of the minimum cluster size. The cluster_layers method automatically finds the most persistent clusterings and returns their cluster labels and membership strengths.

layers = clusterer.cluster_layers(max_peaks=4)
for i, (size, labels, probs) in enumerate(layers):
  plt.subplot(2, 2, i + 1)
  plt.scatter(
    *data.T,
    c=labels % 10,
    alpha=np.maximum(0.1, probs),
    s=1,
    linewidth=0,
    cmap="tab10",
  )
  plt.title(f"min_cluster_size={int(size)}")
  plt.axis("off")
plt.subplots_adjust(left=0, right=1, top=1, bottom=0)
plt.show()

[Figure: layers]
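
The layers can also be compared numerically instead of visually. The sketch below summarizes (size, labels, probs) tuples like those unpacked in the loop above. The helper summarize_layers is hypothetical, the example layers are hand-made stand-ins rather than real PLSCAN output, and the code assumes that noise points carry a negative label, as in HDBSCAN.

```python
import numpy as np


def summarize_layers(layers):
    """Return (min_cluster_size, n_clusters, noise_fraction, mean_strength) per layer."""
    summary = []
    for size, labels, probs in layers:
        labels = np.asarray(labels)
        probs = np.asarray(probs)
        clustered = labels >= 0  # assumption: negative labels mark noise
        n_clusters = np.unique(labels[clustered]).size
        noise_fraction = 1.0 - clustered.mean()
        # Average membership strength over clustered points only.
        mean_strength = probs[clustered].mean() if clustered.any() else 0.0
        summary.append(
            (int(size), int(n_clusters), float(noise_fraction), float(mean_strength))
        )
    return summary


# Hand-made stand-in for the output of clusterer.cluster_layers(...).
example_layers = [
    (5.0, np.array([0, 0, 1, 1, -1, 2]), np.array([1.0, 0.8, 0.9, 0.7, 0.0, 0.6])),
    (20.0, np.array([0, 0, 0, 0, -1, -1]), np.array([1.0, 0.5, 0.5, 1.0, 0.0, 0.0])),
]
summary = summarize_layers(example_layers)
for row in summary:
    print(row)
```

Passing the real layers list instead of example_layers should work the same way, provided the noise-label assumption holds.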

Local development

The development workflow works best when the Python dependencies are pre-installed with pip (or an alternative):

pip install numpy scipy matplotlib scikit-learn scikit-build-core nanobind setuptools_scm

Building the package requires CMake and a C++23 compiler with OpenMP support. The OpenMP version must support user-defined reductions. Selecting a suitable OpenMP version requires some additional configuration, which is described in the platform sections below. Assuming the compiler and OpenMP are present, the package can be compiled and installed with:

pip install --no-deps --no-build-isolation -ve .

To change the build type, add -C cmake.build-type=Debug or -C cmake.build-type=Release to the command.

scikit-build-core also experimentally supports editable installs (see their documentation):

pip install --no-deps --no-build-isolation -C editable.rebuild=true -ve .
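
After either install command, a quick check can confirm that the package is visible in the current environment. The sketch below uses only the standard library and does not import the compiled extension itself, so it only shows whether Python can locate the package, not whether the build is functional.

```python
import importlib.util

# Look up the package without importing it; find_spec returns None when
# the package cannot be found in the current environment.
spec = importlib.util.find_spec("fast_plscan")
installed = spec is not None

if installed:
    # For an editable install, the origin typically points into the source tree.
    print(f"fast_plscan found at: {spec.origin}")
else:
    print("fast_plscan is not installed in this environment")
```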

Linux

It may be necessary to tell cmake which compiler it should use. For example, using g++-14 when that is not the system default can be done by adding a -C cmake.args="-DCMAKE_CXX_COMPILER=g++-14" option. The -C cmake.args=... option does not have to be repeated on rebuilds.

macOS

On macOS, OpenMP must be installed using Homebrew:

brew install libomp

Also update the ~/.zshrc config file with:

export OpenMP_ROOT=$(brew --prefix)/opt/libomp

Alternatively, pass OpenMP_ROOT as a CMake argument:

pip install --no-deps --no-build-isolation \
  -C cmake.args="-DOpenMP_ROOT=$(brew --prefix)/opt/libomp" \
  -ve .

Windows

The default MSVC C++ compiler on Windows does not support a recent enough OpenMP version. In addition, the default PowerShell terminal on Windows is not configured for CMake to find the correct OpenMP version. Instead, use a Developer PowerShell configured for a 64-bit target architecture. To open such a terminal, run the following code in a normal PowerShell terminal:

$vswhere = "${env:ProgramFiles(x86)}/Microsoft Visual Studio/Installer/vswhere.exe"
$iloc = & $vswhere -products * -latest -property installationpath
$devdll = "$iloc/Common7/Tools/Microsoft.VisualStudio.DevShell.dll"
Import-Module $devdll; Enter-VsDevShell -Arch amd64 -VsInstallPath $iloc -SkipAutomaticLocation

In addition, select the MSVC Clang compiler using -C cmake.args="-T ClangCL" the first time the package is installed:

pip install --no-deps --no-build-isolation -C cmake.args="-T ClangCL" -ve . 

The -C cmake.args=... option does not have to be repeated on rebuilds.

You may need to install the Visual Studio Build Tools with the optional Clang compiler support enabled.

Citing

When using this work, please cite our (upcoming) preprint:

@article{bot2025plscan,
  title         = {Persistent Multiscale Density-based Clustering},
  author        = {Dani{\"{e}}l M. Bot and Leland McInnes and Jan Aerts},
  year          = {2025},
  month         = {12},
  archiveprefix = {arXiv},
  eprint        = {TODO},
  primaryclass  = {cs.CL}
}

Licensing

The fast-plscan package has a 3-Clause BSD license.
