Skip to main content

Hierarchical agglomerative clustering with soft constraints (SciPy-compatible Z).

Project description

Tests PyPI version

Constrained Hierarchical Agglomerative Clustering

This repository contains the implementation of the constrained linkage function for Constrained Hierarchical Agglomerative Clustering from the paper:

HEAT: Hierarchical-constrained Encoder-Assisted Time series clustering for fault detection in district heating substations
Jonne van Dreven, Abbas Cheddad, Ahmad Nauman Ghazi, Sadi Alawadi, Jad Al Koussa, Dirk Vanhoudt
Energy and AI, 21 (2025), 100548
DOI: 10.1016/j.egyai.2025.100548

If you use this library in academic or scientific work, please cite:

@article{van_Dreven-HEAT,
  title={HEAT: Hierarchical-constrained Encoder-Assisted Time series clustering for fault detection in district heating substations},
  volume={21},
  ISSN={2666-5468},
  DOI={10.1016/j.egyai.2025.100548},
  journal={Energy and AI},
  author={van Dreven, Jonne and Cheddad, Abbas and Ghazi, Ahmad Nauman and Alawadi, Sadi and Al Koussa, Jad and Vanhoudt, Dirk},
  year={2025},
  month=sep,
  pages={100548}
}

A NumPy-only hierarchical agglomerative clustering routine with soft constraints, returning a SciPy-compatible linkage matrix Z.

✨ Features

  • Drop-in replacement for a constrained linkage routine supporting:
    • single, complete, average, weighted, centroid, median, ward
  • Accepts either:
    • condensed 1-D distances (len n*(n-1)/2)
    • n×n square distance matrix
  • Adds soft constraints:
    • Must-link / Cannot-link via a constraint matrix M
      • M[i,j] < 0 → encourages merging (must-link)
      • M[i,j] > 0 → discourages merging (cannot-link)
      • When normalize_distances=True, these penalties are scaled relative to the [0, 1] normalized distance range, making them proportional regardless of the original distance scale.
    • Min/max cluster size penalties (linear in violation amount)
      • Similarly scales proportionally when normalize_distances=True
  • No SciPy dependency — output Z works with SciPy’s downstream tools.

🔌 Plug-and-play

constrained_linkage is a drop-in replacement for SciPy’s linkage function.

  • No constraints? Works identically to scipy.cluster.hierarchy.linkage.
  • With constraints? Adds powerful, flexible soft constraints with minimal code changes.
  • Output is a SciPy-compatible linkage matrix Z, so you can keep using all SciPy tools (e.g., fcluster, dendrogram) unchanged.

🔧 Install

pip install constrained-linkage
# from source:
pip install "git+https://github.com/jonnevd/constrained-linkage"

🚀 Usage Examples

Below we illustrate must-link (negative penalties) and cannot-link (positive penalties) via the constraint matrix M.
All distances are optionally scaled to [0,1] when normalize_distances=True, so penalties are scale-free.

Semantics:

  • M[i, j] < 0must-link (encourage merging i↔j)
  • M[i, j] > 0cannot-link (discourage merging i↔j)

Example 1 — Must-link & Cannot-link constraints

import numpy as np
from constrained_linkage import constrained_linkage
from scipy.cluster import hierarchy
import matplotlib.pyplot as plt

# Four points in 1D (two well-separated pairs)
X = np.array([[0.0], [0.1], [10.0], [10.1]])
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))

# Constraint matrix: must-link 0↔1, cannot-link 2↔3
M = np.zeros_like(D)
M[0, 1] = M[1, 0] = -0.6   # must-link (negative)
M[2, 3] = M[3, 2] =  0.6   # cannot-link (positive)

Z = constrained_linkage(
    D, method="average",
    constraint_matrix=M,
    normalize_distances=True
)

# Works seamlessly with SciPy tools
labels = hierarchy.fcluster(Z, 2, criterion="maxclust")
print("Partition with must-link(0,1) & cannot-link(2,3):", labels)

plt.figure(figsize=(6, 3))
hierarchy.dendrogram(Z, labels=[f"P{i}" for i in range(len(X))])
plt.title("Dendrogram — must-link(0,1), cannot-link(2,3)")
plt.tight_layout()
plt.show()

Example 2 — Enforcing a maximum cluster size

Discourage clusters larger than a threshold by adding a positive penalty above the maximum.

import numpy as np
from constrained_linkage import constrained_linkage
from scipy.cluster import hierarchy

# Six points in 1D (three tight pairs)
X = np.array([[0.0], [0.1], [5.0], [5.1], [10.0], [10.1]])
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))

Z_max = constrained_linkage(
    D, method="average",
    max_cluster_size=2,     # soft cap
    max_penalty_weight=0.6, # stronger => avoids overgrown clusters
    normalize_distances=True
)

labels_max = hierarchy.fcluster(Z_max, 3, criterion="maxclust")
print("Partition with max_cluster_size=2:", labels_max)

Example 3 — Enforcing a minimum cluster size

When domain knowledge suggests small units should coalesce before analysis, use a minimum size prior to avoid singletons or small groups. Increasing the penalty weight strengthens this bias, as shown in the figure below.

Effect of min_cluster_size penalty on small clusters

import numpy as np
from constrained_linkage import constrained_linkage
from scipy.cluster import hierarchy

# Six points in 1D (three tight pairs)
X = np.array([[0.0], [0.1], [5.0], [5.1], [10.0], [10.1]])
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))

Z_min = constrained_linkage(
    D, method="average",
    min_cluster_size=3,     # target minimum size
    min_penalty_weight=0.5, # stronger => merge undersized clusters earlier
    normalize_distances=True
)

labels_min = hierarchy.fcluster(Z_min, 2, criterion="maxclust")
print("Partition with min_cluster_size=3:", labels_min)

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

constrained_linkage-0.3.0.tar.gz (95.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

constrained_linkage-0.3.0-py3-none-any.whl (10.8 kB view details)

Uploaded Python 3

File details

Details for the file constrained_linkage-0.3.0.tar.gz.

File metadata

  • Download URL: constrained_linkage-0.3.0.tar.gz
  • Upload date:
  • Size: 95.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.6

File hashes

Hashes for constrained_linkage-0.3.0.tar.gz
Algorithm Hash digest
SHA256 55fa70f733bf8d339a0c2f8a7988fc3d58d971a6ded3eee1fad3b28f4bb599cf
MD5 547ba456861a659027043868a179b24d
BLAKE2b-256 894f5d70701f4de457e9eea7803b97bea97f008116e696f2c527a8d2c10c8155

See more details on using hashes here.

File details

Details for the file constrained_linkage-0.3.0-py3-none-any.whl.

File metadata

File hashes

Hashes for constrained_linkage-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 3de747e4438f5c01e1be50e1bf81e89bf460aacaffd9271f4e59165b22fb5231
MD5 4f38790a10e7cb05f7fefe8e7c926504
BLAKE2b-256 caa7e2e8a756dde02b064efe0d096ee2f4d7bda073d5ca16f4d0a3ac3d9aebf0

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page