A Python implementation of Correlation Clustering


Correlation Clustering

A Python implementation of Correlation Clustering (Bansal et al., 2004). Correlation Clustering clusters weighted graphs whose edges carry positive or negative weights. It minimizes the sum of cluster disagreements, i.e., the sum of the absolute weights of negative edges within clusters plus the sum of the weights of positive edges between clusters. Among its useful properties, Correlation Clustering:

finds the number of clusters by itself
handles missing edges
is robust to individual noisy edge weights because it minimizes a global loss
optimizes an intuitive quality criterion
runs fast in this implementation, which uses multiprocessing
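The objective described above can be written down directly. The following is a minimal, dependency-free sketch, not the package's own code, that computes the disagreement loss of a given clustering. It assumes the graph is given as a dictionary mapping node pairs to signed weights and the clustering as a mapping from node to cluster label; `disagreement_loss` is a hypothetical helper name. Node pairs without an entry are treated as missing edges and contribute nothing to the loss.

```python
from typing import Dict, Hashable, Tuple

Edge = Tuple[Hashable, Hashable]


def disagreement_loss(weights: Dict[Edge, float],
                      node2cluster: Dict[Hashable, int]) -> float:
    """Sum of cluster disagreements for a signed, weighted graph.

    A negative edge inside a cluster costs its absolute weight;
    a positive edge between two clusters costs its weight.
    Missing edges (absent keys) cost nothing.
    """
    loss = 0.0
    for (u, v), w in weights.items():
        same_cluster = node2cluster[u] == node2cluster[v]
        if same_cluster and w < 0:
            loss += -w  # negative edge placed within a cluster
        elif not same_cluster and w > 0:
            loss += w   # positive edge cut between clusters
    return loss


# Toy graph: a and b attract, a and c repel; the pair (b, c) is missing.
weights = {('a', 'b'): 1.5, ('a', 'c'): -0.5}
print(disagreement_loss(weights, {'a': 0, 'b': 0, 'c': 1}))  # 0.0
print(disagreement_loss(weights, {'a': 0, 'b': 0, 'c': 0}))  # 0.5
print(disagreement_loss(weights, {'a': 0, 'b': 1, 'c': 1}))  # 1.5
```

The first clustering agrees with every edge, so its loss is zero; merging everything pays for the negative edge, and separating a from b pays for the positive one.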

If you use this software for academic research, please cite these papers:

Dominik Schlechtweg, Nina Tahmasebi, Simon Hengchen, Haim Dubossarsky, Barbara McGillivray. 2021. DWUG: A large Resource of Diachronic Word Usage Graphs in Four Languages. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing.
Dominik Schlechtweg. 2023. Human and Computational Measurement of Lexical Semantic Change. PhD thesis. University of Stuttgart.

Further extensive experiments testing and optimizing v1.0.0 of this implementation can be found in:

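Benjamin Tunc. 2021. Optimierung von Clustering von Wortverwendungsgraphen (Optimizing the Clustering of Word Usage Graphs). Bachelor thesis. University of Stuttgart. Slides: https://garrafao.github.io/publications/211201-optimierung-wugs.pdf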

Simple example

import networkx as nx
from itertools import combinations
from correlation_clustering.correlation import cluster_correlation_search
import numpy as np

# Define true clusters
nodes = ['node1', 'node2', 'node3', 'node4']
node2clusters_true = {'node1':0, 'node2':0, 'node3':1, 'node4':1}
print('clusters_true', node2clusters_true)

# Initialize graph
graph = nx.Graph()

# Generate a perfectly clusterable graph: edges within a true cluster get weight 3 or 4, edges across clusters get weight 1 or 2
for (u,v) in combinations(nodes, 2):
    if node2clusters_true[u] == node2clusters_true[v]:
        graph.add_edge(u, v, weight=np.random.choice([3,4]))
    else:
        graph.add_edge(u, v, weight=np.random.choice([1,2]))

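# Prepare graph for clustering: shift edge weights by a threshold so that
# weights above it become positive (the nodes attract each other) and
# weights below it become negative (the nodes repel each other)
threshold = 2.5
for (i,j) in graph.edges():
    graph[i][j]['weight'] = graph[i][j]['weight']-threshold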

# Cluster graph
clusters, cluster_stats = cluster_correlation_search(graph)

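# Display results: map each node to the index of its inferred cluster
# (indices need not match the true labels), listed in the original node order
node2cluster_inferred = {node:i for i, cluster in enumerate(clusters) for node in cluster}
node2cluster_inferred = {node:node2cluster_inferred[node] for node in nodes}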
print('clusters_inferred', node2cluster_inferred)
print('loss', cluster_stats['loss'])
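For a graph as small as the one above, the result can be cross-checked by exhaustive search. The sketch below is not the package's search procedure: it simply enumerates every partition of the nodes and keeps the one with the lowest disagreement loss. This is only feasible for a handful of nodes, because the number of partitions (the Bell number) grows faster than exponentially. It uses plain Python and fixed weights (3 within and 2 across the true clusters) in place of the random draws above. `partitions`, `loss` and `best_clustering` are hypothetical helper names, and whether the package reports its loss on exactly this scale is an assumption.

```python
from itertools import combinations
from typing import Dict, Iterator, List, Tuple

Edge = Tuple[str, str]


def partitions(items: List[str]) -> Iterator[List[List[str]]]:
    """Yield every set partition of `items` (Bell-number many)."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for smaller in partitions(rest):
        # Put `first` into each existing block in turn ...
        for i in range(len(smaller)):
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]
        # ... or into a new singleton block.
        yield [[first]] + smaller


def loss(weights: Dict[Edge, float], clusters: List[List[str]]) -> float:
    """Correlation Clustering disagreement loss of a partition."""
    label = {node: i for i, cluster in enumerate(clusters) for node in cluster}
    total = 0.0
    for (u, v), w in weights.items():
        if label[u] == label[v] and w < 0:
            total += -w  # negative edge inside a cluster
        elif label[u] != label[v] and w > 0:
            total += w   # positive edge between clusters
    return total


def best_clustering(nodes: List[str], weights: Dict[Edge, float]):
    """Exact minimizer by exhaustive search; only viable for tiny graphs."""
    return min(((p, loss(weights, p)) for p in partitions(nodes)),
               key=lambda pair: pair[1])


nodes = ['node1', 'node2', 'node3', 'node4']
true = {'node1': 0, 'node2': 0, 'node3': 1, 'node4': 1}
threshold = 2.5
# Fixed weights instead of random ones: 3 within, 2 across true clusters.
weights = {(u, v): (3 if true[u] == true[v] else 2) - threshold
           for u, v in combinations(nodes, 2)}

clusters, best_loss = best_clustering(nodes, weights)
print(sorted(sorted(c) for c in clusters), best_loss)
# [['node1', 'node2'], ['node3', 'node4']] 0.0
```

Because the shifted weights are positive exactly within the true clusters and negative exactly across them, the true partition is the unique clustering with zero loss.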

Installation

To install the package, run

pip install correlation-clustering

To run the test suite, use

pytest

BibTeX

@inproceedings{Schlechtweg2021dwug,
 title = {{DWUG}: A large Resource of Diachronic Word Usage Graphs in Four Languages},
 author = {Schlechtweg, Dominik  and Tahmasebi, Nina  and Hengchen, Simon  and Dubossarsky, Haim  and McGillivray, Barbara},
 booktitle = {Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing},
 publisher = {Association for Computational Linguistics},
 address = {Online and Punta Cana, Dominican Republic},
 pages = {7079--7091},
 url = {https://aclanthology.org/2021.emnlp-main.567},
 year = {2021}
}

@phdthesis{Schlechtweg2023measurement,
  author  = {Schlechtweg, Dominik},
  title   = {Human and Computational Measurement of Lexical Semantic Change},
  school  = {University of Stuttgart},
  address =  {Stuttgart, Germany},
  year    = {2023},
  url = {http://dx.doi.org/10.18419/opus-12833},
  slides = {https://garrafao.github.io/publications/220324-thesis-slides.pdf}
}

@mastersthesis{Tunc2021OptimierungWUGs,
author = {Benjamin Tunc},
year = {2021}, 
title = {{Optimierung von Clustering von Wortverwendungsgraphen}},
type = {Bachelor thesis},
school = {University of Stuttgart},
slides = {https://garrafao.github.io/publications/211201-optimierung-wugs.pdf},
url = {https://elib.uni-stuttgart.de/handle/11682/11923}
}

Release details

Version 2.0.0, released Apr 9, 2026.

Source distribution: correlation_clustering-2.0.0.tar.gz (9.2 kB), uploaded via twine/6.2.0 CPython/3.14.3 (not via Trusted Publishing)

Algorithm	Hash digest
SHA256	`6edb283ac2e25315db5312429949aa626c6a001c596d0479ab0ba3cdd1f45340`
MD5	`ae39e85c2039993c1a5cb5a297cdd98b`
BLAKE2b-256	`91592efd540e4f5cd4fbef314957ed2c654050f7fad0fd4a6728419cee6a3f26`

Built distribution: correlation_clustering-2.0.0-py3-none-any.whl (9.1 kB, Python 3), uploaded via twine/6.2.0 CPython/3.14.3 (not via Trusted Publishing)

Algorithm	Hash digest
SHA256	`c71e3e7aee01248836302e5772d6f912abb0a5eee69718f27b9d210b33625ae3`
MD5	`08ff3459cc07519e31b67c32e879cdba`
BLAKE2b-256	`4caff5975bebf8898aa5c571f75a04099e7541df88a9d2c6e3350d8c34365fbc`
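The SHA256 digests listed above can be used to check a downloaded file before installing it. The following is a minimal sketch using only Python's standard `hashlib` module; `sha256_of` and `verify` are hypothetical helper names, and the archive path is an assumption (the source distribution saved in the current directory).

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open('rb') as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_sha256: str) -> bool:
    """Compare a file's SHA256 digest with a published value."""
    return sha256_of(path) == expected_sha256.strip().lower()


if __name__ == '__main__':
    # Assumed location of the downloaded source distribution.
    archive = Path('correlation_clustering-2.0.0.tar.gz')
    expected = '6edb283ac2e25315db5312429949aa626c6a001c596d0479ab0ba3cdd1f45340'
    if archive.exists():
        print('OK' if verify(archive, expected) else 'MISMATCH')
    else:
        print(f'{archive} not found')
```

Reading in chunks keeps memory use constant regardless of file size. pip can also enforce published digests itself through its hash-checking mode (`--require-hashes`).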
