Skip to main content

A python implementation of Correlation Clustering

Project description

Correlation Clustering

A python implementation of Correlation Clustering (Bansal et al., 2004). Correlation Clustering is a weighted graph clustering technique minimizing the sum of cluster disagreements, i.e., the sum of negative edge weights within clusters plus the sum of positive edge weights across clusters. It has some nice properties, e.g.:

  • finds number of clusters by itself
  • handles missing edges
  • robust to errors by minimizing a global loss
  • optimizes an intuitive quality criterion
  • our implementation is fast by using multiprocessing

If you use this software for academic research, please cite these papers:

Find further extensive experiments testing and optimizing v1.0.0 of this implementation in:

Simple example

import networkx as nx
from itertools import combinations
from correlation_clustering.correlation import cluster_correlation_search
import numpy as np

# Define true clusters
nodes = ['node1', 'node2', 'node3', 'node4']
node2clusters_true = {'node1':0, 'node2':0, 'node3':1, 'node4':1}
print('clusters_true', node2clusters_true)

# Initialize graph
graph = nx.Graph()

# Generate perfectly clusterable graph
for (u,v) in combinations(nodes, 2):
    if node2clusters_true[u] == node2clusters_true[v]:
        graph.add_edge(u, v, weight=np.random.choice([3,4]))
    else:
        graph.add_edge(u, v, weight=np.random.choice([1,2]))

# Prepare graph for clustering
threshold = 2.5
for (i,j) in graph.edges():
    graph[i][j]['weight'] = graph[i][j]['weight']-threshold # shift edge weights

# Cluster graph
clusters, cluster_stats = cluster_correlation_search(graph)

# Display results
node2cluster_inferred = {node:i for i, cluster in enumerate(clusters) for node in cluster}
node2cluster_inferred = {node:node2cluster_inferred[node] for node in nodes}
print('clusters_inferred', node2cluster_inferred)
print('loss', cluster_stats['loss'])

Installation

To install the package run

pip install correlation-clustering

Please run the test script with

pytest

BibTex

@inproceedings{Schlechtweg2021dwug,
 title = {{DWUG}: A large Resource of Diachronic Word Usage Graphs in Four Languages},
 author = {Schlechtweg, Dominik  and Tahmasebi, Nina  and Hengchen, Simon  and Dubossarsky, Haim  and McGillivray, Barbara},
 booktitle = {Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing},
 publisher = {Association for Computational Linguistics},
 address = {Online and Punta Cana, Dominican Republic},
 pages = {7079--7091},
 url = {https://aclanthology.org/2021.emnlp-main.567},
 year = {2021}
}
@phdthesis{Schlechtweg2023measurement,
  author  = {Schlechtweg, Dominik},
  title   = {Human and Computational Measurement of Lexical Semantic Change},
  school  = {University of Stuttgart},
  address =  {Stuttgart, Germany},
  year    = {2023},
  url = {http://dx.doi.org/10.18419/opus-12833},
  slides = {https://garrafao.github.io/publications/220324-thesis-slides.pdf}
}
@mastersthesis{Tunc2021OptimierungWUGs,
author = {Benjamin Tunc},
year = {2021}, 
title = {{Optimierung von Clustering von Wortverwendungsgraphen}},
type = {Bachelor thesis},
school = {University of Stuttgart},
slides = {https://garrafao.github.io/publications/211201-optimierung-wugs.pdf},
url = {https://elib.uni-stuttgart.de/handle/11682/11923}
}

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

correlation_clustering-2.0.0.tar.gz (9.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

correlation_clustering-2.0.0-py3-none-any.whl (9.1 kB view details)

Uploaded Python 3

File details

Details for the file correlation_clustering-2.0.0.tar.gz.

File metadata

  • Download URL: correlation_clustering-2.0.0.tar.gz
  • Upload date:
  • Size: 9.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for correlation_clustering-2.0.0.tar.gz
Algorithm Hash digest
SHA256 6edb283ac2e25315db5312429949aa626c6a001c596d0479ab0ba3cdd1f45340
MD5 ae39e85c2039993c1a5cb5a297cdd98b
BLAKE2b-256 91592efd540e4f5cd4fbef314957ed2c654050f7fad0fd4a6728419cee6a3f26

See more details on using hashes here.

File details

Details for the file correlation_clustering-2.0.0-py3-none-any.whl.

File metadata

File hashes

Hashes for correlation_clustering-2.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 c71e3e7aee01248836302e5772d6f912abb0a5eee69718f27b9d210b33625ae3
MD5 08ff3459cc07519e31b67c32e879cdba
BLAKE2b-256 4caff5975bebf8898aa5c571f75a04099e7541df88a9d2c6e3350d8c34365fbc

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page