Spectral Clustering

Overview

This is a Python re-implementation of the spectral clustering algorithm from the paper "Speaker Diarization with LSTM".

Disclaimer

This is not the original implementation used by the paper.

Specifically, this implementation uses the K-Means from scikit-learn, which does NOT support custom distance measures such as cosine distance.

Dependencies

numpy
scipy
scikit-learn

Installation

Install the package with either of the following commands:

pip3 install spectralcluster

python3 -m pip install spectralcluster
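
To confirm that the installation worked, one option is to query the installed version from Python. The following is a minimal sketch using only the standard library; the helper name installed_version is hypothetical and not part of this package.

```python
from importlib import metadata


def installed_version(package_name="spectralcluster"):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


# Prints a version string if the package is installed, otherwise None.
print(installed_version())
```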

Tutorial

Use the predict() method of the SpectralClusterer class to perform spectral clustering:

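from spectralcluster import SpectralClusterer

# Configure the clusterer; the number of clusters is chosen
# between min_clusters and max_clusters.
clusterer = SpectralClusterer(
    min_clusters=2,
    max_clusters=100,
    p_percentile=0.95,
    gaussian_blur_sigma=1)

# X must already be defined: a numpy array of shape (n_samples, n_features).
labels = clusterer.predict(X)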

The input X is a numpy array of shape (n_samples, n_features), and the returned labels is a numpy array of shape (n_samples,).
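
To try the call above without real speaker embeddings, a toy input of the documented shape can be generated with numpy. This is only an illustrative sketch: the helper make_toy_embeddings, the two cluster centers, and the noise level are assumptions made for the example, not part of this package.

```python
import numpy as np


def make_toy_embeddings(n_per_cluster=20, n_features=8, seed=0):
    """Build a toy (n_samples, n_features) array with two separated groups."""
    rng = np.random.default_rng(seed)
    # Two hypothetical cluster centers pointing in different directions.
    center_a = np.zeros(n_features)
    center_a[0] = 1.0
    center_b = np.zeros(n_features)
    center_b[1] = 1.0
    # Add small Gaussian noise around each center.
    group_a = center_a + 0.05 * rng.standard_normal((n_per_cluster, n_features))
    group_b = center_b + 0.05 * rng.standard_normal((n_per_cluster, n_features))
    return np.vstack([group_a, group_b])


X = make_toy_embeddings()
print(X.shape)  # (40, 8)

# With the package installed, X can then be passed to the clusterer:
#   labels = clusterer.predict(X)   # labels should have shape (40,)
```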

For the complete list of parameters of the clusterer, see spectralcluster/spectral_clusterer.py.

Citations

Our paper is cited as:

@inproceedings{wang2018speaker,
  title={Speaker Diarization with {LSTM}},
  author={Wang, Quan and Downey, Carlton and Wan, Li and Mansfield, Philip Andrew and Moreno, Ignacio Lopez},
  booktitle={2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={5239--5243},
  year={2018},
  organization={IEEE}
}

FAQs

Laplacian matrix

Question: Why do you perform eigen-decomposition directly on the similarity matrix instead of on its Laplacian matrix?

Answer: We do not perform eigen-decomposition directly on the similarity matrix. The first operation in the sequence of refinement operations is CropDiagonal, which replaces each diagonal element of the similarity matrix with the maximum off-diagonal value in its row. After this operation, the matrix has properties similar to those of a standard Laplacian matrix.
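
The CropDiagonal step described in this answer can be sketched in a few lines of numpy. This is an illustration of the operation as described, not necessarily how the package implements it; the function name crop_diagonal and the example matrix are hypothetical, and the sketch assumes a square matrix with at least two rows.

```python
import numpy as np


def crop_diagonal(similarity):
    """Replace each diagonal entry with the largest off-diagonal value in its row."""
    refined = np.array(similarity, dtype=float)  # work on a copy
    np.fill_diagonal(refined, -np.inf)           # exclude the diagonal from the max
    row_max = refined.max(axis=1)                # largest off-diagonal value per row
    np.fill_diagonal(refined, row_max)           # write those values on the diagonal
    return refined


# Hypothetical 3x3 similarity matrix with ones on the diagonal.
S = np.array([[1.0, 0.2, 0.7],
              [0.2, 1.0, 0.4],
              [0.7, 0.4, 1.0]])
print(crop_diagonal(S))
```

In this example the diagonal becomes 0.7, 0.4, 0.7 while the off-diagonal entries are left unchanged.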

Question: Why don't you just use the standard Laplacian matrix?

Answer: Our Laplacian matrix is less sensitive (thus more robust) to the Gaussian blur operation.

Cosine vs. Euclidean distance

Question: Your paper says the K-Means should be based on Cosine distance, but this repository is using Euclidean distance. Do you have a Cosine distance version?

Answer: You can find a variant of this repository using Cosine distance for K-means instead of Euclidean distance here: FlorianKrey/DNC
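
For context on why the distance choice matters less when rows are unit length: for L2-normalized vectors, the squared Euclidean distance equals twice the cosine distance. The sketch below checks this identity numerically. It is not the approach taken by this repository or by FlorianKrey/DNC, and the helper l2_normalize is hypothetical. Note also that K-Means centroids are not re-normalized, so Euclidean K-Means on normalized rows only approximates a true cosine-distance K-Means.

```python
import numpy as np


def l2_normalize(X, eps=1e-12):
    """Scale each row of X to unit L2 norm (eps guards against zero rows)."""
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    return X / np.maximum(norms, eps)


rng = np.random.default_rng(0)
X = rng.standard_normal((5, 4))  # toy data, for illustration only
U = l2_normalize(X)

a, b = U[0], U[1]
cosine_distance = 1.0 - float(a @ b)
squared_euclidean = float(np.sum((a - b) ** 2))

# For unit vectors: ||a - b||^2 = 2 - 2 * (a . b) = 2 * cosine_distance
print(np.isclose(squared_euclidean, 2.0 * cosine_distance))  # True
```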

Misc

Our newer speaker diarization systems are fully supervised and powered by uis-rnn. See the Google AI Blog post for details.

To learn more about speaker diarization, here is a curated list of resources: awesome-diarization.
