an implementation of spectral clustering for text document collections

Project description



Spectral clustering is a modern clustering technique considered effective for image clustering, among other applications. [1] [2]

This software finds clusters among documents based on the bag-of-words representation [3] and TF-IDF weighting [4].
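
As an illustration of the representation named above, here is a minimal sketch of building a bag-of-words TF-IDF matrix with NumPy. The function name `tfidf_matrix`, the whitespace tokenization, and the plain `log(N / df)` weighting are assumptions made for this example; the package's own tokenizer and weighting variant may differ.

```python
from collections import Counter

import numpy as np


def tfidf_matrix(docs):
    """Return (matrix, vocabulary) for a list of raw-text documents."""
    # Bag of words: lowercase, split on whitespace, discard word order.
    # (Assumed tokenization; real text usually needs more careful handling.)
    counts = [Counter(doc.lower().split()) for doc in docs]
    vocab = sorted(set(word for c in counts for word in c))
    index = {word: j for j, word in enumerate(vocab)}

    # Term-frequency matrix: one row per document, one column per word.
    tf = np.zeros((len(docs), len(vocab)))
    for i, c in enumerate(counts):
        for word, n in c.items():
            tf[i, index[word]] = n

    # Inverse document frequency: words found in fewer documents weigh more.
    # Every vocabulary word occurs in at least one document, so df > 0.
    df = np.count_nonzero(tf, axis=0)
    idf = np.log(float(len(docs)) / df)
    return tf * idf, vocab
```

With this weighting, a word that occurs in every document gets weight zero, so only words that distinguish documents contribute to the similarity between them.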

[1] Ulrike von Luxburg, A Tutorial on Spectral Clustering, 2006.
[2] Chris H. Q. Ding, Spectral Clustering, 2004.
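
For readers new to the technique, the following is a minimal sketch of one common variant of spectral clustering (a normalized graph Laplacian followed by k-means on the eigenvector embedding), written with NumPy and SciPy only. It is not taken from this package: the function name `spectral_clusters`, the cosine-similarity affinity, and the farthest-point k-means seeding are all assumptions chosen to keep the example short and deterministic.

```python
import numpy as np
from scipy.linalg import eigh


def spectral_clusters(X, k, n_iter=50):
    """Cluster the rows of X into k groups; returns an array of labels."""
    X = np.asarray(X, dtype=float)

    # Affinity: cosine similarity between rows (assumed; non-negative
    # whenever X is non-negative, as with TF-IDF weights).
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    unit = X / np.where(norms == 0, 1.0, norms)
    W = unit.dot(unit.T)
    np.fill_diagonal(W, 0.0)

    # Symmetric normalized Laplacian: L = I - D^{-1/2} W D^{-1/2}.
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.where(d == 0, 1.0, d))
    L = np.eye(len(X)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]

    # Embed each row using the eigenvectors of the k smallest eigenvalues
    # (eigh returns eigenvalues in ascending order), then normalize rows.
    _, vecs = eigh(L)
    U = vecs[:, :k]
    row_norms = np.linalg.norm(U, axis=1, keepdims=True)
    U = U / np.where(row_norms == 0, 1.0, row_norms)

    # Plain k-means on the embedding, seeded by farthest-point selection.
    centers = [U[0]]
    for _ in range(1, k):
        dist = np.min([np.sum((U - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(U[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        sq = ((U[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(sq, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = U[labels == j].mean(axis=0)
    return labels
```

The dense eigendecomposition is acceptable for a few thousand documents; larger collections would normally call for a sparse affinity matrix and a sparse eigensolver.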


The following software is required:

  • Python 2 or 3
  • NumPy
  • SciPy

How to use

  1. Prepare documents as raw-text files, and put them in a directory, for example, ‘reuters’.

  2. Prepare a category file. For example, ‘cats.txt’ may contain:

    14833 palm-oil veg-oil
    14839 ship

    This means that the file ‘14833’ has ‘palm-oil’ and ‘veg-oil’ as its categories, and ‘14839’ has ‘ship’ as its category.

  3. Run: python scluster/ cats.txt reuters/ -m kmeans
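
The category file format from step 2 can be read with a few lines of Python. This is a minimal sketch, assuming whitespace-separated fields with the file name first; `parse_categories` is a hypothetical helper written for this example, not a function exported by the package.

```python
def parse_categories(lines):
    """Map each file name to the list of categories on its line."""
    categories = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # First field is the file name, the rest are its categories.
        categories[fields[0]] = fields[1:]
    return categories


sample = ["14833 palm-oil veg-oil", "14839 ship"]
print(parse_categories(sample))
```

To parse a real file, pass an open file object instead of the list, for example `parse_categories(open("cats.txt"))`.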


  • When you use the Reuters set, note that file 17980 may contain a non-Unicode character at line 10. The line should probably read: “world economic growth-side measures …”
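
If editing the file by hand is not convenient, one possible workaround is to decode leniently when reading documents. The sketch below assumes the files are meant to be UTF-8 and replaces any undecodable byte with the Unicode replacement character (U+FFFD); `read_document` is a hypothetical helper for illustration, not part of the package. `io.open` is used so that the same code runs on Python 2 and 3.

```python
import io


def read_document(path, encoding="utf-8"):
    """Read a raw-text file, replacing bytes that cannot be decoded."""
    # errors="replace" substitutes U+FFFD for each invalid byte sequence
    # instead of raising UnicodeDecodeError.
    with io.open(path, encoding=encoding, errors="replace") as f:
        return f.read()
```

The replaced character is lost, so this suits clustering runs where a single damaged token is harmless, not cases where the exact text matters.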


Files for scluster, version 0.0.2
  • scluster-0.0.2.tar.gz (6.8 kB), source distribution
