
an implementation of spectral clustering for text document collections

Homepage:

http://github.com/whym/scluster

Contact:

http://whym.org

Overview

Spectral clustering is a modern clustering technique considered effective for image clustering, among other tasks. [1] [2]
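As a rough illustration of the idea, here is a minimal two-cluster sketch that splits items by the sign of the second-smallest eigenvector of the normalized graph Laplacian. It is not taken from scluster: the function name `spectral_bipartition` and the toy affinity matrix are hypothetical, and the sketch assumes a symmetric affinity matrix in which every row has a nonzero sum.

```python
import numpy as np

def spectral_bipartition(affinity):
    """Split items into two groups using the sign of the Fiedler vector.

    Hypothetical helper for illustration; assumes a symmetric affinity
    matrix whose rows all have a nonzero sum.
    """
    w = np.asarray(affinity, dtype=float)
    degree = w.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(degree)
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}
    laplacian = np.eye(len(w)) - inv_sqrt[:, None] * w * inv_sqrt[None, :]
    # For symmetric input, eigh returns eigenvalues in ascending order,
    # so column 1 is the eigenvector of the second-smallest eigenvalue.
    _, vectors = np.linalg.eigh(laplacian)
    fiedler = vectors[:, 1]
    return (fiedler > 0).astype(int)

# Toy affinity matrix: items 0-1 are similar, items 2-3 are similar.
affinity = np.array([
    [1.0, 0.9, 0.1, 0.0],
    [0.9, 1.0, 0.0, 0.1],
    [0.1, 0.0, 1.0, 0.9],
    [0.0, 0.1, 0.9, 1.0],
])
labels = spectral_bipartition(affinity)
```

Which group receives label 0 or 1 depends on the arbitrary sign of the eigenvector, so only the grouping is meaningful. For more than two clusters, a common approach is to run k-means on several of the leading eigenvectors instead of taking a sign.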

This software finds clusters among documents based on the bag-of-words representation [3] and TF-IDF weighting [4].
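To make the representation concrete, the following is a minimal sketch of bag-of-words counting with one common TF-IDF variant (raw term count times the logarithm of inverse document frequency). The exact tokenization and weighting used by scluster may differ; the function name `tfidf_vectors` and the sample documents are assumptions for illustration only.

```python
import math
from collections import Counter

def tfidf_vectors(documents):
    """Return one {term: weight} dict per document (hypothetical helper)."""
    # Bag of words: lowercase, split on whitespace, count occurrences.
    bags = [Counter(doc.lower().split()) for doc in documents]
    n_docs = len(bags)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter()
    for bag in bags:
        doc_freq.update(bag.keys())
    # Weight = term count * log(N / document frequency).
    return [
        {term: count * math.log(n_docs / doc_freq[term])
         for term, count in bag.items()}
        for bag in bags
    ]

docs = ["palm oil prices rise", "ship carries palm oil", "ship docks"]
vectors = tfidf_vectors(docs)
```

In this sketch a term that appears in fewer documents (such as "prices") receives a higher weight than one shared across documents (such as "palm"), and a term present in every document would receive weight zero.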

Requirements

The following software is required:

  • Python 2 or 3

  • Numpy

  • Scipy

How to use

  1. Prepare documents as raw-text files, and put them in a directory, for example, ‘reuters’.

  2. Prepare a category file. For example, ‘cats.txt’ may contain:

    14833 palm-oil veg-oil
    14839 ship

    This means that the file ‘14833’ has ‘palm-oil’ and ‘veg-oil’ as its categories, and ‘14839’ has ‘ship’ as its category.

  3. Run: python scluster/clusterer.py cats.txt reuters/ -m kmeans
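The category file format from step 2 can be read with a few lines of Python. This is a sketch under the assumption that each line holds a file name followed by whitespace-separated categories; `parse_categories` is a hypothetical helper, not a function provided by scluster.

```python
def parse_categories(lines):
    """Map each document file name to its list of category labels."""
    categories = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        # First field is the file name, the rest are its categories.
        categories[fields[0]] = fields[1:]
    return categories

sample = ["14833 palm-oil veg-oil", "14839 ship"]
cats = parse_categories(sample)
```

Because the function accepts any iterable of lines, an open file object for 'cats.txt' can be passed in directly.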

Notes

  • When you use the Reuters set, note that file 17980 might contain a non-Unicode character on line 10. It should probably read: “world economic growth-side measures …”
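One way to inspect such a file without a decoding error is to read it as bytes and substitute a placeholder for anything that is not valid in the chosen encoding. The sketch below assumes UTF-8 and uses hypothetical helper names; the sample byte string is invented for illustration and is not the actual content of file 17980.

```python
def decode_lenient(raw, encoding="utf-8"):
    """Decode bytes, replacing undecodable bytes with U+FFFD."""
    return raw.decode(encoding, errors="replace")

def read_text_lenient(path, encoding="utf-8"):
    """Read a file as text without raising on invalid bytes."""
    with open(path, "rb") as f:
        return decode_lenient(f.read(), encoding)

# Invented sample: 0xFC is not a valid UTF-8 start byte.
text = decode_lenient(b"growth\xfcside")
```

Searching the decoded text for the replacement character U+FFFD then shows where the file needs a manual fix.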
