scluster

an implementation of spectral clustering for text document collections

Project description

Homepage:

http://github.com/whym/scluster

Contact:

http://whym.org

Overview

Spectral clustering is a modern clustering technique that is considered effective for image clustering, among other tasks. [1] [2]
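To illustrate the general idea, here is a minimal sketch of a two-way spectral split written with NumPy. It is not the code used by scluster: the function name `spectral_bipartition` and the toy affinity matrix are assumptions made for this example only.

```python
import numpy as np


def spectral_bipartition(affinity):
    """Split items into two groups by the sign of the Fiedler vector.

    `affinity` is a symmetric matrix of non-negative pairwise similarities
    in which every row has a positive sum.
    """
    a = np.asarray(affinity, dtype=float)
    degree = a.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    # Symmetric normalized Laplacian: L = I - D^(-1/2) A D^(-1/2)
    laplacian = np.eye(len(a)) - d_inv_sqrt[:, None] * a * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    _, eigenvectors = np.linalg.eigh(laplacian)
    fiedler = eigenvectors[:, 1]  # eigenvector of the second-smallest eigenvalue
    return (fiedler > 0).astype(int)


# Toy data (hypothetical): items 0-2 resemble each other, as do items 3-4.
toy_affinity = np.array([
    [1.00, 1.00, 1.00, 0.01, 0.01],
    [1.00, 1.00, 1.00, 0.01, 0.01],
    [1.00, 1.00, 1.00, 0.01, 0.01],
    [0.01, 0.01, 0.01, 1.00, 1.00],
    [0.01, 0.01, 0.01, 1.00, 1.00],
])
# Which group is called 0 and which is called 1 depends on the eigenvector's sign.
labels = spectral_bipartition(toy_affinity)
```

For more than two clusters, a common approach is to take the first k eigenvectors and run k-means on their rows.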

This software finds clusters among documents based on the bag-of-words representation [3] and TF-IDF weighting [4].
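As a rough illustration of these two ideas, the sketch below builds bag-of-words counts and applies one common TF-IDF variant (raw term count times log(N / document frequency)). The exact formula and tokenization used by scluster may differ; `tfidf_vectors` and the sample sentences are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return one {term: weight} dict per document.

    Tokenization is a plain lowercase whitespace split (an assumption).
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)  # bag-of-words term counts
        vectors.append({
            term: count * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical sample documents.
sample = ["palm oil prices rise", "ship carries palm oil", "ship leaves port"]
weights = tfidf_vectors(sample)
```

Terms that occur in every document receive a weight of zero under this variant, so they do not influence the similarity between documents.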

Requirements

The following software is required:

Python 2 or 3
NumPy
SciPy

How to use

Prepare documents as raw-text files, and put them in a directory, for example, ‘reuters’.
Prepare a category file. For example, ‘cats.txt’ may contain:
```
14833 palm-oil veg-oil
14839 ship
```
This means that the file ‘14833’ has ‘palm-oil’ and ‘veg-oil’ as its categories, and ‘14839’ has ‘ship’ as its category.
Run: `python scluster/clusterer.py cats.txt reuters/ -m kmeans`
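The category file format described above (a file name followed by its whitespace-separated categories) could be read with a few lines of Python. This is a sketch for checking your own `cats.txt`, not scluster's parser; `parse_categories` is a hypothetical name.

```python
def parse_categories(lines):
    """Map each document file name to its list of categories.

    Each non-empty line is expected to look like: "<file> <cat> [<cat> ...]".
    """
    categories = {}
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        categories[fields[0]] = fields[1:]
    return categories


example = parse_categories([
    "14833 palm-oil veg-oil",
    "14839 ship",
])
```

To read a real file, pass an open file object, for example `parse_categories(open('cats.txt'))`.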

Notes

When using the Reuters set, note that file 17980 may contain a non-Unicode character on line 10. The line should probably read: “world economic growth-side measures …”
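If editing the file by hand is not convenient, one possible workaround is to decode documents leniently before processing them. The sketch below assumes the problem is a byte sequence that is not valid UTF-8; `decode_lenient` and `read_document` are hypothetical helpers and are not part of scluster.

```python
def decode_lenient(raw_bytes):
    """Decode bytes as UTF-8, replacing invalid sequences with U+FFFD."""
    return raw_bytes.decode("utf-8", errors="replace")


def read_document(path):
    """Read a raw-text document without failing on undecodable bytes."""
    with open(path, "rb") as handle:
        return decode_lenient(handle.read())


# A stray 0xFF byte is not valid UTF-8 and becomes the replacement character.
repaired = decode_lenient(b"growth\xffside")
```

Searching the result for `"\ufffd"` shows where replacements happened, so the affected lines can be inspected afterwards.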

Project details

Release history

0.0.2

Dec 30, 2015

0.0.1

Dec 30, 2015

Download files

Source Distribution

scluster-0.0.2.tar.gz (6.8 kB)

Uploaded Dec 30, 2015 Source

File details

Details for the file scluster-0.0.2.tar.gz.

File metadata

Download URL: scluster-0.0.2.tar.gz
Upload date: Dec 30, 2015
Size: 6.8 kB
Tags: Source
Uploaded using Trusted Publishing? No

File hashes

Hashes for scluster-0.0.2.tar.gz
Algorithm	Hash digest
SHA256	`18cdb698ccca8c2355b1ef9dbef1340f8ea6003b0cdec845d8f0507cb97b83ad`
MD5	`bddeab556f84f542bc6376110a8679b3`
BLAKE2b-256	`41868cd37687f4f6580707e40ebc5f8722ba517cff4ec1c47f271b03eb047829`
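
A downloaded archive can be checked against the SHA256 digest listed above with Python's standard `hashlib` module. The helper names below are hypothetical, and the sketch assumes the archive has been saved under its original file name in the current directory.

```python
import hashlib


def sha256_of_chunks(chunks):
    """Return the hex SHA256 digest of an iterable of byte chunks."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def file_sha256(path, chunk_size=65536):
    """Hash a file in fixed-size chunks so it never has to fit in memory."""
    with open(path, "rb") as handle:
        return sha256_of_chunks(iter(lambda: handle.read(chunk_size), b""))


# Digest copied from the table above.
EXPECTED = "18cdb698ccca8c2355b1ef9dbef1340f8ea6003b0cdec845d8f0507cb97b83ad"
# Usage, once the archive is present locally:
#   file_sha256("scluster-0.0.2.tar.gz") == EXPECTED
```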
