python-cluster is a "simple" package for creating groups (clusters) of objects from a list.
DESCRIPTION
===========
python-cluster is a "simple" package for creating groups (clusters) of objects
from a list. It is meant to be flexible and able to cluster any kind of object.
To allow this flexibility, you supply not only the list of objects but also a
function that calculates the distance (dissimilarity) between two of those
objects. For simple datatypes, like integers, this can be as simple as a
subtraction, but more complex calculations are possible. Right now, clusters
can be generated with hierarchical clustering and with the popular K-Means
algorithm. For the hierarchical algorithm, several "linkage" methods are
available (single, complete, average and uclus). I plan to implement other
algorithms as well on an "as-needed" or "as-I-have-time" basis.
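To illustrate what a comparison function for a more complex datatype might look like, here is a minimal sketch. The names ``euclidean`` and ``record_distance`` and the record layout are hypothetical and not part of the package. Any callable that takes two objects and returns a number should be usable in the same position as the lambda in the USAGE section.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def record_distance(r1, r2):
    """Compare hypothetical (name, x, y) records by position only."""
    return euclidean(r1[1:], r2[1:])


# Hypothetical sample records, made up for this illustration.
home = ("home", 0, 0)
office = ("office", 3, 4)
print(record_distance(home, office))
```

A function like ``record_distance`` would be passed as the second argument when constructing the clustering object, in place of ``lambda x,y: abs(x-y)``.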
Algorithms are based on the document found at
http://www.elet.polimi.it/upload/matteucc/Clustering/tutorial_html/
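The linkage methods differ in how they turn the pairwise distances between the members of two clusters into a single cluster-to-cluster distance. The sketch below shows the usual definitions in plain Python. It is not the package's own code, the names ``LINKAGES`` and ``cluster_distance`` are hypothetical, and reading "uclus" as the median of the pairwise distances is an assumption.

```python
from statistics import mean, median


def pairwise(a, b, distance):
    """All distances between a member of cluster a and a member of cluster b."""
    return [distance(x, y) for x in a for y in b]


# Each linkage reduces the list of pairwise distances to one number.
LINKAGES = {
    "single": min,      # closest pair
    "complete": max,    # farthest pair
    "average": mean,    # arithmetic mean of all pairs
    "uclus": median,    # assumed here: median of all pairs
}


def cluster_distance(a, b, distance, linkage="single"):
    """Distance between two clusters under the chosen linkage."""
    return LINKAGES[linkage](pairwise(a, b, distance))


def absolute_difference(x, y):
    return abs(x - y)


left, right = [12, 13], [23, 32, 34]
for name in LINKAGES:
    print(name, cluster_distance(left, right, absolute_difference, name))
```

Single linkage tends to chain nearby items into long clusters, while complete linkage favours compact ones. The two averaging variants sit in between.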
USAGE
=====
A simple Python program could look like this::

    >>> from cluster import HierarchicalClustering
    >>> data = [12, 34, 23, 32, 46, 96, 13]
    >>> cl = HierarchicalClustering(data, lambda x, y: abs(x - y))
    >>> cl.getlevel(10)  # get clusters of items closer than 10
    [96, 46, [12, 13, 23, 34, 32]]
    >>> cl.getlevel(5)  # get clusters of items closer than 5
    [96, 46, [12, 13], 23, [34, 32]]

Note that retrieving a set of clusters immediately starts the clustering
process, which is computationally expensive. If you intend to create clusters
from a large dataset, consider doing that in a separate thread.
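One possible way to move the work off the main thread is the standard library's ``concurrent.futures``. In the sketch below, ``build_clusters`` is a hypothetical stand-in so that the example runs without the package installed: it groups sorted numbers by gap size. In real use its body would be the ``HierarchicalClustering(...).getlevel(...)`` call shown above.

```python
from concurrent.futures import ThreadPoolExecutor


def build_clusters(data, threshold):
    """Hypothetical stand-in for the expensive clustering call.

    Sorts the numbers and starts a new group whenever the gap to the
    previous number exceeds the threshold.
    """
    ordered = sorted(data)
    groups = [[ordered[0]]] if ordered else []
    for previous, current in zip(ordered, ordered[1:]):
        if current - previous > threshold:
            groups.append([current])
        else:
            groups[-1].append(current)
    return groups


data = [12, 34, 23, 32, 46, 96, 13]

with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(build_clusters, data, 10)
    # The main thread is free to do other work here.
    clusters = future.result()  # blocks until the worker has finished

print(clusters)
```

In CPython a worker thread keeps the calling thread responsive, but it does not make pure-Python clustering itself run faster.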
For K-Means clustering it would look like this::

    >>> from cluster import KMeansClustering
    >>> cl = KMeansClustering([(1,1), (2,1), (5,3), ...])
    >>> clusters = cl.getclusters(2)

The parameter passed to getclusters is the number of clusters to generate.
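For readers unfamiliar with K-Means, the following is a minimal sketch of the textbook iteration (assign each point to its nearest centroid, then recompute the centroids) in plain Python. It is not the package's implementation. The function name ``kmeans``, the choice of the first ``k`` points as starting centroids, and the sample points are all assumptions made for illustration.

```python
import math


def kmeans(points, k, max_iter=100):
    """Group coordinate tuples into k clusters using the textbook iteration."""
    # Assumption: start from the first k points as initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        new_assignment = [
            min(range(k), key=lambda i: math.dist(p, centroids[i]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # nothing moved, so the clusters are stable
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for i in range(k):
            members = [p for p, a in zip(points, assignment) if a == i]
            if members:
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return [[p for p, a in zip(points, assignment) if a == i] for i in range(k)]


# Hypothetical sample points, made up for this illustration.
points = [(1, 1), (2, 1), (5, 3), (6, 3)]
result = kmeans(points, 2)
print(result)
```

The result depends on the starting centroids, so a different initialisation can produce a different grouping for the same input.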