Order Preserving Hierarchical Agglomerative Clustering

Copyright 2020 Daniel Bakkelund

This program is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public License along with this program. If not, see https://www.gnu.org/licenses/.

Background

The code in this project realises the theory described in https://arxiv.org/abs/2004.12488. The functionality provided is that of order preserving hierarchical agglomerative clustering of partially ordered sets.
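To give a feel for what "order preserving" means here, the following is a minimal sketch and not part of the ophac API. It assumes a strict partial order given as pairs (a, b) meaning a < b, and treats a clustering as order preserving when no cluster contains two comparable elements and the relation induced on the clusters has no cycles. The function name and the input representation are hypothetical; see the article for the precise definitions.

```python
from collections import defaultdict


def is_order_preserving(order_pairs, partition):
    """Check whether a partition is compatible with a strict partial order.

    order_pairs: iterable of (a, b) pairs meaning a < b (hypothetical input format).
    partition:   list of disjoint sets that together cover all elements.
    """
    block_of = {x: i for i, block in enumerate(partition) for x in block}

    # Build the relation induced on the clusters.
    successors = defaultdict(set)
    for a, b in order_pairs:
        i, j = block_of[a], block_of[b]
        if i == j:
            return False  # two comparable elements share a cluster
        successors[i].add(j)

    # Depth-first search for a cycle among the clusters.
    WHITE, GREY, BLACK = 0, 1, 2
    colour = [WHITE] * len(partition)

    def has_cycle(i):
        colour[i] = GREY
        for j in successors[i]:
            if colour[j] == GREY or (colour[j] == WHITE and has_cycle(j)):
                return True
        colour[i] = BLACK
        return False

    return not any(
        colour[i] == WHITE and has_cycle(i) for i in range(len(partition))
    )


# Elements 0 < 1 and 2 < 3, otherwise incomparable.
order = [(0, 1), (2, 3)]
print(is_order_preserving(order, [{0, 2}, {1, 3}]))  # clusters can be ordered
print(is_order_preserving(order, [{0, 3}, {1, 2}]))  # clusters form a cycle
```

In the second call the cluster {0, 3} would have to be both below and above {1, 2}, so no order on the clusters is compatible with the original one.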

Dependencies:

  • numpy
  • matplotlib

The library runs on Python 3.x.

The code can be used as-is: just place the src directory in your PYTHONPATH.

However, if you want to have a look at the examples, you should follow the steps below.

Installing (for developers or if you want to view the examples)

1) Run

init.sh

This script downloads the repository https://bitbucket.org/Bakkelund/upyt containing the unit test library that has been used for the development of ophac.

2) Source the script setPyPath.sh:

>source setPyPath.sh

The script sets the PYTHONPATH environment variable. It is written for Unix-like platforms, and may work for older versions of Cygwin as well. If you have to set PYTHONPATH manually, add the following directories:

./src
./test
./xlibs/upyt/src

Remember that in PYTHONPATH you must specify these as absolute paths.
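If setPyPath.sh does not work on your platform, the absolute paths can be computed with a few lines of Python. This is a minimal sketch that assumes it is run from the repository root; the helper names are hypothetical and not part of the project.

```python
import os
import sys


def absolute_python_paths(repo_root):
    """Return the three directories listed above as absolute paths."""
    relative = ["src", "test", os.path.join("xlibs", "upyt", "src")]
    root = os.path.abspath(repo_root)
    return [os.path.join(root, rel) for rel in relative]


def prepend_to_sys_path(paths):
    """Make the directories importable in the current interpreter only."""
    for path in reversed(paths):
        if path not in sys.path:
            sys.path.insert(0, path)


paths = absolute_python_paths(".")
# Value suitable for the PYTHONPATH environment variable on this platform.
print(os.pathsep.join(paths))
prepend_to_sys_path(paths)
```

The printed string can be pasted into your shell's PYTHONPATH setting, while prepend_to_sys_path only affects the running interpreter.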

3) Now, try running

>python -um upyt.discover

The output should look something like this:

>python -um upyt.discover
------------------------------------------------------------------------
Running 21 tests.
------------------------------------------------------------------------
.....................
------------------------------------------------------------------------
Ran 21 tests in 0.017 s.
------------------------------------------------------------------------
SUCCEEDED!!!
------------------------------------------------------------------------

4) Now, try running

>python -u examples/demo/json_demo.py

This should present a window containing three partial dendrograms. These are the clusterings of the data in Section 6 of the article. The example also shows how to load data from a JSON file.

5) Now, try running

>python -u examples/random/random_demo.py

This may take a while. The program generates random data models and runs order preserving clustering using complete linkage. At the end of the run, a 3D plot shows the correlation between set sizes, number of ties and running times.

The above command runs one sample for each configuration. By running

>python -u examples/random/random_demo.py 5

you can have 5 samples generated for each configuration, but the running time will be five times longer, on average.
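For readers unfamiliar with the terms used by the random demo, here is a minimal sketch of complete linkage and of what a tie is. It is plain Python, independent of ophac, and deliberately ignores the order relation; the function names and the nested-list dissimilarity matrix are assumptions made for illustration.

```python
from itertools import combinations


def complete_linkage(d, cluster_a, cluster_b):
    """Complete linkage: the largest dissimilarity between the two clusters."""
    return max(d[i][j] for i in cluster_a for j in cluster_b)


def closest_pairs(d, clusters):
    """Return the minimal linkage value and all cluster pairs attaining it.

    More than one pair in the result means the merge is tied.
    """
    scored = [
        (complete_linkage(d, clusters[p], clusters[q]), p, q)
        for p, q in combinations(range(len(clusters)), 2)
    ]
    best = min(score for score, _, _ in scored)
    return best, [(p, q) for score, p, q in scored if score == best]


# Symmetric dissimilarity matrix over four elements.
d = [
    [0, 1, 4, 5],
    [1, 0, 1, 3],
    [4, 1, 0, 2],
    [5, 3, 2, 0],
]

print(closest_pairs(d, [[0], [1], [2], [3]]))  # (1, [(0, 1), (1, 2)])
print(closest_pairs(d, [[0, 1], [2], [3]]))    # (2, [(1, 2)])
```

In the first call two pairs of clusters share the minimal dissimilarity, so the merge is tied and a clustering procedure has to consider both alternatives, which is one way to see why the number of ties can affect running time.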

Data model

For high-level documentation of the data model, see the file datamodel.md, found in the same directory as this README file.
