A package for generating synthetic clusters, with parameters to customize different aspects of the complexity of the cluster structure
HAWKS Data Generator
====================
.. image:: docs/source/images/hawks_animation.gif
   :alt: Example gif of HAWKS
   :scale: 65 %
   :align: center
.. summary-marker-1-start
HAWKS is a tool for generating controllably difficult synthetic data,
used primarily for clustering.
.. summary-marker-1-end
This `repo <https://github.com/sea-shunned/hawks>`_ is associated with the following paper:
.. paper-marker-1-start
1. `Shand, C. <http://sea-shunned.github.io/>`_, `Allmendinger, R. <https://personalpages.manchester.ac.uk/staff/Richard.Allmendinger/>`_, `Handl, J. <https://personalpages.manchester.ac.uk/staff/Julia.Handl/>`_, `Webb, A. <http://www.awebb.info/>`_, & Keane, J. (2019, July). Evolving controllably difficult datasets for clustering. In Proceedings of the Genetic and Evolutionary Computation Conference (pp. 463-471). https://doi.org/10.1145/3321707.3321761 **(Nominated for best paper on the evolutionary machine learning track at GECCO'19)**
The academic/technical details can be found there. What follows here is
a practical guide to using HAWKS to generate synthetic data.
.. paper-marker-1-end
If you use HAWKS to generate data that forms part of a paper, please
cite the paper above and link to this repo.
.. installation-marker-start
Installation
------------
HAWKS can be installed from PyPI using pip:
``pip install hawks``
.. installation-marker-end
or by cloning this repo (and installing locally using
``pip install .``). HAWKS was written for Python 3.6+. Other dependencies are specified in the `setup.py <https://github.com/sea-shunned/hawks/blob/master/setup.py>`_ file.
Running HAWKS
-------------
HAWKS is configured through a set of parameters, supplied as a config.
Details of each parameter are in the `documentation <https://hawks.readthedocs.io/parameters>`_. Any parameter
that is not specified takes its default value (as defined in
``hawks/defaults.json``).
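
The fallback to defaults can be pictured as a recursive merge of the user's overrides onto the default config. The sketch below is only an illustration of that idea, not the code HAWKS itself uses: ``merge_config`` is a hypothetical helper, and the ``defaults`` dictionary is a made-up stand-in (the real values live in ``hawks/defaults.json``).

```python
import copy


def merge_config(defaults, overrides):
    """Return a copy of `defaults` with `overrides` applied recursively.

    Hypothetical helper for illustration; not part of the HAWKS API.
    """
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are nested sections, so merge them key by key
            merged[key] = merge_config(merged[key], value)
        else:
            # Otherwise the override simply replaces the default
            merged[key] = copy.deepcopy(value)
    return merged


# Made-up stand-in for the defaults (assumption, not the real file contents)
defaults = {
    "hawks": {"folder_name": None, "seed_num": None},
    "dataset": {"num_clusters": 10, "num_examples": 1000},
}
# Only the values we care about are overridden
overrides = {"hawks": {"seed_num": 42}, "dataset": {"num_clusters": 5}}

merged = merge_config(defaults, overrides)
print(merged)
```

Only the keys present in ``overrides`` change; every other key keeps its default, which is the behaviour described above.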
.. example-marker-start
The example below illustrates how to run ``hawks``. Either a dictionary
or a path to a JSON config can be provided to override any of the
default values. Further examples can be found `here <https://hawks.readthedocs.io/examples>`_.
.. Need to turn the bit below into an example file and then just include that
.. code-block:: python

    """Single, simple HAWKS run, with KMeans applied to the best dataset
    """
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.metrics import adjusted_rand_score
    import hawks

    # Set the magic seed number
    SEED_NUM = 42
    # Set the seed number in the config
    config = {
        "hawks": {
            "folder_name": "simple_example",
            "seed_num": SEED_NUM
        },
        "dataset": {
            "num_clusters": 5
        },
        "objectives": {
            "silhouette": {
                "target": 0.9
            }
        }
    }
    # Any missing parameters take their values from hawks/defaults.json
    generator = hawks.create_generator(config)
    # Run the generator
    generator.run()
    # Let's plot the best individual found
    generator.plot_best_indivs(show=True)
    # Get the best dataset found and its labels
    datasets, label_sets = generator.get_best_dataset()
    # Stored as a list for multiple runs
    data, labels = datasets[0], label_sets[0]
    # Run KMeans on the data
    km = KMeans(
        n_clusters=len(np.unique(labels)), random_state=SEED_NUM
    ).fit(data)
    # Plot the output of KMeans
    hawks.plotting.scatter_prediction(data, km.labels_)
    # Get the Adjusted Rand Index for KMeans on the data
    ari = adjusted_rand_score(labels, km.labels_)
    print(f"ARI: {ari}")

.. example-marker-end
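
As noted above, a path to a JSON config can be given in place of a dictionary. The following is a minimal sketch of preparing such a file with the standard library; the file name and temporary directory are arbitrary choices for illustration, and the final HAWKS call is left as a comment because it assumes ``hawks`` is installed and that the path is passed as a string.

```python
import json
import tempfile
from pathlib import Path

# Same overrides as the dictionary-based example
config = {
    "hawks": {"folder_name": "simple_example", "seed_num": 42},
    "dataset": {"num_clusters": 5},
    "objectives": {"silhouette": {"target": 0.9}},
}

# Write the config into a fresh temporary directory (arbitrary location)
config_path = Path(tempfile.mkdtemp()) / "simple_example.json"
config_path.write_text(json.dumps(config, indent=4))

# Read it back to check that the file holds what we expect
loaded = json.loads(config_path.read_text())
print(loaded["dataset"]["num_clusters"])

# With HAWKS installed, the path can then replace the dictionary
# (assumption: the path is given as a string):
#   import hawks
#   generator = hawks.create_generator(str(config_path))
#   generator.run()
```

Keeping the config in a file makes it easier to version and share the exact settings behind a generated dataset.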
Documentation
-------------
For further information about how to use HAWKS, including examples, please see the `documentation <https://hawks.readthedocs.io/>`__.
Issues
------
As this work is still in development, plain sailing is not guaranteed.
If you encounter an issue, first check that ``hawks`` is running as
intended by navigating to the tests directory and running
``python tests.py``. If any test fails, please include the details of the failure
alongside your original problem in an issue on the `GitHub repo <https://github.com/sea-shunned/hawks>`__.
Contributing
------------
.. contributing-marker-start
At present, this is primarily academic work, so future developments will be released here after they have been published. If you have any suggestions or simple feature requests for HAWKS as a tool, please raise them on the `GitHub repo <https://github.com/sea-shunned/hawks/issues>`__.
I have various directions in mind for HAWKS and can only work on a subset of them, so involvement from more people would be great. If you would like to extend this work or collaborate, please `contact me <https://sea-shunned.github.io/>`__.
.. contributing-marker-end