scikit-query is a Python library for active query strategies in constrained clustering on top of SciPy and scikit-learn.
scikit-query
Unlike classification, clustering aims to group data without the help of labels. A well-known shortcoming of clustering algorithms is that they rely on an objective function geared toward specific types of clusters (convex, dense, well-separated) and on hyperparameters that are hard to tune. Semi-supervised clustering mitigates these problems by injecting background knowledge to guide the clustering. Active clustering algorithms analyze the data to select informative points to ask the user about, generating constraints that allow fast convergence towards a user-specified partition.
scikit-query is a library of active query strategies for constrained clustering inspired by scikit-learn and the now inactive active-semi-supervised-clustering library by Jakub Švehla.
It is focused on algorithm-agnostic query strategies, i.e. methods that do not rely on a particular clustering algorithm. From an input dataset, they produce a set of constraints by making insightful queries to an oracle.
In typical scikit-learn fashion, the library is used by instantiating a class and calling its fit method.
qs = QueryStrategy()
oracle = MLCLOracle(truth=labels, budget=10)
constraints = qs.fit(dataset.data, oracle)
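To illustrate what happens behind such a call, here is a minimal, self-contained sketch of an oracle that answers pairwise queries from ground-truth labels under a fixed budget, together with a random-sampling strategy that turns its answers into must-link (ML) and cannot-link (CL) constraints. The names `GroundTruthOracle`, `random_pairwise_queries` and the `{"ml": [...], "cl": [...]}` output format are assumptions made for this example; they do not reproduce scikit-query's own classes or return types.

```python
import numpy as np


class GroundTruthOracle:
    """Hypothetical stand-in for an ML/CL oracle (not scikit-query's MLCLOracle)."""

    def __init__(self, truth, budget):
        self.truth = np.asarray(truth)
        self.budget = budget      # maximum number of questions allowed
        self.queries = 0          # number of questions asked so far

    def query(self, i, j):
        """Return True if points i and j belong together (must-link)."""
        if self.queries >= self.budget:
            raise RuntimeError("query budget exhausted")
        self.queries += 1
        return bool(self.truth[i] == self.truth[j])


def random_pairwise_queries(n_samples, oracle, seed=0):
    """Ask the oracle about random distinct pairs until the budget runs out."""
    rng = np.random.default_rng(seed)
    constraints = {"ml": [], "cl": []}
    asked = set()
    max_pairs = n_samples * (n_samples - 1) // 2
    while oracle.queries < oracle.budget and len(asked) < max_pairs:
        i, j = rng.choice(n_samples, size=2, replace=False)
        pair = (int(min(i, j)), int(max(i, j)))
        if pair in asked:
            continue  # never spend budget on a pair that was already asked
        asked.add(pair)
        key = "ml" if oracle.query(*pair) else "cl"
        constraints[key].append(pair)
    return constraints


labels = [0, 0, 1, 1, 2, 2]
oracle = GroundTruthOracle(truth=labels, budget=5)
constraints = random_pairwise_queries(len(labels), oracle)
```

Keeping the budget inside the oracle means any strategy can be swapped in without changing how the number of user interactions is counted.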
Algorithms
- random sampling
- FFQS from Basu et al. 2004
- MinMax from Mallapragada et al. 2008
- NPU from Xiong et al. 2013. This is an incremental variant that doesn't rely on a constrained clustering algorithm but rather takes a partition as input and outputs a constraint set.
- AIPC from Zhang et al. 2019
- SASC from Abin & Beigy 2014
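As an illustration of how an algorithm-agnostic strategy can choose its queries, the following sketch outlines the Explore phase commonly described for FFQS: a farthest-first traversal picks the point farthest from everything visited so far and compares it with one representative of each known neighborhood. This is a simplified reading of the method under stated assumptions, not the library's implementation; the function `explore`, the `same_cluster` callable and the toy data are hypothetical.

```python
import numpy as np


def explore(X, same_cluster, n_clusters, budget, seed=0):
    """Farthest-first Explore phase: build disjoint neighborhoods of linked points."""
    rng = np.random.default_rng(seed)
    n = len(X)
    first = int(rng.integers(n))
    neighborhoods = [[first]]
    visited = {first}
    # Distance from every point to its closest already-visited point.
    min_dist = np.linalg.norm(X - X[first], axis=1)
    queries = 0
    while queries < budget and len(neighborhoods) < n_clusters and len(visited) < n:
        masked = min_dist.copy()
        masked[list(visited)] = -np.inf      # never pick a visited point again
        candidate = int(np.argmax(masked))   # farthest point from the visited set
        for hood in neighborhoods:
            if queries >= budget:
                break                        # out of budget: candidate stays unassigned
            queries += 1
            if same_cluster(candidate, hood[0]):
                hood.append(candidate)       # must-link with this neighborhood
                break
        else:
            # Cannot-link with every known neighborhood: start a new one.
            neighborhoods.append([candidate])
        visited.add(candidate)
        min_dist = np.minimum(min_dist, np.linalg.norm(X - X[candidate], axis=1))
    return neighborhoods, queries


# Toy data: three well-separated groups of two points each.
X = np.array([[0, 0], [0, 1], [10, 0], [10, 1], [0, 10], [1, 10]], dtype=float)
labels = [0, 0, 1, 1, 2, 2]
neighborhoods, queries = explore(
    X, lambda i, j: labels[i] == labels[j], n_clusters=3, budget=20
)
```

Points in the same neighborhood imply must-link constraints, and points in different neighborhoods imply cannot-link constraints, so a handful of queries can yield many pairwise constraints.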
Dependencies
scikit-query is developed on Python >= 3.10 and requires the following libraries:
- numpy~=1.24.3
- scipy~=1.10.1
- pandas~=2.0.1
- scikit-learn~=1.2.2
- scikit-fuzzy~=0.4.2
- cvxopt~=1.3.1
- matplotlib~=3.7.1
- plotly~=5.14.1
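The `~=` operator is a compatible-release pin: `~=1.24.3` accepts any `1.24.x` release at or above `1.24.3`. The sketch below checks an environment against the pins listed above using only the standard library. The helper names `parse`, `is_compatible` and `check_environment` are hypothetical, and the version parsing is deliberately simplified (it ignores pre-release and local suffixes), so treat it as a quick sanity check rather than a full version resolver.

```python
import sys
from importlib import metadata
from itertools import takewhile

# Pins copied from the dependency list above.
PINS = {
    "numpy": "1.24.3",
    "scipy": "1.10.1",
    "pandas": "2.0.1",
    "scikit-learn": "1.2.2",
    "scikit-fuzzy": "0.4.2",
    "cvxopt": "1.3.1",
    "matplotlib": "3.7.1",
    "plotly": "5.14.1",
}


def parse(version):
    """Turn '1.24.3' into (1, 24, 3), ignoring any non-numeric suffix."""
    parts = []
    for piece in version.split("."):
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_compatible(installed, pin):
    """Simplified '~=' check: same prefix as the pin and not older than it."""
    have, want = parse(installed), parse(pin)
    return have >= want and have[: len(want) - 1] == want[:-1]


def check_environment(pins=PINS):
    """Map each package to True, False, or None when it is not installed."""
    report = {}
    for name, pin in pins.items():
        try:
            report[name] = is_compatible(metadata.version(name), pin)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


python_ok = sys.version_info >= (3, 10)
report = check_environment()
```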
Contributors
FFQS, MinMax and NPU are based on Jakub Švehla's original implementation and were adapted for consistency with the rest of the library. The other algorithms were implemented by Aymeric Beauchamp or his students from the University of Orléans:
- Salma Badri, Elis Ishimwe, Brice Jacquesson, Matthéo Pailler (2023)
Hashes for scikit_query-0.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | e601de475b03e28a031b0bcb8df4765206f62c96a1319a3c8b2e02ab86acd33f
MD5 | 656b3aa54647d6b0756eff0f8c3613f2
BLAKE2b-256 | afa812245a49edaeb23edf1d5e9c60170c604882f439ad4bd637f2a47545203e