Interactive Clustering GUI

A web application designed for NLP data annotation using Interactive Clustering methodology.

Quick description

Interactive clustering is a method intended to assist in the design of a training data set.

This iterative process starts from an unlabeled dataset and repeats two substeps:

  1. the user defines constraints on data sampled by the computer;
  2. the computer partitions the data using a constrained clustering algorithm.

Thus, at each step of the process:

  • the user corrects the clustering of the previous steps using constraints, and
  • the computer offers a corrected and more relevant data partitioning for the next step.
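The loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function `constrained_clustering`, the constraint labels `MUST_LINK` and `CANNOT_LINK`, and the toy one-dimensional data are hypothetical and are not taken from the package; the clustering is a naive single-linkage agglomeration, not the algorithm the application actually uses.

```python
from itertools import combinations


def constrained_clustering(points, constraints, n_clusters):
    """Toy constrained agglomerative clustering on 1-D points (illustrative only)."""
    # Start with one cluster per point.
    clusters = [{i} for i in range(len(points))]

    def find(i):
        # Return the cluster that currently contains point i.
        return next(c for c in clusters if i in c)

    def merge(a, b):
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a | b)

    def forbidden(a, b):
        # A merge is forbidden if a CANNOT_LINK pair would end up together.
        return any(
            kind == "CANNOT_LINK" and ((i in a and j in b) or (i in b and j in a))
            for i, j, kind in constraints
        )

    def distance(a, b):
        # Single-linkage distance between two clusters.
        return min(abs(points[i] - points[j]) for i in a for j in b)

    # Apply MUST_LINK constraints first.
    for i, j, kind in constraints:
        if kind == "MUST_LINK" and find(i) is not find(j):
            merge(find(i), find(j))

    # Then merge the closest allowed pair until n_clusters remain.
    while len(clusters) > n_clusters:
        candidates = [
            (distance(a, b), a, b)
            for a, b in combinations(clusters, 2)
            if not forbidden(a, b)
        ]
        if not candidates:
            break  # no allowed merge is left
        _, a, b = min(candidates, key=lambda c: c[0])
        merge(a, b)
    return sorted(sorted(c) for c in clusters)


points = [0, 1, 2, 10, 11, 12]
# Simulated user feedback: none at first, then two corrections.
feedback_per_iteration = [
    [],
    [(2, 3, "MUST_LINK"), (0, 2, "CANNOT_LINK")],
]
constraints = []
for feedback in feedback_per_iteration:
    constraints.extend(feedback)  # substep 1: the user adds constraints
    print(constrained_clustering(points, constraints, n_clusters=2))  # substep 2
```

In the second iteration the two constraints move point 2 out of its original cluster, which is the kind of correction the user is expected to drive.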

This web application implements this annotation methodology with several features:

  • data preprocessing and vectorization, to reduce noise in the data;
  • constrained clustering, to partition the data automatically;
  • constraint sampling, to select the most relevant data to annotate;
  • binary constraint annotation, to correct clustering relevance;
  • annotation review and conflict analysis, to improve constraint consistency.
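To make the idea of conflict analysis concrete, here is a minimal sketch of how inconsistent constraints could be detected. It assumes constraints are stored as `(item_a, item_b, kind)` tuples with the hypothetical labels `MUST_LINK` and `CANNOT_LINK`; the function name `find_conflicts` is also an assumption, and the application's own implementation may differ.

```python
def find_conflicts(constraints):
    """Return CANNOT_LINK pairs that contradict the MUST_LINK constraints."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Group items that are transitively linked by MUST_LINK.
    for a, b, kind in constraints:
        if kind == "MUST_LINK":
            parent[find(a)] = find(b)

    # A CANNOT_LINK inside a single MUST_LINK group is a conflict.
    return [
        (a, b)
        for a, b, kind in constraints
        if kind == "CANNOT_LINK" and find(a) == find(b)
    ]


annotations = [
    ("q1", "q2", "MUST_LINK"),
    ("q2", "q3", "MUST_LINK"),
    ("q1", "q3", "CANNOT_LINK"),  # contradicts the two lines above
    ("q3", "q4", "CANNOT_LINK"),
]
print(find_conflicts(annotations))
```

Here `q1` and `q3` are linked transitively through `q2`, so the annotation that separates them is reported for review.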

For more details, read the Documentation and the articles in the References section.

Documentation

Requirements

Interactive Clustering GUI requires Python 3.7 or above.

To install Python 3.7, I recommend using pyenv:

# install pyenv
git clone https://github.com/pyenv/pyenv ~/.pyenv

# setup pyenv (you should also put these three lines in .bashrc or similar)
export PATH="${HOME}/.pyenv/bin:${PATH}"
export PYENV_ROOT="${HOME}/.pyenv"
eval "$(pyenv init -)"

# install Python 3.7
pyenv install 3.7

# make it available globally
pyenv global system 3.7

Installation

With pip:

# install package
python3 -m pip install cognitivefactory-interactive-clustering-gui

# install spacy language model dependencies (the one you want, with version "3.1.x")
python3 -m spacy download fr_core_news_md-3.1.0 --direct

With pipx:

# install pipx
python3 -m pip install --user pipx

# install package
pipx install --python python3 cognitivefactory-interactive-clustering-gui

# install spacy language model dependencies (the one you want, with version "3.1.x")
python3 -m spacy download fr_core_news_md-3.1.0 --direct
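
After installation, a short script can confirm that the interpreter meets the version requirement and that the language model is importable. This is a sketch using only the standard library; the helper names `python_is_supported` and `is_installed` are hypothetical, and the check relies on spaCy models being installed as regular Python packages (here `fr_core_news_md`). Run it with the same interpreter that the application uses.

```python
import importlib.util
import sys


def python_is_supported(version_info=sys.version_info):
    # The project requires Python 3.7 or above.
    return tuple(version_info[:2]) >= (3, 7)


def is_installed(module_name):
    # True if a top-level module of that name can be found.
    return importlib.util.find_spec(module_name) is not None


print("Python >= 3.7:", python_is_supported())
print("fr_core_news_md installed:", is_installed("fr_core_news_md"))
```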

Run

To display the help message:

cognitivefactory-interactive-clustering-gui --help

To launch the web application:

cognitivefactory-interactive-clustering-gui  # launch on 127.0.0.1:8080
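
If the page does not load, the following sketch checks whether something is listening on the address given above. It uses only the standard library; the helper name `is_listening` is hypothetical, and the check says nothing about whether the application itself is healthy.

```python
import socket


def is_listening(host="127.0.0.1", port=8080, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


print("server reachable:", is_listening())
```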

Then open http://127.0.0.1:8080 in your browser.

Development

To work on this project or contribute to it, please read the Copier PDM documentation.

Quick setup and help

Get the code and prepare the environment:

git clone https://github.com/cognitivefactory/interactive-clustering-gui/
cd interactive-clustering-gui
make setup

Show the help:

make help  # or just make

Launch the web application in debug mode:

make run  # launch on 127.0.0.1:8080

Then open http://127.0.0.1:8080 in your browser.

For more details, read the Contributing documentation.

References

  • Interactive Clustering:

    • First presentation: Schild, E., Durantin, G., Lamirel, J.C., & Miconi, F. (2021). Conception itérative et semi-supervisée d'assistants conversationnels par regroupement interactif des questions. In EGC 2021 - 21èmes Journées Francophones Extraction et Gestion des Connaissances. Edition RNTI. ⟨hal-03133007⟩.
    • Theoretical study: Schild, E., Durantin, G., Lamirel, J.C., & Miconi, F. (2022). Iterative and Semi-Supervised Design of Chatbots Using Interactive Clustering. International Journal of Data Warehousing and Mining (IJDWM), 18(2), 1-19. http://doi.org/10.4018/IJDWM.298007. ⟨hal-03648041⟩.
    • Methodological discussion: Schild, E., Durantin, G., & Lamirel, J.C. (2021). Concevoir un assistant conversationnel de manière itérative et semi-supervisée avec le clustering interactif. In Atelier - Fouille de Textes - Text Mine 2021 - En conjonction avec EGC 2021. ⟨hal-03133060⟩.
    • Implementation: Schild, E. (2021). cognitivefactory/interactive-clustering. Zenodo. https://doi.org/10.5281/zenodo.4775251.
  • Web application:

    • FastAPI: https://fastapi.tiangolo.com/

How to cite

Schild, E. (2021). cognitivefactory/interactive-clustering-gui. Zenodo. https://doi.org/10.5281/zenodo.4775270.
