Interactive Corpus Analysis Tool

The Interactive Corpus Analysis Tool (ICAT) is an interactive machine learning (IML) dashboard for unlabeled text datasets. It lets a user iteratively and visually define features, explore and label instances of their dataset, and train a logistic regression model on the fly to assist in filtering, searching, and labeling tasks.
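
To make the workflow described above concrete, here is a minimal standard-library sketch of the underlying idea: user-chosen keywords become features, a handful of labeled texts become training data, and a logistic regression model is refit from them. This is not ICAT's implementation or API; the function names, the keyword-match features, and the toy texts are all assumptions made for illustration.

```python
import math


def keyword_features(text, keywords):
    """Return one 0/1 feature per keyword: 1.0 if the keyword occurs in the text."""
    lowered = text.lower()
    return [1.0 if kw in lowered else 0.0 for kw in keywords]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(rows, labels, epochs=500, lr=0.5):
    """Fit weights and a bias with plain batch gradient descent on log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            # Prediction error for this example (predicted probability minus label).
            error = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


def predict_proba(text, keywords, weights, bias):
    """Probability that the text belongs to the positive ("interesting") class."""
    x = keyword_features(text, keywords)
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Hypothetical features and labels a user might define interactively.
keywords = ["engine", "recipe"]
labeled = [
    ("The engine stalled on the highway", 1),
    ("New engine oil recommendations", 1),
    ("A simple recipe for bread", 0),
    ("My favorite soup recipe", 0),
]
rows = [keyword_features(text, keywords) for text, _ in labeled]
labels = [label for _, label in labeled]
weights, bias = train_logistic_regression(rows, labels)
```

In an interactive setting, the training step would be rerun each time a feature or label changes, and the resulting probabilities used to rank or filter the remaining unlabeled texts.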

ICAT is implemented using HoloViz's Panel library, so it can either be rendered directly like a widget in a JupyterLab or Jupyter Notebook instance, or incorporated into a standalone Panel website.

Installation

ICAT can be installed via pip with:

pip install icat-iml
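
After installing, one way to confirm that the package is visible to the current Python environment is to query its distribution metadata. The sketch below uses only the standard library; `installed_version` is a hypothetical helper name, not part of ICAT.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("icat-iml")
print(version if version else "icat-iml is not installed in this environment")
```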

Documentation

The user guide and API documentation can be found at https://ornl.github.io/icat.

Visualization

The primary ring visualization is called AnchorViz, a technique from the IML literature (see Chen, Nan-Chen, et al., "AnchorViz: Facilitating classifier error discovery through interactive semantic data exploration").

We implemented an ipywidget version of AnchorViz for use in this project; it is available separately at https://github.com/ORNL/ipyanchorviz
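
For readers unfamiliar with ring layouts of this kind, the sketch below shows the general idea in a RadViz-style form: anchors sit on a circle, and each item is drawn at the similarity-weighted average of the anchor positions, so it is pulled toward the anchors it most resembles. This is an illustration under simplifying assumptions (evenly spaced anchors, non-negative similarity scores) and is not taken from the ipyanchorviz code; the function names are hypothetical.

```python
import math


def anchor_positions(n_anchors):
    """Spread anchors evenly around the unit circle."""
    return [
        (math.cos(2 * math.pi * i / n_anchors), math.sin(2 * math.pi * i / n_anchors))
        for i in range(n_anchors)
    ]


def place_item(similarities, anchors):
    """Position an item at the similarity-weighted average of the anchor points."""
    total = sum(similarities)
    if total == 0:
        return (0.0, 0.0)  # no pull from any anchor: sit at the center
    x = sum(s * ax for s, (ax, _) in zip(similarities, anchors)) / total
    y = sum(s * ay for s, (_, ay) in zip(similarities, anchors)) / total
    return (x, y)


anchors = anchor_positions(4)  # anchors at 0, 90, 180, and 270 degrees

# An item similar only to the first anchor lands on that anchor.
only_first = place_item([1.0, 0.0, 0.0, 0.0], anchors)

# An item equally similar to two opposite anchors lands at the center.
balanced = place_item([0.5, 0.0, 0.5, 0.0], anchors)
```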

Citation

To cite usage of ICAT, please use the following BibTeX entry:

@misc{doecode_105653,
    title = {Interactive Corpus Analysis Tool},
    author = {Martindale, Nathan and Stewart, Scott},
    abstractNote = {The Interactive Corpus Analysis Tool (ICAT) is an interactive machine learning dashboard for unlabeled text/natural language processing datasets that allows a user to iteratively and visually define features, explore and label instances of their dataset, and simultaneously train a logistic regression model. ICAT was created to allow subject matter experts in a specific domain to directly train their own models for unlabeled datasets visually, without needing to be a machine learning expert or needing to know how to code the models themselves. This approach allows users to directly leverage the power of machine learning, but critically, also involves the user in the development of the machine learning model.},
    year = {2023},
    month = {apr}
}
