Fast Word Classifier

FAWOC, the FAst WOrd Classifier

FAWOC is a terminal user interface (TUI) program for manually labelling a list of words. It was developed to support the efficient clustering of documents based on topic modeling algorithms such as Latent Dirichlet Allocation.

The program reads a file containing the terms (a tab-separated file, as described under Files) and lets the user quickly associate labels with them.

Each term is presented to the user, who can assign one of the labels to it with a single key press.

The user interface shows statistics on how many terms have been classified and how many remain.

The terms are sorted by their frequency in the set of documents; this value must be supplied to FAWOC.

Example of usage

fawoc terms.csv

The input file terms.csv must contain at least one column whose header (in the first row of the file) is term.

Available commands and keybindings

The following labels are currently supported:

  • k keyword
  • n noise
  • r relevant
  • x not-relevant
  • s stopword
  • p postponed
  • a autonoise

Other keys save and quit:

  • w save immediately
  • q quit

FAWOC automatically saves the changes when it closes. It also autosaves the changes after every 10 classified words.
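The autosave policy above can be pictured as a simple counter. The following is a minimal sketch, not FAWOC's actual code: the class name `AutoSaver` and the `save` callback are hypothetical and only illustrate the "every 10 classified words" rule.

```python
from typing import Callable

AUTOSAVE_INTERVAL = 10  # from the text: autosave every 10 classified words


class AutoSaver:
    """Hypothetical helper: calls `save` after every `interval` classifications."""

    def __init__(self, save: Callable[[], None], interval: int = AUTOSAVE_INTERVAL):
        self._save = save
        self._interval = interval
        self._pending = 0  # classifications since the last save

    def term_classified(self) -> None:
        """Record one classification and save when the interval is reached."""
        self._pending += 1
        if self._pending >= self._interval:
            self.save_now()

    def save_now(self) -> None:
        """Save immediately (what a manual save or closing would also trigger)."""
        self._save()
        self._pending = 0
```

With this scheme, a manual save resets the counter, so the next autosave happens 10 classifications after the last save of any kind. Whether FAWOC resets its counter in the same way is an assumption.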

Logging

FAWOC logs the relevant operations it carries out, together with profiling information, to the file profiler.log.

Files

FAWOC reads the terms from a tsv file with the following structure:

  • id: identification number of the term. Must be unique. For backward compatibility this column may be missing. In this case, FAWOC assigns an id to each term that will be saved in a newly created column id on the first save;
  • term: the term itself. For backward compatibility with old files, this column can be called keyword. This name is deprecated, and FAWOC will change it to term on the first save;
  • label: a string describing the label assigned to the term.
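The backward-compatibility rules in the list above can be sketched as follows. This is an illustration under stated assumptions, not FAWOC's loader: the function name `load_terms` is hypothetical, and using the row position as the generated id is an assumption (the text only says that an id is assigned).

```python
import csv
import io


def load_terms(stream):
    """Read term rows from a tab-separated stream, normalising old layouts."""
    rows = []
    for index, row in enumerate(csv.DictReader(stream, delimiter="\t")):
        # Deprecated header: an old 'keyword' column is treated as 'term'.
        term = row["term"] if "term" in row else row["keyword"]
        # Missing 'id' column: assign one (row position here, an assumption).
        term_id = int(row["id"]) if row.get("id") else index
        rows.append({"id": term_id, "term": term, "label": row.get("label") or ""})
    return rows


# An old-style file: no 'id' column, and 'keyword' instead of 'term'.
old_style = io.StringIO("keyword\tlabel\nneural network\tkeyword\nthe\tstopword\n")
for entry in load_terms(old_style):
    print(entry)
```

Writing the normalised rows back with the headers id, term and label would correspond to the "first save" described above.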

FAWOC loads other information from two service files. These files are named after the input file by removing its extension, adding the suffix _fawoc_data and then adding the proper extension. The service files are:
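As a quick illustration of the naming rule, the sketch below derives both service file names from an input path. The function name `service_file_paths` is hypothetical, and placing the service files in the same directory as the input file is an assumption.

```python
from pathlib import Path


def service_file_paths(input_file):
    """Return the (tsv, json) service file paths for a given input file."""
    path = Path(input_file)
    base = path.stem + "_fawoc_data"  # drop the extension, add the suffix
    return path.with_name(base + ".tsv"), path.with_name(base + ".json")


tsv_path, json_path = service_file_paths("terms.csv")
print(tsv_path)   # terms_fawoc_data.tsv
print(json_path)  # terms_fawoc_data.json
```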

  • *_fawoc_data.tsv: it contains static information about each term. It is saved only when FAWOC closes. Currently, it is used to load the number of occurrences of each term;
  • *_fawoc_data.json: it contains information used by FAWOC to correctly handle the undo command.

The --no-info-file command line option tells FAWOC to neither load nor save the *_fawoc_data.tsv file. With this option, FAWOC will not display the count value.

*_fawoc_data.tsv

The format of this tsv file is:

  • id: identification number of the term;
  • term: the term itself. This field is not directly used by FAWOC, and it is here only to make this file more readable;
  • count: the number of occurrences of the term.

For backward compatibility with old files, if the *_fawoc_data.tsv file is missing, FAWOC searches for the count column in the input file. If this column is found, then FAWOC will use this value, otherwise the value -1 will be used. A new file *_fawoc_data.tsv is created on the first save with the loaded values of count.
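The fallback described above might look like the following sketch. It is not FAWOC's implementation: the function name `load_counts` is hypothetical, and the code assumes both files are tab-separated and that an id column is already present.

```python
import csv
from pathlib import Path

MISSING_COUNT = -1  # value the text says is used when no count is available


def load_counts(input_file, data_file):
    """Return {term id: count}, preferring the service file over the input file."""
    # Use the service file if it exists, otherwise fall back to the input file.
    source = Path(data_file) if Path(data_file).exists() else Path(input_file)
    counts = {}
    with open(source, newline="", encoding="utf-8") as stream:
        for row in csv.DictReader(stream, delimiter="\t"):
            value = row.get("count")  # None if the column is absent
            counts[int(row["id"])] = int(value) if value else MISSING_COUNT
    return counts
```

Saving the returned mapping to a new *_fawoc_data.tsv file would then correspond to the first save mentioned above.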

*_fawoc_data.json

The format of this JSON file is a dictionary. The keys of this dictionary are the ids of the terms. The values are dictionaries with the following format:

  • order: number indicating the order in which each term is classified;
  • related: related term selected at the moment the term is classified.

For backward compatibility with old files, if the *_fawoc_data.json file is missing, FAWOC searches for the order and the related fields in the input file. If they are not found, FAWOC will not be able to undo the classifications made before. Each new classification will have its own entry in a newly created *_fawoc_data.json.
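To make the JSON layout concrete, here is a small sketch with made-up data. The ids, order values and related terms are invented for illustration, and the helper `last_classified` is hypothetical: it shows one plausible way the order field could identify the classification to undo, not how FAWOC actually does it.

```python
import json

# Illustrative content of a *_fawoc_data.json file (all values are made up).
example = json.loads("""
{
  "12": {"order": 0, "related": ""},
  "7":  {"order": 1, "related": "network"},
  "31": {"order": 2, "related": "network"}
}
""")


def last_classified(data):
    """Return the id of the most recently classified term, or None if empty."""
    if not data:
        return None
    return max(data, key=lambda term_id: data[term_id]["order"])


print(last_classified(example))  # id of the entry with the highest order
```

Note that JSON object keys are always strings, so the ids appear as strings here even though they are numbers in the tsv files.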
