
Normalized distance from unimodality


The Normalized Distance from Unimodality (nDFU)

nDFU detects structured annotator disagreement in ordinal ratings. It is designed for cases where annotators do not merely vary around one shared judgment, but instead form separated poles in the rating distribution.

Distance from unimodality (DFU) measures how far a distribution is from unimodality, that is, from having a single peak. This repository provides a normalized version of that measure (nDFU), along with helper functions for turning ordinal annotations into relative-frequency vectors.

A low nDFU score means the annotations are compatible with one peak: consensus, near-consensus, or ordinary spread around a central tendency. A high nDFU score means the distribution departs from that single-peak shape, for example when many annotators choose low ratings and many others choose high ratings, with few annotations in between. In that sense, nDFU identifies disagreement that is structured as poles rather than unstructured noise.
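The idea can be made concrete with a minimal sketch. It assumes that the score is the largest rise observed while walking away from the highest bin, divided by the height of that bin; this reading reproduces the values shown in the Usage section, but it is not necessarily how the package implements the measure. The name ndfu_sketch is hypothetical and not part of the library.

```python
from typing import Sequence


def ndfu_sketch(hist: Sequence[float]) -> float:
    """Illustrative score: largest rise when moving away from the peak,
    divided by the peak height. Hypothetical helper, not the library API."""
    if len(hist) == 0 or max(hist) <= 0:
        raise ValueError("histogram must contain at least one positive bin")
    peak = max(hist)
    peak_pos = list(hist).index(peak)  # first bin holding the maximum
    max_rise = 0.0
    # Walk right from the peak: a single-peaked histogram never rises again.
    for i in range(peak_pos, len(hist) - 1):
        max_rise = max(max_rise, hist[i + 1] - hist[i])
    # Walk left from the peak and apply the same check.
    for i in range(peak_pos, 0, -1):
        max_rise = max(max_rise, hist[i - 1] - hist[i])
    return max_rise / peak


print(ndfu_sketch([0.2, 0.6, 0.2]))  # 0.0, one central peak
print(ndfu_sketch([0.5, 0.0, 0.5]))  # 1.0, two poles with an empty valley
```

Under this assumption, a histogram that only falls on both sides of its peak scores 0, while any bin that rises again after a dip contributes a positive distance.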

Two annotation histograms: one unimodal distribution with nDFU 0.0 and one pole-like distribution with nDFU 1.0.

nDFU is useful when annotation distributions should not be collapsed immediately into a single majority label. In the accompanying paper, nDFU is used as a signal for identifying pole-like, non-unimodal annotation patterns and for training a K+1-class classifier, where selected instances are assigned to an additional class instead of being forced into one of the original K classes.
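As a rough sketch of how K+1 targets might be derived, the snippet below assumes that an instance is moved to the extra class when its nDFU score exceeds a threshold, and otherwise keeps its majority rating. The function name kplus_label, the choice of index K for the extra class, and the tie-breaking by first occurrence are all assumptions made for illustration; the paper and the package may select instances differently. The score is supplied by the caller.

```python
from collections import Counter
from typing import Sequence


def kplus_label(ratings: Sequence[int], score: float,
                scale: Sequence[int], threshold: float = 0.0) -> int:
    """Return a class index in 0..K (hypothetical convention).

    Index K marks the additional class; indices 0..K-1 follow the
    order of `scale` and are chosen by majority vote.
    """
    classes = list(scale)
    if score > threshold:
        return len(classes)  # extra class for pole-like instances
    # Majority rating; ties resolve to the rating seen first.
    majority, _ = Counter(ratings).most_common(1)[0]
    return classes.index(majority)


# Unimodal ratings (score 0.0) keep their majority label.
print(kplus_label((2, 2, 3, 2, 1), 0.0, range(1, 6)))  # 1, the index of rating 2
# Split ratings with a positive score go to the extra class.
print(kplus_label((1, 1, 2, 5, 5, 5), 1 / 3, range(1, 6)))  # 5, the extra class
```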

An empirical analysis on three toxic-language datasets shows how this signal can be used to model polarized annotations and study conditions that may explain them, such as annotator gender or race.

nDFU does not by itself explain why the poles exist. It detects the shape of structured disagreement; subgroup analysis, qualitative inspection, or downstream modeling can then be used to study what the poles correspond to.

A workflow showing ordinal annotations converted to a histogram, scored with nDFU, and then used for inspection, subgroup analysis, or K+1 modeling.

The article is available in the ACL proceedings. For licensing reasons, the datasets are not distributed with the original application notebooks and must be uploaded separately.

Notebooks

Installation

pip install ndfu

To install the latest version directly from GitHub:

pip install git+https://github.com/ipavlopoulos/ndfu.git

For local development:

git clone https://github.com/ipavlopoulos/ndfu.git
cd ndfu
pip install -e .

Usage

Import the library and use the relative frequencies of ordinal ratings as input. Here, annotators are split between low and high ratings, so nDFU reports structured disagreement:

>>> from ndfu import dfu, pdf
>>> rating = (1, 1, 2, 5, 5, 5)
>>> x = pdf(rating, range(1, 6))
>>> dfu(x)
0.3333333333333333
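To make the conversion step explicit, here is a minimal sketch of turning raw ratings into a relative-frequency vector over a fixed scale. It only illustrates the idea; the helper name relative_frequencies is hypothetical, and the package's own pdf function may differ in return type or in how it handles ratings outside the scale.

```python
from collections import Counter
from typing import Iterable, List, Sequence


def relative_frequencies(ratings: Sequence[int], scale: Iterable[int]) -> List[float]:
    """Share of annotations per scale point, in scale order (illustrative)."""
    points = list(scale)
    if len(ratings) == 0:
        raise ValueError("at least one rating is required")
    unknown = set(ratings) - set(points)
    if unknown:
        raise ValueError(f"ratings outside the scale: {sorted(unknown)}")
    counts = Counter(ratings)
    # Scale points that nobody chose still get a bin, with frequency 0.
    return [counts[p] / len(ratings) for p in points]


x = relative_frequencies((1, 1, 2, 5, 5, 5), range(1, 6))
print(x)  # five bins: 2/6, 1/6, 0, 0, 3/6
```

Keeping the empty bins matters here: the valley at ratings 3 and 4 is what separates the two poles in this example.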

You can also pass a histogram directly:

>>> dfu([0.2, 0.6, 0.2])
0.0
>>> dfu([0.5, 0.0, 0.5])
1.0

The first histogram has one central peak, so it is unimodal. The second has two separated peaks with a valley between them, so it represents pole-like disagreement.

For modeling, import UnimodalLearner to build binary, unimodal-only, and K+1 classifiers from annotation distributions:

>>> from ndfu import UnimodalLearner
>>> learner = UnimodalLearner(
...     train,
...     dev,
...     test,
...     feature_cols=["x1", "x2"],
...     scores_col="scores",
...     scale=range(1, 6),
...     threshold=0.0,
... )
>>> learner.fit_binary_baseline()
>>> learner.fit_unimodal_only_baseline()
>>> learner.fit_kplus_model()

Older examples that use "from ndfu.src import *" or "from src import *" are still supported for compatibility.

Development

Run the test suite with:

pytest

Contributing

Please cite this work as:

@inproceedings{pavlopoulos-likas-ndfu,
  title={Polarized Opinion Detection Improves the Detection of Toxic Language},
  author={Pavlopoulos, John and Likas, Aristidis},
  booktitle={Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics},
  year={2024}
}

Please also consider citing the original article:

@article{pavlopoulos-likas-2022,
    title = "Distance from Unimodality for the Assessment of Opinion Polarization",
    author = "Pavlopoulos, John  and Likas, Aristidis",
    journal = "Cognitive Computation",
    doi = "10.1007/s12559-022-10088-2",
    year = "2022",
}
