Normalized distance from unimodality

The Normalized Distance from Unimodality (nDFU)

nDFU detects structured annotator disagreement in ordinal ratings. It is designed for cases where annotators do not merely vary around one shared judgment, but instead form separated poles in the rating distribution.

Distance from unimodality (DFU) measures how far a distribution is from unimodality, that is, from having a single peak. This repository provides a normalized version of that measure (nDFU), along with helper functions for turning ordinal annotations into relative-frequency vectors.

A low nDFU score means the annotations are compatible with one peak: consensus, near-consensus, or ordinary spread around a central tendency. A high nDFU score means the distribution departs from that single-peak shape, for example when many annotators choose low ratings and many others choose high ratings, with few annotations in between. In that sense, nDFU identifies disagreement that is structured as poles rather than unstructured noise.
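
The scoring idea can be sketched in a few lines of plain Python. This is an illustrative reading, not the library's source: it assumes DFU is the largest rise observed between neighbouring bins when walking away from the mode, and that nDFU divides that rise by the height of the mode. The name ndfu_sketch is hypothetical.

```python
def ndfu_sketch(hist):
    """Illustrative normalized distance from unimodality for a histogram.

    Assumption: DFU is the largest increase between neighbouring bins when
    moving away from the mode, and nDFU divides it by the mode's height.
    """
    hist = list(hist)
    peak = max(hist)
    if peak <= 0:
        raise ValueError("histogram must contain at least one positive bin")
    mode = hist.index(peak)  # first bin that attains the maximum

    worst_rise = 0.0
    # Walk right from the mode: a unimodal histogram never rises again.
    for i in range(mode, len(hist) - 1):
        worst_rise = max(worst_rise, hist[i + 1] - hist[i])
    # Walk left from the mode with the same check.
    for i in range(mode, 0, -1):
        worst_rise = max(worst_rise, hist[i - 1] - hist[i])

    # Dividing by the peak height keeps the score between 0 and 1.
    return worst_rise / peak


print(ndfu_sketch([0.2, 0.6, 0.2]))  # single central peak: prints 0.0
print(ndfu_sketch([0.5, 0.0, 0.5]))  # two poles with an empty valley: prints 1.0
```

Under this reading, ordinary spread around one peak never produces a rise away from the mode, so it scores zero however wide the spread is. Only a second bump adds to the score.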

Figure: Two annotation histograms, one unimodal distribution with nDFU 0.0 and one pole-like distribution with nDFU 1.0.

nDFU is useful when annotation distributions should not be collapsed immediately into a single majority label. In the accompanying paper, nDFU is used as a signal for identifying pole-like, non-unimodal annotation patterns and for training a K+1-class classifier, where selected instances are assigned to an additional class instead of being forced into one of the original K classes.

An empirical analysis on three toxic-language datasets shows how this signal can be used to model polarized annotations and study conditions that may explain them, such as annotator gender or race.

nDFU does not by itself explain why the poles exist. It detects the shape of structured disagreement; subgroup analysis, qualitative inspection, or downstream modeling can then be used to study what the poles correspond to.
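
One simple form of subgroup analysis is to rebuild the histogram separately for each annotator group and compare the results. The sketch below assumes annotations arrive as (group, rating) pairs. The helper histograms_by_group, the group labels, and the toy ratings are all made up for illustration and are not part of the library or the paper's data.

```python
from collections import Counter, defaultdict


def histograms_by_group(annotations, scale):
    """Build one relative-frequency histogram per annotator subgroup.

    `annotations` is an iterable of (group, rating) pairs; `scale` lists
    every point of the ordinal scale in order.
    """
    scale = list(scale)  # allow any iterable, including one-shot generators
    ratings_by_group = defaultdict(list)
    for group, rating in annotations:
        ratings_by_group[group].append(rating)

    result = {}
    for group, ratings in ratings_by_group.items():
        counts = Counter(ratings)
        # Counter returns 0 for scale points a group never used.
        result[group] = [counts[point] / len(ratings) for point in scale]
    return result


# Made-up toy data: "A" and "B" are placeholder group labels.
toy = [("A", 1), ("A", 1), ("A", 2), ("B", 5), ("B", 5), ("B", 4)]
per_group = histograms_by_group(toy, range(1, 6))
for group, hist in sorted(per_group.items()):
    print(group, hist)
```

If each group's histogram is unimodal on its own while the pooled histogram is not, the poles plausibly line up with the grouping variable. That is a starting point for inspection, not proof of a cause.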

Figure: A workflow showing ordinal annotations converted to a histogram, scored with nDFU, and then used for inspection, subgroup analysis, or K+1 modeling.

The article is available in the ACL proceedings. For licensing reasons, the datasets are not distributed with the original application notebooks and must be uploaded separately.

Notebooks

Installation

pip install ndfu

To install the latest version directly from GitHub:

pip install git+https://github.com/ipavlopoulos/ndfu.git

For local development:

git clone https://github.com/ipavlopoulos/ndfu.git
cd ndfu
pip install -e .

Usage

Import the library and pass the relative frequencies of the ordinal ratings to dfu. Below, pdf turns six raw ratings on a 1-5 scale into a relative-frequency vector. The annotators are split between low and high ratings, so nDFU reports structured disagreement:

>>> from ndfu import dfu, pdf
>>> rating = (1, 1, 2, 5, 5, 5)
>>> x = pdf(rating, range(1, 6))
>>> dfu(x)
0.3333333333333333
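
The conversion step can also be sketched without the library. The helper relative_frequencies below is a hypothetical stand-in that shows the idea of counting ratings per scale point. It is not the implementation of pdf, whose exact behaviour may differ.

```python
from collections import Counter


def relative_frequencies(ratings, scale):
    """Return one relative frequency per point of the ordinal scale."""
    ratings = list(ratings)
    if not ratings:
        raise ValueError("at least one rating is required")
    counts = Counter(ratings)
    # Iterate over the scale, not the observed values, so that unused
    # rating points still contribute an explicit zero bin.
    return [counts[point] / len(ratings) for point in scale]


hist = relative_frequencies((1, 1, 2, 5, 5, 5), range(1, 6))
print(hist)  # five bins: 2/6, 1/6, 0, 0, 3/6
```

Keeping the empty bins matters: the zeros at ratings 3 and 4 are the valley that separates the two poles. In this sketch, ratings that fall outside the scale are ignored in the bins but still counted in the denominator, so validate the input first if that can happen.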

You can also pass a histogram directly:

>>> dfu([0.2, 0.6, 0.2])
0.0
>>> dfu([0.5, 0.0, 0.5])
1.0

The first histogram has one central peak, so it is unimodal. The second has two separated peaks with a valley between them, so it represents pole-like disagreement.

For modeling, import UnimodalLearner to build binary, unimodal-only, and K+1 classifiers from annotation distributions:

>>> from ndfu import UnimodalLearner
>>> learner = UnimodalLearner(
...     train,
...     dev,
...     test,
...     feature_cols=["x1", "x2"],
...     scores_col="scores",
...     scale=range(1, 6),
...     threshold=0.0,
... )
>>> learner.fit_binary_baseline()
>>> learner.fit_unimodal_only_baseline()
>>> learner.fit_kplus_model()
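
To make the K+1 idea concrete, here is a minimal labeling sketch. It assumes that an instance whose nDFU score exceeds the threshold goes to the extra class, and that every other instance keeps the index of its most frequent rating. The function kplus_labels is hypothetical and does not describe how UnimodalLearner works internally.

```python
def kplus_labels(histograms, ndfu_scores, threshold=0.0):
    """Map each histogram to a class index in 0..K, where K is the extra class.

    Assumption: a score above `threshold` marks the instance as non-unimodal.
    """
    labels = []
    for hist, score in zip(histograms, ndfu_scores):
        extra_class = len(hist)  # index K, one past the original classes 0..K-1
        if score > threshold:
            labels.append(extra_class)
        else:
            # Unimodal instance: keep the index of the most frequent rating.
            labels.append(max(range(len(hist)), key=hist.__getitem__))
    return labels


histograms = [[0.2, 0.6, 0.2], [0.5, 0.0, 0.5]]
scores = [0.0, 1.0]  # nDFU values quoted in the Usage section for these histograms
labels = kplus_labels(histograms, scores)
print(labels)  # prints [1, 3]
```

With a threshold of 0.0, any departure from unimodality sends an instance to the extra class. A higher threshold would reserve that class for more pronounced poles.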

Older examples that use "from ndfu.src import *" or "from src import *" still work; both import paths are kept for backward compatibility.

Development

Run the test suite with:

pytest

Citation

Please cite this work as:

@inproceedings{pavlopoulos-likas-ndfu,
  title={Polarized Opinion Detection Improves the Detection of Toxic Language},
  author={Pavlopoulos, John and Likas, Aristidis},
  booktitle={Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics},
  year={2024}
}

Consider also citing the original DFU article:

@article{pavlopoulos-likas-2022,
  title={Distance from Unimodality for the Assessment of Opinion Polarization},
  author={Pavlopoulos, John and Likas, Aristidis},
  journal={Cognitive Computation},
  doi={10.1007/s12559-022-10088-2},
  year={2022}
}
