Normalized distance from unimodality
The Normalized Distance from Unimodality (nDFU)
nDFU detects structured annotator disagreement in ordinal ratings. It is designed for cases where annotators do not merely vary around one shared judgment, but instead form separated poles in the rating distribution.
Distance from unimodality (DFU) measures how far a distribution is from unimodality, that is, from having a single peak. This repository provides a normalized version of that measure (nDFU), along with helper functions for turning ordinal annotations into relative-frequency vectors.
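To illustrate what a relative-frequency vector looks like, here is a minimal sketch in plain Python. The name `relative_frequencies` is hypothetical and is not the package's `pdf` helper; the sketch only assumes that every point of the ordinal scale gets one bin, including points nobody chose.

```python
from collections import Counter


def relative_frequencies(ratings, scale):
    """Return the share of annotations at each point of an ordinal scale.

    Hypothetical helper for illustration; not the package's own function.
    """
    ratings = list(ratings)
    scale = list(scale)
    if not ratings:
        raise ValueError("at least one rating is required")
    unknown = set(ratings) - set(scale)
    if unknown:
        raise ValueError(f"ratings outside the scale: {sorted(unknown)}")
    counts = Counter(ratings)
    # Keep every scale point, including unused ones, so that empty bins
    # (the valley between two poles) stay visible in the vector.
    return [counts[point] / len(ratings) for point in scale]


freqs = relative_frequencies((1, 1, 2, 5, 5, 5), range(1, 6))
print(freqs)
```

Keeping the empty bins matters: dropping them would place the low and high ratings next to each other and hide the gap between the poles.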
A low nDFU score means the annotations are compatible with one peak: consensus, near-consensus, or ordinary spread around a central tendency. A high nDFU score means the distribution departs from that single-peak shape, for example when many annotators choose low ratings and many others choose high ratings, with few annotations in between. In that sense, nDFU identifies disagreement that is structured as poles rather than unstructured noise.
nDFU is useful when annotation distributions should not be collapsed immediately into a single majority label. In the accompanying paper, nDFU is used as a signal for identifying pole-like, non-unimodal annotation patterns and for training a K+1-class classifier, where selected instances are assigned to an additional class instead of being forced into one of the original K classes.
An empirical analysis on three toxic-language datasets shows how this signal can be used to model polarized annotations and study conditions that may explain them, such as annotator gender or race.
nDFU does not by itself explain why the poles exist. It detects the shape of structured disagreement; subgroup analysis, qualitative inspection, or downstream modeling can then be used to study what the poles correspond to.
The article is available in the ACL proceedings. For licensing reasons, the datasets are not distributed with the original application notebooks; you need to obtain them separately and upload them yourself when running those notebooks.
Notebooks
- nDFU example
- nDFU mechanism
- UnimodalLearner tutorial
- K+1 learning with synthetic data
- Original experiment reproduction
- POPQUORN Potato-Prolific application
Installation
pip install ndfu
To install the latest version directly from GitHub:
pip install git+https://github.com/ipavlopoulos/ndfu.git
For local development:
git clone https://github.com/ipavlopoulos/ndfu.git
cd ndfu
pip install -e .
Usage
Import the library and use the relative frequencies of ordinal ratings as input. Here, annotators are split between low and high ratings, so nDFU reports structured disagreement:
>>> from ndfu import dfu, pdf
>>> rating = (1, 1, 2, 5, 5, 5)
>>> x = pdf(rating, range(1, 6))
>>> dfu(x)
0.3333333333333333
You can also pass a histogram directly:
>>> dfu([0.2, 0.6, 0.2])
0.0
>>> dfu([0.5, 0.0, 0.5])
1.0
The first histogram has one central peak, so it is unimodal. The second has two separated peaks with a valley between them, so it represents pole-like disagreement.
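The scores above can be reproduced with a short sketch. It assumes that nDFU is the largest increase observed between neighbouring bins while moving away from the highest bin, divided by the height of that bin. This is an illustration that agrees with the three outputs shown in this section, not the package's implementation, and `ndfu_sketch` is a hypothetical name.

```python
def ndfu_sketch(hist):
    """Largest rise seen while moving away from the peak, divided by the peak height."""
    hist = list(hist)
    peak = max(hist)
    if peak == 0:
        return 0.0  # all-zero histogram: nothing to measure
    mode = hist.index(peak)  # first position of the highest bin
    worst_rise = 0.0
    # Walk right from the mode: a unimodal histogram never rises here.
    for i in range(mode, len(hist) - 1):
        worst_rise = max(worst_rise, hist[i + 1] - hist[i])
    # Walk left from the mode and apply the same check.
    for i in range(mode, 0, -1):
        worst_rise = max(worst_rise, hist[i - 1] - hist[i])
    return worst_rise / peak


print(ndfu_sketch([0.2, 0.6, 0.2]))  # 0.0
print(ndfu_sketch([0.5, 0.0, 0.5]))  # 1.0
```

Dividing by the peak height keeps the score between 0 and 1, so histograms built from different numbers of annotations remain comparable.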
For modeling, import UnimodalLearner to build binary, unimodal-only, and K+1 classifiers from annotation distributions:
>>> from ndfu import UnimodalLearner
>>> learner = UnimodalLearner(
...     train,
...     dev,
...     test,
...     feature_cols=["x1", "x2"],
...     scores_col="scores",
...     scale=range(1, 6),
...     threshold=0.0,
... )
>>> learner.fit_binary_baseline()
>>> learner.fit_unimodal_only_baseline()
>>> learner.fit_kplus_model()
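The idea behind the K+1 targets can be sketched independently of the library. The function below is hypothetical and rests on three assumptions that the text above does not confirm: an instance goes to the extra class when its nDFU score is strictly above the threshold, the original label is otherwise the most frequent rating bin, and the extra class takes index K.

```python
def kplus_label(hist, ndfu_score, threshold=0.0):
    """Return a class index in 0..K, where K marks the extra class.

    Hypothetical helper; the selection rule is an assumption.
    """
    if ndfu_score > threshold:
        return len(hist)  # extra class for pole-like distributions
    # Otherwise fall back to the most frequent rating bin.
    return max(range(len(hist)), key=hist.__getitem__)


# Each pair holds a relative-frequency vector and its precomputed nDFU score.
examples = [
    ([0.2, 0.6, 0.2], 0.0),
    ([0.5, 0.0, 0.5], 1.0),
]
labels = [kplus_label(hist, score) for hist, score in examples]
print(labels)  # [1, 3]
```

With a threshold of 0.0, any departure from a single peak sends the instance to the extra class; a higher threshold would reserve that class for stronger poles.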
Older examples that use from ndfu.src import * or from src import * are still supported for backward compatibility.
Development
Run the test suite with:
pytest
Citation
Please cite this work as:
@inproceedings{pavlopoulos-likas-ndfu,
title={Polarized Opinion Detection Improves the Detection of Toxic Language},
author={Pavlopoulos, John and Likas, Aristidis},
booktitle={Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics},
year={2024}
}
Consider also citing the original article:
@article{pavlopoulos-likas-2022,
title={Distance from Unimodality for the Assessment of Opinion Polarization},
author={Pavlopoulos, John and Likas, Aristidis},
journal={Cognitive Computation},
doi={10.1007/s12559-022-10088-2},
year={2022}
}