Normalized distance from unimodality

Operating System
- OS Independent
Programming Language
- Python :: 3
Topic
- Scientific/Engineering :: Information Analysis

Project description

The Normalized Distance from Unimodality (nDFU)

nDFU detects structured annotator disagreement in ordinal ratings. It is designed for cases where annotators do not merely vary around one shared judgment, but instead form separated poles in the rating distribution.

Distance from unimodality (DFU) measures how far a distribution is from unimodality, that is, from having a single peak. This repository provides a normalized version of that measure (nDFU), along with helper functions for turning ordinal annotations into relative-frequency vectors.

A low nDFU score means the annotations are compatible with one peak: consensus, near-consensus, or ordinary spread around a central tendency. A high nDFU score means the distribution departs from that single-peak shape, for example when many annotators choose low ratings and many others choose high ratings, with few annotations in between. In that sense, nDFU identifies disagreement that is structured as poles rather than unstructured noise.
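To make the single-peak idea concrete, here is a minimal sketch of how such a score could be computed. It assumes the score is the largest increase met while moving away from the highest bin, divided by the height of that bin. The name `ndfu_sketch` is hypothetical and used only for illustration; this is not the package's implementation, and picking the first highest bin on ties is also an assumption.

```python
def ndfu_sketch(hist):
    """Illustrative normalized distance from unimodality of a histogram."""
    hist = list(hist)
    peak = max(hist)
    if peak <= 0:
        raise ValueError("histogram must contain at least one positive bin")
    mode = hist.index(peak)  # assumption: use the first bin that holds the maximum

    worst_rise = 0.0
    # Moving right from the mode, a unimodal histogram never increases.
    for i in range(mode, len(hist) - 1):
        worst_rise = max(worst_rise, hist[i + 1] - hist[i])
    # Moving left from the mode, the same must hold.
    for i in range(mode, 0, -1):
        worst_rise = max(worst_rise, hist[i - 1] - hist[i])

    # Dividing by the peak height keeps the score between 0 and 1.
    return worst_rise / peak


print(ndfu_sketch([0.2, 0.6, 0.2]))  # 0.0: one central peak
print(ndfu_sketch([0.5, 0.0, 0.5]))  # 1.0: two poles with an empty valley
```

Under this reading, a histogram that only decreases away from its peak scores 0, while two equal poles separated by an empty bin score 1.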

Figure: Two annotation histograms, one unimodal with nDFU 0.0 and one pole-like with nDFU 1.0.

nDFU is useful when annotation distributions should not be collapsed immediately into a single majority label. In the accompanying paper, nDFU is used as a signal for identifying pole-like, non-unimodal annotation patterns and for training a K+1-class classifier, where selected instances are assigned to an additional class instead of being forced into one of the original K classes.

An empirical analysis on three toxic-language datasets shows how this signal can be used to model polarized annotations and study conditions that may explain them, such as annotator gender or race.

nDFU does not by itself explain why the poles exist. It detects the shape of structured disagreement; subgroup analysis, qualitative inspection, or downstream modeling can then be used to study what the poles correspond to.
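One simple form of subgroup analysis is to check whether each annotator group settles on a different rating. The sketch below uses made-up toy data and a hypothetical helper, `modal_ratings_by_group`; the group names and ratings are illustrative only and do not come from the paper's datasets.

```python
from collections import defaultdict
from statistics import multimode


def modal_ratings_by_group(annotations):
    """Map each group to its most frequent rating(s)."""
    by_group = defaultdict(list)
    for group, rating in annotations:
        by_group[group].append(rating)
    # multimode returns every rating tied for most frequent.
    return {group: sorted(multimode(ratings)) for group, ratings in by_group.items()}


# Made-up toy data: (hypothetical annotator group, rating on a 1-5 scale).
annotations = [("A", 1), ("A", 1), ("A", 2), ("B", 5), ("B", 5), ("B", 4)]
print(modal_ratings_by_group(annotations))  # {'A': [1], 'B': [5]}
```

If the groups peak at opposite ends of the scale, as in this toy case, the poles in the pooled histogram line up with group membership. That is a starting point for inspection, not an explanation in itself.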

Figure: A workflow in which ordinal annotations are converted to a histogram, scored with nDFU, and then used for inspection, subgroup analysis, or K+1 modeling.

The article is available in the ACL proceedings. For licensing reasons, the datasets are not distributed with the original application notebooks; you need to obtain them separately and upload them yourself when running a notebook.

Notebooks

Installation

pip install ndfu

To install the latest version directly from GitHub:

pip install git+https://github.com/ipavlopoulos/ndfu.git

For local development:

git clone https://github.com/ipavlopoulos/ndfu.git
cd ndfu
pip install -e .

Usage

Import the library and use the relative frequencies of ordinal ratings as input. Here, annotators are split between low and high ratings, so nDFU reports structured disagreement:

>>> from ndfu import dfu, pdf
>>> rating = (1, 1, 2, 5, 5, 5)
>>> x = pdf(rating, range(1, 6))
>>> dfu(x)
0.3333333333333333
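The conversion step can be pictured as counting how often each point of the scale was chosen and dividing by the number of annotations. The sketch below is an assumption about that behaviour, not the source of `pdf`; `relative_frequencies` is a hypothetical helper, and the package's own function may differ in return type or input validation.

```python
from collections import Counter


def relative_frequencies(ratings, scale):
    """Share of annotations at each point of an ordinal scale (illustrative)."""
    scale = list(scale)
    ratings = list(ratings)
    if not ratings:
        raise ValueError("at least one rating is required")
    unknown = set(ratings) - set(scale)
    if unknown:
        raise ValueError(f"ratings outside the scale: {sorted(unknown)}")
    counts = Counter(ratings)
    # Scale points that nobody chose get a frequency of 0.
    return [counts[value] / len(ratings) for value in scale]


x = relative_frequencies((1, 1, 2, 5, 5, 5), range(1, 6))
print([round(p, 3) for p in x])  # [0.333, 0.167, 0.0, 0.0, 0.5]
```

Passing the full scale matters: the empty bins at 3 and 4 are what separate the low pole from the high pole in this example.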

You can also pass a histogram directly:

>>> dfu([0.2, 0.6, 0.2])
0.0
>>> dfu([0.5, 0.0, 0.5])
1.0

The first histogram has one central peak, so it is unimodal. The second has two separated peaks with a valley between them, so it represents pole-like disagreement.

For modeling, import UnimodalLearner to build binary, unimodal-only, and K+1 classifiers from annotation distributions:

>>> from ndfu import UnimodalLearner
>>> learner = UnimodalLearner(
...     train,
...     dev,
...     test,
...     feature_cols=["x1", "x2"],
...     scores_col="scores",
...     scale=range(1, 6),
...     threshold=0.0,
... )
>>> learner.fit_binary_baseline()
>>> learner.fit_unimodal_only_baseline()
>>> learner.fit_kplus_model()
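The K+1 idea can be illustrated at the level of a single instance. The sketch below assumes that an instance whose nDFU score exceeds the threshold receives the extra class, and that every other instance keeps its most frequent rating. The function `kplus_label`, the label name "polarized", and the strict comparison are assumptions made for illustration; they are not taken from `UnimodalLearner`.

```python
def kplus_label(hist, scale, score, threshold=0.0, extra_label="polarized"):
    """Return the extra class for pole-like instances, else the majority rating."""
    if score > threshold:  # assumption: strictly above the threshold
        return extra_label
    # Otherwise fall back to the most frequent rating on the scale.
    best = max(range(len(hist)), key=lambda i: hist[i])
    return list(scale)[best]


# Scores are passed in precomputed; 0.0 and 1.0 match the histograms shown earlier.
print(kplus_label([0.2, 0.6, 0.2], (1, 2, 3), score=0.0))  # 2
print(kplus_label([0.5, 0.0, 0.5], (1, 2, 3), score=1.0))  # polarized
```

With a threshold of 0.0, any departure from a single peak moves the instance to the extra class; a higher threshold would reserve that class for more pronounced poles.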

Older examples that use "from ndfu.src import *" or "from src import *" are still supported for compatibility.

Development

Run the test suite with:

pytest

Contributing

Please cite this work as:

@inproceedings{pavlopoulos-likas-ndfu,
  title={Polarized Opinion Detection Improves the Detection of Toxic Language},
  author={Pavlopoulos, John and Likas, Aristidis},
  booktitle={Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics},
  year={2024}
}

Consider also citing the original article:

@article{pavlopoulos-likas-2022,
  title={Distance from Unimodality for the Assessment of Opinion Polarization},
  author={Pavlopoulos, John and Likas, Aristidis},
  journal={Cognitive Computation},
  doi={10.1007/s12559-022-10088-2},
  year={2022}
}

Release history

- 0.9.3 (May 17, 2026)
- 0.9.2 (May 17, 2026), this version
