
An implementation of popular agreement coefficients used for categorical measurements.

Agreement

Inter-rater agreement

The Agreement library provides an implementation of popular metrics used to measure inter-rater agreement. Inter-rater agreement (also known as inter-rater reliability) describes the degree of agreement among raters. It is a score of how much homogeneity or consensus exists in the ratings given by various judges.

If you want to learn more about this topic, you can start by reading this Wikipedia page.

Implemented metrics

This library provides a pure NumPy implementation of extended formulas for the following metrics:

  • Observed agreement
  • Bennett et al.'s S score
  • Cohen's kappa
  • Gwet's gamma
  • Krippendorff's alpha
  • Scott's pi

The extended formulas can be used to measure agreement for:

  • multiple raters - support for two or more raters,
  • multiple categories - support for binary problems, as well as problems with more categories,
  • missing ratings - not all raters have to provide answers for all the questions,
  • weighted agreement - used to model distance between categories (e.g. dist(5, 4) < dist(5, 1)).
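
Apart from observed agreement, the coefficients listed above are usually written in the same chance-corrected form, (p_a - p_e) / (1 - p_e), and differ mainly in how the chance agreement p_e is estimated. The sketch below illustrates that form with the unweighted Bennett et al.'s S score, where p_e = 1/q for q categories. The function names are hypothetical and are not part of this library's API.

```python
def chance_corrected(p_a: float, p_e: float) -> float:
    """Generic chance-corrected agreement: (p_a - p_e) / (1 - p_e)."""
    if p_e >= 1.0:
        raise ValueError("chance agreement must be below 1")
    return (p_a - p_e) / (1.0 - p_e)


def bennett_s(p_a: float, n_categories: int) -> float:
    """Unweighted Bennett et al.'s S: chance agreement is 1 / q."""
    return chance_corrected(p_a, 1.0 / n_categories)


# Hypothetical numbers: 80% observed agreement on a 4-category scale.
s_score = bennett_s(0.8, 4)
```

A score of 1 means perfect agreement, 0 means agreement at the level expected by chance, and negative values mean agreement below chance.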

More information about implemented metrics can be found here: TODO

Implemented weights kernels

Agreement provides implementations for eight weight kernels:

  • identity kernel
  • linear kernel
  • quadratic kernel
  • ordinal_kernel
  • radical_kernel
  • radio_kernel
  • circular_kernel
  • bipolar_kernel

More information about implemented weights kernels can be found here: TODO
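
To give an intuition for what a weights kernel expresses, here is a minimal sketch of the commonly used identity, linear and quadratic weight matrices for q ordered categories. These helper functions are hypothetical illustrations written in plain NumPy; they are not the library's kernels, whose signatures may differ.

```python
import numpy as np


def identity_weights(q: int) -> np.ndarray:
    """Full credit for identical categories, none otherwise."""
    return np.eye(q)


def linear_weights(q: int) -> np.ndarray:
    """w[k, l] = 1 - |k - l| / (q - 1): credit falls off linearly with distance."""
    idx = np.arange(q)
    return 1.0 - np.abs(idx[:, None] - idx[None, :]) / (q - 1)


def quadratic_weights(q: int) -> np.ndarray:
    """w[k, l] = 1 - ((k - l) / (q - 1))**2: small disagreements cost less."""
    idx = np.arange(q)
    return 1.0 - ((idx[:, None] - idx[None, :]) / (q - 1)) ** 2


# Five ordered categories, 1..5 stored at indices 0..4.
w = linear_weights(5)
# Ratings 5 and 4 are treated as closer than ratings 5 and 1.
assert w[4, 3] > w[4, 0]
```

A weight of 1 means the two categories count as full agreement and a weight of 0 means full disagreement, so dist(5, 4) < dist(5, 1) corresponds to w[4, 3] > w[4, 0].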

Installation

Agreement can be installed via pip from PyPI.

pip install agreement

Example usage

1. Prepare dataset

Let's assume you have a dataset in the form of a matrix with three columns: question id, rater id and answer.

import numpy as np

dataset = np.array([
    [1, 1, 'a'],
    [1, 2, 'a'],
    [1, 3, 'c'],
    [2, 1, 'a'],
    [2, 2, 'b'],
    [2, 3, 'a'],
    [3, 1, 'c'],
    [3, 2, 'b'],
])

2. Transform dataset into matrices

In the next step we want to transform the dataset into matrices in the form accepted by the metric functions.

Most of the metrics require a "questions answers" matrix, which contains the frequency of each answer for each question. More formally, M is an |I| x |A| matrix, where I is the list of all items (questions) and A is the list of all possible answers. The element M_ij is the number of times answer j was chosen for question i.

A second matrix may also be required (currently only by Cohen's kappa): the "users answers" matrix, which contains the frequency of each answer selected by each user. More formally, M is a |U| x |A| matrix, where U is the list of all users (raters) and A is the list of all possible answers. The element M_ij is the number of times user i chose answer j.

The library provides helper functions that can be used to prepare them.

from agreement.utils.transform import pivot_table_frequency

questions_answers_table = pivot_table_frequency(dataset[:, 0], dataset[:, 2])
users_answers_table = pivot_table_frequency(dataset[:, 1], dataset[:, 2])
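
To make the two matrices concrete, the sketch below builds the same kind of frequency tables by hand for the example dataset, using only NumPy. The `frequency_table` function is a hypothetical stand-in written for illustration; it makes no claim about the return type or row ordering of `pivot_table_frequency`.

```python
import numpy as np


def frequency_table(rows, cols):
    """Count how often each (row label, column label) pair occurs."""
    row_labels, row_idx = np.unique(rows, return_inverse=True)
    col_labels, col_idx = np.unique(cols, return_inverse=True)
    table = np.zeros((len(row_labels), len(col_labels)), dtype=int)
    # np.add.at accumulates repeated index pairs instead of overwriting them.
    np.add.at(table, (row_idx, col_idx), 1)
    return row_labels, col_labels, table


# Columns: question id, rater id, answer (mixed types are stored as strings).
dataset = np.array([
    [1, 1, 'a'], [1, 2, 'a'], [1, 3, 'c'],
    [2, 1, 'a'], [2, 2, 'b'], [2, 3, 'a'],
    [3, 1, 'c'], [3, 2, 'b'],
])

# Rows are questions 1..3, columns are answers a, b, c.
_, answers, questions_answers = frequency_table(dataset[:, 0], dataset[:, 2])
# Rows are raters 1..3, columns are answers a, b, c.
_, _, users_answers = frequency_table(dataset[:, 1], dataset[:, 2])
```

Here `questions_answers` is `[[2, 0, 1], [2, 1, 0], [0, 1, 1]]` and `users_answers` is `[[2, 0, 1], [1, 2, 0], [1, 0, 1]]`. The last row of the first table sums to 2 rather than 3 because rater 3 did not answer question 3.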

3. Select kernel

Weights are used to model situations where the categories are (at least) ordinal. With weights, the agreement between raters is no longer binary: it depends on the weight between the chosen categories.

There is no formal rule for deciding which set of weights should be used in a particular study, so the choice depends on your problem and the data you are working with.

By default, the metrics use the identity_kernel, which does not apply any weighting between the answers. If you want to use an alternative kernel, you can import it as follows:

from agreement.utils.kernels import linear_kernel

4. Compute the metric

The last step is to choose the metric you want to compute and run the following code:

from agreement.metrics import cohens_kappa, krippendorffs_alpha

kappa = cohens_kappa(questions_answers_table, users_answers_table)
weighted_kappa = cohens_kappa(questions_answers_table, users_answers_table, weights_kernel=linear_kernel)

alpha = krippendorffs_alpha(questions_answers_table)
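
For intuition about what such metrics start from, here is a plain NumPy sketch of weighted observed agreement for multiple raters with missing ratings, following the percent-agreement formula commonly given for this setting: for every item rated by at least two raters, sum r_ik (r*_ik - 1) over categories, divide by r_i (r_i - 1), and average over those items, where r*_ik is the weighted count for category k. The function name and the hard-coded table are assumptions made for illustration; this is not the library's code, and the library's results may be computed differently.

```python
import numpy as np


def observed_agreement(table, weights=None):
    """Weighted observed agreement from an items x categories count table."""
    r = np.asarray(table, dtype=float)
    q = r.shape[1]
    w = np.eye(q) if weights is None else np.asarray(weights, dtype=float)

    raters_per_item = r.sum(axis=1)
    keep = raters_per_item >= 2          # items with < 2 ratings carry no information
    if not keep.any():
        raise ValueError("no item was rated by at least two raters")
    r, r_i = r[keep], raters_per_item[keep]

    r_star = r @ w.T                     # r*_ik = sum_l w[k, l] * r_il
    per_item = (r * (r_star - 1.0)).sum(axis=1) / (r_i * (r_i - 1.0))
    return per_item.mean()


# Questions x answers (a, b, c) counts for the example dataset.
table = np.array([[2, 0, 1],
                  [2, 1, 0],
                  [0, 1, 1]])

# Linear weights for three ordered categories.
linear = np.array([[1.0, 0.5, 0.0],
                   [0.5, 1.0, 0.5],
                   [0.0, 0.5, 1.0]])

p_a = observed_agreement(table)                   # 2/9, about 0.222
p_a_weighted = observed_agreement(table, linear)  # 0.5
```

With linear weights, near misses such as a versus b earn partial credit, which is why the weighted value is higher than the unweighted one on the same data.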

For more detailed example see: TODO

Reference

All equations are based on the Handbook of Inter-Rater Reliability, Kilem L. Gwet, 2014. The book provides an extensive explanation of all topics related to inter-rater agreement, a detailed description of all metrics implemented in this library, and the example datasets that were used to test this implementation.

I also recommend taking a look at mReliability, a MATLAB implementation of the same metrics, which provides a more detailed explanation of the metrics' formulas than the one you will find here.

