An implementation of popular agreement coefficients used for categorical measurements.
Agreement
Inter-rater agreement
The Agreement library provides implementations of popular metrics used to measure inter-rater agreement. Inter-rater agreement (also known as inter-rater reliability) describes the degree of agreement among raters: a score of how much homogeneity or consensus exists in the ratings given by various judges.
If you want to learn more about this topic, you can start by reading this Wikipedia page.
Implemented metrics
This library provides pure numpy implementations of extended formulas for the following metrics:
- Observed agreement
- Bennett et al.'s S score
- Cohen's kappa
- Gwet's gamma
- Krippendorff's alpha
- Scott's pi
The extended formulas can be used to measure agreement with:
- multiple raters - support for two or more raters,
- multiple categories - support for binary problems, as well as more categories,
- missing ratings - not all raters provided answers for all the questions,
- weighted agreement - used to model distance between categories (e.g. `dist(5, 4) < dist(5, 1)`).
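For example, under a linear weighting scheme, credit for a near-miss grows as the distance between the chosen categories shrinks. A minimal sketch of that idea for five ordered categories (the function below is illustrative only, not the library's kernel API):

```python
# Hypothetical sketch of distance-based agreement weights for n ordered
# categories. NOT the library's API: just the standard linear-weighting
# idea, where the weight is 1 for exact agreement and decays linearly
# with the distance between the two chosen categories.
def linear_weight(i, j, n_categories=5):
    """Return an agreement weight in [0, 1]; 1 means full agreement."""
    return 1 - abs(i - j) / (n_categories - 1)

# Closer categories receive a higher agreement weight:
print(linear_weight(5, 4))  # 0.75
print(linear_weight(5, 1))  # 0.0
```

With an identity (unweighted) scheme both pairs above would simply count as disagreements; weighting lets the 5-vs-4 pair retain partial credit.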
More information about implemented metrics can be found here: TODO
Implemented weights kernels
Agreement provides implementations for eight weight kernels:
- identity kernel
- linear kernel
- quadratic kernel
- ordinal kernel
- radical kernel
- ratio kernel
- circular kernel
- bipolar kernel
More information about implemented weights kernels can be found here: TODO
Installation
Agreement can be installed via pip from PyPI.
pip install agreement
Example usage
1. Prepare dataset
Let's assume you have a dataset in the format of a matrix with three columns: `question id`, `rater id` and `answer`.
import numpy as np
dataset = np.array([
[1, 1, 'a'],
[1, 2, 'a'],
[1, 3, 'c'],
[2, 1, 'a'],
[2, 2, 'b'],
[2, 3, 'a'],
[3, 1, 'c'],
[3, 2, 'b'],
])
2. Transform dataset into matrices
In the next step, we want to transform the dataset into matrices in the form accepted by the metric functions.
Most of the metrics require a "questions answers" matrix, which contains the frequency of answers for each question.
More formally, we could say `M = I x A`, where `I` is the list of all items and `A` is the list of all possible answers. The matrix element `M_ij` represents how many times answer `j` was chosen for question `i`.
The second matrix that can be required (currently it is only required by the Cohen's kappa metric) is the "users answers" matrix, which contains the frequency of answers selected by each user.
More formally, we could say `M = U x A`, where `U` is the list of all users and `A` is the list of all possible answers. The matrix element `M_ij` represents how many times answer `j` was chosen by user `i`.
The library provides helper functions that can be used to prepare these matrices.
from agreement.utils.transform import pivot_table_frequency
questions_answers_table = pivot_table_frequency(dataset[:, 0], dataset[:, 2])
users_answers_table = pivot_table_frequency(dataset[:, 1], dataset[:, 2])
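To make the shape of these tables concrete, here is a self-contained sketch of what such a frequency pivot computes, applied to the example dataset. This reimplementation, and its sorted column order, are assumptions for illustration, not the library's actual code:

```python
import numpy as np

# Minimal sketch of a frequency pivot: rows are indexed by the grouping
# key (question id or rater id), columns by the sorted distinct answers,
# and each cell counts how often that answer was given for that key.
# The sorted column order is an assumption; the real helper may index
# categories differently.
def pivot_frequency(keys, answers):
    _, row_idx = np.unique(keys, return_inverse=True)
    _, col_idx = np.unique(answers, return_inverse=True)
    table = np.zeros((row_idx.max() + 1, col_idx.max() + 1), dtype=int)
    np.add.at(table, (row_idx, col_idx), 1)  # count (key, answer) pairs
    return table

dataset = np.array([
    [1, 1, 'a'], [1, 2, 'a'], [1, 3, 'c'],
    [2, 1, 'a'], [2, 2, 'b'], [2, 3, 'a'],
    [3, 1, 'c'], [3, 2, 'b'],
])

questions_answers = pivot_frequency(dataset[:, 0], dataset[:, 2])
# With answer columns ordered a, b, c:
# question 1 -> [2, 0, 1], question 2 -> [2, 1, 0], question 3 -> [0, 1, 1]
users_answers = pivot_frequency(dataset[:, 1], dataset[:, 2])
# rater 1 -> [2, 0, 1], rater 2 -> [1, 2, 0], rater 3 -> [1, 0, 1]
```

Note that question 3 has only two ratings; the metrics' extended formulas handle such missing ratings directly.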
3. Select kernel
Weights are used to model situations where categories represent (at least) ordinal data. With this approach, agreement between raters is not binary; it varies depending on the weights between the chosen categories.
There is no formal rule for deciding which set of weights should be used in a particular study, so it all depends on your problem and the data you are working with.
By default, the metrics use the `identity_kernel`, which does not apply any weighting between the answers.
If you want to use an alternative kernel, you can import it from:
from agreement.utils.kernels import linear_kernel
4. Compute the metric
The last step is to choose the metric you want to compute and run the following code:
from agreement.metrics import cohens_kappa, krippendorffs_alpha
kappa = cohens_kappa(questions_answers_table, users_answers_table)
weighted_kappa = cohens_kappa(questions_answers_table, users_answers_table, weights_kernel=linear_kernel)
alpha = krippendorffs_alpha(questions_answers_table)
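As a sanity check, the simplest of the listed metrics, observed agreement, can be computed by hand from the questions-answers table. The sketch below follows the usual pairwise definition for multiple raters with missing ratings (Gwet, 2014); it is illustrative, and the library's own metric functions should be preferred in practice:

```python
import numpy as np

# Hand-rolled sketch of observed (percent) agreement for multiple raters
# with possibly missing ratings: for each item, the fraction of rater
# pairs that agree, averaged over items rated by at least two raters.
def observed_agreement(questions_answers):
    table = np.asarray(questions_answers, dtype=float)
    raters_per_item = table.sum(axis=1)
    rated = raters_per_item >= 2  # items with at least two ratings
    agreeing_pairs = (table * (table - 1)).sum(axis=1)
    total_pairs = raters_per_item * (raters_per_item - 1)
    return (agreeing_pairs[rated] / total_pairs[rated]).mean()

# Questions-answers table from the example dataset (columns a, b, c):
qa = np.array([[2, 0, 1],
               [2, 1, 0],
               [0, 1, 1]])
print(observed_agreement(qa))  # (1/3 + 1/3 + 0) / 3 = 2/9 ≈ 0.2222
```

Questions 1 and 2 each have one agreeing pair out of three, and question 3 has none out of one, giving an observed agreement of 2/9.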
For a more detailed example see: TODO
Reference
All equations are based on the Handbook of Inter-Rater Reliability, Kilem Li Gwet, 2014. This book provides an extensive explanation of all topics related to inter-rater agreement. It gives a detailed description of all the metrics implemented in this library, as well as the example datasets that were used to test this implementation.
I also recommend taking a look at mReliability, a MATLAB implementation of the same metrics, which provides a more detailed explanation of the metrics' formulas than the one you will find here.