Crowd-Kit: Computational Quality Control for Crowdsourcing
Crowd-Kit is a powerful Python library that implements commonly-used aggregation methods for crowdsourced annotation and offers the relevant metrics and datasets. We strive to implement functionality that simplifies working with crowdsourced data.
Currently, Crowd-Kit contains:
- implementations of commonly-used aggregation methods for categorical, pairwise, textual, and segmentation responses;
- metrics of uncertainty, consistency, and agreement with aggregate;
- loaders for popular crowdsourced datasets.
Also, the learning subpackage contains PyTorch implementations of deep learning from crowds methods and advanced aggregation algorithms.
Installing
To install Crowd-Kit, run the following command: pip install crowd-kit. If you also want to use the learning subpackage, run pip install crowd-kit[learning] instead.
If you are interested in contributing to Crowd-Kit, use uv to manage the dependencies:
uv venv
uv pip install -e '.[dev,docs,learning]'
uv tool run pre-commit install
We use pytest for testing and a variety of linters, including pre-commit, Black, isort, Flake8, pyupgrade, and nbQA, to simplify code maintenance.
Getting Started
This example shows how to use Crowd-Kit for categorical aggregation using the classical Dawid-Skene algorithm.
First, let us do all the necessary imports.
from crowdkit.aggregation import DawidSkene
from crowdkit.datasets import load_dataset
import pandas as pd
Then, read your annotations into a pandas DataFrame with the columns task, worker, and label. Alternatively, you can download an example dataset:
df = pd.read_csv('results.csv') # should contain columns: task, worker, label
# df, ground_truth = load_dataset('relevance-2') # or download an example dataset
Then, you can aggregate the workers' responses using the fit_predict method, which follows the scikit-learn estimator convention:
aggregated_labels = DawidSkene(n_iter=100).fit_predict(df)
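To make the input and output shapes concrete, here is a minimal, library-free sketch of categorical aggregation. It uses plain majority vote as a stand-in, not Dawid-Skene, and it is not Crowd-Kit's implementation. The toy annotations and the majority_vote helper are hypothetical and exist only for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy annotations as (task, worker, label) rows,
# mirroring the task, worker, and label columns described above.
annotations = [
    ("t1", "w1", "cat"),
    ("t1", "w2", "cat"),
    ("t1", "w3", "dog"),
    ("t2", "w1", "dog"),
    ("t2", "w2", "dog"),
    ("t2", "w3", "cat"),
]


def majority_vote(rows):
    """Return a dict mapping each task to its most frequent label."""
    votes = defaultdict(Counter)
    for task, _worker, label in rows:
        votes[task][label] += 1
    # most_common(1) yields the top (label, count) pair for each task.
    return {task: counts.most_common(1)[0][0] for task, counts in votes.items()}


aggregated = majority_vote(annotations)
print(aggregated)
```

The result has one label per task, which is the same shape of answer that the aggregators produce: many rows of worker responses in, one aggregated label per task out.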
Implemented Aggregation Methods
Below is the list of currently implemented methods, marking those already available (✅) and those in progress (🟡).
Categorical Responses
Method | Status |
---|---|
Majority Vote | ✅ |
One-coin Dawid-Skene | ✅ |
Dawid-Skene | ✅ |
Gold Majority Vote | ✅ |
M-MSR | ✅ |
Wawa | ✅ |
Zero-Based Skill | ✅ |
GLAD | ✅ |
KOS | ✅ |
MACE | ✅ |
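For readers unfamiliar with Dawid-Skene, the following is a rough, library-free sketch of the idea: each worker gets a confusion matrix, and expectation-maximization alternates between estimating those matrices and re-estimating the posterior over each task's true label. This is a toy under stated assumptions, not Crowd-Kit's implementation; the dawid_skene function, its smoothing parameter, and the sample rows are all hypothetical.

```python
def dawid_skene(rows, n_iter=20, smoothing=0.01):
    """Toy Dawid-Skene EM over (task, worker, label) rows."""
    rows = list(rows)
    tasks = sorted({t for t, _, _ in rows})
    workers = sorted({w for _, w, _ in rows})
    labels = sorted({l for _, _, l in rows})

    # Initialise task posteriors from per-task vote shares.
    post = {t: {l: 0.0 for l in labels} for t in tasks}
    for t, _, l in rows:
        post[t][l] += 1.0
    for t in tasks:
        total = sum(post[t].values())
        post[t] = {l: v / total for l, v in post[t].items()}

    for _ in range(n_iter):
        # M-step: class priors and per-worker confusion matrices
        # conf[worker][true_label][observed_label], with additive smoothing.
        prior = {l: sum(post[t][l] for t in tasks) / len(tasks) for l in labels}
        conf = {w: {z: {l: smoothing for l in labels} for z in labels}
                for w in workers}
        for t, w, l in rows:
            for z in labels:
                conf[w][z][l] += post[t][z]
        for w in workers:
            for z in labels:
                row_total = sum(conf[w][z].values())
                for l in labels:
                    conf[w][z][l] /= row_total

        # E-step: posterior over the true label of every task.
        new_post = {t: dict(prior) for t in tasks}
        for t, w, l in rows:
            for z in labels:
                new_post[t][z] *= conf[w][z][l]
        for t in tasks:
            total = sum(new_post[t].values())
            post[t] = {z: v / total for z, v in new_post[t].items()}

    # Pick the most probable label for each task.
    return {t: max(post[t], key=post[t].get) for t in tasks}


# Hypothetical data: w1 and w2 agree with each other, w3 is noisy.
rows = [
    ("t1", "w1", "A"), ("t1", "w2", "A"), ("t1", "w3", "B"),
    ("t2", "w1", "B"), ("t2", "w2", "B"), ("t2", "w3", "B"),
    ("t3", "w1", "A"), ("t3", "w2", "A"), ("t3", "w3", "B"),
    ("t4", "w1", "B"), ("t4", "w2", "B"), ("t4", "w3", "A"),
]
result = dawid_skene(rows)
print(result)
```

The sketch multiplies raw probabilities, so it would underflow on tasks with many responses; a practical implementation would work in log space.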
Multi-Label Responses
Method | Status |
---|---|
Binary Relevance | ✅ |
Textual Responses
Method | Status |
---|---|
RASA | ✅ |
HRRASA | ✅ |
ROVER | ✅ |
Image Segmentation
Method | Status |
---|---|
Segmentation MV | ✅ |
Segmentation RASA | ✅ |
Segmentation EM | ✅ |
Pairwise Comparisons
Method | Status |
---|---|
Bradley-Terry | ✅ |
Noisy Bradley-Terry | ✅ |
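As a rough illustration of what a Bradley-Terry aggregator estimates, the sketch below fits item scores from (winner, loser) pairs using the classical minorization-maximization update, one standard fitting procedure for this model. It is a library-free toy, not Crowd-Kit's implementation; the bradley_terry function and the sample comparisons are hypothetical, and the sketch assumes that the two items in a pair are distinct and that every item wins at least once.

```python
from collections import defaultdict


def bradley_terry(comparisons, n_iter=100):
    """Estimate normalised Bradley-Terry scores from (winner, loser) pairs."""
    comparisons = list(comparisons)
    items = sorted({x for pair in comparisons for x in pair})
    wins = defaultdict(int)         # total wins per item
    pair_counts = defaultdict(int)  # number of comparisons per unordered pair
    for winner, loser in comparisons:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1

    scores = {i: 1.0 / len(items) for i in items}
    for _ in range(n_iter):
        new_scores = {}
        for i in items:
            denom = 0.0
            for pair, n in pair_counts.items():
                if i in pair:
                    (j,) = pair - {i}  # the opponent in this pair
                    pair_sum = scores[i] + scores[j]
                    if pair_sum > 0:
                        denom += n / pair_sum
            # MM update: wins divided by the weighted comparison count.
            new_scores[i] = wins[i] / denom if denom > 0 else scores[i]
        total = sum(new_scores.values())
        scores = {i: s / total for i, s in new_scores.items()}
    return scores


# Hypothetical comparisons: a usually beats b, b usually beats c.
comparisons = (
    [("a", "b")] * 3 + [("b", "a")]
    + [("b", "c")] * 3 + [("c", "b")]
    + [("a", "c")] * 2 + [("c", "a")]
)
scores = bradley_terry(comparisons)
print(sorted(scores, key=scores.get, reverse=True))
```

Sorting items by score gives the aggregated ranking. The scores are normalised to sum to one, so only their ratios are meaningful.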
Learning from Crowds
Method | Status |
---|---|
CrowdLayer | ✅ |
CoNAL | ✅ |
Citation
- Ustalov D., Pavlichenko N., Tseitlin B. (2024). Learning from Crowds with Crowd-Kit. Journal of Open Source Software, 9(96), 6227
@article{CrowdKit,
author = {Ustalov, Dmitry and Pavlichenko, Nikita and Tseitlin, Boris},
title = {{Learning from Crowds with Crowd-Kit}},
year = {2024},
journal = {Journal of Open Source Software},
volume = {9},
number = {96},
pages = {6227},
publisher = {The Open Journal},
doi = {10.21105/joss.06227},
issn = {2475-9066},
eprint = {2109.08584},
eprinttype = {arxiv},
eprintclass = {cs.HC},
language = {english},
}
Support and Contributions
Please use GitHub Issues to seek support and submit feature requests. We accept contributions to Crowd-Kit via GitHub in accordance with the guidelines in CONTRIBUTING.md.
License
© Crowd-Kit team authors, 2020–2024. Licensed under the Apache License, Version 2.0. See the LICENSE file for more details.