Computational Quality Control for Crowdsourcing
Crowd-Kit: Computational Quality Control for Crowdsourcing
Crowd-Kit is a powerful Python library that implements commonly used aggregation methods for crowdsourced annotation and offers the relevant metrics and datasets. We strive to implement functionality that simplifies working with crowdsourced data.
Currently, Crowd-Kit contains:
- implementations of commonly used aggregation methods for categorical, pairwise, textual, and segmentation responses;
- metrics of uncertainty, consistency, and agreement with aggregate;
- loaders for popular crowdsourced datasets.
Also, the learning subpackage contains PyTorch implementations of methods for deep learning from crowds and of advanced aggregation algorithms.
Installing
To install Crowd-Kit, run the following command:
pip install crowd-kit
If you also want to use the learning subpackage, run:
pip install 'crowd-kit[learning]'
If you are interested in contributing to Crowd-Kit, use uv to manage the dependencies:
uv venv
uv pip install -e '.[dev,docs,learning]'
uv tool run pre-commit install
We use pytest for testing and a variety of code-quality tools, including pre-commit, Black, isort, Flake8, pyupgrade, and nbQA, to simplify code maintenance.
Getting Started
This example shows how to use Crowd-Kit for categorical aggregation using the classical Dawid-Skene algorithm.
First, import the necessary modules:
from crowdkit.aggregation import DawidSkene
from crowdkit.datasets import load_dataset
import pandas as pd
Then, read your annotations into a pandas DataFrame with the columns task, worker, and label. Alternatively, you can download an example dataset:
df = pd.read_csv('results.csv') # should contain columns: task, worker, label
# df, ground_truth = load_dataset('relevance-2') # or download an example dataset
Finally, aggregate the workers' responses using the fit_predict method, which follows the scikit-learn estimator convention:
aggregated_labels = DawidSkene(n_iter=100).fit_predict(df)
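The fit_predict call above hides an expectation-maximization (EM) loop. For intuition, here is a minimal, dependency-free sketch of the classical Dawid-Skene EM on (task, worker, label) triples. It is not Crowd-Kit's implementation: the helper name dawid_skene, the list-of-tuples input (instead of a DataFrame), the majority-vote initialization, and the additive smoothing constant are all assumptions made for this illustration.

```python
import math
from collections import Counter, defaultdict


def dawid_skene(rows, n_iter=100, smoothing=0.01):
    """Hypothetical Dawid-Skene EM sketch (not Crowd-Kit's API).

    rows: iterable of (task, worker, label) triples.
    Returns {task: most probable label}.
    """
    rows = list(rows)
    labels = sorted({label for _, _, label in rows})
    workers = sorted({worker for _, worker, _ in rows})
    by_task = defaultdict(list)
    for task, worker, label in rows:
        by_task[task].append((worker, label))

    # Initialize the posterior over true labels with a soft majority vote.
    post = {}
    for task, resp in by_task.items():
        counts = Counter(label for _, label in resp)
        post[task] = {c: counts[c] / len(resp) for c in labels}

    for _ in range(n_iter):
        # M-step: class priors and one confusion matrix per worker,
        # conf[w][true][observed], with additive smoothing to avoid zeros.
        priors = {c: sum(p[c] for p in post.values()) / len(post) for c in labels}
        conf = {w: {t: {o: smoothing for o in labels} for t in labels} for w in workers}
        for task, resp in by_task.items():
            for worker, label in resp:
                for t in labels:
                    conf[worker][t][label] += post[task][t]
        for w in workers:
            for t in labels:
                row_sum = sum(conf[w][t].values())
                for o in labels:
                    conf[w][t][o] /= row_sum

        # E-step: posterior over true labels, computed in log space for stability.
        for task, resp in by_task.items():
            log_p = {
                t: math.log(priors[t] + 1e-12)
                + sum(math.log(conf[w][t][label]) for w, label in resp)
                for t in labels
            }
            top = max(log_p.values())
            z = sum(math.exp(v - top) for v in log_p.values())
            post[task] = {t: math.exp(log_p[t] - top) / z for t in labels}

    return {task: max(p, key=p.get) for task, p in post.items()}


rows = [
    ("t1", "w1", "cat"), ("t1", "w2", "cat"), ("t1", "w3", "dog"),
    ("t2", "w1", "dog"), ("t2", "w2", "dog"), ("t2", "w3", "dog"),
    ("t3", "w1", "cat"), ("t3", "w2", "cat"), ("t3", "w3", "cat"),
]
print(dawid_skene(rows))  # expected: {'t1': 'cat', 't2': 'dog', 't3': 'cat'}
```

Unlike plain majority voting, each worker here gets a full confusion matrix, so a worker who systematically confuses two classes can still contribute useful evidence.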
Implemented Aggregation Methods
Below is the list of currently implemented methods, including those already available (✅) and those in progress (🟡).
Categorical Responses
| Method | Status |
|---|---|
| Majority Vote | ✅ |
| One-coin Dawid-Skene | ✅ |
| Dawid-Skene | ✅ |
| Gold Majority Vote | ✅ |
| M-MSR | ✅ |
| Wawa | ✅ |
| Zero-Based Skill | ✅ |
| GLAD | ✅ |
| KOS | ✅ |
| MACE | ✅ |
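Two of the simplest methods in this table differ only in how votes are weighted. The sketch below, in plain Python over (task, worker, label) triples, computes a Majority Vote and then a Wawa-style weighted vote in which each worker's weight is their agreement rate with the majority-vote labels. It illustrates the idea under these assumptions and is not Crowd-Kit's code; the helper names are hypothetical, and ties are broken by taking the smallest label.

```python
from collections import defaultdict


def weighted_vote(rows, weights=None):
    """Return {task: label} by summing (optionally weighted) votes per task."""
    scores = defaultdict(lambda: defaultdict(float))
    for task, worker, label in rows:
        w = 1.0 if weights is None else weights.get(worker, 0.0)
        scores[task][label] += w
    # Ties are broken deterministically by the smallest label among the best.
    return {
        task: min(lbl for lbl, s in votes.items() if s == max(votes.values()))
        for task, votes in scores.items()
    }


def majority_vote(rows):
    """Unweighted majority vote: every response counts once."""
    return weighted_vote(rows)


def wawa(rows):
    """Wawa sketch: weight each worker by agreement with the majority vote."""
    rows = list(rows)
    mv = majority_vote(rows)
    agree = defaultdict(int)
    total = defaultdict(int)
    for task, worker, label in rows:
        total[worker] += 1
        agree[worker] += label == mv[task]
    skills = {w: agree[w] / total[w] for w in total}
    return weighted_vote(rows, skills), skills
```

The skills returned by wawa are also a simple example of a worker-level "agreement with aggregate" metric.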
Multi-Label Responses
| Method | Status |
|---|---|
| Binary Relevance | ✅ |
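Binary Relevance handles multi-label responses by splitting them into one independent yes/no problem per label and aggregating each problem separately. A minimal sketch, assuming each response is a (task, worker, set_of_labels) triple and using a plain strict-majority vote as the per-label aggregator; that base aggregator is a simplifying choice for this illustration, and the helper name is hypothetical.

```python
from collections import defaultdict


def binary_relevance_mv(rows):
    """Aggregate multi-label responses with one majority vote per label.

    rows: iterable of (task, worker, labels) where labels is a set.
    Returns {task: set_of_labels}.
    """
    rows = list(rows)
    all_labels = set().union(*(labels for _, _, labels in rows))
    n_responses = defaultdict(int)                  # responses per task
    yes_votes = defaultdict(lambda: defaultdict(int))  # task -> label -> "yes" count
    for task, _worker, labels in rows:
        n_responses[task] += 1
        for label in labels:
            yes_votes[task][label] += 1
    # Each label is a separate binary question: keep it only if a strict
    # majority of the task's responses included it (ties drop the label).
    return {
        task: {lbl for lbl in all_labels if 2 * yes_votes[task][lbl] > n}
        for task, n in n_responses.items()
    }
```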
Textual Responses
| Method | Status |
|---|---|
| RASA | ✅ |
| HRRASA | ✅ |
| ROVER | ✅ |
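RASA and HRRASA work with vector embeddings of the text responses. Roughly, RASA alternates between a reliability-weighted mean embedding per task and per-worker reliabilities derived from distances to that mean, and then returns the response closest to it. The sketch below illustrates only that loop under simplifying assumptions: embeddings are precomputed lists of floats, distance is cosine, and the published reliability update is replaced by a plain inverse mean distance. The helper names are hypothetical, and ROVER, which aligns word sequences, is not covered.

```python
import math
from collections import defaultdict


def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


def centroid_text_aggregation(rows, n_iter=10, eps=1e-6):
    """Simplified, RASA-like sketch (not the published algorithm).

    rows: list of (task, worker, text, embedding). Returns {task: text}.
    """
    by_task = defaultdict(list)
    for task, worker, text, emb in rows:
        by_task[task].append((worker, text, emb))
    workers = {worker for _, worker, _, _ in rows}
    reliability = {w: 1.0 for w in workers}
    centroids = {}
    for _ in range(n_iter):
        # 1) Reliability-weighted mean embedding for every task.
        for task, resp in by_task.items():
            total = sum(reliability[w] for w, _, _ in resp)
            dim = len(resp[0][2])
            centroids[task] = [
                sum(reliability[w] * emb[i] for w, _, emb in resp) / total
                for i in range(dim)
            ]
        # 2) Simplified reliability: inverse of the worker's mean distance
        #    to the task centroids (eps guards against division by zero).
        dist_sum = defaultdict(float)
        count = defaultdict(int)
        for task, resp in by_task.items():
            for w, _, emb in resp:
                dist_sum[w] += cosine_distance(emb, centroids[task])
                count[w] += 1
        reliability = {w: count[w] / (dist_sum[w] + eps) for w in workers}
    # 3) Return the original response closest to each task's centroid.
    return {
        task: min(resp, key=lambda r: cosine_distance(r[2], centroids[task]))[1]
        for task, resp in by_task.items()
    }
```

Returning an existing response rather than decoding the centroid keeps the output a real, human-written text.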
Image Segmentation
| Method | Status |
|---|---|
| Segmentation MV | ✅ |
| Segmentation RASA | ✅ |
| Segmentation EM | ✅ |
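Segmentation MV can be pictured as a majority vote taken independently at every pixel. A minimal sketch, assuming each worker's response for one image is a binary mask of the same size given as nested lists of 0/1, and that ties resolve to background; the helper name is hypothetical, and the library's implementation may differ in data layout and tie handling.

```python
def segmentation_majority_vote(masks):
    """Pixel-wise majority vote over equally sized binary masks.

    masks: list of masks, each a list of rows of 0/1 values.
    A pixel is foreground only if a strict majority of masks mark it.
    """
    n = len(masks)
    height, width = len(masks[0]), len(masks[0][0])
    return [
        [1 if 2 * sum(m[r][c] for m in masks) > n else 0 for c in range(width)]
        for r in range(height)
    ]
```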
Pairwise Comparisons
| Method | Status |
|---|---|
| Bradley-Terry | ✅ |
| Noisy Bradley-Terry | ✅ |
Tip: Consider using the more modern Evalica library to aggregate pairwise comparisons.
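The classical Bradley-Terry model gives each item a positive score p_i such that P(i beats j) = p_i / (p_i + p_j). A common way to fit it is the minorization-maximization fixed-point update p_i ← W_i / Σ_{j≠i} n_ij / (p_i + p_j), where W_i is the number of wins of item i and n_ij the number of comparisons between i and j. The sketch below is an illustration, not Crowd-Kit's or Evalica's implementation; it assumes every item wins at least once (otherwise its score collapses to zero), and it ignores the annotator noise that Noisy Bradley-Terry additionally models.

```python
from collections import defaultdict


def bradley_terry(comparisons, n_iter=100):
    """Fit Bradley-Terry scores with the MM fixed-point update.

    comparisons: list of (winner, loser) pairs.
    Returns {item: score}, normalized to sum to 1.
    """
    items = sorted({x for pair in comparisons for x in pair})
    wins = defaultdict(int)
    n_pair = defaultdict(int)  # number of comparisons per unordered pair
    for winner, loser in comparisons:
        wins[winner] += 1
        n_pair[frozenset((winner, loser))] += 1
    scores = {i: 1.0 / len(items) for i in items}
    for _ in range(n_iter):
        new = {}
        for i in items:
            # Sum of n_ij / (p_i + p_j) over every opponent j of item i.
            denom = sum(
                n / (scores[i] + scores[j])
                for pair, n in n_pair.items() if i in pair
                for j in pair if j != i
            )
            new[i] = wins[i] / denom if denom else scores[i]
        total = sum(new.values())
        scores = {i: s / total for i, s in new.items()}
    return scores
```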
Learning from Crowds
| Method | Status |
|---|---|
| CrowdLayer | ✅ |
| CoNAL | ✅ |
Citation
- Ustalov D., Pavlichenko N., Tseitlin B. (2024). Learning from Crowds with Crowd-Kit. Journal of Open Source Software, 9(96), 6227
@article{CrowdKit,
author = {Ustalov, Dmitry and Pavlichenko, Nikita and Tseitlin, Boris},
title = {{Learning from Crowds with Crowd-Kit}},
year = {2024},
journal = {Journal of Open Source Software},
volume = {9},
number = {96},
pages = {6227},
publisher = {The Open Journal},
doi = {10.21105/joss.06227},
issn = {2475-9066},
eprint = {2109.08584},
eprinttype = {arxiv},
eprintclass = {cs.HC},
language = {english},
}
Support and Contributions
Please use GitHub Issues to seek support and submit feature requests. We accept contributions to Crowd-Kit via GitHub according to our guidelines in CONTRIBUTING.md.
License
© Crowd-Kit team authors, 2020–2025. Licensed under the Apache License, Version 2.0. See LICENSE file for more details.
File details
Details for the file crowd_kit-1.4.2.tar.gz.
File metadata
- Download URL: crowd_kit-1.4.2.tar.gz
- Upload date:
- Size: 62.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bde264bdd9a313664eb4dcf09a6fa688308d8c2478b95198af90eaa5fd9e0b93 |
| MD5 | 2cd3722133cca85f2a6a999133ee540b |
| BLAKE2b-256 | 2638917e478455a3d611ac8f8f4fae36fb7c0b65722cce23d3a1bef01d82edfe |
Provenance
The following attestation bundles were made for crowd_kit-1.4.2.tar.gz:
- Publisher: release.yml on Toloka/crowd-kit
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: crowd_kit-1.4.2.tar.gz
- Subject digest: bde264bdd9a313664eb4dcf09a6fa688308d8c2478b95198af90eaa5fd9e0b93
- Sigstore transparency entry: 602080017
- Sigstore integration time:
- Permalink: Toloka/crowd-kit@cad794bb64686fdd9868ce0ab1282ef61b639c7f
- Branch / Tag: refs/tags/v1.4.2
- Owner: https://github.com/Toloka
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@cad794bb64686fdd9868ce0ab1282ef61b639c7f
- Trigger Event: release
File details
Details for the file crowd_kit-1.4.2-py3-none-any.whl.
File metadata
- Download URL: crowd_kit-1.4.2-py3-none-any.whl
- Upload date:
- Size: 89.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f349ad8b06bf56418d5188bea1371b84186aa5b6587908754caee48d40c1b9fb |
| MD5 | b4dc24fbfd6adadac47d1b55139dd45a |
| BLAKE2b-256 | 28afd376b34d2d0c6ef9a31dba1e92960bdd4035a91a90f5f8cc95135f1273dd |
Provenance
The following attestation bundles were made for crowd_kit-1.4.2-py3-none-any.whl:
- Publisher: release.yml on Toloka/crowd-kit
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: crowd_kit-1.4.2-py3-none-any.whl
- Subject digest: f349ad8b06bf56418d5188bea1371b84186aa5b6587908754caee48d40c1b9fb
- Sigstore transparency entry: 602080025
- Sigstore integration time:
- Permalink: Toloka/crowd-kit@cad794bb64686fdd9868ce0ab1282ef61b639c7f
- Branch / Tag: refs/tags/v1.4.2
- Owner: https://github.com/Toloka
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@cad794bb64686fdd9868ce0ab1282ef61b639c7f
- Trigger Event: release