Crowd-Kit: Computational Quality Control for Crowdsourcing

Crowd-Kit is a powerful Python library that implements commonly used aggregation methods for crowdsourced annotation and provides the relevant metrics and datasets. We strive to implement functionality that simplifies working with crowdsourced data.

Currently, Crowd-Kit contains:

  • implementations of commonly used aggregation methods for categorical, pairwise, textual, and segmentation responses;
  • metrics of uncertainty, consistency, and agreement with the aggregate;
  • loaders for popular crowdsourced datasets.
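To make the metric names in the list above concrete, here is a minimal standard-library sketch of two such quantities: the mean entropy of the labels given to each task (one possible notion of uncertainty) and the share of responses that match a majority-vote aggregate. The function names and the toy records are hypothetical and are not Crowd-Kit's API or its exact metric definitions.

```python
import math
from collections import Counter, defaultdict


def majority_vote(records):
    """Pick the most frequent label per task from (task, worker, label) triples."""
    votes = defaultdict(Counter)
    for task, _, label in records:
        votes[task][label] += 1
    return {task: counter.most_common(1)[0][0] for task, counter in votes.items()}


def mean_label_entropy(records):
    """Average Shannon entropy (in nats) of the label distribution of each task."""
    votes = defaultdict(Counter)
    for task, _, label in records:
        votes[task][label] += 1
    entropies = []
    for counter in votes.values():
        n = sum(counter.values())
        entropies.append(-sum(k / n * math.log(k / n) for k in counter.values()))
    return sum(entropies) / len(entropies)


def agreement_with_aggregate(records, aggregate):
    """Share of individual responses that equal the aggregated label of their task."""
    hits = sum(aggregate[task] == label for task, _, label in records)
    return hits / len(records)


# Hypothetical toy data: three workers label two tasks.
records = [
    ("t1", "w1", "cat"), ("t1", "w2", "cat"), ("t1", "w3", "dog"),
    ("t2", "w1", "dog"), ("t2", "w2", "dog"), ("t2", "w3", "dog"),
]
aggregate = majority_vote(records)
agreement = agreement_with_aggregate(records, aggregate)
uncertainty = mean_label_entropy(records)
```

A unanimous task contributes zero entropy, so the uncertainty here comes entirely from the disagreement on the first task.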

In addition, the learning subpackage contains PyTorch implementations of methods for deep learning from crowds and of advanced aggregation algorithms.

Installing

To install Crowd-Kit, run pip install crowd-kit. To also use the learning subpackage, run pip install 'crowd-kit[learning]' (the quotes keep shells such as zsh from interpreting the square brackets).

If you are interested in contributing to Crowd-Kit, use uv to manage the dependencies:

uv venv
uv pip install -e '.[dev,docs,learning]'
uv tool run pre-commit install

We use pytest for testing and a variety of linters, including pre-commit, Black, isort, Flake8, pyupgrade, and nbQA, to simplify code maintenance.

Getting Started

This example shows how to use Crowd-Kit for categorical aggregation using the classical Dawid-Skene algorithm.

First, let us do all the necessary imports.

from crowdkit.aggregation import DawidSkene
from crowdkit.datasets import load_dataset

import pandas as pd

Then, read your annotations into a pandas DataFrame with the columns task, worker, and label. Alternatively, you can download an example dataset:

df = pd.read_csv('results.csv')  # should contain columns: task, worker, label
# df, ground_truth = load_dataset('relevance-2')  # or download an example dataset

Then, you can aggregate the workers' responses using the fit_predict method, which follows the scikit-learn convention:

aggregated_labels = DawidSkene(n_iter=100).fit_predict(df)
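For readers who want to see what this kind of aggregation does internally, the following is a didactic, standard-library sketch of the Dawid-Skene expectation-maximization loop over (task, worker, label) triples. It is not Crowd-Kit's implementation: the function name, the toy records, and the eps smoothing term are assumptions made for illustration.

```python
import math
from collections import defaultdict


def dawid_skene_sketch(records, n_iter=100, eps=1e-6):
    """Didactic Dawid-Skene EM over (task, worker, label) triples."""
    labels = sorted({label for _, _, label in records})
    workers = sorted({worker for _, worker, _ in records})
    by_task = defaultdict(list)
    for task, worker, label in records:
        by_task[task].append((worker, label))

    # Start from per-task vote shares (a soft majority vote).
    probs = {
        task: {z: sum(obs == z for _, obs in votes) / len(votes) for z in labels}
        for task, votes in by_task.items()
    }

    for _ in range(n_iter):
        # M-step: class priors, smoothed by eps (an assumption of this sketch).
        priors = {}
        for z in labels:
            mass = sum(p[z] for p in probs.values())
            priors[z] = (mass + eps) / (len(probs) + eps * len(labels))

        # M-step: expected counts of (true label, observed label) per worker.
        counts = {w: {z: {o: 0.0 for o in labels} for z in labels} for w in workers}
        for task, votes in by_task.items():
            for worker, observed in votes:
                for z in labels:
                    counts[worker][z][observed] += probs[task][z]

        # Normalize each row into P(observed | true), again smoothed by eps.
        confusion = {}
        for w in workers:
            confusion[w] = {}
            for z in labels:
                row_total = sum(counts[w][z].values()) + eps * len(labels)
                confusion[w][z] = {o: (counts[w][z][o] + eps) / row_total for o in labels}

        # E-step: posterior over each task's true label, computed in log space.
        for task, votes in by_task.items():
            log_post = {
                z: math.log(priors[z])
                + sum(math.log(confusion[w][z][o]) for w, o in votes)
                for z in labels
            }
            top = max(log_post.values())
            weights = {z: math.exp(v - top) for z, v in log_post.items()}
            total = sum(weights.values())
            probs[task] = {z: weights[z] / total for z in labels}

    # Report the most probable label for each task.
    return {task: max(p, key=p.get) for task, p in probs.items()}


# Hypothetical toy data: w1 and w2 always agree, w3 is right half of the time.
records = [
    ("t1", "w1", "cat"), ("t1", "w2", "cat"), ("t1", "w3", "dog"),
    ("t2", "w1", "dog"), ("t2", "w2", "dog"), ("t2", "w3", "dog"),
    ("t3", "w1", "cat"), ("t3", "w2", "cat"), ("t3", "w3", "cat"),
    ("t4", "w1", "dog"), ("t4", "w2", "dog"), ("t4", "w3", "cat"),
]
labels_by_task = dawid_skene_sketch(records)
```

The sketch returns a plain dictionary rather than a pandas object; in practice the library call above is the one to use, and this block only illustrates the alternation between estimating worker confusion matrices and re-estimating the task labels.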

More usage examples

Implemented Aggregation Methods

Below is the list of currently implemented methods, including the already available (✅) and in progress (🟡).

Categorical Responses

Method Status
Majority Vote
One-coin Dawid-Skene
Dawid-Skene
Gold Majority Vote
M-MSR
Wawa
Zero-Based Skill
GLAD
KOS
MACE

Multi-Label Responses

Method Status
Binary Relevance

Textual Responses

Method Status
RASA
HRRASA
ROVER

Image Segmentation

Method Status
Segmentation MV
Segmentation RASA
Segmentation EM

Pairwise Comparisons

Method Status
Bradley-Terry
Noisy Bradley-Terry

Tip: Consider using the more modern Evalica library to aggregate pairwise comparisons.
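As a rough illustration of what pairwise aggregation computes, the sketch below fits Bradley-Terry scores with the classical minorization-maximization update: each item's score becomes its number of wins divided by a sum over its comparisons weighted by the current scores. This is a standard-library sketch, not the API of Crowd-Kit or Evalica. The function name and toy comparisons are hypothetical, and the code assumes every item wins at least once, since items with no wins collapse to a zero score.

```python
from collections import defaultdict


def bradley_terry_sketch(comparisons, n_iter=100):
    """Estimate Bradley-Terry scores from a list of (winner, loser) pairs."""
    items = sorted({item for pair in comparisons for item in pair})
    wins = defaultdict(int)
    games = defaultdict(int)  # games[(i, j)]: number of comparisons between i and j
    for winner, loser in comparisons:
        wins[winner] += 1
        games[(winner, loser)] += 1
        games[(loser, winner)] += 1

    # Start from uniform scores and renormalize to sum to one after each pass.
    scores = {item: 1.0 / len(items) for item in items}
    for _ in range(n_iter):
        updated = {}
        for i in items:
            denom = sum(
                games[(i, j)] / (scores[i] + scores[j])
                for j in items
                if j != i and games[(i, j)] > 0
            )
            updated[i] = wins[i] / denom
        total = sum(updated.values())
        scores = {item: value / total for item, value in updated.items()}
    return scores


# Hypothetical toy data: every pair is compared four times.
comparisons = (
    [("a", "b")] * 3 + [("b", "a")]
    + [("b", "c")] * 3 + [("c", "b")]
    + [("a", "c")] * 3 + [("c", "a")]
)
scores = bradley_terry_sketch(comparisons)
```

Because the scores are normalized, only their ratios are meaningful: the model's estimate of the probability that a beats b is scores["a"] / (scores["a"] + scores["b"]).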

Learning from Crowds

Method Status
CrowdLayer
CoNAL

Citation

@article{CrowdKit,
  author    = {Ustalov, Dmitry and Pavlichenko, Nikita and Tseitlin, Boris},
  title     = {{Learning from Crowds with Crowd-Kit}},
  year      = {2024},
  journal   = {Journal of Open Source Software},
  volume    = {9},
  number    = {96},
  pages     = {6227},
  publisher = {The Open Journal},
  doi       = {10.21105/joss.06227},
  issn      = {2475-9066},
  eprint    = {2109.08584},
  eprinttype = {arxiv},
  eprintclass = {cs.HC},
  language  = {english},
}

Support and Contributions

Please use GitHub Issues to seek support and submit feature requests. We accept contributions to Crowd-Kit via GitHub, according to the guidelines in CONTRIBUTING.md.

License

© Crowd-Kit team authors, 2020–2025. Licensed under the Apache License, Version 2.0. See the LICENSE file for more details.
