Crowd-Kit: Computational Quality Control for Crowdsourcing

Crowd-Kit is a Python library that implements commonly used aggregation methods for crowdsourced annotation and provides the relevant metrics and datasets. We strive to implement functionality that simplifies working with crowdsourced data.

Currently, Crowd-Kit contains:

  • implementations of commonly-used aggregation methods for categorical, pairwise, textual, and segmentation responses;
  • metrics of uncertainty, consistency, and agreement with aggregate;
  • loaders for popular crowdsourced datasets.

Also, the learning subpackage contains PyTorch implementations of deep learning from crowds methods and advanced aggregation algorithms.

Installing

To install Crowd-Kit, run the following command: pip install crowd-kit. If you also want to use the learning subpackage, type pip install crowd-kit[learning].
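To confirm that the installation succeeded, one option is to query the installed distribution from Python. The sketch below uses only the standard library; the helper name installed_version is hypothetical and not part of Crowd-Kit.

```python
from importlib import metadata


def installed_version(distribution="crowd-kit"):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
print(version if version is not None else "crowd-kit is not installed")
```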

If you are interested in contributing to Crowd-Kit, use Pipenv to install the library with its dependencies: pipenv install --dev. We use pytest for testing.

Getting Started

This example shows how to use Crowd-Kit for categorical aggregation using the classical Dawid-Skene algorithm.

First, let us do all the necessary imports.

from crowdkit.aggregation import DawidSkene
from crowdkit.datasets import load_dataset

import pandas as pd

Then, read your annotations into a pandas DataFrame with the columns task, worker, and label. Alternatively, you can download an example dataset:

df = pd.read_csv('results.csv')  # should contain columns: task, worker, label
# df, ground_truth = load_dataset('relevance-2')  # or download an example dataset
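To make the expected input layout concrete, here is a minimal sketch that builds CSV text with the task, worker, and label columns using only the standard library. The task IDs, worker IDs, and labels are made-up toy values, and the helper to_csv_text is hypothetical. Saving this text as results.csv should give a file that the pd.read_csv call above can load.

```python
import csv
import io

# Hypothetical toy annotations: three workers label two tasks.
rows = [
    {"task": "t1", "worker": "w1", "label": "cat"},
    {"task": "t1", "worker": "w2", "label": "cat"},
    {"task": "t1", "worker": "w3", "label": "dog"},
    {"task": "t2", "worker": "w1", "label": "dog"},
    {"task": "t2", "worker": "w2", "label": "dog"},
    {"task": "t2", "worker": "w3", "label": "dog"},
]


def to_csv_text(rows):
    """Serialize annotation rows into CSV text with a task, worker, label header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["task", "worker", "label"])
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = to_csv_text(rows)
print(csv_text)
```

Each row is one response: a single worker's label for a single task. A task therefore appears once per worker who labeled it.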

Then, you can aggregate the workers' responses using the fit_predict method, which follows the scikit-learn API convention:

aggregated_labels = DawidSkene(n_iter=100).fit_predict(df)
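To illustrate what aggregation produces, the sketch below implements plain majority voting in pure Python over (task, worker, label) triples. This is neither the Dawid-Skene algorithm nor Crowd-Kit's own Majority Vote implementation. It is a simplified stand-in whose function name and toy data are assumptions, and it leaves tie-breaking unspecified. The point is only that the output maps each task to a single label.

```python
from collections import Counter, defaultdict


def majority_vote(annotations):
    """Return {task: most frequent label} from (task, worker, label) triples."""
    votes = defaultdict(Counter)
    for task, _worker, label in annotations:
        votes[task][label] += 1
    # Pick the label with the highest count for every task.
    return {task: counts.most_common(1)[0][0] for task, counts in votes.items()}


# Hypothetical toy annotations: three workers label two tasks.
annotations = [
    ("t1", "w1", "cat"),
    ("t1", "w2", "cat"),
    ("t1", "w3", "dog"),
    ("t2", "w1", "dog"),
    ("t2", "w2", "dog"),
    ("t2", "w3", "dog"),
]

result = majority_vote(annotations)
print(result)
```

Majority voting treats every worker as equally reliable. Dawid-Skene instead estimates how reliable each worker is, so its aggregated labels can differ when some workers are consistently noisy.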

More usage examples

Implemented Aggregation Methods

Below is the list of currently implemented methods, including the already available (✅) and in progress (🟡).

Categorical Responses

Method Status
Majority Vote
One-coin Dawid-Skene
Dawid-Skene
Gold Majority Vote
M-MSR
Wawa
Zero-Based Skill
GLAD
KOS
MACE

Multi-Label Responses

Method Status
Binary Relevance

Textual Responses

Method Status
RASA
HRRASA
ROVER

Image Segmentation

Method Status
Segmentation MV
Segmentation RASA
Segmentation EM

Pairwise Comparisons

Method Status
Bradley-Terry
Noisy Bradley-Terry

Learning from Crowds

Method Status
CrowdLayer
CoNAL

Citation

@article{CrowdKit,
  author    = {Ustalov, Dmitry and Pavlichenko, Nikita and Tseitlin, Boris},
  title     = {{Learning from Crowds with Crowd-Kit}},
  year      = {2024},
  journal   = {Journal of Open Source Software},
  volume    = {9},
  number    = {96},
  pages     = {6227},
  publisher = {The Open Journal},
  doi       = {10.21105/joss.06227},
  issn      = {2475-9066},
  eprint    = {2109.08584},
  eprinttype = {arxiv},
  eprintclass = {cs.HC},
  language  = {english},
}

Support and Contributions

Please use GitHub Issues to seek support and submit feature requests. We accept contributions to Crowd-Kit via GitHub according to the guidelines in CONTRIBUTING.md.

License

© Crowd-Kit team authors, 2020–2024. Licensed under the Apache License, Version 2.0. See LICENSE file for more details.
