Crowd-Kit: Computational Quality Control for Crowdsourcing

Crowd-Kit is a powerful Python library that implements commonly used aggregation methods for crowdsourced annotation and offers the relevant metrics and datasets. We strive to implement functionality that simplifies working with crowdsourced data.

Currently, Crowd-Kit contains:

  • implementations of commonly used aggregation methods for categorical, pairwise, textual, and segmentation responses;
  • metrics of uncertainty, consistency, and agreement with aggregate;
  • loaders for popular crowdsourced datasets.

Also, the learning subpackage contains PyTorch implementations of deep learning from crowds methods and advanced aggregation algorithms.

Installing

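To install Crowd-Kit, run the following command: pip install crowd-kit. If you also want to use the learning subpackage, type pip install crowd-kit[learning]. In shells that treat square brackets as glob patterns, such as zsh, quote the argument: pip install 'crowd-kit[learning]'.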

If you are interested in contributing to Crowd-Kit, use Pipenv to install the library with its dependencies: pipenv install --dev. We use pytest for testing.

Getting Started

This example shows how to use Crowd-Kit for categorical aggregation using the classical Dawid-Skene algorithm.

First, let us do all the necessary imports.

from crowdkit.aggregation import DawidSkene
from crowdkit.datasets import load_dataset

import pandas as pd

Then, read your annotations into a pandas DataFrame with the columns task, worker, and label. Alternatively, you can download an example dataset:

df = pd.read_csv('results.csv')  # should contain columns: task, worker, label
# df, ground_truth = load_dataset('relevance-2')  # or download an example dataset
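
Since the aggregators expect the task, worker, and label columns, it can help to check a file's header before loading it. The following is a minimal sketch using only the standard library; the helper name missing_columns and the in-memory CSV contents are hypothetical and not part of Crowd-Kit.

```python
import csv
import io

REQUIRED_COLUMNS = ("task", "worker", "label")

def missing_columns(csv_file, required=REQUIRED_COLUMNS):
    """Return the required column names that are absent from the CSV header row."""
    header = next(csv.reader(csv_file), [])
    return [name for name in required if name not in header]

# In-memory stand-ins for results.csv; the contents are made up for illustration.
good = io.StringIO("task,worker,label\nt1,w1,cat\nt1,w2,dog\n")
bad = io.StringIO("task,annotator,label\nt1,w1,cat\n")

print(missing_columns(good))  # []
print(missing_columns(bad))   # ['worker']
```

With a DataFrame already in memory, the same check can be written as a set difference between the required names and df.columns.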

Then, you can aggregate the workers' responses using the fit_predict method, which follows the scikit-learn naming convention:

aggregated_labels = DawidSkene(n_iter=100).fit_predict(df)
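
To make the input and output shapes concrete, here is a minimal sketch of the simplest aggregation rule, majority vote, in plain Python. It is not Crowd-Kit's implementation and it is far simpler than Dawid-Skene, which additionally estimates per-worker error rates. The helper name majority_vote and the toy records are hypothetical.

```python
from collections import Counter, defaultdict

# Toy annotations as (task, worker, label) triples, mirroring the expected columns.
records = [
    ("t1", "w1", "cat"), ("t1", "w2", "cat"), ("t1", "w3", "dog"),
    ("t2", "w1", "dog"), ("t2", "w2", "dog"), ("t2", "w3", "dog"),
    ("t3", "w1", "cat"), ("t3", "w3", "dog"), ("t3", "w2", "dog"),
]

def majority_vote(records):
    """Return {task: most frequent label}; ties go to the label seen first."""
    votes = defaultdict(Counter)
    for task, _worker, label in records:
        votes[task][label] += 1
    # Counter.most_common keeps equal counts in first-encountered order.
    return {task: counts.most_common(1)[0][0] for task, counts in votes.items()}

aggregated = majority_vote(records)
print(aggregated)  # {'t1': 'cat', 't2': 'dog', 't3': 'dog'}
```

The result maps each task to a single label, which is the same kind of output a categorical aggregator produces: one aggregated label per task.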

More usage examples

Implemented Aggregation Methods

Below is the list of methods, each marked as already available (✅) or in progress (🟡).

Categorical Responses

Method Status
Majority Vote ✅
One-coin Dawid-Skene ✅
Dawid-Skene ✅
Gold Majority Vote ✅
M-MSR ✅
Wawa ✅
Zero-Based Skill ✅
GLAD ✅
KOS ✅
MACE ✅
BCC 🟡

Multi-Label Responses

Method Status
Binary Relevance ✅

Textual Responses

Method Status
RASA ✅
HRRASA ✅
ROVER ✅

Image Segmentation

Method Status
Segmentation MV ✅
Segmentation RASA ✅
Segmentation EM ✅

Pairwise Comparisons

Method Status
Bradley-Terry ✅
Noisy Bradley-Terry ✅

Learning from Crowds

Method Status
CrowdLayer ✅
CoNAL ✅

Citation

@misc{CrowdKit,
  author    = {Ustalov, Dmitry and Pavlichenko, Nikita and Tseitlin, Boris},
  title     = {{Learning from Crowds with Crowd-Kit}},
  year      = {2023},
  publisher = {arXiv},
  eprint    = {2109.08584},
  eprinttype = {arxiv},
  eprintclass = {cs.HC},
  url       = {https://arxiv.org/abs/2109.08584},
  language  = {english},
}

Support and Contributions

Please use GitHub Issues to seek support and submit feature requests. We accept contributions to Crowd-Kit via GitHub according to the guidelines in CONTRIBUTING.md.

License

© Crowd-Kit team authors, 2020–2024. Licensed under the Apache License, Version 2.0. See the LICENSE file for more details.
