Python libraries for crowdsourcing
Crowd-Kit: Computational Quality Control for Crowdsourcing
Crowd-Kit is a Python library that implements commonly used aggregation methods for crowdsourced annotation and provides the relevant metrics and datasets. We strive to implement functionality that simplifies working with crowdsourced data.
Currently, Crowd-Kit contains:
- implementations of commonly used aggregation methods for categorical, pairwise, textual, and segmentation responses
- metrics of uncertainty, consistency, and agreement with the aggregate
- loaders for popular crowdsourced datasets
The library is currently under heavy development, and its interfaces are subject to change.
Installing
Crowd-Kit can be installed from PyPI with pip:
pip install crowd-kit
Getting Started
This example shows how to use Crowd-Kit for categorical aggregation using the classical Dawid-Skene algorithm.
First, let us do all the necessary imports.
from crowdkit.aggregation import DawidSkene
from crowdkit.datasets import load_dataset
import pandas as pd
Then, read your annotations into a pandas DataFrame with the columns task, performer, and label. Alternatively, you can download an example dataset.
df = pd.read_csv('results.csv') # should contain columns: task, performer, label
# df, ground_truth = load_dataset('relevance-2') # or download an example dataset
Then you can aggregate the performer responses as easily as in scikit-learn:
aggregated_labels = DawidSkene(n_iter=100).fit_predict(df)
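To give a rough idea of what happens behind fit_predict, here is a minimal plain-Python sketch of the Dawid-Skene EM procedure over (task, performer, label) rows. It is an illustration under stated assumptions, not Crowd-Kit's implementation: the function name dawid_skene, the smoothing parameter, and the toy rows are all made up for this example, and the sketch multiplies raw probabilities, so it is only suitable for small inputs.

```python
from collections import defaultdict


def dawid_skene(rows, n_iter=100, smoothing=1e-6):
    """Toy Dawid-Skene EM. rows: iterable of (task, performer, label) tuples.

    Returns a dict mapping each task to its most probable label.
    `smoothing` is a hypothetical parameter that keeps probabilities non-zero.
    """
    rows = list(rows)
    labels = sorted({label for _, _, label in rows})
    performers = {performer for _, performer, _ in rows}
    by_task = defaultdict(list)
    for task, performer, label in rows:
        by_task[task].append((performer, label))

    # Initialise the posteriors with vote shares (a soft majority vote).
    probs = {
        task: {z: sum(l == z for _, l in votes) / len(votes) for z in labels}
        for task, votes in by_task.items()
    }

    for _ in range(n_iter):
        # M-step: class priors and one confusion matrix per performer.
        prior = {z: sum(p[z] for p in probs.values()) / len(probs) for z in labels}
        confusion = {
            p: {z: {l: smoothing for l in labels} for z in labels}
            for p in performers
        }
        for task, votes in by_task.items():
            for performer, label in votes:
                for z in labels:
                    confusion[performer][z][label] += probs[task][z]
        for p in performers:
            for z in labels:
                total = sum(confusion[p][z].values())
                for l in labels:
                    confusion[p][z][l] /= total

        # E-step: recompute the posterior over the true label of every task.
        for task, votes in by_task.items():
            scores = {}
            for z in labels:
                score = prior[z]
                for performer, label in votes:
                    score *= confusion[performer][z][label]
                scores[z] = score
            norm = sum(scores.values())
            probs[task] = {z: s / norm for z, s in scores.items()}

    return {task: max(p, key=p.get) for task, p in probs.items()}


# Hypothetical toy annotations: performers a and b agree, c is noisy.
rows = [
    ("t1", "a", "cat"), ("t1", "b", "cat"), ("t1", "c", "dog"),
    ("t2", "a", "dog"), ("t2", "b", "dog"), ("t2", "c", "cat"),
    ("t3", "a", "cat"), ("t3", "b", "cat"), ("t3", "c", "cat"),
    ("t4", "a", "dog"), ("t4", "b", "dog"), ("t4", "c", "dog"),
]
result = dawid_skene(rows, n_iter=20)
```

On this toy input the result agrees with a simple majority vote. The two methods can differ once some performers are systematically less reliable than others, because the estimated confusion matrices then reweight their votes.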
Implemented Aggregation Methods
Below is the list of methods, marked as either already available (✅) or in progress (🟡).
Categorical Responses
| Method | Status |
|---|---|
| Majority Vote | ✅ |
| Dawid-Skene | ✅ |
| Gold Majority Vote | ✅ |
| M-MSR | ✅ |
| Wawa | ✅ |
| Zero-Based Skill | ✅ |
| GLAD | ✅ |
| BCC | 🟡 |
Textual Responses
| Method | Status |
|---|---|
| RASA | ✅ |
| HRRASA | ✅ |
| ROVER | ✅ |
Image Segmentation
| Method | Status |
|---|---|
| Segmentation MV | ✅ |
| Segmentation RASA | ✅ |
| Segmentation EM | ✅ |
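As a rough illustration of the simplest idea behind aggregating segmentations, the sketch below keeps a pixel when a strict majority of performers marked it. This is an assumption-laden toy version: the function name segmentation_majority_vote, the nested-list mask format, and the strict-majority threshold are chosen for this example and are not taken from Crowd-Kit's Segmentation MV, which may differ (for instance by weighting performers).

```python
def segmentation_majority_vote(masks):
    """Per-pixel majority vote over binary masks.

    masks: list of equally sized 2D lists containing 0 or 1, one per performer.
    Returns a 2D list where a pixel is 1 if more than half of the masks mark it.
    """
    n = len(masks)
    height, width = len(masks[0]), len(masks[0][0])
    return [
        [int(2 * sum(mask[r][c] for mask in masks) > n) for c in range(width)]
        for r in range(height)
    ]


# Hypothetical 2x2 masks from three performers.
masks = [
    [[1, 1], [0, 0]],
    [[1, 0], [0, 0]],
    [[1, 1], [0, 1]],
]
aggregated_mask = segmentation_majority_vote(masks)
```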
Pairwise Comparisons
| Method | Status |
|---|---|
| Bradley-Terry | ✅ |
| Noisy Bradley-Terry | ✅ |
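For readers unfamiliar with the Bradley-Terry model, the sketch below fits item scores from (winner, loser) pairs using the classical minorization-maximization update. It is a minimal illustration, not Crowd-Kit's implementation: the function name bradley_terry, the input format, and the toy comparisons are assumptions of this example. It also assumes that every item wins at least once and that no item is compared with itself.

```python
from collections import defaultdict


def bradley_terry(comparisons, n_iter=100):
    """Fit Bradley-Terry scores. comparisons: iterable of (winner, loser).

    Returns a dict mapping each item to a score; the scores sum to 1.
    """
    wins = defaultdict(int)
    pair_counts = defaultdict(int)
    items = set()
    for winner, loser in comparisons:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1
        items.update((winner, loser))

    scores = {item: 1 / len(items) for item in items}
    for _ in range(n_iter):
        updated = {}
        for i in items:
            # Sum over all opponents j of n_ij / (score_i + score_j).
            denom = sum(
                count / (scores[i] + scores[j])
                for pair, count in pair_counts.items()
                if i in pair
                for j in pair - {i}
            )
            updated[i] = wins[i] / denom
        total = sum(updated.values())
        scores = {item: s / total for item, s in updated.items()}
    return scores


# Hypothetical comparisons: a usually beats b and c, b usually beats c.
comparisons = (
    [("a", "b")] * 3 + [("b", "a")]
    + [("b", "c")] * 3 + [("c", "b")]
    + [("a", "c")] * 3 + [("c", "a")]
)
scores = bradley_terry(comparisons)
```

Sorting the items by score then gives the aggregated ranking, here a, b, c.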
Questions and Bug Reports
- To report bugs, please use the Toloka/bugreport page.
- Join our English-speaking Slack community for both technical and abstract questions.
License
© YANDEX LLC, 2020-2021. Licensed under the Apache License, Version 2.0. See LICENSE file for more details.
Download files
File details
Details for the file crowd-kit-0.0.8.tar.gz.
File metadata
- Download URL: crowd-kit-0.0.8.tar.gz
- Upload date:
- Size: 37.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0ae15ccf4eeecfbc5924ea40a8606a44f9cc35fd9cf4fb60d649b3d720c933ca |
| MD5 | 5706d54aa51b71404a065c6731c88a0c |
| BLAKE2b-256 | c3e9981c10a26066f8be4fc108c83812313d844aafb620e8f48d9b9c432d5be2 |
File details
Details for the file crowd_kit-0.0.8-py3-none-any.whl.
File metadata
- Download URL: crowd_kit-0.0.8-py3-none-any.whl
- Upload date:
- Size: 63.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.8.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 27a26e7d7c0d32fde0ea2866da8bd0ffe52b0c22edd2899190511851535830cc |
| MD5 | 0d94ebb7fdc85b21c91d6109e5635461 |
| BLAKE2b-256 | 61928df6fe09aec28174694b9334721a8897bb4aeecc05a9237a696f947b104b |