Library to help analyze crowdsourcing results
crowdnalysis
Crowdsourcing Citizen Science projects usually require citizens to classify items (images, PDFs, songs, etc.) into one of a finite set of categories. Once an item has been classified by several citizens, their votes need to be aggregated to obtain a consensus classification. Usually this is done by selecting the most voted category. crowdnalysis allows Crowdsourcing Citizen Science projects to compute a consensus that goes beyond the selection of the most voted category by computing a model of quality for each of the citizen scientists involved in the project. This more advanced consensus results in higher-quality information for the Crowdsourcing Citizen Science project.
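As a baseline, the "most voted category" rule can be sketched in a few lines of standard-library Python. The function name `majority_vote` and the input layout (a dict from item id to the list of labels it received) are hypothetical and chosen for illustration only; this is not crowdnalysis's API.

```python
from collections import Counter


def majority_vote(votes):
    """Return the most voted category for each item.

    votes: dict mapping an item id to the list of labels given by citizens.
    Ties go to the label seen first in that item's list, because
    Counter.most_common keeps first-encountered order among equal counts.
    """
    return {item: Counter(labels).most_common(1)[0][0] for item, labels in votes.items()}


votes = {
    "img-1": ["cat", "dog", "cat"],
    "img-2": ["dog", "dog", "bird"],
}
print(majority_vote(votes))  # {'img-1': 'cat', 'img-2': 'dog'}
```

Note that every vote counts the same here, which is exactly the limitation that per-annotator quality models address.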
Implemented consensus algorithms
- Majority Voting
- Probabilistic
- Multinomial
- Dawid-Skene
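To show what "a model of quality for each citizen" means in practice, here is a minimal sketch of the classic expectation-maximization (EM) procedure for the Dawid-Skene model, written in plain Python. It is an illustrative textbook version under stated assumptions (integer ids starting at 0, annotations as `(task, worker, label)` triples, a small additive `smoothing` term that is our own choice), not crowdnalysis's implementation; the function name `dawid_skene` is hypothetical.

```python
import math


def dawid_skene(annotations, n_tasks, n_workers, n_classes, n_iter=50, smoothing=0.01):
    """Minimal EM for the Dawid-Skene model (illustrative sketch).

    annotations: list of (task, worker, label) with integer ids starting at 0.
    Returns (posteriors, priors, rates) where rates[w][k][l] estimates the
    probability that worker w answers l when the real class is k.
    """
    if n_iter < 1:
        raise ValueError("n_iter must be at least 1")

    # Initialise task posteriors with normalised vote counts (soft majority voting).
    post = [[0.0] * n_classes for _ in range(n_tasks)]
    for t, _, l in annotations:
        post[t][l] += 1.0
    for row in post:
        s = sum(row)
        for k in range(n_classes):
            row[k] = row[k] / s if s else 1.0 / n_classes

    for _ in range(n_iter):
        # M-step: class priors and one error-rate (confusion) matrix per worker.
        priors = [sum(post[t][k] for t in range(n_tasks)) / n_tasks for k in range(n_classes)]
        counts = [[[smoothing] * n_classes for _ in range(n_classes)] for _ in range(n_workers)]
        for t, w, l in annotations:
            for k in range(n_classes):
                counts[w][k][l] += post[t][k]
        rates = [[[c / sum(row) for c in row] for row in cm] for cm in counts]

        # E-step: posterior over each task's real class, computed in log space.
        log_post = [[math.log(priors[k] + 1e-12) for k in range(n_classes)] for _ in range(n_tasks)]
        for t, w, l in annotations:
            for k in range(n_classes):
                log_post[t][k] += math.log(rates[w][k][l])
        for t in range(n_tasks):
            m = max(log_post[t])
            exps = [math.exp(v - m) for v in log_post[t]]
            z = sum(exps)
            post[t] = [e / z for e in exps]
    return post, priors, rates


# Three tasks, three workers, two classes; worker 2 always gives the opposite label.
ann = [
    (0, 0, 0), (0, 1, 0), (0, 2, 1),
    (1, 0, 1), (1, 1, 1), (1, 2, 0),
    (2, 0, 0), (2, 1, 0), (2, 2, 1),
]
post, priors, rates = dawid_skene(ann, n_tasks=3, n_workers=3, n_classes=2)
consensus = [max(range(2), key=lambda k: p[k]) for p in post]
print(consensus)  # [0, 1, 0]
```

Unlike majority voting, the fitted `rates[2]` captures that worker 2 systematically swaps the two labels, so that worker's votes still carry information instead of merely adding noise.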
In addition to the pure Python implementations above, the following models are implemented in the probabilistic programming language Stan by using the CmdStanPy interface:
- Multinomial
- Multinomial Eta
- Dawid-Skene
- Dawid-Skene Eta Hierarchical
~ Eta models impose that, in the error-rate (a.k.a. confusion) matrix, the probability of the label that matches the real class is higher than the probability of any other label.
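The property that Eta models impose can be illustrated with a small check on a row-stochastic error-rate matrix, where row `k` is the real class and column `l` the given label. This sketch assumes our reading of the constraint (the diagonal entry of every row is strictly the largest); it only tests a given matrix and does not reproduce how the Stan models parameterize it. The name `labels_favour_real_class` is hypothetical.

```python
def labels_favour_real_class(error_rates):
    """Return True if, in every row (real class k), the entry for label k
    is strictly larger than the entry for every other label."""
    return all(
        row[k] > max((p for j, p in enumerate(row) if j != k), default=float("-inf"))
        for k, row in enumerate(error_rates)
    )


good = [[0.8, 0.1, 0.1],
        [0.2, 0.7, 0.1],
        [0.1, 0.3, 0.6]]
bad = [[0.4, 0.5, 0.1],   # real class 0 is more often labelled as class 1
       [0.2, 0.7, 0.1],
       [0.1, 0.3, 0.6]]
print(labels_favour_real_class(good))  # True
print(labels_favour_real_class(bad))   # False
```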
Features
- Import annotation data from a CSV file with a preprocessing option
- Calculate inter-rater reliability with different measures
- Fit selected model to annotation data and compute the consensus
- Compute the consensus with a fixed pre-determined set of parameters
- Fit the model parameters provided that the consensus is already known
- Given the parameters of a generative model (Multinomial, Dawid-Skene), sample annotations, tasks, and workers (i.e., annotators)
- Visualize the error-rate matrix for annotators
- Conduct predictive analysis of the accuracy vs. number of annotations for a given set of models
- Visualize the consensus on annotated images in HTML format
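Sampling annotations from a generative model can be sketched with the standard `random` module. This is a hypothetical helper, not crowdnalysis's sampler: it assumes a fixed pool of workers, each with its own Dawid-Skene error-rate matrix, and assigns `annotations_per_task` distinct workers to every task uniformly at random. Passing the same matrix for every worker gives the Multinomial-style case where all annotators share one error-rate matrix.

```python
import random


def sample_dawid_skene(priors, error_rates, n_tasks, annotations_per_task, seed=None):
    """Sample real classes and (task, worker, label) annotations.

    priors: class prior probabilities, length K.
    error_rates: one K x K matrix per worker; error_rates[w][k][l] = P(label l | real class k).
    """
    rng = random.Random(seed)
    classes = range(len(priors))
    # Draw the hidden real class of every task from the class priors.
    real = [rng.choices(classes, weights=priors)[0] for _ in range(n_tasks)]
    annotations = []
    for t, k in enumerate(real):
        # Pick distinct workers for this task, then draw each one's label from its row k.
        for w in rng.sample(range(len(error_rates)), annotations_per_task):
            label = rng.choices(classes, weights=error_rates[w][k])[0]
            annotations.append((t, w, label))
    return real, annotations


perfect = [[1.0, 0.0], [0.0, 1.0]]
real, ann = sample_dawid_skene([0.7, 0.3], [perfect] * 3, n_tasks=5,
                               annotations_per_task=2, seed=0)
# With perfect workers, every label equals the real class of its task.
print(all(label == real[t] for t, _, label in ann))  # True
```

Sampled data of this kind is handy for checking that a consensus method recovers the known real classes.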
Quick start
crowdnalysis is distributed via PyPI: https://pypi.org/project/crowdnalysis/
Install as a standard Python package:
$ pip install crowdnalysis
CmdStanPy will be installed as a dependency; however, CmdStanPy also requires the CmdStan command-line interface to be installed. This can be done by running the install_cmdstan utility that comes with CmdStanPy. See the CmdStanPy docs for more information.
$ install_cmdstan
Import the package in your code:
import crowdnalysis
Check available consensus models:
print(crowdnalysis.factory.Factory.list_registered_algorithms())
How to run unit tests
We use pytest as the testing framework. Tests can be run by:
$ pytest
To see the execution logs, run:
$ pytest --log-cli-level 0
Logging
We use the standard logging library according to the rules here.
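Because the standard library is used, crowdnalysis's log output can presumably be surfaced from your own script with the usual `logging` configuration. The sketch below assumes that the package's loggers follow the common `logging.getLogger(__name__)` convention, so that they are children of a logger named `"crowdnalysis"`; check the source if the names differ.

```python
import logging

# Send log records to stderr and let DEBUG and above through the root handler.
logging.basicConfig(level=logging.DEBUG)

# Assumption: module loggers are named after their module path (e.g. "crowdnalysis.factory"),
# so setting the level on the "crowdnalysis" parent also applies to them.
logging.getLogger("crowdnalysis").setLevel(logging.DEBUG)
```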
License
This project is licensed under the GNU Affero General Public License v3.0 - see the LICENSE file for details.
Acknowledgements
crowdnalysis is being developed within the Crowd4SDG project funded by the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 872944.
Reference
For the details of the conceptual and mathematical model of crowdnalysis, see:
[1] Cerquides, J.; Mülâyim, M.O.; Hernández-González, J.; Ravi Shankar, A.; Fernandez-Marquez, J.L. A Conceptual Probabilistic Framework for Annotation Aggregation of Citizen Science Data. Mathematics 2021, 9, 875, https://doi.org/10.3390/math9080875
Hashes for crowdnalysis-0.1.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | c1085876e1aae20e174290af6ccd19bb12640065ebab12f87642bd544f1eb45c
MD5 | 23af03780be0b09927f2cbdac8906900
BLAKE2b-256 | 52a5f276bbacfd1a7c1a2152a5d46d2c28e772fc3ae7fcd1ef4e6eaf22d20727