Seamlessly build the MuMiN dataset.

MuMiN-Build

This repository contains the package used to build the MuMiN dataset, introduced in the paper by Nielsen and McConville, "MuMiN: A Large-Scale Multilingual Multimodal Fact-Checked Misinformation Social Network Dataset" (2021).

See the MuMiN website for more information, including a leaderboard of the top-performing models.


Installation

The mumin package can be installed using pip:

$ pip install mumin

Building the dataset requires downloading Twitter data, which in turn requires a Twitter API key. You can get one for free here. Of the credentials you receive, you will need the Bearer Token.
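To keep the Bearer Token out of your source code, one option is to read it from an environment variable. The following is a minimal sketch of that idea; the helper load_bearer_token and the variable name TWITTER_BEARER_TOKEN are assumptions made for this example and are not part of the mumin package.

```python
import os


def load_bearer_token(env=os.environ, var_name="TWITTER_BEARER_TOKEN"):
    """Return the bearer token stored in `env`, or raise a clear error.

    Both this helper and the default `var_name` are hypothetical names
    chosen for this sketch; mumin itself does not define them.
    """
    token = env.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return token


# Usage (requires the mumin package and a real token):
# from mumin import MuminDataset
# dataset = MuminDataset(twitter_bearer_token=load_bearer_token())
```

Passing the environment in as an argument makes the helper easy to check without touching your real shell environment.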

Quickstart

The main class of the package is the MuminDataset class:

>>> from mumin import MuminDataset
>>> dataset = MuminDataset(twitter_bearer_token='XXXXX')
>>> dataset
MuminDataset(size='small', compiled=False)

By default, this loads the small version of the dataset. To load a different version, set the size argument of MuminDataset to one of 'small', 'medium' or 'large'. Before the dataset can be used, it needs to be compiled: this downloads the dataset, rehydrates the tweets and users, and downloads all the associated news articles, images and videos. This usually takes a while.

>>> dataset.compile()
MuminDataset(num_nodes=388,149, num_relations=475,490, size='small', compiled=True)

Note that the compiled dataset does not contain all the nodes and relations in MuMiN-small, as that would take considerably longer to compile. The data left out are timelines, profile pictures and article images. These can be included by specifying include_extra_images=True and/or include_timelines=True in the constructor of MuminDataset.
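The constructor options mentioned so far (size, include_extra_images and include_timelines) can be collected and checked before a long compilation starts. The sketch below is one way to do that; build_dataset_kwargs is a hypothetical helper, and the False defaults for the two include flags are inferred from the text above rather than taken from the package source.

```python
VALID_SIZES = ("small", "medium", "large")


def build_dataset_kwargs(token, size="small", include_extra_images=False,
                         include_timelines=False):
    """Collect MuminDataset constructor arguments, rejecting unknown sizes.

    Hypothetical helper for this sketch; the defaults of the two include
    flags are an assumption based on the README text.
    """
    if size not in VALID_SIZES:
        raise ValueError(f"size must be one of {VALID_SIZES}, got {size!r}")
    return {
        "twitter_bearer_token": token,
        "size": size,
        "include_extra_images": include_extra_images,
        "include_timelines": include_timelines,
    }


# Usage (requires the mumin package and a real token):
# from mumin import MuminDataset
# kwargs = build_dataset_kwargs("XXXXX", size="medium", include_timelines=True)
# dataset = MuminDataset(**kwargs)
```

Failing early on a mistyped size avoids discovering the mistake only after the download has begun.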

After compilation, the dataset can also be found in the mumin-<size>.zip file. This file name can be changed using the dataset_path argument when initialising the MuminDataset class. If you need embeddings of the nodes, for instance for use in machine learning models, then you can simply call the add_embeddings method:

>>> dataset.add_embeddings()
MuminDataset(num_nodes=388,149, num_relations=475,490, size='small', compiled=True)

Note: If you need to use the add_embeddings method, you need to install the mumin package as either pip install mumin[embeddings] or pip install mumin[all], which will install the transformers and torch libraries. This is to ensure that such large libraries are only downloaded if needed.

It is possible to export the dataset to the Deep Graph Library, using the to_dgl method:

>>> dgl_graph = dataset.to_dgl()
>>> type(dgl_graph)
dgl.heterograph.DGLHeteroGraph

Note: If you need to use the to_dgl method, you need to install the mumin package as pip install mumin[dgl] or pip install mumin[all], which will install the dgl and torch libraries.
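Since add_embeddings and to_dgl depend on optional libraries, it can help to check that those libraries are importable before calling either method. The sketch below uses the standard library only; the REQUIRED_MODULES mapping and the find_missing helper are assumptions for this example, built from the two notes above.

```python
from importlib.util import find_spec

# Libraries each optional method relies on, as listed in the notes above.
REQUIRED_MODULES = {
    "add_embeddings": ("transformers", "torch"),
    "to_dgl": ("dgl", "torch"),
}


def find_missing(module_names):
    """Return the top-level modules in `module_names` that cannot be found."""
    return [name for name in module_names if find_spec(name) is None]


for method, modules in REQUIRED_MODULES.items():
    missing = find_missing(modules)
    if missing:
        print(f"{method} needs: {', '.join(missing)}")
```

If anything is reported as missing, reinstalling with the matching extra (mumin[embeddings], mumin[dgl] or mumin[all]) should provide it.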

For a more in-depth tutorial of how to work with the dataset, including training multiple different misinformation classifiers, see the tutorial.

Dataset Statistics

Dataset       #Claims  #Threads  #Tweets     #Users     #Articles  #Images  #Languages  %Misinfo
MuMiN-large   12,914   26,048    21,565,018  1,986,354  10,920     6,573    41          94.57%
MuMiN-medium  5,565    10,832    12,650,371  1,150,259  4,212      2,510    37          94.07%
MuMiN-small   2,183    4,344     7,202,506   639,559    1,497      1,036    35          92.87%
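To get a feel for how the three versions compare, the table can be turned into per-claim ratios. The snippet below is a small illustration using only the claim, thread and tweet counts copied from the table; the per_claim_ratios helper is a name made up for this example.

```python
# Rows copied from the statistics table above: (claims, threads, tweets).
STATS = {
    "MuMiN-large": (12_914, 26_048, 21_565_018),
    "MuMiN-medium": (5_565, 10_832, 12_650_371),
    "MuMiN-small": (2_183, 4_344, 7_202_506),
}


def per_claim_ratios(claims, threads, tweets):
    """Return (threads per claim, tweets per claim) for one table row."""
    return threads / claims, tweets / claims


for name, row in STATS.items():
    threads_per_claim, tweets_per_claim = per_claim_ratios(*row)
    print(f"{name}: {threads_per_claim:.2f} threads/claim, "
          f"{tweets_per_claim:.0f} tweets/claim")
```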

Related Repositories

  • MuMiN website, the central place for the MuMiN ecosystem, containing tutorials, leaderboards and links to the paper and related repositories.
  • MuMiN, containing the paper in PDF and LaTeX form.
  • MuMiN-trawl, containing the source code used to construct the dataset from scratch.
  • MuMiN-baseline, containing the source code for the baselines.

