MuMiN-Build

This repository contains the package used to build the MuMiN dataset, introduced in the paper by Nielsen and McConville: MuMiN: A Large-Scale Multilingual Multimodal Fact-Checked Misinformation Dataset with Linked Social Network Posts (2021).

Installation

The mumin package can be installed using pip:

$ pip install mumin

Building the dataset requires downloading Twitter data, which in turn requires a Twitter API key. A key can be obtained for free. Of the credentials you receive, you will need the Bearer Token.
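One way to keep the Bearer Token out of source files is to read it from an environment variable. The sketch below assumes a variable named TWITTER_BEARER_TOKEN; that name and the load_bearer_token helper are illustrative choices, not something the mumin package defines.

```python
import os


def load_bearer_token(env_var="TWITTER_BEARER_TOKEN"):
    """Return the Bearer Token stored in an environment variable.

    The variable name is a hypothetical convention for this example;
    the mumin package itself does not read it.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Twitter Bearer Token."
        )
    return token
```

The returned string can then be passed as the twitter_bearer_token argument of MuminDataset.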

Quickstart

The main class of the package is the MuminDataset class:

>>> from mumin import MuminDataset
>>> dataset = MuminDataset(twitter_bearer_token='XXXXX')
>>> dataset
MuminDataset(size='small', compiled=False)

By default, this loads the small version of the dataset. This can be changed by setting the size argument of MuminDataset to one of 'small', 'medium' or 'large'. To begin using the dataset, it first needs to be compiled. This will download the dataset, rehydrate the tweets and users, and download all the associated news articles, images and videos. This usually takes a while.

>>> dataset.compile()
MuminDataset(num_nodes=XXXXX, num_relations=XXXXX, size='small', compiled=True)
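The two steps above can be combined in a small wrapper that validates the size before any download starts. This is only a sketch: VALID_SIZES, check_size and build_and_compile are hypothetical names introduced here, and only the twitter_bearer_token and size arguments and the compile method are taken from the description above. The import of mumin is deferred so that the validation logic can be used without the package installed.

```python
VALID_SIZES = ("small", "medium", "large")  # the three sizes named above


def check_size(size):
    """Return the size unchanged, or raise ValueError if it is not recognised."""
    if size not in VALID_SIZES:
        raise ValueError(f"size must be one of {VALID_SIZES}, got {size!r}")
    return size


def build_and_compile(bearer_token, size="small"):
    """Create a MuminDataset of the given size and compile it."""
    size = check_size(size)  # fail fast, before importing or downloading anything

    from mumin import MuminDataset  # deferred import; requires `pip install mumin`

    dataset = MuminDataset(twitter_bearer_token=bearer_token, size=size)
    dataset.compile()  # downloads and rehydrates the data; may take a long time
    return dataset
```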

After compilation, the dataset can also be found in the mumin-<size>.zip file. This file name can be changed using the dataset_path argument when initialising the MuminDataset class.

If you need embeddings of the nodes, for instance for use in machine learning models, then you can simply call the add_embeddings method:

>>> dataset.add_embeddings()
MuminDataset(num_nodes=XXXXX, num_relations=XXXXX, size='small', compiled=True)

Note: If you need to use the add_embeddings method, you need to install the mumin package as either pip install mumin[embeddings] or pip install mumin[all], which will install the transformers and torch libraries. This is to ensure that such large libraries are only downloaded if needed.
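Before calling add_embeddings, it can be useful to confirm that the optional dependencies are actually importable. The helper below is a sketch using only the standard library; missing_packages and EMBEDDING_PACKAGES are hypothetical names, and the package list is taken from the note above.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if find_spec(name) is None]


# Packages that the note above says the embeddings extra installs.
EMBEDDING_PACKAGES = ("transformers", "torch")

if __name__ == "__main__":
    missing = missing_packages(EMBEDDING_PACKAGES)
    if missing:
        print("Missing:", ", ".join(missing), "- try: pip install mumin[embeddings]")
    else:
        print("All embedding dependencies are available.")
```

The same helper can be reused for other extras by passing a different list of package names.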

It is possible to export the dataset to the Deep Graph Library, using the to_dgl method:

>>> dgl_graph = dataset.to_dgl()
>>> type(dgl_graph)
dgl.heterograph.DGLHeteroGraph
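A DGL heterograph exposes its node and edge types through the ntypes and canonical_etypes attributes, with num_nodes and num_edges giving per-type counts. The sketch below collects these into plain dictionaries. summarise_heterograph is a hypothetical helper, and since the node and edge types of the MuMiN graph are not listed here, none are assumed.

```python
def summarise_heterograph(graph):
    """Count nodes per node type and edges per canonical edge type.

    Works with any object offering the DGL heterograph interface used here:
    `ntypes`, `canonical_etypes`, `num_nodes(ntype)` and `num_edges(etype)`.
    """
    node_counts = {ntype: graph.num_nodes(ntype) for ntype in graph.ntypes}
    edge_counts = {etype: graph.num_edges(etype) for etype in graph.canonical_etypes}
    return node_counts, edge_counts
```

Calling summarise_heterograph(dgl_graph) on the exported graph returns two dictionaries that can be printed or logged to check that the export contains what you expect.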

Note: If you need to use the to_dgl method, you need to install the mumin package as pip install mumin[dgl] or pip install mumin[all], which will install the dgl and torch libraries.
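The compiled mumin-<size>.zip file described earlier can also be inspected on disk without loading the dataset. The sketch below assumes it is a standard zip archive, as the extension suggests, and makes no assumption about which files it contains. default_archive_name and list_archive are hypothetical helpers.

```python
import zipfile
from pathlib import Path


def default_archive_name(size="small"):
    """File name the text above gives for a compiled dataset of this size."""
    return f"mumin-{size}.zip"


def list_archive(path):
    """Return (member name, uncompressed size in bytes) for each archive entry."""
    path = Path(path)
    if not path.exists():
        raise FileNotFoundError(f"{path} not found; has the dataset been compiled?")
    with zipfile.ZipFile(path) as archive:
        return [(info.filename, info.file_size) for info in archive.infolist()]
```

If a custom dataset_path was used when creating the MuminDataset, pass that path to list_archive instead of the default name.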

Dataset Statistics

Size    #Claims  #Threads  #Replies   #Retweets  #Users     #Languages  %Misinfo
Large   12,347   24,773    1,024,070  695,924    4,306,272  41          94.57%
Medium  5,265    10,195    480,249    305,300    2,004,300  37          94.07%
Small   2,089    4,126     220,862    132,561    916,697    35          92.87%
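To get a feel for the scale of each version, the counts in the table can be turned into per-claim and per-thread averages. The figures below are copied from the table; the derived ratios are simple arithmetic for illustration, not statistics reported by the authors.

```python
# (claims, threads, replies) per dataset size, copied from the table above.
STATS = {
    "large": (12_347, 24_773, 1_024_070),
    "medium": (5_265, 10_195, 480_249),
    "small": (2_089, 4_126, 220_862),
}


def averages(size):
    """Return (threads per claim, replies per thread) for one dataset size."""
    claims, threads, replies = STATS[size]
    return threads / claims, replies / threads


for size in STATS:
    per_claim, per_thread = averages(size)
    print(f"{size:<6} {per_claim:.2f} threads/claim, {per_thread:.1f} replies/thread")
```

In all three versions each claim is linked to roughly two threads on average.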

Related Repositories

  • MuMiN, containing the paper in PDF and LaTeX form.
  • MuMiN-trawl, containing the source code used to construct the dataset from scratch.
