MuMiN-Build

This repository contains the package used to build the MuMiN dataset, introduced in the paper by Nielsen and McConville: MuMiN: A Large-Scale Multilingual Multimodal Fact-Checked Misinformation Dataset with Linked Social Network Posts (2021).

The paper is currently under review at the NeurIPS 2021 Datasets and Benchmarks Track (Round 2). The dataset must not be used until this warning is removed, as it is subject to change, for example during the review period.

Installation

The mumin package can be installed from source:

$ pip install git+https://github.com/CLARITI-REPHRAIN/mumin-build

To build the dataset, Twitter data needs to be downloaded, which requires a Twitter API key. A key can be obtained for free, and the part you will need is the Bearer Token.
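One way to keep the Bearer Token out of source code is to read it from an environment variable. The sketch below is only an illustration: the variable name TWITTER_BEARER_TOKEN and the helper load_bearer_token are assumptions made here and are not part of the mumin package.

```python
import os


def load_bearer_token(env_var: str = "TWITTER_BEARER_TOKEN") -> str:
    """Return the Twitter Bearer Token stored in an environment variable.

    The variable name is an assumption made for this sketch; the mumin
    package itself does not define it.
    """
    # Treat a missing variable and a blank one the same way.
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Bearer Token."
        )
    return token
```

The returned string can then be passed as the twitter_bearer_token argument of the MuminDataset class.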

Quickstart

The main class of the package is the MuminDataset class:

>>> from mumin import MuminDataset
>>> dataset = MuminDataset(twitter_bearer_token='XXXXX')  # replace 'XXXXX' with your Bearer Token
>>> dataset
MuminDataset(size='large', compiled=False)

By default, this loads the large version of the dataset. This can be changed by setting the size argument to one of 'small', 'medium' or 'large'. To begin using the dataset, it first needs to be compiled. This will download the dataset, rehydrate the tweets and users, and download all the associated news articles, images and videos. This usually takes a while.

>>> dataset.compile()
MuminDataset(num_nodes=9,535,121, num_relations=15,232,212, size='large', compiled=True)

After compilation, the dataset can also be found in the ./mumin folder as separate CSV files. This path can be changed using the dataset_dir argument when initialising the MuminDataset class. If you need embeddings of the nodes, for instance for use in machine learning models, you can call the add_embeddings method:

>>> dataset.add_embeddings()
MuminDataset(num_nodes=9,535,121, num_relations=15,232,212, size='large', compiled=True)

Note: If you need to use the add_embeddings method, you need to install the mumin package as either pip install mumin[embeddings] or pip install mumin[all], which will install the transformers and torch libraries. This is to ensure that such large libraries are only downloaded if needed.
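The CSV files written to the dataset folder during compilation can be inspected without the mumin package. The following is a minimal sketch that uses only the Python standard library; the helper name csv_columns is hypothetical, and the sketch assumes that the first row of each file holds the column names, which the text above does not confirm.

```python
import csv
from pathlib import Path


def csv_columns(dataset_dir: str = "./mumin") -> dict:
    """Map each CSV file name in dataset_dir to the fields of its first row.

    Assumption: the first row of every file is a header row.
    """
    columns = {}
    for path in sorted(Path(dataset_dir).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            # Read only the first row, so large files are not loaded fully.
            columns[path.name] = next(csv.reader(handle), [])
    return columns
```

If a different dataset_dir was passed to MuminDataset, the same path should be passed to this helper.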

It is possible to export the dataset to the Deep Graph Library, using the to_dgl method:

>>> dgl_graph = dataset.to_dgl()
>>> type(dgl_graph)
dgl.heterograph.DGLHeteroGraph

Note: If you need to use the to_dgl method, you need to install the mumin package as pip install mumin[dgl] or pip install mumin[all], which will install the dgl and torch libraries.
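Because add_embeddings and to_dgl depend on optional libraries, it can be useful to check that those libraries are importable before starting a long compilation. The sketch below uses the standard library function importlib.util.find_spec; the mapping OPTIONAL_LIBRARIES and the helper missing_libraries are illustrative names, built only from the library names given in the two notes above.

```python
from importlib.util import find_spec

# Libraries named in the notes above for each optional feature.
OPTIONAL_LIBRARIES = {
    "add_embeddings": ("transformers", "torch"),
    "to_dgl": ("dgl", "torch"),
}


def missing_libraries(feature: str) -> list:
    """Return the libraries needed by `feature` that cannot be imported."""
    return [
        name
        for name in OPTIONAL_LIBRARIES[feature]
        if find_spec(name) is None  # None means no installed module was found
    ]
```

An empty list means the libraries for that feature were found; otherwise, reinstalling mumin with the matching extra, as described in the notes, should provide them.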

Dataset Statistics

Size    #Claims  #Threads  #Replies  #Retweets  #Users     #Languages  %Misinfo
Large   12,242   23,856    798,259   2,251,263  5,525,194  41          94.81%
Medium  5,244    9,863     427,472   1,299,096  2,894,456  37          94.34%
Small   2,079    4,018     258,455   811,078    1,611,344  35          93.20%

Related Repositories

  • MuMiN, containing the paper in PDF and LaTeX form.
  • MuMiN-trawl, containing the source code used to construct the dataset from scratch.
