MuMiN-Build
This repository contains the package used to build the MuMiN dataset from the paper Nielsen and McConville: MuMiN: A Large-Scale Multilingual Multimodal Fact-Checked Misinformation Dataset with Linked Social Network Posts (2021).
This is currently under review at NeurIPS 2021 Datasets and Benchmarks Track (Round 2). This dataset must not be used until this warning is removed as the dataset is subject to change, for example, during the review period.
Installation
The mumin package can be installed using pip:
$ pip install mumin
Building the dataset requires downloading Twitter data, which in turn requires a Twitter API key. You can get one for free here. You will need the Bearer Token.
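To avoid hard-coding the Bearer Token in scripts or notebooks, it can be read from an environment variable. The following is a minimal sketch; the helper name read_bearer_token and the variable name TWITTER_BEARER_TOKEN are assumptions made for illustration and are not part of the mumin package.

```python
import os


def read_bearer_token(env_var: str = "TWITTER_BEARER_TOKEN") -> str:
    """Return the Twitter Bearer Token stored in an environment variable.

    Both this helper and the default variable name are hypothetical;
    the mumin package does not require either of them.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Twitter Bearer Token."
        )
    return token
```

The returned string can then be passed as the twitter_bearer_token argument of MuminDataset.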
Quickstart
The main class of the package is the MuminDataset class:
>>> from mumin import MuminDataset
>>> dataset = MuminDataset(twitter_bearer_token=XXXXX)
>>> dataset
MuminDataset(size='large', compiled=False)
By default, this loads the large version of the dataset. This can be changed by
setting the size argument to one of 'small', 'medium' or 'large'. Before the
dataset can be used, it needs to be compiled. Compilation downloads the dataset,
rehydrates the tweets and users, and downloads all the associated news articles,
images and videos. This usually takes a while.
>>> dataset.compile()
MuminDataset(num_nodes=XXXXX, num_relations=XXXXX, size='large', compiled=True)
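Since compilation takes a while, it may be convenient to start with a smaller version and to reject a mistyped size before any download begins. The sketch below wraps the two calls shown above; the function name build_dataset and the early validation are assumptions for illustration, and the package may perform its own checks.

```python
VALID_SIZES = ("small", "medium", "large")


def build_dataset(bearer_token, size="small"):
    """Create and compile a MuminDataset of the requested size.

    Hypothetical convenience wrapper: only the MuminDataset constructor
    arguments and the compile() call come from the README.
    """
    if size not in VALID_SIZES:
        raise ValueError(f"size must be one of {VALID_SIZES}, got {size!r}")

    # Imported lazily so that the size check runs before anything else.
    from mumin import MuminDataset

    dataset = MuminDataset(twitter_bearer_token=bearer_token, size=size)
    dataset.compile()  # downloads and rehydrates the data; may take a while
    return dataset
```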
After compilation, the dataset can also be found in the ./mumin folder as
separate CSV files. This path can be changed using the dataset_dir argument
when initialising the MuminDataset class.

If you need embeddings of the nodes, for instance for use in machine learning
models, then you can simply call the add_embeddings method:
>>> dataset.add_embeddings()
MuminDataset(num_nodes=XXXXX, num_relations=XXXXX, size='large', compiled=True)
Note: If you need to use the add_embeddings method, you need to install the
mumin package as either pip install mumin[embeddings] or
pip install mumin[all], which will install the transformers and torch
libraries. This is to ensure that such large libraries are only downloaded if
needed.
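Returning to the CSV files that compilation writes to the dataset directory: they can be inspected without the mumin package at all. The sketch below uses only the standard library and assumes that the files sit directly in the directory and that each has a header row; the function name summarise_csv_files is hypothetical, and the actual file names are not listed here.

```python
import csv
from pathlib import Path


def summarise_csv_files(dataset_dir="./mumin"):
    """Map each CSV file name in dataset_dir to its number of data rows."""
    summary = {}
    for path in sorted(Path(dataset_dir).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            reader = csv.reader(handle)
            next(reader, None)  # skip the header row (assumed to exist)
            summary[path.name] = sum(1 for _ in reader)
    return summary
```

If a different dataset_dir was passed when initialising MuminDataset, the same path should be passed here.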
It is possible to export the dataset to the Deep Graph Library, using the
to_dgl method:
>>> dgl_graph = dataset.to_dgl()
>>> type(dgl_graph)
dgl.heterograph.DGLHeteroGraph
Note: If you need to use the to_dgl method, you need to install the mumin
package as pip install mumin[dgl] or pip install mumin[all], which will
install the dgl and torch libraries.
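Before calling add_embeddings or to_dgl, it can help to check whether the optional libraries named in the two notes are importable. The following sketch uses importlib from the standard library; the mapping and the function name missing_modules are assumptions for illustration, built only from the library names mentioned in the notes.

```python
from importlib.util import find_spec

# Libraries that each extra installs, according to the notes above.
EXTRA_MODULES = {
    "embeddings": ("transformers", "torch"),
    "dgl": ("dgl", "torch"),
}


def missing_modules(extra):
    """Return the modules required by an extra that cannot be imported."""
    return [name for name in EXTRA_MODULES[extra] if find_spec(name) is None]
```

An empty list means the corresponding method should have what it needs; otherwise, reinstalling with the matching extra (or with mumin[all]) is the fix described above.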
Dataset Statistics
Size | #Claims | #Threads | #Replies | #Retweets | #Users | #Languages | %Misinfo |
---|---|---|---|---|---|---|---|
Large | 12,347 | 24,773 | 1,024,070 | 695,924 | 4,306,272 | 41 | 94.57% |
Medium | 5,265 | 10,195 | 480,249 | 305,300 | 2,004,300 | 37 | 94.07% |
Small | 2,089 | 4,126 | 220,862 | 132,561 | 916,697 | 35 | 92.87% |
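To get a feel for the scale of each version, the counts in the table can be turned into simple per-thread averages. This is a small sketch of that arithmetic; the numbers are copied from the table, while the dictionary layout and the function name per_thread are illustrative assumptions.

```python
# Counts copied from the Dataset Statistics table.
STATS = {
    "large": {"claims": 12_347, "threads": 24_773,
              "replies": 1_024_070, "retweets": 695_924},
    "medium": {"claims": 5_265, "threads": 10_195,
               "replies": 480_249, "retweets": 305_300},
    "small": {"claims": 2_089, "threads": 4_126,
              "replies": 220_862, "retweets": 132_561},
}


def per_thread(size, column):
    """Return the average count of `column` per thread for a dataset size."""
    row = STATS[size]
    return row[column] / row["threads"]


for size in STATS:
    print(size,
          round(per_thread(size, "replies"), 1),
          round(per_thread(size, "retweets"), 1))
```

These are plain ratios of the table columns and say nothing about how replies and retweets are distributed over individual threads.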
Related Repositories
- MuMiN, containing the paper in PDF and LaTeX form.
- MuMiN-trawl, containing the source code used to construct the dataset from scratch.