Calculate misinformation-exposure scores for users based on the falsity scores of public figures they follow on Twitter.

py_misinfo_exposure

A Python package that can be used to calculate misinformation-exposure scores for a user based on the falsity scores of public figures they follow on Twitter.

The falsity score is based on PolitiFact fact-checks of the public figures.

🚨 Notes 🚨:

  1. This package replicates Mohsen Mosleh's R package which does the same thing and is based on Mosleh and Rand's paper (2021).
  2. This package requires a Twitter developer account with access to Twitter's V2 API.

Installation

This package is available on PyPI, so it can be installed from the command line:

pip install py_misinfo_exposure

Quick start

from py_misinfo_exposure import PyMisinfoExposure

# Set your personal Twitter bearer token
bearer = "YOUR TWITTER BEARER TOKEN"

# Initialize the PyMisinfoExposure class with your bearer token
pme = PyMisinfoExposure(bearer_token=bearer)

# Under the hood, py_misinfo_exposure utilizes Tweepy to access Twitter data
# This function authorizes your access to Twitter with the earlier provided bearer token
pme.tweepy_bearer_authorization()

# Create a list of unique Twitter user IDs that you would like misinformation exposure scores for
user_test_list = ["1312850357555539972", "1260526934678740993"]

# Get misinformation exposure scores
misinfo_scores, missing_users = pme.get_misinfo_exposure_score(user_test_list)

# Where `misinfo_scores` is the below pandas.DataFrame

                  user  misinfo_score
0  1260526934678740993            NaN # NaN means this user does not follow any of the tracked political elites
1  1312850357555539972       0.675167

Note that pme.get_misinfo_exposure_score returns a tuple.

In the tuple above, misinfo_scores is a pandas DataFrame and missing_users is a set of users for whom no friends were found. This may happen, for example, if the account has been suspended or does not exist. If there are no missing users, missing_users is None.
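Because missing_users can be either a set or None, and a score can be NaN, downstream code usually needs to handle three groups of users. The sketch below is one possible way to do that, not part of the package: split_results is a hypothetical helper, and it works on plain records (for example, those produced by misinfo_scores.to_dict("records")) with the column names shown in the table above.

```python
import math

def split_results(records, missing_users):
    """Separate scored users, unscored users, and users with no friends data.

    `records` is a list of {"user": ..., "misinfo_score": ...} dicts.
    `missing_users` is a set of user IDs or None, as described above.
    This helper is hypothetical and not part of py_misinfo_exposure.
    """
    # Normalize None to an empty set so callers can always iterate over it.
    missing = set(missing_users) if missing_users is not None else set()
    scored = {}
    unscored = []
    for row in records:
        score = row["misinfo_score"]
        # NaN means the user follows none of the tracked public figures.
        if score is None or (isinstance(score, float) and math.isnan(score)):
            unscored.append(row["user"])
        else:
            scored[row["user"]] = score
    return scored, unscored, missing

# Records mirroring the example table above.
records = [
    {"user": "1260526934678740993", "misinfo_score": float("nan")},
    {"user": "1312850357555539972", "misinfo_score": 0.675167},
]
scored, unscored, missing = split_results(records, None)
```

Here scored holds only the user with a numeric score, unscored holds the user whose score is NaN, and missing is an empty set rather than None.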

Understanding the package and more control

How the package works

The package takes the list of user IDs that you provide and asks Twitter for each user's friends (the accounts that they follow). A user's misinformation-exposure score is then the mean "falsity" score of those friends who are present in the PolitiFact data.
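The calculation described above can be illustrated with a few lines of plain Python. This is a minimal sketch, not the package's own code: the account names and falsity values are made up, and it assumes a simple unweighted mean over the matched friends.

```python
from statistics import mean

# Hypothetical falsity scores keyed by account ID (values are made up).
falsity_scores = {"elite_a": 0.2, "elite_b": 0.8, "elite_c": 0.5}

def misinfo_exposure(friend_ids, falsity_scores):
    """Mean falsity score over the friends that appear in `falsity_scores`.

    Returns NaN when none of the friends are in the scored set.
    """
    matched = [falsity_scores[f] for f in friend_ids if f in falsity_scores]
    return mean(matched) if matched else float("nan")

# This user follows two scored accounts and one unscored account.
score = misinfo_exposure(["elite_a", "elite_b", "someone_else"], falsity_scores)

# This user follows no scored accounts, so the result is NaN.
no_match = misinfo_exposure(["someone_else"], falsity_scores)
```

Friends without a falsity score are ignored rather than counted as zero, which is why a user who follows none of the scored accounts ends up with NaN instead of 0.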

Rate limits

py_misinfo_exposure uses the tweepy package under the hood to gather Twitter data and, with the Twitter bearer token that you provide, initializes a tweepy client that will automatically wait the proper amount of time when Twitter rate limits have been hit.
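Conceptually, waiting "the proper amount of time" means sleeping until the current rate-limit window resets. The sketch below illustrates that idea only; it is not tweepy's implementation. It assumes the reset time is known as a Unix timestamp (Twitter reports one in the x-rate-limit-reset response header), and seconds_until_reset is a hypothetical helper.

```python
import time

def seconds_until_reset(reset_epoch, now=None):
    """Seconds to wait before the rate-limit window resets.

    `reset_epoch` is assumed to be a Unix timestamp for the reset time.
    `now` can be passed explicitly to make the function easy to test.
    """
    if now is None:
        now = time.time()
    # Never return a negative wait if the window has already reset.
    return max(0.0, reset_epoch - now)

# A caller would sleep for this long before retrying, for example:
# time.sleep(seconds_until_reset(reset_epoch))
```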

Calculating scores for a large list of users

The default way that py_misinfo_exposure works is to download all of the friends data from Twitter and hold it in your machine's working memory. This becomes problematic when calculating scores for a large list of users because your machine may crash from holding too much data at once.

To solve this problem you can simply set save_friends_to_disk=True when you initialize the PyMisinfoExposure class like so:

pme = PyMisinfoExposure(
    bearer_token=bearer,
    save_friends_to_disk=True   # <---------- Add this to save friends data to your machine
    )

Then, when you call pme.get_misinfo_exposure_score(users), friends data will be downloaded into a folder within your current working directory. By default, this folder is called py_misinfo_friend_data. You can change its name by setting the output_dir parameter when you initialize the PyMisinfoExposure class:

pme = PyMisinfoExposure(
    bearer_token=bearer,
    save_friends_to_disk=True,      # <---------- Add this to save friends data to your machine
    output_dir='myoutputdirectory'  # <---------- Add this to save friends data into the 'myoutputdirectory' folder
    )

Verbosity

If you would like misinformation exposure scores for a large set of users, it may take some time to retrieve all of the friends for all of the users you are interested in.

Note: How long it will take is explicitly determined by Twitter's API rate limits. For more information, you can see Twitter's API documentation for the endpoint utilized by py_misinfo_exposure.

TLDR: You can retrieve up to 15,000 friends every 15 minutes. In reality, the number of friends you can retrieve from Twitter in 15 minutes will likely be less because rate limits are based on the number of API calls made to Twitter and not the number of friends returned.
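To see why the call count matters more than the friend count, here is a rough estimator. It is a sketch under stated assumptions, not a guarantee: it assumes 15 requests per 15-minute window and up to 1,000 friends per request (which together give the 15,000 figure above), and that every user costs at least one request. The constants and the estimate_wait_minutes function are illustrative names, not part of the package.

```python
import math

REQUESTS_PER_WINDOW = 15    # assumption: requests allowed per window
FRIENDS_PER_REQUEST = 1000  # assumption: maximum friends returned per request
WINDOW_MINUTES = 15

def estimate_wait_minutes(friend_counts):
    """Return (API calls needed, minutes spent waiting on rate limits).

    `friend_counts` holds the number of friends for each user.
    """
    # Each user needs at least one call, plus one per extra page of friends.
    calls = sum(max(1, math.ceil(n / FRIENDS_PER_REQUEST)) for n in friend_counts)
    windows = math.ceil(calls / REQUESTS_PER_WINDOW)
    # The first window starts immediately; each later one adds a full wait.
    return calls, max(0, windows - 1) * WINDOW_MINUTES

# 100 users with 50 friends each: only 5,000 friends, but 100 API calls.
calls, wait = estimate_wait_minutes([50] * 100)
```

Under these assumptions, the 100 users above need 100 calls spread over 7 windows, so roughly 90 minutes are spent waiting, whereas a single user with 15,000 friends fits in one window.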

To print updates for a long-running script, you can utilize the other PyMisinfoExposure arguments: verbose and update_on.

For example, if you want the PyMisinfoExposure class to let you know every time another 500 users have been processed, you can initialize the class in the following way:

pme = PyMisinfoExposure(
    bearer_token=bearer,
    verbose=True,
    update_on=500 # default value = 100
    )

Example script

This repository also includes an example script called get_users_misinfo_exposure_scores.py. It takes a file containing one Twitter user ID per line and returns a CSV file containing those users' misinformation-exposure scores. I suggest first running the following from your command line...

python3 get_users_misinfo_exposure_scores.py -h

...which will display what the script does and all of the command line flags that are available.

For a quick start, it can be run in the following way...

python3 get_users_misinfo_exposure_scores.py --input_file py_misinfo_exposure/data/randomusers.txt --output_file 'my_output_filename' --bearer_token $TWITTER_BEARER_TOKEN

... where $TWITTER_BEARER_TOKEN should be replaced with your Twitter developer bearer token.

Note: The parameters set inside this script for PyMisinfoExposure will likely need to be updated for practical use. For example, the script prints an update after every 2 users, which gives quick feedback while testing but is too frequent for a large run.
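If your user IDs live in a Python list, the input file for the script (one ID per line) can be produced with a few lines of code. This is a minimal sketch: write_user_id_file is a hypothetical helper, and it assumes that IDs are purely numeric strings and that duplicates should be dropped.

```python
from pathlib import Path

def write_user_id_file(user_ids, path):
    """Write unique, numeric user IDs to `path`, one per line.

    Returns the list of IDs that were written, in first-seen order.
    """
    seen = set()
    unique_ids = []
    for uid in user_ids:
        uid = str(uid).strip()
        # Assumption: Twitter user IDs consist of digits only.
        if not uid.isdigit():
            raise ValueError(f"Not a numeric user ID: {uid!r}")
        if uid not in seen:
            seen.add(uid)
            unique_ids.append(uid)
    Path(path).write_text("\n".join(unique_ids) + "\n", encoding="utf-8")
    return unique_ids
```

Calling write_user_id_file(["1312850357555539972", "1260526934678740993"], "my_users.txt") would create a file that can be passed to the script via --input_file.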
