A Persian Twitter policy agenda tracking framework
A Framework for Tracking Legislators' Policy Agendas
This repository contains the implementation for the following paper:
Table of Contents
1) Tracking Legislators’ Expressed Policy Agendas in Real Time
TO-DO:
- Summarizing the paper
- Outlining the details of implementations
- Implement Word2Vec
- Training Word2Vec
- Seed words
- Classification heads
- Results & Analysis
- Tests & Coverage
- Documentation
- CI/CD
- Smooth Installation
A Brief Summary of the Paper
Introduction:
This work analyses the political orientation of legislators on salient policy issues through their temporally granular tweets. It uses a word embedding for feature extraction and a classifier to label every relevant tweet a legislator has posted, past and current, according to whether it expresses a particular issue position.
Main Problem:
Is it possible to accurately analyse the temporal evolution of political orientation on salient issues by applying natural language processing techniques to users' tweets? The issues of concern in this paper are immigration and climate change.
Illustrative Example:
Given a tweet about immigration policy, the authors first encode it using a Word2Vec-enhanced dictionary; a classifier then detects whether the tweet is exclusive or inclusive. These results can further be disaggregated by whether the tweet was posted by a Republican or a Democrat.
I/O:
- Input: Tweets (textual modality)
- Output: Predicted stance on the salient political issue
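As a rough illustration of the input side, the sketch below turns one tweet into a vector of dictionary hit counts that a classifier could consume. This is only one plausible reading of "encoding with a dictionary": the dictionary entries, the prefix matching used in place of real stemming, and the names `IMMIGRATION_DICTIONARY`, `tokenize`, and `encode` are all assumptions made for this example, not this package's API.

```python
import re

# Hypothetical expanded dictionary of stems for the immigration issue
# (illustrative values only, not taken from the paper or the package).
IMMIGRATION_DICTIONARY = ["border", "immigr", "asylum", "deport", "daca"]


def tokenize(text):
    """Lowercase a tweet and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def encode(tweet, dictionary):
    """Count, for each dictionary stem, the tokens that start with it.

    Prefix matching is a crude stand-in for proper stemming.
    """
    tokens = tokenize(tweet)
    return [sum(tok.startswith(stem) for tok in tokens) for stem in dictionary]


tweet = "We must secure the border and end illegal immigration now."
features = encode(tweet, IMMIGRATION_DICTIONARY)
print(features)
```

The resulting list has one entry per dictionary stem, so every tweet maps to a fixed-length feature vector regardless of its length.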
Motivation:
- Using tweets to track shifts in legislators’ rhetoric is highly scalable. It can be used on any topic of interest, by any political actor with a Twitter account, in any country around the world, from the past decade or into the future.
- Twitter data has high temporal granularity.
Related (Previous) Works:
Prior work is grouped into eight categories, one per channel of legislators' communication:
- Stump speeches: Fenno 1978
- Campaign mail: Golbeck, Grimes and Rogers 2010
- Television advertising: Lau, Sigelman and Rovner 2007
- Floor speeches: Martin and Vanberg 2008; Martin 2011; Quinn et al. 2010
- Press releases: Grimmer 2010; Grimmer, Westwood and Messing 2014; Klüver and Sagarzazu 2016
- Websites: Adler, Gent and Overmeyer 1998; Anstead and Chadwick 2008; Druckman, Kifer and Parkin 2009
- RSS feeds: Cormack 2013
- Social media posts: Gulati and Williams 2010; Barbera et al. 2018; Radford and Sinclair 2016; Shapiro et al. 2014; Lilleker and Koc-Michalska 2013
Contributions of this paper:
- It shows that a simple, transparent, and interpretable approach to tweet classification can achieve satisfactory levels of accuracy across diverse issues.
- It automates the process of updating and maintaining the model.
- It develops a dynamic, scalable, real-time method for tracking elected officials’ expressed policy positions through their tweets.
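Tracking expressed positions over time comes down to aggregating per-tweet labels by period and by group. The sketch below computes, for each month and party, the share of tweets labelled as expressing a position. The rows, the `(date, party, label)` layout, and the function name `monthly_share` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical classifier output: (ISO date, party, label), where label is 1
# when the tweet was classified as expressing the position of interest.
labelled = [
    ("2019-01-15", "R", 1),
    ("2019-01-20", "R", 0),
    ("2019-01-22", "D", 0),
    ("2019-02-03", "R", 1),
    ("2019-02-10", "D", 1),
    ("2019-02-11", "D", 0),
]


def monthly_share(rows):
    """Return {(month, party): share of tweets with label 1}."""
    counts = defaultdict(lambda: [0, 0])  # (month, party) -> [positives, total]
    for date, party, label in rows:
        key = (date[:7], party)  # "YYYY-MM" prefix of the ISO date
        counts[key][0] += label
        counts[key][1] += 1
    return {key: pos / total for key, (pos, total) in sorted(counts.items())}


shares = monthly_share(labelled)
for (month, party), share in shares.items():
    print(month, party, share)
```

Because the grouping key is just a tuple, swapping the month for a week or the party for an individual legislator only changes how `key` is built.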
Proposed Method:
Stage I: Feature extraction.
The authors encode the texts with a Word2Vec-enhanced dictionary. A set of stemmed seed words relevant to the concept of interest is identified first; word embeddings are then used to find other words in the data that are semantically related to these seed words.
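The dictionary expansion in Stage I can be sketched as a nearest-neighbour search in embedding space. The example below uses tiny made-up vectors and plain cosine similarity so that it runs without any dependencies; in practice the vectors would come from a Word2Vec model trained on the tweet corpus. The names `EMBEDDINGS`, `cosine`, and `expand_seeds` are assumptions for this sketch.

```python
import math

# Toy 3-dimensional embeddings, invented for illustration; a real run would
# use vectors from a Word2Vec model trained on the tweets.
EMBEDDINGS = {
    "immigr": [0.9, 0.1, 0.0],
    "border": [0.8, 0.2, 0.1],
    "asylum": [0.7, 0.3, 0.0],
    "climat": [0.0, 0.9, 0.2],
    "carbon": [0.1, 0.8, 0.3],
    "footbal": [0.1, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def expand_seeds(seeds, embeddings, top_n):
    """Return the top_n non-seed words ranked by similarity to their closest seed."""
    scores = {}
    for word, vec in embeddings.items():
        if word in seeds:
            continue
        scores[word] = max(cosine(vec, embeddings[s]) for s in seeds)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:top_n]


seeds = ["immigr"]
dictionary = seeds + expand_seeds(seeds, EMBEDDINGS, top_n=2)
print(dictionary)
```

With a trained gensim model, a call such as `model.wv.most_similar(positive=[seed], topn=100)` would play the role of `expand_seeds`; the candidate list would still need a manual pass to drop overly general or irrelevant terms.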
Stage II: Classification of political stance on salient issues.
The classifier is chosen by five-fold cross-validation: precision, recall, accuracy, balanced accuracy, and F1 scores are compared across XGBoost, Naive Bayes, Elastic Net, and Lasso, and the best-performing model is kept.
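The selection procedure in Stage II can be illustrated with a dependency-free sketch of k-fold cross-validation. Two toy models stand in for the real candidates (they are not XGBoost, Naive Bayes, Elastic Net, or Lasso), only F1 is compared rather than all five metrics, and the data are invented. All function names here are assumptions for the example.

```python
def k_fold_splits(n_samples, k):
    """Yield (train, test) index lists for k interleaved folds.

    No shuffling or stratification is done; a real run would likely want both.
    """
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1); 0.0 when there are no true positives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def cross_val_f1(fit, X, y, k=5):
    """Mean F1 over k folds for a `fit(X_train, y_train) -> predict` callable."""
    scores = []
    for train, test in k_fold_splits(len(X), k):
        predict = fit([X[i] for i in train], [y[i] for i in train])
        y_pred = [predict(X[i]) for i in test]
        scores.append(f1_score([y[i] for i in test], y_pred))
    return sum(scores) / len(scores)


def fit_threshold(X_train, y_train):
    """Toy model: split at the midpoint between the two class means."""
    pos = [x for x, label in zip(X_train, y_train) if label == 1]
    neg = [x for x, label in zip(X_train, y_train) if label == 0]
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: int(x >= cut)


def fit_majority(X_train, y_train):
    """Baseline: always predict the most common training label (ties go to 1)."""
    majority = int(2 * sum(y_train) >= len(y_train))
    return lambda x: majority


# Toy data: one feature (a dictionary hit count) per tweet, label 1 if relevant.
X = [0, 3, 1, 4] * 5
y = [0, 1, 0, 1] * 5

candidates = {"threshold": fit_threshold, "majority": fit_majority}
scores = {name: cross_val_f1(fit, X, y, k=5) for name, fit in candidates.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

Each candidate is passed as a function that fits on the training folds and returns a predictor, so adding another model means adding one entry to `candidates`.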
Experiments:
Datasets:
The authors built their own dataset. Using the Twitter API, they crawled the tweets of all senators and the vast majority of House members for any period of interest up to 2020, excluding members who left office or were elected for the first time.
Results:
Word embeddings were trained on the entire corpus of legislators’ tweets. Each Word2Vec dictionary is limited to the 100 words most similar to the seed words, and overly general or irrelevant terms are omitted. The detailed results given in the paper's appendix are summarised in the table below:
| Dataset | Issue | Classification Method | F1-score | Recall | Precision | Accuracy | Balanced Accuracy |
|---|---|---|---|---|---|---|---|
| Crawled Legislators' Tweets | Immigration (Exclusive or Not) | Naive Bayes | 0.885 | 0.853 | 0.921 | 0.813 | 0.738 |
| | | XGBoost | 0.871 | 0.909 | 0.836 | 0.795 | 0.668 |
| | | Elastic Net | 0.881 | 0.967 | 0.809 | 0.801 | 0.615 |
| | | Lasso | 0.871 | 0.962 | 0.797 | 0.784 | 0.586 |
| | Immigration (Inclusive or Not) | Naive Bayes | 0.892 | 0.865 | 0.920 | 0.830 | 0.781 |
| | | XGBoost | 0.888 | 0.916 | 0.861 | 0.828 | 0.746 |
| | | Elastic Net | 0.890 | 0.978 | 0.817 | 0.821 | 0.674 |
| | | Lasso | 0.894 | 0.974 | 0.826 | 0.828 | 0.691 |
| | Climate Change (No Action or Not) | Naive Bayes | 0.889 | 0.874 | 0.904 | 0.827 | 0.742 |
| | | XGBoost | 0.888 | 0.896 | 0.880 | 0.818 | 0.698 |
| | | Elastic Net | 0.891 | 0.963 | 0.830 | 0.811 | 0.575 |
| | | Lasso | 0.892 | 0.965 | 0.830 | 0.813 | 0.576 |
| | Climate Change (Take Action or Not) | Naive Bayes | 0.687 | 0.742 | 0.640 | 0.758 | 0.746 |
| | | XGBoost | 0.678 | 0.694 | 0.662 | 0.736 | 0.729 |
| | | Elastic Net | 0.706 | 0.764 | 0.655 | 0.745 | 0.748 |
| | | Lasso | 0.700 | 0.764 | 0.646 | 0.738 | 0.742 |
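For readers less familiar with the five reported metrics, the sketch below computes them from the four cells of a binary confusion matrix using their standard definitions. The counts are invented and the function name `classification_metrics` is an assumption; nothing here reproduces the paper's numbers.

```python
def classification_metrics(tp, fp, fn, tn):
    """Standard binary metrics from confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives,
    and true negatives for the positive class.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)        # also called sensitivity
    specificity = tn / (tn + fp)   # recall of the negative class
    return {
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "balanced_accuracy": (recall + specificity) / 2,
    }


# Invented counts for an imbalanced problem: 100 positives, 50 negatives.
m = classification_metrics(tp=90, fp=30, fn=10, tn=20)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With these counts recall is 0.90 and accuracy is about 0.73, yet balanced accuracy is only 0.65 because most negatives are misclassified. A gap of this kind between recall and balanced accuracy is also visible in several rows of the table.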
Implementation details:
Reproducing Results for XGB
| Dataset | Issue | Classification Method | F1-score | Recall | Precision | Accuracy | Balanced Accuracy |
|---|---|---|---|---|---|---|---|
| Crawled Persian Tweets | JCPOA (Relevant or Not) | Naive Bayes | 0.845 | 0.901 | 0.792 | 0.843 | 0.839 |
| | | XGBoost | 0.999 | 0.999 | 0.999 | 0.999 | 0.999 |
| | | Passive Aggressive | 0.991 | 0.983 | 0.994 | 0.992 | 0.991 |
| | | Lasso | 0.988 | 0.985 | 0.983 | 0.984 | 0.987 |
| | Stock Market (Relevant or Not) | Naive Bayes | 0.892 | 0.865 | 0.920 | 0.830 | 0.781 |
| | | XGBoost | 0.999 | 0.999 | 1.000 | 0.999 | 0.999 |
| | | Elastic Net | 0.890 | 0.978 | 0.817 | 0.821 | 0.674 |
| | | Lasso | 0.894 | 0.974 | 0.826 | 0.828 | 0.691 |
| | Vaccination (Relevant or Not) | Naive Bayes | 0.870 | 0.92 | 0.82 | 0.855 | 0.883 |
| | | XGBoost | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| | | Passive Aggressive | 0.975 | 0.945 | 0.965 | 0.97 | 0.95 |
| | | Lasso | 0.971 | 0.955 | 0.973 | 0.970 | 0.959 |
| | Filtering (Relevant or Not) | Naive Bayes | 0.687 | 0.742 | 0.640 | 0.758 | 0.746 |
| | | XGBoost | 0.950 | 0.951 | 0.958 | 0.954 | 0.950 |
| | | Elastic Net | 0.706 | 0.764 | 0.655 | 0.745 | 0.748 |
| | | Lasso | 0.700 | 0.764 | 0.646 | 0.738 | 0.742 |