Functions to load and analyze open-ended data from the ANES election perception surveys, conducted every 4 years
election_text_analysis
Note to Dr. Brambor and the teaching assistants: thank you for everything in this class! Following Dr. Brambor's feedback, I reduced the scope of my project relative to the proposal. I removed the lemmatization and n-gram steps, limited the data to 2008-2020, and concentrated on the data steps we learned in this class rather than on the analysis. I was able to use the timeseries file directly from ANES, which has almost all questions normalized but does not contain the open-ended responses. For this project, I joined the open-ended responses from 2008, 2012, 2016, and 2020 with the timeseries data. I then built functions to calculate the frequency of words across a set of responses.
Package on PyPI
https://pypi.org/project/election_text_analysis/
Docs
https://election-text-analysis.readthedocs.io/en/latest/
Overview
Functions to load and analyze open-ended data from the ANES election perception surveys, conducted every 4 years.
ANES (American National Election Studies) conducts a large-scale survey every four years, coinciding with US Presidential elections. The survey focuses on voter preferences and election-related behavior, as well as questions on public opinion and attitudes. These studies are conducted as pre-election and post-election interviews.
Some of the most interesting questions in the ANES survey are answered with open-ended text. These include questions asking voters what they like (and dislike) about each party's candidate. In some years, these questions were asked about other offices as well (i.e., House and Senate candidates).
These open-ended responses have the potential to support fascinating analysis of how voter preferences and reasoning (in their own words) have changed over time. This module aims to make it easier to analyze those open-ended responses.
Installation
$ pip install election_text_analysis
Usage
>>> from election_text_analysis import download_data, read_data, analyze
>>> # Downloads all data necessary to analyze open-ends from 2008-2020
>>> download_data.download_all()
>>> # Loads a dataframe of data over time, with open-ends from 2008-2020
>>> df = read_data.read_all_data()
Dataset
The full codebook for all columns can be found at https://electionstudies.org/anes_timeseries_cdf_codebook_var_20220916/
The "Year" variable is stored in column 'VCF0004'. For example, here is a count of rows of data for every year since 1984 (the first year we have open-ended data):
>>> greater_than_1984 = df[df['VCF0004'] >= 1984]
>>> greater_than_1984['VCF0004'].value_counts().sort_index()
VCF0004
1984 2257
1986 2176
1988 2040
1990 1980
1992 2485
1994 1795
1996 1714
1998 1281
2000 1807
2002 1511
2004 1212
2008 2322
2012 5914
2016 4270
2020 8280
Name: count, dtype: int64
There are 8 open-ended columns:
open_ended_columns = [
    'Like About Democratic Candidate',
    'Dislike About Democratic Candidate',
    'Like About Republican Candidate',
    'Dislike About Republican Candidate',
    'Like About Democratic Party',
    'Dislike About Democratic Party',
    'Like About Republican Party',
    'Dislike About Republican Party',
]
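Before comparing groups, it can help to check how many non-empty answers each open-ended column has per year. The sketch below runs on a small invented frame: the toy data and the two-column subset are illustrative assumptions, not real ANES responses. With the real frame you would use df and the full list of open-ended columns instead.

```python
import pandas as pd

# Toy stand-in for the frame returned by read_data.read_all_data();
# the values are invented for illustration only.
toy = pd.DataFrame({
    "VCF0004": [2016, 2016, 2020, 2020, 2020],
    "Like About Democratic Candidate": ["experience", None, "calm", "", "decent"],
    "Dislike About Democratic Candidate": [None, "emails", "age", "too old", None],
})

text_columns = [
    "Like About Democratic Candidate",
    "Dislike About Democratic Candidate",
]

# Treat missing values and blank strings as "no answer".
answered = toy[text_columns].apply(lambda col: col.fillna("").str.strip().ne(""))

# Number of non-empty responses per year for each open-ended column.
coverage = answered.groupby(toy["VCF0004"]).sum()
print(coverage)
```

Years with few non-empty answers will produce noisy word frequencies, so this kind of count is a useful sanity check before reading too much into a comparison.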
What people dislike about the Democratic Candidate vs the Republican Candidate in 2020:
dem_dislike_2020 = df[df['VCF0004'] == 2020]['Dislike About Democratic Candidate']
repub_dislike_2020 = df[df['VCF0004'] == 2020]['Dislike About Republican Candidate']
analyze.summarize_word_frequency_differences(dem_dislike_2020, repub_dislike_2020, group_1_label="Dem 2020 dislikes", group_2_label="Repub 2020 dislikes")
These words occurred more often in Dem 2020 dislikes:
word Dem 2020 dislikes freq Repub 2020 dislikes freq
0 age 7.729841 0.219912
1 biden 6.240370 0.619752
2 old 5.906523 0.479808
3 years 7.498716 2.219112
4 left 5.392912 0.239904
5 abortion 5.110426 0.359856
6 party 5.264510 0.799680
7 mental 4.571135 0.419832
8 taxes 4.725218 0.839664
9 socialist 3.775039 0.019992
These words occurred more often in Repub 2020 dislikes:
word Dem 2020 dislikes freq Repub 2020 dislikes freq
0 racist 1.129944 11.735306
1 people 5.136107 13.114754
2 country 7.524397 13.954418
3 lies 0.873138 7.057177
4 liar 1.206985 7.277089
5 lack 1.438110 7.297081
6 everything 2.670776 8.256697
7 covid 0.667694 6.117553
8 pandemic 0.308166 5.317873
9 self 0.256805 3.118752
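The package's exact computation is not shown in this README. As a rough illustration of the idea, the standard-library sketch below assumes that a frequency is the percentage of non-empty responses containing a word, and it ranks words by the gap between the two groups, which is the ordering the sample output above appears to follow. The function names, the tiny stop-word list, and the sample responses are all hypothetical and are not part of election_text_analysis.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"a", "an", "and", "he", "his", "is", "of", "she", "the", "to", "too"}


def word_frequencies(responses):
    """Return {word: percent of non-empty responses that contain the word}."""
    # Skip missing values (None, NaN) and blank strings.
    texts = [r for r in responses if isinstance(r, str) and r.strip()]
    if not texts:
        return {}
    counts = Counter()
    for text in texts:
        # Count each word at most once per response.
        words = set(re.findall(r"[a-z']+", text.lower())) - STOP_WORDS
        counts.update(words)
    return {word: 100 * n / len(texts) for word, n in counts.items()}


def top_differences(group_1, group_2, n=10):
    """Return (word, freq_1, freq_2) for words most over-represented in group_1."""
    freq_1 = word_frequencies(group_1)
    freq_2 = word_frequencies(group_2)
    rows = [(word, f1, freq_2.get(word, 0.0)) for word, f1 in freq_1.items()]
    rows = [row for row in rows if row[1] > row[2]]
    # Largest gap first; ties are broken alphabetically for a stable order.
    rows.sort(key=lambda row: (row[2] - row[1], row[0]))
    return rows[:n]


# Invented responses, for illustration only.
group_a = ["He is too old", "Too old and out of touch", None, "His age"]
group_b = ["He lies", "Lies about the pandemic", "The pandemic response"]

for word, freq_a, freq_b in top_differences(group_a, group_b, n=3):
    print(f"{word:<8} {freq_a:6.2f} {freq_b:6.2f}")
```

Because the functions only iterate over their inputs and ignore non-string values, a pandas Series such as dem_dislike_2020 could be passed in directly. Counting each word once per response keeps a single long answer from dominating the result.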
What people dislike about the Democratic Candidate in 2020 vs 2016
dem_dislike_2016 = df[df['VCF0004'] == 2016]['Dislike About Democratic Candidate']
analyze.summarize_word_frequency_differences(dem_dislike_2020, dem_dislike_2016, group_1_label="2020 Dem dislikes", group_2_label="2016 Dem dislikes")
These words occurred more often in 2020 Dem dislikes:
word 2020 Dem dislikes freq 2016 Dem dislikes freq
0 age 7.729841 0.269750
1 years 7.498716 1.078998
2 biden 6.240370 0.000000
3 country 7.524397 1.425819
4 president 7.832563 1.888247
5 old 5.906523 0.192678
6 left 5.392912 0.346821
7 party 5.264510 0.385356
8 mental 4.571135 0.000000
9 taxes 4.725218 0.385356
These words occurred more often in 2016 Dem dislikes:
word 2020 Dem dislikes freq 2016 Dem dislikes freq
0 liar 1.206985 10.481696
1 lies 0.873138 6.589595
2 emails 0.051361 5.086705
3 trust 1.386749 6.319846
4 email 0.000000 4.393064
5 dishonest 0.487930 4.662813
6 dishonesty 0.128403 3.159923
7 untrustworthy 0.205444 2.967245
8 scandal 0.051361 2.543353
9 clinton 0.359527 2.658960
What people dislike about the Republican Candidate in 2020 vs 2016
repub_dislike_2016 = df[df['VCF0004'] == 2016]['Dislike About Republican Candidate']
analyze.summarize_word_frequency_differences(repub_dislike_2020, repub_dislike_2016, group_1_label="2020 Repub dislikes", group_2_label="2016 Repub dislikes")
These words occurred more often in 2020 Repub dislikes:
word 2020 Repub dislikes freq 2016 Repub dislikes freq
0 country 13.954418 2.522460
1 people 13.114754 5.563234
2 covid 6.117553 0.000000
3 president 9.496202 3.420871
4 trump 7.117153 1.105736
5 lies 7.057177 1.243953
6 pandemic 5.317873 0.000000
7 liar 7.277089 2.073255
8 american 4.258297 0.621977
9 office 4.258297 0.691085
These words occurred more often in 2016 Repub dislikes:
word 2020 Repub dislikes freq 2016 Repub dislikes freq
0 experience 0.599760 5.977885
1 mouth 1.939224 3.489979
2 views 1.379448 2.660677
3 think 4.938025 6.081548
4 political 1.559376 2.522460
5 temperment 0.039984 0.967519
6 inexperience 0.059976 0.932965
7 women 3.618553 4.457498
8 bigot 0.899640 1.727713
9 know 1.459416 2.246026
Contributing
Interested in contributing? Check out the contributing guidelines. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.
License
election_text_analysis was created by Nikhila Anand. It is licensed under the terms of the MIT license.
Credits
election_text_analysis was created with cookiecutter and the py-pkgs-cookiecutter template.