Skip to main content

Functions to load and analyzes open-ended data from the ANES election perception surveys conducted every 4 years

Project description

election_text_analysis

Note to Dr. Brambor and the teaching assistants: thank you for everything in this class! Based on Dr. Brambor's feedback, I reduced the scope of my project based on the proposal. I removed the lemmatization step as well as the n-grams steps. I also just focused on data from 2008-2020. I also focused more on the data steps we learned in this class rather than the analysis. I was able to use the timeseries file directly from ANES that had almost all questions normalized, but didn't contain the open-ended responses. For this project, I joined the open-ended responses from 2008, 2012, 2016, and 2020 with the timeseries data. I then built some functions to calculate the frequency of words across a set of responses.

Package on PyPI

https://pypi.org/project/election_text_analysis/

Docs

https://election-text-analysis.readthedocs.io/en/latest/

Overview

Functions to load and analyzes open-ended data from the ANES election perception surveys conducted every 4 years.

ANES (American National Election Studies) conducts a large-scale survey every four years, coinciding with US Presidential elections. The survey focuses on voter preferences and election-related behavior, as well as questions on public opinion and attitudes. These studies are conducted as pre-election and post-election interviews.

Some of the most interesting questions asked in the ANES survey are open-ended text responses. These include questions asking voters what they like (and dislike) about each party's candidate. For some years of data, these questions were asked for other positions as well (ie House and Senate candidates).

These open-ended responses have the potential for fascinating analysis on how voter preferences and reasoning (in their own words) have changed over time. This module aims to make it easier to analyze those open-ended responses.

Installation

$ pip install election_text_analysis

Usage

>>> from election_text_analysis import download_data, read_data, analyze

>>> # Downloads all data necessary to analyze open-ends from 2008-2020
>>> download_data.download_all()

# Loads a dataframe of data over time, with open-ends from 2008-2020
>>> df = read_data.read_all_data()

Dataset

The full codebook for all columns can be found at https://electionstudies.org/anes_timeseries_cdf_codebook_var_20220916/

The "Year" variable is stored in column 'VCF0004'. For example, here is a count of rows of data for every year since 1984 (the first year we have open-ended data)

>>> greater_than_1984 = df[df['VCF0004'] >= 1984]
>>> greater_than_1984['VCF0004'].value_counts().sort_index()
VCF0004
1984    2257
1986    2176
1988    2040
1990    1980
1992    2485
1994    1795
1996    1714
1998    1281
2000    1807
2002    1511
2004    1212
2008    2322
2012    5914
2016    4270
2020    8280
Name: count, dtype: int64

There are 8 open-ended columns:

open_ended_columns = ['Like About Democratic Candidate', 'Dislike About Democratic Candidate', 'Like About Republican Candidate', 'Dislike About Republican Candidate', 'Like About Democratic Party', 'Dislike About Democratic Party', 'Like About Republican Party', 'Dislike About Republican Party']

What people dislike about the Democratic Candidate vs the Republican Candidate in 2020:

dem_dislike_2020   = df[df['VCF0004'] == 2020]['Dislike About Democratic Candidate']
repub_dislike_2020 = df[df['VCF0004'] == 2020]['Dislike About Republican Candidate']
analyze.summarize_word_frequency_differences(dem_dislike_2020, repub_dislike_2020, group_1_label="Dem 2020 dislikes", group_2_label="Repub 2020 dislikes")

These words occurred more often in Dem 2020 dislikes:
        word  Dem 2020 dislikes freq  Repub 2020 dislikes freq
0        age                7.729841                  0.219912
1      biden                6.240370                  0.619752
2        old                5.906523                  0.479808
3      years                7.498716                  2.219112
4       left                5.392912                  0.239904
5   abortion                5.110426                  0.359856
6      party                5.264510                  0.799680
7     mental                4.571135                  0.419832
8      taxes                4.725218                  0.839664
9  socialist                3.775039                  0.019992


These words occurred more often in Repub 2020 dislikes:
         word  Dem 2020 dislikes freq  Repub 2020 dislikes freq
0      racist                1.129944                 11.735306
1      people                5.136107                 13.114754
2     country                7.524397                 13.954418
3        lies                0.873138                  7.057177
4        liar                1.206985                  7.277089
5        lack                1.438110                  7.297081
6  everything                2.670776                  8.256697
7       covid                0.667694                  6.117553
8    pandemic                0.308166                  5.317873
9        self                0.256805                  3.118752

What people dislike about the Democratic Candidate in 2020 vs 2016

dem_dislike_2016   = df[df['VCF0004'] == 2016]['Dislike About Democratic Candidate']
analyze.summarize_word_frequency_differences(dem_dislike_2020, dem_dislike_2016, group_1_label="2020 Dem dislikes", group_2_label="2016 Dem dislikes")

These words occurred more often in 2020 Dem dislikes:
        word  2020 Dem dislikes freq  2016 Dem dislikes freq
0        age                7.729841                0.269750
1      years                7.498716                1.078998
2      biden                6.240370                0.000000
3    country                7.524397                1.425819
4  president                7.832563                1.888247
5        old                5.906523                0.192678
6       left                5.392912                0.346821
7      party                5.264510                0.385356
8     mental                4.571135                0.000000
9      taxes                4.725218                0.385356


These words occurred more often in 2016 Dem dislikes:
            word  2020 Dem dislikes freq  2016 Dem dislikes freq
0           liar                1.206985               10.481696
1           lies                0.873138                6.589595
2         emails                0.051361                5.086705
3          trust                1.386749                6.319846
4          email                0.000000                4.393064
5      dishonest                0.487930                4.662813
6     dishonesty                0.128403                3.159923
7  untrustworthy                0.205444                2.967245
8        scandal                0.051361                2.543353
9        clinton                0.359527                2.658960

What people dislike about the Republican Candidate in 2020 vs 2016

repub_dislike_2016 = df[df['VCF0004'] == 2016]['Dislike About Republican Candidate']
analyze.summarize_word_frequency_differences(repub_dislike_2020, repub_dislike_2016, group_1_label="2020 Repub dislikes", group_2_label="2016 Repub dislikes")

These words occurred more often in 2020 Repub dislikes:
        word  2020 Repub dislikes freq  2016 Repub dislikes freq
0    country                 13.954418                  2.522460
1     people                 13.114754                  5.563234
2      covid                  6.117553                  0.000000
3  president                  9.496202                  3.420871
4      trump                  7.117153                  1.105736
5       lies                  7.057177                  1.243953
6   pandemic                  5.317873                  0.000000
7       liar                  7.277089                  2.073255
8   american                  4.258297                  0.621977
9     office                  4.258297                  0.691085


These words occurred more often in 2016 Repub dislikes:
           word  2020 Repub dislikes freq  2016 Repub dislikes freq
0    experience                  0.599760                  5.977885
1         mouth                  1.939224                  3.489979
2         views                  1.379448                  2.660677
3         think                  4.938025                  6.081548
4     political                  1.559376                  2.522460
5    temperment                  0.039984                  0.967519
6  inexperience                  0.059976                  0.932965
7         women                  3.618553                  4.457498
8         bigot                  0.899640                  1.727713
9          know                  1.459416                  2.246026

Contributing

Interested in contributing? Check out the contributing guidelines. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.

License

election_text_analysis was created by Nikhila Anand. It is licensed under the terms of the MIT license.

Credits

election_text_analysis was created with cookiecutter and the py-pkgs-cookiecutter template.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

election_text_analysis-0.1.2.tar.gz (14.6 kB view details)

Uploaded Source

Built Distribution

election_text_analysis-0.1.2-py3-none-any.whl (13.8 kB view details)

Uploaded Python 3

File details

Details for the file election_text_analysis-0.1.2.tar.gz.

File metadata

  • Download URL: election_text_analysis-0.1.2.tar.gz
  • Upload date:
  • Size: 14.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.6.1 CPython/3.11.5 Darwin/22.4.0

File hashes

Hashes for election_text_analysis-0.1.2.tar.gz
Algorithm Hash digest
SHA256 ea43c0c6b1a29227d7e985cd8117a5e92053520fff923dc1744537f9a2ed1b52
MD5 54a549bc7a354c764753b1a51cf21491
BLAKE2b-256 eff8e67a78029e982fbb0a80651c5970737358e3c97ece6cf70237b59704379d

See more details on using hashes here.

File details

Details for the file election_text_analysis-0.1.2-py3-none-any.whl.

File metadata

File hashes

Hashes for election_text_analysis-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 e2090028e7259bba7309b534843713f242027aac4672524adb91296216cea675
MD5 b6696f9a1f35bc55ab1da032edb3a4a1
BLAKE2b-256 6f371290d3c84b88ae0ff9a0e3b018e76ed3e9ae22d4e5d9808260f7c01683c3

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page