Automatically detect bad responses in surveys
Survey Dud Detector
Apply methods to detect bad responses in surveys.
Detect Straightlining
Straightlining is when a respondent picks the same point on a scale for every question (e.g., saying "Strongly Agree" to everything).
from survey_dud_detector import detect_straightlining
# Select the Likert-scale columns by name (df is your survey DataFrame)
likert_cols = [c for c in df.columns if 'agree' in c or 'would' in c or 'favorable' in c]
straightlining = detect_straightlining(df[likert_cols])
# Examine incidence of straightlining (results are normalized to % of questions examined)
print(straightlining.value_counts())
# Drop everyone who perfectly straightlined
df = df[straightlining < 1]
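To make the idea of a normalized straightlining score concrete, here is a minimal sketch in plain pandas. It is not the package's implementation: the helper `straightlining_share` and the toy column names are hypothetical, and it assumes the score is the share of a respondent's answers that match their most frequent answer, so a score of 1 means every answer was identical.

```python
import pandas as pd


def straightlining_share(responses: pd.DataFrame) -> pd.Series:
    """Hypothetical helper, not part of survey_dud_detector.

    For each respondent (row), return the share of answered questions
    that match that respondent's most frequent answer.
    """
    def share(row: pd.Series) -> float:
        answered = row.dropna()  # ignore skipped questions
        if answered.empty:
            return 0.0
        # value_counts sorts descending, so the first entry is the modal count
        return answered.value_counts().iloc[0] / len(answered)

    return responses.apply(share, axis=1)


# Toy data: three respondents, four Likert questions (made up for illustration)
toy = pd.DataFrame({
    "q1_agree": ["Strongly Agree", "Agree", "Disagree"],
    "q2_agree": ["Strongly Agree", "Agree", "Agree"],
    "q3_agree": ["Strongly Agree", "Disagree", "Strongly Agree"],
    "q4_agree": ["Strongly Agree", "Agree", "Neutral"],
})

scores = straightlining_share(toy)
print(scores)

# Keep only respondents who did not give the same answer everywhere
kept = toy[scores < 1]
```

In this toy data only the first respondent answers identically throughout, so only that row is dropped by the `scores < 1` filter.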
Multiple Low Incidence Detection
Multiple low incidence is when someone gives a rare answer to several questions (e.g., saying they are Native American or that they are non-binary). A rare answer on its own is not a problem, but several of them together can indicate that someone is trolling (e.g., pretending to be a non-binary Native American who is Very Conservative and earns over $150K).
from survey_dud_detector import detect_low_incidence
demographics = ['gender', 'race', 'education', 'urban_rural', 'politics', 'income', 'age', 'vote2016']
# Detect low incidence - the threshold defines how rare an answer must be to count as "low incidence"
# (0.04 means any answer given by 4% or fewer of respondents is treated as "low incidence")
low_incidence_counts = detect_low_incidence(df[demographics], low_incidence_threshold=0.04)
# Examine incidence of low incidence answers (results are the number of low incidence answers per respondent)
print(low_incidence_counts.value_counts())
# It might be good to look at the values of people with a high number of low incidence answers
# just in case this is actually legitimate.
print(df[low_incidence_counts >= 3])
# Drop everyone who gave three or more low incidence answers
df = df[low_incidence_counts < 3]
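The counting step can be sketched in plain pandas as follows. This is an illustration under stated assumptions, not the package's actual code: the helper `count_low_incidence`, the toy data, and the toy threshold of 0.15 are all hypothetical. It assumes an answer is "low incidence" when its share within its own column is at or below the threshold.

```python
import pandas as pd


def count_low_incidence(answers: pd.DataFrame, threshold: float = 0.04) -> pd.Series:
    """Hypothetical helper, not part of survey_dud_detector.

    For each respondent (row), count the columns in which their answer
    was given by `threshold` or less of all respondents.
    """
    counts = pd.Series(0, index=answers.index)
    for col in answers.columns:
        # Share of respondents giving each answer in this column
        freq = answers[col].value_counts(normalize=True)
        # Map each answer to its share; missing answers become NaN and compare False
        rare = answers[col].map(freq) <= threshold
        counts += rare.astype(int)
    return counts


# Toy data: ten respondents (made up for illustration)
toy = pd.DataFrame({
    "gender": ["Man"] * 5 + ["Woman"] * 4 + ["Non-binary"],
    "politics": ["Moderate"] * 9 + ["Very Conservative"],
    "income": ["Under $50K"] * 8 + ["Over $150K"] * 2,
})

# A looser threshold than 0.04 because the toy sample is tiny
counts = count_low_incidence(toy, threshold=0.15)
print(counts.value_counts())

# Inspect respondents with two or more rare answers before deciding to drop them
flagged = toy[counts >= 2]
print(flagged)
```

Because the count is per column and unweighted, a respondent with one genuinely rare trait is never flagged on that basis alone; only a combination of rare answers raises the count.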
Installation
pip3 install survey_dud_detector