Python library for detecting Insufficient Effort Responding (IER) in survey data

IER

A Python package for detecting Insufficient Effort Responding (IER) in survey data using various statistical indices and methods.

Overview

When taking online surveys, participants sometimes respond to items without regard to their content. These types of responses, referred to as insufficient effort responding (IER) or careless responding, constitute significant problems for data quality, leading to distortions in data analysis and hypothesis testing.

The ier package helps detect such responses by making it easy to calculate the indices proposed in the literature. For a comprehensive review of these methods, see Curran (2016).

Features

  • Multiple Detection Methods: Supports 15+ indices for detecting careless responding
  • Flexible Input: Works with lists, numpy arrays, pandas DataFrames, and polars DataFrames
  • Robust Implementation: Handles missing data and edge cases
  • Type Hints: Full type annotations for IDE support

Installation

From PyPI

pip install insufficient-effort

From Source

git clone https://github.com/Cameron-Lyons/ier.git
cd ier
pip install -e .

Optional Dependencies

For enhanced functionality (e.g., chi-squared outlier detection):

pip install "insufficient-effort[full]"

Quick Start

import numpy as np
from ier import irv, mahad, longstring

# Sample survey data (rows = participants, columns = items)
data = np.array([
    [1, 2, 3, 4, 5, 6, 7, 8],  # Normal responding
    [3, 3, 3, 3, 3, 3, 3, 3],  # Straightlining
    [1, 5, 1, 5, 1, 5, 1, 5],  # Alternating pattern
])

# Intra-individual response variability (low = straightlining)
print("IRV:", irv(data))

# Longest string of identical responses (high = straightlining)
print("Longstring:", longstring(data))

# Mahalanobis distance (high = outlier) needs far more participants than
# items for a stable covariance estimate, so add simulated respondents
rng = np.random.default_rng(0)
sample = np.vstack([rng.integers(1, 9, size=(100, 8)), data])
print("Mahad:", mahad(sample)[-3:])  # distances for the three rows above

Available Functions

Consistency Indices

evenodd(x, factors, diag=False)

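Computes even-odd consistency: within each factor, responses to odd- and even-numbered items are averaged separately, and each person's odd-half scores are then correlated with their even-half scores across factors. Columns must be ordered by factor.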

from ier import evenodd

data = [[1, 2, 3, 4, 5, 6], [2, 3, 4, 5, 6, 7]]
factors = [3, 3]  # Two factors with 3 items each (columns 1-3, then 4-6)
scores = evenodd(data, factors)
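For readers who want to see the computation, here is a minimal NumPy sketch of the usual even-odd procedure: within each factor, average the odd- and even-numbered items separately, then correlate the two sets of half-scores across factors for each person. The name evenodd_sketch is hypothetical and this is not the ier implementation; details such as missing-data handling or any reliability correction may differ.

```python
import numpy as np


def evenodd_sketch(x, factors):
    """Even-odd consistency per person (illustrative sketch, not ier's code).

    x: rows = persons, columns = items ordered by factor.
    factors: number of items in each factor, e.g. [3, 3].
    """
    x = np.asarray(x, dtype=float)
    odd_scores, even_scores = [], []
    start = 0
    for n_items in factors:
        block = x[:, start:start + n_items]
        # Positions 1, 3, 5, ... within the factor form the odd half
        odd_scores.append(np.nanmean(block[:, 0::2], axis=1))
        # Positions 2, 4, 6, ... form the even half
        even_scores.append(np.nanmean(block[:, 1::2], axis=1))
        start += n_items
    odd = np.column_stack(odd_scores)    # persons x factors
    even = np.column_stack(even_scores)  # persons x factors
    # Pearson correlation of odd- vs even-half scores across factors, per person
    odd_c = odd - odd.mean(axis=1, keepdims=True)
    even_c = even - even.mean(axis=1, keepdims=True)
    num = (odd_c * even_c).sum(axis=1)
    den = np.sqrt((odd_c ** 2).sum(axis=1) * (even_c ** 2).sum(axis=1))
    with np.errstate(invalid="ignore", divide="ignore"):
        return num / den  # NaN when either set of half-scores is constant
```

Attentive respondents score near +1; values near zero or below suggest answers that ignore item content. Because the correlation is taken across factors, the index becomes more informative as the number of factors grows.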

psychsyn(x, critval=0.60, anto=False, diag=False)

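Identifies item pairs whose correlation across the sample passes critval (psychometric synonyms) and, for each person, correlates responses across those pairs. psychant does the same for strongly negatively correlated pairs (psychometric antonyms).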

from ier import psychsyn, psychant

data = [[1, 2, 3, 4], [2, 3, 4, 5], [3, 4, 5, 6]]
scores = psychsyn(data, critval=0.5)  # Synonyms
scores = psychant(data, critval=-0.5)  # Antonyms (none here: every item pair in this toy data correlates positively)
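The psychometric synonym idea can be sketched in a few lines of NumPy, assuming Pearson correlations and complete data: pick item pairs whose correlation across the sample passes critval, then correlate each person's responses to the first items of those pairs with their responses to the second items. psychsyn_sketch is a hypothetical helper, not the ier implementation; with anto=True it keeps pairs at or below a negative critval, mirroring the psychant call above.

```python
import numpy as np


def psychsyn_sketch(x, critval=0.60, anto=False):
    """Psychometric synonyms (or antonyms) per person (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    # Item-by-item correlations computed across all persons
    item_corr = np.corrcoef(x, rowvar=False)
    first, second = np.triu_indices(item_corr.shape[0], k=1)  # each pair once
    r = item_corr[first, second]
    # Synonyms: r >= critval; antonyms: r <= critval (critval negative)
    keep = r <= critval if anto else r >= critval
    first, second = first[keep], second[keep]
    if first.size < 2:
        # A within-person correlation needs at least two pairs
        return np.full(x.shape[0], np.nan)
    a, b = x[:, first], x[:, second]  # persons x selected pairs
    a_c = a - a.mean(axis=1, keepdims=True)
    b_c = b - b.mean(axis=1, keepdims=True)
    num = (a_c * b_c).sum(axis=1)
    den = np.sqrt((a_c ** 2).sum(axis=1) * (b_c ** 2).sum(axis=1))
    with np.errstate(invalid="ignore", divide="ignore"):
        return num / den
```

In this sketch attentive respondents score near +1 on synonyms and near -1 on antonyms, and every person gets NaN when fewer than two pairs pass the cutoff.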

individual_reliability(x, n_splits=100, random_seed=None)

Estimates response consistency using repeated split-half correlations.

from ier import individual_reliability, individual_reliability_flag

data = [[1, 2, 1, 2, 1, 2], [1, 5, 2, 4, 3, 3]]
reliability = individual_reliability(data, n_splits=50)
flags = individual_reliability_flag(data, threshold=0.3)

person_total(x, na_rm=True)

Correlates each person's responses with the sample mean response pattern.

from ier import person_total

data = [[1, 2, 3, 4, 5], [5, 4, 3, 2, 1], [1, 2, 3, 4, 5]]
scores = person_total(data)  # [1.0, -1.0, 1.0]
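A minimal sketch of the person-total correlation, assuming the sample mean pattern includes the person being scored (this reproduces the [1.0, -1.0, 1.0] result above). person_total_sketch is a hypothetical name, not the ier implementation; some definitions instead leave each person out of the mean.

```python
import numpy as np


def person_total_sketch(x):
    """Correlate each person's responses with the item means (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    item_means = np.nanmean(x, axis=0)  # sample mean response pattern
    scores = np.full(x.shape[0], np.nan)
    for row_idx, row in enumerate(x):
        answered = ~np.isnan(row)  # use only items this person answered
        if answered.sum() > 1:
            scores[row_idx] = np.corrcoef(row[answered], item_means[answered])[0, 1]
    return scores
```

A person who answers at random tends toward zero, while one who reverses the typical pattern (as in the second row above) approaches -1.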

semantic_syn(x, item_pairs, anto=False) / semantic_ant(x, item_pairs)

Computes consistency for predefined semantic synonym/antonym pairs.

from ier import semantic_syn, semantic_ant

data = [[1, 1, 5, 5], [1, 2, 5, 4]]
pairs = [(0, 1), (2, 3)]  # Predefined synonym pairs (zero-based column indices)
scores = semantic_syn(data, pairs)

guttman(x, na_rm=True, normalize=True)

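Counts Guttman errors: response reversals relative to the sample's item difficulty ordering, i.e. cases where a person responds more strongly to a harder item than to an easier one.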

from ier import guttman, guttman_flag

data = [[1, 2, 3, 4, 5], [5, 4, 3, 2, 1]]
errors = guttman(data)
flags = guttman_flag(data, threshold=0.5)
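One common way to extend Guttman errors to rating-scale data is sketched below, as an assumption rather than a description of ier: items are ordered by sample mean (a higher mean is treated as an easier item), and every item pair in which a person gives a higher response to the harder item counts as one error. With normalize=True the count is divided by the number of item pairs. guttman_sketch is a hypothetical helper.

```python
import numpy as np


def guttman_sketch(x, normalize=True):
    """Count Guttman errors per person (illustrative sketch, not ier's code)."""
    x = np.asarray(x, dtype=float)
    # Higher sample mean = easier item; sort items from easiest to hardest
    order = np.argsort(-np.nanmean(x, axis=0), kind="stable")
    ordered = x[:, order]
    # Every (easier, harder) item pair, as column indices into `ordered`
    easy, hard = np.triu_indices(ordered.shape[1], k=1)
    # An error: a higher response to the harder item than to the easier one
    # (comparisons involving NaN are False, so missing answers add no errors)
    errors = (ordered[:, hard] > ordered[:, easy]).sum(axis=1)
    return errors / len(easy) if normalize else errors
```

For 0/1 data this reduces to the classic count: endorsing a hard item while failing an easier one.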

Response Pattern Indices

longstring(x, avg=False)

Computes the longest (or average) run of identical consecutive responses.

from ier import longstring

# Single string
longstring("AAABBBCCDAA")  # ('A', 3)

# Matrix of responses
data = [[1, 1, 1, 2, 3], [1, 2, 3, 4, 5]]
longstring(data)  # [('1', 3), ('1', 1)]
longstring(data, avg=True)  # [1.67, 1.0]
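A minimal sketch of the longstring logic using itertools.groupby from the standard library. longstring_sketch is a hypothetical name that scores one response vector at a time; like the examples above, it reports the value as a string and keeps the first run when several runs tie.

```python
from itertools import groupby


def longstring_sketch(responses, avg=False):
    """Longest (or average) run of identical consecutive responses (sketch)."""
    # Collapse the sequence into (value, run length) pairs
    runs = [(str(value), len(list(group))) for value, group in groupby(responses)]
    if avg:
        return sum(length for _, length in runs) / len(runs)
    # max() returns the first maximal run, so earlier runs win ties
    return max(runs, key=lambda run: run[1])
```

For example, longstring_sketch([1, 1, 1, 2, 3], avg=True) averages run lengths 3, 1 and 1 to give 5/3, matching the 1.67 shown above.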

irv(x, na_rm=True, split=False, num_split=1)

Computes intra-individual response variability: the standard deviation of each person's responses across items (Dunn et al., 2018).

from ier import irv

data = [[1, 2, 3, 4, 5], [3, 3, 3, 3, 3]]
scores = irv(data)  # High for varied, low for straightlining

# Split-half IRV
scores = irv(data, split=True, num_split=2)
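IRV reduces to a per-person standard deviation. The NumPy sketch below assumes the sample standard deviation (ddof=1) and, for the split version, assumes the items are cut into num_split consecutive blocks with one SD per block; irv_sketch is a hypothetical helper, and the ier split options may be defined differently.

```python
import numpy as np


def irv_sketch(x, num_split=1):
    """Per-person standard deviation of responses (illustrative sketch).

    With num_split > 1, items are cut into consecutive blocks and one SD
    is returned per block (an assumption, not necessarily ier's behavior).
    """
    x = np.asarray(x, dtype=float)
    blocks = np.array_split(x, num_split, axis=1)
    # ddof=1 gives the sample standard deviation; NaNs are ignored
    sds = [np.nanstd(block, axis=1, ddof=1) for block in blocks]
    return sds[0] if num_split == 1 else np.column_stack(sds)
```

A straightliner gets 0 in every block, while a person who varies only in some blocks shows up as uneven block SDs.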

u3_poly(x, scale_min=None, scale_max=None)

Proportion of extreme responses (at scale endpoints).

from ier import u3_poly

data = [[1, 5, 1, 5, 3], [3, 3, 3, 3, 3]]
extreme = u3_poly(data, scale_min=1, scale_max=5)

midpoint_responding(x, scale_min=None, scale_max=None, tolerance=0.0)

Proportion of midpoint responses.

from ier import midpoint_responding

data = [[1, 2, 3, 4, 5], [3, 3, 3, 3, 3]]
mid = midpoint_responding(data, scale_min=1, scale_max=5)  # [0.2, 1.0]

response_pattern(x, scale_min=None, scale_max=None)

Returns multiple response style indices at once.

from ier import response_pattern

data = [[1, 2, 3, 4, 5], [3, 3, 3, 3, 3]]
patterns = response_pattern(data, scale_min=1, scale_max=5)
# Returns dict with: extreme, midpoint, acquiescence, variability
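The response style proportions can be sketched with NumPy as below. The definitions are assumptions for illustration: extreme is the share of answers at either endpoint, midpoint the share exactly at (scale_min + scale_max) / 2, acquiescence the share above the midpoint, and variability the sample SD. response_styles_sketch is hypothetical and may not match the keys or formulas ier uses.

```python
import numpy as np


def response_styles_sketch(x, scale_min, scale_max):
    """Per-person response style proportions (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    answered = (~np.isnan(x)).sum(axis=1)  # NaN comparisons below are False
    midpoint = (scale_min + scale_max) / 2
    return {
        # Share of answers at either scale endpoint
        "extreme": ((x == scale_min) | (x == scale_max)).sum(axis=1) / answered,
        # Share of answers exactly at the scale midpoint
        "midpoint": (x == midpoint).sum(axis=1) / answered,
        # Share of answers above the midpoint (one common acquiescence proxy)
        "acquiescence": (x > midpoint).sum(axis=1) / answered,
        # Sample standard deviation of each person's answers
        "variability": np.nanstd(x, axis=1, ddof=1),
    }
```

On scales with an even number of categories (e.g. 1 to 6) the midpoint falls between categories, so an exact-match rule counts no midpoint responses.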

Statistical Outlier Detection

mahad(x, flag=False, confidence=0.95, na_rm=False, method='chi2')

Computes Mahalanobis distance for multivariate outlier detection.

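import numpy as np
from ier import mahad, mahad_summary

# Mahalanobis distance needs many more respondents than items;
# with only a few rows the covariance estimate is unstable or singular.
rng = np.random.default_rng(42)
data = rng.normal(loc=3, scale=1, size=(100, 3))  # 100 simulated respondents
data[0] = [10, 10, 10]  # one extreme response vector

distances = mahad(data)
distances, flags = mahad(data, flag=True, confidence=0.95)

# Methods: 'chi2', 'iqr', 'zscore'
distances, flags = mahad(data, flag=True, method='iqr')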
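A common chi-squared rule flags a respondent when the squared Mahalanobis distance exceeds the chi-squared quantile at the chosen confidence level, with degrees of freedom equal to the number of items. The sketch below, mahad_sketch (a hypothetical helper), assumes that rule, uses SciPy for the quantile, and returns squared distances; ier may report unsquared distances or use a different cutoff.

```python
import numpy as np
from scipy.stats import chi2


def mahad_sketch(x, confidence=0.95):
    """Squared Mahalanobis distances with a chi-squared cutoff (sketch)."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean(axis=0)
    cov = np.cov(x, rowvar=False)
    # Pseudo-inverse avoids a crash if the covariance matrix is singular
    inv_cov = np.linalg.pinv(cov)
    # Row-wise quadratic form: centered[i] @ inv_cov @ centered[i]
    d2 = np.einsum("ij,jk,ik->i", centered, inv_cov, centered)
    cutoff = chi2.ppf(confidence, df=x.shape[1])
    return d2, d2 > cutoff
```

The pseudo-inverse only prevents errors; the distances are meaningful only when there are many more respondents than items.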

Response Time Indices

response_time(times, metric='median')

Computes response time statistics per person.

from ier import response_time, response_time_flag, response_time_consistency

times = [[2.1, 3.4, 2.8], [0.5, 0.4, 0.6], [2.5, 2.3, 2.7]]

avg_times = response_time(times, metric='mean')
med_times = response_time(times, metric='median')
min_times = response_time(times, metric='min')

# Flag fast responders
flags = response_time_flag(times, threshold=1.0)

# Coefficient of variation (low = suspiciously uniform)
cv = response_time_consistency(times)
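A minimal sketch of per-person response time screening, assuming times are seconds per item, that the fast-responder flag compares each person's median time against threshold, and that consistency is the coefficient of variation (SD divided by mean). response_time_sketch is hypothetical; the ier functions may use different statistics for flagging.

```python
import numpy as np


def response_time_sketch(times, threshold=1.0):
    """Median time, fast-responder flag, and CV per person (illustrative sketch)."""
    t = np.asarray(times, dtype=float)
    median = np.nanmedian(t, axis=1)
    # Flag persons whose typical time per item is below the threshold
    flags = median < threshold
    # Coefficient of variation: low values mean suspiciously uniform timing
    cv = np.nanstd(t, axis=1, ddof=1) / np.nanmean(t, axis=1)
    return median, flags, cv
```

With the times above, only the second respondent (median 0.5) falls below a 1.0 threshold.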

Working with DataFrames

The package works with pandas and polars DataFrames:

import pandas as pd
import polars as pl
from ier import irv

# Pandas
df_pandas = pd.DataFrame([[1, 2, 3], [4, 5, 6]])
scores = irv(df_pandas)

# Polars (orient="row" so each inner list is one participant, as in pandas)
df_polars = pl.DataFrame([[1, 2, 3], [4, 5, 6]], orient="row")
scores = irv(df_polars)

Handling Missing Data

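Most functions accept NaN for missing responses. Functions with an na_rm parameter ignore missing values when it is True (the default for irv, but not for mahad):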

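import numpy as np
from ier import irv, mahad

data = np.array([
    [1, 2, np.nan, 4],
    [np.nan, 2, 3, 4],
    [1, 2, 3, 4]
])

# IRV from each participant's non-missing responses
irv_scores = irv(data, na_rm=True)

# Mahalanobis distance needs many more participants than items,
# so demonstrate it on a larger simulated sample with ~5% missing values
rng = np.random.default_rng(0)
sample = rng.normal(3, 1, size=(100, 4))
sample[rng.random(sample.shape) < 0.05] = np.nan
mahad_scores = mahad(sample, na_rm=True)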

Contributing

Contributions are welcome! Please open an issue first to discuss changes.

License

MIT License - see LICENSE for details.

Citation

@software{ier2024,
  title={IER: Python package for detecting Insufficient Effort Responding},
  author={Lyons, Cameron},
  year={2024},
  url={https://github.com/Cameron-Lyons/ier}
}

References

  • Curran, P. G. (2016). Methods for the detection of carelessly invalid responses in survey data. Journal of Experimental Social Psychology, 66, 4-19.
  • Dunn, A. M., Heggestad, E. D., Shanock, L. R., & Theilgard, N. (2018). Intra-individual response variability as an indicator of insufficient effort responding. Journal of Business and Psychology, 33(1), 105-121.

Download files

Source Distribution

insufficient_effort-1.1.2.tar.gz (27.1 kB)

Uploaded Source

Built Distribution

insufficient_effort-1.1.2-py3-none-any.whl (26.8 kB)

Uploaded Python 3

File details

Details for the file insufficient_effort-1.1.2.tar.gz.

File metadata

  • Download URL: insufficient_effort-1.1.2.tar.gz
  • Upload date:
  • Size: 27.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for insufficient_effort-1.1.2.tar.gz
Algorithm Hash digest
SHA256 a42cabe76c710cea131f72cea0cd4520af3cde6575f754db7b2d69a8405511cc
MD5 f9203eba33c92734dcbffcc7dad7f22a
BLAKE2b-256 0aea5e1a4cf955ead5f57850846cd7f200a2e0b4e004fb8d454887acd71b492a

Provenance

The following attestation bundles were made for insufficient_effort-1.1.2.tar.gz:

Publisher: python-publish.yml on Cameron-Lyons/ier

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file insufficient_effort-1.1.2-py3-none-any.whl.

File metadata

File hashes

Hashes for insufficient_effort-1.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 8148e464d95fc104edad61e5af8c7d28904bc89e69b311035a472dd17b743346
MD5 c44f7a12860867a130a91d88c45a41aa
BLAKE2b-256 19c09bf40ad76d4ef33db42f375ed339e65abb5e1a80ea74c07e3a768afdf5c9

Provenance

The following attestation bundles were made for insufficient_effort-1.1.2-py3-none-any.whl:

Publisher: python-publish.yml on Cameron-Lyons/ier
