Python version of R careless package

Careless

A Python package for detecting careless responding in survey data using various statistical indices and methods.

Overview

When taking online surveys, participants sometimes respond to items without regard to their content. These types of responses, referred to as careless or insufficient effort responding, constitute significant problems for data quality, leading to distortions in data analysis and hypothesis testing, such as spurious correlations.

The careless package detects careless or insufficient effort responses by computing indices proposed in the literature. For a comprehensive review of these methods, see Curran (2016).

Features

  • Multiple Detection Methods: Supports various indices for detecting careless responding
  • Flexible Input: Works with lists, numpy arrays, and pandas DataFrames
  • Robust Implementation: Handles missing data and edge cases, and provides comprehensive error handling
  • Performance Optimized: Efficient algorithms for large datasets
  • Comprehensive Documentation: Detailed docstrings with examples for all functions

Installation

From PyPI

pip install careless-py

From Source

git clone https://github.com/Cameron-Lyons/careless-py.git
cd careless-py
pip install -e .

Optional Dependencies

For enhanced functionality (e.g., advanced Mahalanobis distance methods), install with full dependencies:

pip install "careless-py[full]"   # quotes keep shells such as zsh from expanding the brackets

Using uv (Recommended for Development)

This project uses uv for fast, reproducible dependency management. Install uv first:

curl -LsSf https://astral.sh/uv/install.sh | sh

Then clone and install:

git clone https://github.com/Cameron-Lyons/careless-py.git
cd careless-py
uv sync --extra full   # Install with all optional dependencies

For development with all dev tools:

uv sync --extra dev

Run commands in the virtual environment:

uv run pytest          # Run tests
uv run ruff check .    # Run linter
uv run mypy src/       # Run type checker

Quick Start

import numpy as np
from careless import evenodd, irv, longstring, mahad, psychsyn

# Sample survey data (rows = participants, columns = items)
data = np.array([
    [1, 2, 3, 4, 5, 6, 7, 8],  # Participant 1
    [2, 2, 2, 2, 5, 5, 5, 5],  # Participant 2 (suspicious pattern)
    [3, 4, 3, 4, 6, 7, 6, 7],  # Participant 3
])

# Check even-odd consistency
factors = [4, 4]  # Two factors with 4 items each
consistency_scores = evenodd(data, factors)
print("Even-odd consistency scores:", consistency_scores)

# Check intra-individual response variability
irv_scores = irv(data)
print("IRV scores:", irv_scores)

# Check for long strings of identical responses
longest_strings = longstring(data)
print("Longest strings:", longest_strings)

Available Functions

Consistency Indices

evenodd(x, factors, diag=False)

Computes the Even-Odd Consistency Index by dividing unidimensional scales using an even-odd split.

Parameters:

  • x: Input data (2D array/list) where rows are individuals and columns are responses
  • factors: List of integers specifying the length of each factor
  • diag: Boolean to return diagnostic values (number of valid correlations per individual)

Returns:

  • Array of even-odd consistency scores (average correlations per individual)
  • If diag=True: Tuple of (scores, diagnostic_values)

Example:

data = [[1, 2, 3, 4, 5, 6], [2, 3, 4, 5, 6, 7]]
factors = [4, 2]  # First factor has 4 items, second has 2
scores = evenodd(data, factors)
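
To make the idea behind the index concrete, here is a minimal standard-library sketch of an even-odd split for a single respondent. It is an illustration under stated assumptions, not the package's implementation: the names `pearson` and `even_odd_sketch` are hypothetical, no Spearman-Brown correction is applied, and missing values are not handled.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def even_odd_sketch(row, factors):
    """Correlate odd-half and even-half scale means for one respondent.

    `factors` lists the number of items in each scale, in column order.
    """
    odd_means, even_means = [], []
    start = 0
    for length in factors:
        scale = row[start:start + length]
        odd_means.append(mean(scale[0::2]))   # 1st, 3rd, ... item of the scale
        even_means.append(mean(scale[1::2]))  # 2nd, 4th, ... item of the scale
        start += length
    return pearson(odd_means, even_means)


# Hypothetical respondent answering three 4-item scales
row = [1, 2, 1, 2, 3, 4, 3, 4, 5, 5, 4, 4]
print(even_odd_sketch(row, [4, 4, 4]))
```

Because the correlation is taken across scales, it is only informative with three or more factors; with two factors it can only be +1, -1, or undefined.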

psychsyn(x, pairs, method='synonyms', seed=None)

Computes the Psychometric Synonyms/Antonyms Index based on correlated item pairs.

Parameters:

  • x: Input data (2D array/list)
  • pairs: List of item pairs [(item1, item2), ...]
  • method: 'synonyms' or 'antonyms'
  • seed: Random seed for reproducibility

Returns:

  • Array of psychometric synonym/antonym scores

Example:

data = [[1, 2, 3, 4], [2, 3, 4, 5]]
pairs = [(0, 1), (2, 3)]  # Item pairs to correlate
scores = psychsyn(data, pairs, method='synonyms')
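
The within-person correlation behind this index can be sketched with the standard library alone. This is a simplified illustration, not the package's code: `pearson` and `psychsyn_sketch` are hypothetical names, the pairs are assumed to be column-index tuples as in the example above, and missing values are ignored.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def psychsyn_sketch(row, pairs):
    """Correlate one respondent's answers to the first and second item of each pair."""
    first = [row[i] for i, _ in pairs]
    second = [row[j] for _, j in pairs]
    return pearson(first, second)


pairs = [(0, 1), (2, 3), (4, 5)]          # hypothetical synonym pairs
consistent = [1, 2, 3, 3, 5, 4]           # paired items move together
inconsistent = [5, 1, 1, 5, 3, 3]         # paired items move in opposite directions
print(psychsyn_sketch(consistent, pairs))
print(psychsyn_sketch(inconsistent, pairs))
```

For synonym pairs, a low or negative correlation suggests inconsistent responding. With only two pairs the correlation is always +1, -1, or undefined, so more pairs are needed for a meaningful score.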

psychant(x, pairs, seed=None)

Convenience wrapper for psychsyn that computes the psychometric antonyms index.

Response Pattern Functions

longstring(x, avg=False)

Computes the longest (and optionally, average) length of consecutive identical responses.

Parameters:

  • x: Input data (2D array/list)
  • avg: Boolean to also return average string length

Returns:

  • Array of longest string lengths per individual
  • If avg=True: Tuple of (longest_strings, average_strings)

Example:

data = [[1, 1, 1, 2, 3], [1, 2, 3, 4, 5]]
longest, avg = longstring(data, avg=True)
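
As a rough illustration of what is being measured, the run lengths for one respondent can be computed with `itertools.groupby`. The function name `longstring_sketch` is hypothetical, and the sketch assumes a non-empty row with no missing values; the package's own handling may differ.

```python
from itertools import groupby


def longstring_sketch(row):
    """Return (longest run, average run length) of identical consecutive responses."""
    # groupby yields one group per run of equal adjacent values
    runs = [len(list(group)) for _, group in groupby(row)]
    return max(runs), sum(runs) / len(runs)


print(longstring_sketch([1, 1, 1, 2, 3]))  # runs are 3, 1, 1
print(longstring_sketch([1, 2, 3, 4, 5]))  # every run has length 1
```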

irv(x, consecutive=None)

Computes the Intra-individual Response Variability (IRV), the standard deviation of responses across consecutive items.

Parameters:

  • x: Input data (2D array/list)
  • consecutive: Number of consecutive items to analyze (default: all items)

Returns:

  • Array of IRV scores per individual

Example:

data = [[1, 2, 3, 4, 5], [1, 1, 1, 1, 1]]
irv_scores = irv(data)
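
In its simplest form, the index is a per-respondent standard deviation. The sketch below assumes the sample standard deviation (n - 1 denominator) over all items; whether the package uses the sample or population form is not stated here, and `irv_sketch` is a hypothetical name.

```python
from statistics import stdev


def irv_sketch(row):
    """Sample standard deviation of one respondent's answers."""
    return stdev(row)


print(irv_sketch([1, 2, 3, 4, 5]))  # varied responses give a positive value
print(irv_sketch([1, 1, 1, 1, 1]))  # straightlining gives 0.0
```

A value of zero means the respondent gave the same answer to every item.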

Statistical Outlier Functions

mahad(x, method='classic', threshold=None, **kwargs)

Computes Mahalanobis Distance to identify multivariate outliers.

Parameters:

  • x: Input data (2D array/list)
  • method: Detection method ('classic', 'robust', 'mcd', 'mve')
  • threshold: Custom threshold for outlier detection
  • **kwargs: Additional method-specific parameters

Returns:

  • Array of Mahalanobis distances per individual
  • If threshold provided: Tuple of (distances, outlier_flags)

Example:

data = [[1, 2, 3], [4, 5, 6], [1, 1, 1]]
distances, outliers = mahad(data, method='robust', threshold=0.95)
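
For intuition, the classic squared Mahalanobis distance can be written out by hand for the two-item case, where the covariance matrix has a closed-form inverse. This is a sketch under simplifying assumptions (two items, sample covariance, no missing values), not the package's implementation, and `mahalanobis_sq_2d` is a hypothetical name.

```python
from statistics import mean


def mahalanobis_sq_2d(data):
    """Squared Mahalanobis distance of each row from the centroid (two items only)."""
    xs = [row[0] for row in data]
    ys = [row[1] for row in data]
    n = len(data)
    mx, my = mean(xs), mean(ys)
    # Sample variances and covariance (n - 1 denominator)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det == 0:
        raise ValueError("covariance matrix is singular")
    distances = []
    for x, y in data:
        dx, dy = x - mx, y - my
        # (dx, dy) S^-1 (dx, dy)^T with the 2x2 inverse expanded
        distances.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances


# Hypothetical data: the last respondent breaks the pattern of the others
data = [[1, 1], [2, 2], [3, 3], [4, 4], [5, 1]]
d2 = mahalanobis_sq_2d(data)
print(d2)
```

The covariance matrix must be invertible for the classic distance to be defined, which in general requires more respondents than items.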

Advanced Usage

Working with Different Data Types

The package supports various input formats:

import numpy as np
import pandas as pd
from careless import evenodd

# Numpy arrays
data_np = np.array([[1, 2, 3], [4, 5, 6]])

# Lists
data_list = [[1, 2, 3], [4, 5, 6]]

# Pandas DataFrames (converted to a numpy array)
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])
data_df = df.to_numpy()

# All work the same way
scores = evenodd(data_np, [3])

Handling Missing Data

The functions handle missing data (NaN values) appropriately:

import numpy as np

data_with_nans = np.array([
    [1, 2, np.nan, 4],
    [np.nan, 2, 3, 4],
    [1, 2, 3, 4]
])

# Functions will handle NaN values appropriately
scores = evenodd(data_with_nans, [4])
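
One common convention for row-wise indices is to drop a respondent's missing values before computing the statistic. The sketch below shows that convention for a per-respondent standard deviation. It is an assumption for illustration only, since the exact strategy each function uses is not described here, and `irv_ignore_nan` is a hypothetical name.

```python
from math import isnan
from statistics import stdev


def irv_ignore_nan(row):
    """Sample standard deviation over the observed (non-NaN) responses of one respondent."""
    observed = [value for value in row if not isnan(value)]
    # At least two observed values are needed for a sample standard deviation
    if len(observed) < 2:
        return float("nan")
    return stdev(observed)


nan = float("nan")
print(irv_ignore_nan([1, 2, nan, 4]))  # computed from 1, 2, 4
print(irv_ignore_nan([nan, 3.0]))      # too few observed values
```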

Custom Thresholds and Parameters

from careless import irv, mahad, psychsyn

# `data` is a 2D array of responses (rows = participants, columns = items)

# Custom Mahalanobis distance threshold
distances, outliers = mahad(data, threshold=0.99)

# Custom IRV analysis on consecutive items
irv_scores = irv(data, consecutive=5)

# Psychometric synonyms with custom pairs
pairs = [(0, 1), (2, 3), (4, 5)]
syn_scores = psychsyn(data, pairs, method='synonyms')

Performance Considerations

  • Large Datasets: For datasets with >10,000 participants, consider processing in chunks
  • Memory Usage: Functions create copies of input data for processing
  • Parallel Processing: Consider using multiprocessing for very large datasets
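
Chunked processing can be sketched as follows for an index that is computed independently per respondent. The helper names `iter_chunks` and `row_sd` are hypothetical, and `row_sd` merely stands in for any row-wise index.

```python
from statistics import stdev


def iter_chunks(rows, chunk_size):
    """Yield successive slices of at most chunk_size respondents."""
    for start in range(0, len(rows), chunk_size):
        yield rows[start:start + chunk_size]


def row_sd(row):
    """Stand-in for a row-wise index: one respondent's sample standard deviation."""
    return stdev(row)


rows = [[1, 2, 3, 4], [2, 2, 2, 2], [4, 3, 2, 1], [1, 5, 1, 5], [3, 3, 4, 4]]

scores = []
for chunk in iter_chunks(rows, chunk_size=2):
    # Each chunk is processed on its own; results are appended in order
    scores.extend(row_sd(row) for row in chunk)

print(scores)
```

This only applies to indices that depend on a single respondent's answers. Sample-level statistics such as Mahalanobis distance rely on a mean and covariance estimated from all respondents, so chunk-by-chunk results would not match a full-sample computation.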

Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you use this package in your research, please cite:

@software{careless2024,
  title={Careless: Python package for detecting careless responding},
  author={Lyons, Cameron},
  year={2024},
  url={https://github.com/Cameron-Lyons/careless-py}
}

References

  • Curran, P. G. (2016). Methods for the detection of carelessly invalid responses in survey data. Journal of Experimental Social Psychology, 66, 4-19.
  • Dunn, A. M., Heggestad, E. D., Shanock, L. R., & Theilgard, N. (2018). Intra-individual response variability as an indicator of insufficient effort responding: Comparison to other indicators and relationships with individual differences. Journal of Business and Psychology, 33(1), 105-121.
