Python version of R careless package
Careless
A Python package for detecting careless responding in survey data using various statistical indices and methods.
Overview
When taking online surveys, participants sometimes respond to items without regard to their content. These types of responses, referred to as careless or insufficient effort responding, constitute significant problems for data quality, leading to distortions in data analysis and hypothesis testing, such as spurious correlations.
The careless package provides solutions designed to detect such careless/insufficient effort responses by allowing easy calculation of indices proposed in the literature. For a comprehensive review of these methods, see Curran (2016).
Features
- Multiple Detection Methods: Supports various indices for detecting careless responding
- Flexible Input: Works with lists, numpy arrays, and pandas DataFrames
- Robust Implementation: Handles missing data and edge cases, and provides comprehensive error handling
- Performance Optimized: Efficient algorithms for large datasets
- Comprehensive Documentation: Detailed docstrings with examples for all functions
Installation
From PyPI
pip install careless-py
From Source
git clone https://github.com/Cameron-Lyons/careless-py.git
cd careless-py
pip install -e .
Optional Dependencies
For enhanced functionality (e.g., advanced Mahalanobis distance methods), install with full dependencies:
pip install "careless-py[full]" # Quotes keep shells such as zsh from expanding the brackets
Using uv (Recommended for Development)
This project uses uv for fast, reproducible dependency management. Install uv first:
curl -LsSf https://astral.sh/uv/install.sh | sh
Then clone and install:
git clone https://github.com/Cameron-Lyons/careless-py.git
cd careless-py
uv sync --extra full # Install with all optional dependencies
For development with all dev tools:
uv sync --extra dev
Run commands in the virtual environment:
uv run pytest # Run tests
uv run ruff check . # Run linter
uv run mypy src/ # Run type checker
Quick Start
import numpy as np
from careless import evenodd, irv, longstring, mahad, psychsyn
# Sample survey data (rows = participants, columns = items)
data = np.array([
[1, 2, 3, 4, 5, 6, 7, 8], # Participant 1
[2, 2, 2, 2, 5, 5, 5, 5], # Participant 2 (suspicious pattern)
[3, 4, 3, 4, 6, 7, 6, 7], # Participant 3
])
# Check even-odd consistency
factors = [4, 4] # Two factors with 4 items each
consistency_scores = evenodd(data, factors)
print("Even-odd consistency scores:", consistency_scores)
# Check intra-individual response variability
irv_scores = irv(data)
print("IRV scores:", irv_scores)
# Check for long strings of identical responses
longest_strings = longstring(data)
print("Longest strings:", longest_strings)
Available Functions
Consistency Indices
evenodd(x, factors, diag=False)
Computes the Even-Odd Consistency Index by dividing unidimensional scales using an even-odd split.
Parameters:
- x: Input data (2D array/list) where rows are individuals and columns are responses
- factors: List of integers specifying the length of each factor
- diag: Boolean to return diagnostic values (number of valid correlations per individual)
Returns:
- Array of even-odd consistency scores (average correlations per individual)
- If diag=True: Tuple of (scores, diagnostic_values)
Example:
data = [[1, 2, 3, 4, 5, 6], [2, 3, 4, 5, 6, 7]]
factors = [4, 2] # First factor has 4 items, second has 2
scores = evenodd(data, factors)
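To make the even-odd idea concrete, here is a minimal NumPy sketch of the general technique: split each factor into odd- and even-positioned items, average each half, and correlate the two sets of half-scale means within each participant. The function name evenodd_sketch is hypothetical, and this is not the package's implementation; details such as missing-data handling or any reliability correction are deliberately left out.

```python
import numpy as np

def evenodd_sketch(x, factors):
    """Per-person correlation between odd-item and even-item subscale means.

    Illustrative only: assumes complete data and at least three factors.
    """
    x = np.asarray(x, dtype=float)
    # Column index at which each factor starts
    starts = np.concatenate(([0], np.cumsum(factors)[:-1]))
    odd_means, even_means = [], []
    for start, length in zip(starts, factors):
        block = x[:, start:start + length]
        odd_means.append(block[:, 0::2].mean(axis=1))   # 1st, 3rd, ... item
        even_means.append(block[:, 1::2].mean(axis=1))  # 2nd, 4th, ... item
    odd = np.column_stack(odd_means)    # shape: (participants, factors)
    even = np.column_stack(even_means)
    scores = np.full(x.shape[0], np.nan)
    for i in range(x.shape[0]):
        # A correlation is undefined when either half has no variance
        if np.std(odd[i]) > 0 and np.std(even[i]) > 0:
            scores[i] = np.corrcoef(odd[i], even[i])[0, 1]
    return scores

responses = [
    [1, 2, 1, 2, 5, 4, 5, 5, 3, 3, 2, 3],  # halves agree within each factor
    [1, 5, 1, 5, 5, 1, 5, 1, 3, 3, 3, 3],  # halves contradict each other
    [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3],  # no variance, score undefined
]
sketch_scores = evenodd_sketch(responses, [4, 4, 4])
```

With only two factors the within-person correlation is based on two points and can only be +1, -1, or undefined, so the index is most informative when several factors are available.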
psychsyn(x, pairs, method='synonyms', seed=None)
Computes the Psychometric Synonyms/Antonyms Index based on correlated item pairs.
Parameters:
- x: Input data (2D array/list)
- pairs: List of item pairs [(item1, item2), ...]
- method: 'synonyms' or 'antonyms'
- seed: Random seed for reproducibility
Returns:
- Array of psychometric synonym/antonym scores
Example:
data = [[1, 2, 3, 4], [2, 3, 4, 5]]
pairs = [(0, 1), (2, 3)] # Item pairs to correlate
scores = psychsyn(data, pairs, method='synonyms')
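As an illustration of the underlying idea, the sketch below computes a within-person correlation across a set of item pairs using plain NumPy. The name psychsyn_sketch is hypothetical and the code is not taken from the package; it assumes complete data and zero-based column indices in pairs.

```python
import numpy as np

def psychsyn_sketch(x, pairs):
    """Within-person correlation between the first and second item of each pair."""
    x = np.asarray(x, dtype=float)
    first = x[:, [a for a, _ in pairs]]    # responses to the first item of each pair
    second = x[:, [b for _, b in pairs]]   # responses to the second item of each pair
    scores = np.full(x.shape[0], np.nan)
    for i in range(x.shape[0]):
        # Skip participants whose responses have no variance on either side
        if np.std(first[i]) > 0 and np.std(second[i]) > 0:
            scores[i] = np.corrcoef(first[i], second[i])[0, 1]
    return scores

pair_data = [
    [1, 1, 3, 3, 5, 5],  # paired items answered alike
    [1, 5, 3, 3, 5, 1],  # paired items answered in opposite directions
]
pair_scores = psychsyn_sketch(pair_data, [(0, 1), (2, 3), (4, 5)])
```

For synonym pairs, an attentive participant is expected to score high; for antonym pairs, the expected sign is reversed.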
psychant(x, pairs, seed=None)
Convenience wrapper for psychsyn that computes psychological antonyms.
Response Pattern Functions
longstring(x, avg=False)
Computes the longest (and optionally, average) length of consecutive identical responses.
Parameters:
- x: Input data (2D array/list)
- avg: Boolean to also return average string length
Returns:
- Array of longest string lengths per individual
- If avg=True: Tuple of (longest_strings, average_strings)
Example:
data = [[1, 1, 1, 2, 3], [1, 2, 3, 4, 5]]
longest, avg = longstring(data, avg=True)
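For readers who want to see what "longest string" means operationally, here is a small run-length sketch for a single participant. The helper name longstring_sketch is hypothetical, and the package may treat edge cases (for example missing values) differently; in this sketch every NaN counts as its own run because NaN never equals NaN.

```python
import numpy as np

def longstring_sketch(row):
    """Return (longest run, average run length) of identical consecutive responses."""
    row = np.asarray(row)
    if row.size == 0:
        return 0, 0.0
    # Positions where the response differs from the previous one start a new run
    change = np.flatnonzero(row[1:] != row[:-1]) + 1
    boundaries = np.concatenate(([0], change, [row.size]))
    run_lengths = np.diff(boundaries)
    return int(run_lengths.max()), float(run_lengths.mean())

longest_a, average_a = longstring_sketch([1, 1, 1, 2, 3])  # runs of length 3, 1, 1
longest_b, average_b = longstring_sketch([1, 2, 3, 4, 5])  # five runs of length 1
```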
irv(x, consecutive=None)
Computes the Intra-individual Response Variability (IRV), the standard deviation of responses across consecutive items.
Parameters:
- x: Input data (2D array/list)
- consecutive: Number of consecutive items to analyze (default: all items)
Returns:
- Array of IRV scores per individual
Example:
data = [[1, 2, 3, 4, 5], [1, 1, 1, 1, 1]]
irv_scores = irv(data)
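The core of IRV is a row-wise standard deviation, which can be sketched in one line of NumPy. The name irv_sketch is hypothetical, and the use of the sample standard deviation (ddof=1) is an assumption rather than a statement about the package; the consecutive parameter is not modeled here.

```python
import numpy as np

def irv_sketch(x, ddof=1):
    """Row-wise standard deviation, ignoring NaN values."""
    x = np.asarray(x, dtype=float)
    return np.nanstd(x, axis=1, ddof=ddof)

irv_values = irv_sketch([[1, 2, 3, 4, 5], [1, 1, 1, 1, 1]])
```

A value of zero, as for the second row, means the participant gave the same answer to every item.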
Statistical Outlier Functions
mahad(x, method='classic', threshold=None, **kwargs)
Computes Mahalanobis Distance to identify multivariate outliers.
Parameters:
- x: Input data (2D array/list)
- method: Detection method ('classic', 'robust', 'mcd', 'mve')
- threshold: Custom threshold for outlier detection
- **kwargs: Additional method-specific parameters
Returns:
- Array of Mahalanobis distances per individual
- If threshold is provided: Tuple of (distances, outlier_flags)
Example:
data = [[1, 2, 3], [4, 5, 6], [1, 1, 1]]
distances, outliers = mahad(data, method='robust', threshold=0.95)
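As a point of reference, the classic (non-robust) Mahalanobis distance can be sketched with NumPy alone. The function name mahalanobis_sketch is hypothetical, the sketch returns squared distances, and it uses a pseudo-inverse so that a singular covariance matrix does not raise; none of these choices is claimed to match what mahad does internally.

```python
import numpy as np

def mahalanobis_sketch(x):
    """Squared Mahalanobis distance of each row from the sample centroid."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean(axis=0)
    cov = np.cov(x, rowvar=False)   # sample covariance (ddof=1)
    inv_cov = np.linalg.pinv(cov)   # pseudo-inverse tolerates a singular covariance
    # Quadratic form c_i^T S^-1 c_i evaluated for every row i at once
    return np.einsum("ij,jk,ik->i", centered, inv_cov, centered)

sample = [[1, 2], [2, 3], [3, 4.5], [4, 5], [5, 6.2], [1, 6]]
d2 = mahalanobis_sketch(sample)   # the last row departs from the common trend
```

The covariance estimate needs clearly more participants than items to be stable, so very small samples give distances that are hard to interpret.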
Advanced Usage
Working with Different Data Types
The package supports various input formats:
import numpy as np
import pandas as pd
from careless import evenodd
# Numpy arrays
data_np = np.array([[1, 2, 3], [4, 5, 6]])
# Lists
data_list = [[1, 2, 3], [4, 5, 6]]
# Pandas DataFrames
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])
data_df = df.to_numpy() # Convert to a NumPy array
# All work the same way
scores = evenodd(data_np, [3])
Handling Missing Data
The functions handle missing data (NaN values) appropriately:
import numpy as np
from careless import evenodd
data_with_nans = np.array([
[1, 2, np.nan, 4],
[np.nan, 2, 3, 4],
[1, 2, 3, 4]
])
# Functions will handle NaN values appropriately
scores = evenodd(data_with_nans, [4])
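Independently of how the package treats NaN values internally, it can be useful to check how much is missing per participant before interpreting any index. The following is a small NumPy-only sketch; the variable names are illustrative and nothing here calls the package.

```python
import numpy as np

survey = np.array([
    [1, 2, np.nan, 4],
    [np.nan, 2, 3, 4],
    [1, 2, 3, 4],
])

# Number of unanswered items per participant
missing_per_row = np.isnan(survey).sum(axis=1)

# NaN-aware summary: mean of the answered items only
row_means = np.nanmean(survey, axis=1)
```

Participants with many missing items yield indices based on very few responses, so a missing-count cutoff applied beforehand is a common precaution.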
Custom Thresholds and Parameters
# Assumes data is a 2D response array and that mahad, irv, and psychsyn are imported from careless
# Custom Mahalanobis distance threshold
distances, outliers = mahad(data, threshold=0.99)
# Custom IRV analysis on consecutive items
irv_scores = irv(data, consecutive=5)
# Psychometric synonyms with custom pairs
pairs = [(0, 1), (2, 3), (4, 5)]
syn_scores = psychsyn(data, pairs, method='synonyms')
Performance Considerations
- Large Datasets: For datasets with >10,000 participants, consider processing in chunks
- Memory Usage: Functions create copies of input data for processing
- Parallel Processing: Consider using multiprocessing for very large datasets
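The chunking suggestion can be sketched as follows. The helper process_in_chunks and the stand-in index row_sd are hypothetical names, not part of the package. Chunking is only valid for indices computed independently per participant (such as a row-wise standard deviation or longest string); it is not appropriate for Mahalanobis distance, which depends on the mean and covariance of the whole sample.

```python
import numpy as np

def process_in_chunks(x, func, chunk_size=10_000):
    """Apply a per-participant index function to row blocks and stitch the results."""
    x = np.asarray(x)
    parts = [
        func(x[start:start + chunk_size])
        for start in range(0, x.shape[0], chunk_size)
    ]
    if not parts:          # no participants at all
        return np.empty(0)
    return np.concatenate(parts)

def row_sd(block):
    """Stand-in for any row-wise index: sample standard deviation per participant."""
    return np.std(block, axis=1, ddof=1)

rng = np.random.default_rng(0)
big = rng.integers(1, 6, size=(25, 8))   # 25 participants, 8 items
chunked = process_in_chunks(big, row_sd, chunk_size=10)
```

The same helper could be combined with multiprocessing by mapping func over the row blocks in a worker pool instead of a list comprehension.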
Contributing
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Citation
If you use this package in your research, please cite:
@software{careless2024,
title={Careless: Python package for detecting careless responding},
author={Lyons, Cameron},
year={2024},
url={https://github.com/Cameron-Lyons/careless-py}
}
References
- Curran, P. G. (2016). Methods for the detection of carelessly invalid responses in survey data. Journal of Experimental Social Psychology, 66, 4-19.
- Dunn, A. M., Heggestad, E. D., Shanock, L. R., & Theilgard, N. (2018). Intra-individual response variability as an indicator of insufficient effort responding: Comparison to other indicators and relationships with individual differences. Journal of Business and Psychology, 33(1), 105-121.