Behavioral Signal Analysis for User Understanding - Detect bots, shared accounts, and UI confusion
PyRevealed
A Python implementation of revealed preference theory.
Based on: Chambers, C. P., & Echenique, F. (2016). Revealed Preference Theory. Cambridge University Press.
What is this?
Given a history of user choices and the options available at each choice, PyRevealed computes:
- Consistency scores: How internally consistent is this user's behavior? (0 = random, 1 = perfectly consistent)
- Preference recovery: If consistent, what utility function explains their choices?
- Exploitability metrics: How much could be extracted from a user via arbitrage on their inconsistencies?
- Feature independence: Are choices over group A independent of choices over group B?
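To make "internally consistent" concrete, here is a minimal sketch of a GARP (Generalized Axiom of Revealed Preference) check. It is an illustration of the textbook definition, not PyRevealed's own code: the function name `garp_consistent` and the `tol` parameter are assumptions, and plain Python lists stand in for NumPy arrays to keep the sketch dependency-free.

```python
def garp_consistent(prices, quantities, tol=1e-9):
    """Hypothetical helper: True if the observations satisfy GARP."""
    T = len(prices)

    def cost(p, x):
        return sum(pi * xi for pi, xi in zip(p, x))

    # spend[t][s] = cost of bundle s at the prices of observation t
    spend = [[cost(prices[t], quantities[s]) for s in range(T)] for t in range(T)]

    # Direct revealed preference: bundle t was chosen while bundle s was affordable.
    R = [[spend[t][t] >= spend[t][s] - tol for s in range(T)] for t in range(T)]

    # Transitive closure of the relation (Warshall's algorithm).
    for k in range(T):
        for i in range(T):
            if R[i][k]:
                for j in range(T):
                    if R[k][j]:
                        R[i][j] = True

    # Violation: t is revealed preferred to s, yet at observation s
    # bundle t was strictly cheaper than the bundle actually chosen.
    for t in range(T):
        for s in range(T):
            if R[t][s] and spend[s][s] > spend[s][t] + tol:
                return False
    return True


# Neither bundle was affordable when the other was chosen: consistent.
print(garp_consistent([[1, 2], [2, 1]], [[3, 1], [1, 3]]))  # True
# Each bundle was chosen while the other was strictly cheaper: inconsistent.
print(garp_consistent([[2, 1], [1, 2]], [[3, 1], [1, 3]]))  # False
```

The transitive closure is what separates GARP from a purely pairwise check: a contradiction may only show up along a chain of several observations.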
Installation
pip install pyrevealed
For visualization support:
pip install "pyrevealed[viz]"
Quick Start
```python
from pyrevealed import BehaviorLog, validate_consistency, compute_integrity_score, compute_confusion_metric
import numpy as np

# Create a behavior log from observed choices
log = BehaviorLog(
    cost_vectors=np.array([    # Prices at each observation (T x N)
        [1.0, 2.0],            # Observation 0: price of good A=1, B=2
        [2.0, 1.0],            # Observation 1: price of good A=2, B=1
    ]),
    action_vectors=np.array([  # Quantities chosen (T x N)
        [3.0, 1.0],            # Observation 0: bought 3 of A, 1 of B
        [1.0, 3.0],            # Observation 1: bought 1 of A, 3 of B
    ])
)

# Test consistency (GARP)
is_consistent = validate_consistency(log)
print(f"Consistent: {is_consistent}")

# Compute integrity score (Afriat Efficiency Index)
integrity = compute_integrity_score(log)
print(f"Integrity Score: {integrity:.3f}")

# Compute confusion metric (Money Pump Index)
confusion = compute_confusion_metric(log)
print(f"Confusion Metric: {confusion:.3f}")
```
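As a hand check on the Quick Start data, the sketch below computes what each chosen bundle would have cost at each observation's prices. It uses only plain Python and the same numbers as above; the variable names are illustrative and not part of PyRevealed.

```python
prices = [[1.0, 2.0], [2.0, 1.0]]   # same prices as the Quick Start log
bundles = [[3.0, 1.0], [1.0, 3.0]]  # same quantities as the Quick Start log

# cross_cost[t][s] = cost of the bundle chosen at observation s,
# evaluated at the prices of observation t.
cross_cost = [
    [sum(p * x for p, x in zip(price_row, bundle)) for bundle in bundles]
    for price_row in prices
]

for row in cross_cost:
    print(row)
# [5.0, 7.0]
# [7.0, 5.0]
```

In each row the diagonal entry (what the user actually spent, 5.0) is smaller than the off-diagonal entry (7.0), so the other bundle was never affordable. No choice reveals a preference over the other, which is why this data contains no GARP violation.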
Available Tests & Scores
Yes/No Tests
| Method | Question it answers |
|---|---|
| `validate_consistency(log)` | Is this user rational? (no self-contradicting choices) |
| `validate_consistency_weak(log)` | Any obvious flip-flops? (picked A over B, then B over A) |
| `validate_smooth_preferences(log)` | Smooth preferences? (needed for price sensitivity analysis) |
| `validate_strict_consistency(log)` | Approximately rational? (ignores minor contradictions) |
| `validate_price_preferences(log)` | Does user prefer situations where their items are cheaper? |
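The "flip-flop" test in the table corresponds to a pairwise check in the spirit of the Weak Axiom of Revealed Preference (WARP). The following is a minimal sketch under that assumption, not the library's implementation; `warp_consistent` and `tol` are hypothetical names, and the data are made up for illustration.

```python
def warp_consistent(prices, quantities, tol=1e-9):
    """Hypothetical helper: True if no pair of observations flip-flops."""
    def cost(p, x):
        return sum(pi * xi for pi, xi in zip(p, x))

    T = len(prices)
    for t in range(T):
        for s in range(t + 1, T):
            if quantities[t] == quantities[s]:
                continue  # identical bundles cannot contradict each other
            # t chosen while s was affordable, and s chosen while t was affordable
            t_over_s = cost(prices[t], quantities[t]) >= cost(prices[t], quantities[s]) - tol
            s_over_t = cost(prices[s], quantities[s]) >= cost(prices[s], quantities[t]) - tol
            if t_over_s and s_over_t:
                return False
    return True


# A direct flip-flop between two observations is caught.
print(warp_consistent([[2, 1], [1, 2]], [[3, 1], [1, 3]]))  # False

# A three-way cycle (0 over 1, 1 over 2, 2 over 0) has no pairwise
# contradiction, so this weaker check does not flag it.
cyclic_prices = [[2, 1, 3], [3, 2, 1], [1, 3, 2]]
cyclic_bundles = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(warp_consistent(cyclic_prices, cyclic_bundles))  # True
```

The second dataset shows why a pairwise test is only a first screen: longer preference cycles need a transitive check such as GARP.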
Scores (0 to 1)
| Method | What it measures |
|---|---|
| `compute_integrity_score(log)` | How consistent is this user? (higher = more consistent) |
| `compute_confusion_metric(log)` | How exploitable via pricing tricks? (lower = safer) |
| `compute_minimal_outlier_fraction(log)` | Fraction of observations to remove for consistency |
| `compute_test_power(log)` | Statistical power of consistency test |
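For intuition about the integrity score, the Afriat Efficiency Index can be approximated by bisection: find the largest factor `e` in [0, 1] such that the data pass GARP once every budget is shrunk to `e` times actual spending. The sketch below is one standard way to do this and is not taken from PyRevealed; `garp_holds_at`, `efficiency_index`, `tol`, and `iterations` are hypothetical names, and strictly positive spending is assumed.

```python
def _cost(p, x):
    return sum(pi * xi for pi, xi in zip(p, x))


def garp_holds_at(prices, quantities, e, tol=1e-9):
    """GARP check with every budget deflated by the efficiency level e."""
    T = len(prices)
    spend = [[_cost(prices[t], quantities[s]) for s in range(T)] for t in range(T)]
    # t is revealed preferred to s if s fits inside e times the budget at t.
    R = [[e * spend[t][t] >= spend[t][s] - tol for s in range(T)] for t in range(T)]
    for k in range(T):  # transitive closure (Warshall's algorithm)
        for i in range(T):
            if R[i][k]:
                for j in range(T):
                    if R[k][j]:
                        R[i][j] = True
    return not any(
        R[t][s] and e * spend[s][s] > spend[s][t] + tol
        for t in range(T)
        for s in range(T)
    )


def efficiency_index(prices, quantities, iterations=50):
    """Approximate the largest e in [0, 1] at which the data pass GARP."""
    if garp_holds_at(prices, quantities, 1.0):
        return 1.0
    lo, hi = 0.0, 1.0  # invariant: consistent at lo, inconsistent at hi
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if garp_holds_at(prices, quantities, mid):
            lo = mid
        else:
            hi = mid
    return lo


print(efficiency_index([[1, 2], [2, 1]], [[3, 1], [1, 3]]))            # 1.0
print(round(efficiency_index([[2, 1], [1, 2]], [[3, 1], [1, 3]]), 3))  # 0.714
```

In the second dataset each bundle cost 7 while the alternative cost 5, so the contradiction disappears once budgets are scaled down to 5/7 of actual spending.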
Preference Structure
| Method | Question it answers |
|---|---|
| `validate_proportional_scaling(log)` | Do they buy the same mix regardless of budget size? |
| `test_income_invariance(log)` | Does budget size affect what they choose? |
| `test_feature_independence(log, [a], [b])` | Are choices in group A separate from group B? |
| `test_cross_price_effect(log, item1, item2)` | Are these items substitutes or complements? |
| `transform_to_characteristics(log, A)` | Analyze by attributes (nutrition, specs) not products |
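The idea behind analyzing attributes rather than products (the Lancaster characteristics model) is to map each purchased bundle into the attributes it delivers. The sketch below shows only that mapping. The matrix orientation (rows = characteristics, columns = goods), the numbers, and the name `to_characteristics` are assumptions made for illustration, and may differ from what `transform_to_characteristics` expects.

```python
def to_characteristics(quantities, A):
    """Hypothetical helper: map goods bundles to characteristics bundles.

    quantities: T x N list of bundles (units of each good).
    A:          K x N list, A[k][n] = amount of characteristic k per unit of good n.
    Returns a T x K list of characteristics totals.
    """
    return [
        [sum(a_kn * x_n for a_kn, x_n in zip(row, bundle)) for row in A]
        for bundle in quantities
    ]


# Made-up attribute content: characteristic 0 is present in both goods,
# characteristic 1 only in the second good.
A = [
    [2.0, 1.0],
    [0.0, 4.0],
]
quantities = [[3.0, 1.0], [1.0, 3.0]]

print(to_characteristics(quantities, A))
# [[7.0, 4.0], [5.0, 12.0]]
```

Consistency tests can then be run in characteristics space, which is useful when the set of products changes over time but the underlying attributes do not.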
Case Study
See DUNNHUMBY.md for a real-world validation on 2,222 households from the Dunnhumby grocery dataset.
Key findings: 4.5% fully consistent, mean integrity 0.839, test power 0.845.
Project Structure
```
pyrevealed/
├── src/pyrevealed/
│   ├── auditor.py                  # BehavioralAuditor class
│   ├── encoder.py                  # PreferenceEncoder class
│   ├── lancaster.py                # Lancaster characteristics model
│   ├── algorithms/                 # Core algorithms
│   ├── core/                       # Data containers
│   ├── graph/                      # Graph algorithms
│   └── viz/                        # Visualization
├── tests/                          # Unit tests
├── dunnhumby/                      # Real-world validation suite
│   ├── run_all.py                  # Main test runner
│   ├── extended_analysis.py        # Statistical analyses
│   ├── comprehensive_analysis.py   # MPI, WARP, separability
│   ├── advanced_analysis.py        # Complementarity, stress tests
│   ├── encoder_analysis.py         # Auto-discovery, Houtman-Maks
│   ├── predictive_analysis.py      # Split-sample LightGBM
│   ├── lancaster_analysis.py       # Lancaster characteristics model
│   └── data/                       # Kaggle dataset (download required)
├── docs/images/                    # README visualizations
├── notebooks/                      # Tutorials
└── examples/                       # Advanced usage examples
```
License
MIT