Analysis toolkit for EYE-LEAN (Locomotion, Exploration, Action, and Navigation with Eye Tracking) — Python half of the Eye_lean VR behavioral-research toolkit
eyelean_analysis — Python Researcher Guide
The Python half of Eye_lean — EYE-LEAN (Locomotion, Exploration, Action, and Navigation with Eye Tracking): a Behavioral Research Toolkit for Data Rich Virtual Reality Experiments. Loads the CSVs written by the Unity toolkit and provides signal processing, eye-movement classification, attention and cognitive-load metrics, per-phase reports for the sample experiment, post-hoc calibration correction, and batch processing across many recordings.
This document is the canonical "I have a recording, what do I do with it?" walkthrough. It mirrors the Unity-side RESEARCHER_GUIDE.md, which covers recording and replay; this guide picks up where that one ends.
Contents
- Concepts at a glance
- Installation
- Quick start
- The data model
- Loading and inspection
- Scene-state and event sidecars (v1.2+)
- Eye-movement classification
- Pupillary cognitive load (LHIPA)
- Data quality metrics
- Per-task analysis recipes (sample experiment)
- Custom experiments
- Post-hoc calibration correction
- Batch processing
- Performance notes
- Troubleshooting
- API reference
- References
Concepts at a glance
A live Eye_lean session writes one to three CSV files into
Logs/ (or, when running from the editor against the bundled sample,
Assets/StreamingAssets/):
| File | Always present | Contents |
|---|---|---|
| EyeTracking_<session>.csv | yes | Per-frame gaze, pupil, head pose, vergence, validity flags. |
| EyeTracking_<session>_SceneState.csv | when scene recording is enabled | Per-frame transforms (pose + active flag) for every Recordable in the scene. |
| EyeTracking_<session>_SceneEvents.csv | when scene recording is enabled | Discrete events (Spawn/Despawn, Config*, ShowInstruction, CountingAnswerSelected, ChangeDetectionFeedback, RandomStateSnapshot, …). |
Plus optional sidecar JSON:
- experiment_results_<session>.json: per-trial outcome records from the sample experiment (target acquisition times, counting answers, change-detection accuracy).
- <HMD>_<timestamp>.json: EyeTrackingProfile snapshot the calibrator saved. Used for post-hoc calibration correction.
The Python package treats the main CSV as authoritative and the sidecars as optional joins.
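Because the sidecars follow a fixed naming convention, their expected paths can be derived from the main CSV path. The sketch below uses only the standard library; `sidecar_paths` is a hypothetical helper written for illustration (not part of the package), and the session name `demo` is a placeholder.

```python
from pathlib import Path

def sidecar_paths(main_csv):
    """Derive the expected sidecar paths from a main CSV path.

    Hypothetical helper, based only on the naming convention in the table.
    """
    main = Path(main_csv)
    stem = main.stem  # e.g. "EyeTracking_<session>"
    return {
        "scene_state": main.with_name(f"{stem}_SceneState.csv"),
        "scene_events": main.with_name(f"{stem}_SceneEvents.csv"),
    }

paths = sidecar_paths("Logs/EyeTracking_demo.csv")
# Sidecars are optional, so check for existence before trying to load them.
available = {name: p for name, p in paths.items() if p.exists()}
```

Filtering on `exists()` reflects the "optional joins" stance: an analysis should still run when only the main CSV is present.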
Installation
Python 3.9+ is supported.
pip install -e ./eyelean_analysis # core
pip install -e "./eyelean_analysis[test]" # core + pytest + PyWavelets
pip install -e "./eyelean_analysis[all]" # everything (visualization, jupyter, batch, …)
Optional extras declared in pyproject.toml:
| Extra | Adds | Needed for |
|---|---|---|
| wavelet | PyWavelets | LHIPA |
| visualization | matplotlib | The plotting helpers |
| batch | joblib, tqdm | BatchProcessor parallel + progress |
| jupyter | jupyter, jupyterlab, ipywidgets | The example notebooks |
| export | pyreadstat | SPSS export |
| test | pytest, PyWavelets | Running the test suite |
scipy, numpy, pandas are all in the core install.
Quick start
The fastest path is the 9-notebook plug-and-play suite under
notebooks/examples/ — see the
notebook index for what each one covers. Every
notebook auto-discovers your data via one line:
ctx = ela.notebook_context() # arg → env → Logs/ → bundled sample
print(ctx) # see what got resolved
Run on a fresh checkout:
jupyter lab eyelean_analysis/notebooks/examples/01_quickstart.ipynb
The bundled v1.2 sample at
Eye_lean_Unity_Project/Eye_lean/Assets/StreamingAssets/ is the
fallback, so every notebook works end-to-end before you've recorded
anything yourself.
Without a notebook, a few lines get you from a recording to a scanpath plot:
import eyelean_analysis as ela
import matplotlib.pyplot as plt
data = ela.load_eyetracking("Logs/EyeTracking_<session>.csv")
gaze = data.compute_gaze_points(distance=2.0)
ela.create_trajectory_plot(gaze["gaze_x"], gaze["gaze_z"])
plt.show()
The data model
load_eyetracking(path) returns an EyeLeanData wrapper around a
pandas DataFrame. Two key conventions matter when interpreting columns:
Column aliasing
The Unity toolkit has emitted at least two CSV schemas across its
lifetime (Flask-app era and the current Unity-direct era). The loader
maps both to a single canonical schema before handing the DataFrame
back. Canonical names are snake_case
(combined_dir_x, head_pos_y, left_pupil_diameter, …); the full
map is eyelean_analysis.COLUMN_ALIASES.
If standardize_columns=True (the default), the DataFrame columns are
the canonical names. Pass standardize_columns=False to keep the
original column casing.
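As a rough illustration of how a canonical→variants map can drive column renaming, the sketch below inverts such a map and builds a rename dictionary. It is not the loader's actual code: `build_rename_map` is a hypothetical helper, and the alias entries (in particular `headPosY`) are made up for the example.

```python
def build_rename_map(aliases, columns):
    """Map each recognized variant in `columns` to its canonical name.

    `aliases` has the canonical -> [variants] shape described above.
    Columns with no known alias are left out, so they pass through untouched.
    """
    reverse = {
        variant: canonical
        for canonical, variants in aliases.items()
        for variant in variants
    }
    return {col: reverse[col] for col in columns if col in reverse}

# Illustrative alias entries only; the real map is COLUMN_ALIASES.
aliases = {
    "head_pos_y": ["HeadPos_Y", "headPosY"],
    "combined_dir_x": ["CombinedDir_X"],
}
rename = build_rename_map(aliases, ["HeadPos_Y", "CombinedDir_X", "MyMetric"])
```

A dictionary of this shape can be passed to pandas' `DataFrame.rename(columns=rename)`; unknown columns such as `MyMetric` keep their original names.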
Coordinate normalization
HeadPos_* and the per-eye *Origin_* columns are written
normalized to the trial-start world position, not in absolute
world space. The normalization origin is recorded in the metadata
header as # CoordinateOrigin: x,y,z. To recover absolute world-space
positions, add that origin back to the normalized columns:
meta = ela.read_csv_metadata("Logs/EyeTracking_<session>.csv")
ox, oy, oz = (float(c) for c in meta["CoordinateOrigin"].split(","))
abs_head_x = data.df["head_pos_x"] + ox
CombinedDir_* is a unit direction vector in world space. Because it is a direction rather than a position, the coordinate-origin offset does not apply to it.
Timestamps
UnityTimestamp is Time.realtimeSinceStartup at sample acquisition,
in seconds since process start. It's monotonic and dense at the eye
tracker's native rate (90 Hz on VIVE Focus Vision). Use
data.get_timestamps() to access it as a NumPy array;
data.get_sample_rate() returns the median-derived rate.
For the full schema, see the top-level DATA_SCHEMA.md.
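A "median-derived" rate is robust to occasional dropped samples, because a handful of long intervals does not move the median. The sketch below shows one way to compute such an estimate with the standard library; whether `get_sample_rate()` uses exactly this formula is an assumption, and `median_sample_rate` is a hypothetical name.

```python
from statistics import median

def median_sample_rate(timestamps):
    """Estimate the sample rate in Hz as 1 / median inter-sample interval."""
    intervals = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if not intervals:
        raise ValueError("need at least two timestamps")
    return 1.0 / median(intervals)

# Synthetic 90 Hz timestamps with one dropped sample (index 5 is missing).
ts = [i / 90.0 for i in range(10) if i != 5]
rate = median_sample_rate(ts)
```

A mean-based estimate on the same data would be pulled below 90 Hz by the single double-length interval.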
Loading and inspection
import eyelean_analysis as ela
data = ela.load_eyetracking("Logs/EyeTracking_<session>.csv")
print(data.summary()) # dict: n_samples, sample_rate, phases, …
print(data.duration, "s")
print(data.get_phases()) # ['Recording'] or ['FreeExploration', …]
# Direct DataFrame access for arbitrary slicing
df = data.df
free_explore = df[df["phase"] == "FreeExploration"]
The metadata header above the column row carries information that can't be re-derived from the data alone:
meta = ela.read_csv_metadata("Logs/EyeTracking_<session>.csv")
print(meta.get("Profile", "none")) # which calibration profile
# was active at recording
print(meta["CoordinateOrigin"]) # de-normalization origin
print(meta.get("FileVersion")) # schema version
Use the parsed Profile value, rather than reading the # Profile: line out of the raw file by hand, to decide
whether a recording needs post-hoc correction.
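To make the header format concrete, here is a minimal sketch of parsing leading `# Key: value` lines into a dictionary. It only illustrates the format; it is not the implementation of `read_csv_metadata`, and the header values in the sample are invented.

```python
import io

def parse_header(lines):
    """Collect leading '# Key: value' lines into a dict (illustrative only)."""
    meta = {}
    for line in lines:
        if not line.startswith("#"):
            break  # the column row ends the metadata header
        key, sep, value = line[1:].partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta

# Made-up header content for illustration.
sample = io.StringIO(
    "# FileVersion: 1.2\n"
    "# CoordinateOrigin: 0.5,1.6,-2.0\n"
    "# Profile: none\n"
    "Frame,UnityTimestamp\n"
)
meta = parse_header(sample)
origin = tuple(float(c) for c in meta["CoordinateOrigin"].split(","))
```

Splitting on the first colon only keeps values that themselves contain colons intact.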
Scene-state and event sidecars (v1.2+)
When scene recording is enabled in Unity (default for the sample experiment), each session also writes two sidecars next to the main CSV. The package loads them with the same metadata-header tolerance:
state = ela.load_scene_state("Logs/EyeTracking_<session>_SceneState.csv")
events = ela.load_scene_events(
"Logs/EyeTracking_<session>_SceneEvents.csv",
decode_config=True, # base64-decode the Detail of Config* events
)
Or load both at once given the main CSV path:
state, events = ela.load_scene_sidecars(
"Logs/EyeTracking_<session>.csv",
decode_config=True,
)
state is one row per (frame, recordable) pair. events is one row
per discrete event; the Detail column is free-text for most types
and base64-encoded JSON for Config* events (set decode_config=True
to materialize a Config dict column).
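If you need to decode a Config* payload by hand (for example outside the package), the operation is a base64 decode followed by a JSON parse. The sketch below assumes the standard base64 alphabet and UTF-8 text, neither of which is confirmed by this guide; the payload fields are invented.

```python
import base64
import json

def decode_config_detail(detail):
    """Decode a base64-encoded JSON payload into a dict.

    Assumes standard base64 and UTF-8; adjust if the recorder differs.
    """
    return json.loads(base64.b64decode(detail).decode("utf-8"))

# Round-trip a made-up payload; the field names are hypothetical.
payload = {"targetCount": 3, "seed": 42}
encoded = base64.b64encode(json.dumps(payload).encode("utf-8")).decode("ascii")
config = decode_config_detail(encoded)
```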
To answer "which Recordable was the participant gazing at on a given
frame?", join the gaze DataFrame with scene-state on Frame:
joined = ela.merge_gaze_with_scene_state(
data.df, state,
object_id=None, # all recordables; or pass a stable id
)
# `joined` has gaze columns + Pos_X/Pos_Y/Pos_Z + ObjectId for every
# (sample, recordable-present-that-frame) pair.
ObjectId is a stable hash from Unity's seeded-id system: it's
identical across replay runs of the same recording, so analyses that
key on an object id are deterministic.
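The merge supplies object positions alongside each gaze sample; deciding which object the gaze actually points at still needs a geometric test. One simple option, sketched below, is the angle between the gaze direction and the direction from the gaze origin to each object. The function names and the 5-degree tolerance are assumptions, and the gaze origin and object positions must be expressed in the same coordinate frame (keep the origin normalization described earlier in mind).

```python
import math

def angle_to_object_deg(origin, gaze_dir, obj_pos):
    """Angle in degrees between a gaze ray and the direction to an object."""
    to_obj = [p - o for p, o in zip(obj_pos, origin)]
    norm_obj = math.sqrt(sum(c * c for c in to_obj))
    norm_gaze = math.sqrt(sum(c * c for c in gaze_dir))
    if norm_obj == 0 or norm_gaze == 0:
        return float("nan")
    cos_a = sum(g * t for g, t in zip(gaze_dir, to_obj)) / (norm_gaze * norm_obj)
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_a))))

def nearest_object(origin, gaze_dir, objects, max_angle_deg=5.0):
    """Return the id of the object closest to the gaze ray, or None."""
    best_id, best_angle = None, max_angle_deg
    for obj_id, pos in objects.items():
        angle = angle_to_object_deg(origin, gaze_dir, pos)
        if angle <= best_angle:  # NaN comparisons are False, so NaN is skipped
            best_id, best_angle = obj_id, angle
    return best_id

# Hypothetical object ids and positions.
objects = {"cube": (0.0, 0.0, 2.0), "sphere": (2.0, 0.0, 2.0)}
hit = nearest_object((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), objects)
```

An angular test ignores object size and occlusion; for large or overlapping objects a proper ray cast against colliders is the more faithful choice.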
Eye-movement classification
detect_eye_movements runs an I-VT (velocity-threshold) classifier on
angular gaze data. Inputs are angles in degrees, not direction
vectors:
import numpy as np
dx = data.df["combined_dir_x"].to_numpy()
dy = data.df["combined_dir_y"].to_numpy()
dz = data.df["combined_dir_z"].to_numpy()
yaw_deg = np.degrees(np.arctan2(dx, dz))
pitch_deg = -np.degrees(np.arcsin(np.clip(dy, -1.0, 1.0)))
mv = ela.detect_eye_movements(yaw_deg, pitch_deg, data.get_timestamps())
print(len(mv["fixations"]), "fixations,",
len(mv["saccades"]), "saccades")
Each fixation has start_time, end_time, duration, centroid_x,
centroid_y. Each saccade has amplitude, peak_velocity,
mean_velocity. Defaults follow the I-VT literature
(50 deg/s threshold, 75 ms merge window, 100 ms minimum fixation);
override via velocity_threshold=, min_fixation_duration=,
merge_threshold= keyword args.
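For readers new to I-VT, the core idea fits in a few lines: compute sample-to-sample angular velocity, then group consecutive samples below the threshold into fixations. The sketch below is a simplified illustration, not the package's classifier: it omits the merge step, approximates angular distance with a flat yaw/pitch difference, and `ivt_fixations` is a hypothetical name.

```python
import math

def ivt_fixations(x_deg, y_deg, t, velocity_threshold=50.0,
                  min_fixation_duration=0.1):
    """Group consecutive sub-threshold samples into fixations (I-VT sketch)."""
    # Point-to-point angular velocity in deg/s; the first sample gets 0.
    vel = [0.0]
    for i in range(1, len(t)):
        dt = t[i] - t[i - 1]
        dist = math.hypot(x_deg[i] - x_deg[i - 1], y_deg[i] - y_deg[i - 1])
        vel.append(dist / dt if dt > 0 else float("inf"))

    fixations, start = [], None
    for i, v in enumerate(vel + [float("inf")]):  # sentinel closes the last run
        if v < velocity_threshold:
            if start is None:
                start = i
        elif start is not None:
            end = i - 1
            duration = t[end] - t[start]
            if duration >= min_fixation_duration:
                n = end - start + 1
                fixations.append({
                    "start_time": t[start],
                    "end_time": t[end],
                    "duration": duration,
                    "centroid_x": sum(x_deg[start:end + 1]) / n,
                    "centroid_y": sum(y_deg[start:end + 1]) / n,
                })
            start = None
    return fixations

# Synthetic 90 Hz trace: hold at 0 deg, jump 10 deg, hold again.
t = [i / 90.0 for i in range(40)]
x = [0.0] * 20 + [10.0] * 20
y = [0.0] * 40
fix = ivt_fixations(x, y, t)
```

On this trace the single 10-degree jump exceeds the threshold and splits the recording into two fixations, one on each side of the jump.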
For attention typing, feed the classified movements into the K-coefficient calculator (Krejtz et al. 2016):
k = ela.calculate_k_coefficient(mv["fixations"], mv["saccades"])
print(k.k_coefficient, k.attention_type) # K and FOCAL/AMBIENT/NEUTRAL
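As background, coefficient K is commonly described as the mean difference between the z-scored duration of each fixation and the z-scored amplitude of the saccade that follows it, with positive values read as focal and negative values as ambient attention. The sketch below follows that description under stated assumptions: z-scores use the mean and population standard deviation of a reference set (here, the whole recording), and `k_coefficient` is a hypothetical function, not the package's `calculate_k_coefficient`. How the package normalizes is not documented in this guide.

```python
from statistics import mean, pstdev

def k_coefficient(durations, amplitudes, ref_durations, ref_amplitudes):
    """Mean of z(duration_i) - z(amplitude_of_next_saccade_i).

    z-scores are taken against the reference sets, so K for a window is
    relative to the recording as a whole.
    """
    mu_d, sd_d = mean(ref_durations), pstdev(ref_durations)
    mu_a, sd_a = mean(ref_amplitudes), pstdev(ref_amplitudes)
    if sd_d == 0 or sd_a == 0:
        return float("nan")  # z-scores are undefined without variance
    return mean((d - mu_d) / sd_d - (a - mu_a) / sd_a
                for d, a in zip(durations, amplitudes))

# Made-up data: durations[i] pairs with the saccade that follows fixation i.
durations = [0.40, 0.45, 0.50, 0.12, 0.10, 0.15]    # seconds
amplitudes = [1.0, 1.5, 2.0, 9.0, 8.0, 10.0]        # degrees
k_first = k_coefficient(durations[:3], amplitudes[:3], durations, amplitudes)
k_second = k_coefficient(durations[3:], amplitudes[3:], durations, amplitudes)
```

Note that K computed over the same set used as the reference is zero by construction, which is why the metric is usually evaluated per window, phase, or stimulus.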
Pupillary cognitive load (LHIPA)
LHIPA (Duchowski et al. 2018) is a wavelet-based cognitive-load index
computed on a continuous pupil-diameter signal. As of v1.3 the on-device
live monitor uses RIPA2 (Jayawardena et al. 2025) instead of LHIPA;
LHIPA remains available for offline analysis through this Python
package. For per-frame on-device cognitive load, read the
LiveLoadIndex column produced by RIPAMonitor
(see docs/RIPA_MONITOR.md).
left = data.df["left_pupil_diameter"].to_numpy()
right = data.df["right_pupil_diameter"].to_numpy()
pupil = np.nanmean([left, right], axis=0)
result = ela.calculate_lhipa(pupil, sample_rate=data.get_sample_rate())
print(result.lhipa, result.is_valid)
Higher LHIPA ≈ higher cognitive load. The metric needs at least
~5 seconds of continuous pupil data; shorter windows return
is_valid=False.
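Before running LHIPA on a phase or trial, it can help to check how much continuous pupil data is actually available. The sketch below measures the longest uninterrupted run of usable samples; treating NaN and non-positive diameters as invalid is an assumption about how dropouts are encoded, and `longest_valid_run_seconds` is a hypothetical helper.

```python
import math

def longest_valid_run_seconds(pupil, sample_rate):
    """Duration of the longest run of non-NaN, positive pupil samples."""
    best = current = 0
    for value in pupil:
        if value is not None and not math.isnan(value) and value > 0:
            current += 1
            best = max(best, current)
        else:
            current = 0  # a blink or dropout ends the run
    return best / sample_rate

# Synthetic 90 Hz signal: 3 s of data, a 0.5 s blink (NaN), then 6 s of data.
pupil = [3.1] * 270 + [float("nan")] * 45 + [3.2] * 540
run_s = longest_valid_run_seconds(pupil, sample_rate=90.0)
long_enough = run_s >= 5.0
```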
Data quality metrics
calculate_quality_metrics produces a single struct summarizing the
recording's signal quality — useful for batch reports and for
filtering out bad sessions before downstream analysis:
q = ela.calculate_quality_metrics(
timestamps = data.get_timestamps(),
validity = data.df["is_tracking_valid"].to_numpy()
if "is_tracking_valid" in data.df.columns else None,
left_pupil = data.df["left_pupil_diameter"].to_numpy(),
right_pupil = data.df["right_pupil_diameter"].to_numpy(),
left_openness = data.df.get("left_openness"),
right_openness = data.df.get("right_openness"),
)
print(q.validity_percent, q.n_gaps, q.n_blinks, q.quality_score)
quality_score is a 0-100 composite; validity_percent,
sample_rate_stability, n_gaps, n_blinks, and pupil-range
diagnostics are exposed individually.
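To show what two of these diagnostics measure, the sketch below computes a validity percentage and a gap count from raw timestamps and validity flags. The gap rule (an interval longer than twice the nominal sample period) is an assumption chosen for illustration and may not match the package's definition; `validity_and_gaps` is a hypothetical name.

```python
def validity_and_gaps(timestamps, validity, expected_rate, gap_factor=2.0):
    """Return (validity percent, number of timing gaps)."""
    valid_pct = 100.0 * sum(1 for v in validity if v) / len(validity)
    nominal = 1.0 / expected_rate
    gaps = sum(
        1 for a, b in zip(timestamps, timestamps[1:])
        if (b - a) > gap_factor * nominal
    )
    return valid_pct, gaps

# Five samples at a nominal 90 Hz with one long interval and one invalid flag.
ts = [0.0, 1 / 90, 2 / 90, 10 / 90, 11 / 90]
valid = [True, True, False, True, True]
pct, n_gaps = validity_and_gaps(ts, valid, expected_rate=90.0)
```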
Per-task analysis recipes (sample experiment)
The bundled SampleExperiment cycles four phases:
- FreeExploration — open look-around in a research room.
- VisualSearch — find target spheres among distractors.
- CountingTask — enumerate target shapes.
- ChangeDetection — flicker paradigm; spot which object moved.
The fastest way to a per-phase report:
report = ela.analyze_sample_experiment(
"Logs/EyeTracking_<session>.csv",
results_json_path="Logs/experiment_results_<session>.json",
)
df = report.to_dataframe() # one row per phase
Missing columns and short phases populate missing_metrics rather
than raising. To go deeper into one phase, slice the DataFrame on
phase and run the per-phase tools.
FreeExploration → fixation map
free = data.df[data.df["phase"] == "FreeExploration"]
yaw = np.degrees(np.arctan2(free["combined_dir_x"], free["combined_dir_z"]))
pitch = -np.degrees(np.arcsin(np.clip(free["combined_dir_y"], -1, 1)))
mv = ela.detect_eye_movements(yaw.values, pitch.values,
free["timestamp"].values)
ela.create_fixation_plot(mv["fixations"]) # bubble plot, area ∝ duration
VisualSearch → target acquisition time
The results JSON records per-trial acquisition times directly. To verify them against the gaze stream, use the events sidecar:
events = ela.load_scene_events(
"Logs/EyeTracking_<session>_SceneEvents.csv",
)
# Each VisualSearch trial emits a Spawn for the target and a Despawn
# when the participant fixates on it long enough.
# na=False keeps rows without an ObjectId from breaking the boolean mask.
is_target = events["ObjectId"].str.startswith("vs_target_", na=False)
trial_starts = events[(events["EventType"] == "Spawn") & is_target]
trial_ends = events[(events["EventType"] == "Despawn") & is_target]
# Assumes exactly one Despawn per Spawn, in the same order.
durations = trial_ends["T"].values - trial_starts["T"].values
CountingTask → accuracy
answers = events[events["EventType"] == "CountingAnswerSelected"]
# Detail is the chosen number; correct answer is in the Config payload.
config_row = events[events["EventType"] == "ConfigCounting"].iloc[0]
# When loaded with decode_config=True, config_row["Config"] is a dict.
ChangeDetection → detection latency
feedback = events[events["EventType"] == "ChangeDetectionFeedback"]
# Detail = "Correct" or "Incorrect"; T = when the answer was finalized.
hide = events[events["EventType"] == "ChangeDetectionHideScene"]
shown = events[events["EventType"] == "ChangeDetectionShowChangedScene"]
# Latency: feedback["T"] - shown["T"], paired by trial order.
For more elaborate per-trial joins, note that objects are spawned with stable ObjectIds; match each Spawn to its Despawn by id within each phase.
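Pairing by id rather than by row order tolerates interleaved trials and unmatched events. The sketch below works on plain dictionaries, such as the output of `events.to_dict("records")`; `pair_spawn_despawn` is a hypothetical helper and the event values are invented.

```python
def pair_spawn_despawn(events):
    """Pair each Spawn with the next Despawn of the same ObjectId.

    `events` is an iterable of dicts with EventType, ObjectId and T keys.
    Returns {ObjectId: seconds between Spawn and Despawn}. If an id is
    spawned more than once, the last completed pair wins.
    """
    open_spawns, durations = {}, {}
    for ev in sorted(events, key=lambda e: e["T"]):
        oid = ev["ObjectId"]
        if ev["EventType"] == "Spawn":
            open_spawns[oid] = ev["T"]
        elif ev["EventType"] == "Despawn" and oid in open_spawns:
            durations[oid] = ev["T"] - open_spawns.pop(oid)
    return durations

# Made-up events; vs_target_3 is never despawned and is therefore skipped.
rows = [
    {"EventType": "Spawn", "ObjectId": "vs_target_1", "T": 1.0},
    {"EventType": "Spawn", "ObjectId": "vs_target_2", "T": 4.0},
    {"EventType": "Despawn", "ObjectId": "vs_target_1", "T": 3.5},
    {"EventType": "Despawn", "ObjectId": "vs_target_2", "T": 4.75},
    {"EventType": "Spawn", "ObjectId": "vs_target_3", "T": 6.0},
]
durations_by_id = pair_spawn_despawn(rows)
```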
Custom experiments
Researchers building experiments on top of Eye_lean don't need to fork the analysis package. The patterns:
Filter to your phase names. phase is a free-text column written
by your experiment controller; once you set it, this works:
mine = data.df[data.df["phase"].isin(["MyPhase1", "MyPhase2"])]
Consume custom CSV columns. SessionRecorder lets you append
extra columns at session start; they reach the DataFrame untouched
(no aliasing). Either reference them by their literal name or extend
the alias map at load time:
loader = ela.EyeLeanLoader(custom_aliases={
"my_metric": ["MyMetric", "myMetric_v2"],
})
data = loader.load("...csv")
Read your own event types. The events sidecar accepts arbitrary
EventType strings; everything except Config* (base64+JSON) is
stored as plain text in Detail. Filter on the type you care about:
mine = events[events["EventType"] == "MyExperimentLandmark"]
Align scene-state to gaze. The seeded-id system makes ObjectIds stable across runs of the same scene, so analyses keyed on a specific recordable are reproducible:
joined = ela.merge_gaze_with_scene_state(
data.df, state, object_id="my_stable_id"
)
Post-hoc calibration correction
If a recording was made before a better calibration profile became
available — or you want to compare the same recording under several
profiles — re-apply a saved EyeTrackingProfile to the CSV:
stats = ela.apply_profile_to_csv(
"Logs/HTC VIVE Focus Vision_<timestamp>.json",
"Logs/EyeTracking_<session>.csv",
"Logs/EyeTracking_<session>_corrected.csv",
)
print(stats["samples_corrected_combined"],
stats["samples_skipped_invalid"])
The math, schema, and the compounding-offsets foot-gun (never
apply correction to an already-corrected CSV — the metadata's
# Profile: line is the canonical "was a profile active at
recording" flag) are documented in
docs/POST_HOC_CORRECTION.md.
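A small guard in front of the correction call helps avoid compounding offsets. The sketch below checks a metadata dictionary of the kind returned by `read_csv_metadata`; treating a missing or empty Profile entry the same as `none` is an assumption, and `profile_was_active` is a hypothetical helper.

```python
def profile_was_active(meta):
    """True if the metadata says a calibration profile was applied live."""
    profile = str(meta.get("Profile", "none")).strip().lower()
    return profile not in ("", "none")

# Made-up metadata dictionaries for illustration.
recorded_raw = {"Profile": "none", "FileVersion": "1.2"}
recorded_with_profile = {"Profile": "HTC VIVE Focus Vision_<timestamp>"}

for meta in (recorded_raw, recorded_with_profile):
    if profile_was_active(meta):
        print("skip: a profile was already active at recording time")
    else:
        print("ok to apply post-hoc correction")
```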
Batch processing
BatchProcessor runs the same analysis pipeline across many CSVs,
returning a flat per-file summary suitable for cross-participant
analyses:
summary = ela.process_directory_batch(
"data/recordings/",
pattern="*.csv",
output_path="data/results/summary.csv",
n_workers=4,
)
For more control (custom analyzers, partial pipelines), use the class directly:
proc = ela.BatchProcessor(
n_workers=4,
compute_lhipa=True,
compute_entropy=True,
compute_fixations=True,
)
proc.add_analyzer(lambda d: {"my_metric": d.df["my_col"].mean()})
results = proc.process_directory("data/recordings/", pattern="*.csv")
df = ela.BatchProcessor.results_to_dataframe(results)
The full pipeline runs much faster than real time: a 5-minute recording takes ~5-10 seconds end-to-end, single-threaded.
Performance notes
- A typical 5-minute recording at 90 Hz is ~27,000 rows. At ~100 bytes per row of CSV, that's ~3 MB on disk; pandas loads it in <1 s.
- Scene-state sidecars are larger (one row per recordable per frame). A 4-minute SampleExperiment recording with ~10 active recordables produces ~120,000 state rows; that loads in ~0.3 s.
- LHIPA dominates batch runtime: ~2-3 s per recording for the wavelet decomposition. Disable it (compute_lhipa=False) when you don't need it.
- Memory ceiling: a single recording's full DataFrame is well under 100 MB; you can hold dozens in memory before pressure becomes a concern. For larger sweeps, use process_directory_batch with n_workers > 1 and let it stream the per-file summary to disk.
- The fixation/saccade classifier and quality metrics are O(n) in sample count; nothing in the package is intentionally super-linear.
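The row counts above are easy to reproduce for your own recordings. The sketch below is back-of-envelope arithmetic only: the function names are hypothetical, the 12,000-frame figure is an illustrative input, and the per-row size is simply what the ~3 MB figure implies for 27,000 rows.

```python
def expected_rows(duration_s, rate_hz):
    """Number of gaze samples for a recording of the given length and rate."""
    return round(duration_s * rate_hz)

def expected_state_rows(n_frames, n_recordables, every_nth_frame=1):
    """Scene-state rows: one per recordable per sampled frame."""
    return (n_frames // every_nth_frame) * n_recordables

rows = expected_rows(5 * 60, 90)             # 5 minutes at 90 Hz
implied_bytes_per_row = 3_000_000 / rows     # what ~3 MB on disk implies

# Illustrative frame count; real values depend on the rendering frame rate
# and on the SampleEveryNthFrame setting.
state_rows = expected_state_rows(12_000, 10)
```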
Troubleshooting
Loader warns about mixed-type columns. Some optional columns
(per-frame phase strings, partial validity flags) only fill in once
the experiment has actually started. The first few hundred rows are
NaN, the rest are strings. Pass low_memory=False to silence:
data = ela.load_eyetracking("...csv", low_memory=False)
SceneState.csv exists but is empty. Check the metadata header:
SampleEveryNthFrame controls write rate, and the recorder needs at
least one Recordable in the scene. The bundled
SampleExperimentController auto-registers the right Recordables; for
custom scenes, attach a Recordable component to anything you want
recorded.
Post-hoc correction produces near-zero offsets but the recording
looks miscalibrated. The CSV may already be corrected. Read
read_csv_metadata(path)["Profile"] — if it's anything other than
none, that profile was applied live and a second correction will
compound the offsets.
merge_gaze_with_scene_state produces more rows than the gaze
DataFrame. That's expected when multiple recordables are present in
the scene each frame: the merge is many-to-many on Frame. Pass
object_id= to restrict to a single recordable.
Tests pass on dev machine but fail in CI. Confirm the CI runner
has PyWavelets available — the LHIPA tests skip cleanly without it,
but a few other paths log warnings.
API reference
Top-level eyelean_analysis module exports (all also available via
the subpackages they live in):
Data loading
- load_eyetracking(path, **kw) → EyeLeanData
- EyeLeanLoader(custom_aliases=None): class-based loader
- EyeLeanData: DataFrame wrapper with accessors
- read_csv_metadata(path) → dict of header # Key: value lines
- load_scene_state(path) → pd.DataFrame
- load_scene_events(path, decode_config=False) → pd.DataFrame
- load_scene_sidecars(main_csv_path, decode_config=False) → (state_df, events_df)
- merge_gaze_with_scene_state(gaze_df, state_df, object_id=None, frame_column="frame_number")
- COLUMN_ALIASES: the canonical→variants map
Notebook bootstrap
- notebook_context(csv=None, *, require_sidecars=False, require_profile=False, require_results=False, load=True) → NotebookContext
- NotebookContext: dataclass with csv_path, scene_state_path, scene_events_path, profile_path, results_path, repo_root, data, metadata, source
Validation
- DataValidator(...), ValidationResult, validate_file(path)
Filters
- butterworth_filter(data, cutoff_hz, sample_rate_hz, order=4)
- savgol_smooth(data, window, order=3)
- compute_gaze_velocity(x, y, timestamps)
Classification
- detect_eye_movements(x, y, t, **thresholds) → {"fixations": [...], "saccades": [...]}
- classify_fixations(...), classify_saccades(...)
- Fixation, Saccade: dataclasses
- calculate_k_coefficient(fixations, saccades) → KCoefficientResult
- classify_attention(k) → AttentionType
- k_coefficient_timeseries(...)
Metrics
- calculate_lhipa(pupil, sample_rate) → LHIPAResult
- calculate_gaze_entropy(x, y, ...) → EntropyResult
- calculate_quality_metrics(...) → QualityMetrics
Experiments
- SAMPLE_EXPERIMENT_PHASES
- analyze_sample_experiment(csv, results_json_path=None) → SampleExperimentReport
- PhaseReport, SampleExperimentReport
Calibration
- load_profile(json_path) → EyeTrackingProfile
- apply_combined_correction(world_dirs, head_quats, correction)
- apply_profile_to_csv(profile_path, csv_in, csv_out) → dict
- EyeTrackingProfile, GazeCorrection
Batch
- BatchProcessor(...), ProcessingResult
- process_batch(file_paths, output_path=None, **kw) → pd.DataFrame
- process_directory_batch(dir, pattern="*.csv", output_path=None, **kw) → pd.DataFrame
Visualization
- create_heatmap(x, y, ...)
- create_trajectory_plot(x, y, ...)
- create_timeseries_plot(timestamps, values, ...)
- create_fixation_plot(fixations, ...)
- create_pupil_plot(timestamps, left, right, ...)
References
- Duchowski, A. T., Krejtz, K., et al. (2018). The Index of Pupillary Activity: Measuring Cognitive Load vis-à-vis Task Difficulty with Pupil Oscillation. CHI.
- Duchowski, A. T., et al. (2022). Vergence calculation algorithms.
- Krejtz, K., Duchowski, A. T., et al. (2016). Discerning ambient/focal attention with coefficient K. ACM Transactions on Applied Perception.
- Salvucci, D. D., & Goldberg, J. H. (2000). Identifying fixations and saccades in eye-tracking protocols. ETRA.
License
MIT License. See ../LICENSE.