Gaze augmentation stage: attach gaze (yaw/pitch + gaze_vec) using face boxes in upstream JSON.
gaze-estimation-lib
Minimum Python: >=3.10
gaze-estimation-lib is a modular gaze estimation + JSON augmentation toolkit that attaches gaze predictions to detections containing face boxes.
This is the Gaze Augmentation stage of the Vision Pipeline.
Estimators included:
- l2cs: L2CS-Net backend (face-box driven; no internal detector)
By default, gaze-estimation-lib does not write any files. You opt in to saving JSON, frames, or annotated video via flags.
Vision Pipeline
Original Video (.mp4)
│
▼
detect-lib
(Detection Stage)
│
└── detections.json (det-v1)
│
▼
track-lib
(Tracking + ReID)
│
└── tracked.json (track-v1)
│
▼
detect-face-lib
(Face Augmentation)
│
└── faces.json (face-v1 meta)
│
▼
gaze-estimation-lib
(Gaze Augmentation)
│
└── gaze.json (augmented; gaze-v1 meta)
Note: Each stage consumes the original video + the upstream JSON from the previous stage.
What gaze-estimation-lib expects
gaze-estimation-lib does not run a face detector.
Input JSON must contain:
- frames[*].detections[*]
- Inside detections:
  - faces with valid bbox
The parent schema may be:
- face-v1
- det-v1
- track-v1
- or unknown
As long as detections and face boxes exist, normalization will adapt.
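The input requirements above can be checked before running the estimator. The following is a minimal sketch, not part of gaze-estimation-lib: `find_input_problems` is a hypothetical helper, and it assumes each `bbox` is `[x1, y1, x2, y2]` in pixels, which the source does not state explicitly.

```python
from typing import Any


def find_input_problems(payload: dict[str, Any]) -> list[str]:
    """Return human-readable problems; an empty list means the layout looks usable.

    Hypothetical pre-flight check. Assumes bbox = [x1, y1, x2, y2].
    """
    frames = payload.get("frames")
    if not isinstance(frames, list):
        return ["top-level 'frames' list is missing"]

    problems: list[str] = []
    valid_faces = 0
    for fi, frame in enumerate(frames):
        for di, det in enumerate(frame.get("detections", [])):
            for ki, face in enumerate(det.get("faces", [])):
                box = face.get("bbox")
                ok = (
                    isinstance(box, (list, tuple))
                    and len(box) == 4
                    and all(isinstance(v, (int, float)) for v in box)
                    and box[2] > box[0]
                    and box[3] > box[1]
                )
                if ok:
                    valid_faces += 1
                else:
                    problems.append(
                        f"frame {fi}, detection {di}, face {ki}: invalid bbox {box!r}"
                    )
    if valid_faces == 0:
        problems.append("no valid face boxes found")
    return problems
```

Calling it on the loaded upstream JSON (for example `find_input_problems(json.load(open("faces.json")))`) gives a quick indication of whether any detection carries a usable face box.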
Output: augmented JSON (returned + optionally saved)
gaze-estimation-lib returns an augmented JSON payload in-memory that preserves the upstream schema and adds:
- gaze_augment: metadata about the estimator + association rules (versioned)
- detections[*].gaze: minimal gaze payload
What gets attached to a detection
Each gaze entry is intentionally minimal:
- yaw (radians)
- pitch (radians)
- gaze_vec: [x, y, z] unit vector
- face_ind: which face entry was used
- origin: [x, y] pixel location (if available)
- origin_source: "kpt" or "box"
No redundant or derivable data is stored.
Minimal schema example
{
  "schema_version": "track-v1",
  "gaze_augment": {
    "version": "gaze-v1",
    "parent_schema_version": "track-v1",
    "estimator": {
      "name": "l2cs",
      "variant": "resnet50",
      "weights": "weights.pkl",
      "device": "auto"
    }
  },
  "frames": [
    {
      "frame_index": 0,
      "detections": [
        {
          "bbox": [100.0, 50.0, 320.0, 240.0],
          "faces": [
            {
              "bbox": [140.0, 70.0, 210.0, 150.0],
              "score": 0.98
            }
          ],
          "gaze": {
            "yaw": -0.12,
            "pitch": 0.08,
            "gaze_vec": [0.11, -0.08, -0.99],
            "face_ind": 0
          }
        }
      ]
    }
  ]
}
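To illustrate how yaw, pitch, and gaze_vec in the example relate, here is a minimal sketch. The sign convention is an assumption: it is the one commonly used with L2CS-Net, and it reproduces the example values above to within about 0.01, but the source does not state which convention the library applies. `angles_to_gaze_vec` is a hypothetical helper name.

```python
import math


def angles_to_gaze_vec(yaw: float, pitch: float) -> list[float]:
    """Convert yaw/pitch (radians) to a 3D unit vector.

    Assumed convention: the camera looks along +z, so a subject
    looking toward the camera has a negative z component.
    """
    x = -math.cos(pitch) * math.sin(yaw)
    y = -math.sin(pitch)
    z = -math.cos(pitch) * math.cos(yaw)
    return [x, y, z]


# Values taken from the minimal schema example.
vec = angles_to_gaze_vec(yaw=-0.12, pitch=0.08)
norm = math.sqrt(sum(v * v for v in vec))
print([round(v, 3) for v in vec], round(norm, 6))
```

The result is a unit vector by construction, since the squared components sum to cos²(pitch) + sin²(pitch).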
Returned vs saved
- Returned (always): payload available in memory via GazeResult.payload
- Saved (opt-in):
  - --json → <run>/gaze.json
  - --frames → <run>/frames/
  - --save-video → <run>/annotated.mp4
If no artifact flags are enabled, nothing is written.
Install with pip (future PyPI)
Requires Python >= 3.10.
pip install gaze-estimation-lib
# Install the L2CS backend (required to run gaze estimation)
pip install "l2cs @ git+https://github.com/edavalosanaya/L2CS-Net.git@main"
Module import name remains:
import gaze
Installing the L2CS backend (pip)
Packages published on PyPI cannot declare direct Git/VCS dependencies, so the default l2cs backend must be installed separately:
pip install "l2cs @ git+https://github.com/edavalosanaya/L2CS-Net.git@main"
If you already installed gaze-estimation-lib, you can run the command above at any time to add the backend.
L2CS Weights
Pretrained weights: https://drive.google.com/drive/folders/17p6ORr-JQJcw-eYtG2WGNiuS_qVKwdWd?usp=sharing
Currently supported variant:
resnet50
If using custom weights, ensure they match the correct L2CS variant.
CLI Usage (pip or installed package)
Global help:
python -m gaze.cli.estimate_gaze -h
List estimators:
python -m gaze.cli.estimate_gaze --list-estimators
List variants:
python -m gaze.cli.estimate_gaze --estimator l2cs --list-variants
Quick Start
python -m gaze.cli.estimate_gaze \
--json-in faces.json \
--video in.mp4 \
--weights weights.pkl
Save artifacts (opt-in)
python -m gaze.cli.estimate_gaze \
--json-in faces.json \
--video in.mp4 \
--weights weights.pkl \
--json \
--frames \
--save-video annotated.mp4 \
--out-dir out --run-name demo
CLI arguments
Required (for running augmentation)
- --json-in <path>: Path to the upstream JSON to augment.
  - Accepts face-v1, det-v1, track-v1, or unknown schemas as long as the JSON contains frames[*].detections[*].
- --video <path>: Path to the original source video used to generate the upstream JSON. Frame order must align.
- --weights <path>: Path to L2CS weights (.pkl).
Discovery
- --list-estimators: Print available gaze estimator backends and exit.
- --list-variants: Print supported variants for --estimator and exit.
Estimator selection
- --estimator <name>: Gaze estimator backend to use (default: l2cs).
- --variant <name>: Backend variant (named variant registry).
  - For l2cs, this selects the backbone. The pretrained weights linked above currently support only resnet50.
Face crop behavior
- --expand-face <float>: Expand each face box by this fraction before cropping.
  - Example: --expand-face 0.25 expands width/height by +25%.
  - Increase → includes more context (forehead/hair/ears); can improve stability but may include background.
  - Decrease → tighter crop; can be sharper but may clip parts of the face.
  - Practical range: 0.0–0.35 (start around 0.2–0.3).
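The expansion described above can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `expand_box` is a hypothetical helper, boxes are assumed to be `[x1, y1, x2, y2]`, the growth is assumed to be symmetric about the box center, and the result is clamped to the frame.

```python
def expand_box(
    box: list[float], frac: float, frame_w: int, frame_h: int
) -> list[float]:
    """Grow a box by `frac` of its width/height, centered, clamped to the frame."""
    x1, y1, x2, y2 = box
    dw = (x2 - x1) * frac / 2.0  # half of the extra width goes on each side
    dh = (y2 - y1) * frac / 2.0  # half of the extra height goes on each side
    return [
        max(0.0, x1 - dw),
        max(0.0, y1 - dh),
        min(float(frame_w), x2 + dw),
        min(float(frame_h), y2 + dh),
    ]


# A 70x80 face box expanded by 25% becomes 87.5x100.
print(expand_box([140.0, 70.0, 210.0, 150.0], 0.25, frame_w=1920, frame_h=1080))
```

Clamping matters for faces near the frame edge, where an unclamped crop would index outside the image.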
Association / filtering
- --associate-classes <ids...>: Only attach gaze to detections whose class_id is in this list.
  - Example: --associate-classes 0 (often person).
  - If omitted, gaze-estimation-lib tries to infer class_name == "person"; if not found, all classes are eligible.
- --face-index <int>: Which face entry to use per detection.
  - If set, always uses that index when present.
  - If omitted, uses the highest-score face in faces.
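The filtering and face-selection rules above could look roughly like this. It is a hedged sketch: `is_eligible` and `pick_face` are hypothetical names, the `class_name == "person"` inference is not modeled, and returning `None` when a requested index is out of range is an assumption about what "when present" means.

```python
from typing import Any, Optional, Sequence


def is_eligible(det: dict[str, Any], class_ids: Sequence[int]) -> bool:
    """True if the detection's class_id is in the explicit allow-list."""
    return det.get("class_id") in class_ids


def pick_face(det: dict[str, Any], face_index: Optional[int] = None) -> Optional[int]:
    """Return the index of the face to use, or None if no face qualifies."""
    faces = det.get("faces") or []
    if not faces:
        return None
    if face_index is not None:
        # Fixed index: use it only if that entry exists (assumed behavior).
        return face_index if 0 <= face_index < len(faces) else None
    # Default: highest-score face; a missing score is treated as 0.0.
    return max(range(len(faces)), key=lambda i: faces[i].get("score", 0.0))


det = {
    "class_id": 0,
    "faces": [{"score": 0.40}, {"score": 0.98}, {"score": 0.70}],
}
print(is_eligible(det, [0]), pick_face(det), pick_face(det, face_index=2))
```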
Gaze origin behavior (optional)
- --kpt-origin <ids...>: Keypoint indices (from detections[*].keypoints) used to compute a gaze origin.
  - Origin is computed as the mean of the selected keypoints that pass confidence.
  - Example: --kpt-origin 0 1.
- --kpt-conf <float>: Minimum keypoint confidence for origin computation (default: 0.3).
  - Increase → fewer keypoints qualify (more robust, but more detections may fall back/skip).
  - Decrease → more keypoints qualify (more coverage, but noisier origins).
- --fallback: If set, when keypoint-origin is unavailable, fall back to the face box center (preferred) or detection box center.
  - If not set and --kpt-origin is provided, detections without a valid keypoint-origin are skipped.
If you pass --kpt-origin but the JSON contains no keypoints, gaze-estimation-lib emits a warning and continues.
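The origin rules above can be sketched as below. This is an illustration only: `gaze_origin` is a hypothetical helper, each keypoint is assumed to be stored as `[x, y, confidence]`, and boxes are assumed to be `[x1, y1, x2, y2]`. Neither layout is confirmed by the source.

```python
from typing import Any, Optional, Sequence


def gaze_origin(
    det: dict[str, Any],
    kpt_ids: Sequence[int],
    kpt_conf: float = 0.3,
    fallback: bool = False,
    face_ind: int = 0,
) -> Optional[tuple[list[float], str]]:
    """Return ([x, y], origin_source) or None when the detection is skipped."""
    kpts = det.get("keypoints") or []
    picked = [
        kpts[i] for i in kpt_ids
        if 0 <= i < len(kpts) and kpts[i][2] >= kpt_conf
    ]
    if picked:
        # Mean of the selected keypoints that pass the confidence threshold.
        x = sum(k[0] for k in picked) / len(picked)
        y = sum(k[1] for k in picked) / len(picked)
        return [x, y], "kpt"
    if not fallback:
        return None  # no valid keypoint origin and no fallback requested
    faces = det.get("faces") or []
    # Prefer the face box center, then the detection box center.
    box = faces[face_ind]["bbox"] if 0 <= face_ind < len(faces) else det["bbox"]
    return [(box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0], "box"


det = {
    "bbox": [100.0, 50.0, 320.0, 240.0],
    "faces": [{"bbox": [140.0, 70.0, 210.0, 150.0]}],
    "keypoints": [[170.0, 100.0, 0.9], [180.0, 110.0, 0.8], [0.0, 0.0, 0.1]],
}
print(gaze_origin(det, [0, 1]))
print(gaze_origin(det, [2], fallback=True))
```

The returned source string mirrors the "kpt" and "box" values documented for origin_source.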
Artifact saving (all opt-in)
- --json: Write augmented JSON to <run>/gaze.json.
- --frames: Save annotated frames under <run>/frames/.
- --save-video [name.mp4]: Save an annotated video under <run>/.
- --out-dir <dir>: Output root used only when saving artifacts (default: out).
- --run-name <name>: Optional subfolder under --out-dir.
- --fourcc <fourcc>: FourCC codec for saved video (default: mp4v).
- --display: Show a live annotated preview (ESC to stop). Does not write files unless saving flags are set.
UX
--no-progress: Disable progress bar.
Python usage (import)
You can use gaze-estimation-lib as a library after installing it.
Quick sanity check
python -c "import gaze; print(gaze.available_estimators()); print(gaze.available_variants('l2cs'))"
Python API reference (keywords)
gaze.estimate_gaze_video(...)
Required
- json_in: Path to the upstream JSON.
- video: Path to the original source video.
- weights: Path to L2CS weights (.pkl).
Estimator
- estimator: Backend name (default: "l2cs").
- variant: Named variant for the backend.
  - For L2CS pretrained weights linked above, use "resnet50".
- device: "auto", "cpu", "mps", "cuda", "cuda:0", or an index string like "0".
- expand_face: Expand face crop by fraction (0.0–0.35, start 0.2–0.3).
Association / selection
- associate_class_ids: List of class_id values eligible for gaze attachment.
  - If None, the tool tries to infer class_name == "person"; if not found, all classes are eligible.
- face_index: If set, use that face index per detection; otherwise choose the highest-score face.
Origin (optional)
- kpt_origin: List of keypoint indices used to compute gaze origin.
- kpt_conf: Keypoint confidence threshold.
- fallback: If True, fall back to box center when keypoint-origin is unavailable.
Artifacts (all off by default)
- save_json_flag: Write <run>/gaze.json.
- save_frames: Write <run>/frames/*.jpg.
- save_video: Filename for annotated video under the run folder.
- out_dir, run_name, fourcc, display, no_progress.
Returns a GazeResult with payload (augmented JSON), paths (only populated when saving), and stats.
Run gaze augmentation from a Python file
Create run_gaze.py:
from gaze import estimate_gaze_video

res = estimate_gaze_video(
    json_in="faces.json",
    video="in.mp4",
    estimator="l2cs",
    variant="resnet50",
    weights="weights.pkl",
    device="auto",
    # Optional filtering
    associate_class_ids=[0],
    # Optional crop tuning
    expand_face=0.25,
    # Optional origin behavior
    kpt_origin=[0, 1],
    kpt_conf=0.3,
    fallback=True,
    # Opt-in artifacts
    save_json_flag=True,
    save_video="annotated.mp4",
    out_dir="out",
    run_name="demo",
)

print(res.stats)
print("gaze_augment" in res.payload)
print(res.paths)  # populated only if you enable saving artifacts
Run:
python run_gaze.py
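Once a run has produced a payload (either `res.payload` or a saved gaze.json), the attached gaze entries can be walked with plain dictionary access. The sketch below is a hypothetical consumer, not a library function: `iter_gaze` is an illustrative name, and it relies only on the field names shown in the minimal schema example.

```python
import math
from typing import Any, Iterator


def iter_gaze(payload: dict[str, Any]) -> Iterator[tuple[Any, int, dict[str, Any]]]:
    """Yield (frame_index, detection_index, gaze) for detections that have gaze."""
    for frame in payload.get("frames", []):
        for det_idx, det in enumerate(frame.get("detections", [])):
            gaze = det.get("gaze")
            if gaze is not None:
                yield frame.get("frame_index"), det_idx, gaze


# Stand-in payload shaped like the minimal schema example.
payload = {
    "frames": [
        {
            "frame_index": 0,
            "detections": [
                {"gaze": {"yaw": -0.12, "pitch": 0.08, "face_ind": 0}},
                {"faces": []},  # no gaze attached, so it is skipped
            ],
        }
    ]
}

for frame_index, det_idx, gaze in iter_gaze(payload):
    # yaw/pitch are stored in radians; convert for display only.
    print(frame_index, det_idx, round(math.degrees(gaze["yaw"]), 2))
```

Detections without a gaze key (for example, those filtered out by class or skipped for lack of an origin) are simply not yielded.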
Using uv (recommended for development)
Install uv: https://docs.astral.sh/uv/
Clone the repo:
git clone https://github.com/Surya-Rayala/VisionPipeline-gaze.git
cd VisionPipeline-gaze
Sync environment:
uv sync
Installing the L2CS backend (uv)
Add the backend to your local uv environment from Git:
uv add --dev "l2cs @ git+https://github.com/edavalosanaya/L2CS-Net.git@main"
uv sync
Note: this updates your local project environment; it is intended for development/use in this repo.
Run CLI:
uv run python -m gaze.cli.estimate_gaze -h
Run augmentation:
uv run python -m gaze.cli.estimate_gaze \
--json-in faces.json \
--video in.mp4 \
--weights weights.pkl
Testing different help levels
Basic help:
uv run python -m gaze.cli.estimate_gaze -h
Estimator help:
uv run python -m gaze.cli.estimate_gaze --list-estimators
Variant help:
uv run python -m gaze.cli.estimate_gaze --estimator l2cs --list-variants
License
This project is licensed under the MIT License. See LICENSE.