Gaze augmentation stage: attach gaze (yaw/pitch + gaze_vec) using face boxes in upstream JSON.

Project description

gaze-estimation-lib

Minimum Python: >=3.10

gaze-estimation-lib is a modular gaze estimation + JSON augmentation toolkit that attaches gaze predictions to detections containing face boxes.

This is the Gaze Augmentation stage of the Vision Pipeline.

Estimators included:

l2cs: L2CS-Net backend (face-box driven; no internal detector)

By default, gaze-estimation-lib does not write any files. You opt-in to saving JSON, frames, or annotated video via flags.

Vision Pipeline

Original Video (.mp4)
        │
        ▼
  detect-lib
  (Detection Stage)
        │
        └── detections.json (det-v1)
                   │
                   ▼
                track-lib
           (Tracking + ReID)
                   │
                   └── tracked.json (track-v1)
                           │
                           ▼
                    detect-face-lib
                 (Face Augmentation)
                           │
                           └── faces.json (face-v1 meta)
                                   │
                                   ▼
                              gaze-estimation-lib
                         (Gaze Augmentation)
                                   │
                                   └── gaze.json (augmented; gaze-v1 meta)

Stage 1 (Detection):

Stage 2 (Tracking + ReID):

Stage 3 (Face Augmentation):

Note: Each stage consumes the original video + the upstream JSON from the previous stage.

What gaze-estimation-lib expects

gaze-estimation-lib does not run a face detector.

Input JSON must contain:

frames[*].detections[*]
Inside detections: faces with valid bbox

The parent schema may be:

face-v1
det-v1
track-v1
or unknown

As long as detections and face boxes exist, normalization will adapt.

Output: augmented JSON (returned + optionally saved)

gaze-estimation-lib returns an augmented JSON payload in-memory that preserves the upstream schema and adds:

gaze_augment: metadata about the estimator + association rules (versioned)
detections[*].gaze: minimal gaze payload

What gets attached to a detection

Each gaze entry is intentionally minimal:

yaw (radians)
pitch (radians)
gaze_vec: [x,y,z] unit vector
face_ind: which face entry was used
origin: [x,y] pixel location (if available)
origin_source: "kpt" or "box"

No redundant or derivable data is stored.

Minimal schema example

{
  "schema_version": "track-v1",
  "gaze_augment": {
    "version": "gaze-v1",
    "parent_schema_version": "track-v1",
    "estimator": {
      "name": "l2cs",
      "variant": "resnet50",
      "weights": "weights.pkl",
      "device": "auto",
      "preproc_workers": 0,
      "preproc_thread_threshold": 4,
      "gpu_preprocess": "auto"
    }
  },
  "frames": [
    {
      "frame_index": 0,
      "detections": [
        {
          "bbox": [100.0, 50.0, 320.0, 240.0],
          "faces": [
            {
              "bbox": [140.0, 70.0, 210.0, 150.0],
              "score": 0.98
            }
          ],
          "gaze": {
            "yaw": -0.12,
            "pitch": 0.08,
            "gaze_vec": [0.11, -0.08, -0.99],
            "face_ind": 0
          }
        }
      ]
    }
  ]
}

Returned vs saved

Returned (always): payload available in memory via GazeResult.payload
Saved (opt-in):
- --json → <run>/gaze.json
- --frames → <run>/frames/
- --save-video → <run>/annotated.mp4

If no artifact flags are enabled, nothing is written.

Install with pip (future PyPI)

Requires Python >= 3.10.

pip install gaze-estimation-lib

# Install the L2CS backend (required to run gaze estimation)
pip install "l2cs @ git+https://github.com/edavalosanaya/L2CS-Net.git@main"

Module import name remains:

import gaze

Installing the L2CS backend (pip)

PyPI packages cannot declare Git/VCS dependencies. The default l2cs backend must be installed separately:

pip install "l2cs @ git+https://github.com/edavalosanaya/L2CS-Net.git@main"

If you already installed gaze-estimation-lib, you can run the command above at any time to add the backend.

L2CS Weights

Pretrained weights: https://drive.google.com/drive/folders/17p6ORr-JQJcw-eYtG2WGNiuS_qVKwdWd?usp=sharing

Currently supported variant:

resnet50

If using custom weights, ensure they match the correct L2CS variant.

CLI Usage (pip or installed package)

Global help:

python -m gaze.cli.estimate_gaze -h

List estimators:

python -m gaze.cli.estimate_gaze --list-estimators

List variants:

python -m gaze.cli.estimate_gaze --estimator l2cs --list-variants

Quick Start

python -m gaze.cli.estimate_gaze \
  --json-in faces.json \
  --video in.mp4 \
  --weights weights.pkl

Save artifacts (opt-in)

python -m gaze.cli.estimate_gaze \
  --json-in faces.json \
  --video in.mp4 \
  --weights weights.pkl \
  --json \
  --frames \
  --save-video annotated.mp4 \
  --out-dir out --run-name demo

CLI arguments

Required (for running augmentation)

--json-in <path>: Path to the upstream JSON to augment.
- Accepts face-v1, det-v1, track-v1, or unknown schemas as long as the JSON contains frames[*].detections[*].
--video <path>: Path to the original source video used to generate the upstream JSON. Frame order must align.
--weights <path>: Path to L2CS weights (.pkl).

Discovery

--list-estimators: Print available gaze estimator backends and exit.
--list-variants: Print supported variants for --estimator and exit.

Estimator selection

--estimator <name>: Gaze estimator backend to use (default: l2cs).
--variant <name>: Backend variant (named variant registry).
- For l2cs, this selects the backbone. The pretrained weights linked above currently support only resnet50.

Face crop behavior

--expand-face <float>: Expand each face box by this fraction before cropping.
- Example: --expand-face 0.25 expands width/height by +25%.
- Increase → includes more context (forehead/hair/ears); can improve stability but may include background.
- Decrease → tighter crop; can be sharper but may clip parts of the face.
- Practical range: 0.0–0.35 (start around 0.2–0.3).

Performance / preprocessing (optional)

These flags do not change what the tool outputs (same gaze fields attached to detections). They only control how face crops are prepared before running the estimator.

--preproc-workers <int>: CPU threads used for face crop + resize preprocessing.
- 0 (default) → auto (uses a sensible thread count for your machine; capped to avoid oversubscription)
- >0 → use exactly that many threads
- <0 → disable threading (force serial preprocessing)
- When to use:
  - If CUDA/MPS inference is fast but the run is still slow, you are likely CPU-bound on crop/resize.
  - Helps most when you have multiple faces per frame or high-resolution frames.
- Example: --preproc-workers 8
--preproc-thread-threshold <int>: Minimum faces per frame before threaded preprocessing is used.
- Default: 4
- Lower it (e.g., 1 or 2) if most frames have only 1–2 faces and you still want threading.
- Example: --preproc-thread-threshold 2
--gpu-preprocess: Force GPU crop + resize preprocessing when running on CUDA.
- Internally uses torchvision.ops.roi_align for fast ROI crop/resize.
- If not provided, the library stays in auto mode:
  - on CUDA: it will try GPU preprocessing when dependencies are available
  - otherwise: it uses the CPU preprocessing path
- Notes:
  - Requires torchvision to be installed.
  - Depending on the installed l2cs version, the pipeline may fall back to CPU preprocessing if it cannot accept GPU tensor chips.
- Example: --device cuda:0 --gpu-preprocess

Association / filtering

--associate-classes <ids...>: Only attach gaze to detections whose class_id is in this list.
- Example: --associate-classes 0 (often person).
- If omitted, gaze-lib tries to infer class_name == "person"; if not found, all classes are eligible.
--face-index <int>: Which face entry to use per detection.
- If set, always uses that index when present.
- If omitted, uses the highest-score face in faces.

Gaze origin behavior (optional)

--kpt-origin <ids...>: Keypoint indices (from detections[*].keypoints) used to compute a gaze origin.
- Origin is computed as the mean of the selected keypoints that pass confidence.
- Example: --kpt-origin 0 1.
--kpt-conf <float>: Minimum keypoint confidence for origin computation (default: 0.3).
- Increase → fewer keypoints qualify (more robust, but more detections may fall back/skip).
- Decrease → more keypoints qualify (more coverage, but noisier origins).
--fallback: If set, when keypoint-origin is unavailable, fall back to the face box center (preferred) or detection box center.
- If not set and --kpt-origin is provided, detections without a valid keypoint-origin are skipped.

If you pass --kpt-origin but the JSON contains no keypoints, gaze-estimation-lib emits a warning and continues.

Artifact saving (all opt-in)

--json: Write augmented JSON to <run>/gaze.json.
--frames: Save annotated frames under <run>/frames/.
--save-video [name.mp4]: Save an annotated video under <run>/.
--out-dir <dir>: Output root used only when saving artifacts (default: out).
--run-name <name>: Optional subfolder under --out-dir.
--fourcc <fourcc>: FourCC codec for saved video (default: mp4v).
--display: Show a live annotated preview (ESC to stop). Does not write files unless saving flags are set.

UX

--no-progress: Disable progress bar.

Python usage (import)

You can use gaze-estimation-lib as a library after installing it.

Quick sanity check

python -c "import gaze; print(gaze.available_estimators()); print(gaze.available_variants('l2cs'))"

Python API reference (keywords)

`gaze.estimate_gaze_video(...)`

Required

json_in: Path to the upstream JSON.
video: Path to the original source video.
weights: Path to L2CS weights (.pkl).

Estimator

estimator: Backend name (default: "l2cs").
variant: Named variant for the backend.
- For L2CS pretrained weights linked above, use "resnet50".
device: "auto", "cpu", "mps", "cuda", "cuda:0", or an index string like "0".
expand_face: Expand face crop by fraction (0.0–0.35, start 0.2–0.3).
preproc_workers: CPU threads used for face crop + resize preprocessing.
- 0 (default) → auto
- >0 → fixed thread count
- <0 → disable threading
preproc_thread_threshold: Minimum faces per frame before threaded preprocessing is used (default: 4).
gpu_preprocess: GPU crop+resize control for CUDA.
- Values: "auto" (default), "on", "off"
- "auto": try GPU preprocessing on CUDA when available; otherwise use CPU
- "on": force GPU preprocessing (may raise if unsupported)
- "off": disable GPU preprocessing

Association / selection

associate_class_ids: List of class_id values eligible for gaze attachment.
- If None, the tool tries to infer class_name == "person"; if not found, all classes are eligible.
face_index: If set, use that face index per detection; otherwise choose the highest-score face.

Origin (optional)

kpt_origin: List of keypoint indices used to compute gaze origin.
kpt_conf: Keypoint confidence threshold.
fallback: If True, fall back to box center when keypoint-origin is unavailable.

Artifacts (all off by default)

save_json_flag: Write <run>/gaze.json.
save_frames: Write <run>/frames/*.jpg.
save_video: Filename for annotated video under the run folder.
out_dir, run_name, fourcc, display, no_progress.

Returns a GazeResult with payload (augmented JSON), paths (only populated when saving), and stats.

Run gaze augmentation from a Python file

Create run_gaze.py:

from gaze import estimate_gaze_video

res = estimate_gaze_video(
    json_in="faces.json",
    video="in.mp4",
    estimator="l2cs",
    variant="resnet50",
    weights="weights.pkl",
    device="auto",

    # Optional filtering
    associate_class_ids=[0],

    # Optional crop tuning
    expand_face=0.25,

    # Performance (optional)
    preproc_workers=0,              # 0=auto, >0 threads, <0 disable threading
    preproc_thread_threshold=4,     # faces/frame threshold for threading
    gpu_preprocess="auto",          # "auto" | "on" | "off" (CUDA only)

    # Optional origin behavior
    kpt_origin=[0, 1],
    kpt_conf=0.3,
    fallback=True,

    # Opt-in artifacts
    save_json_flag=True,
    save_video="annotated.mp4",
    out_dir="out",
    run_name="demo",
)

print(res.stats)
print("gaze_augment" in res.payload)
print(res.paths)  # populated only if you enable saving artifacts

Run:

python run_gaze.py

Using uv (recommended for development)

Install uv: https://docs.astral.sh/uv/

Clone the repo:

git clone https://github.com/Surya-Rayala/VisionPipeline-gaze.git
cd VisionPipeline-gaze

Sync environment:

uv sync

Installing the L2CS backend (uv)

Add the backend to your local uv environment from Git:

uv add --dev "l2cs @ git+https://github.com/edavalosanaya/L2CS-Net.git@main"
uv sync

Note: this updates your local project environment; it is intended for development/use in this repo.

Run CLI:

uv run python -m gaze.cli.estimate_gaze -h

Run augmentation:

uv run python -m gaze.cli.estimate_gaze \
  --json-in faces.json \
  --video in.mp4 \
  --weights weights.pkl

Testing different help levels

Basic help:

uv run python -m gaze.cli.estimate_gaze -h

Estimator help:

uv run python -m gaze.cli.estimate_gaze --list-estimators

Variant help:

uv run python -m gaze.cli.estimate_gaze --estimator l2cs --list-variants

License

This project is licensed under the MIT License. See LICENSE.

Project details

Release history Release notifications | RSS feed

0.1.7

Feb 20, 2026

0.1.6

Feb 18, 2026

This version

0.1.5

Feb 17, 2026

0.1.4

Feb 17, 2026

0.1.3

Feb 17, 2026

0.1.2

Feb 17, 2026

0.1.1

Feb 17, 2026

0.1.0

Feb 17, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

gaze_estimation_lib-0.1.5.tar.gz (32.3 kB view details)

Uploaded Feb 17, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

gaze_estimation_lib-0.1.5-py3-none-any.whl (31.7 kB view details)

Uploaded Feb 17, 2026 Python 3

File details

Details for the file gaze_estimation_lib-0.1.5.tar.gz.

File metadata

Download URL: gaze_estimation_lib-0.1.5.tar.gz
Upload date: Feb 17, 2026
Size: 32.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.10.0 {"installer":{"name":"uv","version":"0.10.0","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for gaze_estimation_lib-0.1.5.tar.gz
Algorithm	Hash digest
SHA256	`c41fdfb9955a69ec22b67744bda30652d834dead89d3808f81f96926260a7fe9`
MD5	`98d0a99f1b112fa2969a1b6b553b789a`
BLAKE2b-256	`8d0e095e0a85ffa0760e9b49d8e37cee9d15bfe40b09ad484adc515d5d8e3370`

See more details on using hashes here.

File details

Details for the file gaze_estimation_lib-0.1.5-py3-none-any.whl.

File metadata

Download URL: gaze_estimation_lib-0.1.5-py3-none-any.whl
Upload date: Feb 17, 2026
Size: 31.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.10.0 {"installer":{"name":"uv","version":"0.10.0","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for gaze_estimation_lib-0.1.5-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b892d9486393953bc4bff8981e765a202309f2a12d589c3e61dc0938d6d3912a`
MD5	`b286f716debcfafa07d8facce0533bb3`
BLAKE2b-256	`5aee482ff71f2d0d2d55cd584daeda3ea85c283cbdfa17b2f32c482d5a11f5da`

See more details on using hashes here.

gaze-estimation-lib 0.1.5

Navigation

Verified details

Maintainers

Unverified details

Meta

Project description

gaze-estimation-lib

Vision Pipeline

What gaze-estimation-lib expects

Output: augmented JSON (returned + optionally saved)

What gets attached to a detection

Minimal schema example

Returned vs saved

Install with pip (future PyPI)

Installing the L2CS backend (pip)

L2CS Weights

CLI Usage (pip or installed package)

Quick Start

Save artifacts (opt-in)

CLI arguments

Required (for running augmentation)

Discovery

Estimator selection

Face crop behavior

Performance / preprocessing (optional)

Association / filtering

Gaze origin behavior (optional)

Artifact saving (all opt-in)

UX

Python usage (import)

Quick sanity check

Python API reference (keywords)

gaze.estimate_gaze_video(...)

Run gaze augmentation from a Python file

Using uv (recommended for development)

Installing the L2CS backend (uv)

Testing different help levels

License

Project details

Verified details

Maintainers

Unverified details

Meta

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

`gaze.estimate_gaze_video(...)`