Efficient on-edge V-JEPA 2.x video encoder with a streaming/causal R&D track for temporal video understanding.

Maintainers

khushiyant

Saccade

Most of a video is redundant (a hallway camera sees nearly the same frame thousands of times), yet a video model normally pays full price for every frame. That makes running V-JEPA 2 live on edge hardware (cameras, robots, on-device apps) expensive.

Saccade fixes that by spending compute only on what changes, the way your eyes spend detail on fixations and predict across the jumps in between (a saccade). It turns a frozen V-JEPA 2 encoder into a streaming model that maintains a live embedding at low cost.

How it works

Saccade adds two things to V-JEPA's encoder:

A streaming, causal encoder. Attention is made block-causal and backed by a per-layer KV-cache, so a new frame is encoded once and reuses cached history instead of re-running a sliding window. V-JEPA 2's 3D rotary embeddings are ported into the causal path, so the cached step reproduces masked (block-causal) full attention exactly rather than approximating it.
Surprise-gating. A cheap novelty test (the encoder's own patch-embedding front-end) decides whether an incoming clip is actually new. Predictable clips skip the transformer and reuse the last representation; only real changes pay full price. Compute follows the scene, not the clock.
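
The exactness claim for the KV-cache can be illustrated with a toy, single-head attention in plain Python. This is a minimal sketch, not Saccade's implementation: it uses identity projections, no RoPE, and one layer, and the names `full_block_causal` and `cached_streaming` are hypothetical. It compares recomputing attention over the whole history under a block-causal mask against processing each block once with an append-only cache.

```python
import math
import random

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attend(q, keys, values):
    """Single-head scaled dot-product attention for one query vector."""
    scale = 1.0 / math.sqrt(len(q))
    weights = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in keys])
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

def full_block_causal(blocks):
    """Reference: recompute over the full history with a block-causal mask."""
    tokens = [(b, t) for b, block in enumerate(blocks) for t in block]
    out = []
    for qb, q in tokens:
        visible = [t for kb, t in tokens if kb <= qb]  # mask out future blocks
        out.append(attend(q, visible, visible))
    return out

def cached_streaming(blocks):
    """Streaming: each new block is processed once against an append-only cache."""
    cache_k, cache_v, out = [], [], []
    for block in blocks:
        cache_k.extend(block)  # keys of the new block join the cache
        cache_v.extend(block)  # values too (identity projections in this toy)
        out.extend(attend(q, cache_k, cache_v) for q in block)
    return out

random.seed(0)
# 5 blocks (frames) of 3 tokens each, 4 dimensions per token
blocks = [[[random.gauss(0, 1) for _ in range(4)] for _ in range(3)] for _ in range(5)]
ref = full_block_causal(blocks)
inc = cached_streaming(blocks)
max_diff = max(abs(a - b) for r, i in zip(ref, inc) for a, b in zip(r, i))
print(max_diff)
```

Both paths feed identical keys and values, in the same order, to the same function, so the outputs agree bit for bit. The same argument extends to stacked layers, because each layer's output for a block depends only on that block and earlier ones.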

Figure: surprise-gated streaming encoder. Left: the learned gate holds fidelity far better than a pixel-difference gate as compute drops. Right: encoder compute auto-scales from ~2% on static video to 100% on fast motion.

Around that core sits a measured edge toolkit: post-training quantization (int8/int4), token reduction (ToMe, PruneVid-style temporal merge), fused attention + torch.compile, distillation to a smaller ViT-S student, ONNX export, and an async decode-and-infer pipeline.
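
To make the quantization item concrete, here is a minimal sketch of symmetric per-tensor int8 quantization and the cosine-similarity check typically used to judge it. It is an illustration under stated assumptions, not Saccade's quantizer: the weights are synthetic Gaussian values, the helper names are hypothetical, and the cosine here is taken over weights rather than over output embeddings.

```python
import math
import random

def quantize_int8(weights):
    """Symmetric per-tensor quantization: one scale, integers in [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Map the stored integers back to floats."""
    return [v * scale for v in q]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

random.seed(0)
w = [random.gauss(0.0, 0.02) for _ in range(4096)]  # synthetic stand-in for a weight tensor
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
cos = cosine(w, w_hat)
print(f"cosine(original, dequantized) = {cos:.6f}")
```

A per-channel scale or a calibration set would usually tighten the error further; per-tensor scaling is used here only because it is the shortest correct instance of the idea.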

Results

Measured on an RTX 5070 Ti, fp16, batch 1. This is a single GPU, not the Jetson target, so read these numbers as a correctness check and an upper bound on edge performance.

Efficiency and streaming

What	Result
ViT-L encoder (16f @256)	95.7 ms, 10.5 embeds/s, 738 MB
Surprise-gated streaming (real video)	84% of embeddings skipped, 5.7x faster
Streaming per-frame update	22.8 ms vs 188.8 ms full re-encode = 8.3x
RoPE-port correctness	causal attention matches HF to rel 0.001; cache step exact (0.0)
Fused attention	SDPA 3.94x, SDPA + torch.compile 4.53x vs eager (374 -> 82 ms)
int8 quantization	30% less memory at cosine 0.9999
Token reduction	1.3x to 2.3x speedup (accuracy/speed knob)
ONNX export	exact parity vs PyTorch (cosine 1.00000)
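
Two of the ratios in the table can be re-derived from the raw figures. The short calculation below does that, and also relates the skip rate to the measured speedup under a simple cost model that is an assumption of this sketch, not something the project states: every clip pays a fixed gate cost, and only non-skipped clips pay for a full encode.

```python
# Streaming per-frame update vs full re-encode (figures from the table above)
full_ms, step_ms = 188.8, 22.8
streaming_speedup = full_ms / step_ms

# Surprise gating: 84% of embeddings skipped, 5.7x measured speedup
skip_fraction = 0.84
measured_speedup = 5.7

# If the gate were free, cost per clip would be (1 - skip_fraction) of a full encode.
ideal_speedup = 1.0 / (1.0 - skip_fraction)

# Assumed model: cost per clip = gate_cost + (1 - skip_fraction), in units of one encode.
implied_gate_cost = 1.0 / measured_speedup - (1.0 - skip_fraction)

print(f"streaming speedup: {streaming_speedup:.2f}x")
print(f"ideal gated speedup: {ideal_speedup:.2f}x")
print(f"implied gate cost: {implied_gate_cost:.4f} of one encode")
```

Under that model the difference between the ideal 6.25x and the measured 5.7x corresponds to a gate that costs roughly 1.5% of a full encode per clip.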

Install

Requires Python 3.10+ and a CUDA GPU.

With uv (recommended; pulls the cu128 torch build for Blackwell/sm_120 automatically, configured in pyproject.toml):

uv sync                # creates .venv and installs deps (incl. cu128 torch)
uv sync --extra dev    # add test + figure tooling (pytest, matplotlib, seaborn)

With pip (the PyPI name is saccadic; it imports as saccade):

pip install saccadic                                    # latest release from PyPI
pip install git+https://github.com/Khushiyant/saccade   # bleeding edge from main
# or, from a clone, for development:
pip install -e .

On recent GPUs (Blackwell/sm_120) install the matching torch first, so pip does not pull an incompatible build: pip install torch --index-url https://download.pytorch.org/whl/cu128. On Jetson, use the JetPack-provided torch/decord/tensorrt wheels.
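
After installing, a quick environment check can confirm that the torch build matches the GPU. The following is a sketch using only standard `importlib` and `torch.cuda` calls; the function name `describe_environment` is hypothetical. It reports the installed `saccadic` distribution (which imports as `saccade`), the torch and CUDA versions, and the device's compute capability, which should read `sm_120` on the Blackwell cards mentioned above.

```python
import importlib.util
from importlib import metadata

def describe_environment():
    """Return human-readable lines describing the saccadic/torch/CUDA setup."""
    lines = []
    try:
        lines.append(f"saccadic {metadata.version('saccadic')} (import name: saccade)")
    except metadata.PackageNotFoundError:
        lines.append("saccadic is not installed")

    if importlib.util.find_spec("torch") is None:
        lines.append("torch is not installed")
        return lines

    import torch
    if not torch.cuda.is_available():
        lines.append(f"torch {torch.__version__}: no CUDA device visible")
        return lines

    major, minor = torch.cuda.get_device_capability(0)
    lines.append(
        f"torch {torch.__version__}, CUDA {torch.version.cuda}, "
        f"{torch.cuda.get_device_name(0)} (sm_{major}{minor})"
    )
    return lines

for line in describe_environment():
    print(line)
```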

Usage

Saccade is a library: it turns video into embeddings you feed to your own head (a classifier, retrieval index, anomaly score). Pick the mode that matches your input.

One-shot, when you have a clip and want its embedding:

import torch
from saccade import load_encoder, ModelConfig

enc = load_encoder(ModelConfig(checkpoint="vitl", frames=16, resolution=256,
                               device="cuda", dtype="float16"))
clip = torch.rand(1, 16, 3, 256, 256, device="cuda", dtype=torch.float16)  # [B,T,C,H,W]
emb = enc.embed(clip)        # [1, 1024] -> feed to your task head

Surprise-gated streaming, when you have a live feed and want to skip redundant clips (the efficiency win: ~84% of clips skipped on real footage):

from saccade import SurpriseGatedEncoder

gate = SurpriseGatedEncoder(enc, tau=0.015)   # tau is the compute/fidelity knob
gate.reset()
for clip in stream:                           # each clip: [1, T, 3, H, W]
    emb, info = gate.step(clip)
    if info["encoded"]:                       # False -> scene unchanged, last emb reused
        my_head(emb)                          # only run downstream work when it is new
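
The gating loop above can be mimicked with a toy gate to show the control flow. This is a minimal sketch and not `SurpriseGatedEncoder` itself: the class `ToySurpriseGate`, the injected `encode` and `cheap_features` callables, the mean-absolute-difference novelty score, and the `"novelty"` key are all assumptions for illustration. Only the `step` returning `(emb, info)` with `info["encoded"]` mirrors the usage shown in the README.

```python
class ToySurpriseGate:
    """Toy stand-in for a surprise gate; not the library's implementation."""

    def __init__(self, encode, cheap_features, tau):
        self.encode = encode                  # expensive call (the transformer)
        self.cheap_features = cheap_features  # cheap front-end used for the novelty test
        self.tau = tau
        self.reset()

    def reset(self):
        self.ref_features = None  # features of the last clip that was encoded
        self.last_emb = None

    def step(self, clip):
        feats = self.cheap_features(clip)
        if self.ref_features is None:
            novelty = float("inf")  # the first clip is always encoded
        else:
            diffs = [abs(a - b) for a, b in zip(feats, self.ref_features)]
            novelty = sum(diffs) / len(diffs)
        encoded = novelty > self.tau
        if encoded:
            self.last_emb = self.encode(clip)
            self.ref_features = feats
        return self.last_emb, {"encoded": encoded, "novelty": novelty}

encode_calls = []

def encode(clip):
    encode_calls.append(1)              # count how often the expensive path runs
    return [sum(clip) / len(clip)]

# Five identical "static" clips followed by three identical clips of a new scene
stream = [[0.5] * 8 for _ in range(5)] + [[0.9] * 8 for _ in range(3)]

gate = ToySurpriseGate(encode, cheap_features=lambda clip: clip, tau=0.015)
flags = [gate.step(clip)[1]["encoded"] for clip in stream]
skipped = flags.count(False) / len(flags)
print(flags, skipped)
```

Comparing against the last encoded clip, rather than the last seen clip, is a design choice in this sketch: it prevents slow drift from accumulating unnoticed below the threshold.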

Exact causal streaming, when you want a per-frame running embedding backed by a KV-cache:

from saccade import StreamingEncoder, StreamingConfig

stream = StreamingEncoder(enc, StreamingConfig())
stream.reset()
for frame in frames:                          # each frame: [3, H, W]
    emb = stream.step(frame)                  # emits a 1024-d embedding once a tubelet completes
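
The comment about tubelets means frames are grouped in time before they become tokens, so not every incoming frame completes a group. A toy buffer shows the idea. Everything here is hypothetical: the class `TubeletBuffer`, the default `tubelet_size=2`, and the convention of returning `None` for an incomplete tubelet are assumptions for illustration and do not describe what `StreamingEncoder.step` returns.

```python
class TubeletBuffer:
    """Collect frames until a full tubelet (a short temporal group) is available."""

    def __init__(self, tubelet_size=2):  # assumed size; check your checkpoint's config
        self.tubelet_size = tubelet_size
        self.pending = []

    def push(self, frame):
        """Return the completed tubelet, or None while frames are still pending."""
        self.pending.append(frame)
        if len(self.pending) < self.tubelet_size:
            return None
        tubelet, self.pending = self.pending, []
        return tubelet

buf = TubeletBuffer(tubelet_size=2)
emitted = [buf.push(f) for f in ["f0", "f1", "f2", "f3", "f4"]]
print(emitted)
```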

To finetune the causal adapter on your own video, apply_causal_lora(enc, StreamingConfig()) converts the encoder in place and returns the trainable LoRA parameters.
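
For readers new to LoRA, the sketch below shows why only a small set of parameters is trainable. It follows the standard LoRA formulation, with the frozen weight plus a scaled low-rank product and one factor initialised to zero. Whether Saccade uses the same initialisation, rank, or scaling is not stated here, so `rank`, `alpha`, and the helper names are assumptions for illustration.

```python
def matvec(m, x):
    return [sum(a * b for a, b in zip(row, x)) for row in m]

def lora_forward(w, a, b, x, alpha):
    """y = W x + (alpha / r) * B (A x); only A and B would be trained."""
    r = len(a)
    base = matvec(w, x)
    low_rank = matvec(b, matvec(a, x))
    return [y + (alpha / r) * d for y, d in zip(base, low_rank)]

d, r = 4, 2
w = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # frozen weight (identity)
a = [[0.1 * (i + j + 1) for j in range(d)] for i in range(r)]       # r x d, trainable
b = [[0.0] * r for _ in range(d)]                                   # d x r, zero-initialised
x = [1.0, 2.0, 3.0, 4.0]

# While B is zero the adapted layer reproduces the frozen layer exactly.
y0 = lora_forward(w, a, b, x, alpha=4.0)

def lora_param_fraction(d_in, d_out, rank):
    """Trainable LoRA parameters as a fraction of the full weight matrix."""
    return rank * (d_in + d_out) / (d_in * d_out)

fraction = lora_param_fraction(1024, 1024, rank=8)  # rank 8 is an assumed value
print(y0, fraction)
```

For a 1024 x 1024 projection, an assumed rank of 8 trains about 1.6% as many parameters as the full matrix.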

In every mode emb is a 1024-d vector; attach your own linear probe, retrieval index, or threshold on top. Saccade gives you the cheap live representation; the task head is yours.

Reproduce

The numbers above come from these scripts (run on an RTX 5070 Ti):

uv run python scripts/real_eval.py            # encoder latency + streaming
uv run python scripts/bench_fused_attn.py     # eager vs SDPA vs torch.compile
uv run python scripts/bench_surprise_gate.py  # surprise-gating Pareto
uv run python scripts/verify_rope.py          # RoPE-port correctness checks
uv run python scripts/make_figures.py         # render result figures
uv run python scripts/demo.py --video clip.mp4 --stride 4 --tau 0.015  # annotated demo
uv run pytest                                 # unit tests

Layout: the library lives in src/saccade/ (with streaming/ for the causal attention, KV-cache, LoRA-to-causal, streaming encoder and surprise gate); scripts/ holds the benchmarks and demo; tests/ the unit tests; configs/ example run configs.

Status and limitations

Measured and verified:

Encoder latency/throughput/memory, fused attention, token reduction, int8 quantization.
Streaming: the KV-cache step reproduces masked full attention exactly; the ported 3D-RoPE matches the reference encoder to ~0.1%.
37 unit tests pass (core correctness plus the novel features); distillation and the robustness finetune train on the real model.

Not yet done (needs external resources, not code):

Task accuracy. SSv2 top-1 has not been run (the dataset is gated). The probe train/eval harness works on a synthetic proxy; there are no accuracy-vs-SOTA numbers yet.
On-device. Only a single GPU was used; Jetson latency and a TensorRT engine still need the actual device.
Streaming accuracy. The causal encoder is numerically exact through the cache, but a LoRA finetune on real video is still needed to close the causal-vs-bidirectional gap that accumulates across depth.

References

V-JEPA 2, Assran et al., 2025 (arXiv:2506.09985). Checkpoints facebook/vjepa2-* on Hugging Face; Saccade loads facebook/vjepa2-vitl-fpc64-256 by default.
Closest streaming prior art: VL-JEPA, OmniStream, Recurrent Video MAE, CarelessWhisper.

License

MIT, see LICENSE.

Release history

This version

0.1.0

Jun 30, 2026

Download files

Source Distribution

saccadic-0.1.0.tar.gz (78.1 kB)

Uploaded Jun 30, 2026 Source

Built Distribution

saccadic-0.1.0-py3-none-any.whl (78.6 kB)

Uploaded Jun 30, 2026 Python 3

File details

Details for the file saccadic-0.1.0.tar.gz.

File metadata

Download URL: saccadic-0.1.0.tar.gz
Upload date: Jun 30, 2026
Size: 78.1 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for saccadic-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`5fcd45b30a8bce70ab21c787bbf60f4c567cf79168539cb2cc49a05feb091864`
MD5	`0b94d35a9b844ae05f104368d77bc815`
BLAKE2b-256	`2583f1a92852342482cd0bc0ff13e343f993272c1a77eb1921137a31d98a889e`

Provenance

The following attestation bundles were made for saccadic-0.1.0.tar.gz:

Publisher: release.yml on Khushiyant/saccade

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: saccadic-0.1.0.tar.gz
- Subject digest: 5fcd45b30a8bce70ab21c787bbf60f4c567cf79168539cb2cc49a05feb091864
- Sigstore transparency entry: 2020019376
- Sigstore integration time: Jun 30, 2026
Source repository:
- Permalink: Khushiyant/saccade@b1308e961debafd3a084f1e0cbdea8769fa96c25
- Branch / Tag: refs/heads/main
- Owner: https://github.com/Khushiyant
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@b1308e961debafd3a084f1e0cbdea8769fa96c25
- Trigger Event: push

File details

Details for the file saccadic-0.1.0-py3-none-any.whl.

File metadata

Download URL: saccadic-0.1.0-py3-none-any.whl
Upload date: Jun 30, 2026
Size: 78.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for saccadic-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f771040fbd709ef26c4d69fe6744471f244d4b58b1cdab73ea900f2f5321ae97`
MD5	`2d8b775335d604fcb1dfb743aa17d3f2`
BLAKE2b-256	`9f955045220b2d03ff0ec9d2bf319a8ad8a89c081b40ad207575eee8625c508f`

Provenance

The following attestation bundles were made for saccadic-0.1.0-py3-none-any.whl:

Publisher: release.yml on Khushiyant/saccade

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: saccadic-0.1.0-py3-none-any.whl
- Subject digest: f771040fbd709ef26c4d69fe6744471f244d4b58b1cdab73ea900f2f5321ae97
- Sigstore transparency entry: 2020019569
- Sigstore integration time: Jun 30, 2026
Source repository:
- Permalink: Khushiyant/saccade@b1308e961debafd3a084f1e0cbdea8769fa96c25
- Branch / Tag: refs/heads/main
- Owner: https://github.com/Khushiyant
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@b1308e961debafd3a084f1e0cbdea8769fa96c25
- Trigger Event: push
