Efficient on-edge V-JEPA 2.x video encoder with a streaming/causal R&D track for temporal video understanding.
Saccade
Most of a video is redundant (a hallway camera sees nearly the same frame thousands of times), yet a video model normally pays full price for every frame. That makes running V-JEPA 2 live on edge hardware (cameras, robots, on-device apps) expensive.
Saccade fixes that by spending compute only on what changes, the way your eyes spend detail on fixations and predict across the rapid jumps between them (saccades). It turns a frozen V-JEPA 2 encoder into a streaming model that keeps a live embedding cheaply.
How it works
Saccade adds two things to V-JEPA's encoder:
- A streaming, causal encoder. Attention is made block-causal and backed by a per-layer KV-cache, so a new frame is encoded once and reuses cached history instead of re-running a sliding window. V-JEPA 2's 3D rotary embeddings are ported into the causal path, so the cached step reproduces masked (block-causal) full attention exactly rather than approximating it.
- Surprise-gating. A cheap novelty test (the encoder's own patch-embedding front-end) decides whether an incoming clip is actually new. Predictable clips skip the transformer and reuse the last representation; only real changes pay full price. Compute follows the scene, not the clock.
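The exactness of the cached path follows from how block-causal attention is defined: a token attends to everything up to the end of its own frame, so the keys and values of earlier frames never change and can be cached. A minimal sketch of that equivalence in plain Python, with a single head and no rotary embeddings; the function names, the toy sizes, and the choice of one block per frame are illustrative assumptions, not Saccade's implementation:

```python
import math
import random


def attend(query, keys, values):
    """Single-head softmax attention of one query over the given keys/values."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(a * b for a, b in zip(query, key)) for key in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]


def full_block_causal(q, k, v, block):
    """Reference: recompute every token; token i sees all tokens up to the end of its block."""
    out = []
    for i in range(len(q)):
        visible = (i // block + 1) * block
        out.append(attend(q[i], k[:visible], v[:visible]))
    return out


def cached_block_causal(q, k, v, block):
    """Streaming: handle one block (frame) at a time, appending its keys/values to a cache."""
    cache_k, cache_v, out = [], [], []
    for start in range(0, len(q), block):
        stop = min(start + block, len(q))
        cache_k.extend(k[start:stop])
        cache_v.extend(v[start:stop])
        for i in range(start, stop):
            out.append(attend(q[i], cache_k, cache_v))
    return out


random.seed(0)
tokens, dim, block = 12, 8, 4  # three toy "frames" of four tokens each
q, k, v = ([[random.gauss(0, 1) for _ in range(dim)] for _ in range(tokens)] for _ in range(3))

full = full_block_causal(q, k, v, block)
cached = cached_block_causal(q, k, v, block)
max_diff = max(abs(a - b) for row_f, row_c in zip(full, cached) for a, b in zip(row_f, row_c))
print(f"max abs difference: {max_diff}")
```

In the cached version each new frame only computes attention for its own tokens, which is where the saving over re-encoding a sliding window comes from.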
Left: the learned gate holds fidelity far better than a pixel-difference gate as compute drops. Right: encoder compute auto-scales from ~2% on static video to 100% on fast motion.
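The gating idea can be sketched in a few lines. This is a toy under stated assumptions, not Saccade's gate: the class name `ToyGate`, the `cheap_features` callable, the relative L2 novelty score, and the policy of refreshing the reference only when a clip is encoded are all hypothetical choices made for illustration.

```python
import math


def relative_change(current, reference):
    """L2 distance between two feature vectors, normalised by the reference norm."""
    diff = math.sqrt(sum((c - r) ** 2 for c, r in zip(current, reference)))
    norm = math.sqrt(sum(r * r for r in reference)) or 1.0
    return diff / norm


class ToyGate:
    """Run the expensive encoder only when cheap features move by at least tau."""

    def __init__(self, encode, cheap_features, tau):
        self.encode = encode
        self.cheap_features = cheap_features
        self.tau = tau
        self.reset()

    def reset(self):
        self._reference = None  # cheap features of the last encoded clip
        self._last = None       # last full embedding

    def step(self, clip):
        feats = self.cheap_features(clip)
        if self._reference is not None and relative_change(feats, self._reference) < self.tau:
            return self._last, {"encoded": False}
        # Novel clip: pay for the full encoder and refresh the reference.
        self._reference = feats
        self._last = self.encode(clip)
        return self._last, {"encoded": True}


calls = []


def expensive_encode(clip):
    calls.append(clip)
    return [sum(clip) / len(clip)]


gate = ToyGate(expensive_encode, cheap_features=lambda clip: clip, tau=0.015)
static, moved = [1.0, 2.0, 3.0], [1.0, 2.0, 4.0]
flags = [gate.step(clip)[1]["encoded"] for clip in (static, static, static, moved, moved)]
print(flags, len(calls))
```

Because the reference is only refreshed on an encode, slow drift accumulates against the last encoded clip and eventually crosses `tau`, rather than being absorbed one small step at a time.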
Around that core sits a measured edge toolkit: post-training quantization (int8/int4), token
reduction (ToMe, PruneVid-style temporal merge), fused attention + torch.compile, distillation
to a smaller ViT-S student, ONNX export, and an async decode-and-infer pipeline.
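As one concrete illustration of the toolkit, post-training quantization can be sketched with a symmetric per-tensor int8 scheme. Which scheme Saccade actually uses is not stated here, so treat the scale rule, the function names, and the Gaussian toy weights below as assumptions:

```python
import math
import random


def quantize_int8(values):
    """Symmetric per-tensor quantization: map floats to integer codes in [-127, 127]."""
    scale = max(abs(x) for x in values) / 127.0 or 1.0  # fall back to 1.0 for all-zero input
    codes = [max(-127, min(127, round(x / scale))) for x in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats from integer codes."""
    return [c * scale for c in codes]


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(1024)]  # toy stand-in for one weight tensor
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
similarity = cosine(weights, restored)
print(f"scale={scale:.6f} cosine={similarity:.6f}")
```

Cosine similarity between the original and restored values is the same kind of fidelity check reported for the int8 result below.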
Results
Measured on an RTX 5070 Ti, fp16, batch 1. This is a single desktop GPU, not the Jetson target, so read these numbers as a correctness check and an upper bound on edge performance.
| What | Result |
|---|---|
| ViT-L encoder (16f @256) | 95.7 ms, 10.5 embeds/s, 738 MB |
| Surprise-gated streaming (real video) | 84% of embeddings skipped, 5.7x faster |
| Streaming per-frame update | 22.8 ms vs 188.8 ms full re-encode = 8.3x |
| RoPE-port correctness | causal attention matches HF to rel 0.001; cache step exact (0.0) |
| Fused attention | SDPA 3.94x, SDPA + torch.compile 4.53x vs eager (374 -> 82 ms) |
| int8 quantization | 30% less memory at cosine 0.9999 |
| Token reduction | 1.3x to 2.3x speedup (accuracy/speed knob) |
| ONNX export | exact parity vs PyTorch (cosine 1.00000) |
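A quick way to read the streaming and gating rows is to redo the arithmetic. The sketch below uses only figures from the table; the cost model for the gate (every clip pays a small gate cost, non-skipped clips also pay one full encode) is an assumption introduced here to relate the 84% skip rate to the 5.7x speedup, not something the benchmark reports.

```python
# Figures copied from the results table.
full_reencode_ms = 188.8
streaming_step_ms = 22.8
skip_fraction = 0.84
reported_gated_speedup = 5.7

# Per-frame streaming update versus a full re-encode.
streaming_speedup = full_reencode_ms / streaming_step_ms

# If a skipped clip cost nothing at all, skipping 84% would give this ceiling.
ideal_gated_speedup = 1.0 / (1.0 - skip_fraction)

# Assumed cost model: speedup = 1 / (g + (1 - skip_fraction)), where g is the
# gate cost per clip expressed as a fraction of one full encode. Solve for g.
implied_gate_cost = 1.0 / reported_gated_speedup - (1.0 - skip_fraction)

print(f"streaming speedup: {streaming_speedup:.1f}x")
print(f"ideal gated speedup: {ideal_gated_speedup:.2f}x")
print(f"implied gate cost: {implied_gate_cost:.3f} of a full encode")
```

Under that model the gap between the 6.25x ceiling and the measured 5.7x corresponds to a gate that costs roughly 1.5% of a full encode per clip.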
Install
Requires Python 3.10+ and a CUDA GPU.
With uv (recommended; pulls the cu128 torch build for
Blackwell/sm_120 automatically, configured in pyproject.toml):
uv sync # creates .venv and installs deps (incl. cu128 torch)
uv sync --extra dev # add test + figure tooling (pytest, matplotlib, seaborn)
With pip (the PyPI name is saccadic; it imports as saccade):
pip install saccadic # latest release from PyPI
pip install git+https://github.com/Khushiyant/saccade # bleeding edge from main
# or, from a clone, for development:
pip install -e .
On recent GPUs (Blackwell/sm_120) install the matching torch first, so pip does not pull an
incompatible build: pip install torch --index-url https://download.pytorch.org/whl/cu128.
On Jetson, use the JetPack-provided torch/decord/tensorrt wheels.
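Because the distribution name (`saccadic`) differs from the import name (`saccade`), a short standard-library check can confirm which of the two the environment actually sees. The helper name `describe_install` is hypothetical; only `importlib.metadata` and `importlib.util` are used:

```python
from importlib import metadata, util


def describe_install(dist_name="saccadic", import_name="saccade"):
    """Report the installed distribution version and whether the module can be found."""
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None  # distribution is not installed in this environment
    importable = util.find_spec(import_name) is not None
    return {"distribution": dist_name, "version": version, "importable": importable}


print(describe_install())
```

A result with `version` set but `importable` false (or the reverse) usually points at a mixed environment, for example a `pip install` into one interpreter and a `uv` virtualenv in another.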
Usage
Saccade is a library: it turns video into embeddings you feed to your own head (a classifier, retrieval index, anomaly score). Pick the mode that matches your input.
One-shot, when you have a clip and want its embedding:
import torch
from saccade import load_encoder, ModelConfig
enc = load_encoder(ModelConfig(checkpoint="vitl", frames=16, resolution=256,
                               device="cuda", dtype="float16"))
clip = torch.rand(1, 16, 3, 256, 256, device="cuda", dtype=torch.float16) # [B,T,C,H,W]
emb = enc.embed(clip) # [1, 1024] -> feed to your task head
Surprise-gated streaming, when you have a live feed and want to skip redundant clips (the efficiency win: ~84% of clips skipped on real footage):
from saccade import SurpriseGatedEncoder
gate = SurpriseGatedEncoder(enc, tau=0.015) # tau is the compute/fidelity knob
gate.reset()
for clip in stream:                  # each clip: [1, T, 3, H, W]
    emb, info = gate.step(clip)
    if info["encoded"]:              # False -> scene unchanged, last emb reused
        my_head(emb)                 # only run downstream work when it is new
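When tuning tau it helps to log how many clips were actually encoded. A minimal sketch of such a loop: `run_gated` and `EveryNthGate` are hypothetical helpers, and the stand-in gate only mimics the `reset()` / `step(clip) -> (emb, info)` shape shown above so the example runs without a GPU or the library.

```python
def run_gated(gate, clips, head):
    """Drive a gate over clips, call head only on fresh embeddings, and count skips."""
    gate.reset()
    encoded = total = 0
    for clip in clips:
        emb, info = gate.step(clip)
        total += 1
        if info["encoded"]:
            encoded += 1
            head(emb)
    skip_rate = 1.0 - encoded / total if total else 0.0
    return {"clips": total, "encoded": encoded, "skip_rate": skip_rate}


class EveryNthGate:
    """Stand-in with the same reset()/step() shape; encodes every n-th clip."""

    def __init__(self, n):
        self.n = n
        self.reset()

    def reset(self):
        self._count = 0
        self._last = None

    def step(self, clip):
        encoded = self._count % self.n == 0
        if encoded:
            self._last = ("emb", clip)  # placeholder for a real embedding
        self._count += 1
        return self._last, {"encoded": encoded}


seen = []
stats = run_gated(EveryNthGate(4), range(8), seen.append)
print(stats)
```

Swapping the stand-in for a real gate and sweeping tau gives the skip rate at each setting, which is one axis of the compute/fidelity trade-off.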
Exact causal streaming, when you want a per-frame running embedding backed by a KV-cache:
from saccade import StreamingEncoder, StreamingConfig
stream = StreamingEncoder(enc, StreamingConfig())
stream.reset()
for frame in frames:                 # each frame: [3, H, W]
    emb = stream.step(frame)         # emits a 1024-d embedding once a tubelet completes
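The phrase "once a tubelet completes" means frames are grouped before they become tokens. The toy buffer below illustrates that grouping only; `TubeletBuffer` is a hypothetical class, the tubelet size of 2 is assumed from V-JEPA 2's default patch embedding, and returning `None` between tubelets is this sketch's choice rather than documented behaviour of `StreamingEncoder.step`.

```python
class TubeletBuffer:
    """Collect frames and release them in groups of tubelet_size."""

    def __init__(self, tubelet_size=2):
        self.tubelet_size = tubelet_size
        self._frames = []

    def push(self, frame):
        """Return a completed tubelet (list of frames), or None while still filling."""
        self._frames.append(frame)
        if len(self._frames) < self.tubelet_size:
            return None
        tubelet, self._frames = self._frames, []
        return tubelet


buf = TubeletBuffer(tubelet_size=2)
emitted = [buf.push(name) for name in ["f0", "f1", "f2", "f3", "f4"]]
print(emitted)
```

With a tubelet size of 2, a new embedding can appear at most every second frame, so downstream code should be prepared for steps that produce nothing new.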
To finetune the causal adapter on your own video, apply_causal_lora(enc, StreamingConfig())
converts the encoder in place and returns the trainable LoRA parameters.
In every mode emb is a 1024-d vector; attach your own linear probe, retrieval, or threshold on
top. Saccade gives you the cheap live representation; the task head is yours.
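As an example of a task head, retrieval and thresholding both reduce to cosine similarity against stored embeddings. A minimal sketch in plain Python with 3-d toy vectors standing in for the 1024-d embeddings; the function names, the labels, and the 0.8 threshold are illustrative assumptions:

```python
import math


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest(query, index):
    """Return (label, similarity) of the stored embedding closest to query."""
    return max(((label, cosine(query, emb)) for label, emb in index.items()),
               key=lambda pair: pair[1])


def is_anomalous(query, index, min_similarity=0.8):
    """Flag a query that is not close to anything in the index."""
    return nearest(query, index)[1] < min_similarity


index = {"door_closed": [1.0, 0.0, 0.0], "door_open": [0.0, 1.0, 0.0]}
label, score = nearest([0.9, 0.1, 0.0], index)
print(label, round(score, 3))
print(is_anomalous([0.0, 0.0, 1.0], index))
```

For more than a few thousand stored embeddings a vectorised or approximate nearest-neighbour index would replace the linear scan, but the interface to the encoder stays the same.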
Reproduce
The numbers above come from these scripts (run on an RTX 5070 Ti):
uv run python scripts/real_eval.py # encoder latency + streaming
uv run python scripts/bench_fused_attn.py # eager vs SDPA vs torch.compile
uv run python scripts/bench_surprise_gate.py # surprise-gating Pareto
uv run python scripts/verify_rope.py # RoPE-port correctness checks
uv run python scripts/make_figures.py # render result figures
uv run python scripts/demo.py --video clip.mp4 --stride 4 --tau 0.015 # annotated demo
uv run pytest # unit tests
Layout: the library lives in src/saccade/ (with streaming/ for the causal attention, KV-cache,
LoRA-to-causal, streaming encoder and surprise gate); scripts/ holds the benchmarks and demo;
tests/ the unit tests; configs/ example run configs.
Status and limitations
Measured and verified:
- Encoder latency/throughput/memory, fused attention, token reduction, int8 quantization.
- Streaming: the KV-cache step reproduces masked full attention exactly; the ported 3D-RoPE matches the reference encoder to ~0.1%.
- 37 unit tests pass (core correctness plus the novel features); distillation and the robustness finetune train on the real model.
Not yet done (needs external resources, not code):
- Task accuracy. SSv2 top-1 has not been run (the dataset is gated). The probe train/eval harness works on a synthetic proxy; there are no accuracy-vs-SOTA numbers yet.
- On-device. Only a single GPU was used; Jetson latency and a TensorRT engine still need the actual device.
- Streaming accuracy. The causal encoder is numerically exact through the cache, but a LoRA finetune on real video is still needed to close the across-depth causal-vs-bidirectional gap.
References
- V-JEPA 2, Assran et al., 2025 (arXiv:2506.09985). Checkpoints facebook/vjepa2-* on Hugging Face; Saccade loads facebook/vjepa2-vitl-fpc64-256 by default.
- Closest streaming prior art: VL-JEPA, OmniStream, Recurrent Video MAE, CarelessWhisper.
License
MIT, see LICENSE.