Video summarization via the Signature Transform (de Curtò & de Zarzà, Electronics 2023)
sigvideo
Automatic video summarization via the Signature Transform.
Given any video file, sigvideo selects the most representative keyframes and assembles them into a condensed summary video — no annotations, no training data, no GPU required.
    from sigvideo import summarize_video
    summarize_video("lecture.mp4", "summary.mp4")

    sigvideo lecture.mp4 summary.mp4
Based on the paper:
de Curtò, J.; de Zarzà, I.; Roig, G.; Calafate, C.T. Summarization of Videos with the Signature Transform. Electronics 2023, 12, 1735. https://doi.org/10.3390/electronics12071735
How it works
The Signature Transform is a rough equivalent of the Fourier Transform for paths: instead of frequency it captures order and area (iterated integrals). Applied to video frames treated as a temporal path, it provides a compact harmonic descriptor of each frame.
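To make the "order and area" idea concrete, here is a minimal sketch of a signature truncated at order 2 for a piecewise-linear path, accumulated segment by segment with Chen's identity. It is an illustration only, not the code sigvideo runs (the package relies on iisignature for this); the names `signature_order2` and `levy_area` are assumptions made for this example.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = List[List[float]]


def signature_order2(path: Sequence[Sequence[float]]) -> Tuple[Vector, Matrix]:
    """Return levels 1 and 2 of the signature of a piecewise-linear path."""
    dim = len(path[0])
    level1 = [0.0] * dim
    level2 = [[0.0] * dim for _ in range(dim)]
    for start, end in zip(path, path[1:]):
        delta = [b - a for a, b in zip(start, end)]
        # Chen's identity for appending one straight segment:
        # S2 <- S2 + S1 (x) delta + 0.5 * delta (x) delta
        for i in range(dim):
            for j in range(dim):
                level2[i][j] += level1[i] * delta[j] + 0.5 * delta[i] * delta[j]
        # S1 <- S1 + delta (total displacement so far)
        for i in range(dim):
            level1[i] += delta[i]
    return level1, level2


def levy_area(level2: Matrix, i: int, j: int) -> float:
    """Signed area swept between coordinates i and j (antisymmetric part of level 2)."""
    return 0.5 * (level2[i][j] - level2[j][i])


# A unit square traversed counter-clockwise, returning to the start.
square = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
level1, level2 = signature_order2(square)
print(level1)                    # [0.0, 0.0]
print(levy_area(level2, 0, 1))   # 1.0
```

The closed loop has zero net displacement at level 1, yet level 2 still records the enclosed area and the direction of travel, which is the kind of ordering information a plain average of the points would lose.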
sigvideo draws n candidate summaries by uniform random sampling. For each candidate it computes the RMSE between the candidate's element-wise mean signature and the mean signature of repeated random draws from the full video, then keeps the candidate whose RMSE values have the lowest standard deviation, on the reasoning that this candidate best preserves the harmonic components of the original. The paper calls this baseline RMSE(S̄, S̄_umin)|n and reports 100% positive cases on its 28-video benchmark, without any human annotations or training.
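The selection rule can be sketched in a few lines of standard-library Python. This is a simplified illustration under stated assumptions, not the package's implementation: `signatures` stands for per-frame signature vectors computed beforehand, every draw has the same length as the summary, the population standard deviation is used, and the helper names (`mean_signature`, `rmse`, `select_summary`) are hypothetical.

```python
import math
import random
import statistics
from typing import List, Sequence, Tuple


def mean_signature(signatures: Sequence[Sequence[float]], indices: Sequence[int]) -> List[float]:
    """Element-wise mean of the signatures at the given frame indices."""
    dim = len(signatures[0])
    return [sum(signatures[i][k] for i in indices) / len(indices) for k in range(dim)]


def rmse(a: Sequence[float], b: Sequence[float]) -> float:
    """Root-mean-square error between two equally long vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def select_summary(
    signatures: Sequence[Sequence[float]],
    summary_length: int,
    n_candidates: int = 10,
    n_comparisons: int = 10,
    seed: int = 0,
) -> Tuple[List[int], float]:
    """Return (frame indices, std of RMSE) for the candidate with the lowest std."""
    rng = random.Random(seed)
    frame_ids = range(len(signatures))
    best_indices: List[int] = []
    best_std = math.inf
    for _ in range(n_candidates):
        candidate = sorted(rng.sample(frame_ids, summary_length))
        candidate_mean = mean_signature(signatures, candidate)
        errors = []
        for _ in range(n_comparisons):
            # Fresh uniform random draw from the full video to compare against.
            reference = rng.sample(frame_ids, summary_length)
            errors.append(rmse(candidate_mean, mean_signature(signatures, reference)))
        spread = statistics.pstdev(errors)
        if spread < best_std:
            best_indices, best_std = candidate, spread
    return best_indices, best_std


# Toy stand-in for real per-frame signatures: 30 frames, 2 features each.
toy = [[float(i), float(i % 3)] for i in range(30)]
indices, spread = select_summary(toy, summary_length=5)
print(indices, spread)
```

Fixing the seed makes the sketch reproducible; with real signatures the outer loop corresponds to `n_candidates` and the inner loop to `n_comparisons`.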
Installation
pip install sigvideo
Optional — text-conditioned VLM summarization (OWL-ViT via HuggingFace, requires PyTorch):
pip install sigvideo[vlm]
Dependencies
| Package | Purpose | License |
|---|---|---|
| iisignature | Signature Transform | MIT |
| opencv-python-headless | Frame I/O | Apache 2.0 |
| numpy | Numerical ops | BSD-3 |
sigvideo is MIT-licensed. All dependencies use permissive licenses — no copyleft, no additional obligations.
Quick start
Python
    from sigvideo import summarize_video

    # One call: video in, summary video out
    out, frames, rmse_mean, rmse_std = summarize_video(
        "input.mp4",
        "summary.mp4",
    )
    print(f"{len(frames)} keyframes selected")

    # Auto-detect best length, keep extracted frames
    out, frames, _, _ = summarize_video(
        "input.mp4",
        "summary.mp4",
        fps_extract=1.0,        # extract 1 frame/s
        fps_output=2.0,         # each keyframe shown 0.5 s
        keep_frames="./frames",
    )

    # Paper-exact reproduction (sig_order=3, ~100x slower)
    summarize_video("input.mp4", "summary.mp4", sig_order=3)
CLI
    # Minimal
    sigvideo input.mp4 summary.mp4

    # With options:
    #   --length        fixed keyframe count (default: auto)
    #   --fps-extract   extraction rate
    #   --fps-output    output playback rate
    #   --sig-order     3 = paper-exact (default: 2, fast)
    #   --no-timestamp  suppress frame badge overlay
    sigvideo input.mp4 summary.mp4 \
        --length 20 \
        --fps-extract 1 \
        --fps-output 2 \
        --sig-order 3 \
        --no-timestamp

    # Score a summary against original frames
    sigvideo score --frames-dir ./frames --summary-dir ./keyframes

    # Text-conditioned summarization (requires sigvideo[vlm])
    sigvideo vlm --frames-dir ./frames --subtitles transcript.txt
Signature order
The signature truncation order controls the speed/quality tradeoff:
| sig_order | Time/frame | Signature dim | Use case |
|---|---|---|---|
| 2 (default) | ~0.6 ms | 4,160 | Everyday use, long videos |
| 3 (paper) | ~70 ms | 266,304 | Paper-exact reproduction |
Both produce valid summaries. Order 2 is the practical default; order 3 reproduces the numbers in Tables 1–5 of the paper exactly.
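The dimensions in the table follow from the size of a truncated signature: a path with d channels contributes d^k terms at level k, so truncating at order m gives d + d^2 + ... + d^m terms. The short check below reproduces both figures for d = 64. Treating one axis of the default 64×64 resize as the 64 channels is an assumption of this sketch, and `signature_dim` is a hypothetical helper (iisignature ships a `siglength(d, m)` function that should report the same count).

```python
def signature_dim(channels: int, order: int) -> int:
    """Number of terms in a signature truncated at `order` (levels 1..order)."""
    return sum(channels ** level for level in range(1, order + 1))


print(signature_dim(64, 2))  # 4160
print(signature_dim(64, 3))  # 266304
```

The jump from 4,160 to 266,304 terms per frame is a factor of about 64, which goes some way toward explaining the difference in per-frame time between the two orders.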
Full API
summarize_video(input_video, output_video, **kwargs)
Primary function. Chains frame extraction → signature selection → H.264 video writing.
| Parameter | Default | Description |
|---|---|---|
| summary_length | None | Number of keyframes. None → auto-detect. |
| n_candidates | 10 | Candidate summaries evaluated (paper default). |
| n_comparisons | 10 | Inner RMSE comparison budget (paper default). |
| fps_extract | 1.0 | Frame sampling rate for extraction. |
| fps_output | 2.0 | Playback fps of output video. |
| preserve_timing | False | Hold each frame proportional to its source gap. |
| add_timestamp | True | Overlay frame index badge. |
| sig_order | 2 | Signature order (2=fast, 3=paper exact). |
| sig_size | (64,64) | Frame resize for signature computation. |
| keep_frames | None | Save extracted frames to this directory. |
Returns (output_path, selected_frame_names, rmse_mean, rmse_std).
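The two rate parameters interact through simple arithmetic, sketched below. The sketch assumes preserve_timing=False (every keyframe is held for the same time) and that the i-th extracted frame, counted from 0, sits at i / fps_extract seconds in the source. Both helper names are hypothetical and not part of the package.

```python
def source_time_seconds(frame_index: int, fps_extract: float = 1.0) -> float:
    """Approximate source timestamp of the frame_index-th extracted frame."""
    return frame_index / fps_extract


def summary_duration_seconds(n_keyframes: int, fps_output: float = 2.0) -> float:
    """Length of the summary video when each keyframe is shown for 1 / fps_output s."""
    return n_keyframes / fps_output


print(summary_duration_seconds(20))    # 10.0
print(source_time_seconds(90, 1.0))    # 90.0
```

With the defaults, 20 keyframes yield a 10-second summary regardless of how long the source video is.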
write_summary_video(paths, output_path, **kwargs)
Assemble a list of frame paths into an MP4 directly (H.264 via ffmpeg when available).
summarize(frames_dir, summary_length, **kwargs)
Frame-level entry point: selects keyframes from a directory of images and returns (frame_names, rmse_mean, rmse_std).
auto_length(frames_dir, length_range=None, **kwargs)
Sweep a range of lengths and return the one with lowest std(RMSE).
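The sweep amounts to an argmin over candidate lengths. A minimal sketch, assuming a caller-supplied function that returns std(RMSE) for a given length; `pick_length` and the toy scores are hypothetical, and the tie-breaking rule (prefer the shorter summary) is a choice made for this example rather than documented behaviour.

```python
from typing import Callable, Iterable, Tuple


def pick_length(lengths: Iterable[int], std_of_rmse: Callable[[int], float]) -> Tuple[int, float]:
    """Return (length, std) with the lowest std; ties go to the smaller length."""
    best_std, best_length = min((std_of_rmse(n), n) for n in lengths)
    return best_length, best_std


# Toy scores standing in for a real std(RMSE) evaluation per length.
toy_scores = {5: 0.30, 10: 0.12, 15: 0.12, 20: 0.25}
print(pick_length(toy_scores, lambda n: toy_scores[n]))  # (10, 0.12)
```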
rmse_signature_score(summary_paths, video_paths, n_comparisons=10)
Score an existing summary: returns (values, mean, std) of RMSE(S̄, S̄*).
rmse_baseline(video_paths, summary_length, n_comparisons=10)
Compute the RMSE(S̄, S̄) baseline (two random samples vs each other).
extract_frames(video_path, output_dir, fps=1.0)
Extract frames from a video at a target fps and save them as PNGs to output_dir.
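Sampling at a target fps means keeping roughly every (source fps / target fps)-th frame. The sketch below computes which source frame indices would be kept; it shows one common way to do this and is not necessarily how sigvideo picks frames. `sampled_frame_indices` is a hypothetical helper, and the actual decoding (for example with OpenCV) is left out.

```python
from typing import List


def sampled_frame_indices(total_frames: int, source_fps: float, target_fps: float) -> List[int]:
    """Indices of source frames to keep when resampling to target_fps."""
    if source_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")
    step = source_fps / target_fps  # source frames per extracted frame
    indices: List[int] = []
    k = 0
    while True:
        index = int(round(k * step))
        if index >= total_frames:
            break
        # Skip repeats, which occur when target_fps exceeds source_fps.
        if not indices or index != indices[-1]:
            indices.append(index)
        k += 1
    return indices


print(sampled_frame_indices(90, 30.0, 1.0))  # [0, 30, 60]
```

A 3-second clip at 30 fps sampled at 1 frame/s yields three frames, matching the default fps=1.0.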
VLM text-conditioned pipeline
When sigvideo[vlm] is installed, frames can also be selected by matching
video content against noun queries extracted from a subtitle transcript:
    from sigvideo.vlm import summarize_vlm_from_subtitles

    frames, queries = summarize_vlm_from_subtitles(
        frames_dir="./frames",
        subtitles_path="transcript.txt",
        top_n_queries=20,
        score_threshold=0.025,
    )
Or from the CLI:
    sigvideo vlm \
        --frames-dir ./frames \
        --subtitles transcript.txt \
        --threshold 0.025 \
        --output summary_vlm.txt
This uses OWL-ViT (Minderer et al., 2022) for zero-shot text-conditioned object detection, following §3 of the paper.
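The role of the score threshold can be illustrated without running a detector. The sketch below assumes detection has already produced, for each frame, the best confidence score per text query, and keeps the frames where at least one query clears the threshold. The data layout and the name `frames_matching_queries` are hypothetical, and the package's actual selection logic may differ.

```python
from typing import Dict, List


def frames_matching_queries(
    detections: Dict[str, Dict[str, float]],
    score_threshold: float = 0.025,
) -> List[str]:
    """Return sorted frame names whose best query score reaches the threshold."""
    return sorted(
        name
        for name, scores in detections.items()
        if scores and max(scores.values()) >= score_threshold
    )


# Made-up scores: frame name -> {query: best detection confidence}.
toy_detections = {
    "frame_0001.png": {"dog": 0.40, "ball": 0.01},
    "frame_0002.png": {"dog": 0.01},
    "frame_0003.png": {},
}
print(frames_matching_queries(toy_detections))  # ['frame_0001.png']
```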
Metrics
The package exposes the three metrics defined in §2.1 of the paper:
| Metric | Meaning |
|---|---|
| RMSE(S̄, S̄*) | Error between summary spectrum and random uniform sample. Low std → good coverage. |
| RMSE(S̄, S̄) | Error between two random samples. Serves as confidence baseline. |
| RMSE(S̄, S̄_umin)\|n | The best candidate among n random samples by minimum std. |
A summary passes when std(RMSE(S̄, S̄*)) ≤ std(RMSE(S̄, S̄)).
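The pass criterion is a comparison of two spreads. A minimal sketch, assuming the RMSE values for the summary and for the baseline have already been computed (for instance as the `values` returned by the scoring functions above). It uses the population standard deviation, which is an assumption, and `summary_passes` is a hypothetical helper.

```python
import statistics
from typing import Sequence


def summary_passes(summary_rmse: Sequence[float], baseline_rmse: Sequence[float]) -> bool:
    """True when std(RMSE(S̄, S̄*)) <= std(RMSE(S̄, S̄))."""
    return statistics.pstdev(summary_rmse) <= statistics.pstdev(baseline_rmse)


# Made-up RMSE values for illustration only.
print(summary_passes([0.10, 0.11, 0.10], [0.05, 0.20, 0.12]))  # True
```

The check compares only the variability of the errors, not their mean, which matches the criterion as stated.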
Citation
@article{sigvideodecurto2023,
title = {Summarization of Videos with the Signature Transform},
author = {de Curt{\`o}, J. and de Zarz{\`a}, I. and Roig, G. and Calafate, C.T.},
journal = {Electronics},
volume = {12},
number = {7},
pages = {1735},
year = {2023},
doi = {10.3390/electronics12071735}
}
License
MIT — Copyright (c) 2023 J. de Curtò, I. de Zarzà
See LICENSE for the full text.