Video summarization via the Signature Transform (de Curtò & de Zarzà, Electronics 2023)


sigvideo

Automatic video summarization via the Signature Transform.

Given any video file, sigvideo selects the most representative keyframes and assembles them into a condensed summary video. No annotations, training data, or GPU are required.

```python
from sigvideo import summarize_video
summarize_video("lecture.mp4", "summary.mp4")
```

Or from the command line:

```bash
sigvideo lecture.mp4 summary.mp4
```

Based on the paper:

de Curtò, J.; de Zarzà, I.; Roig, G.; Calafate, C.T. Summarization of Videos with the Signature Transform. Electronics 2023, 12, 1735. https://doi.org/10.3390/electronics12071735

How it works

The Signature Transform is roughly the analogue of the Fourier Transform for paths: instead of frequencies, it captures order and area through iterated integrals. Each frame, resized to `sig_size`, is treated as a path, and its truncated signature provides a compact harmonic descriptor of that frame.
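The order-2 case can be computed by hand. Below is a minimal pure-Python sketch (not sigvideo's implementation, which relies on `iisignature`) of the signature of a piecewise-linear path truncated at order 2, accumulated one segment at a time with Chen's identity. The helper names `signature_order2` and `levy_area` are hypothetical. Level 1 is the total increment; level 2 holds the iterated integrals, whose antisymmetric part (the Lévy area) is what separates two paths with the same endpoints but a different ordering of moves.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def signature_order2(path: Sequence[Point]) -> Tuple[List[float], List[List[float]]]:
    """Order-2 truncated signature of a piecewise-linear path in R^d.

    level1[i]    = total increment along axis i
    level2[i][k] = iterated integral of dx_i followed by dx_k
    """
    d = len(path[0])
    level1 = [0.0] * d                      # running increment x_t - x_0
    level2 = [[0.0] * d for _ in range(d)]
    for prev, cur in zip(path, path[1:]):
        delta = [c - p for p, c in zip(prev, cur)]
        for i in range(d):
            for k in range(d):
                # Chen's identity for one linear segment: cross term with the
                # path so far, plus the segment's own second-order term.
                level2[i][k] += level1[i] * delta[k] + 0.5 * delta[i] * delta[k]
        for i in range(d):
            level1[i] += delta[i]
    return level1, level2


def levy_area(level2: List[List[float]], i: int, k: int) -> float:
    """Signed area between axes i and k (antisymmetric part of level 2)."""
    return 0.5 * (level2[i][k] - level2[k][i])


# Right-then-up versus up-then-right: same increment, opposite area.
s1, s2 = signature_order2([(0, 0), (1, 0), (1, 1)])
t1, t2 = signature_order2([(0, 0), (0, 1), (1, 1)])
print(s1, levy_area(s2, 0, 1))  # [1.0, 1.0] 0.5
print(t1, levy_area(t2, 0, 1))  # [1.0, 1.0] -0.5
```

Both paths end at the same point, so their level-1 terms agree; only the level-2 area records which move came first.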

sigvideo draws n candidate summaries as uniform random samples of frames. For each candidate it computes the element-wise mean signature S̄ and the RMSE between S̄ and the mean signatures of repeated random draws from the full video, then keeps the candidate whose RMSE values have the lowest standard deviation, i.e. the one that best preserves the harmonic components of the original. The paper calls this baseline RMSE(S̄, S̄_umin)|n; it achieves 100% positive cases on the paper's 28-video benchmark without any human annotations or training.
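A minimal sketch of that selection loop, assuming the per-frame signatures are already computed and stored as equal-length lists of floats. The function names are hypothetical; drawing candidates with `random.sample` (uniform, without replacement) and using the population standard deviation (`statistics.pstdev`, matching NumPy's default `ddof=0`) are assumptions about details the description above leaves open.

```python
import math
import random
import statistics
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def mean_signature(sigs: Sequence[Vector]) -> List[float]:
    """Element-wise mean of equal-length signature vectors (S_bar)."""
    return [sum(col) / len(sigs) for col in zip(*sigs)]


def rmse(a: Vector, b: Vector) -> float:
    """Root-mean-square error between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def rmse_vs_random(s_bar: Vector, sigs: Sequence[Vector], length: int,
                   n_comparisons: int, rng: random.Random) -> List[float]:
    """RMSE between s_bar and the mean signatures of random uniform draws."""
    values = []
    for _ in range(n_comparisons):
        draw = rng.sample(range(len(sigs)), length)
        values.append(rmse(s_bar, mean_signature([sigs[i] for i in draw])))
    return values


def select_summary(sigs: Sequence[Vector], length: int, n_candidates: int = 10,
                   n_comparisons: int = 10, seed: int = 0) -> Tuple[List[int], float, float]:
    """Return (frame indices, rmse_mean, rmse_std) of the lowest-std candidate."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_candidates):
        cand = sorted(rng.sample(range(len(sigs)), length))  # keep temporal order
        s_bar = mean_signature([sigs[i] for i in cand])
        values = rmse_vs_random(s_bar, sigs, length, n_comparisons, rng)
        std = statistics.pstdev(values)
        if best is None or std < best[2]:
            best = (cand, statistics.mean(values), std)
    return best


# Toy usage: 50 fake 8-dimensional "signatures"
toy_rng = random.Random(1)
toy = [[toy_rng.random() for _ in range(8)] for _ in range(50)]
indices, rmse_mean, rmse_std = select_summary(toy, length=5)
print(indices, round(rmse_mean, 4), round(rmse_std, 4))
```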

Installation

```bash
pip install sigvideo
```

Optional: text-conditioned VLM summarization (OWL-ViT via Hugging Face; requires PyTorch):

```bash
pip install "sigvideo[vlm]"
```

Dependencies

| Package | Purpose | License |
|---|---|---|
| `iisignature` | Signature Transform | MIT |
| `opencv-python-headless` | Frame I/O | Apache 2.0 |
| `numpy` | Numerical ops | BSD-3 |

sigvideo is MIT-licensed, and all listed dependencies use permissive licenses (no copyleft).

Quick start

Python

```python
from sigvideo import summarize_video

# One call: video in, summary video out
out, frames, rmse_mean, rmse_std = summarize_video(
    "input.mp4",
    "summary.mp4",
)
print(f"{len(frames)} keyframes selected")

# Auto-detect best length, keep extracted frames
out, frames, _, _ = summarize_video(
    "input.mp4",
    "summary.mp4",
    fps_extract=1.0,         # extract 1 frame/s
    fps_output=2.0,          # each keyframe shown 0.5 s
    keep_frames="./frames",
)

# Paper-exact reproduction (sig_order=3, ~100x slower)
summarize_video("input.mp4", "summary.mp4", sig_order=3)
```

CLI

```bash
# Minimal
sigvideo input.mp4 summary.mp4

# With options:
#   --length 20       fixed keyframe count (default: auto)
#   --fps-extract 1   extraction rate
#   --fps-output 2    output playback rate
#   --sig-order 3     paper-exact (default: 2, fast)
#   --no-timestamp    suppress frame badge overlay
sigvideo input.mp4 summary.mp4 \
    --length 20 \
    --fps-extract 1 \
    --fps-output 2 \
    --sig-order 3 \
    --no-timestamp

# Score a summary against original frames
sigvideo score --frames-dir ./frames --summary-dir ./keyframes

# Text-conditioned summarization (requires sigvideo[vlm])
sigvideo vlm --frames-dir ./frames --subtitles transcript.txt
```

Signature order

The signature truncation order controls the speed/quality tradeoff:

| `sig_order` | Time per frame | Signature dimension | Use case |
|---|---|---|---|
| 2 (default) | ~0.6 ms | 4,160 | Everyday use, long videos |
| 3 (paper) | ~70 ms | 266,304 | Paper-exact reproduction |

Both produce valid summaries. Order 2 is the practical default; order 3 reproduces the numbers in Tables 1–5 of the paper exactly.
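The dimensions in the table are the number of terms in a signature truncated at order m for a path in d dimensions, d + d^2 + ... + d^m (the quantity `iisignature.siglength(d, m)` returns). The table's figures match d = 64, which we read as one side of the default 64×64 `sig_size`; that interpretation is our inference, not something the package documents. Each extra order multiplies the largest term by d, which is consistent with the large jump in time per frame.

```python
def siglength(d: int, m: int) -> int:
    """Number of terms in an order-m signature of a d-dimensional path."""
    return sum(d ** k for k in range(1, m + 1))


print(siglength(64, 2))  # 4160
print(siglength(64, 3))  # 266304
```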

Full API

`summarize_video(input_video, output_video, **kwargs)`

Primary function. Chains frame extraction → signature selection → H.264 video writing.

| Parameter | Default | Description |
|---|---|---|
| `summary_length` | `None` | Number of keyframes. `None` → auto-detect. |
| `n_candidates` | `10` | Number of candidate summaries evaluated (paper default). |
| `n_comparisons` | `10` | Number of random draws each candidate is compared against (paper default). |
| `fps_extract` | `1.0` | Frame sampling rate for extraction. |
| `fps_output` | `2.0` | Playback fps of the output video. |
| `preserve_timing` | `False` | Hold each frame for a time proportional to its gap in the source video. |
| `add_timestamp` | `True` | Overlay a frame-index badge. |
| `sig_order` | `2` | Signature truncation order (2 = fast, 3 = paper-exact). |
| `sig_size` | `(64,64)` | Frame size used for signature computation. |
| `keep_frames` | `None` | Directory in which to save the extracted frames. |

Returns (output_path, selected_frame_names, rmse_mean, rmse_std).

`write_summary_video(paths, output_path, **kwargs)`

Assemble a list of frame paths into an MP4 directly (H.264 via ffmpeg when available).

`summarize(frames_dir, summary_length, **kwargs)`

Frame-level API: select keyframes from a directory of images. Returns (frame_names, rmse_mean, rmse_std).

`auto_length(frames_dir, length_range=None, **kwargs)`

Sweep a range of summary lengths and return the one with the lowest std(RMSE).
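The sweep reduces to an argmin over candidate lengths. In this sketch, `pick_length` and the `rmse_std_for` callback are hypothetical: the callback stands in for running the candidate selection at a given length and returning its std(RMSE), and the toy numbers are made up for illustration.

```python
from typing import Callable, Iterable, Tuple


def pick_length(lengths: Iterable[int],
                rmse_std_for: Callable[[int], float]) -> Tuple[int, float]:
    """Return (length, std) for the length with the lowest std(RMSE).

    Ties are broken in favor of the shorter summary.
    """
    best_std, best_n = min((rmse_std_for(n), n) for n in lengths)
    return best_n, best_std


# Made-up std(RMSE) values for three candidate lengths
toy_std = {5: 0.031, 10: 0.018, 15: 0.022}
print(pick_length(toy_std, toy_std.__getitem__))  # (10, 0.018)
```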

`rmse_signature_score(summary_paths, video_paths, n_comparisons=10)`

Score an existing summary: returns (values, mean, std) of RMSE(S̄, S̄*).

`rmse_baseline(video_paths, summary_length, n_comparisons=10)`

Compute the RMSE(S̄, S̄) baseline by comparing two independent random samples of the video with each other.

`extract_frames(video_path, output_dir, fps=1.0)`

Extract frames from a video at a target fps and save them as PNGs to output_dir.
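Extracting at a target rate amounts to keeping every (native fps / target fps)-th decoded frame. A minimal sketch of that index arithmetic, assuming the native frame rate and frame count are known (with OpenCV they can be read through `cv2.CAP_PROP_FPS` and `cv2.CAP_PROP_FRAME_COUNT`); `sample_indices` is a hypothetical helper, and sigvideo may seek or round differently.

```python
from typing import List


def sample_indices(total_frames: int, native_fps: float, target_fps: float) -> List[int]:
    """Source-frame indices to keep when resampling a video to target_fps."""
    if native_fps <= 0 or target_fps <= 0:
        raise ValueError("frame rates must be positive")
    step = native_fps / target_fps  # source frames per extracted frame
    indices: List[int] = []
    k = 0
    while True:
        idx = int(k * step)  # floor, since k * step >= 0
        if idx >= total_frames:
            break
        if not indices or indices[-1] != idx:  # skip repeats when upsampling
            indices.append(idx)
        k += 1
    return indices


print(sample_indices(95, 30.0, 1.0))   # [0, 30, 60, 90]
print(sample_indices(10, 25.0, 10.0))  # [0, 2, 5, 7]
```

When `target_fps` exceeds the native rate, repeated indices are skipped, so each source frame is kept at most once.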

VLM text-conditioned pipeline

When sigvideo[vlm] is installed, frames can also be selected by matching video content against noun queries extracted from a subtitle transcript:

```python
from sigvideo.vlm import summarize_vlm_from_subtitles

frames, queries = summarize_vlm_from_subtitles(
    frames_dir="./frames",
    subtitles_path="transcript.txt",
    top_n_queries=20,
    score_threshold=0.025,
)
```

Or from the CLI:

```bash
sigvideo vlm \
    --frames-dir ./frames \
    --subtitles transcript.txt \
    --threshold 0.025 \
    --output summary_vlm.txt
```

This uses OWL-ViT (Minderer et al., 2022) for zero-shot text-conditioned object detection, following §3 of the paper.
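Once OWL-ViT has scored each frame against each query, the thresholding step is short. This sketch assumes each (frame, query) pair has already been reduced to its highest box score, for example from Hugging Face `transformers`' `OwlViTForObjectDetection` (the detection step is omitted), and that a frame is kept when its best query reaches `score_threshold`. Both that reduction and the helper `frames_above_threshold` are assumptions for illustration, not sigvideo's code; the frame names and scores are toy values.

```python
from typing import Dict, List, Sequence, Tuple


def frames_above_threshold(
    scores: Dict[str, Sequence[float]],  # frame name -> best score per query
    queries: Sequence[str],
    threshold: float = 0.025,
) -> List[Tuple[str, str, float]]:
    """Keep frames whose best-matching query scores at least `threshold`."""
    kept = []
    for frame in sorted(scores):
        per_query = scores[frame]
        best = max(range(len(queries)), key=per_query.__getitem__)
        if per_query[best] >= threshold:
            kept.append((frame, queries[best], per_query[best]))
    return kept


queries = ["whiteboard", "laptop"]
scores = {
    "frame_0001.png": [0.010, 0.004],
    "frame_0002.png": [0.031, 0.012],
}
print(frames_above_threshold(scores, queries))
# [('frame_0002.png', 'whiteboard', 0.031)]
```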

Metrics

The package exposes the three metrics defined in §2.1 of the paper:

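| Metric | Meaning |
|---|---|
| `RMSE(S̄, S̄*)` | RMSE between the mean signature of the summary and that of a uniform random sample of the video. Low std indicates good coverage. |
| `RMSE(S̄, S̄)` | RMSE between the mean signatures of two uniform random samples. Serves as the confidence baseline. |
| `RMSE(S̄, S̄_umin)\|n` | The best of n random candidate summaries, chosen by minimum std of `RMSE(S̄, S̄*)`. |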

A summary passes when std(RMSE(S̄, S̄*)) ≤ std(RMSE(S̄, S̄)).
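The pass criterion is a direct comparison of two standard deviations. A minimal sketch, assuming you already hold the lists of RMSE values for the summary and for the baseline (for example the `values` returned by `rmse_signature_score`); `passes` is a hypothetical helper, the population standard deviation is an assumption, and the numbers are toy values, not results from the paper.

```python
import statistics
from typing import Sequence


def passes(summary_rmse: Sequence[float], baseline_rmse: Sequence[float]) -> bool:
    """True when std(RMSE(S_bar, S_bar*)) <= std(RMSE(S_bar, S_bar))."""
    return statistics.pstdev(summary_rmse) <= statistics.pstdev(baseline_rmse)


summary_values = [0.10, 0.11, 0.10, 0.12]   # toy RMSE(S_bar, S_bar*) values
baseline_values = [0.08, 0.15, 0.11, 0.20]  # toy RMSE(S_bar, S_bar) values
print(passes(summary_values, baseline_values))  # True
```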

Citation

```bibtex
@article{sigvideodecurto2023,
  title   = {Summarization of Videos with the Signature Transform},
  author  = {de Curt{\`o}, J. and de Zarz{\`a}, I. and Roig, G. and Calafate, C.T.},
  journal = {Electronics},
  volume  = {12},
  number  = {7},
  pages   = {1735},
  year    = {2023},
  doi     = {10.3390/electronics12071735}
}
```

License

See LICENSE for the full text.

Release history

0.3.1: Apr 6, 2026

0.3.0: Apr 6, 2026
