Smart video cut point detection for AI-generated talking head videos using multi-factor visual analysis and speech detection

SceneFlow

Smart cut point detection for AI-generated talking head videos

SceneFlow automatically finds the optimal point to cut your AI-generated talking head videos using advanced speech detection and multi-factor visual analysis. No more awkward mid-speech cuts or unwanted motion at the end of your videos!

Features

Intelligent Speech Detection - Uses Silero VAD (Voice Activity Detection) to precisely identify when speech ends
Multi-Factor Visual Analysis - Ranks potential cut points based on:
- Eye openness (natural blink detection)
- Motion stability (minimal movement)
- Expression neutrality (calm facial expressions)
- Pose stability (steady head position)
- Visual sharpness (frame quality)
URL Support - Works directly with video URLs (YouTube, etc.) via yt-dlp
Flexible Output - Simple timestamp output, verbose mode, or detailed JSON analysis
Customizable Weights - Adjust the importance of each visual factor
Pure Ranking Algorithm - No arbitrary thresholds; frames are scored relative to one another

Installation

pip install sceneflow

Requirements

Python 3.9 or higher
FFmpeg (for video processing)

Quick Start

Command Line

# Basic usage - outputs just the timestamp
sceneflow video.mp4
# Output: 5.23

# Verbose mode - see detailed analysis
sceneflow video.mp4 --verbose

# Save detailed JSON analysis
sceneflow video.mp4 --json-output ./output

# From a URL
sceneflow "https://www.youtube.com/watch?v=..." --verbose

# Custom weights
sceneflow video.mp4 --eye-weight 0.4 --motion-weight 0.3

Python API

Simple API (Recommended)

from sceneflow import RankingConfig, get_cut_frame, get_ranked_cut_frames

# Get the single best cut point
best_time = get_cut_frame("video.mp4")
print(f"Cut at: {best_time:.2f}s")

# Get top 5 cut points
top_5 = get_ranked_cut_frames("video.mp4", n=5)
for i, timestamp in enumerate(top_5, 1):
    print(f"{i}. {timestamp:.2f}s")

# With custom configuration
config = RankingConfig(
    eye_openness_weight=0.40,
    motion_stability_weight=0.30
)
best_time = get_cut_frame("video.mp4", config=config, sample_rate=1)

Advanced API (More Control)

from sceneflow import CutPointRanker
from sceneflow.speech_detector import SpeechDetector
import cv2

# Detect when speech ends
detector = SpeechDetector()
speech_end_time, confidence = detector.get_speech_end_time(
    "video.mp4",
    return_confidence=True
)

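# Get video duration
cap = cv2.VideoCapture("video.mp4")
if not cap.isOpened():
    raise RuntimeError("Could not open video.mp4")
fps = cap.get(cv2.CAP_PROP_FPS)
frame_count = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
cap.release()
if fps <= 0:
    # Guard against a zero frame rate, which would make the division fail
    raise RuntimeError("Video reports no valid frame rate")
duration = frame_count / fps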

# Rank frames after speech ends
ranker = CutPointRanker()
ranked_frames = ranker.rank_frames(
    video_path="video.mp4",
    start_time=speech_end_time,
    end_time=duration,
    sample_rate=2
)

# Get the best cut point
best_cut = ranked_frames[0]
print(f"Best cut point: {best_cut.timestamp:.2f}s (score: {best_cut.score:.4f})")

How It Works

1. Speech Detection

SceneFlow uses Silero VAD, a deep learning model for voice activity detection. Because the model classifies audio as speech or non-speech rather than measuring loudness, it finds the moment speech actually ends, not just the point where the audio fades out.
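
As a rough illustration of the final step, the sketch below turns a list of detected speech segments into a speech end time. The segment format (dicts with `start` and `end` sample indices) mirrors what Silero VAD's `get_speech_timestamps` helper returns by default. The 16 kHz rate, the sample values, and the `speech_end_seconds` helper are assumptions made for this example, not SceneFlow's internal code.

```python
SAMPLE_RATE = 16_000  # Hz; assumed sampling rate of the audio passed to the VAD


def speech_end_seconds(segments, sample_rate=SAMPLE_RATE):
    """Return the end of the last speech segment in seconds, or None if no speech."""
    if not segments:
        return None
    # Take the latest end position rather than trusting the list order.
    last_end = max(segment["end"] for segment in segments)
    return last_end / sample_rate


# Hypothetical VAD output: two speech segments, positions given in samples.
segments = [
    {"start": 4_000, "end": 40_000},
    {"start": 48_000, "end": 81_920},
]

print(speech_end_seconds(segments))  # 5.12
```

Frames before this time would be skipped, so only the silent tail of the video is considered for the cut.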

2. Visual Analysis

After identifying when speech ends, SceneFlow analyzes the remaining frames using MediaPipe FaceMesh (478 facial landmarks + 52 blendshapes) and ranks them based on:

Factor	Weight	Description
Eye Openness	30%	Prefers natural eye openness (not too wide, not blinking)
Motion Stability	25%	Minimal optical flow between frames
Expression Neutrality	20%	Calm, neutral facial expressions
Pose Stability	15%	Steady head position and orientation
Visual Sharpness	10%	Clear, sharp frame quality
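
To make the weighting concrete, here is a minimal sketch of how the default weights in the table could combine five per-frame factor scores into one number. It assumes each factor has already been scaled to the range 0 to 1; the `weighted_score` helper and the example frame values are hypothetical and are not taken from SceneFlow's source.

```python
# Default weights from the table above (they sum to 1.0).
DEFAULT_WEIGHTS = {
    "eye_openness": 0.30,
    "motion_stability": 0.25,
    "expression_neutrality": 0.20,
    "pose_stability": 0.15,
    "visual_sharpness": 0.10,
}


def weighted_score(factors, weights=DEFAULT_WEIGHTS):
    """Combine per-factor scores (each assumed to be in [0, 1]) into one score."""
    return sum(weights[name] * factors[name] for name in weights)


# Hypothetical factor scores for a single frame.
frame = {
    "eye_openness": 0.9,
    "motion_stability": 0.8,
    "expression_neutrality": 0.7,
    "pose_stability": 1.0,
    "visual_sharpness": 0.5,
}

print(f"{weighted_score(frame):.2f}")  # 0.81
```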

3. Ranking System

The algorithm uses a multi-stage ranking process:

Extract raw features from all frames
Normalize metrics across the entire frame set
Apply weighted scoring based on configuration
Use temporal context windows to favor stable sequences
Return ranked list of cut point candidates
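
The steps above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions rather than SceneFlow's actual implementation: it uses min-max normalization and a simple moving average as the temporal context window, and the function names, the window size of 3, and the sample values are all hypothetical.

```python
def min_max(values):
    """Scale values to [0, 1] relative to the other frames in the set."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5] * len(values)  # all frames identical for this factor
    return [(v - lo) / (hi - lo) for v in values]


def smooth(scores, window=3):
    """Average each score with its neighbours so stable runs beat lone spikes."""
    half = window // 2
    smoothed = []
    for i in range(len(scores)):
        lo, hi = max(0, i - half), min(len(scores), i + half + 1)
        smoothed.append(sum(scores[lo:hi]) / (hi - lo))
    return smoothed


def rank_by_smoothed_score(features, weights, window=3):
    """Return (frame_index, score) pairs, best first."""
    normalized = {
        name: min_max([frame[name] for frame in features]) for name in weights
    }
    raw = [
        sum(weights[name] * normalized[name][i] for name in weights)
        for i in range(len(features))
    ]
    final = smooth(raw, window)
    order = sorted(range(len(features)), key=lambda i: final[i], reverse=True)
    return [(i, final[i]) for i in order]


# One factor only, to keep the example readable. Frame 1 is an isolated spike;
# frames 3 to 5 form a stable run.
values = [0.0, 1.0, 0.0, 0.8, 0.9, 0.8, 0.1]
features = [{"motion_stability": v} for v in values]

ranking = rank_by_smoothed_score(features, {"motion_stability": 1.0})
print(ranking[0][0])  # 4
```

Frame 1 has the highest raw value, but after smoothing the middle of the stable run (frame 4) ranks first, which is the behaviour the temporal context step is meant to encourage.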

CLI Options

sceneflow SOURCE [OPTIONS]

Arguments:
  SOURCE                  Path to video file or URL

Options:
  --verbose              Show detailed analysis information
  --json-output PATH     Save detailed analysis to JSON file (directory path)
  --sample-rate INT      Process every Nth frame (default: 2)
  --save-frames          Save annotated frames with MediaPipe landmarks
  --save-video           Save cut video from start to best timestamp (requires ffmpeg)
  --top-n INT            Return top N ranked timestamps in sorted order (shows scores)

  --help                 Show this message and exit
  --version              Show version and exit
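
If you call the CLI from a script, it can help to build the argument list in one place. The helper below is a hypothetical convenience wrapper, not part of SceneFlow; it only uses flags listed in the options above and does not run anything itself.

```python
def build_sceneflow_command(source, *, verbose=False, sample_rate=None,
                            top_n=None, json_output=None):
    """Build an argument list for the sceneflow CLI from the documented options."""
    cmd = ["sceneflow", source]
    if verbose:
        cmd.append("--verbose")
    if sample_rate is not None:
        cmd += ["--sample-rate", str(sample_rate)]
    if top_n is not None:
        cmd += ["--top-n", str(top_n)]
    if json_output is not None:
        cmd += ["--json-output", str(json_output)]
    return cmd


print(build_sceneflow_command("video.mp4", verbose=True, sample_rate=3))
# ['sceneflow', 'video.mp4', '--verbose', '--sample-rate', '3']
```

The resulting list can be passed to `subprocess.run(cmd, capture_output=True, text=True, check=True)`. Passing a list rather than a shell string avoids quoting problems with URLs.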

Advanced Usage

Custom Configuration

from sceneflow import get_cut_frame, RankingConfig

# Emphasize eye openness and motion stability
config = RankingConfig(
    eye_openness_weight=0.40,
    motion_stability_weight=0.30,
    expression_neutrality_weight=0.15,
    pose_stability_weight=0.10,
    visual_sharpness_weight=0.05,
    context_window_size=7,          # Larger = more temporal smoothing
    quality_gate_percentile=80.0,   # Stricter quality filtering
    local_stability_window=7        # Larger = favor longer stable sequences
)

best_time = get_cut_frame("video.mp4", config=config)
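
This README does not say whether the five weights must sum to 1. If you prefer to think in relative importance, a small helper like the one below can rescale arbitrary non-negative numbers into weights that do. `normalize_weights` is a hypothetical helper; only the `*_weight` keyword names come from the example above.

```python
def normalize_weights(raw):
    """Rescale non-negative importance values so they sum to 1."""
    if any(value < 0 for value in raw.values()):
        raise ValueError("weights must be non-negative")
    total = sum(raw.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return {name: value / total for name, value in raw.items()}


# Relative importance on an arbitrary scale.
relative = {
    "eye_openness": 8,
    "motion_stability": 6,
    "expression_neutrality": 3,
    "pose_stability": 2,
    "visual_sharpness": 1,
}

weights = normalize_weights(relative)
# Map to the keyword names used by RankingConfig in the example above.
config_kwargs = {f"{name}_weight": value for name, value in weights.items()}

print(config_kwargs["eye_openness_weight"])  # 0.4
```

The dictionary could then be passed as `RankingConfig(**config_kwargs)`; with the values shown it reproduces the weights used in the configuration above.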

Save Outputs

from sceneflow import get_cut_frame

# Save annotated frames and cut video
best_time = get_cut_frame(
    "video.mp4",
    save_frames=True,  # Saves frames with MediaPipe landmarks
    save_video=True    # Saves cut video (requires ffmpeg)
)

# Outputs saved to:
# - output/<video_name>/: Annotated frames
# - output/<video_name>_cut.mp4: Cut video
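
If you would rather cut the video yourself from a returned timestamp, an FFmpeg command along these lines keeps everything from the start up to the cut point. This is a sketch of one possible invocation, not necessarily the command SceneFlow runs when `save_video=True`; the `build_cut_command` helper and the file names are hypothetical.

```python
def build_cut_command(src, dst, cut_time):
    """Build an ffmpeg command that keeps the first `cut_time` seconds of `src`."""
    if cut_time <= 0:
        raise ValueError("cut_time must be positive")
    return [
        "ffmpeg",
        "-y",                      # overwrite the output file if it exists
        "-i", src,
        "-t", f"{cut_time:.3f}",   # output duration in seconds
        "-c", "copy",              # copy streams without re-encoding
        dst,
    ]


print(build_cut_command("video.mp4", "video_cut.mp4", 5.23))
```

Stream copy is fast because nothing is re-encoded. If you need to re-encode instead, drop the `-c copy` pair and let FFmpeg pick its default codecs.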

Multiple Cut Points

from sceneflow import get_ranked_cut_frames

# Get top 10 cut points
top_10 = get_ranked_cut_frames("video.mp4", n=10, sample_rate=2)

for i, timestamp in enumerate(top_10, 1):
    print(f"{i}. {timestamp:.2f}s")

Example Output

Default Mode

$ sceneflow video.mp4
5.23

Verbose Mode

$ sceneflow video.mp4 --verbose

============================================================
SCENEFLOW - Smart Video Cut Point Detection
============================================================

Analyzing: video.mp4

[1/2] Detecting speech end time (model: small)...
      Speech ends at: 5.12s (confidence: 0.87)
      Video duration: 8.50s

[2/2] Analyzing visual features from 5.12s to 8.50s...

============================================================
RESULTS
============================================================

Best cut point: 5.23s
Frame: 157
Score: 0.8745

Top 3 candidates:
  1. 5.23s (frame 157, score: 0.8745)
  2. 5.45s (frame 164, score: 0.8621)
  3. 5.89s (frame 177, score: 0.8534)

  ... and 47 more candidates

Performance Tips

sample_rate: Use sample_rate=2 or sample_rate=3 to skip frames for faster processing
Whisper model: The default small model provides a good balance of speed and accuracy
URL downloads: Videos are automatically cleaned up after processing to save disk space
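
To get a feel for what `sample_rate` saves, the arithmetic below estimates how many frames fall between the speech end and the end of the video. It assumes a 30 fps video and that every Nth frame in the range is processed; both are assumptions for illustration, and `frames_to_analyze` is a hypothetical helper.

```python
import math


def frames_to_analyze(start_time, end_time, fps, sample_rate=2):
    """Estimate how many frames are processed when every Nth frame is sampled."""
    if sample_rate < 1:
        raise ValueError("sample_rate must be at least 1")
    total = max(0, math.floor((end_time - start_time) * fps))
    return math.ceil(total / sample_rate)


# Speech ends at 5.12s in an 8.50s video, as in the verbose example above.
for rate in (1, 2, 3):
    print(rate, frames_to_analyze(5.12, 8.50, fps=30, sample_rate=rate))
# 1 101
# 2 51
# 3 34
```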

Technical Details

Speech Detection: Silero VAD (Voice Activity Detection) for accurate speech/silence detection
Facial Analysis: MediaPipe FaceMesh (478 landmarks + 52 blendshapes)
Motion Analysis: Farneback optical flow for motion stability
Eye Detection: Eye Aspect Ratio (EAR) method for blink detection
Frame Quality: Laplacian variance for sharpness assessment
Ranking: Multi-factor scoring with temporal context windows
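
Two of the measures listed above are simple enough to show directly. The sketch below computes the Eye Aspect Ratio from six eye landmarks and a Laplacian-variance sharpness value from a small grayscale grid, using only the standard library. It illustrates the general techniques; the landmark coordinates, the tiny test images, and the function names are made up, and SceneFlow's own code (which works on MediaPipe landmarks and full frames) may differ.

```python
import math
import statistics


def eye_aspect_ratio(p1, p2, p3, p4, p5, p6):
    """EAR: p1 and p4 are the eye corners, p2/p3 the upper lid, p6/p5 the lower lid."""
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    horizontal = math.dist(p1, p4)
    return vertical / (2.0 * horizontal)


def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over interior pixels (needs >= 3x3)."""
    responses = []
    for y in range(1, len(gray) - 1):
        for x in range(1, len(gray[0]) - 1):
            responses.append(
                gray[y - 1][x] + gray[y + 1][x]
                + gray[y][x - 1] + gray[y][x + 1]
                - 4 * gray[y][x]
            )
    return statistics.pvariance(responses)


# Open eye: lids 2 units apart, corners 4 units apart.
open_eye = eye_aspect_ratio((0, 0), (1, 1), (3, 1), (4, 0), (3, -1), (1, -1))
# Nearly closed eye: lids only 0.2 units apart.
closed_eye = eye_aspect_ratio((0, 0), (1, 0.1), (3, 0.1), (4, 0), (3, -0.1), (1, -0.1))
print(open_eye)  # 0.5

flat = [[10] * 4 for _ in range(4)]                      # no detail at all
checker = [[255 if (x + y) % 2 == 0 else 0 for x in range(4)] for y in range(4)]
print(laplacian_variance(flat) < laplacian_variance(checker))  # True
```

A low EAR suggests a blink, and a low Laplacian variance suggests a blurry frame, so both push a frame down the ranking.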

Use Cases

Content Creators: Automatically find natural cut points in AI-generated talking head videos
Video Editors: Quickly identify optimal endpoints for video clips
AI Video Tools: Integrate smart cut point detection into video generation pipelines
Research: Analyze facial features and speech patterns in videos

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for details.

License

MIT License - see LICENSE for details

Examples

Check the examples/ directory for more usage examples:

basic_usage.py - Simple API usage
custom_config.py - Custom configuration examples
ranked_results.py - Getting multiple cut points
url_download.py - Working with video URLs

Changelog

See CHANGELOG.md for version history.


sceneflow 0.1.8
