Zero-API-key tool to extract viral short clips from videos, ready for TikTok/Reels
ViralClip
Extract the most engaging clips from any video — zero API keys, runs 100% locally.
Install
pip install viralclip
Optional extras:
pip install "viralclip[captions]" # burn-in subtitles via local Whisper
pip install "viralclip[smartcrop]" # face-detect crop via opencv
pip install "viralclip[all]" # everything
System dependency: ffmpeg must be on your PATH.
brew install ffmpeg # macOS
sudo apt install ffmpeg # Ubuntu/Debian
Usage
# Most engaging 60s from a local file
viralclip clip video.mp4
# Download from YouTube and clip
viralclip clip https://youtube.com/watch?v=xxxxx
# Burn in captions (local Whisper, no API)
viralclip clip video.mp4 --captions
# Export 3 non-overlapping clips
viralclip clip video.mp4 --count 3
# Custom duration
viralclip clip video.mp4 --duration 45
# Output format
viralclip clip video.mp4 --format horizontal # 16:9 YouTube
viralclip clip video.mp4 --format square # 1:1 Instagram feed
viralclip clip video.mp4 --format portrait # 4:5 Instagram portrait
viralclip clip video.mp4 --format vertical # 9:16 TikTok/Reels (default)
# Smart crop — detect face, crop around subject
viralclip clip video.mp4 --smart-crop
# Preview each clip in system player after export
viralclip clip video.mp4 --preview
# See timestamps without exporting anything
viralclip clip video.mp4 --dry-run
# Custom output location and filename
viralclip clip video.mp4 --output-dir ./clips --output-name my-clip
# Suppress all output (CI/scripting)
viralclip clip video.mp4 --quiet
# Nudge window ±N seconds after dry-run preview
viralclip clip video.mp4 --dry-run # see timestamps
viralclip clip video.mp4 --offset 8 # shift forward 8s
viralclip clip video.mp4 --offset -5 # shift back 5s
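For scripted or CI use, the commands above can be assembled programmatically. The following is a minimal sketch, not part of ViralClip itself: `build_clip_command` is a hypothetical helper, and it only emits flags that appear in the usage list above.

```python
import subprocess


def build_clip_command(source, *, count=None, duration=None, fmt=None,
                       output_dir=None, quiet=False):
    """Build an argv list for `viralclip clip` (hypothetical helper).

    Only flags documented in the usage section are emitted.
    """
    cmd = ["viralclip", "clip", str(source)]
    if count is not None:
        cmd += ["--count", str(count)]
    if duration is not None:
        cmd += ["--duration", str(duration)]
    if fmt is not None:
        cmd += ["--format", fmt]
    if output_dir is not None:
        cmd += ["--output-dir", str(output_dir)]
    if quiet:
        cmd.append("--quiet")
    return cmd


def run_clip(source, **options):
    """Run the CLI and raise CalledProcessError if it exits non-zero."""
    subprocess.run(build_clip_command(source, **options), check=True)
```

Passing the arguments as a list avoids shell quoting problems with filenames that contain spaces.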
Config file
Persist defaults in ~/.config/viralclip/config.toml so you don't repeat flags every run:
[defaults]
duration = 45
format = "vertical"
output_dir = "~/clips"
smart_crop = true
count = 3
Output formats
| Flag | Ratio | Resolution | Platform |
|---|---|---|---|
| vertical (default) | 9:16 | 1080×1920 | TikTok, Reels, Shorts |
| horizontal | 16:9 | 1920×1080 | YouTube, Twitter/X |
| square | 1:1 | 1080×1080 | Instagram feed |
| portrait | 4:5 | 1080×1350 | Instagram portrait |
How the algorithm works
1. YouTube heatmap (when available)
For YouTube URLs, yt-dlp fetches the Most Replayed heatmap, an aggregate of real viewer replay density. Each timestamp gets a score from 0 to 1 representing how often that moment was rewatched. When a heatmap is present, it carries 65% of the final score; audio analysis supplies the remaining 35%.
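The 65/35 split can be read as a per-second weighted average. The sketch below assumes both signals are already resampled to one value per second and normalised to 0–1; `blend_scores` is a hypothetical name, and the fallback to audio-only scoring mirrors the "100% without" rule in the next step.

```python
def blend_scores(audio, heatmap=None, heatmap_weight=0.65):
    """Combine per-second audio scores with an optional replay heatmap.

    Both inputs are sequences of floats in [0, 1], one per second.
    Without a heatmap the audio score is used unchanged.
    """
    if heatmap is None:
        return list(audio)
    if len(heatmap) != len(audio):
        raise ValueError("audio and heatmap must cover the same seconds")
    audio_weight = 1.0 - heatmap_weight
    return [heatmap_weight * h + audio_weight * a
            for h, a in zip(heatmap, audio)]
```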
2. Audio analysis (always runs — 35% with heatmap, 100% without)
Five features are computed per second via librosa. The weights adapt automatically to the content type:
| Feature | Speech weight | Music weight | What it captures |
|---|---|---|---|
| RMS energy | 35% | 20% | Loudness / presence |
| Spectral flux | 15% | 35% | Rate of change — beat drops, cuts |
| Onset strength | 20% | 30% | Word density, musical onsets |
| Zero Crossing Rate | 30% | 15% | Consonants, high-freq activity |
| Crowd reactions | +15% bonus | +15% bonus | Applause / laughter bursts |
Content type is detected via spectral flatness (low = tonal/music, high = noisy/speech). Weights interpolate smoothly between speech and music profiles. Silent seconds (< 5% peak RMS) are penalised 10×.
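The weighting scheme in the table can be sketched as follows. Several details here are assumptions rather than documented behaviour: features are taken to be normalised to 0–1, the interpolation is linear, `speechiness` is a hypothetical 0–1 value derived from spectral flatness (the exact mapping is not specified above), and "penalised 10×" is read as dividing the score by 10.

```python
# Per-feature weights from the table above.
SPEECH = {"rms": 0.35, "flux": 0.15, "onset": 0.20, "zcr": 0.30}
MUSIC = {"rms": 0.20, "flux": 0.35, "onset": 0.30, "zcr": 0.15}
CROWD_BONUS = 0.15       # additive bonus for applause / laughter
SILENCE_FRACTION = 0.05  # below 5% of peak RMS counts as silent
SILENCE_PENALTY = 10.0   # assumption: "penalised 10x" means divide by 10


def interpolate_weights(speechiness):
    """Blend the music and speech profiles (0 = music, 1 = speech)."""
    t = min(max(speechiness, 0.0), 1.0)
    return {k: (1.0 - t) * MUSIC[k] + t * SPEECH[k] for k in SPEECH}


def score_second(features, rms_raw, peak_rms, speechiness):
    """Score one second of audio from its normalised features.

    features holds "rms", "flux", "onset", "zcr" and optionally "crowd",
    each in [0, 1]; rms_raw and peak_rms are unnormalised RMS values.
    """
    weights = interpolate_weights(speechiness)
    score = sum(weights[k] * features[k] for k in weights)
    score += CROWD_BONUS * features.get("crowd", 0.0)
    if rms_raw < SILENCE_FRACTION * peak_rms:
        score /= SILENCE_PENALTY
    return score
```

Because both profiles sum to 100%, any interpolated profile also sums to 100%, so scores stay comparable across content types.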
3. Quality filtering
Before window selection, two passes run over the video:
- Scene cut detection: ffmpeg finds hard cuts (scene score > 0.3); window starts snap to the nearest cut within ±3s so a clip does not open a moment before or after a scene change
- Black frame detection: windows with > 10% black frames are skipped entirely
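Both filters reduce to simple interval arithmetic once the cut times and black intervals are known. The sketch below assumes those have already been extracted (for example from ffmpeg's scene score and `blackdetect` output) into plain Python lists, and that black intervals do not overlap each other; the function names are hypothetical.

```python
def snap_to_cut(start, cut_times, tolerance=3.0):
    """Move a window start to the nearest scene cut within +/- tolerance.

    Returns the original start when no cut is close enough.
    """
    nearby = [c for c in cut_times if abs(c - start) <= tolerance]
    if not nearby:
        return start
    return min(nearby, key=lambda c: abs(c - start))


def black_fraction(start, end, black_intervals):
    """Fraction of [start, end) covered by (begin, finish) black intervals."""
    covered = sum(
        max(0.0, min(end, finish) - max(start, begin))
        for begin, finish in black_intervals
    )
    return covered / (end - start)


def keep_window(start, end, black_intervals, max_black=0.10):
    """Reject windows where more than 10% of the duration is black."""
    return black_fraction(start, end, black_intervals) <= max_black
```

Measuring black content by duration rather than by frame count is a simplification; at a constant frame rate the two are equivalent.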
4. Smoothing + peak selection
Scores are Gaussian-smoothed (σ=2.5s). Then:
- Find local score peaks (scipy.signal.find_peaks, min distance = window/2)
- Place window 40% before / 60% after the peak: buildup + payoff
- Score each window as 0.7 × mean + 0.3 × peak
- Prefer window starts that follow a natural dip (breath before the moment)
- For --count N, greedily pick non-overlapping peaks
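The selection steps can be sketched in dependency-free Python. This is an illustration, not the project's code: a hand-rolled Gaussian kernel and a plain local-maximum test stand in for SciPy's smoothing and `scipy.signal.find_peaks`, the minimum peak distance and the "natural dip" preference are omitted, and scores are assumed to be one value per second.

```python
import math


def gaussian_smooth(scores, sigma=2.5):
    """Smooth with a truncated Gaussian kernel, renormalised at the edges."""
    radius = int(3 * sigma)
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    smoothed = []
    for i in range(len(scores)):
        num = den = 0.0
        for k, w in zip(offsets, kernel):
            j = i + k
            if 0 <= j < len(scores):
                num += w * scores[j]
                den += w
        smoothed.append(num / den)
    return smoothed


def local_peaks(scores):
    """Indices that rise from the left and do not rise to the right."""
    return [i for i in range(1, len(scores) - 1)
            if scores[i - 1] < scores[i] >= scores[i + 1]]


def pick_windows(scores, window=60, count=1, sigma=2.5):
    """Return up to `count` non-overlapping window starts, in seconds."""
    smooth = gaussian_smooth(scores, sigma)
    candidates = []
    for peak in local_peaks(smooth):
        # 40% of the window before the peak, 60% after, clamped to the video.
        start = max(0, min(peak - int(0.4 * window), len(smooth) - window))
        segment = smooth[start:start + window]
        score = 0.7 * sum(segment) / len(segment) + 0.3 * max(segment)
        candidates.append((score, start))
    chosen = []
    for _, start in sorted(candidates, reverse=True):  # best score first
        if all(abs(start - other) >= window for other in chosen):
            chosen.append(start)
        if len(chosen) == count:
            break
    return sorted(chosen)
```

The greedy pass takes the best-scoring window first and then skips any later candidate that would overlap one already chosen, which is what keeps `--count N` clips disjoint.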
5. Crop
Center-crop to the target ratio by default. With --smart-crop, an OpenCV Haar cascade detects faces in 3 sampled frames (at 25%, 50%, and 75% through the clip), and the crop is centered on the median face center, which is more robust than sampling a single frame.
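The crop geometry itself is independent of face detection. The sketch below assumes face centers have already been detected and are passed in as `(x, y)` pixel pairs; `crop_box` is a hypothetical helper, and scaling the crop to the final resolution is left as a separate step.

```python
from statistics import median

# Aspect ratios (width, height) for each documented --format value.
FORMATS = {
    "vertical": (9, 16),
    "horizontal": (16, 9),
    "square": (1, 1),
    "portrait": (4, 5),
}


def crop_box(src_w, src_h, fmt="vertical", face_centers=None):
    """Return (x, y, width, height) of the crop inside the source frame.

    Without face_centers the box is centered; with them it is centered
    on the median face position and clamped to stay inside the frame.
    """
    ratio_w, ratio_h = FORMATS[fmt]
    if src_w * ratio_h > src_h * ratio_w:
        # Source is wider than the target: keep full height, trim the sides.
        crop_h = src_h
        crop_w = src_h * ratio_w // ratio_h
    else:
        # Source is taller than (or equal to) the target: keep full width.
        crop_w = src_w
        crop_h = src_w * ratio_h // ratio_w
    if face_centers:
        cx = median(x for x, _ in face_centers)
        cy = median(y for _, y in face_centers)
    else:
        cx, cy = src_w / 2, src_h / 2
    x = int(min(max(cx - crop_w / 2, 0), src_w - crop_w))
    y = int(min(max(cy - crop_h / 2, 0), src_h - crop_h))
    return x, y, crop_w, crop_h
```

Using the median means one stray detection (for example a face on a poster in a single frame) does not drag the crop away from the main subject.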
Feature comparison
| Feature | ViralClip | OpusClip | ViralCutter |
|---|---|---|---|
| No API key needed | ✅ | ❌ | ❌ |
| Works fully offline | ✅ | ❌ | ❌ |
| YouTube heatmap signal | ✅ | ✅ (cloud) | ❌ |
| Local captions (Whisper) | ✅ | Cloud | Cloud |
| Face-aware crop | ✅ | ✅ | ❌ |
| Multiple output formats | ✅ | ✅ | ❌ |
| YouTube download | ✅ | ✅ | ✅ |
| Multi-clip export | ✅ | ✅ | ✅ |
| Dry run / preview | ✅ | ❌ | ❌ |
| Free | ✅ | Freemium | Freemium |
License
MIT