cuda-motion-flow
GPU-accelerated video stabilization with COLMAP trajectory export.
Every stage runs on the GPU — Shi-Tomasi corner detection, pyramidal Lucas-Kanade tracking, vectorised RANSAC, trajectory smoothing, and bilinear affine warping. The recovered camera motion exports directly to COLMAP for use in Gaussian Splatting, NeRF, and Structure-from-Motion pipelines.
| Gaussian convolution | Kalman-RTS smoother | L1 / Total-Variation |
|---|---|---|
| Fast, symmetric | Best quality — optimal | Preserves intentional pans |
Each clip: original shaky footage (left) vs stabilized output (right) — view on GitHub if GIFs don't load
What it does
- Full GPU pipeline — raw CUDA C++ kernels for the hot path; no CPU fallback in production
- Three trajectory smoothers — Gaussian, Kalman-RTS (globally optimal), L1/TV (preserves pans)
- COLMAP export — per-frame R, t, quaternion for direct input to Gaussian Splatting / SfM
- Quality analysis — companion script with 5 metric categories (stability, smoothness, frequency, visual, fidelity)
- Rich CLI — progress bars, per-stage timing, VRAM display
Pipeline
Input frames
│
▼ Stage 1 — Corner detection
│ Scharr gradient + Shi-Tomasi response
│ Shared-memory tiled raw CUDA kernel; 22×22 compile-time tile; __ldg() L1 reads
│
▼ Stage 2 — Feature tracking
│ Pyramidal Lucas-Kanade — all N points processed in parallel
│ Pyramid built with raw CUDA anti-aliased 2× Gaussian downsampling kernel
│
▼ Stage 3 — Transform estimation
│ GPU RANSAC — all 500 hypotheses scored simultaneously (n_iter × n_pts grid)
│ Affine refinement over inliers: cupy.linalg.lstsq
│
▼ Stage 4 — Trajectory smoothing
│ gaussian   Gaussian convolution — fast, symmetric
│ kalman     Rauch-Tung-Striebel optimal smoother — globally minimum-variance
│ l1         Total-Variation / Chambolle-Pock ADMM — preserves intentional pans
│
▼ Stage 5 — Frame warping
│ Bilinear affine warp: 32×8 thread block, #pragma unroll channels, __ldg()
│ Two non-blocking CUDA streams — overlaps H→D transfers with GPU compute
│
▼ Stage 6 — Camera pose export (optional)
Homography → R, t via Malis-Vargas decomposition
Quaternion via Shepperd's method
COLMAP cameras.txt / images.txt or JSON
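The Stage 3 scoring step can be pictured as filling an n_iter × n_pts grid of residuals, one row per hypothesis. The following is a minimal CPU sketch in plain Python, not the package's kernel: the function names, the 3-point minimal sample, and the pixel threshold are assumptions made for illustration. On the GPU, each grid cell would be evaluated by its own thread.

```python
import math
import random

def affine_from_3(src, dst):
    """Solve the 2x3 affine map sending three src points to three dst points.

    Returns None when the src points are (nearly) collinear.
    """
    (x0, y0), (x1, y1), (x2, y2) = src
    det = x0 * (y1 - y2) - y0 * (x1 - x2) + (x1 * y2 - x2 * y1)
    if abs(det) < 1e-9:
        return None
    rows = []
    for k in (0, 1):  # k = 0 solves the x' row, k = 1 the y' row
        u0, u1, u2 = dst[0][k], dst[1][k], dst[2][k]
        # Cramer's rule on [[x0,y0,1],[x1,y1,1],[x2,y2,1]] @ [a,b,c] = [u0,u1,u2]
        a = (u0 * (y1 - y2) - y0 * (u1 - u2) + (u1 * y2 - u2 * y1)) / det
        b = (x0 * (u1 - u2) - u0 * (x1 - x2) + (x1 * u2 - x2 * u1)) / det
        c = (x0 * (y1 * u2 - y2 * u1) - y0 * (x1 * u2 - x2 * u1)
             + u0 * (x1 * y2 - x2 * y1)) / det
        rows.append((a, b, c))
    return rows

def residual(m, p, q):
    """Euclidean distance between the mapped point m(p) and its match q."""
    (a, b, c), (d, e, f) = m
    return math.hypot(a * p[0] + b * p[1] + c - q[0],
                      d * p[0] + e * p[1] + f - q[1])

def ransac_affine(src, dst, n_iter=500, threshold=1.0, seed=0):
    rng = random.Random(seed)
    n = len(src)
    # Draw every hypothesis first, then score them all together; this mirrors
    # the "score everything at once" layout rather than an early-exit loop.
    models = []
    for _ in range(n_iter):
        idx = rng.sample(range(n), 3)
        m = affine_from_3([src[i] for i in idx], [dst[i] for i in idx])
        if m is not None:
            models.append(m)
    if not models:
        raise ValueError("no non-degenerate sample found")
    # Residual grid: one row per hypothesis, one column per point.
    grid = [[residual(m, src[j], dst[j]) for j in range(n)] for m in models]
    counts = [sum(r < threshold for r in row) for row in grid]
    best = max(range(len(models)), key=counts.__getitem__)
    inliers = [r < threshold for r in grid[best]]
    return models[best], inliers
```

In a full pipeline the winning hypothesis would then be refined by least squares over its inliers, as the Stage 3 description states.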
Installation
1. Check your CUDA version
nvcc --version
2. Install the matching CuPy build
pip install cupy-cuda13x # CUDA 13.x
pip install cupy-cuda12x # CUDA 12.x
3. Install the package
pip install cuda-motion-flow
Recommended — virtual environment
python -m venv .venv
.venv\Scripts\activate # Windows
source .venv/bin/activate # Linux / macOS
pip install cupy-cuda13x cuda-motion-flow
Quick start
CLI
# Default — Gaussian smoother
cuda-motion-flow shaky.mp4 stable.mp4
# Kalman-RTS — best quality on mixed motion
cuda-motion-flow input.mp4 output.mp4 --smoother kalman --smoothing 0.6
# L1 / Total-Variation — preserves intentional pans
cuda-motion-flow vlog.mp4 vlog_stable.mp4 --smoother l1 --smoothing 0.4
# Export COLMAP trajectory for Gaussian Splatting / SfM
cuda-motion-flow input.mp4 output.mp4 --export-trajectory ./colmap/
# GPU device info
cuda-motion-flow --device-info
Python API
from cuda_motion_flow import stabilize_video
# Basic
stabilize_video("shaky.mp4", "stable.mp4", smoothing_factor=0.4)
# Kalman-RTS with COLMAP export
stabilize_video(
"shaky.mp4", "stable.mp4",
smoother="kalman",
smoothing_factor=0.6,
export_trajectory="./colmap/", # writes cameras.txt + images.txt
)
# JSON trajectory
stabilize_video(
"shaky.mp4", "stable.mp4",
export_trajectory="trajectory.json",
)
Trajectory smoothers
| Smoother | Algorithm | When to use |
|---|---|---|
| gaussian | Gaussian convolution | Fast previews, short clips |
| kalman | Rauch-Tung-Striebel optimal smoother | General use — best quality on mixed motion |
| l1 | Total-Variation (Chambolle-Pock ADMM) | Content with intentional pans to preserve |
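To make the role of a trajectory smoother concrete, here is a minimal sketch of the Gaussian variant in plain Python. It rests on assumptions the README does not state: per-frame motions are accumulated into a path, the path is convolved with a normalised Gaussian using edge replication, and the per-frame correction is the difference between the smoothed and raw path. The function names and the `sigma` parameter (in frames) are hypothetical, and how `--smoothing` maps to a kernel width is not documented here.

```python
import math

def gaussian_kernel(sigma):
    """Normalised 1D Gaussian truncated at roughly 3 sigma."""
    radius = max(1, int(3 * sigma + 0.5))
    w = [math.exp(-0.5 * (i / sigma) ** 2) for i in range(-radius, radius + 1)]
    total = sum(w)
    return [v / total for v in w]

def smooth_path(deltas, sigma=5.0):
    """Return (raw path, smoothed path, per-frame correction) for one motion axis."""
    # Accumulate frame-to-frame motion into a camera path.
    path, acc = [], 0.0
    for d in deltas:
        acc += d
        path.append(acc)
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(path)
    smoothed = []
    for i in range(n):
        s = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)  # replicate the edge samples
            s += w * path[j]
        smoothed.append(s)
    # Moving each frame by this amount places it on the smoothed path.
    corrections = [s - p for s, p in zip(smoothed, path)]
    return path, smoothed, corrections
```

The same three steps would apply independently to dx, dy, and da; only the smoothing operator changes between the three methods in the table.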
Kalman-RTS is the globally optimal (minimum-variance) batch smoother for a
constant-velocity linear Gaussian trajectory model. It adapts automatically —
the effective smoothing window adjusts to local motion magnitude.
smoothing_strength controls the process-to-measurement noise ratio Q/R.
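As an illustration of the model described above, the following plain-Python sketch runs a constant-velocity Kalman filter forward and a Rauch-Tung-Striebel pass backward over one trajectory axis. It is a sketch under stated assumptions rather than the package's implementation: the process noise is taken as Q = q·I, the measurement noise as R = 1, and the `q_over_r` parameter is a hypothetical stand-in for whatever smoothing_strength maps to.

```python
def rts_smooth(z, q_over_r=0.01):
    """Smooth a 1D track with a constant-velocity Kalman filter + RTS pass."""
    r = 1.0
    q = q_over_r * r
    # State is [position, velocity]; F = [[1, 1], [0, 1]]; H = [1, 0].
    x = [z[0], 0.0]
    P = [[1.0, 0.0], [0.0, 1.0]]
    xf, Pf, xp, Pp = [], [], [], []   # filtered and predicted moments
    for k, zk in enumerate(z):
        if k > 0:
            # Predict: x <- F x, P <- F P F^T + Q (Q = q * I is an assumption).
            x = [x[0] + x[1], x[1]]
            a, b, c, d = P[0][0], P[0][1], P[1][0], P[1][1]
            P = [[a + b + c + d + q, b + d], [c + d, d + q]]
        xp.append(x[:])
        Pp.append([row[:] for row in P])
        # Update with the position measurement zk.
        s = P[0][0] + r
        k0, k1 = P[0][0] / s, P[1][0] / s
        innov = zk - x[0]
        x = [x[0] + k0 * innov, x[1] + k1 * innov]
        P = [[(1 - k0) * P[0][0], (1 - k0) * P[0][1]],
             [P[1][0] - k1 * P[0][0], P[1][1] - k1 * P[0][1]]]
        xf.append(x[:])
        Pf.append([row[:] for row in P])
    # Backward RTS pass: xs[k] = xf[k] + C (xs[k+1] - xp[k+1]),
    # with C = Pf[k] F^T inv(Pp[k+1]).
    xs = [None] * len(z)
    xs[-1] = xf[-1]
    for k in range(len(z) - 2, -1, -1):
        a, b, c, d = Pf[k][0][0], Pf[k][0][1], Pf[k][1][0], Pf[k][1][1]
        A = [[a + b, b], [c + d, d]]                      # Pf[k] @ F^T
        e, f = Pp[k + 1][0]
        g, h = Pp[k + 1][1]
        det = e * h - f * g
        C = [[(A[0][0] * h - A[0][1] * g) / det, (A[0][1] * e - A[0][0] * f) / det],
             [(A[1][0] * h - A[1][1] * g) / det, (A[1][1] * e - A[1][0] * f) / det]]
        dx = [xs[k + 1][0] - xp[k + 1][0], xs[k + 1][1] - xp[k + 1][1]]
        xs[k] = [xf[k][0] + C[0][0] * dx[0] + C[0][1] * dx[1],
                 xf[k][1] + C[1][0] * dx[0] + C[1][1] * dx[1]]
    return [s[0] for s in xs]
```

A smaller `q_over_r` trusts the motion model more and yields a smoother path; a larger value follows the measurements more closely.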
L1 / TV produces piecewise-constant trajectories. High-frequency jitter is removed; deliberate camera moves are left intact. Solved via Chambolle-Pock primal-dual ADMM.
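The piecewise-constant behaviour can be reproduced with a small 1D total-variation denoiser. The sketch below applies the Chambolle-Pock primal-dual iteration to min_x 0.5·‖x − z‖² + λ·‖Dx‖₁, where D is the forward difference. Whether the package uses exactly this objective, these step sizes, or this iteration count is an assumption; `lam`, `n_iter`, and the function name are illustrative.

```python
def tv_denoise_1d(z, lam=1.0, n_iter=500):
    """1D total-variation denoising via Chambolle-Pock primal-dual iterations."""
    n = len(z)
    tau = sigma = 0.45            # tau * sigma * ||D||^2 < 1 because ||D||^2 <= 4
    x = list(z)                   # primal variable (the smoothed track)
    x_bar = list(z)               # extrapolated primal variable
    y = [0.0] * (n - 1)           # dual variable, one per forward difference
    for _ in range(n_iter):
        # Dual step: ascend along D x_bar, then project onto [-lam, lam].
        y = [min(lam, max(-lam, y[i] + sigma * (x_bar[i + 1] - x_bar[i])))
             for i in range(n - 1)]
        # Primal step: proximal map of 0.5 * ||x - z||^2 at x - tau * D^T y.
        x_new = []
        for j in range(n):
            dty = (y[j - 1] if j > 0 else 0.0) - (y[j] if j < n - 1 else 0.0)
            x_new.append((x[j] - tau * dty + tau * z[j]) / (1.0 + tau))
        # Over-relaxation with theta = 1.
        x_bar = [2.0 * a - b for a, b in zip(x_new, x)]
        x = x_new
    return x
```

On a track that jitters around two levels, the output flattens each level while keeping the jump between them, which is the sense in which deliberate moves are left intact.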
COLMAP trajectory export
cuda-motion-flow input.mp4 stable.mp4 --export-trajectory ./colmap/
Output structure:
colmap/
cameras.txt # PINHOLE model — f = max(W, H), cx = W/2, cy = H/2
images.txt # Per-frame qvec (Hamilton) + tvec in COLMAP convention
points3D.txt # Empty placeholder
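For orientation, the text layout COLMAP expects in these files can be sketched as follows. The intrinsics rule (f = max(W, H), principal point at the image centre) comes from the listing above; the helper names and the image file name are hypothetical, and the exact number formatting the package writes is not documented here.

```python
def pinhole_camera_line(width, height, camera_id=1):
    """One cameras.txt entry: CAMERA_ID MODEL WIDTH HEIGHT fx fy cx cy."""
    f = float(max(width, height))
    cx, cy = width / 2.0, height / 2.0
    return f"{camera_id} PINHOLE {width} {height} {f} {f} {cx} {cy}"

def image_lines(image_id, qvec, tvec, name, camera_id=1):
    """Two images.txt lines: the pose line, then the 2D-point line.

    qvec is (qw, qx, qy, qz) in Hamilton order; tvec is (tx, ty, tz).
    The second line is left empty because no keypoints are exported.
    """
    vals = " ".join(repr(float(v)) for v in (*qvec, *tvec))
    return [f"{image_id} {vals} {camera_id} {name}", ""]

print(pinhole_camera_line(1280, 720))
# 1 PINHOLE 1280 720 1280.0 1280.0 640.0 360.0
```

The empty second line per image matters: COLMAP's text format reads images.txt in pairs of lines.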
JSON format (.json suffix):
{
"intrinsics": { "fx": 1280.0, "fy": 1280.0, "cx": 640.0, "cy": 360.0 },
"frames": [
{
"id": 0,
"R": [[1,0,0],[0,1,0],[0,0,1]],
"t": [0.0, 0.0, 0.0],
"qvec": [1.0, 0.0, 0.0, 0.0],
"camera_center": [0.0, 0.0, 0.0]
}
]
}
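The fields of one frame are related to each other, and the relationship can be checked with a few lines of plain Python. The sketch assumes the usual COLMAP world-to-camera convention (x_cam = R·x_world + t), under which the camera centre is −Rᵀt; the README does not spell this out, so treat it as an assumption. The quaternion routine follows Shepperd's method, branching on the largest of the trace and the diagonal entries; the function names are illustrative.

```python
import math

def rotation_to_qvec(R):
    """Shepperd's method: rotation matrix -> Hamilton quaternion (w, x, y, z)."""
    m = [R[0][0] + R[1][1] + R[2][2], R[0][0], R[1][1], R[2][2]]
    i = m.index(max(m))           # pick the numerically safest branch
    if i == 0:
        w = 0.5 * math.sqrt(1.0 + m[0])
        x = (R[2][1] - R[1][2]) / (4 * w)
        y = (R[0][2] - R[2][0]) / (4 * w)
        z = (R[1][0] - R[0][1]) / (4 * w)
    elif i == 1:
        x = 0.5 * math.sqrt(1.0 + R[0][0] - R[1][1] - R[2][2])
        w = (R[2][1] - R[1][2]) / (4 * x)
        y = (R[0][1] + R[1][0]) / (4 * x)
        z = (R[0][2] + R[2][0]) / (4 * x)
    elif i == 2:
        y = 0.5 * math.sqrt(1.0 - R[0][0] + R[1][1] - R[2][2])
        w = (R[0][2] - R[2][0]) / (4 * y)
        x = (R[0][1] + R[1][0]) / (4 * y)
        z = (R[1][2] + R[2][1]) / (4 * y)
    else:
        z = 0.5 * math.sqrt(1.0 - R[0][0] - R[1][1] + R[2][2])
        w = (R[1][0] - R[0][1]) / (4 * z)
        x = (R[0][2] + R[2][0]) / (4 * z)
        y = (R[1][2] + R[2][1]) / (4 * z)
    if w < 0:                     # q and -q encode the same rotation; keep w >= 0
        w, x, y, z = -w, -x, -y, -z
    return [w, x, y, z]

def camera_center(R, t):
    """C = -R^T t under the world-to-camera convention."""
    return [-sum(R[j][i] * t[j] for j in range(3)) for i in range(3)]
```

For the identity pose in the sample above, both functions reproduce the listed values: qvec [1, 0, 0, 0] and a camera centre at the origin.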
Direct geometry API:
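from cuda_motion_flow.geometry import estimate_intrinsics, decompose_homography, build_trajectory
# `homographies` is not defined in this snippet: it stands for the per-frame 3×3
# matrices you have already collected (for example, H from estimate_transform_from_flow_gpu).
K = estimate_intrinsics(width=1280, height=720)
traj = build_trajectory(homographies, K)
traj.export_colmap("./colmap/")  # COLMAP text files
traj.export_json("trajectory.json")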
Quality analysis
Compare original vs stabilized outputs across five metric categories:
python compare_videos.py test.mp4 out_gaussian.mp4 out_kalman.mp4 out_l1.mp4
GPU-accelerated (uses the same LK pipeline as the stabilizer). Falls back to
CPU Farneback automatically if CUDA is unavailable. Force CPU with --cpu.
| Category | Metrics |
|---|---|
| Stability | Mean / std / P95 / max motion, stability score 1/(1+σ) |
| Smoothness | Velocity std |
| Frequency | High/low-freq power ratio, spectral centroid (fps/4 threshold) |
| Visual | Temporal SSIM, Laplacian sharpness |
| Fidelity | SSIM vs original, PSNR vs original |
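Two of the metrics in the table have simple closed forms, sketched here in plain Python. The stability score follows the 1/(1+σ) formula from the table; taking σ as the population standard deviation of per-frame motion magnitudes is an assumption, as are the function names. The PSNR function uses the standard definition for 8-bit data and takes flattened pixel sequences.

```python
import math
import statistics

def stability_score(motion_magnitudes):
    """1 / (1 + sigma): equals 1.0 for perfectly uniform motion, lower with jitter."""
    sigma = statistics.pstdev(motion_magnitudes)
    return 1.0 / (1.0 + sigma)

def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    mse = sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)
    return math.inf if mse == 0 else 10.0 * math.log10(peak * peak / mse)
```

Note that fidelity metrics measured against the original penalise any warp, including a good one, so they are best read alongside the stability and smoothness rows rather than alone.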
Raw CUDA kernels
All performance-critical operations are raw CUDA C++ kernels compiled at
runtime via cupy.RawKernel. No Python dispatch overhead in the hot path.
| Kernel | Configuration |
|---|---|
| affine_warp_bilinear_u8 | 32×8 block · __ldg() L1 reads · #pragma unroll 3× |
| gaussian_downsample_f32 | 16×16 tile · 36×36 shared-memory halo · separable 5-tap |
| scharr_gradient_f32 | 18×18 shared-memory tile · Gx and Gy in one pass |
| shi_tomasi_response_f32 | 22×22 compile-time tile · min-eigenvalue response |
All kernels accept an optional stream argument. Frame warping uses two
non-blocking CUDA streams to pipeline H→D transfers with compute.
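As a reference for what a bilinear affine warp computes per output pixel, here is a slow, single-channel CPU version in plain Python. It is not the CUDA kernel: it assumes inverse mapping (each output pixel looks up its source position through a 2×3 output-to-input matrix) and a zero fill outside the source image, and the function name is hypothetical.

```python
def warp_affine_bilinear(img, M_inv, fill=0):
    """img: H x W list of rows with 0..255 values; M_inv: 2x3 output->input map."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):                # on the GPU, one thread per (x, y)
        for x in range(w):
            sx = M_inv[0][0] * x + M_inv[0][1] * y + M_inv[0][2]
            sy = M_inv[1][0] * x + M_inv[1][1] * y + M_inv[1][2]
            if sx < 0 or sy < 0 or sx > w - 1 or sy > h - 1:
                continue              # outside the source: keep the fill value
            x0, y0 = int(sx), int(sy)  # floor, since both are non-negative
            x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
            fx, fy = sx - x0, sy - y0
            # Interpolate along x on the two neighbouring rows, then along y.
            top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
            bot = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
            out[y][x] = int(top * (1 - fy) + bot * fy + 0.5)
    return out
```

Pixels whose source position falls outside the frame keep the fill value, which is where the black borders come from that `--no-crop` leaves visible.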
CLI reference
Usage: cuda-motion-flow [OPTIONS] INPUT_VIDEO OUTPUT_VIDEO
Smoothing:
--smoother [gaussian|kalman|l1] Algorithm [default: gaussian]
--smoothing FLOAT RANGE Strength 0.0–1.0 [default: 0.3]
Output:
--no-crop Disable auto-crop of black borders
--no-resize Keep cropped resolution
--export-trajectory PATH .json or COLMAP directory
Diagnostics:
-v, --verbose Per-stage timing
--device-info Print GPU info and exit
--help Show this message and exit
Python API reference
# Stabilization
stabilize_video(input_path, output_path,
smoothing_factor=0.3, smoother="gaussian",
verbose=False, auto_crop=True, preserve_resolution=True,
export_trajectory=None)
# Device
check_cuda_available() -> bool
get_device_info() -> dict # device_name, compute_capability, memory
# Pipeline primitives
compute_optical_flow_gpu(prev_gray, curr_gray) -> (prev_pts, curr_pts)
estimate_transform_from_flow_gpu(prev_pts, curr_pts) -> (H, dx, dy, da)
detect_corners_gpu(img, max_corners, quality, min_dist) -> corners
track_points_gpu(prev, curr, pts, window_size, max_level) -> (tracked, status)
ransac_affine_gpu(src, dst, n_iter, threshold) -> (M_2x3, inliers)
# Trajectory
smooth_trajectory(dx, dy, da, method, smoothing_strength) -> (N, 3, 3)
# Geometry
estimate_intrinsics(width, height) -> CameraIntrinsics
decompose_homography(H, K) -> List[(R, t, n)]
build_trajectory(homographies, K) -> CameraTrajectory
Requirements
- Python 3.9+
- NVIDIA GPU — CUDA 12.x or 13.x
- cupy-cuda12x or cupy-cuda13x — install separately, match nvcc --version
- opencv-python >= 4.8
- numpy >= 1.22
- rich >= 13.0
- rich-click >= 1.7
License
MIT