
cuda-motion-flow

GPU-accelerated video stabilization with COLMAP trajectory export.

Every stage runs on the GPU — Shi-Tomasi corner detection, pyramidal Lucas-Kanade tracking, vectorised RANSAC, trajectory smoothing, and bilinear affine warping. The recovered camera motion exports directly to COLMAP for use in Gaussian Splatting, NeRF, and Structure-from-Motion pipelines.



[Demo GIFs: Gaussian convolution (fast, symmetric) · Kalman-RTS smoother (best quality — optimal) · L1 / Total-Variation (preserves intentional pans)]

Each clip: original shaky footage (left) vs stabilized output (right) — view on GitHub if the GIFs don't load.


What it does

  • Full GPU pipeline — raw CUDA C++ kernels for the hot path; no CPU fallback in production
  • Three trajectory smoothers — Gaussian, Kalman-RTS (globally optimal), L1/TV (preserves pans)
  • COLMAP export — per-frame R, t, quaternion for direct input to Gaussian Splatting / SfM
  • Quality analysis — companion script with 5 metric categories (stability, smoothness, frequency, SSIM, PSNR)
  • Rich CLI — progress bars, per-stage timing, VRAM display

Pipeline

Input frames
    │
    ▼  Stage 1 — Corner detection
    │  Scharr gradient + Shi-Tomasi response
    │  Shared-memory tiled raw CUDA kernel; 22×22 compile-time tile; __ldg() L1 reads
    │
    ▼  Stage 2 — Feature tracking
    │  Pyramidal Lucas-Kanade — all N points processed in parallel
    │  Pyramid built with raw CUDA anti-aliased 2× Gaussian downsampling kernel
    │
    ▼  Stage 3 — Transform estimation
    │  GPU RANSAC — all 500 hypotheses scored simultaneously (n_iter × n_pts grid)
    │  Affine refinement over inliers: cupy.linalg.lstsq
    │
    ▼  Stage 4 — Trajectory smoothing
    │  gaussian   Gaussian convolution — fast, symmetric
    │  kalman     Rauch-Tung-Striebel optimal smoother — globally minimum-variance
    │  l1         Total-Variation / Chambolle-Pock ADMM — preserves intentional pans
    │
    ▼  Stage 5 — Frame warping
    │  Bilinear affine warp: 32×8 thread block, #pragma unroll channels, __ldg()
    │  Two non-blocking CUDA streams — overlaps H→D transfers with GPU compute
    │
    ▼  Stage 6 — Camera pose export  (optional)
       Homography → R, t via Malis-Vargas decomposition
       Quaternion via Shepperd's method
       COLMAP cameras.txt / images.txt  or  JSON
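The core trick of Stage 3, scoring every hypothesis against every point on one (n_iter × n_pts) grid, can be sketched in NumPy (the GPU version swaps in CuPy; the function and parameter names here are illustrative, not the package's API):

```python
import numpy as np

def ransac_affine(src, dst, n_iter=500, threshold=3.0, rng=None):
    """Score all affine hypotheses at once on an (n_iter, n_pts) residual grid."""
    if rng is None:
        rng = np.random.default_rng(0)
    n = len(src)
    # Three distinct correspondences per hypothesis
    idx = np.argsort(rng.random((n_iter, n)), axis=1)[:, :3]
    s, d = src[idx], dst[idx]                                  # (n_iter, 3, 2)

    # Exact affine fit per hypothesis: [x y 1] @ M = [x' y']
    A = np.concatenate([s, np.ones((n_iter, 3, 1))], axis=2)   # (n_iter, 3, 3)
    M = np.linalg.solve(A, d)                                  # (n_iter, 3, 2)

    # Residuals of every hypothesis against every point, simultaneously
    P = np.concatenate([src, np.ones((n, 1))], axis=1)         # (n, 3)
    proj = np.einsum('nk,ikj->inj', P, M)                      # (n_iter, n, 2)
    err = np.linalg.norm(proj - dst[None], axis=2)             # (n_iter, n)
    inliers = err < threshold
    best = inliers.sum(axis=1).argmax()
    return M[best].T, inliers[best]                            # 2x3 affine, mask
```

As the diagram notes, the package's own stage additionally refines the winning model over its inliers with cupy.linalg.lstsq; the sketch stops at hypothesis selection.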

Installation

1. Check your CUDA version

nvcc --version

2. Install the matching CuPy build

pip install cupy-cuda13x   # CUDA 13.x
pip install cupy-cuda12x   # CUDA 12.x

3. Install the package

pip install cuda-motion-flow

Recommended — virtual environment

python -m venv .venv
.venv\Scripts\activate        # Windows
source .venv/bin/activate     # Linux / macOS
pip install cupy-cuda13x cuda-motion-flow

Quick start

CLI

# Default — Gaussian smoother
cuda-motion-flow shaky.mp4 stable.mp4

# Kalman-RTS — best quality on mixed motion
cuda-motion-flow input.mp4 output.mp4 --smoother kalman --smoothing 0.6

# L1 / Total-Variation — preserves intentional pans
cuda-motion-flow vlog.mp4 vlog_stable.mp4 --smoother l1 --smoothing 0.4

# Export COLMAP trajectory for Gaussian Splatting / SfM
cuda-motion-flow input.mp4 output.mp4 --export-trajectory ./colmap/

# GPU device info
cuda-motion-flow --device-info

Python API

from cuda_motion_flow import stabilize_video

# Basic
stabilize_video("shaky.mp4", "stable.mp4", smoothing_factor=0.4)

# Kalman-RTS with COLMAP export
stabilize_video(
    "shaky.mp4", "stable.mp4",
    smoother="kalman",
    smoothing_factor=0.6,
    export_trajectory="./colmap/",   # writes cameras.txt + images.txt
)

# JSON trajectory
stabilize_video(
    "shaky.mp4", "stable.mp4",
    export_trajectory="trajectory.json",
)

Trajectory smoothers

Smoother   Algorithm                               When to use
gaussian   Gaussian convolution                    Fast previews, short clips
kalman     Rauch-Tung-Striebel optimal smoother    General use — best quality on mixed motion
l1         Total-Variation (Chambolle-Pock ADMM)   Content with intentional pans to preserve

Kalman-RTS is the globally optimal (minimum-variance) batch smoother for a constant-velocity linear Gaussian trajectory model. It adapts automatically — the effective smoothing window adjusts to local motion magnitude. smoothing_strength controls the process-to-measurement noise ratio Q/R.

L1 / TV produces piecewise-constant trajectories. High-frequency jitter is removed; deliberate camera moves are left intact. Solved via Chambolle-Pock primal-dual ADMM.
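A minimal sketch of the idea behind the kalman smoother, assuming a 1-D constant-velocity model: a forward Kalman pass followed by the Rauch-Tung-Striebel backward pass, with q_over_r standing in for the Q/R ratio that smoothing_strength controls. This is an illustration, not the package's implementation:

```python
import numpy as np

def rts_smooth(z, q_over_r=0.1):
    """Constant-velocity Kalman filter + RTS backward pass over positions z."""
    F = np.array([[1.0, 1.0], [0.0, 1.0]])               # state: [pos, vel]
    H = np.array([[1.0, 0.0]])                           # observe position only
    Q = q_over_r * np.array([[0.25, 0.5], [0.5, 1.0]])   # process noise
    R = np.array([[1.0]])                                # measurement noise

    n = len(z)
    xf = np.zeros((n, 2)); Pf = np.zeros((n, 2, 2))      # filtered
    xp = np.zeros((n, 2)); Pp = np.zeros((n, 2, 2))      # predicted
    x, P = np.array([z[0], 0.0]), np.eye(2)

    for k in range(n):                                   # forward Kalman pass
        xp[k], Pp[k] = F @ x, F @ P @ F.T + Q
        S = H @ Pp[k] @ H.T + R
        K = Pp[k] @ H.T @ np.linalg.inv(S)
        x = xp[k] + (K @ (z[k] - H @ xp[k])).ravel()
        P = (np.eye(2) - K @ H) @ Pp[k]
        xf[k], Pf[k] = x, P

    xs = xf.copy()
    for k in range(n - 2, -1, -1):                       # RTS backward pass
        C = Pf[k] @ F.T @ np.linalg.inv(Pp[k + 1])
        xs[k] = xf[k] + C @ (xs[k + 1] - xp[k + 1])
    return xs[:, 0]                                      # smoothed positions

# Usage: a jittery pan becomes a smooth one
smoothed = rts_smooth(np.cumsum(np.random.default_rng(0).normal(1.0, 0.5, 200)))
```

Running the same pass over dx, dy, and da independently gives the smoothed trajectory; lowering q_over_r trusts the motion model more and smooths harder.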


COLMAP trajectory export

cuda-motion-flow input.mp4 stable.mp4 --export-trajectory ./colmap/

Output structure:

colmap/
  cameras.txt    # PINHOLE model — f = max(W, H), cx = W/2, cy = H/2
  images.txt     # Per-frame qvec (Hamilton) + tvec in COLMAP convention
  points3D.txt   # Empty placeholder

JSON format (.json suffix):

{
  "intrinsics": { "fx": 1280.0, "fy": 1280.0, "cx": 640.0, "cy": 360.0 },
  "frames": [
    {
      "id": 0,
      "R": [[1,0,0],[0,1,0],[0,0,1]],
      "t": [0.0, 0.0, 0.0],
      "qvec": [1.0, 0.0, 0.0, 0.0],
      "camera_center": [0.0, 0.0, 0.0]
    }
  ]
}
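The qvec field is a Hamilton-convention (w, x, y, z) quaternion, so R can be reconstructed from it with the standard conversion. A small self-contained sketch (standard formula, not package code):

```python
import numpy as np

def qvec_to_rotmat(q):
    """Hamilton-convention (w, x, y, z) unit quaternion -> 3x3 rotation matrix."""
    w, x, y, z = q
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
    ])

# Example: a 90-degree rotation about the z-axis
theta = np.pi / 2
q = [np.cos(theta / 2), 0.0, 0.0, np.sin(theta / 2)]
R = qvec_to_rotmat(q)   # R @ [1, 0, 0] lands on [0, 1, 0]
```

This doubles as a sanity check on the export: for each frame, qvec_to_rotmat(frame["qvec"]) should match frame["R"] to numerical precision.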

Direct geometry API:

from cuda_motion_flow.geometry import estimate_intrinsics, decompose_homography, build_trajectory

K    = estimate_intrinsics(width=1280, height=720)
traj = build_trajectory(homographies, K)

traj.export_colmap("./colmap/")
traj.export_json("trajectory.json")

Quality analysis

Compare original vs stabilized outputs across five metric categories:

python compare_videos.py test.mp4 out_gaussian.mp4 out_kalman.mp4 out_l1.mp4

GPU-accelerated (uses the same LK pipeline as the stabilizer). Falls back to CPU Farneback automatically if CUDA is unavailable. Force CPU with --cpu.

Category     Metrics
Stability    Mean / std / P95 / max motion, stability score 1/(1+σ)
Smoothness   Velocity std
Frequency    High/low-freq power ratio, spectral centroid (fps/4 threshold)
Visual       Temporal SSIM, Laplacian sharpness
Fidelity     SSIM vs original, PSNR vs original
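Most of these are simple arithmetic over the per-frame motion magnitudes. A hedged NumPy sketch of a few of them — the actual script may differ in windowing and exact definitions, and the names here are illustrative:

```python
import numpy as np

def stability_metrics(motion, fps=30.0):
    """Per-frame motion magnitudes -> a subset of the metrics in the table above."""
    motion = np.asarray(motion, dtype=float)
    score = 1.0 / (1.0 + motion.std())           # stability score 1/(1+sigma)
    velocity_std = np.diff(motion).std()         # smoothness proxy

    # Split spectral power at fps/4 into low and high bands
    spec = np.abs(np.fft.rfft(motion - motion.mean())) ** 2
    freqs = np.fft.rfftfreq(len(motion), d=1.0 / fps)
    hi = spec[freqs >= fps / 4].sum()
    lo = spec[(freqs > 0) & (freqs < fps / 4)].sum()
    return {
        "mean": motion.mean(), "std": motion.std(),
        "p95": np.percentile(motion, 95), "max": motion.max(),
        "stability_score": score,
        "velocity_std": velocity_std,
        "high_low_ratio": hi / max(lo, 1e-12),
    }
```

A stabilized clip should score higher on stability_score and lower on high_low_ratio than its shaky original, since jitter lives almost entirely above the fps/4 threshold.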

Raw CUDA kernels

All performance-critical operations are raw CUDA C++ kernels compiled at runtime via cupy.RawKernel. No Python dispatch overhead in the hot path.

Kernel                    Configuration
affine_warp_bilinear_u8   32×8 block · __ldg() L1 reads · #pragma unroll
gaussian_downsample_f32   16×16 tile · 36×36 shared-memory halo · separable 5-tap
scharr_gradient_f32       18×18 shared-memory tile · Gx and Gy in one pass
shi_tomasi_response_f32   22×22 compile-time tile · min-eigenvalue response

All kernels accept an optional stream argument. Frame warping uses two non-blocking CUDA streams to pipeline H→D transfers with compute.
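Per output pixel, the warp kernel's work is plain inverse-mapped bilinear sampling. A single-channel NumPy reference of what affine_warp_bilinear_u8 computes — the CUDA kernel assigns one output pixel per thread; this CPU version is for illustration only:

```python
import numpy as np

def affine_warp_bilinear(img, M):
    """Inverse-map each output pixel through the 2x3 affine M, bilinear-sample."""
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    # Invert the forward affine so output pixels pull from source coordinates
    Ainv = np.linalg.inv(np.vstack([M, [0.0, 0.0, 1.0]]))
    sx = Ainv[0, 0] * xs + Ainv[0, 1] * ys + Ainv[0, 2]
    sy = Ainv[1, 0] * xs + Ainv[1, 1] * ys + Ainv[1, 2]

    x0 = np.clip(np.floor(sx).astype(int), 0, w - 2)
    y0 = np.clip(np.floor(sy).astype(int), 0, h - 2)
    fx, fy = np.clip(sx - x0, 0, 1), np.clip(sy - y0, 0, 1)

    # Weighted sum of the four neighbours (the kernel's four __ldg() reads)
    out = (img[y0, x0]         * (1 - fx) * (1 - fy) +
           img[y0, x0 + 1]     * fx       * (1 - fy) +
           img[y0 + 1, x0]     * (1 - fx) * fy +
           img[y0 + 1, x0 + 1] * fx       * fy)
    inside = (sx >= 0) & (sx <= w - 1) & (sy >= 0) & (sy <= h - 1)
    return np.where(inside, out, 0).astype(np.uint8)
```

The GPU kernel does the same per-pixel math for all three channels (hence the #pragma unroll), with out-of-bounds samples producing the black borders that auto-crop later removes.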


CLI reference

Usage: cuda-motion-flow [OPTIONS] INPUT_VIDEO OUTPUT_VIDEO

Smoothing:
  --smoother [gaussian|kalman|l1]   Algorithm            [default: gaussian]
  --smoothing FLOAT RANGE           Strength 0.0–1.0     [default: 0.3]

Output:
  --no-crop                         Disable auto-crop of black borders
  --no-resize                       Keep cropped resolution
  --export-trajectory PATH          .json or COLMAP directory

Diagnostics:
  -v, --verbose                     Per-stage timing
  --device-info                     Print GPU info and exit
  --help                            Show this message and exit

Python API reference

# Stabilization
stabilize_video(input_path, output_path,
    smoothing_factor=0.3, smoother="gaussian",
    verbose=False, auto_crop=True, preserve_resolution=True,
    export_trajectory=None)

# Device
check_cuda_available() -> bool
get_device_info()      -> dict        # device_name, compute_capability, memory

# Pipeline primitives
compute_optical_flow_gpu(prev_gray, curr_gray)           -> (prev_pts, curr_pts)
estimate_transform_from_flow_gpu(prev_pts, curr_pts)     -> (H, dx, dy, da)
detect_corners_gpu(img, max_corners, quality, min_dist)  -> corners
track_points_gpu(prev, curr, pts, window_size, max_level) -> (tracked, status)
ransac_affine_gpu(src, dst, n_iter, threshold)           -> (M_2x3, inliers)

# Trajectory
smooth_trajectory(dx, dy, da, method, smoothing_strength) -> (N, 3, 3)

# Geometry
estimate_intrinsics(width, height)   -> CameraIntrinsics
decompose_homography(H, K)           -> List[(R, t, n)]
build_trajectory(homographies, K)    -> CameraTrajectory

Requirements

  • Python 3.9+
  • NVIDIA GPU — CUDA 12.x or 13.x
  • cupy-cuda12x or cupy-cuda13x — install separately, match nvcc --version
  • opencv-python >= 4.8
  • numpy >= 1.22
  • rich >= 13.0
  • rich-click >= 1.7

License

MIT
