3 projects
WhoSpoke
Multimodal speaker identification for video: selectable diarization, transcription, face recognition, voice matching, pixel mouth-gap active-speaker cues, and evidence fusion.
web-vest
Visual Element-based Saliency Toolkit for multimodal webpage saliency extraction and scoring.
flagsense
A simple package for running flag detection models on images.