Cued Speech Processing Tools - Decode and generate cued speech videos

Cued Speech Processing Tools

Python package for decoding and generating cued speech videos with MediaPipe and deep learning.

Features

  • Decoder: Convert cued speech videos to text with subtitles using neural networks and language models
  • Generator: Create cued speech videos from text with automatic hand gesture overlay
  • Automatic Data Management: Downloads required models and data automatically

Installation

Prerequisites

  • Python 3.11.*
  • Pixi (for Montreal Forced Aligner)
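
Since the package targets Python 3.11.*, a quick interpreter check before installing can save a failed setup. The following is a minimal sketch; the helper name `is_supported_python` is hypothetical and not part of cued-speech.

```python
import sys


def is_supported_python(version_info=sys.version_info):
    """Return True when the interpreter is any 3.11.x release."""
    return tuple(version_info[:2]) == (3, 11)


if __name__ == "__main__":
    print("Python 3.11 detected:", is_supported_python())
```

Inside a Pixi environment this can be run with `pixi run python` to confirm that the pinned interpreter is the one in use.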

Setup Steps

  1. Install Pixi
# macOS/Linux
curl -fsSL https://pixi.sh/install.sh | bash

# Windows PowerShell
irm https://pixi.sh/install.ps1 | iex
  2. Create Pixi environment
mkdir cued-speech-env && cd cued-speech-env
pixi init
pixi add "python==3.11"
pixi add montreal-forced-aligner=3.3.4
  3. Install package
pixi run python -m pip install cued-speech
  4. Download data and setup MFA models
pixi shell
cued-speech download-data
pixi run mfa models save acoustic download/french_mfa.zip --overwrite
pixi run mfa models save dictionary download/french_mfa.dict --overwrite
  5. Verify installation
cued-speech --help

Quick Start

Decode Video (Cued Speech → Text)

# Basic usage with default parameters (decodes the provided test video)
cued-speech decode

# Custom video
cued-speech decode --video_path /path/to/video.mp4

Generate Video (Text → Cued Speech)

# Text extracted automatically from video audio
cued-speech generate input_video.mp4

# Skip Whisper transcription and supply the text manually
cued-speech generate video.mp4 --skip-whisper --text "Votre texte ici"

Command Line Options

Decoder

Core Options:

  • --video_path PATH - Input video (default: download/test_decode.mp4)
  • --output_path PATH - Output video (default: output/decoder/decoded_video.mp4)
  • --right_speaker [True|False] - Speaker handedness: True if the speaker cues with the right hand (default: True)
  • --auto_download [True|False] - Auto-download data (default: True)

Model Paths (optional):

  • --model_path PATH - TFLite CTC model (default: download/cuedspeech_model_fixed_temporal.tflite)
  • --vocab_path PATH - Phoneme vocabulary
  • --face_tflite PATH - Face landmark model (default: download/face_landmarker.task)
  • --hand_tflite PATH - Hand landmark model (default: download/hand_landmarker.task)
  • --pose_tflite PATH - Pose landmark model (default: download/pose_landmarker_full.task)

Generator

Options:

  • VIDEO_PATH (required) - Input video file
  • --text TEXT - Manual text input (optional, otherwise extracted from audio)
  • --output_path PATH - Output video (default: output/generator/generated_cued_speech.mp4)
  • --language LANG - Language (default: french)
  • --skip-whisper - Skip Whisper transcription (requires --text)
  • --easing TYPE - Animation easing: linear, ease_in_out_cubic, ease_out_elastic, ease_in_out_back
  • --morphing/--no-morphing - Hand shape morphing (default: enabled)
  • --transparency/--no-transparency - Transparency effects (default: enabled)
  • --curving/--no-curving - Curved trajectories (default: enabled)
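
When the generator is driven from a script, the options above can be assembled into an argument list before calling the CLI. The sketch below is an illustration only: `build_generate_command` and `run_generate` are hypothetical helpers, not part of cued-speech, and they use only the flags listed above. The check mirrors the stated rule that --skip-whisper requires --text.

```python
import subprocess

# Easing names as listed for --easing in this README.
EASINGS = {"linear", "ease_in_out_cubic", "ease_out_elastic", "ease_in_out_back"}


def build_generate_command(video_path, text=None, skip_whisper=False,
                           output_path=None, language=None, easing=None,
                           morphing=True, transparency=True, curving=True):
    """Assemble an argument list for `cued-speech generate` (hypothetical helper)."""
    if skip_whisper and not text:
        raise ValueError("--skip-whisper requires --text")
    if easing is not None and easing not in EASINGS:
        raise ValueError(f"unknown easing: {easing}")
    cmd = ["cued-speech", "generate", video_path]
    if text:
        cmd += ["--text", text]
    if skip_whisper:
        cmd.append("--skip-whisper")
    if output_path:
        cmd += ["--output_path", output_path]
    if language:
        cmd += ["--language", language]
    if easing:
        cmd += ["--easing", easing]
    # The three effects are enabled by default, so only the --no-* form is emitted.
    for enabled, name in ((morphing, "morphing"),
                          (transparency, "transparency"),
                          (curving, "curving")):
        if not enabled:
            cmd.append(f"--no-{name}")
    return cmd


def run_generate(*args, **kwargs):
    """Run the CLI with the assembled arguments; raises if it exits non-zero."""
    return subprocess.run(build_generate_command(*args, **kwargs), check=True)


if __name__ == "__main__":
    print(build_generate_command("video.mp4", text="Votre texte ici", skip_whisper=True))
```

Passing the arguments as a list avoids shell quoting problems with text that contains spaces or accents.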

Python API

Decoder

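from cued_speech import decode_video

# Decode a cued speech video into subtitled output
decode_video(
    video_path="input.mp4",         # input video to decode
    right_speaker=True,             # speaker handedness, as in --right_speaker
    output_path="output/decoder/"   # where the decoded output is written
)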

Generator

from cued_speech import generate_cue
import whisper

# Automatic text extraction
model = whisper.load_model("medium", download_root="download")
result_path = generate_cue(
    text=None,  # Extracted from video
    video_path="video.mp4",
    output_path="output/generator/",
    config={
        "model": model,  # Optional preloaded Whisper model
        "language": "french",
        "easing_function": "ease_in_out_cubic",
        "enable_morphing": True,
        "enable_transparency": True,
        "enable_curving": True,
    }
)

# With manual text
result_path = generate_cue(
    text="Bonjour tout le monde",
    video_path="video.mp4",
    output_path="output/generator/",
    config={"skip_whisper": True}
)
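
The easing names accepted by `easing_function` and `--easing` are not defined further in this README. As a rough guide to what they control, the sketch below uses the widely used textbook formulas that go by the same names; this is an assumption, and the package's own curves may differ. The `interpolate` helper is hypothetical and shows how an eased progress value moves a hand position between two cue targets.

```python
import math


def linear(t):
    return t


def ease_in_out_cubic(t):
    # Slow start, fast middle, slow end.
    return 4 * t ** 3 if t < 0.5 else 1 - (-2 * t + 2) ** 3 / 2


def ease_out_elastic(t):
    # Overshoots the target and oscillates before settling.
    if t in (0, 1):
        return float(t)
    c4 = 2 * math.pi / 3
    return 2 ** (-10 * t) * math.sin((10 * t - 0.75) * c4) + 1


def ease_in_out_back(t):
    # Pulls back slightly before moving, then overshoots slightly at the end.
    c2 = 1.70158 * 1.525
    if t < 0.5:
        return ((2 * t) ** 2 * ((c2 + 1) * 2 * t - c2)) / 2
    return ((2 * t - 2) ** 2 * ((c2 + 1) * (2 * t - 2) + c2) + 2) / 2


def interpolate(start, end, t, ease=ease_in_out_cubic):
    """Move from start to end (x, y) with eased progress t in [0, 1]."""
    k = ease(t)
    return tuple(a + (b - a) * k for a, b in zip(start, end))


if __name__ == "__main__":
    for step in range(5):
        print(interpolate((0.0, 0.0), (10.0, 20.0), step / 4))
```

All four curves start at 0 and end at 1, so the hand always reaches its target; only the motion in between changes.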

Data Management

# Download all required data
cued-speech download-data

# List available data
cued-speech list-data

# Clean up data
cued-speech cleanup-data --confirm

Downloaded Files

Data is stored in ./download/:

Decoder:

  • cuedspeech_model_fixed_temporal.tflite - TFLite CTC model (100-frame fixed temporal window)
  • phonelist.csv, lexicon.txt - Vocabularies
  • kenlm_fr.bin, kenlm_ipa.binary - Language models
  • homophones_dico.jsonl - Homophone dictionary
  • face_landmarker.task - Face landmarks (478 points, 3.6 MB, float16)
  • hand_landmarker.task - Hand landmarks (21 points/hand, 7.5 MB, float16)
  • pose_landmarker_full.task - Pose landmarks (33 points, 9.0 MB, float16, FULL complexity)

Generator:

  • rotated_images/ - Hand shape images
  • french_mfa.dict, french_mfa.zip - MFA models

Test Files:

  • test_decode.mp4, test_generate.mp4
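
If a download is interrupted, it can be handy to see which of the files listed above are absent. The sketch below is a standalone check written for this README, not the implementation of `cued-speech list-data`; the `missing_files` helper is hypothetical and the file list is copied from the section above.

```python
from pathlib import Path

# File and directory names as listed under "Downloaded Files".
EXPECTED_FILES = [
    "cuedspeech_model_fixed_temporal.tflite",
    "phonelist.csv",
    "lexicon.txt",
    "kenlm_fr.bin",
    "kenlm_ipa.binary",
    "homophones_dico.jsonl",
    "face_landmarker.task",
    "hand_landmarker.task",
    "pose_landmarker_full.task",
    "rotated_images",
    "french_mfa.dict",
    "french_mfa.zip",
    "test_decode.mp4",
    "test_generate.mp4",
]


def missing_files(download_dir="download", expected=EXPECTED_FILES):
    """Return the expected names that do not exist under download_dir."""
    root = Path(download_dir)
    return [name for name in expected if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_files()
    print("All files present" if not missing else f"Missing: {missing}")
```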

Architecture

Decoder

  • MediaPipe Tasks API: Latest float16 models for landmark detection (.task files)
  • TFLite CTC Model: Three-stream fusion encoder (hand shape, position, lips) with 100-frame fixed temporal window
  • CTC Decoder: Phoneme recognition with KenLM beam search
  • Language Model: KenLM for French sentence correction
  • Real-time Processing: Overlap-save windowing for streaming inference
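
To make the CTC step concrete: a CTC model emits one score vector per frame, and decoding collapses repeated labels and drops the blank symbol. The sketch below uses greedy decoding, which is a simpler stand-in for the KenLM beam search the decoder actually uses. The blank index of 0 and the toy vocabulary are assumptions for illustration.

```python
def ctc_greedy_decode(frame_scores, vocab, blank=0):
    """Pick the best label per frame, merge repeats, and remove blanks."""
    best = [max(range(len(row)), key=row.__getitem__) for row in frame_scores]
    decoded = []
    previous = None
    for index in best:
        # A label is emitted only when it differs from the previous frame
        # and is not the blank symbol.
        if index != previous and index != blank:
            decoded.append(vocab[index])
        previous = index
    return decoded


if __name__ == "__main__":
    vocab = ["<blank>", "b", "o"]  # toy vocabulary, not the package's phoneme list
    scores = [
        [0.1, 0.8, 0.1],
        [0.1, 0.7, 0.2],
        [0.9, 0.05, 0.05],
        [0.1, 0.1, 0.8],
        [0.2, 0.1, 0.7],
        [0.9, 0.05, 0.05],
        [0.1, 0.2, 0.7],
    ]
    print(ctc_greedy_decode(scores, vocab))
```

The blank between the last two "o" frames is what allows the same phoneme to appear twice in a row; without it the repeats would merge into one.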

Generator

  • Whisper: Speech-to-text transcription
  • MFA: Montreal Forced Aligner for phoneme timing
  • Dynamic Scaling: Hand size automatically adapts to face width
  • Hand Rendering: MediaPipe-based hand landmark detection for accurate positioning
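
Dynamic scaling can be pictured as follows: measure the face width from the detected landmarks, then resize the hand image in proportion to it. This is a minimal sketch under stated assumptions: landmark x coordinates are normalized to [0, 1] (as MediaPipe reports them), and the `hand_to_face_ratio` parameter and both helper names are hypothetical rather than taken from the package.

```python
def face_width_px(landmarks_x, frame_width):
    """Face width in pixels from normalized landmark x coordinates."""
    return (max(landmarks_x) - min(landmarks_x)) * frame_width


def hand_scale(face_px, hand_image_px, hand_to_face_ratio=1.0):
    """Scale factor that makes the hand image hand_to_face_ratio face widths wide."""
    if hand_image_px <= 0:
        raise ValueError("hand_image_px must be positive")
    return face_px * hand_to_face_ratio / hand_image_px


if __name__ == "__main__":
    width = face_width_px([0.40, 0.50, 0.60], frame_width=1280)
    print(width, hand_scale(width, hand_image_px=512))
```

Because the scale follows the face, the overlaid hand keeps a plausible size whether the speaker is close to the camera or far from it.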

Notes

  • The models are designed for 30 FPS videos
  • Hand size automatically scales based on detected face width
  • Decoder uses MediaPipe Tasks API (.task files) for landmark detection
  • CTC model uses TFLite with 100-frame fixed temporal window for optimal performance
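
Taken together, these notes mean each model input covers 100 frames, or about 3.3 seconds at 30 FPS, so a longer video has to be cut into windows. The sketch below computes overlapping window start positions in the spirit of the overlap-save windowing mentioned under Architecture. The hop of 50 frames and both helper names are assumptions, not values taken from the package.

```python
def window_seconds(window=100, fps=30):
    """Duration covered by one fixed-size window."""
    return window / fps


def window_starts(n_frames, window=100, hop=50):
    """Start indices of overlapping windows that cover every frame."""
    if window <= 0 or hop <= 0 or hop > window:
        raise ValueError("need 0 < hop <= window")
    if n_frames <= 0:
        return []
    starts = list(range(0, max(n_frames - window, 0) + 1, hop))
    # Add a final window aligned to the end if the regular hops left a tail.
    if starts[-1] + window < n_frames:
        starts.append(n_frames - window)
    return starts


if __name__ == "__main__":
    print(round(window_seconds(), 2))
    print(window_starts(230))
```

A clip shorter than one window yields a single start at 0; in that case the caller would need to pad the input up to 100 frames before inference.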

License

MIT License - see LICENSE file

Support

Contact: boubasow.pro@gmail.com
