Cued Speech Processing Tools - Decode and Generate cued speech videos
Cued Speech Processing Tools
Python package for decoding and generating cued speech videos with MediaPipe and deep learning.
Features
- Decoder: Convert cued speech videos to text with subtitles using neural networks and language models
- Generator: Create cued speech videos from text with automatic hand gesture overlay
- Automatic Data Management: Downloads required models and data automatically
Installation
Prerequisites
- Python 3.11.*
- Pixi (for Montreal Forced Aligner)
Setup Steps
- Install Pixi
# macOS/Linux
curl -fsSL https://pixi.sh/install.sh | bash
# Windows PowerShell
irm https://pixi.sh/install.ps1 | iex
- Create Pixi environment
mkdir cued-speech-env && cd cued-speech-env
pixi init
pixi add "python==3.11"
pixi add montreal-forced-aligner=3.3.4
- Install package
pixi run python -m pip install cued-speech
- Download data and set up MFA models
pixi shell
cued-speech download-data
pixi run mfa models save acoustic download/french_mfa.zip --overwrite
pixi run mfa models save dictionary download/french_mfa.dict --overwrite
- Verify installation
cued-speech --help
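If `cued-speech --help` fails, it helps to check the interpreter version and whether the expected executables are on `PATH`. The sketch below assumes it runs inside the Pixi environment (for example after `pixi shell`) and that MFA exposes an `mfa` executable, as the setup commands above suggest. The helper name `check_prerequisites` is hypothetical and not part of the package.

```python
import shutil
import sys


def check_prerequisites(required_tools=("cued-speech", "mfa")):
    """Report whether the documented prerequisites look satisfied.

    Hypothetical helper: it only inspects the running interpreter and PATH;
    it does not import or run cued_speech itself.
    """
    report = {
        # The package targets Python 3.11.*
        "python_ok": sys.version_info[:2] == (3, 11),
        "python_version": ".".join(map(str, sys.version_info[:3])),
        "missing_tools": [],
    }
    for tool in required_tools:
        # shutil.which returns None when the executable is not on PATH
        if shutil.which(tool) is None:
            report["missing_tools"].append(tool)
    return report


if __name__ == "__main__":
    print(check_prerequisites())
```

An empty `missing_tools` list with `python_ok` set to `True` means the environment matches the prerequisites listed above.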
Quick Start
Decode Video (Cued Speech → Text)
# Basic usage with default parameters (decodes the provided test video)
cued-speech decode
# Custom video
cued-speech decode --video_path /path/to/video.mp4
Generate Video (Text → Cued Speech)
# Text is transcribed automatically from the video's audio (Whisper)
cued-speech generate input_video.mp4
# Skip Whisper and supply the text manually
cued-speech generate video.mp4 --skip-whisper --text "Votre texte ici"
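When the generator is driven from a script, it is worth enforcing the documented rule that `--skip-whisper` requires `--text` before launching the process. The sketch below builds the argument list from the documented flags only and runs it with `subprocess`; `build_generate_command` and `run_generate` are hypothetical helper names.

```python
import subprocess


def build_generate_command(video_path, text=None, skip_whisper=False,
                           output_path=None, language=None, easing=None):
    """Assemble argv for `cued-speech generate` using documented flags only."""
    if skip_whisper and not text:
        # The CLI documents that --skip-whisper requires --text
        raise ValueError("--skip-whisper requires --text")
    cmd = ["cued-speech", "generate", video_path]
    if skip_whisper:
        cmd.append("--skip-whisper")
    if text:
        cmd += ["--text", text]
    if output_path:
        cmd += ["--output_path", output_path]
    if language:
        cmd += ["--language", language]
    if easing:
        cmd += ["--easing", easing]
    return cmd


def run_generate(**kwargs):
    # check=True raises CalledProcessError if the CLI exits with a non-zero status
    return subprocess.run(build_generate_command(**kwargs), check=True)
```

For example, `run_generate(video_path="video.mp4", text="Votre texte ici", skip_whisper=True)` runs the same command as the second Quick Start example.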
Command Line Options
Decoder
Core Options:
- --video_path PATH - Input video (default: download/test_decode.mp4)
- --output_path PATH - Output video (default: output/decoder/decoded_video.mp4)
- --right_speaker [True|False] - Speaker handedness (default: True)
- --auto_download [True|False] - Auto-download data (default: True)
Model Paths (optional):
- --model_path PATH - TFLite CTC model (default: download/cuedspeech_model_fixed_temporal.tflite)
- --vocab_path PATH - Phoneme vocabulary
- --face_tflite PATH - Face landmark model (default: download/face_landmarker.task)
- --hand_tflite PATH - Hand landmark model (default: download/hand_landmarker.task)
- --pose_tflite PATH - Pose landmark model (default: download/pose_landmarker_full.task)
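The decoder options above can also be assembled programmatically, for instance to decode a video of a left-handed speaker or to point at custom model files. This sketch uses only the flags listed above and passes boolean options as the literal strings `True`/`False`, as the `[True|False]` notation suggests; `build_decode_command` is a hypothetical helper.

```python
def build_decode_command(video_path=None, output_path=None,
                         right_speaker=True, auto_download=True,
                         model_paths=None):
    """Assemble argv for `cued-speech decode` from the documented options.

    model_paths: optional dict keyed by documented flag names without the
    leading dashes, e.g. {"model_path": ..., "face_tflite": ...}.
    """
    cmd = ["cued-speech", "decode"]
    if video_path is not None:
        cmd += ["--video_path", str(video_path)]
    if output_path is not None:
        cmd += ["--output_path", str(output_path)]
    # Boolean options are written as the strings "True" / "False"
    cmd += ["--right_speaker", str(bool(right_speaker))]
    cmd += ["--auto_download", str(bool(auto_download))]
    allowed = {"model_path", "vocab_path", "face_tflite", "hand_tflite", "pose_tflite"}
    for name, path in (model_paths or {}).items():
        if name not in allowed:
            raise ValueError(f"unknown decoder option: {name}")
        cmd += [f"--{name}", str(path)]
    return cmd
```

The resulting list can be passed directly to `subprocess.run(..., check=True)`.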
Generator
Options:
- VIDEO_PATH (required) - Input video file
- --text TEXT - Manual text input (optional; otherwise extracted from the audio)
- --output_path PATH - Output video (default: output/generator/generated_cued_speech.mp4)
- --language LANG - Language (default: french)
- --skip-whisper - Skip Whisper transcription (requires --text)
- --easing TYPE - Animation easing: linear, ease_in_out_cubic, ease_out_elastic, ease_in_out_back
- --morphing/--no-morphing - Hand shape morphing (default: enabled)
- --transparency/--no-transparency - Transparency effects (default: enabled)
- --curving/--no-curving - Curved trajectories (default: enabled)
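The `--easing` values are the names of standard easing curves, which map normalized time t in [0, 1] to animation progress. The sketch below uses the widely published formulas from easings.net; whether the generator uses exactly these constants is an assumption, and `interpolate` is a hypothetical helper showing how a hand position could move between two cue positions.

```python
import math


def linear(t):
    return t


def ease_in_out_cubic(t):
    # Accelerates through the first half, decelerates through the second
    return 4 * t ** 3 if t < 0.5 else 1 - (-2 * t + 2) ** 3 / 2


def ease_out_elastic(t):
    # Overshoots and oscillates around the target before settling
    if t in (0, 1):
        return t
    c4 = (2 * math.pi) / 3
    return 2 ** (-10 * t) * math.sin((t * 10 - 0.75) * c4) + 1


def ease_in_out_back(t):
    # Pulls back slightly before moving and overshoots slightly at the end
    c1 = 1.70158
    c2 = c1 * 1.525
    if t < 0.5:
        return ((2 * t) ** 2 * ((c2 + 1) * 2 * t - c2)) / 2
    return ((2 * t - 2) ** 2 * ((c2 + 1) * (t * 2 - 2) + c2) + 2) / 2


EASINGS = {f.__name__: f for f in (linear, ease_in_out_cubic,
                                   ease_out_elastic, ease_in_out_back)}


def interpolate(p0, p1, t, easing="ease_in_out_cubic"):
    """Move a 2D point from p0 to p1; t is clamped to [0, 1]."""
    s = EASINGS[easing](min(max(t, 0.0), 1.0))
    return (p0[0] + (p1[0] - p0[0]) * s, p0[1] + (p1[1] - p0[1]) * s)
```

All four curves start at 0 and end at 1; they differ only in how progress is distributed in between, which is what changes the perceived motion of the hand.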
Python API
Decoder
from cued_speech import decode_video

decode_video(
    video_path="input.mp4",
    right_speaker=True,
    output_path="output/decoder/"
)
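To decode several videos with the Python API, a small loop over a folder is enough. This sketch calls `decode_video` with only the keyword arguments shown above and assumes, as in that example, that `output_path` accepts a directory; it gives each video its own output directory. The return value of `decode_video` is not documented here, so it is stored as-is. `decode_folder` and its `decode_fn` parameter are hypothetical; the injectable function lets the loop be tried without the package or its models.

```python
import os
from pathlib import Path


def decode_folder(folder, right_speaker=True, out_root="output/decoder", decode_fn=None):
    """Decode every .mp4 in `folder`, writing each result to its own directory."""
    if decode_fn is None:
        # Imported lazily so the loop can be exercised without cued_speech installed
        from cued_speech import decode_video as decode_fn
    results = {}
    for video in sorted(Path(folder).glob("*.mp4")):
        out_dir = Path(out_root) / video.stem
        out_dir.mkdir(parents=True, exist_ok=True)
        results[video.name] = decode_fn(
            video_path=str(video),
            right_speaker=right_speaker,
            # Trailing separator mirrors the "output/decoder/" style above
            output_path=os.path.join(out_dir, ""),
        )
    return results
```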
Generator
from cued_speech import generate_cue
import whisper  # provided by the openai-whisper package

# Automatic text extraction
model = whisper.load_model("medium", download_root="download")
result_path = generate_cue(
    text=None,  # Transcribed from the video's audio
    video_path="video.mp4",
    output_path="output/generator/",
    config={
        "model": model,  # Optional preloaded Whisper model
        "language": "french",
        "easing_function": "ease_in_out_cubic",
        "enable_morphing": True,
        "enable_transparency": True,
        "enable_curving": True,
    }
)

# With manual text
result_path = generate_cue(
    text="Bonjour tout le monde",
    video_path="video.mp4",
    output_path="output/generator/",
    config={"skip_whisper": True}
)
Data Management
# Download all required data
cued-speech download-data
# List available data
cued-speech list-data
# Clean up data
cued-speech cleanup-data --confirm
Downloaded Files
Data is stored in ./download/:
Decoder:
- cuedspeech_model_fixed_temporal.tflite - TFLite CTC model (100-frame fixed temporal window)
- phonelist.csv, lexicon.txt - Vocabularies
- kenlm_fr.bin, kenlm_ipa.binary - Language models
- homophones_dico.jsonl - Homophone dictionary
- face_landmarker.task - Face landmarks (478 points, 3.6 MB, float16)
- hand_landmarker.task - Hand landmarks (21 points per hand, 7.5 MB, float16)
- pose_landmarker_full.task - Pose landmarks (33 points, 9.0 MB, float16, full complexity)
Generator:
- rotated_images/ - Hand shape images
- french_mfa.zip, french_mfa.dict - MFA acoustic model and pronunciation dictionary
Test Files:
- test_decode.mp4, test_generate.mp4
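Alongside `cued-speech list-data`, a quick local check can show which of the files listed above are missing from `./download/`. The file names come from this list; the assumption is that they sit directly in the download directory (with `rotated_images/` as a subdirectory). `missing_data` is a hypothetical helper.

```python
from pathlib import Path

# File names as listed above; all are expected directly under ./download/
EXPECTED_FILES = {
    "decoder": [
        "cuedspeech_model_fixed_temporal.tflite", "phonelist.csv", "lexicon.txt",
        "kenlm_fr.bin", "kenlm_ipa.binary", "homophones_dico.jsonl",
        "face_landmarker.task", "hand_landmarker.task", "pose_landmarker_full.task",
    ],
    "generator": ["rotated_images", "french_mfa.zip", "french_mfa.dict"],
    "test": ["test_decode.mp4", "test_generate.mp4"],
}


def missing_data(download_dir="download"):
    """Return {group: [missing names]} for every group with something missing."""
    root = Path(download_dir)
    missing = {}
    for group, names in EXPECTED_FILES.items():
        # Path.exists() is true for both files and directories
        absent = [n for n in names if not (root / n).exists()]
        if absent:
            missing[group] = absent
    return missing
```

An empty dictionary means everything listed is present; otherwise `cued-speech download-data` should fetch what is missing.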
Architecture
Decoder
- MediaPipe Tasks API: float16 face, hand, and pose landmark models (.task files)
- TFLite CTC Model: Three-stream fusion encoder (hand shape, position, lips) with 100-frame fixed temporal window
- CTC Decoder: Phoneme recognition with KenLM beam search
- Language Model: KenLM for French sentence correction
- Real-time Processing: Overlap-save windowing for streaming inference
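The two decoder ideas above, a fixed 100-frame model window and overlap-save windowing, can be sketched without the real model. Each window after the first discards its first `overlap` output frames, so every frame is emitted exactly once, and the stitched per-frame outputs are then collapsed CTC-style. The 25-frame overlap is a hypothetical value, the helper names are illustrative, and greedy best-path decoding is a simpler stand-in for the KenLM beam search the decoder actually uses.

```python
def plan_windows(num_frames, window=100, overlap=25):
    """Overlap-save plan: one (window_start, keep_from, keep_to) per window."""
    if not 0 <= overlap < window:
        raise ValueError("overlap must be in [0, window)")
    if num_frames <= 0:
        return []
    hop = window - overlap
    plans, start = [], 0
    while True:
        # Only the first window keeps its leading frames
        keep_from = start if start == 0 else start + overlap
        plans.append((start, keep_from, min(start + window, num_frames)))
        if start + window >= num_frames:
            return plans
        start += hop


def run_windowed(frames, model_fn, window=100, overlap=25):
    """Apply a fixed-length model to a sequence of any length and stitch outputs."""
    outputs = []
    for start, keep_from, keep_to in plan_windows(len(frames), window, overlap):
        chunk = frames[start:start + window]
        # Pad the final chunk to the fixed length by repeating its last frame
        chunk = chunk + [chunk[-1]] * (window - len(chunk))
        out = model_fn(chunk)
        outputs.extend(out[keep_from - start:keep_to - start])
    return outputs


def ctc_greedy_decode(frame_probs, labels, blank=0):
    """Best-path CTC: argmax per frame, merge repeats, then drop blanks."""
    best = [max(range(len(p)), key=p.__getitem__) for p in frame_probs]
    decoded, prev = [], None
    for idx in best:
        if idx != prev and idx != blank:
            decoded.append(labels[idx])
        prev = idx
    return decoded
```

For a 250-frame video, `plan_windows(250)` yields windows starting at frames 0, 75 and 150 that keep frames 0-99, 100-174 and 175-249 respectively. Discarding the start of each later window means every kept frame has some left context.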
Generator
- Whisper: Speech-to-text transcription
- MFA: Montreal Forced Alignment for phoneme timing
- Dynamic Scaling: Hand size automatically adapts to face width
- Hand Rendering: MediaPipe-based hand landmark detection for accurate positioning
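Two of the generator ideas above can be illustrated with a little geometry: scaling the hand overlay in proportion to the detected face width, and moving the hand along a curved rather than straight path between cue positions. This is a sketch under assumptions: the reference face width, the clamping limits and the `bend` parameter are hypothetical, and a quadratic Bézier arc is just one common way to produce a curved trajectory.

```python
def hand_scale(face_width_px, reference_face_width_px=200.0,
               min_scale=0.3, max_scale=3.0):
    """Scale factor for the hand overlay, proportional to detected face width.

    reference_face_width_px is a hypothetical calibration constant: the face
    width at which the hand image is drawn at its native size.
    """
    if face_width_px <= 0:
        raise ValueError("face width must be positive")
    scale = face_width_px / reference_face_width_px
    return min(max(scale, min_scale), max_scale)


def curved_point(p0, p1, t, bend=0.2):
    """Point at time t in [0, 1] on a quadratic Bezier arc from p0 to p1.

    The control point is the segment midpoint pushed sideways by `bend`
    times the segment length (hypothetical curvature parameter).
    """
    (x0, y0), (x1, y1) = p0, p1
    dx, dy = x1 - x0, y1 - y0
    # (-dy, dx) is perpendicular to the segment and has the same length
    cx = (x0 + x1) / 2 - bend * dy
    cy = (y0 + y1) / 2 + bend * dx
    u = 1 - t
    return (u * u * x0 + 2 * u * t * cx + t * t * x1,
            u * u * y0 + 2 * u * t * cy + t * t * y1)
```

With `bend=0` the arc degenerates to a straight line, and `t` may itself be the output of an easing curve so that timing and path shape are controlled independently.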
Notes
- Models designed for 30 FPS videos
- Hand size automatically scales based on detected face width
- Decoder uses the MediaPipe Tasks API (.task files) for landmark detection
- CTC model uses TFLite with a 100-frame fixed temporal window for optimal performance
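Because the models are designed for 30 FPS, timestamps such as MFA phone intervals (in seconds) eventually have to be mapped to frame indices at that rate. The sketch below shows one way to do this and to flag videos whose frame rate is far from 30 FPS (those could be re-encoded first, for example with ffmpeg's `fps` filter). The helper names and the 0.5 FPS tolerance are hypothetical.

```python
import math

TARGET_FPS = 30  # Frame rate the models are designed for


def seconds_to_frame(t, fps=TARGET_FPS):
    """Index of the frame displayed at time t (seconds)."""
    # Small epsilon guards against float error such as 0.1 * 30 = 3.0000000000000004
    return math.floor(t * fps + 1e-9)


def intervals_to_frames(intervals, fps=TARGET_FPS):
    """Convert (start_s, end_s, phone) intervals to (first, last_exclusive, phone).

    Every interval keeps at least one frame so that very short phones survive.
    """
    spans = []
    for start, end, phone in intervals:
        first = seconds_to_frame(start, fps)
        last = max(seconds_to_frame(end, fps), first + 1)
        spans.append((first, last, phone))
    return spans


def fps_matches(fps, target=TARGET_FPS, tol=0.5):
    """True if a video's frame rate is close enough to the models' 30 FPS."""
    return abs(fps - target) <= tol
```

At 30 FPS each frame lasts about 33 ms, so phones shorter than that would vanish without the one-frame minimum.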
License
MIT License - see LICENSE file
Support
Contact: boubasow.pro@gmail.com
Download files
File details
Details for the file cued_speech-0.4.2.tar.gz.
File metadata
- Download URL: cued_speech-0.4.2.tar.gz
- Upload date:
- Size: 166.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7120e50920a0fbc8b85e56d7eb50c982b6ee24e86336d051aaf80bdbb2f9b0c3 |
| MD5 | 5a6b9952585d06c4bcf4b97c05146909 |
| BLAKE2b-256 | 20c4430751b324e8432b75ee3479b477a973be79c10a4e0c6a8e4c11d46e0d8d |
File details
Details for the file cued_speech-0.4.2-py3-none-any.whl.
File metadata
- Download URL: cued_speech-0.4.2-py3-none-any.whl
- Upload date:
- Size: 155.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1c2e4851fa07fdc7019d1e98a26995c2005ea01e61cabe4d1bdc6b276a27b6d5 |
| MD5 | decee2d1e2193d199b998cac4e7f4d48 |
| BLAKE2b-256 | 8a8d7514b8044f275e205551e777f2847dd1a998d4ac5a6240a644185209eb1a |
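A downloaded 0.4.2 archive can be checked against the SHA256 digests published above with the standard library. The file is streamed in chunks so large files need not fit in memory. `sha256_of` and `verify` are hypothetical helper names.

```python
import hashlib
from pathlib import Path

# SHA256 digests published for cued-speech 0.4.2
EXPECTED_SHA256 = {
    "cued_speech-0.4.2.tar.gz":
        "7120e50920a0fbc8b85e56d7eb50c982b6ee24e86336d051aaf80bdbb2f9b0c3",
    "cued_speech-0.4.2-py3-none-any.whl":
        "1c2e4851fa07fdc7019d1e98a26995c2005ea01e61cabe4d1bdc6b276a27b6d5",
}


def sha256_of(path, chunk_size=1 << 20):
    """Hex SHA256 of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """True if the file's digest matches the published one for its file name."""
    name = Path(path).name
    if name not in expected:
        raise KeyError(f"no published digest for {name}")
    return sha256_of(path) == expected[name]
```

pip can enforce the same check at install time through its hash-checking mode (a `--hash=sha256:...` option on a requirements-file line).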