Cued Speech Processing Tools - Decode and Generate cued speech videos

Cued Speech Processing Tools

Python package for decoding and generating cued speech videos with MediaPipe and deep learning.

Features

Decoder: Convert cued speech videos to text with subtitles using neural networks and language models
Generator: Create cued speech videos from text with automatic hand gesture overlay
Automatic Data Management: Downloads required models and data automatically

Installation

Prerequisites

Python 3.11.*
Pixi (for Montreal Forced Aligner)

Setup Steps

Install Pixi

# macOS/Linux
curl -fsSL https://pixi.sh/install.sh | bash

# Windows PowerShell
irm https://pixi.sh/install.ps1 | iex

Create Pixi environment

mkdir cued-speech-env && cd cued-speech-env
pixi init
pixi add "python==3.11"
pixi add montreal-forced-aligner=3.3.4

Install package

pixi run python -m pip install cued-speech

Download data and set up MFA models

pixi shell
cued-speech download-data
pixi run mfa models save acoustic download/french_mfa.zip --overwrite
pixi run mfa models save dictionary download/french_mfa.dict --overwrite

Verify installation

cued-speech --help
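
If `cued-speech --help` fails, it can help to check the prerequisites listed above one by one. The following is a minimal sketch, not part of the package: `check_environment` is a hypothetical helper that only tests the Python version and whether the `cued-speech` and `mfa` executables are on PATH (it assumes you run it inside the Pixi environment).

```python
import shutil
import sys


def check_environment(version_info=sys.version_info, which=shutil.which):
    """Return a dict of basic prerequisite checks (hypothetical helper)."""
    return {
        # The prerequisites list Python 3.11.*.
        "python_3_11": tuple(version_info[:2]) == (3, 11),
        # Both commands should resolve inside the Pixi environment.
        "cued_speech_cli": which("cued-speech") is not None,
        "mfa_cli": which("mfa") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

A `MISSING` entry for either command usually means the script is running outside `pixi shell` or without `pixi run`.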

Quick Start

Decode Video (Cued Speech → Text)

# Basic usage with default parameters (uses the provided test video)
cued-speech decode

# Custom video
cued-speech decode --video_path /path/to/video.mp4

Generate Video (Text → Cued Speech)

# Text extracted automatically from video audio
cued-speech generate input_video.mp4

# Skip Whisper transcription and supply the text manually
cued-speech generate video.mp4 --skip-whisper --text "Votre texte ici"
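
When driving the generator from a script, the two invocations above can be assembled programmatically. This is a sketch under stated assumptions: `build_generate_command` is a hypothetical helper, not part of the package, and it only uses the `--skip-whisper` and `--text` flags shown in this README. It enforces the documented rule that `--skip-whisper` requires `--text`.

```python
def build_generate_command(video_path, text=None, skip_whisper=False):
    """Build the argument list for `cued-speech generate` (hypothetical helper)."""
    if skip_whisper and not text:
        # The README states that --skip-whisper requires --text.
        raise ValueError("--skip-whisper requires --text")
    cmd = ["cued-speech", "generate", str(video_path)]
    if skip_whisper:
        cmd.append("--skip-whisper")
    if text:
        cmd += ["--text", text]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Passing a list rather than a shell string avoids quoting problems with accented or multi-word text.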

Command Line Options

Decoder

Core Options:

--video_path PATH - Input video (default: download/test_decode.mp4)
--output_path PATH - Output video (default: output/decoder/decoded_video.mp4)
--right_speaker [True|False] - Speaker handedness (default: True)
--auto_download [True|False] - Auto-download data (default: True)

Model Paths (optional):

--model_path PATH - TFLite CTC model (default: download/cuedspeech_model_fixed_temporal.tflite)
--vocab_path PATH - Phoneme vocabulary
--face_tflite PATH - Face landmark model (default: download/face_landmarker.task)
--hand_tflite PATH - Hand landmark model (default: download/hand_landmarker.task)
--pose_tflite PATH - Pose landmark model (default: download/pose_landmarker_full.task)

Generator

Options:

VIDEO_PATH (required) - Input video file
--text TEXT - Manual text input (optional, otherwise extracted from audio)
--output_path PATH - Output video (default: output/generator/generated_cued_speech.mp4)
--language LANG - Language (default: french)
--skip-whisper - Skip Whisper transcription (requires --text)
--easing TYPE - Animation easing: linear, ease_in_out_cubic, ease_out_elastic, ease_in_out_back
--morphing/--no-morphing - Hand shape morphing (default: enabled)
--transparency/--no-transparency - Transparency effects (default: enabled)
--curving/--no-curving - Curved trajectories (default: enabled)
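
To illustrate what the `--easing` option controls, here is the commonly used definition of cubic ease-in-out applied to a hand position moving between two points. This is a sketch of the standard formula only; the package's own `ease_in_out_cubic` may differ in detail, and `interpolate` is a hypothetical helper.

```python
def ease_in_out_cubic(t: float) -> float:
    """Standard cubic ease-in-out: slow start, fast middle, slow end."""
    t = min(max(t, 0.0), 1.0)  # clamp progress to [0, 1]
    if t < 0.5:
        return 4.0 * t ** 3
    return 1.0 - ((-2.0 * t + 2.0) ** 3) / 2.0


def interpolate(start, end, t, easing=ease_in_out_cubic):
    """Blend two 2D points using eased progress (hypothetical helper)."""
    w = easing(t)
    return tuple(a + (b - a) * w for a, b in zip(start, end))
```

With `easing=lambda t: t` the same helper gives linear motion, which is what the `linear` choice suggests.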

Python API

Decoder

from cued_speech import decode_video

decode_video(
    video_path="input.mp4",
    right_speaker=True,
    output_path="output/decoder/"
)
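
To decode several videos, the call above can be wrapped in a loop. The sketch below is an assumption-laden example rather than package functionality: `decode_batch` is a hypothetical helper, it takes the decode function as a parameter, and it assumes `output_path` accepts a directory, as the example above suggests. Each video gets its own output directory so that results are not overwritten.

```python
from pathlib import Path


def decode_batch(video_paths, decode_fn, output_root="output/decoder",
                 right_speaker=True):
    """Call decode_fn once per video; return {video: output directory}."""
    outputs = {}
    for video in map(Path, video_paths):
        # One sub-directory per video, named after the file stem.
        out_dir = Path(output_root) / video.stem
        out_dir.mkdir(parents=True, exist_ok=True)
        decode_fn(
            video_path=str(video),
            right_speaker=right_speaker,
            output_path=f"{out_dir}/",  # assumed: a directory is accepted
        )
        outputs[str(video)] = str(out_dir)
    return outputs
```

Typical use would be `decode_batch(["a.mp4", "b.mp4"], decode_video)` after `from cued_speech import decode_video`.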

Generator

from cued_speech import generate_cue
import whisper

# Automatic text extraction
model = whisper.load_model("medium", download_root="download")
result_path = generate_cue(
    text=None,  # Extracted from video
    video_path="video.mp4",
    output_path="output/generator/",
    config={
        "model": model,  # Optional preloaded Whisper model
        "language": "french",
        "easing_function": "ease_in_out_cubic",
        "enable_morphing": True,
        "enable_transparency": True,
        "enable_curving": True,
    }
)

# With manual text
result_path = generate_cue(
    text="Bonjour tout le monde",
    video_path="video.mp4",
    output_path="output/generator/",
    config={"skip_whisper": True}
)

Data Management

# Download all required data
cued-speech download-data

# List available data
cued-speech list-data

# Clean up data
cued-speech cleanup-data --confirm

Downloaded Files

Data is stored in ./download/:

Decoder:

cuedspeech_model_fixed_temporal.tflite - TFLite CTC model (100-frame fixed temporal window)
phonelist.csv, lexicon.txt - Vocabularies
kenlm_fr.bin, kenlm_ipa.binary - Language models
homophones_dico.jsonl - Homophone dictionary
face_landmarker.task - Face landmarks (478 points, 3.6 MB, float16)
hand_landmarker.task - Hand landmarks (21 points/hand, 7.5 MB, float16)
pose_landmarker_full.task - Pose landmarks (33 points, 9.0 MB, float16, FULL complexity)

Generator:

rotated_images/ - Hand shape images
french_mfa.dict, french_mfa.zip - MFA models

Test Files:

test_decode.mp4, test_generate.mp4
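
A quick way to confirm that a download completed is to compare ./download/ against the file names listed above. This is a minimal sketch: `missing_data` is a hypothetical helper, and the name lists are copied from this README, so they may not match every release.

```python
from pathlib import Path

# File names as listed in this README.
EXPECTED_FILES = [
    "cuedspeech_model_fixed_temporal.tflite",
    "phonelist.csv", "lexicon.txt",
    "kenlm_fr.bin", "kenlm_ipa.binary",
    "homophones_dico.jsonl",
    "face_landmarker.task", "hand_landmarker.task",
    "pose_landmarker_full.task",
    "french_mfa.dict", "french_mfa.zip",
    "test_decode.mp4", "test_generate.mp4",
]
EXPECTED_DIRS = ["rotated_images"]


def missing_data(download_dir="download"):
    """Return the expected files and directories absent from download_dir."""
    root = Path(download_dir)
    missing = [n for n in EXPECTED_FILES if not (root / n).is_file()]
    missing += [n + "/" for n in EXPECTED_DIRS if not (root / n).is_dir()]
    return missing
```

An empty list means every listed item is present; otherwise re-running `cued-speech download-data` is the likely fix.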

Architecture

Decoder

MediaPipe Tasks API: Latest float16 models for landmark detection (.task files)
TFLite CTC Model: Three-stream fusion encoder (hand shape, position, lips) with 100-frame fixed temporal window
CTC Decoder: Phoneme recognition with KenLM beam search
Language Model: KenLM for French sentence correction
Real-time Processing: Overlap-save windowing for streaming inference
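
To make the CTC step concrete, the sketch below shows greedy CTC decoding: take the best class per frame, collapse consecutive repeats, and drop blanks. This is a simpler stand-in for the KenLM beam search named above, not the package's decoder; the vocabulary and the blank index of 0 are assumptions chosen for illustration.

```python
def ctc_greedy_decode(frame_scores, vocab, blank=0):
    """Greedy CTC decoding: argmax per frame, merge repeats, remove blanks."""
    # Index of the highest-scoring class in each frame.
    best = [max(range(len(row)), key=row.__getitem__) for row in frame_scores]
    decoded = []
    prev = None
    for idx in best:
        # Emit a symbol only when it changes and is not the blank.
        if idx != prev and idx != blank:
            decoded.append(vocab[idx])
        prev = idx
    return decoded
```

A blank between two identical symbols keeps them separate, which is how CTC represents a repeated phoneme. Beam search with a language model explores several candidate sequences instead of committing to the per-frame best.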

Generator

Whisper: Speech-to-text transcription
MFA: Montreal Forced Aligner for phoneme timing
Dynamic Scaling: Hand size automatically adapts to face width
Hand Rendering: MediaPipe-based hand landmark detection for accurate positioning
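
The dynamic scaling idea can be sketched as a ratio between the detected face width and the width of the hand image. The example below is an assumption, not the package's formula: `hand_scale`, `scaled_size`, and the `hand_to_face_ratio` parameter are hypothetical names, and the default ratio of 1.0 is a placeholder.

```python
def hand_scale(face_width_px, hand_image_width_px, hand_to_face_ratio=1.0):
    """Scale factor so the rendered hand is a fixed fraction of the face width."""
    if face_width_px <= 0 or hand_image_width_px <= 0:
        raise ValueError("widths must be positive")
    return face_width_px * hand_to_face_ratio / hand_image_width_px


def scaled_size(image_size, scale):
    """Apply a scale factor to a (width, height) pair, keeping at least 1 px."""
    width, height = image_size
    return max(1, round(width * scale)), max(1, round(height * scale))
```

Because the factor depends only on the face width, the hand shrinks and grows with the speaker's distance from the camera.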

Notes

Models designed for 30 FPS videos
Hand size automatically scales based on detected face width
Decoder uses MediaPipe Tasks API (.task files) for landmark detection
CTC model uses TFLite with 100-frame fixed temporal window for optimal performance
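
At 30 FPS, a 100-frame window covers about 3.3 seconds of video, so longer videos have to be processed in several windows. The sketch below shows plain overlapping fixed-size windows with padding of the last one. It is an illustration only: the overlap of 25 frames, the padding value, and the helper names are assumptions, and it does not reproduce the package's overlap-save logic.

```python
WINDOW = 100  # frames per window, from this README
FPS = 30      # frame rate the models are designed for


def window_starts(num_frames, window=WINDOW, overlap=25):
    """Start indices of overlapping windows that cover num_frames frames."""
    if not 0 <= overlap < window:
        raise ValueError("overlap must be in [0, window)")
    hop = window - overlap
    starts = [0]
    while starts[-1] + window < num_frames:
        starts.append(starts[-1] + hop)
    return starts


def split_into_windows(frames, window=WINDOW, overlap=25, pad_value=None):
    """Cut frames into fixed-size windows, padding the final one."""
    frames = list(frames)
    chunks = []
    for start in window_starts(len(frames), window, overlap):
        chunk = frames[start:start + window]
        chunk += [pad_value] * (window - len(chunk))
        chunks.append(chunk)
    return chunks
```

Every chunk has exactly `WINDOW` entries, which is what a model with a fixed temporal input size requires. Predictions in the overlapping regions then have to be merged by the caller.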

License

MIT License - see LICENSE file

Support

Contact: boubasow.pro@gmail.com



