
Sign Language Toolkit for sign language research

Sign Language Toolkit (SLTK)

| Tutorials | Documentation | Notebooks | Contributing | PyPI |

What SLTK Offers

  • SLTK is an open-source Python toolkit that accelerates sign language research by making state-of-the-art (SOTA) computer vision tools and resources easily available to linguists and others working on sign language and AI.

  • It provides a complete pipeline for pose extraction, sign segmentation, gloss spotting, non-manual signal detection, and evaluation — all with a single CLI or Python API.

  • SLTK brings together state-of-the-art models (WiLoR, NLF, TEASER, SignRep) behind a unified interface, so researchers can focus on linguistics rather than engineering.

Vision

  • We believe the field needs a holistic toolkit that jointly supports the full research workflow: video processing, pose extraction, temporal analysis, and standardized evaluation.

  • SLTK is designed for reproducibility: every metric, extraction step, and analysis tool uses the same data structures and can be run from a single CLI command or Python script.

Quick Start

Installation

# Core library (ELAN I/O, CLI, dataset loaders) — no GPU needed
pip install signlangtk

# With GPU extraction backends
pip install "signlangtk[wilor,nlf,teaser]"

# Everything
pip install "signlangtk[all]"

Package name: signlangtk on PyPI, sltk for Python imports.

Development install
git clone https://github.com/ed-fish/Sign-Language-Toolkit.git
cd Sign-Language-Toolkit
pip install -e ".[dev,wilor,nlf,teaser]"
pytest  # ~947 tests

One-Command Pipeline

Sample data ships in sample-data/: a 7-second How2Sign continuous clip (sample-data/example.mp4) and 59 isolated ASL Citizen sign clips for the gloss-spotting demo (sample-data/spotting_dictionary/). See sample-data/README.md.

sltk pipeline video.mp4 -o output/
# → output/video.eaf  (10+ tiers: segmentation, blinks, head nods, mouth, gaze, ...)
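
To run the same command over a folder of recordings, one option is to build the command lines in Python and hand them to subprocess. The sketch below is a minimal illustration that relies only on the `sltk pipeline <video> -o <dir>` form shown above; the helper name `build_pipeline_commands` and the one-folder-per-video layout are assumptions made for this example, not part of SLTK.

```python
from pathlib import Path


def build_pipeline_commands(video_dir, output_root, pattern="*.mp4"):
    """Return one `sltk pipeline` argument list per video found in video_dir.

    Hypothetical helper: it only assembles the documented CLI call.
    """
    commands = []
    for video in sorted(Path(video_dir).glob(pattern)):
        # One output folder per video keeps the generated .eaf files apart.
        out_dir = Path(output_root) / video.stem
        commands.append(["sltk", "pipeline", str(video), "-o", str(out_dir)])
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)`; keeping command construction separate from execution makes a dry run easy to inspect first.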

Python API

from sltk.extraction.wilor import WiLoRExtractor
from sltk.extraction.teaser import TeaserExtractor
from sltk.segmentation.runner import get_runner
from sltk.segmentation.h5_loader import h5_to_features
from sltk.segmentation.postprocess import extract_segments
from sltk.nms.runner import detect_nms
from sltk.io.elan_roundtrip import ElanDocument

VIDEO, FPS = "recording.mp4", 25.0

# ── Step 1: Extract 3D hand poses ─────────────────────────────────
with WiLoRExtractor() as ext:
    ext.load_model()
    result = ext.extract_from_video(VIDEO, "recording_wilor.h5")
    print(f"{result.num_detections} hand detections across {result.num_frames} frames")

# ── Step 2: Extract face parameters ───────────────────────────────
with TeaserExtractor() as ext:
    ext.load_model()
    ext.extract_from_video(VIDEO, "recording_teaser.h5")

# ── Step 3: Segment signs ─────────────────────────────────────────
features = h5_to_features("recording_wilor.h5")  # (T, 192) MANO features
labels = get_runner().predict(features)            # 0=OUT, 1=IN, 2=BEGIN
segments = extract_segments(labels)                # [(start, end), ...]

# ── Step 4: Detect non-manual signals ─────────────────────────────
blinks, nms_events, quality = detect_nms(
    "recording_teaser.h5", detectors={"all"}
)

# ── Step 5: Assemble multi-tier ELAN file ─────────────────────────
doc = ElanDocument.new(video_path=VIDEO)

doc.add_tier("Segmentation")
for s, e in segments:
    doc.add_segment("Segmentation", s / FPS, e / FPS, "SIGN")

for tier in ["BLINK", "HEAD-NOD", "HEAD-SHAKE", "HEAD-TILT", "MOUTH-MOVEMENT"]:
    doc.add_tier(tier)
for b in blinks:
    doc.add_segment("BLINK", b.start_frame / FPS, b.end_frame / FPS, "blink")
for ev in nms_events:
    doc.add_segment(ev.tier, ev.start_frame / FPS, ev.end_frame / FPS, ev.label)

doc.save("recording.eaf")

Extraction

Three GPU extractors share the same interface: call load_model(), then extract_from_video(). Weights auto-download from HuggingFace Hub on first use (~3.4 GB total).

WiLoR — 3D Hand Reconstruction

21 keypoints per hand with MANO rotation matrices. Primary input for sign segmentation.

from sltk.extraction.wilor import WiLoRExtractor, WiLoRConfig

config = WiLoRConfig(
    device="cuda:0",
    img_batch_size=128,       # reduce for <8GB VRAM
    detection_confidence=0.3,
    use_amp=True,
)
with WiLoRExtractor(config) as ext:
    ext.load_model()
    result = ext.extract_from_video("video.mp4", "video_wilor.h5")

Output H5:

video_wilor.h5
├── attrs: fps, num_frames, resolution
├── img_idx        (M,)              # frame index per detection
├── kpts_3d        (M, 21, 3)       # 3D hand keypoints
├── kpts_2d        (M, 21, 2)       # 2D projections
├── right          (M,)             # True = right hand
├── confidence     (M,)             # detection score
├── bboxes         (M, 4)           # hand bounding boxes
└── mano/
    ├── hand_pose      (M, 15, 3, 3)   # joint rotations
    ├── global_orient  (M, 1, 3, 3)    # wrist rotation
    └── betas          (M, 10)          # shape parameters
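
The layout above stores one row per detection (M rows), not one row per frame, so per-frame access means grouping rows by img_idx. The sketch below shows that grouping on plain Python lists standing in for the `img_idx` and `right` datasets. The helper name is hypothetical, and reading the arrays from the H5 file (for example with h5py) is left out.

```python
from collections import defaultdict


def group_hands_by_frame(img_idx, right):
    """Map frame index -> {"left": [...], "right": [...]} of detection rows."""
    frames = defaultdict(lambda: {"left": [], "right": []})
    for row, (frame, is_right) in enumerate(zip(img_idx, right)):
        side = "right" if is_right else "left"
        frames[int(frame)][side].append(row)
    return dict(frames)


# Toy values standing in for the `img_idx` and `right` datasets.
img_idx = [0, 0, 1, 3]
right = [True, False, True, False]
by_frame = group_hands_by_frame(img_idx, right)
print(by_frame[0])  # {'left': [1], 'right': [0]}
```

The row numbers returned here index into kpts_3d, kpts_2d, and the mano/ datasets. Frames without any detection (frame 2 in the toy data) do not appear in the result.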

NLF — Full-Body SMPL-X

55 SMPL-X joints (body + hands + face) with full pose parameters.

from sltk.extraction.nlf import NLFExtractor, NLFConfig

with NLFExtractor(NLFConfig(device="cuda:0")) as ext:
    ext.load_model()
    result = ext.extract_from_video("video.mp4", "video_nlf.h5")

TEASER — FLAME Face Parameters

FLAME 3D face parameters: jaw pose, expression, eyelid, shape, and head pose. Uses MediaPipe for face detection. The output file is the input for non-manual signal (NMS) detection.

from sltk.extraction.teaser import TeaserExtractor, TeaserConfig

with TeaserExtractor(TeaserConfig(device="cuda:0")) as ext:
    ext.load_model()
    result = ext.extract_from_video("video.mp4", "video_teaser.h5")

Batch Processing

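from pathlib import Path
from sltk.extraction.wilor import WiLoRExtractor

# Load the model once, then reuse it for every video in the folder.
with WiLoRExtractor() as ext:
    ext.load_model()
    # "videos" is a placeholder folder; each output is written next to its video.
    for video in sorted(Path("videos").glob("*.mp4")):
        result = ext.extract_from_video(str(video), str(video.with_suffix(".h5")))
        print(f"{video.name}: {result.num_detections} hands across {result.num_frames} frames")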

Sign Segmentation

A 4-layer Transformer that reads WiLoR hand features and predicts per-frame BIO labels.

# CLI
sltk segment video_wilor.h5 -o segments.eaf -f elan --video video.mp4

# Or directly from video (auto-extracts WiLoR first)
sltk segment video.mp4 -o segments.eaf -f elan

# Python
from sltk.segmentation.runner import segment_h5
from sltk.segmentation.output import OutputFormat

segment_h5("video_wilor.h5", output_path="segments.eaf",
           output_format=OutputFormat.ELAN, fps=25.0, media_path="video.mp4")
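
To make the label scheme concrete, the sketch below decodes a per-frame label sequence (0=OUT, 1=IN, 2=BEGIN) into segments. It is an illustration of BIO decoding in general, not SLTK's own post-processing; the function name, the end-exclusive convention, and the tolerance for a missing BEGIN are all assumptions made for this example.

```python
OUT, IN, BEGIN = 0, 1, 2


def decode_bio(labels):
    """Turn per-frame labels into (start, end) pairs, end exclusive."""
    segments = []
    start = None
    for i, label in enumerate(labels):
        if label == BEGIN:
            # A BEGIN frame closes any open segment and opens a new one.
            if start is not None:
                segments.append((start, i))
            start = i
        elif label == IN:
            # Tolerate a missing BEGIN by opening the segment here.
            if start is None:
                start = i
        else:  # OUT
            if start is not None:
                segments.append((start, i))
                start = None
    if start is not None:
        # The sequence ended while a sign was still open.
        segments.append((start, len(labels)))
    return segments


labels = [0, 2, 1, 1, 0, 0, 2, 1, 2, 1, 1]
print(decode_bio(labels))  # [(1, 4), (6, 8), (8, 11)]
```

The BEGIN label is what separates two signs that follow each other with no OUT frame in between, as in the last two segments of the toy sequence.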

Gloss Spotting

Match detected segments to a dictionary of known signs using SignRep embeddings (768-dim ViT features).

from sltk.embedding.pipeline import SignRepPipeline

pipeline = SignRepPipeline()
continuous = pipeline.extract_continuous("video.mp4", stride=4)
dictionary = pipeline.load_dictionary(["/data/dictionaries/bsldict/signrep/"])

result = pipeline.spot(
    features=continuous,
    segments=[{"segment_id": 0, "start_frame": 12, "end_frame": 45}],
    dictionary=dictionary,
    top_k=5,
)

for seg in result.segments:
    for gl in seg.top_glosses[:3]:
        print(f"  {gl['gloss']} ({gl['similarity']:.3f})")
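
Dictionary matching of this kind is commonly done by ranking dictionary entries by embedding similarity. The sketch below assumes cosine similarity and uses toy 3-dimensional vectors in place of 768-dimensional SignRep features. How SLTK pools a segment into one vector and which similarity it uses are not stated here, so treat the function names and the scoring as illustrative.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_glosses(query, dictionary, k=5):
    """Rank dictionary glosses by similarity to the query embedding."""
    scored = [(gloss, cosine(query, emb)) for gloss, emb in dictionary.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings; real ones would come from the embedding model.
dictionary = {
    "HELLO": [1.0, 0.0, 0.0],
    "THANKS": [0.0, 1.0, 0.0],
    "NAME": [0.7, 0.7, 0.0],
}
query = [0.9, 0.1, 0.0]
for gloss, score in top_k_glosses(query, dictionary, k=2):
    print(f"{gloss} ({score:.3f})")
```

With larger dictionaries the same ranking is usually computed as one matrix product over normalized embeddings rather than a Python loop.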

Non-Manual Signal Detection

Detect blinks, head nods/shakes/tilts, mouth movements, eyebrow raises, gaze direction, and eye squints from TEASER face data.

# CLI
sltk nms video_teaser.h5 -o nms_output/

# Python
from sltk.nms.runner import detect_nms

blinks, nms_events, quality = detect_nms(
    "video_teaser.h5",
    detectors={"all"},
    smpl_path="video_nlf.h5",  # optional: enables gaze detection
)
print(f"{len(blinks)} blinks, {len(nms_events)} NMS events")
print(f"Tracking quality: {quality.detection_rate:.0%}")

| Detector | ELAN Tier | Signal | Source Data |
|---|---|---|---|
| Blink | BLINK | Eye closures | TEASER eyelid |
| Nod | HEAD-NOD | Vertical head oscillation | TEASER head pitch |
| Shake | HEAD-SHAKE | Horizontal head oscillation | TEASER head yaw |
| Tilt | HEAD-TILT | Side-to-side tilt | TEASER head roll |
| Mouth | MOUTH-MOVEMENT | Lip and mouth movement | FLAME expression |
| Eyebrow | EYEBROW-RAISE | Eyebrow raise or furrow | FLAME expression |
| Gaze | EYE-GAZE | Gaze direction | NLF eye pose |
| Squint | EYE-SQUINT | Partial eye closure | TEASER eyelid |
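
As a rough intuition for how a detector can turn a per-frame signal into timed events, the sketch below thresholds a toy eyelid-closure curve and keeps runs that last long enough. It is not SLTK's blink detector. The function name, the threshold, the minimum duration, and the assumption that higher values mean a more closed eye are all hypothetical.

```python
def detect_runs(signal, threshold, min_frames=2):
    """Return (start_frame, end_frame) runs where signal > threshold.

    end_frame is inclusive; runs shorter than min_frames are dropped.
    """
    events = []
    start = None
    for i, value in enumerate(signal):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            if i - start >= min_frames:
                events.append((start, i - 1))
            start = None
    # Close a run that is still open at the end of the signal.
    if start is not None and len(signal) - start >= min_frames:
        events.append((start, len(signal) - 1))
    return events


FPS = 25.0
eyelid = [0.1, 0.1, 0.8, 0.9, 0.7, 0.1, 0.9, 0.1, 0.1]  # toy closure values
for s, e in detect_runs(eyelid, threshold=0.5):
    print(f"blink from {s / FPS:.2f}s to {e / FPS:.2f}s")
```

The single-frame spike at frame 6 is discarded by the minimum-duration rule, which is the usual way to suppress tracking jitter in detectors of this kind.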

Evaluation Metrics

Standardized metrics for translation, production (pose & video), and segmentation evaluation. See the Metrics tutorial and notebook.

| Task | Metrics | Dependencies |
|---|---|---|
| Translation | BLEU-1/2/3/4, ROUGE-L, chrF/chrF++, TER, METEOR, WER | signlangtk[metrics] |
| Translation (neural) | BLEURT, BERTScore | signlangtk[metrics-neural] |
| Production (pose) | MPJPE, PA-MPJPE, PCK, DTW-MJE, APE, FGD | core (numpy/scipy) |
| Production (video) | SSIM, PSNR, FID | signlangtk[metrics-video] |
| Segmentation | Boundary F1, IoU, frame accuracy, label P/R/F1, confusion matrix | core |
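
For readers unfamiliar with two of the listed metrics, the sketch below gives textbook definitions of MPJPE (mean per-joint position error) and temporal IoU between two segments, written in plain Python. These are reference definitions for illustration, not SLTK's implementations, and the function names and the end-exclusive segment convention are assumptions.

```python
import math


def mpjpe(pred, ref):
    """Mean Euclidean distance over all frames and joints.

    pred and ref are nested sequences shaped (T, J, 3) in the same units.
    """
    total, count = 0.0, 0
    for pred_frame, ref_frame in zip(pred, ref):
        for p, r in zip(pred_frame, ref_frame):
            total += math.dist(p, r)
            count += 1
    return total / count


def segment_iou(a, b):
    """Temporal IoU of two (start, end) intervals, end exclusive."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union else 0.0


ref = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]]   # one frame, two joints
pred = [[(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)]]  # first joint is off by 1 unit
print(mpjpe(pred, ref))                       # 0.5
print(round(segment_iou((10, 20), (15, 25)), 3))  # 0.333
```

PA-MPJPE differs from MPJPE by first aligning the prediction to the reference with a Procrustes fit, which this sketch does not do.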

ROI Cropping

GPU-accelerated region-of-interest cropping for lips, hands, and other body regions. See the Cropping API.

from sltk.cropping import crop_lips_from_video

crops, frame_indices, bboxes = crop_lips_from_video("video.mp4", output_size=96)
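
Region cropping of this kind generally starts from a bounding box around the relevant landmarks. The sketch below derives a padded square box from 2D points and clips it to the frame. It shows the geometry only and makes no claim about how crop_lips_from_video computes its boxes; the function name, the padding value, and the sample points are hypothetical.

```python
def square_bbox(points, padding=0.2, frame_size=None):
    """Square box (x0, y0, x1, y1) around 2D points, enlarged by `padding`."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    cx = (min(xs) + max(xs)) / 2
    cy = (min(ys) + max(ys)) / 2
    # Use the longer side so the crop stays square before clipping.
    side = max(max(xs) - min(xs), max(ys) - min(ys)) * (1 + padding)
    half = side / 2
    box = [cx - half, cy - half, cx + half, cy + half]
    if frame_size is not None:
        width, height = frame_size
        box = [max(0, box[0]), max(0, box[1]),
               min(width, box[2]), min(height, box[3])]
    return tuple(round(v) for v in box)


lips = [(100, 200), (140, 200), (120, 190), (120, 212)]  # toy lip landmarks
print(square_bbox(lips, padding=0.5, frame_size=(640, 480)))
# (90, 171, 150, 231)
```

A box that touches the frame border is no longer square after clipping, so a real pipeline would typically pad or resize the crop to reach a fixed output size.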

Visualization

Skeleton and 3D mesh overlay rendering on video frames. See the Visualization API.

from sltk.visualization import generate_overlay_video
generate_overlay_video("video.mp4", "video_wilor.h5", "overlay.mp4", viz_type="wilor")

ELAN I/O

| Feature | Description |
|---|---|
| Read/Write | Full ELAN (.eaf) round-trip preserving all XML structure |
| Create | Build multi-tier ELAN files programmatically |
| Merge | Combine tiers from multiple ELAN files |
| Export | Convert to/from JSON, CSV, and other formats |

See the ELAN tutorial.
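
Because .eaf files are plain XML, their contents can also be inspected without SLTK. The sketch below uses only the Python standard library to read one tier from a minimal hand-written document. It assumes every time slot carries a TIME_VALUE (in milliseconds) and handles only time-aligned annotations; the `read_tier` helper is hypothetical and is not a replacement for the round-trip support described above.

```python
import xml.etree.ElementTree as ET

# A minimal hand-written EAF fragment used only for this example.
EAF = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="480"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1800"/>
  </TIME_ORDER>
  <TIER TIER_ID="Segmentation" LINGUISTIC_TYPE_REF="default">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1"
                            TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>SIGN</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>
"""


def read_tier(root, tier_id):
    """Return (start_s, end_s, label) tuples for one tier."""
    # Time slots hold millisecond offsets; annotations refer to them by ID.
    slots = {ts.get("TIME_SLOT_ID"): int(ts.get("TIME_VALUE"))
             for ts in root.iter("TIME_SLOT")}
    rows = []
    for tier in root.iter("TIER"):
        if tier.get("TIER_ID") != tier_id:
            continue
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots[ann.get("TIME_SLOT_REF1")] / 1000
            end = slots[ann.get("TIME_SLOT_REF2")] / 1000
            label = ann.findtext("ANNOTATION_VALUE", default="")
            rows.append((start, end, label))
    return rows


root = ET.fromstring(EAF)
print(read_tier(root, "Segmentation"))  # [(0.48, 1.8, 'SIGN')]
```

For a file on disk, `ET.parse("recording.eaf").getroot()` yields the same kind of root element.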

Supported Datasets

Built-in loaders for major sign language datasets:

| Dataset | Language | Type | Size | Tutorial |
|---|---|---|---|---|
| WLASL | ASL | Isolated | 2,000 classes | Datasets |
| ASL-Citizen | ASL | Isolated | Community-sourced | Datasets |
| How2Sign | ASL | Continuous | 35K sentences | Datasets |
| BSLCP | BSL | Continuous | Multi-view corpus | Datasets |
| BOBSL | BSL | Continuous | BBC archive | Datasets |
| Phoenix-2014T | DGS | Continuous | Weather broadcasts | Datasets |
| CSL-Daily | CSL | Continuous | Daily conversations | Datasets |

Additional Features

  • Pipeline CLI: Single command from raw video to multi-tier ELAN file (sltk pipeline)
  • Unified Pose Format: PoseSequence class normalizes all backends to (T, N, C) arrays with format conversion
  • ROI Cropping: GPU-accelerated lip, hand, and generic region cropping for data loading and preprocessing
  • Visualization: Skeleton overlay and 3D mesh projection rendering
  • Batch Processing: All extractors support directory-level batch processing
  • Model Weight Management: Auto-download from HuggingFace Hub with environment variable overrides
  • Modular Install: Optional dependency groups ([wilor], [metrics], [nlf], etc.) keep the base lightweight

CLI Reference

# Full pipeline
sltk pipeline video.mp4 -o output/

# Pose extraction
sltk extract wilor video.mp4 -o hands.h5
sltk extract nlf video.mp4 -o body.h5
sltk extract teaser video.mp4 -o face.h5

# Sign segmentation
sltk segment hands.h5 -o segments.eaf -f elan --video video.mp4

# Non-manual signal detection
sltk nms face.h5 -o nms_output/

# Gloss spotting
sltk spot video.mp4 --segments segments.json --dictionary dict/

# Evaluation
sltk evaluate preds.txt refs.txt --task translation -m bleu4 -m chrf
sltk evaluate pred.h5 ref.h5 --task production -m mpjpe -m pck
sltk evaluate preds.eaf refs.eaf --task segmentation

# ELAN utilities
sltk to-elan segments.json --video video.mp4 -o annotations.eaf
sltk from-elan annotations.eaf -o segments.json --tier Gloss
sltk info video_wilor.h5

See the full CLI documentation.

Tutorials & Notebooks

| # | Tutorial | Notebook | Description |
|---|---|---|---|
| 00 | | Notebook | Foundations: core data types, H5 schemas, the extractor contract (start here) |
| 01 | Pose Extraction | Notebook | WiLoR hands, NLF body, TEASER face, MediaPipe |
| 02 | Segmentation & Spotting | Notebook | Sign boundaries, dictionary matching |
| 03 | NMS & ELAN | Notebook | Non-manual signals, ELAN file assembly |
| 04 | Evaluation Metrics | Notebook | Translation, production, segmentation metrics |
| 05 | ELAN Files | | Reading, writing, merging annotation files |
| 06 | Datasets | | Loading and exploring sign language corpora |
| 07 | Feature Processing | Notebook | Pose features and normalization |
| 10 | Extending SLTK | Notebook | Add a segmenter, spotter, extractor, metric, format, detector, … |

Model Weights

All weights auto-download from HuggingFace Hub and cache at ~/.cache/sltk/weights/.

| Model | Size | Env Override |
|---|---|---|
| WiLoR (hands) | ~2.5 GB | SLTK_WILOR_CHECKPOINT |
| NLF (body) | ~540 MB | SLTK_NLF_MODEL |
| TEASER (face) | ~350 MB | SLTK_TEASER_CHECKPOINT |
| SignRep (embedding) | ~350 MB | SLTK_SIGNREP_CHECKPOINT |
| Segmenter | ~180 MB | SLTK_SEGMENTOR_V2_CHECKPOINT |

Set SLTK_AUTO_DOWNLOAD=1 to skip the confirmation prompt. Set SLTK_WEIGHTS_DIR to override the cache location.
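
The documented behaviour of these two variables can be mirrored in a few lines, which is handy for checking a cluster or CI environment before a long job. The sketch below is an assumption-based restatement of the text above (default cache at ~/.cache/sltk/weights/, override through SLTK_WEIGHTS_DIR, prompt skipped when SLTK_AUTO_DOWNLOAD is 1). The helper names are hypothetical and SLTK's internal lookup may differ.

```python
import os
from pathlib import Path

# Default cache location as documented above.
DEFAULT_WEIGHTS_DIR = Path.home() / ".cache" / "sltk" / "weights"


def resolve_weights_dir(env=None):
    """Return the directory where weights are expected to be cached."""
    env = os.environ if env is None else env
    override = env.get("SLTK_WEIGHTS_DIR")
    return Path(override).expanduser() if override else DEFAULT_WEIGHTS_DIR


def auto_download_enabled(env=None):
    """True when the confirmation prompt should be skipped."""
    env = os.environ if env is None else env
    return env.get("SLTK_AUTO_DOWNLOAD") == "1"


print(resolve_weights_dir({"SLTK_WEIGHTS_DIR": "/data/sltk-weights"}))
print(auto_download_enabled({"SLTK_AUTO_DOWNLOAD": "1"}))
```

In a script, the variables can be set with `os.environ[...] = ...` before the first extractor is created, or exported in the shell that launches the job.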

Testing

pytest                       # Full suite (~947 tests)
pytest -m "not slow"         # Skip slow GPU/neural tests
pytest tests/test_metrics.py # Single module

Documentation

Full documentation at sign-language-toolkit.readthedocs.io

| Section | Content |
|---|---|
| Installation | Install options, GPU setup, weight management |
| Quick Start | First steps with SLTK |
| API Reference | Full Python API docs |
| Cropping | ROI and lip cropping |
| NMS Detection | Non-manual signal detection API |
| Visualization | Skeleton and mesh overlays |
| CLI Reference | Command-line interface |

License

CC-BY-NC-ND-4.0 (Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International)

Third-Party Licenses

SLTK bundles or depends on models with their own license terms:

| Component | License | Link |
|---|---|---|
| MANO (hand model) | Non-commercial | mano.is.tue.mpg.de |
| SMPL-X (body model) | Non-commercial | smpl-x.is.tue.mpg.de |
| FLAME (face model) | Non-commercial / CC-BY-4.0 (2023+) | flame.is.tue.mpg.de |
| WiLoR | Apache 2.0 | Potamias et al., CVPR 2025 |
| TEASER | See repository | Liu et al., ICLR 2025 |
| NLF | MIT (non-commercial) | Sárándi & Pons-Moll, NeurIPS 2024 |

Citations & Acknowledgements

SLTK integrates several third-party models and methods. If you use these components in your research, please cite the original papers. SLTK does not claim authorship of these models — it provides a unified interface to run them together.

Sign Segmentation — He et al., FG 2025

The sign segmenter uses the Hands-On model for temporal sign boundary detection from hand pose features.

@inproceedings{he2025hands,
  title={Hands-On: Segmenting Individual Signs from Continuous Sequences},
  author={He, Low Jian and Walsh, Harry and Sincan, Ozge Mercanoglu and Bowden, Richard},
  booktitle={2025 IEEE 19th International Conference on Automatic Face and Gesture Recognition (FG)},
  pages={1--5},
  year={2025},
  organization={IEEE}
}

Gloss Spotting — Wong et al., ICCV 2025

SignRep provides self-supervised sign language representations used for dictionary-based gloss matching.

@inproceedings{wong2025signrep,
  title={SignRep: Enhancing Self-Supervised Sign Representations},
  author={Wong, Ryan and Camgoz, Necati Cihan and Bowden, Richard},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  pages={22804--22814},
  year={2025}
}

Face Reconstruction — Liu et al., ICLR 2025

TEASER provides FLAME-based face parameter extraction (jaw pose, expression, eyelid) used for non-manual signal detection.

@inproceedings{liu2025teaser,
  title={TEASER: Token Enhanced Spatial Modeling for Expressions Reconstruction},
  author={Liu, Yunfei and Zhu, Lei and Lin, Lijian and Zhu, Ye and Zhang, Ailing and Li, Yu},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2025}
}

Hand Reconstruction — Potamias et al., CVPR 2025

WiLoR provides end-to-end 3D hand localization and MANO parameter estimation from in-the-wild images.

@inproceedings{potamias2025wilor,
  title={WiLoR: End-to-end 3D Hand Localization and Reconstruction in-the-wild},
  author={Potamias, Rolandos Alexandros and Zhang, Jinglei and Deng, Jiankang and Zafeiriou, Stefanos},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2025}
}

Body Pose — Sárándi & Pons-Moll, NeurIPS 2024

NLF estimates continuous 3D human body pose and shape (SMPL-X parameters, including eye gaze).

@inproceedings{sarandi2024nlf,
  title={Neural Localizer Fields for Continuous 3D Human Pose and Shape Estimation},
  author={S{\'a}r{\'a}ndi, Istv{\'a}n and Pons-Moll, Gerard},
  booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
  year={2024}
}

Pose Estimation — Jiang et al., 2023

RTMPose provides real-time multi-person whole-body keypoint detection (133 COCO-WholeBody landmarks).

@article{jiang2023rtmpose,
  title={RTMPose: Real-Time Multi-Person Pose Estimation based on MMPose},
  author={Jiang, Tao and Lu, Peng and Zhang, Li and Ma, Ningsheng and Han, Rui and Lyu, Chengqi and Li, Yining and Chen, Kai},
  journal={arXiv preprint arXiv:2303.07399},
  year={2023}
}
