Sign Language Toolkit for sign language research
Sign Language Toolkit (SLTK)
| Tutorials | Documentation | Notebooks | Contributing | PyPI |
What SLTK Offers
- SLTK is an open-source Python toolkit that accelerates sign language research and makes state-of-the-art (SOTA) computer vision tools and resources easily available to linguists and others working on sign language and AI.
- It provides a complete pipeline for pose extraction, sign segmentation, gloss spotting, non-manual signal detection, and evaluation, all from a single CLI or Python API.
- SLTK brings together state-of-the-art models (WiLoR, NLF, TEASER, SignRep) behind a unified interface, so researchers can focus on linguistics rather than engineering.
Vision
- We believe the field needs a holistic toolkit that jointly supports the full research workflow: video processing, pose extraction, temporal analysis, and standardized evaluation.
- SLTK is designed for reproducibility: every metric, extraction step, and analysis tool uses the same data structures and can be run from a single CLI command or Python script.
Quick Start
Installation
# Core library (ELAN I/O, CLI, dataset loaders) — no GPU needed
pip install signlangtk
# With GPU extraction backends
pip install "signlangtk[wilor,nlf,teaser]"
# Everything
pip install "signlangtk[all]"
Package name: signlangtk on PyPI, sltk for Python imports.
Development install
git clone https://github.com/ed-fish/Sign-Language-Toolkit.git
cd Sign-Language-Toolkit
pip install -e ".[dev,wilor,nlf,teaser]"
pytest # ~947 tests
One-Command Pipeline
Sample data ships in sample-data/: a 7 s How2Sign continuous clip
(sample-data/example.mp4) and 59 isolated ASL Citizen sign clips for the
gloss-spotting demo (sample-data/spotting_dictionary/). See
sample-data/README.md.
sltk pipeline video.mp4 -o output/
# → output/video.eaf (10+ tiers: segmentation, blinks, head nods, mouth, gaze, ...)
Python API
from sltk.extraction.wilor import WiLoRExtractor
from sltk.extraction.teaser import TeaserExtractor
from sltk.segmentation.runner import get_runner
from sltk.segmentation.h5_loader import h5_to_features
from sltk.segmentation.postprocess import extract_segments
from sltk.nms.runner import detect_nms
from sltk.io.elan_roundtrip import ElanDocument
VIDEO, FPS = "recording.mp4", 25.0
# ── Step 1: Extract 3D hand poses ─────────────────────────────────
with WiLoRExtractor() as ext:
    ext.load_model()
    result = ext.extract_from_video(VIDEO, "recording_wilor.h5")
print(f"{result.num_detections} hand detections across {result.num_frames} frames")
# ── Step 2: Extract face parameters ───────────────────────────────
with TeaserExtractor() as ext:
    ext.load_model()
    ext.extract_from_video(VIDEO, "recording_teaser.h5")
# ── Step 3: Segment signs ─────────────────────────────────────────
features = h5_to_features("recording_wilor.h5") # (T, 192) MANO features
labels = get_runner().predict(features) # 0=OUT, 1=IN, 2=BEGIN
segments = extract_segments(labels) # [(start, end), ...]
# ── Step 4: Detect non-manual signals ─────────────────────────────
blinks, nms_events, quality = detect_nms(
"recording_teaser.h5", detectors={"all"}
)
# ── Step 5: Assemble multi-tier ELAN file ─────────────────────────
doc = ElanDocument.new(video_path=VIDEO)
doc.add_tier("Segmentation")
for s, e in segments:
    doc.add_segment("Segmentation", s / FPS, e / FPS, "SIGN")
for tier in ["BLINK", "HEAD-NOD", "HEAD-SHAKE", "HEAD-TILT", "MOUTH-MOVEMENT"]:
    doc.add_tier(tier)
for b in blinks:
    doc.add_segment("BLINK", b.start_frame / FPS, b.end_frame / FPS, "blink")
for ev in nms_events:
    doc.add_segment(ev.tier, ev.start_frame / FPS, ev.end_frame / FPS, ev.label)
doc.save("recording.eaf")
Extraction
Three GPU extractors share the same interface: load_model() → extract_from_video(). Weights auto-download from HuggingFace Hub on first use (~3.4 GB total).
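The shared call order can be written down as a structural type. The sketch below is illustrative only: the Extractor protocol, run_extractor, and DummyExtractor are hypothetical names that do not come from SLTK, and the method signatures are inferred from the examples in this README rather than from the library source.

```python
from typing import Protocol


class Extractor(Protocol):
    """Hypothetical structural type for the shared extractor interface."""

    def __enter__(self) -> "Extractor": ...
    def __exit__(self, *exc_info: object) -> None: ...
    def load_model(self) -> None: ...
    def extract_from_video(self, video_path: str, output_path: str) -> object: ...


def run_extractor(ext: Extractor, video_path: str, output_path: str) -> object:
    """Run any extractor that follows the load_model -> extract_from_video order."""
    with ext:
        ext.load_model()
        return ext.extract_from_video(video_path, output_path)


class DummyExtractor:
    """Stand-in that records calls, so the helper can be exercised without a GPU."""

    def __init__(self) -> None:
        self.calls: list[str] = []

    def __enter__(self) -> "DummyExtractor":
        self.calls.append("enter")
        return self

    def __exit__(self, *exc_info: object) -> None:
        self.calls.append("exit")

    def load_model(self) -> None:
        self.calls.append("load_model")

    def extract_from_video(self, video_path: str, output_path: str) -> str:
        self.calls.append("extract")
        return output_path


dummy = DummyExtractor()
result = run_extractor(dummy, "video.mp4", "video_dummy.h5")
print(dummy.calls)
```

If the real extractors match this shape, the same helper would accept a WiLoRExtractor, NLFExtractor, or TeaserExtractor instance unchanged.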
WiLoR — 3D Hand Reconstruction
21 keypoints per hand with MANO rotation matrices. Primary input for sign segmentation.
from sltk.extraction.wilor import WiLoRExtractor, WiLoRConfig
config = WiLoRConfig(
    device="cuda:0",
    img_batch_size=128,  # reduce for <8GB VRAM
    detection_confidence=0.3,
    use_amp=True,
)
with WiLoRExtractor(config) as ext:
    ext.load_model()
    result = ext.extract_from_video("video.mp4", "video_wilor.h5")
Output H5:
video_wilor.h5
├── attrs: fps, num_frames, resolution
├── img_idx (M,) # frame index per detection
├── kpts_3d (M, 21, 3) # 3D hand keypoints
├── kpts_2d (M, 21, 2) # 2D projections
├── right (M,) # True = right hand
├── confidence (M,) # detection score
├── bboxes (M, 4) # hand bounding boxes
└── mano/
├── hand_pose (M, 15, 3, 3) # joint rotations
├── global_orient (M, 1, 3, 3) # wrist rotation
└── betas (M, 10) # shape parameters
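The datasets above are indexed by detection (M), not by frame, so per-frame access needs a grouping step through img_idx. A minimal sketch on plain lists standing in for the img_idx and right datasets; group_by_frame is a hypothetical helper, not part of SLTK.

```python
from collections import defaultdict


def group_by_frame(img_idx, right):
    """Map frame index -> {"left": [rows], "right": [rows]} of detection rows."""
    frames = defaultdict(lambda: {"left": [], "right": []})
    for row, (frame, is_right) in enumerate(zip(img_idx, right)):
        frames[int(frame)]["right" if is_right else "left"].append(row)
    return dict(frames)


# Toy stand-ins for the img_idx and right datasets (M = 5 detections).
img_idx = [0, 0, 1, 3, 3]
right = [True, False, True, False, True]

per_frame = group_by_frame(img_idx, right)
for frame, hands in sorted(per_frame.items()):
    print(frame, hands)
```

Each stored row number indexes the other per-detection datasets (kpts_3d, confidence, and so on). Frames with no detections, such as frame 2 here, have no entry at all.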
NLF — Full-Body SMPL-X
55 SMPL-X joints (body + hands + face) with full pose parameters.
from sltk.extraction.nlf import NLFExtractor, NLFConfig
with NLFExtractor(NLFConfig(device="cuda:0")) as ext:
    ext.load_model()
    result = ext.extract_from_video("video.mp4", "video_nlf.h5")
TEASER — FLAME Face Parameters
FLAME 3D face parameters: jaw pose, expression, eyelid, shape, and head pose. Uses MediaPipe for face detection. This is the input for non-manual signal (NMS) detection.
from sltk.extraction.teaser import TeaserExtractor, TeaserConfig
with TeaserExtractor(TeaserConfig(device="cuda:0")) as ext:
    ext.load_model()
    result = ext.extract_from_video("video.mp4", "video_teaser.h5")
Batch Processing
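from pathlib import Path
from sltk.extraction.wilor import WiLoRExtractor
# Load the model once, then loop over a directory ("videos" is a placeholder path).
# Assumes a loaded extractor can be reused across videos.
with WiLoRExtractor() as ext:
    ext.load_model()
    for video in sorted(Path("videos").glob("*.mp4")):
        result = ext.extract_from_video(str(video), str(video.with_suffix(".h5")))
        print(f"{video.name}: {result.num_detections} hands across {result.num_frames} frames")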
Sign Segmentation
A 4-layer Transformer that reads WiLoR hand features and predicts per-frame BIO labels.
# CLI
sltk segment video_wilor.h5 -o segments.eaf -f elan --video video.mp4
# Or directly from video (auto-extracts WiLoR first)
sltk segment video.mp4 -o segments.eaf -f elan
# Python
from sltk.segmentation.runner import segment_h5
from sltk.segmentation.output import OutputFormat
segment_h5("video_wilor.h5", output_path="segments.eaf",
output_format=OutputFormat.ELAN, fps=25.0, media_path="video.mp4")
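To make the label scheme concrete, here is a minimal sketch of turning per-frame BIO labels (0=OUT, 1=IN, 2=BEGIN) into frame ranges. decode_bio is a hypothetical helper and not SLTK's extract_segments. The end-exclusive convention and the handling of an IN run with no BEGIN are assumptions made for this example.

```python
OUT, IN, BEGIN = 0, 1, 2


def decode_bio(labels):
    """Return (start, end) frame ranges, end exclusive, from BIO labels."""
    segments = []
    start = None
    for i, label in enumerate(labels):
        if label == BEGIN:
            # A BEGIN closes any open segment and opens a new one,
            # which is what separates two back-to-back signs.
            if start is not None:
                segments.append((start, i))
            start = i
        elif label == IN:
            # Tolerate an IN run that has no explicit BEGIN.
            if start is None:
                start = i
        else:  # OUT
            if start is not None:
                segments.append((start, i))
                start = None
    if start is not None:
        segments.append((start, len(labels)))
    return segments


labels = [0, 2, 1, 1, 2, 1, 0, 0, 1, 1]
segments = decode_bio(labels)
print(segments)  # [(1, 4), (4, 6), (8, 10)]
```

The BEGIN label is what lets the decoder split adjacent signs at frame 4. With only IN/OUT labels, frames 1 to 5 would merge into one segment.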
Gloss Spotting
Match detected segments to a dictionary of known signs using SignRep embeddings (768-dim ViT features).
from sltk.embedding.pipeline import SignRepPipeline
pipeline = SignRepPipeline()
continuous = pipeline.extract_continuous("video.mp4", stride=4)
dictionary = pipeline.load_dictionary(["/data/dictionaries/bsldict/signrep/"])
result = pipeline.spot(
features=continuous,
segments=[{"segment_id": 0, "start_frame": 12, "end_frame": 45}],
dictionary=dictionary,
top_k=5,
)
for seg in result.segments:
    for gl in seg.top_glosses[:3]:
        print(f" {gl['gloss']} ({gl['similarity']:.3f})")
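The matching idea can be illustrated with plain vectors. The sketch below mean-pools the frame embeddings of one segment and ranks dictionary entries by cosine similarity. Both choices are assumptions for illustration, since the README only says that a similarity score is returned. All function names are hypothetical, and 3-dimensional toy vectors stand in for the 768-dim features.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mean_pool(frames):
    """Average a list of per-frame vectors into one segment vector."""
    return [sum(col) / len(frames) for col in zip(*frames)]


def top_k_glosses(segment_frames, dictionary, k=5):
    """Rank dictionary glosses by similarity to the pooled segment embedding."""
    query = mean_pool(segment_frames)
    scored = [
        {"gloss": gloss, "similarity": cosine(query, emb)}
        for gloss, emb in dictionary.items()
    ]
    return sorted(scored, key=lambda s: s["similarity"], reverse=True)[:k]


# Toy dictionary: one embedding per gloss.
dictionary = {
    "HELLO": [1.0, 0.0, 0.0],
    "THANKS": [0.0, 1.0, 0.0],
    "YES": [0.0, 0.0, 1.0],
}
segment_frames = [[0.9, 0.1, 0.0], [0.7, 0.3, 0.0]]

matches = top_k_glosses(segment_frames, dictionary, k=2)
for m in matches:
    print(f"{m['gloss']} ({m['similarity']:.3f})")
```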
Non-Manual Signal Detection
Detect blinks, head nods/shakes/tilts, mouth movements, eyebrow raises, gaze direction, and eye squints from TEASER face data.
# CLI
sltk nms video_teaser.h5 -o nms_output/
from sltk.nms.runner import detect_nms
blinks, nms_events, quality = detect_nms(
"video_teaser.h5",
detectors={"all"},
smpl_path="video_nlf.h5", # optional: enables gaze detection
)
print(f"{len(blinks)} blinks, {len(nms_events)} NMS events")
print(f"Tracking quality: {quality.detection_rate:.0%}")
| Detector | ELAN Tier | Signal | Source Data |
|---|---|---|---|
| Blink | BLINK | Eye closures | TEASER eyelid |
| Nod | HEAD-NOD | Vertical head oscillation | TEASER head pitch |
| Shake | HEAD-SHAKE | Horizontal head oscillation | TEASER head yaw |
| Tilt | HEAD-TILT | Side-to-side tilt | TEASER head roll |
| Mouth | MOUTH-MOVEMENT | Lip and mouth movement | FLAME expression |
| Eyebrow | EYEBROW-RAISE | Eyebrow raise or furrow | FLAME expression |
| Gaze | EYE-GAZE | Gaze direction | NLF eye pose |
| Squint | EYE-SQUINT | Partial eye closure | TEASER eyelid |
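As a rough illustration of how an eyelid signal could become BLINK events, the sketch below thresholds a per-frame closure value and keeps only runs of blink-like duration. It is not SLTK's detector. detect_closures, the 0-to-1 closure scale, and every threshold are hypothetical values chosen for the example.

```python
def detect_closures(closure, fps, threshold=0.5, min_s=0.04, max_s=0.5):
    """Return (start_frame, end_frame) runs above threshold, end exclusive.

    Runs shorter than min_s or longer than max_s seconds are discarded.
    """
    events = []
    start = None
    # A trailing sentinel below any threshold closes a run that reaches the end.
    for i, value in enumerate(list(closure) + [float("-inf")]):
        if value >= threshold and start is None:
            start = i
        elif value < threshold and start is not None:
            duration = (i - start) / fps
            if min_s <= duration <= max_s:
                events.append((start, i))
            start = None
    return events


# A 3-frame closure (a blink) followed by a 20-frame closure (too long to be one).
closure = [0.0, 0.1, 0.8, 0.9, 0.7, 0.1, 0.0] + [0.9] * 20 + [0.0]
blinks = detect_closures(closure, fps=25.0)
print(blinks)  # [(2, 5)]
```

The duration bound is what would separate a blink from a sustained closure. A real detector would likely need smoothing and per-signer calibration as well.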
Evaluation Metrics
Standardized metrics for translation, production (pose & video), and segmentation evaluation. See the Metrics tutorial and notebook.
| Task | Metrics | Dependencies |
|---|---|---|
| Translation | BLEU-1/2/3/4, ROUGE-L, chrF/chrF++, TER, METEOR, WER | signlangtk[metrics] |
| Translation (neural) | BLEURT, BERTScore | signlangtk[metrics-neural] |
| Production — Pose | MPJPE, PA-MPJPE, PCK, DTW-MJE, APE, FGD | core (numpy/scipy) |
| Production — Video | SSIM, PSNR, FID | signlangtk[metrics-video] |
| Segmentation | Boundary F1, IoU, frame accuracy, label P/R/F1, confusion matrix | core |
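For readers new to the pose metrics, here is a minimal sketch of two of them using their usual definitions: MPJPE as the mean Euclidean distance over all joints and frames, and PCK as the fraction of joints within a distance threshold. The function names are hypothetical and this is not SLTK's implementation. No root alignment or threshold normalization is applied.

```python
import math


def joint_distances(pred, ref):
    """Per-joint Euclidean distances for (T, J, 3) nested lists."""
    return [
        math.dist(p, r)
        for pred_frame, ref_frame in zip(pred, ref)
        for p, r in zip(pred_frame, ref_frame)
    ]


def mpjpe(pred, ref):
    """Mean per-joint position error, in the units of the input."""
    dists = joint_distances(pred, ref)
    return sum(dists) / len(dists)


def pck(pred, ref, threshold):
    """Fraction of joints whose error is at most `threshold` (absolute units)."""
    dists = joint_distances(pred, ref)
    return sum(d <= threshold for d in dists) / len(dists)


# One frame, two joints: the first is exact, the second is off by 5 units.
ref = [[[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]]
pred = [[[0.0, 0.0, 0.0], [1.0, 3.0, 4.0]]]

print(mpjpe(pred, ref))               # 2.5
print(pck(pred, ref, threshold=1.0))  # 0.5
```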
ROI Cropping
GPU-accelerated region-of-interest cropping for lips, hands, and other body regions. See the Cropping API.
from sltk.cropping import crop_lips_from_video
crops, frame_indices, bboxes = crop_lips_from_video("video.mp4", output_size=96)
Visualization
Skeleton and 3D mesh overlay rendering on video frames. See the Visualization API.
from sltk.visualization import generate_overlay_video
generate_overlay_video("video.mp4", "video_wilor.h5", "overlay.mp4", viz_type="wilor")
ELAN I/O
| Feature | Description |
|---|---|
| Read/Write | Full ELAN (.eaf) round-trip preserving all XML structure |
| Create | Build multi-tier ELAN files programmatically |
| Merge | Combine tiers from multiple ELAN files |
| Export | Convert to/from JSON, CSV, and other formats |
See the ELAN tutorial.
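For orientation, an .eaf file is XML: time slots hold millisecond values, and each aligned annotation in a tier refers to two of them. The sketch below reads tiers with the standard library only. read_tiers is a hypothetical helper, not SLTK's ElanDocument. It assumes every time slot carries a TIME_VALUE and it ignores reference annotations.

```python
import xml.etree.ElementTree as ET

# A stripped-down EAF document with one tier and one annotation.
EAF = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="480"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1800"/>
  </TIME_ORDER>
  <TIER TIER_ID="Segmentation" LINGUISTIC_TYPE_REF="default-lt">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1"
                            TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>SIGN</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""


def read_tiers(eaf_text):
    """Return {tier_id: [(start_ms, end_ms, value), ...]} for aligned annotations."""
    root = ET.fromstring(eaf_text)
    # Resolve time slot IDs to millisecond values first.
    slots = {
        ts.get("TIME_SLOT_ID"): int(ts.get("TIME_VALUE"))
        for ts in root.iter("TIME_SLOT")
    }
    tiers = {}
    for tier in root.iter("TIER"):
        rows = []
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots[ann.get("TIME_SLOT_REF1")]
            end = slots[ann.get("TIME_SLOT_REF2")]
            rows.append((start, end, ann.findtext("ANNOTATION_VALUE", default="")))
        tiers[tier.get("TIER_ID")] = rows
    return tiers


tiers = read_tiers(EAF)
print(tiers)  # {'Segmentation': [(480, 1800, 'SIGN')]}
```

The 480 ms and 1800 ms values correspond to frames 12 and 45 at 25 fps.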
Supported Datasets
Built-in loaders for major sign language datasets:
| Dataset | Language | Type | Size | Tutorial |
|---|---|---|---|---|
| WLASL | ASL | Isolated | 2,000 classes | Datasets |
| ASL-Citizen | ASL | Isolated | Community-sourced | Datasets |
| How2Sign | ASL | Continuous | 35K sentences | Datasets |
| BSLCP | BSL | Continuous | Multi-view corpus | Datasets |
| BOBSL | BSL | Continuous | BBC archive | Datasets |
| Phoenix-2014T | DGS | Continuous | Weather broadcasts | Datasets |
| CSL-Daily | CSL | Continuous | Daily conversations | Datasets |
Additional Features
- Pipeline CLI: Single command from raw video to multi-tier ELAN file (sltk pipeline)
- Unified Pose Format: PoseSequence class normalizes all backends to (T, N, C) arrays with format conversion
- ROI Cropping: GPU-accelerated lip, hand, and generic region cropping for data loading and preprocessing
- Visualization: Skeleton overlay and 3D mesh projection rendering
- Batch Processing: All extractors support directory-level batch processing
- Model Weight Management: Auto-download from HuggingFace Hub with environment variable overrides
- Modular Install: Optional dependency groups ([wilor], [metrics], [nlf], etc.) keep the base lightweight
CLI Reference
# Full pipeline
sltk pipeline video.mp4 -o output/
# Pose extraction
sltk extract wilor video.mp4 -o hands.h5
sltk extract nlf video.mp4 -o body.h5
sltk extract teaser video.mp4 -o face.h5
# Sign segmentation
sltk segment hands.h5 -o segments.eaf -f elan --video video.mp4
# Non-manual signal detection
sltk nms face.h5 -o nms_output/
# Gloss spotting
sltk spot video.mp4 --segments segments.json --dictionary dict/
# Evaluation
sltk evaluate preds.txt refs.txt --task translation -m bleu4 -m chrf
sltk evaluate pred.h5 ref.h5 --task production -m mpjpe -m pck
sltk evaluate preds.eaf refs.eaf --task segmentation
# ELAN utilities
sltk to-elan segments.json --video video.mp4 -o annotations.eaf
sltk from-elan annotations.eaf -o segments.json --tier Gloss
sltk info video_wilor.h5
See the full CLI documentation.
Tutorials & Notebooks
| # | Tutorial | Notebook | Description |
|---|---|---|---|
| 00 | — | Notebook | Foundations: core data types, H5 schemas, the extractor contract (start here) |
| 01 | Pose Extraction | Notebook | WiLoR hands, NLF body, TEASER face, MediaPipe |
| 02 | Segmentation & Spotting | Notebook | Sign boundaries, dictionary matching |
| 03 | NMS & ELAN | Notebook | Non-manual signals, ELAN file assembly |
| 04 | Evaluation Metrics | Notebook | Translation, production, segmentation metrics |
| 05 | ELAN Files | — | Reading, writing, merging annotation files |
| 06 | Datasets | — | Loading and exploring sign language corpora |
| 07 | Feature Processing | Notebook | Pose features and normalization |
| 10 | Extending SLTK | Notebook | Add a segmenter, spotter, extractor, metric, format, detector, … |
Model Weights
All weights auto-download from HuggingFace Hub and cache at ~/.cache/sltk/weights/.
| Model | Size | Env Override |
|---|---|---|
| WiLoR (hands) | ~2.5 GB | SLTK_WILOR_CHECKPOINT |
| NLF (body) | ~540 MB | SLTK_NLF_MODEL |
| TEASER (face) | ~350 MB | SLTK_TEASER_CHECKPOINT |
| SignRep (embedding) | ~350 MB | SLTK_SIGNREP_CHECKPOINT |
| Segmenter | ~180 MB | SLTK_SEGMENTOR_V2_CHECKPOINT |
Set SLTK_AUTO_DOWNLOAD=1 to skip the confirmation prompt. Set SLTK_WEIGHTS_DIR to override the cache location.
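These variables can also be set from Python, for example in a notebook. A small sketch: the directory below is a placeholder. Setting the variables before importing sltk is a precaution, since this README does not say when they are read.

```python
import os

# Skip the interactive confirmation prompt on first download.
os.environ["SLTK_AUTO_DOWNLOAD"] = "1"

# Store weights on a larger disk; this path is a placeholder, not a default.
os.environ["SLTK_WEIGHTS_DIR"] = "/data/sltk/weights"

# Import sltk only after the environment is configured, e.g.:
# from sltk.extraction.wilor import WiLoRExtractor
```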
Testing
pytest # Full suite (~947 tests)
pytest -m "not slow" # Skip slow GPU/neural tests
pytest tests/test_metrics.py # Single module
Documentation
Full documentation at sign-language-toolkit.readthedocs.io
| Section | Content |
|---|---|
| Installation | Install options, GPU setup, weight management |
| Quick Start | First steps with SLTK |
| API Reference | Full Python API docs |
| Cropping | ROI and lip cropping |
| NMS Detection | Non-manual signal detection API |
| Visualization | Skeleton and mesh overlays |
| CLI Reference | Command-line interface |
License
CC-BY-NC-ND-4.0 (Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International)
Third-Party Licenses
SLTK bundles or depends on models with their own license terms:
| Component | License | Link |
|---|---|---|
| MANO (hand model) | Non-commercial | mano.is.tue.mpg.de |
| SMPL-X (body model) | Non-commercial | smpl-x.is.tue.mpg.de |
| FLAME (face model) | Non-commercial / CC-BY-4.0 (2023+) | flame.is.tue.mpg.de |
| WiLoR | Apache 2.0 | Potamias et al., CVPR 2025 |
| TEASER | See repository | Liu et al., ICLR 2025 |
| NLF | MIT (non-commercial) | Sárándi & Pons-Moll, NeurIPS 2024 |
Citations & Acknowledgements
SLTK integrates several third-party models and methods. If you use these components in your research, please cite the original papers. SLTK does not claim authorship of these models — it provides a unified interface to run them together.
Sign Segmentation — He et al., FG 2025
The sign segmenter uses the Hands-On model for temporal sign boundary detection from hand pose features.
@inproceedings{he2025hands,
title={Hands-On: Segmenting Individual Signs from Continuous Sequences},
author={He, Low Jian and Walsh, Harry and Sincan, Ozge Mercanoglu and Bowden, Richard},
booktitle={2025 IEEE 19th International Conference on Automatic Face and Gesture Recognition (FG)},
pages={1--5},
year={2025},
organization={IEEE}
}
Gloss Spotting — Wong et al., ICCV 2025
SignRep provides self-supervised sign language representations used for dictionary-based gloss matching.
@inproceedings{wong2025signrep,
title={SignRep: Enhancing Self-Supervised Sign Representations},
author={Wong, Ryan and Camgoz, Necati Cihan and Bowden, Richard},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
pages={22804--22814},
year={2025}
}
Face Reconstruction — Liu et al., ICLR 2025
TEASER provides FLAME-based face parameter extraction (jaw pose, expression, eyelid) used for non-manual signal detection.
@inproceedings{liu2025teaser,
title={TEASER: Token Enhanced Spatial Modeling for Expressions Reconstruction},
author={Liu, Yunfei and Zhu, Lei and Lin, Lijian and Zhu, Ye and Zhang, Ailing and Li, Yu},
booktitle={International Conference on Learning Representations (ICLR)},
year={2025}
}
Hand Reconstruction — Potamias et al., CVPR 2025
WiLoR provides end-to-end 3D hand localization and MANO parameter estimation from in-the-wild images.
@inproceedings{potamias2025wilor,
title={WiLoR: End-to-end 3D Hand Localization and Reconstruction in-the-wild},
author={Potamias, Rolandos Alexandros and Zhang, Jinglei and Deng, Jiankang and Zafeiriou, Stefanos},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2025}
}
Body Pose — Sárándi & Pons-Moll, NeurIPS 2024
NLF estimates continuous 3D human body pose and shape (SMPL-X parameters, including eye gaze).
@inproceedings{sarandi2024nlf,
title={Neural Localizer Fields for Continuous 3D Human Pose and Shape Estimation},
author={S{\'a}r{\'a}ndi, Istv{\'a}n and Pons-Moll, Gerard},
booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
year={2024}
}
Pose Estimation — Jiang et al., 2023
RTMPose provides real-time multi-person whole-body keypoint detection (133 COCO-WholeBody landmarks).
@article{jiang2023rtmpose,
title={RTMPose: Real-Time Multi-Person Pose Estimation based on MMPose},
author={Jiang, Tao and Lu, Peng and Zhang, Li and Ma, Ningsheng and Han, Rui and Lyu, Chengqi and Li, Yining and Chen, Kai},
journal={arXiv preprint arXiv:2303.07399},
year={2023}
}