
Sign Language Toolkit for sign language research



Sign Language Toolkit (SLTK)

| Tutorials | Documentation | Notebooks | Contributing | PyPI |

What SLTK Offers

  • SLTK is an open-source Python toolkit that accelerates sign language research and makes state-of-the-art computer vision tools and resources easily available to linguists and others working on sign language and AI.

  • It provides a complete pipeline for pose extraction, sign segmentation, gloss spotting, non-manual signal detection, corpus analysis, and evaluation — all with a single CLI or Python API.

  • SLTK brings together state-of-the-art models (WiLoR, NLF, TEASER, SignRep) behind a unified interface, so researchers can focus on linguistics rather than engineering.

Vision

  • We believe the field needs a holistic toolkit that jointly supports the full research workflow: video processing, pose extraction, temporal analysis, corpus management, and standardized evaluation.

  • SLTK is designed for reproducibility: every metric, extraction step, and analysis tool uses the same data structures and can be run from a single CLI command or Python script.

Quick Start

Installation

# Core library (ELAN I/O, CLI, corpus tools, analysis) — no GPU needed
pip install signlangtk

# With GPU extraction backends
pip install "signlangtk[wilor,nlf,teaser]"

# Everything
pip install "signlangtk[all]"

Package name: signlangtk on PyPI, sltk for Python imports.

Development install
git clone https://github.com/ed-fish/Sign-Language-Toolkit.git
cd Sign-Language-Toolkit
pip install -e ".[dev,api,wilor,nlf,teaser]"
pytest  # 1697 tests

One-Command Pipeline

An example video (examples/example.mp4) is included in the repository for testing.

sltk pipeline video.mp4 -o output/
# → output/video.eaf  (10+ tiers: segmentation, blinks, head nods, mouth, gaze, ...)

Python API

from sltk.extraction.wilor import WiLoRExtractor
from sltk.extraction.teaser import TeaserExtractor
from sltk.segmentation.runner import get_runner
from sltk.segmentation.h5_loader import h5_to_features
from sltk.segmentation.postprocess import extract_segments
from sltk.nms.runner import detect_nms
from sltk.io.elan_roundtrip import ElanDocument

VIDEO, FPS = "recording.mp4", 25.0

# ── Step 1: Extract 3D hand poses ─────────────────────────────────
with WiLoRExtractor() as ext:
    ext.load_model()
    result = ext.extract_from_video(VIDEO, "recording_wilor.h5")
    print(f"{result.num_detections} hand detections across {result.num_frames} frames")

# ── Step 2: Extract face parameters ───────────────────────────────
with TeaserExtractor() as ext:
    ext.load_model()
    ext.extract_from_video(VIDEO, "recording_teaser.h5")

# ── Step 3: Segment signs ─────────────────────────────────────────
features = h5_to_features("recording_wilor.h5")  # (T, 192) MANO features
labels = get_runner().predict(features)            # 0=OUT, 1=IN, 2=BEGIN
segments = extract_segments(labels)                # [(start, end), ...]

# ── Step 4: Detect non-manual signals ─────────────────────────────
blinks, nms_events, quality = detect_nms(
    "recording_teaser.h5", detectors={"all"}
)

# ── Step 5: Assemble multi-tier ELAN file ─────────────────────────
doc = ElanDocument.new(video_path=VIDEO)

doc.add_tier("Segmentation")
for s, e in segments:
    doc.add_segment("Segmentation", s / FPS, e / FPS, "SIGN")

doc.add_tier("BLINK")
for b in blinks:
    doc.add_segment("BLINK", b.start_frame / FPS, b.end_frame / FPS, "blink")

# One tier per NMS type actually detected (with {"all"} this can include
# EYEBROW-RAISE and EYE-SQUINT, not only the head and mouth tiers)
for tier in sorted({ev.tier for ev in nms_events}):
    doc.add_tier(tier)
for ev in nms_events:
    doc.add_segment(ev.tier, ev.start_frame / FPS, ev.end_frame / FPS, ev.label)

doc.save("recording.eaf")
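ELAN files store time slots as integer milliseconds, so the seconds values passed to `add_segment` have to end up as whole milliseconds in the saved `.eaf`. A small sketch of that conversion, assuming a constant frame rate and end-exclusive frame spans; `frame_to_ms` and `frames_to_ms_span` are hypothetical helpers, not SLTK functions.

```python
# Hypothetical helpers (not part of SLTK): convert video frame indices to the
# integer-millisecond values that ELAN stores in TIME_SLOT elements.


def frame_to_ms(frame: int, fps: float = 25.0) -> int:
    """Time at which `frame` starts, rounded to the nearest millisecond."""
    return round(frame * 1000.0 / fps)


def frames_to_ms_span(start: int, end: int, fps: float = 25.0) -> tuple[int, int]:
    """Convert an end-exclusive [start, end) frame span to (start_ms, end_ms)."""
    if end <= start:
        raise ValueError(f"empty span: start={start}, end={end}")
    return frame_to_ms(start, fps), frame_to_ms(end, fps)


# At 25 fps every frame lasts exactly 40 ms
print(frames_to_ms_span(12, 45))  # (480, 1800)
# At 29.97 fps boundaries fall between whole milliseconds and are rounded
print(frames_to_ms_span(12, 45, fps=29.97))
```

If the video's frame rate is not exactly 25, reading it from the file instead of hard-coding `FPS` avoids annotations that slowly drift out of sync.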

Extraction

Three GPU extractors share the same interface: load_model() followed by extract_from_video(). Weights auto-download from HuggingFace Hub on first use (~3.4 GB total).

WiLoR — 3D Hand Reconstruction

21 keypoints per hand with MANO rotation matrices. Primary input for sign segmentation.

from sltk.extraction.wilor import WiLoRExtractor, WiLoRConfig

config = WiLoRConfig(
    device="cuda:0",
    img_batch_size=128,       # reduce for <8GB VRAM
    detection_confidence=0.3,
    use_amp=True,
)
with WiLoRExtractor(config) as ext:
    ext.load_model()
    result = ext.extract_from_video("video.mp4", "video_wilor.h5")

Output H5:

video_wilor.h5
├── attrs: fps, num_frames, resolution
├── img_idx        (M,)              # frame index per detection
├── kpts_3d        (M, 21, 3)       # 3D hand keypoints
├── kpts_2d        (M, 21, 2)       # 2D projections
├── right          (M,)             # True = right hand
├── confidence     (M,)             # detection score
├── bboxes         (M, 4)           # hand bounding boxes
└── mano/
    ├── hand_pose      (M, 15, 3, 3)   # joint rotations
    ├── global_orient  (M, 1, 3, 3)    # wrist rotation
    └── betas          (M, 10)          # shape parameters
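Because the file stores one row per detection rather than one per frame, a frame can have zero, one, or several detections of the same hand. A minimal sketch for collapsing that to at most one left and one right hand per frame by keeping the highest-confidence row; `best_hand_per_frame` and `missing_frames` are hypothetical names, and SLTK's own loaders may resolve duplicates differently.

```python
# Hypothetical sketch (not SLTK code): pick the best detection per (frame, hand)
# from the flat per-detection arrays img_idx, right and confidence.
from typing import Dict, List, Sequence, Tuple

Key = Tuple[int, str]  # (frame index, "left" or "right")


def best_hand_per_frame(
    img_idx: Sequence[int],
    right: Sequence[bool],
    confidence: Sequence[float],
    min_conf: float = 0.0,
) -> Dict[Key, int]:
    """Map (frame, hand) -> row index of the highest-confidence detection."""
    best: Dict[Key, int] = {}
    for row, (frame, is_right, conf) in enumerate(zip(img_idx, right, confidence)):
        if conf < min_conf:
            continue  # drop weak detections entirely
        key = (int(frame), "right" if is_right else "left")
        if key not in best or conf > confidence[best[key]]:
            best[key] = row
    return best


def missing_frames(best: Dict[Key, int], num_frames: int, hand: str) -> List[int]:
    """Frames in which no detection survived for the given hand."""
    return [t for t in range(num_frames) if (t, hand) not in best]


# Toy data: two right-hand detections in frame 0, a weak one in frame 1
best = best_hand_per_frame(
    img_idx=[0, 0, 0, 1, 2],
    right=[True, True, False, True, False],
    confidence=[0.9, 0.95, 0.8, 0.4, 0.7],
    min_conf=0.5,
)
print(best)                                # {(0, 'right'): 1, (0, 'left'): 2, (2, 'left'): 4}
print(missing_frames(best, 3, "right"))    # [1, 2]
```

With h5py the three inputs would be `f["img_idx"][:]`, `f["right"][:]` and `f["confidence"][:]`, using the dataset names from the tree above.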

NLF — Full-Body SMPL-X

55 SMPL-X joints (body + hands + face) with full pose parameters.

from sltk.extraction.nlf import NLFExtractor, NLFConfig

with NLFExtractor(NLFConfig(device="cuda:0")) as ext:
    ext.load_model()
    result = ext.extract_from_video("video.mp4", "video_nlf.h5")
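The 55 SMPL-X joints break down as 22 body joints (the pelvis root plus 21 others), the jaw, two eyes, and 15 finger joints per hand: 22 + 1 + 2 + 15 + 15 = 55. A minimal sketch of that split, assuming NLF's output follows the standard SMPL-X joint ordering (check the H5 file before relying on these indices); `SMPLX_JOINT_GROUPS` and `split_joints` are illustrative names. The eye entries are presumably the ones relevant to gaze detection.

```python
# Assumes the standard SMPL-X kinematic joint order; whether NLF's H5 output
# uses exactly this order is an assumption to verify against the file.
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")

SMPLX_JOINT_GROUPS: Dict[str, range] = {
    "body": range(0, 22),         # pelvis (root) + 21 body joints, incl. wrists
    "jaw": range(22, 23),
    "left_eye": range(23, 24),
    "right_eye": range(24, 25),
    "left_hand": range(25, 40),   # 15 finger joints
    "right_hand": range(40, 55),  # 15 finger joints
}


def split_joints(joints: Sequence[T]) -> Dict[str, List[T]]:
    """Split a length-55 sequence of per-joint values into named groups."""
    if len(joints) != 55:
        raise ValueError(f"expected 55 SMPL-X joints, got {len(joints)}")
    return {name: [joints[i] for i in idx] for name, idx in SMPLX_JOINT_GROUPS.items()}


print({name: len(idx) for name, idx in SMPLX_JOINT_GROUPS.items()})
```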

TEASER — FLAME Face Parameters

FLAME 3D face parameters: jaw pose, expression, eyelid, shape, and head pose. Uses MediaPipe for face detection. This is the input for NMS detection.

from sltk.extraction.teaser import TeaserExtractor, TeaserConfig

with TeaserExtractor(TeaserConfig(device="cuda:0")) as ext:
    ext.load_model()
    result = ext.extract_from_video("video.mp4", "video_teaser.h5")

Batch Processing

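from pathlib import Path
from sltk.extraction.wilor import WiLoRExtractor

video_dir = Path("videos/")  # example directory of input videos

# Load the model once, then reuse it for every video in the directory
with WiLoRExtractor() as ext:
    ext.load_model()
    for video in sorted(video_dir.glob("*.mp4")):
        out_path = video.with_name(f"{video.stem}_wilor.h5")
        result = ext.extract_from_video(str(video), str(out_path))
        print(f"{video.name}: {result.num_detections} hands across {result.num_frames} frames")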

Sign Segmentation

A 4-layer Transformer that reads WiLoR hand features and predicts per-frame BIO labels.

# CLI
sltk segment video_wilor.h5 -o segments.eaf -f elan --video video.mp4

# Or directly from video (auto-extracts WiLoR first)
sltk segment video.mp4 -o segments.eaf -f elan

# Python
from sltk.segmentation.runner import segment_h5
from sltk.segmentation.output import OutputFormat

segment_h5("video_wilor.h5", output_path="segments.eaf",
           output_format=OutputFormat.ELAN, fps=25.0, media_path="video.mp4")
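The segmenter's per-frame labels (0=OUT, 1=IN, 2=BEGIN in the Python API example) still have to be decoded into spans. A minimal sketch of one plausible decoding rule: a BEGIN frame, or an IN frame that follows OUT, opens a sign, and OUT or the next BEGIN closes it. `bio_to_segments` is hypothetical; SLTK's `extract_segments` may add post-processing such as minimum-length filtering, and its end-frame convention is not specified here (this sketch uses end-exclusive spans).

```python
from typing import List, Optional, Sequence, Tuple

OUT, IN, BEGIN = 0, 1, 2  # label convention from the Python API example


def bio_to_segments(labels: Sequence[int]) -> List[Tuple[int, int]]:
    """Decode per-frame BIO labels into end-exclusive (start, end) frame spans."""
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for t, label in enumerate(labels):
        if label == BEGIN or (label == IN and start is None):
            # BEGIN always opens a new sign; a stray IN after OUT is treated as one
            if start is not None:
                segments.append((start, t))  # close the previous sign first
            start = t
        elif label == OUT and start is not None:
            segments.append((start, t))
            start = None
    if start is not None:  # sign still open at the end of the video
        segments.append((start, len(labels)))
    return segments


labels = [0, 2, 1, 1, 0, 0, 2, 1, 2, 1, 1, 0]
print(bio_to_segments(labels))  # [(1, 4), (6, 8), (8, 11)]
# Seconds for ELAN at 25 fps
print([(s / 25.0, e / 25.0) for s, e in bio_to_segments(labels)])
```

Treating BEGIN as a hard boundary is what lets two back-to-back signs with no OUT frames between them (frames 6 to 11 above) come out as separate segments.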

Gloss Spotting

Match detected segments to a dictionary of known signs using SignRep embeddings (768-dim ViT features).

from sltk.embedding.pipeline import SignRepPipeline

pipeline = SignRepPipeline()
continuous = pipeline.extract_continuous("video.mp4", stride=4)
dictionary = pipeline.load_dictionary(["/data/dictionaries/bsldict/signrep/"])

result = pipeline.spot(
    features=continuous,
    segments=[{"segment_id": 0, "start_frame": 12, "end_frame": 45}],
    dictionary=dictionary,
    top_k=5,
)

for seg in result.segments:
    for gl in seg.top_glosses[:3]:
        print(f"  {gl['gloss']} ({gl['similarity']:.3f})")
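Dictionary-based spotting of this kind generally comes down to pooling the continuous embeddings that fall inside a segment and ranking dictionary entries by similarity. A hypothetical sketch using mean pooling and cosine similarity, assuming `stride=4` means one embedding every 4 frames and that each dictionary gloss has a single reference vector; `spot_segment` is illustrative and not how `SignRepPipeline.spot` is necessarily implemented. Real SignRep vectors are 768-dimensional; the toy vectors below are 2-dimensional.

```python
# Hypothetical sketch (not SLTK/SignRep code): rank glosses for one segment.
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def mean_pool(vectors: Sequence[Vector]) -> List[float]:
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def spot_segment(
    continuous: Sequence[Vector],   # assumed: one embedding every `stride` frames
    start_frame: int,
    end_frame: int,                 # exclusive
    dictionary: Dict[str, Vector],  # gloss -> reference embedding
    stride: int = 4,
    top_k: int = 5,
) -> List[Tuple[str, float]]:
    # Map frame indices onto the strided feature timeline
    first = start_frame // stride
    last = max(first + 1, math.ceil(end_frame / stride))
    window = continuous[first:last]
    if not window:
        return []
    query = mean_pool(window)
    scores = [(gloss, cosine(query, ref)) for gloss, ref in dictionary.items()]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:top_k]


continuous = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
dictionary = {"HELLO": [1.0, 0.0], "WORLD": [0.0, 1.0], "MIXED": [1.0, 1.0]}
for gloss, sim in spot_segment(continuous, 0, 8, dictionary, stride=4, top_k=2):
    print(f"  {gloss} ({sim:.3f})")
```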

Non-Manual Signal Detection

Detect blinks, head nods/shakes/tilts, mouth movements, eyebrow raises, gaze direction, and eye squints from TEASER face data. Gaze detection additionally requires NLF body data (passed as smpl_path).

# CLI
sltk nms video_teaser.h5 -o nms_output/

# Python
from sltk.nms.runner import detect_nms

blinks, nms_events, quality = detect_nms(
    "video_teaser.h5",
    detectors={"all"},
    smpl_path="video_nlf.h5",  # optional: enables gaze detection
)
print(f"{len(blinks)} blinks, {len(nms_events)} NMS events")
print(f"Tracking quality: {quality.detection_rate:.0%}")

| Detector | ELAN Tier | Signal | Source Data |
|---|---|---|---|
| Blink | BLINK | Eye closures | TEASER eyelid |
| Nod | HEAD-NOD | Vertical head oscillation | TEASER head pitch |
| Shake | HEAD-SHAKE | Horizontal head oscillation | TEASER head yaw |
| Tilt | HEAD-TILT | Side-to-side tilt | TEASER head roll |
| Mouth | MOUTH-MOVEMENT | Lip and mouth movement | FLAME expression |
| Eyebrow | EYEBROW-RAISE | Eyebrow raise or furrow | FLAME expression |
| Gaze | EYE-GAZE | Gaze direction | NLF eye pose |
| Squint | EYE-SQUINT | Partial eye closure | TEASER eyelid |
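As a rough illustration of the simplest detector above, a blink can be treated as a run of consecutive frames in which an eye-closure score exceeds a threshold. This is a hypothetical sketch, not SLTK's blink detector: it assumes the TEASER eyelid parameters have already been reduced to one closure score per frame in [0, 1], and the `threshold` and `min_frames` defaults are arbitrary.

```python
# Hypothetical threshold-based blink detector (not SLTK's implementation).
from typing import List, Optional, Sequence, Tuple


def detect_blinks(
    closure: Sequence[float], threshold: float = 0.5, min_frames: int = 2
) -> List[Tuple[int, int]]:
    """Return end-exclusive (start, end) frame spans where closure >= threshold."""
    events: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for t, c in enumerate(closure):
        if c >= threshold:
            if start is None:
                start = t  # eye starts closing
        elif start is not None:
            if t - start >= min_frames:  # ignore single-frame flickers
                events.append((start, t))
            start = None
    # A closure still in progress at the last frame
    if start is not None and len(closure) - start >= min_frames:
        events.append((start, len(closure)))
    return events


closure = [0.0, 0.1, 0.8, 0.9, 0.2, 0.0, 0.7, 0.1, 0.6, 0.9, 0.95]
print(detect_blinks(closure))  # [(2, 4), (8, 11)]
```

The `min_frames` filter is what separates a blink from one noisy frame (frame 6 above); a squint detector could reuse the same run-finding logic with a lower threshold band.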

Evaluation Metrics

Standardized metrics for translation, production (pose & video), and segmentation evaluation. See the Metrics tutorial and notebook.

| Task | Metrics | Dependencies |
|---|---|---|
| Translation | BLEU-1/2/3/4, ROUGE-L, chrF/chrF++, TER, METEOR, WER | signlangtk[metrics] |
| Translation (neural) | BLEURT, BERTScore | signlangtk[metrics-neural] |
| Production (pose) | MPJPE, PA-MPJPE, PCK, DTW-MJE, APE, FGD | core (numpy/scipy) |
| Production (video) | SSIM, PSNR, FID | signlangtk[metrics-video] |
| Segmentation | Boundary F1, IoU, frame accuracy, label P/R/F1, confusion matrix | core |
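For the pose-production metrics, MPJPE is the mean Euclidean distance between predicted and reference joints over all frames, and PCK is the fraction of joints whose error falls within a threshold. A plain-Python sketch of both, assuming the two sequences are already time-aligned and in the same units. Unaligned sequences are what DTW-MJE addresses, and PA-MPJPE additionally applies a Procrustes alignment first, which is omitted here. The function names are illustrative, not SLTK's metrics API.

```python
import math
from typing import List, Sequence

Joint = Sequence[float]   # (x, y, z) coordinates of one joint
Frame = Sequence[Joint]   # all N joints in one frame


def joint_errors(pred: Sequence[Frame], ref: Sequence[Frame]) -> List[float]:
    """Euclidean error of every joint in every frame (T * N values)."""
    if len(pred) != len(ref):
        raise ValueError("sequences must be time-aligned (same number of frames)")
    errors: List[float] = []
    for pred_frame, ref_frame in zip(pred, ref):
        if len(pred_frame) != len(ref_frame):
            raise ValueError("frames must have the same number of joints")
        errors.extend(math.dist(p, r) for p, r in zip(pred_frame, ref_frame))
    return errors


def mpjpe(pred: Sequence[Frame], ref: Sequence[Frame]) -> float:
    """Mean per-joint position error."""
    errors = joint_errors(pred, ref)
    return sum(errors) / len(errors)


def pck(pred: Sequence[Frame], ref: Sequence[Frame], threshold: float) -> float:
    """Percentage of correct keypoints, as a fraction in [0, 1]."""
    errors = joint_errors(pred, ref)
    return sum(e <= threshold for e in errors) / len(errors)


pred = [[(0, 0, 0), (1, 0, 0)], [(3, 4, 0), (0, 0, 0)]]
ref = [[(0, 0, 0), (1, 0, 1)], [(0, 0, 0), (0, 0, 0)]]
print(mpjpe(pred, ref))                 # 1.5
print(pck(pred, ref, threshold=1.0))    # 0.75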

ROI Cropping

GPU-accelerated region-of-interest cropping for lips, hands, and other body regions. See the Cropping API.

from sltk.cropping import crop_lips_from_video

crops, frame_indices, bboxes = crop_lips_from_video("video.mp4", output_size=96)
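Lip cropping of this kind usually reduces to computing a square box around mouth landmarks, enlarging it by a margin, and keeping it inside the frame before resizing to `output_size`. A hypothetical sketch of the box computation only; the landmark source, the 25% margin, and the shift-into-frame policy are assumptions, not a description of how `crop_lips_from_video` works.

```python
# Hypothetical sketch: square crop box around 2D landmarks (not SLTK code).
from typing import Sequence, Tuple

Point = Tuple[float, float]


def square_crop_box(
    points: Sequence[Point], img_w: int, img_h: int, margin: float = 0.25
) -> Tuple[int, int, int, int]:
    """Square (x0, y0, x1, y1) box around `points`, padded by `margin` per side."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    cx, cy = (min(xs) + max(xs)) / 2, (min(ys) + max(ys)) / 2
    # Side length: larger extent of the landmarks plus margin on both sides
    side = max(max(xs) - min(xs), max(ys) - min(ys)) * (1 + 2 * margin)
    side = min(side, img_w, img_h)  # a crop cannot be larger than the frame
    # Shift (rather than shrink) the box so it stays inside the image
    x0 = min(max(cx - side / 2, 0.0), img_w - side)
    y0 = min(max(cy - side / 2, 0.0), img_h - side)
    return round(x0), round(y0), round(x0 + side), round(y0 + side)


mouth = [(40, 50), (60, 50), (50, 45), (50, 55)]
print(square_crop_box(mouth, 100, 100))  # (35, 35, 65, 65)
```

Keeping the box square means the later resize to 96x96 does not distort the aspect ratio of the lips.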

Corpus & Linguistic Analysis

| Tool | Description | Docs |
|---|---|---|
| Corpus Database | SQLite per workspace, auto-ingests ELAN files, FTS5 search | API |
| Vocabulary | Frequency distributions, type/token ratios | Analysis API |
| Concordance (KWIC) | Keyword-in-context with configurable window | Concordance |
| N-grams | Bigram/trigram extraction with frequency counts | Analysis API |
| Collocations | Co-occurrence analysis with PMI/log-likelihood | Analysis API |
| Duration Analysis | Sign duration histograms and statistics | Analysis API |
| Cross-Workspace | Compare vocabulary and patterns across corpora | Analysis API |
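The vocabulary, n-gram, and concordance tools above reduce to simple counts over gloss sequences. A plain-Python sketch of type/token ratio, bigram counts, and a KWIC view, assuming each utterance is a list of gloss strings (for example, read from an ELAN gloss tier); the function names and toy glosses are illustrative and not SLTK's Analysis API.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def type_token_ratio(tokens: Sequence[str]) -> float:
    """Distinct glosses divided by total gloss tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def bigram_counts(utterances: Sequence[Sequence[str]]) -> Counter:
    """Count adjacent gloss pairs, never crossing utterance boundaries."""
    counts: Counter = Counter()
    for utt in utterances:
        counts.update(zip(utt, utt[1:]))
    return counts


def kwic(
    utterances: Sequence[Sequence[str]], keyword: str, window: int = 2
) -> List[Tuple[str, str, str]]:
    """Keyword-in-context lines as (left context, keyword, right context)."""
    lines = []
    for utt in utterances:
        for i, tok in enumerate(utt):
            if tok == keyword:
                left = " ".join(utt[max(0, i - window):i])
                right = " ".join(utt[i + 1:i + 1 + window])
                lines.append((left, tok, right))
    return lines


utterances = [["IX-1", "GO", "SHOP", "TOMORROW"], ["TOMORROW", "IX-1", "GO", "WORK"]]
tokens = [g for utt in utterances for g in utt]
print(type_token_ratio(tokens))                  # 0.625
print(bigram_counts(utterances).most_common(1))  # [(('IX-1', 'GO'), 2)]
for left, kw, right in kwic(utterances, "GO", window=1):
    print(f"{left:>10} [{kw}] {right}")
```

Counting bigrams per utterance avoids spurious pairs such as ("TOMORROW", "TOMORROW") spanning the end of one sentence and the start of the next.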

Linguistics

Specialized linguistic analysis tools. See the Linguistics API.

| Tool | Description |
|---|---|
| Phonology | HLMO parameter model: handshape, location, movement, orientation inventories. Minimal pair detection, phonological distance, inventory analysis. |
| Non-Manual Analysis | NMS scope, timing, co-occurrence analysis. Grammatical pattern detection (wh-questions, negation, topics). |
| Inter-Rater Reliability | Cohen's Kappa, Fleiss' Kappa, Krippendorff's Alpha. Boundary agreement and temporal label agreement for corpus annotation. |
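Cohen's kappa corrects the raw agreement between two annotators for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is computed from each annotator's label frequencies. A minimal sketch for two equal-length label sequences (for example, per-frame SIGN/OUT labels from two annotators); this is the textbook formula, and SLTK's reliability module may handle edge cases differently.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(a)
    # Observed agreement: fraction of items given the same label
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each annotator's marginal label frequencies
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label throughout; kappa is
        # undefined here, and this sketch reports perfect agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


a = ["SIGN", "SIGN", "OUT", "OUT", "SIGN", "OUT"]
b = ["SIGN", "OUT", "OUT", "OUT", "SIGN", "SIGN"]
print(round(cohens_kappa(a, b), 3))  # 0.333 (p_o = 4/6, p_e = 0.5)
```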

Glossing

Vocabulary management for model training. See the Glossing API.

from sltk.glossing import Vocabulary

vocab = Vocabulary.from_samples(dataset, min_count=2)
ids = vocab.encode(["HELLO", "WORLD"], add_bos=True, add_eos=True)
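A gloss vocabulary of this kind maps each gloss to an integer id, reserves ids for special tokens, and replaces rare or unseen glosses with an unknown token. A hypothetical minimal version for illustration: `SimpleVocab` and the `<pad>`/`<unk>`/`<bos>`/`<eos>` token names are assumptions, not `sltk.glossing.Vocabulary`, but `min_count` plays the same role as in the call above.

```python
from collections import Counter
from typing import Iterable, List, Sequence


class SimpleVocab:
    """Hypothetical minimal gloss vocabulary (not sltk.glossing.Vocabulary)."""

    SPECIALS = ["<pad>", "<unk>", "<bos>", "<eos>"]
    STRIP_ON_DECODE = {"<pad>", "<bos>", "<eos>"}  # <unk> is kept visible

    def __init__(self, glosses: Iterable[str]):
        self.itos: List[str] = list(self.SPECIALS) + sorted(glosses)
        self.stoi = {tok: i for i, tok in enumerate(self.itos)}

    @classmethod
    def from_sequences(cls, sequences: Iterable[Sequence[str]], min_count: int = 1):
        # Keep glosses seen at least `min_count` times; the rest map to <unk>
        counts = Counter(g for seq in sequences for g in seq)
        return cls(g for g, c in counts.items() if c >= min_count and g not in cls.SPECIALS)

    def encode(self, glosses: Sequence[str], add_bos: bool = False, add_eos: bool = False) -> List[int]:
        unk = self.stoi["<unk>"]
        ids = [self.stoi.get(g, unk) for g in glosses]
        if add_bos:
            ids.insert(0, self.stoi["<bos>"])
        if add_eos:
            ids.append(self.stoi["<eos>"])
        return ids

    def decode(self, ids: Sequence[int]) -> List[str]:
        return [self.itos[i] for i in ids if self.itos[i] not in self.STRIP_ON_DECODE]


v = SimpleVocab.from_sequences([["HELLO", "WORLD"], ["HELLO", "FRIEND"]], min_count=2)
ids = v.encode(["HELLO", "WORLD"], add_bos=True, add_eos=True)
print(ids)            # [2, 4, 1, 3]
print(v.decode(ids))  # ['HELLO', '<unk>']
```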

Visualization

Skeleton and 3D mesh overlay rendering on video frames. See the Visualization API.

from sltk.visualization import generate_overlay_video
generate_overlay_video("video.mp4", "video_wilor.h5", "overlay.mp4", viz_type="wilor")

ELAN I/O

| Feature | Description |
|---|---|
| Read/Write | Full ELAN (.eaf) round-trip preserving all XML structure |
| Create | Build multi-tier ELAN files programmatically |
| Merge | Combine tiers from multiple ELAN files |
| Export | Convert to/from JSON, CSV, and other formats |

See the ELAN tutorial.

Supported Datasets

Built-in loaders for major sign language datasets:

| Dataset | Language | Type | Size | Tutorial |
|---|---|---|---|---|
| WLASL | ASL | Isolated | 2,000 classes | Datasets |
| ASL-Citizen | ASL | Isolated | Community-sourced | Datasets |
| How2Sign | ASL | Continuous | 35K sentences | Datasets |
| BSLCP | BSL | Continuous | Multi-view corpus | Datasets |
| BOBSL | BSL | Continuous | BBC archive | Datasets |
| Phoenix-2014T | DGS | Continuous | Weather broadcasts | Datasets |
| CSL-Daily | CSL | Continuous | Daily conversations | Datasets |

Additional Features

  • Pipeline CLI: Single command from raw video to multi-tier ELAN file (sltk pipeline)
  • Unified Pose Format: PoseSequence class normalizes all backends to (T, N, C) arrays with format conversion
  • ROI Cropping: GPU-accelerated lip, hand, and generic region cropping for data loading and preprocessing
  • Phonological Analysis: HLMO parameter model with minimal pair detection and inventory analysis
  • Inter-Rater Reliability: Cohen's/Fleiss' Kappa, Krippendorff's Alpha, boundary agreement
  • Gloss Vocabulary: Encode/decode glosses for model training with special token support
  • Visualization: Skeleton overlay and 3D mesh projection rendering
  • Web Interface: React + FastAPI app for workspace management, corpus exploration, and video viewing
  • Corpus Database: SQLite per workspace with auto-ingest of ELAN files and full-text search
  • Batch Processing: All extractors support directory-level batch processing
  • Model Weight Management: Auto-download from HuggingFace Hub with environment variable overrides
  • Modular Install: Optional dependency groups ([wilor], [metrics], [api], etc.) keep the base lightweight

CLI Reference

# Full pipeline
sltk pipeline video.mp4 -o output/

# Pose extraction
sltk extract wilor video.mp4 -o hands.h5
sltk extract nlf video.mp4 -o body.h5
sltk extract teaser video.mp4 -o face.h5

# Sign segmentation
sltk segment hands.h5 -o segments.eaf -f elan --video video.mp4

# Non-manual signal detection
sltk nms face.h5 -o nms_output/

# Gloss spotting
sltk spot video.mp4 --segments segments.json --dictionary dict/

# Evaluation
sltk evaluate preds.txt refs.txt --task translation -m bleu4 -m chrf
sltk evaluate pred.h5 ref.h5 --task production -m mpjpe -m pck
sltk evaluate preds.eaf refs.eaf --task segmentation

# ELAN utilities
sltk to-elan segments.json --video video.mp4 -o annotations.eaf
sltk from-elan annotations.eaf -o segments.json --tier Gloss
sltk info video_wilor.h5

# Web interface
sltk serve --host 0.0.0.0 --port 8000

See the full CLI documentation.

Tutorials & Notebooks

| # | Tutorial | Notebook | Description |
|---|---|---|---|
| 01 | Pose Extraction | Notebook | WiLoR hands, NLF body, TEASER face |
| 02 | Segmentation & Spotting | Notebook | Sign boundaries, dictionary matching |
| 03 | NMS & ELAN | Notebook | Non-manual signals, ELAN file assembly |
| 04 | Evaluation Metrics | Notebook | Translation, production, segmentation metrics |
| 05 | ELAN Files | | Reading, writing, merging annotation files |
| 06 | Datasets | | Loading and exploring sign language corpora |
| 07 | Concordance | | KWIC, n-grams, collocations |
| 08 | Feature Processing | | Pose features and normalization |

Model Weights

All weights auto-download from HuggingFace Hub and cache at ~/.cache/sltk/weights/.

| Model | Size | Env Override |
|---|---|---|
| WiLoR (hands) | ~2.5 GB | SLTK_WILOR_CHECKPOINT |
| NLF (body) | ~540 MB | SLTK_NLF_MODEL |
| TEASER (face) | ~350 MB | SLTK_TEASER_CHECKPOINT |
| SignRep (embedding) | ~350 MB | SLTK_SIGNREP_CHECKPOINT |
| Segmenter | ~180 MB | SLTK_SEGMENTOR_V2_CHECKPOINT |

Set SLTK_AUTO_DOWNLOAD=1 to skip the confirmation prompt. Set SLTK_WEIGHTS_DIR to override the cache location.
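The precedence between these variables is not spelled out above. One plausible reading, sketched here purely as an assumption: a per-model variable such as SLTK_WILOR_CHECKPOINT points directly at a checkpoint file and wins; otherwise the file is looked up under SLTK_WEIGHTS_DIR; otherwise under the default cache. `resolve_checkpoint`, `auto_download_enabled`, and the `wilor.ckpt` file name are hypothetical, and SLTK's actual lookup may differ.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

DEFAULT_CACHE = Path.home() / ".cache" / "sltk" / "weights"


def resolve_checkpoint(
    env_var: str, filename: str, env: Optional[Mapping[str, str]] = None
) -> Path:
    """Hypothetical lookup: per-model override > SLTK_WEIGHTS_DIR > default cache."""
    env = os.environ if env is None else env
    override = env.get(env_var)
    if override:
        return Path(override).expanduser()
    weights_dir = env.get("SLTK_WEIGHTS_DIR")
    base = Path(weights_dir).expanduser() if weights_dir else DEFAULT_CACHE
    return base / filename


def auto_download_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """True when SLTK_AUTO_DOWNLOAD=1, i.e. no confirmation prompt is wanted."""
    env = os.environ if env is None else env
    return env.get("SLTK_AUTO_DOWNLOAD") == "1"


print(resolve_checkpoint("SLTK_WILOR_CHECKPOINT", "wilor.ckpt", env={}))
print(resolve_checkpoint("SLTK_WILOR_CHECKPOINT", "wilor.ckpt",
                         env={"SLTK_WEIGHTS_DIR": "/data/weights"}))
```

Passing `env` explicitly keeps the lookup testable without touching the real process environment.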

Testing

pytest                       # Full suite (1697 tests)
pytest -m "not slow"         # Skip slow GPU/neural tests
pytest tests/test_metrics.py # Single module

Documentation

Full documentation at sign-language-toolkit.readthedocs.io

| Section | Content |
|---|---|
| Installation | Install options, GPU setup, weight management |
| Quick Start | First steps with SLTK |
| API Reference | Full Python API docs |
| Cropping | ROI and lip cropping |
| Corpus Database | SQLite corpus database |
| Linguistics | Phonology, NMS analysis, reliability |
| Glossing | Vocabulary management |
| NMS Detection | Non-manual signal detection API |
| Visualization | Skeleton and mesh overlays |
| CLI Reference | Command-line interface |
| REST API | FastAPI endpoint reference |

License

CC-BY-NC-ND-4.0 (Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International)

Third-Party Licenses

SLTK bundles or depends on models with their own license terms:

| Component | License | Link |
|---|---|---|
| MANO (hand model) | Non-commercial | mano.is.tue.mpg.de |
| SMPL-X (body model) | Non-commercial | smpl-x.is.tue.mpg.de |
| FLAME (face model) | Non-commercial / CC-BY-4.0 (2023+) | flame.is.tue.mpg.de |
| WiLoR | Apache 2.0 | Potamias et al., CVPR 2025 |
| TEASER | See repository | Liu et al., ICLR 2025 |
| NLF | MIT (non-commercial) | Sárándi & Pons-Moll, NeurIPS 2024 |

Citations & Acknowledgements

SLTK integrates several third-party models and methods. If you use these components in your research, please cite the original papers. SLTK does not claim authorship of these models — it provides a unified interface to run them together.

Sign Segmentation — He et al., FG 2025

The sign segmenter uses the Hands-On model for temporal sign boundary detection from hand pose features.

@inproceedings{he2025hands,
  title={Hands-On: Segmenting Individual Signs from Continuous Sequences},
  author={He, Low Jian and Walsh, Harry and Sincan, Ozge Mercanoglu and Bowden, Richard},
  booktitle={2025 IEEE 19th International Conference on Automatic Face and Gesture Recognition (FG)},
  pages={1--5},
  year={2025},
  organization={IEEE}
}

Gloss Spotting — Wong et al., ICCV 2025

SignRep provides self-supervised sign language representations used for dictionary-based gloss matching.

@inproceedings{wong2025signrep,
  title={SignRep: Enhancing Self-Supervised Sign Representations},
  author={Wong, Ryan and Camgoz, Necati Cihan and Bowden, Richard},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  pages={22804--22814},
  year={2025}
}

Face Reconstruction — Liu et al., ICLR 2025

TEASER provides FLAME-based face parameter extraction (jaw pose, expression, eyelid) used for non-manual signal detection.

@inproceedings{liu2025teaser,
  title={TEASER: Token Enhanced Spatial Modeling for Expressions Reconstruction},
  author={Liu, Yunfei and Zhu, Lei and Lin, Lijian and Zhu, Ye and Zhang, Ailing and Li, Yu},
  booktitle={International Conference on Learning Representations (ICLR)},
  year={2025}
}

Hand Reconstruction — Potamias et al., CVPR 2025

WiLoR provides end-to-end 3D hand localization and MANO parameter estimation from in-the-wild images.

@inproceedings{potamias2025wilor,
  title={WiLoR: End-to-end 3D Hand Localization and Reconstruction in-the-wild},
  author={Potamias, Rolandos Alexandros and Zhang, Jinglei and Deng, Jiankang and Zafeiriou, Stefanos},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2025}
}

Body Pose — Sárándi & Pons-Moll, NeurIPS 2024

NLF estimates continuous 3D human body pose and shape (SMPL-X parameters, including eye gaze).

@inproceedings{sarandi2024nlf,
  title={Neural Localizer Fields for Continuous 3D Human Pose and Shape Estimation},
  author={S{\'a}r{\'a}ndi, Istv{\'a}n and Pons-Moll, Gerard},
  booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
  year={2024}
}

Pose Estimation — Jiang et al., 2023

RTMPose provides real-time multi-person whole-body keypoint detection (133 COCO-WholeBody landmarks).

@article{jiang2023rtmpose,
  title={RTMPose: Real-Time Multi-Person Pose Estimation based on MMPose},
  author={Jiang, Tao and Lu, Peng and Zhang, Li and Ma, Ningsheng and Han, Rui and Lyu, Chengqi and Li, Yining and Chen, Kai},
  journal={arXiv preprint arXiv:2303.07399},
  year={2023}
}



Download files


Source Distribution

signlangtk-0.1.12.tar.gz (1.3 MB view details)

Uploaded Source

Built Distribution


signlangtk-0.1.12-py3-none-any.whl (1.2 MB view details)

Uploaded Python 3

File details

Details for the file signlangtk-0.1.12.tar.gz.

File metadata

  • Download URL: signlangtk-0.1.12.tar.gz
  • Upload date:
  • Size: 1.3 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for signlangtk-0.1.12.tar.gz
Algorithm Hash digest
SHA256 3df5b9f0079c8156373e011f476d35dcba6a28e08034211ac9357713587cbd88
MD5 4d9ee6710513b7cba2f41322322e5558
BLAKE2b-256 fe69c753e979cec960eddf90a54c56ec958f7eb79b0ba417c12b989f54a5359c


Provenance

The following attestation bundles were made for signlangtk-0.1.12.tar.gz:

Publisher: publish.yml on ed-fish/Sign-Language-Toolkit

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file signlangtk-0.1.12-py3-none-any.whl.

File metadata

  • Download URL: signlangtk-0.1.12-py3-none-any.whl
  • Upload date:
  • Size: 1.2 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for signlangtk-0.1.12-py3-none-any.whl
Algorithm Hash digest
SHA256 f7ce7b1cfb69d1db3a4e8affc2b2700aba66630577df690b65cc95bbf62448a1
MD5 d18876094c877933af119d5c1a208c36
BLAKE2b-256 d4cf8fdd71d187a9b1b0876d9481ab7d201c7a24bcc209ad52b0c5469a287886


Provenance

The following attestation bundles were made for signlangtk-0.1.12-py3-none-any.whl:

Publisher: publish.yml on ed-fish/Sign-Language-Toolkit

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
