Sign Language Toolkit for sign language research

Sign Language Toolkit (SLTK)

A research toolkit for sign language video analysis. SLTK provides a complete pipeline from raw video to rich, multi-tier ELAN annotation files: 3D hand reconstruction, full-body pose, face tracking, automatic sign segmentation, gloss spotting, and non-manual signal detection.

Video (.mp4)
    │
    ├─ WiLoR ──────► 3D hand keypoints + MANO rotations   (_wilor.h5)
    ├─ NLF ────────► Full-body SMPL-X pose                 (_nlf.h5)
    ├─ TEASER ─────► FLAME face parameters                 (_teaser.h5)
    │
    ├─ Segmenter ──► Sign boundaries (BIO labels)
    ├─ SignRep ────► Gloss spotting (dictionary matching)
    ├─ NMS ────────► Blinks, nods, shakes, mouth, gaze
    │
    └─ All results ► Multi-tier ELAN file (.eaf)

Installation

# Core library (ELAN I/O, CLI, corpus tools)
pip install signlangtk

# With web interface
pip install "signlangtk[api]"

# With hand extraction (WiLoR) — requires CUDA
pip install "signlangtk[wilor]"

# Full body (NLF) + face (TEASER)
pip install "signlangtk[nlf,teaser]"

# Everything (excludes rtmpose/smplfx which need mmcv via mim)
pip install "signlangtk[all]"

The PyPI package is signlangtk; the Python import name is sltk:

import sltk
from sltk.extraction.wilor import WiLoRExtractor
from sltk.extraction.nlf import NLFExtractor
from sltk.extraction.teaser import TeaserExtractor

Requires Python 3.10+. For development from source:

git clone https://github.com/ed-fish/Sign-Language-Toolkit.git
cd Sign-Language-Toolkit
pip install -e ".[dev,api]"

Quick Start: Video to Multi-Tier ELAN

This end-to-end example takes a single video and produces an ELAN file with sign boundaries and non-manual signals on separate tiers. Gloss spotting is covered in its own section below.

from sltk.extraction.wilor import WiLoRExtractor
from sltk.extraction.teaser import TeaserExtractor
from sltk.segmentation.runner import get_runner
from sltk.segmentation.h5_loader import h5_to_features
from sltk.segmentation.postprocess import extract_segments
from sltk.nms.runner import detect_nms
from sltk.io.elan_roundtrip import ElanDocument

VIDEO = "recording.mp4"
FPS = 25.0

# ── Step 1: Extract poses ───────────────────────────────────────────
# Hands (WiLoR → MANO 3D)
with WiLoRExtractor() as ext:
    ext.load_model()
    ext.extract_from_video(VIDEO, "recording_wilor.h5")

# Face (TEASER → FLAME)
with TeaserExtractor() as ext:
    ext.load_model()
    ext.extract_from_video(VIDEO, "recording_teaser.h5")

# ── Step 2: Segment signs ──────────────────────────────────────────
features = h5_to_features("recording_wilor.h5")   # (T, 192)
runner = get_runner()
labels = runner.predict(features)                  # 0=OUT, 1=IN, 2=BEGIN
segments = extract_segments(labels)                # [(start, end), ...]

# ── Step 3: Detect non-manual signals ──────────────────────────────
blinks, nms_events, quality = detect_nms(
    "recording_teaser.h5",
    detectors={"all"},
)

# ── Step 4: Build multi-tier ELAN file ─────────────────────────────
doc = ElanDocument.new(video_path=VIDEO)

# Add segmentation tier
doc.add_tier("Segmentation")
for start_frame, end_frame in segments:
    doc.add_segment("Segmentation", start_frame / FPS, end_frame / FPS, "SIGN")

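# Add NMS tiers (blinks, head movements, mouth, eyebrows, gaze, squint).
# One tier per detector, so every ev.tier emitted with detectors={"all"} exists.
for tier_name in ["BLINK", "HEAD-NOD", "HEAD-SHAKE", "HEAD-TILT",
                  "MOUTH-MOVEMENT", "EYEBROW-RAISE", "EYE-GAZE", "EYE-SQUINT"]:
    doc.add_tier(tier_name)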

for b in blinks:
    doc.add_segment("BLINK", b.start_frame / FPS, b.end_frame / FPS, "blink")

for ev in nms_events:
    doc.add_segment(ev.tier, ev.start_frame / FPS, ev.end_frame / FPS, ev.label)

doc.save("recording_full.eaf")

Open recording_full.eaf in ELAN to see all tiers aligned with the video.


Extraction: Hands, Body, and Face

All extractors share the same interface: load_model(), extract_from_video(), and process_batch(). Weights are downloaded automatically from the Hugging Face Hub on first use.

WiLoR — 3D Hand Reconstruction

Produces 21 keypoints per hand + MANO rotation matrices per frame.

from sltk.extraction.wilor import WiLoRExtractor, WiLoRConfig

config = WiLoRConfig(
    device="cuda:0",
    img_batch_size=128,
    rescale_factor=2.0,
)
with WiLoRExtractor(config) as ext:
    ext.load_model()
    result = ext.extract_from_video("video.mp4", "video_wilor.h5")

Citation required. If you use WiLoR hand extraction you must cite:

@inproceedings{potamias2024wilor,
    title     = {{WiLoR}: End-to-end 3D Hand Localization and Reconstruction in-the-wild},
    author    = {Potamias, Rolandos Alexandros and Ploumpis, Stylianos and Moschoglou, Stylianos and Triantafyllou, Vasileios and Zafeiriou, Stefanos},
    booktitle = {European Conference on Computer Vision (ECCV)},
    year      = {2024}
}

Output H5 structure:

video_wilor.h5
├── attrs: fps, num_frames, resolution
├── frame_idx      (num_frames, 2)           # sparse: (start_idx, count)
├── kpts_3d        (num_detections, 21, 3)   # 3D hand keypoints
├── right          (num_detections,)          # True = right hand
└── mano/
    ├── hand_pose      (num_detections, 15, 3, 3)  # joint rotations
    └── global_orient  (num_detections, 1, 3, 3)    # wrist rotation
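
The frame_idx dataset is what links frames to detections. The following is a minimal sketch of that lookup, assuming each row really is (start_idx, count) into the per-detection arrays as the comment above states. The arrays here are toy stand-ins with made-up values; with h5py they would be read from the file (for example f["frame_idx"][:]). The helper name hands_in_frame is hypothetical and not part of SLTK.

```python
import numpy as np

# Toy stand-ins for the datasets listed above (values are made up).
frame_idx = np.array([[0, 2], [2, 0], [2, 1]])   # (num_frames, 2): (start_idx, count)
kpts_3d = np.zeros((3, 21, 3))                    # (num_detections, 21, 3)
right = np.array([True, False, True])             # (num_detections,)

def hands_in_frame(t):
    """Return keypoints and handedness flags for all detections in frame t."""
    start, count = frame_idx[t]
    sl = slice(start, start + count)
    return kpts_3d[sl], right[sl]

kpts, is_right = hands_in_frame(0)       # frame 0 has two detections
empty_kpts, _ = hands_in_frame(1)        # frame 1 has none
```

A frame with count 0 yields empty arrays rather than an error, which keeps per-frame loops simple.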

NLF — Full-Body SMPL-X

Produces 55 SMPL-X joints (body + hands + face landmarks) per frame.

from sltk.extraction.nlf import NLFExtractor, NLFConfig

with NLFExtractor(NLFConfig(device="cuda:0")) as ext:
    ext.load_model()
    result = ext.extract_from_video("video.mp4", "video_nlf.h5")

TEASER — FLAME Face Parameters

Produces FLAME 3D face parameters: jaw pose, expression coefficients, shape, eyelid state, and head pose per frame. This is what drives NMS detection.

from sltk.extraction.teaser import TeaserExtractor, TeaserConfig

with TeaserExtractor(TeaserConfig(device="cuda:0")) as ext:
    ext.load_model()
    result = ext.extract_from_video("video.mp4", "video_teaser.h5")

Citation required. If you use TEASER face extraction you must cite:

@article{liu2025teaser,
    title   = {Teaser: Token Enhanced Spatial Modeling for Expressions Reconstruction},
    author  = {Liu, Yunfei and Zhu, Lei and Lin, Lijian and Zhu, Ye and Zhang, Ailing and Li, Yu},
    journal = {arXiv preprint arXiv:2502.10982},
    year    = {2025}
}

Batch Processing

All extractors support batch processing over a directory:

from pathlib import Path
from sltk.extraction.wilor import WiLoRExtractor

with WiLoRExtractor() as ext:
    ext.load_model()
    results = ext.process_batch(
        video_paths=list(Path("videos/").glob("*.mp4")),
        output_dir=Path("poses/"),
        skip_existing=True,
    )
    for path, result in results.items():
        print(f"{path}: {result.num_frames} frames, {result.num_detections} detections")

Model Weights

All weights are downloaded automatically from the Hugging Face Hub and cached at ~/.cache/sltk/weights/. Override with environment variables:

Variable                      Model
SLTK_WILOR_CHECKPOINT         WiLoR hand model
SLTK_NLF_MODEL                NLF body model
SLTK_TEASER_CHECKPOINT        TEASER face model
SLTK_SIGNREP_CHECKPOINT       SignRep embedding model
SLTK_SEGMENTOR_V2_CHECKPOINT  Segmenter model
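
One plausible reading of the override rule is sketched below: an explicit per-model variable wins, otherwise the file is looked up in the cache directory. This is an assumption about precedence, not SLTK's actual resolver. The function name resolve_weights, the file name wilor.ckpt, and the example paths are hypothetical; SLTK_WEIGHTS_DIR is the cache override listed under Configuration.

```python
import os
from pathlib import Path

def resolve_weights(env_var, filename, environ=os.environ):
    """Pick a checkpoint path: explicit env override first, then the cache dir."""
    explicit = environ.get(env_var)
    if explicit:
        return Path(explicit)
    cache = Path(environ.get("SLTK_WEIGHTS_DIR", "~/.cache/sltk/weights/")).expanduser()
    return cache / filename

# Hypothetical environments, passed in explicitly so the real one is untouched.
overridden = resolve_weights(
    "SLTK_WILOR_CHECKPOINT", "wilor.ckpt",
    {"SLTK_WILOR_CHECKPOINT": "/models/wilor_custom.ckpt"},
)
cached = resolve_weights(
    "SLTK_WILOR_CHECKPOINT", "wilor.ckpt",
    {"SLTK_WEIGHTS_DIR": "/scratch/sltk_weights"},
)
```

In practice the variables would be exported in the shell or set in os.environ before the extractor is constructed.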

Segmentation: Finding Sign Boundaries

The segmenter is a 4-layer Transformer that reads WiLoR hand features and predicts per-frame BIO labels (0=OUT, 1=IN_SIGN, 2=BEGIN).

Citation required. If you use the sign segmentation model you must cite:

@inproceedings{he2025improving,
    title     = {Improving Continuous Sign Language Recognition with Adapted Image Models},
    author    = {He, Lianyu and Tian, Haocong and Fan, Shujing and Woll, Bencie and Bowden, Richard},
    booktitle = {IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    year      = {2025}
}

From WiLoR H5

from sltk.segmentation.runner import get_runner, segment_h5
from sltk.segmentation.output import OutputFormat
from sltk.segmentation.h5_loader import h5_to_features
from sltk.segmentation.postprocess import extract_segments

# High-level: segment a file → ELAN or JSON
segment_h5(
    "video_wilor.h5",
    output_path="video_segments.eaf",
    output_format=OutputFormat.ELAN,
    fps=25.0,
    media_path="video.mp4",
)

# Low-level: get raw predictions
features = h5_to_features("video_wilor.h5")  # (T, 192)
runner = get_runner()
labels = runner.predict(features)             # (T,) values 0/1/2
segments = extract_segments(labels)           # [(start_frame, end_frame), ...]
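
To make the label convention concrete, here is a minimal sketch of how per-frame labels could be decoded into segments. It is not the implementation of extract_segments; the function name decode_bio, the inclusive end frame, and the rule that IN without a preceding BEGIN still opens a segment are all assumptions.

```python
def decode_bio(labels):
    """Turn per-frame labels (0=OUT, 1=IN_SIGN, 2=BEGIN) into (start, end) pairs.

    Frame indices are inclusive on both ends (an assumption of this sketch).
    """
    segments = []
    start = None
    for i, lab in enumerate(labels):
        if lab == 2:
            # BEGIN closes any open segment and opens a new one.
            if start is not None:
                segments.append((start, i - 1))
            start = i
        elif lab == 1:
            # IN continues a segment, or opens one if BEGIN was missed.
            if start is None:
                start = i
        else:
            # OUT closes any open segment.
            if start is not None:
                segments.append((start, i - 1))
                start = None
    if start is not None:
        segments.append((start, len(labels) - 1))
    return segments

labels = [0, 2, 1, 1, 0, 0, 2, 1, 2, 1, 1]
segments = decode_bio(labels)   # [(1, 3), (6, 7), (8, 10)]
```

The BEGIN label is what lets two back-to-back signs (frames 6-7 and 8-10 here) be separated without an OUT frame between them.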

Gloss Spotting: Matching Signs to a Dictionary

SignRep extracts 768-dim visual features from 16-frame sliding windows, then matches each detected segment against a dictionary of known sign features using cosine similarity.

Citation required. If you use SignRep gloss spotting you must cite:

@inproceedings{wong2025signrep,
    title     = {SignRep: Enhancing Self-supervised Sign Representations},
    author    = {Wong, Mathew and Fish, Ed and Sherrah, Jamie and Sherwood, Thomas and Sherwood, Nathan},
    booktitle = {IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    year      = {2025}
}

from sltk.embedding.pipeline import SignRepPipeline

pipeline = SignRepPipeline()

# Extract dense features from the full video
continuous = pipeline.extract_continuous("video.mp4", stride=4)

# Load your sign dictionary (folder of .npz files, one per sign)
dictionary = pipeline.load_dictionary(["/data/dictionaries/bsldict/signrep/"])

# Match segments to dictionary entries
result = pipeline.spot(
    features=continuous,
    segments=[{"segment_id": 0, "start_frame": 12, "end_frame": 45}],
    dictionary=dictionary,
    top_k=10,
)

for seg in result.segments:
    print(f"Segment {seg.start_ms}ms-{seg.end_ms}ms:")
    for gl in seg.top_glosses:
        print(f"  {gl['gloss']} ({gl['similarity']:.3f})")

# Save as ELAN (creates Rank-1..N and Score-1..N tiers)
result.save_eaf("video_spotted.eaf", media_path="video.mp4")
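
The matching step can be illustrated independently of SLTK. The sketch below ranks dictionary entries by cosine similarity to a single 768-dim query vector. How SLTK pools the sliding-window features of a segment into a query is not covered here, and top_k_glosses, the random vectors, and the gloss names are hypothetical.

```python
import numpy as np

def top_k_glosses(query, dictionary, k=3):
    """Rank dictionary glosses by cosine similarity to the query vector."""
    names = list(dictionary)
    mat = np.stack([dictionary[n] for n in names])
    mat = mat / np.linalg.norm(mat, axis=1, keepdims=True)   # L2-normalize rows
    q = query / np.linalg.norm(query)
    sims = mat @ q                                           # cosine similarities
    order = np.argsort(-sims)[:k]                            # best first
    return [(names[i], float(sims[i])) for i in order]

rng = np.random.default_rng(0)
dictionary = {g: rng.normal(size=768) for g in ["HELLO", "WORLD", "THANKS"]}
# A noisy copy of the WORLD entry stands in for a segment's feature vector.
query = dictionary["WORLD"] + 0.1 * rng.normal(size=768)
ranked = top_k_glosses(query, dictionary, k=2)
```

Because the vectors are normalized first, the dot product is the cosine similarity, so scores are comparable across dictionary entries of different magnitude.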

Building a Dictionary

Before spotting, build a dictionary from isolated sign videos:

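from sltk.embedding.pipeline import SignRepPipeline
pipeline = SignRepPipeline()
result = pipeline.extract_dictionary("isolated_sign.mp4", method="middle")
result.save_npz("dictionary/HELLO.npz")  # one .npz per sign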

Non-Manual Signals: Face and Head Analysis

Detect blinks, head nods/shakes/tilts, mouth movements, eyebrow raises, gaze direction, and eye squints from TEASER face-tracking data.

from sltk.nms.runner import detect_nms, export_results

# Detect all NMS events from a TEASER H5 file
blinks, nms_events, quality = detect_nms(
    "video_teaser.h5",
    detectors={"all"},           # or specific: {"blink", "nod", "mouth"}
    smpl_path="video_nlf.h5",   # optional: enables gaze detection
)

# Export to ELAN (one tier per signal type)
export_results(
    blinks, nms_events, "video_teaser.h5",
    output_dir="output/",
    formats=["elan", "json", "csv"],
    media_path="video.mp4",
)

Available Detectors

Detector  ELAN Tier       Signal                       Input
blink     BLINK           Eye closures                 TEASER eyelid params
nod       HEAD-NOD        Vertical head oscillation    TEASER head pitch
shake     HEAD-SHAKE      Horizontal head oscillation  TEASER head yaw
tilt      HEAD-TILT       Side-to-side head tilt       TEASER head roll
mouth     MOUTH-MOVEMENT  Lip/mouth movement           FLAME expression
eyebrow   EYEBROW-RAISE   Eyebrow raise/furrow         FLAME expression
gaze      EYE-GAZE        Gaze direction               NLF eye pose
squint    EYE-SQUINT      Partial eye closure          TEASER eyelid
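
Each detector above turns a per-frame signal into timed events. As a rough illustration of that idea only, the sketch below thresholds a per-frame eyelid-closure value and keeps runs that last long enough. It is not SLTK's blink detector; threshold_events, the 0.5 threshold, the minimum duration, and the sample values are all hypothetical.

```python
def threshold_events(signal, threshold=0.5, min_frames=2):
    """Return (start_frame, end_frame) runs where signal >= threshold.

    End frames are inclusive; runs shorter than min_frames are discarded.
    """
    events, start = [], None
    for i, value in enumerate(signal):
        if value >= threshold and start is None:
            start = i                                  # run opens
        elif value < threshold and start is not None:
            if i - start >= min_frames:
                events.append((start, i - 1))          # run closes
            start = None
    if start is not None and len(signal) - start >= min_frames:
        events.append((start, len(signal) - 1))        # run reaches the end
    return events

closure = [0.0, 0.1, 0.8, 0.9, 0.7, 0.1, 0.6, 0.0, 0.9, 0.9]
events = threshold_events(closure)   # [(2, 4), (8, 9)]
```

The single-frame spike at index 6 is dropped by the minimum-duration rule, which is the usual way to suppress tracking jitter.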

Feature Representations

SLTK computes several feature representations from the extracted H5 pose data.

WiLoR Segmenter Features (192-dim)

MANO rotation matrices converted to axis-angle, both hands concatenated. Used by the Transformer segmenter.

from sltk.segmentation.h5_loader import h5_to_features
features = h5_to_features("video_wilor.h5")  # (T, 192)
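
For readers unfamiliar with the conversion mentioned above, here is a minimal sketch of turning one 3x3 rotation matrix into an axis-angle vector. It covers only the per-joint conversion, not how h5_to_features assembles its output, and it is not SLTK's code. The function name is hypothetical, and rotations close to 180 degrees are numerically unstable with this formula.

```python
import numpy as np

def rotmat_to_axis_angle(R):
    """Convert a 3x3 rotation matrix to a 3-vector (unit axis times angle in radians)."""
    cos_angle = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    angle = np.arccos(cos_angle)
    if angle < 1e-8:
        return np.zeros(3)                      # identity rotation
    # The skew-symmetric part of R encodes 2*sin(angle) times the axis.
    axis = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    return axis / (2.0 * np.sin(angle)) * angle

# 90-degree rotation about the z axis.
Rz = np.array([[0.0, -1.0, 0.0],
               [1.0,  0.0, 0.0],
               [0.0,  0.0, 1.0]])
aa = rotmat_to_axis_angle(Rz)
```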

Angle Features (104-dim)

Body joint angles + hand Euler angles from MANO rotations.

from sltk.processing.features import compute_angle_features
angles = compute_angle_features(body_poses, right_hand, left_hand)  # (T, 104)
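
As an illustration of what a body joint angle is, the sketch below computes the angle at a middle joint from three 3D points. Which joints compute_angle_features uses, and how it parameterizes them, is not specified here; joint_angle and the coordinates are hypothetical.

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle in radians at point b between the segments b->a and b->c."""
    u = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    v = np.asarray(c, dtype=float) - np.asarray(b, dtype=float)
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))   # clip guards rounding error

# A right-angle bend at the elbow.
shoulder, elbow, wrist = [0, 0, 0], [1, 0, 0], [1, 1, 0]
angle = joint_angle(shoulder, elbow, wrist)
```

Angles of this kind are invariant to where the signer stands in the frame, which is one reason they are often preferred over raw coordinates.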

HaMeR Features (288-dim)

Flattened MANO rotation matrices for both hands.

from sltk.processing.features import compute_hamer_features
hamer = compute_hamer_features(
    mano_global_orient_right, mano_hand_pose_right,
    mano_global_orient_left, mano_hand_pose_left,
)  # (T, 288)
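
The 288 dimensions follow from the MANO shapes: 1 wrist rotation plus 15 joint rotations per hand, 9 values per 3x3 matrix, 2 hands, so 16 x 9 x 2 = 288. A minimal sketch of that flattening is shown below, assuming per-frame arrays shaped like the global_orient and hand_pose datasets in the WiLoR H5 file. The concatenation order (wrist first, right hand first) is an assumption and flatten_hand is hypothetical.

```python
import numpy as np

def flatten_hand(global_orient, hand_pose):
    """Stack wrist and joint rotations, then flatten to (T, 144)."""
    rots = np.concatenate([global_orient, hand_pose], axis=1)   # (T, 16, 3, 3)
    return rots.reshape(rots.shape[0], -1)

T = 5
eye = np.eye(3)
global_orient = np.tile(eye, (T, 1, 1, 1))    # (T, 1, 3, 3)
hand_pose = np.tile(eye, (T, 15, 1, 1))       # (T, 15, 3, 3)

right = flatten_hand(global_orient, hand_pose)
left = flatten_hand(global_orient, hand_pose)
features = np.concatenate([right, left], axis=1)   # (T, 288)
```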

SignRep Embeddings (768-dim)

Dense visual features from the SignRep ViT model. 16-frame windows, L2-normalized.

from sltk.embedding.pipeline import SignRepPipeline
pipeline = SignRepPipeline()
continuous = pipeline.extract_continuous("video.mp4", stride=4)
# continuous.features: (num_windows, 768)
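
The number of windows depends on how they are laid out over the video. A minimal sketch, assuming windows start at frame 0, advance by the stride, and that a trailing partial window is dropped (SLTK may pad or handle the tail differently); window_starts is a hypothetical helper.

```python
def window_starts(num_frames, window=16, stride=4):
    """Start frames of all full windows that fit in the video."""
    if num_frames < window:
        return []
    return list(range(0, num_frames - window + 1, stride))

starts = window_starts(32)            # [0, 4, 8, 12, 16]
num_windows = len(window_starts(100))
```

Under these assumptions a 100-frame clip yields (100 - 16) // 4 + 1 = 22 windows, and window i covers frames starts[i] to starts[i] + 15.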

Combined Features from H5

from sltk.processing.features import load_features_from_nlf_wilor
angles, hamer = load_features_from_nlf_wilor("video_nlf.h5", "video_wilor.h5")

ELAN File I/O

Writing a New ELAN File

from sltk.io.elan_roundtrip import ElanDocument

doc = ElanDocument.new(video_path="video.mp4")
doc.add_tier("Gloss")
doc.add_tier("NMS")
doc.add_segment("Gloss", 0.0, 1.5, "HELLO")
doc.add_segment("Gloss", 1.5, 3.0, "WORLD")
doc.add_segment("NMS", 0.2, 0.8, "nod")
doc.save("output.eaf")

Reading and Modifying Existing ELAN Files

from sltk.io.elan_roundtrip import ElanDocument

doc = ElanDocument.open("annotations.eaf")
tiers = doc.get_tiers()         # list of TierInfo
segments = doc.get_segments()   # list of SegmentInfo (all tiers)

# Add new annotations from pipeline results
doc.add_tier("AutoSegmentation")
doc.add_segment("AutoSegmentation", 1.0, 2.5, "SIGN")
doc.save()  # preserves all original XML structure

Simple Read/Write

from sltk.io import read_eaf, write_eaf
from sltk.data import Segment, SegmentList

# Read
segments = read_eaf("annotations.eaf", tiers=["Gloss"])

# Write
new_segments = SegmentList([
    Segment(start=0.0, end=1.5, label="HELLO", tier="Gloss"),
    Segment(start=1.5, end=3.0, label="WORLD", tier="Gloss"),
])
write_eaf(new_segments, "output.eaf", video_path="video.mp4")

Web Interface

SLTK includes a React frontend for browsing workspaces, running processing jobs, and exploring corpus data.

# Development: FastAPI (port 8000) + Vite (port 5173)
bash scripts/run_dev.sh

# Production
cd frontend && npm ci && npm run build && cd ..
sltk serve --host 0.0.0.0 --port 8000

Route        Page        Purpose
/            Workspaces  Create/switch workspaces, scan directories
/process     Process     Submit segmentation and spotting jobs
/explore     Explore     Search glosses, view video clips, corpus statistics
/viewer      Viewer      Video playback with annotation overlay
/analysis/*  Analysis    Vocabulary, concordance, n-grams, collocations, durations

CLI Reference

sltk convert input.npy output.h5 --from mediapipe --to wilor --fps 25
sltk evaluate predictions.txt references.txt --task translation
sltk to-elan segments.json --video source.mp4 --output annotations.eaf
sltk from-elan annotations.eaf --output segments.json --tier Gloss
sltk info video_wilor.h5
sltk serve --host 0.0.0.0 --port 8000 --reload

Configuration

Variable            Description                     Default
SLTK_CORS_ORIGINS   Allowed CORS origins            http://localhost:5173,http://localhost:3000
SLTK_ALLOWED_PATHS  Filesystem whitelist for API    /vol/research,/home
SLTK_WEIGHTS_DIR    Override weight cache location  ~/.cache/sltk/weights/
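
A filesystem whitelist of this kind is typically enforced by resolving the requested path and checking that it sits under one of the allowed roots. The sketch below shows that check using the default value from the table. It is not SLTK's implementation; is_allowed is hypothetical, and the comma-separated format is inferred from the default shown.

```python
from pathlib import Path

def is_allowed(path, allowed_roots):
    """True if the resolved path lies inside one of the allowed root directories."""
    resolved = Path(path).resolve()          # collapses ".." and follows symlinks
    return any(resolved.is_relative_to(Path(root).resolve())
               for root in allowed_roots)

roots = "/vol/research,/home".split(",")

ok = is_allowed("/home/user/video.mp4", roots)
escaped = is_allowed("/home/../etc/passwd", roots)   # ".." climbs out of /home
```

Resolving before comparing is the important step: a plain string prefix check would accept the second path.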

Supported Pose Formats

Format        Extractor           Keypoints     Description
WiLoR         WiLoRExtractor      21 per hand   MANO 3D hand mesh with rotation matrices
NLF/SMPL-X    NLFExtractor        55 joints     Full body with axis-angle rotations
TEASER/FLAME  TeaserExtractor     FLAME params  Face parameters: jaw, expression, shape, eyelid
MediaPipe     MediaPipeExtractor  33+42+468     Fast 2D/3D holistic landmarks
RTMPose       RTMPoseExtractor    133           COCO-WholeBody, multi-person

All formats are stored as HDF5 (.h5) files.

Testing

pytest                       # Full suite (1500+ tests)
pytest -m "not slow"         # Skip slow tests
pytest -m api                # API tests only

License

This project is licensed under CC-BY-NC-ND-4.0 (Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International).

Third-Party Licenses

SLTK bundles or depends on models that have their own license terms. By using SLTK you agree to comply with all applicable licenses:

Component                   License                         Link
MANO (hand model)           MANO License (non-commercial)   https://mano.is.tue.mpg.de/license.html
SMPL / SMPL-X (body model)  SMPL License (non-commercial)   https://smpl.is.tue.mpg.de/license.html
FLAME (face model)          FLAME License (non-commercial)  https://flame.is.tue.mpg.de/license.html
WiLoR                       Apache 2.0                      [Potamias et al., 2024]
TEASER                      See paper                       [Liu et al., 2025]
NLF                         See repository                  [Sarandi et al.]

Please refer to each project's license before using their models in your work.
