Sign Language Toolkit for sign language research

Sign Language Toolkit (SLTK)

A research toolkit for sign language video analysis. SLTK provides a complete pipeline from raw video to rich, multi-tier ELAN annotation files: 3D hand reconstruction, full-body pose, face tracking, automatic sign segmentation, gloss spotting, and non-manual signal detection.

Video (.mp4)
    │
    ├─ WiLoR ──────► 3D hand keypoints + MANO rotations   (_wilor.h5)
    ├─ NLF ────────► Full-body SMPL-X pose                 (_nlf.h5)
    ├─ TEASER ─────► FLAME face parameters                 (_teaser.h5)
    │
    ├─ Segmenter ──► Sign boundaries (BIO labels)
    ├─ SignRep ────► Gloss spotting (dictionary matching)
    ├─ NMS ────────► Blinks, nods, shakes, mouth, gaze
    │
    └─ All results ► Multi-tier ELAN file (.eaf)

Installation

# Core library (ELAN I/O, CLI, corpus tools)
pip install signlangtk

# With web interface
pip install "signlangtk[api]"

# With hand extraction (WiLoR) — requires CUDA
pip install "signlangtk[wilor]"

# Full body (NLF) + face (TEASER)
pip install "signlangtk[nlf,teaser]"

# Everything (excludes rtmpose/smplfx which need mmcv via mim)
pip install "signlangtk[all]"

The PyPI package is named signlangtk; the Python import name is sltk:

import sltk
from sltk.extraction.wilor import WiLoRExtractor
from sltk.extraction.nlf import NLFExtractor
from sltk.extraction.teaser import TeaserExtractor

Requires Python 3.10+. For development from source:

git clone https://github.com/ed-fish/Sign-Language-Toolkit.git
cd Sign-Language-Toolkit
pip install -e ".[dev,api]"

Quick Start: Video to Multi-Tier ELAN

This end-to-end example takes a single video and produces an ELAN file with sign boundaries and non-manual signals on separate tiers. Gloss spotting is covered in its own section below.

from sltk.extraction.wilor import WiLoRExtractor
from sltk.extraction.teaser import TeaserExtractor
from sltk.segmentation.runner import get_runner
from sltk.segmentation.h5_loader import h5_to_features
from sltk.segmentation.postprocess import extract_segments
from sltk.nms.runner import detect_nms
from sltk.io.elan_roundtrip import ElanDocument

VIDEO = "recording.mp4"
FPS = 25.0

# ── Step 1: Extract poses ───────────────────────────────────────────
# Hands (WiLoR → MANO 3D)
with WiLoRExtractor() as ext:
    ext.load_model()
    ext.extract_from_video(VIDEO, "recording_wilor.h5")

# Face (TEASER → FLAME)
with TeaserExtractor() as ext:
    ext.load_model()
    ext.extract_from_video(VIDEO, "recording_teaser.h5")

# ── Step 2: Segment signs ──────────────────────────────────────────
features = h5_to_features("recording_wilor.h5")   # (T, 192)
runner = get_runner()
labels = runner.predict(features)                  # 0=OUT, 1=IN_SIGN, 2=BEGIN
segments = extract_segments(labels)                # [(start, end), ...]

# ── Step 3: Detect non-manual signals ──────────────────────────────
blinks, nms_events, quality = detect_nms(
    "recording_teaser.h5",
    detectors={"all"},
)

# ── Step 4: Build multi-tier ELAN file ─────────────────────────────
doc = ElanDocument.new(video_path=VIDEO)

# Add segmentation tier
doc.add_tier("Segmentation")
for start_frame, end_frame in segments:
    doc.add_segment("Segmentation", start_frame / FPS, end_frame / FPS, "SIGN")

# Add NMS tiers (blinks, head movements, mouth, eyebrows, gaze, squint).
# EYE-GAZE stays empty here: gaze detection needs an NLF file (smpl_path).
for tier_name in ["BLINK", "HEAD-NOD", "HEAD-SHAKE", "HEAD-TILT",
                  "MOUTH-MOVEMENT", "EYEBROW-RAISE", "EYE-GAZE", "EYE-SQUINT"]:
    doc.add_tier(tier_name)

for b in blinks:
    doc.add_segment("BLINK", b.start_frame / FPS, b.end_frame / FPS, "blink")

for ev in nms_events:
    doc.add_segment(ev.tier, ev.start_frame / FPS, ev.end_frame / FPS, ev.label)

doc.save("recording_full.eaf")

Open recording_full.eaf in ELAN to see all tiers aligned with the video.
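
The quick start divides frame indices by a fixed FPS to get seconds. Other parts of the toolkit report times in milliseconds (for example the start_ms and end_ms fields printed in the gloss spotting section), and EAF files store time slots as integer milliseconds. The helpers below are hypothetical, not part of sltk; they are a minimal sketch that keeps the conversion and rounding in one place, assuming a constant frame rate:

```python
def frame_to_ms(frame: int, fps: float) -> int:
    """Convert a frame index to integer milliseconds at a constant frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return round(frame * 1000.0 / fps)


def segments_to_ms(segments, fps):
    """Map [(start_frame, end_frame), ...] to [(start_ms, end_ms), ...]."""
    return [(frame_to_ms(s, fps), frame_to_ms(e, fps)) for s, e in segments]


example = segments_to_ms([(12, 45), (50, 80)], fps=25.0)
```

For variable-frame-rate video this approximation drifts, so per-frame timestamps from the container would be a safer source.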

Extraction: Hands, Body, and Face

All extractors share the same interface (load_model(), extract_from_video(), process_batch()) and are used as context managers in the examples below. Weights are downloaded automatically from the Hugging Face Hub on first use.

WiLoR — 3D Hand Reconstruction

Produces 21 keypoints per hand + MANO rotation matrices per frame.

from sltk.extraction.wilor import WiLoRExtractor, WiLoRConfig

config = WiLoRConfig(
    device="cuda:0",
    img_batch_size=128,
    rescale_factor=2.0,
)
with WiLoRExtractor(config) as ext:
    ext.load_model()
    result = ext.extract_from_video("video.mp4", "video_wilor.h5")

Citation required. If you use WiLoR hand extraction you must cite:

@inproceedings{potamias2024wilor,
    title     = {{WiLoR}: End-to-end 3D Hand Localization and Reconstruction in-the-wild},
    author    = {Potamias, Rolandos Alexandros and Ploumpis, Stylianos and Moschoglou, Stylianos and Triantafyllou, Vasileios and Zafeiriou, Stefanos},
    booktitle = {European Conference on Computer Vision (ECCV)},
    year      = {2024}
}

Output H5 structure:

video_wilor.h5
├── attrs: fps, num_frames, resolution
├── frame_idx      (num_frames, 2)           # sparse: (start_idx, count)
├── kpts_3d        (num_detections, 21, 3)   # 3D hand keypoints
├── right          (num_detections,)          # True = right hand
└── mano/
    ├── hand_pose      (num_detections, 15, 3, 3)  # joint rotations
    └── global_orient  (num_detections, 1, 3, 3)    # wrist rotation
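
The frame_idx comment above suggests a sparse layout: one (start_idx, count) row per frame, pointing into the per-detection arrays. The sketch below illustrates that reading with plain lists. It is an assumption based on the comment, not a verified description of the file format, and detection_slice is a hypothetical helper:

```python
def detection_slice(frame_idx, t):
    """Return the slice of detection rows that belong to frame t."""
    start, count = frame_idx[t]
    return slice(start, start + count)


# Toy data: frame 0 has two hands, frame 1 has none, frame 2 has one.
frame_idx = [(0, 2), (2, 0), (2, 1)]
right = [True, False, True]          # one flag per detection

hands_per_frame = [
    right[detection_slice(frame_idx, t)] for t in range(len(frame_idx))
]
```

If the layout assumption holds, the same slice can be applied to kpts_3d and the mano/ datasets to pull out all detections for one frame.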

NLF — Full-Body SMPL-X

Produces 55 SMPL-X joints (body, hands, jaw, and eyes) per frame.

from sltk.extraction.nlf import NLFExtractor, NLFConfig

with NLFExtractor(NLFConfig(device="cuda:0")) as ext:
    ext.load_model()
    result = ext.extract_from_video("video.mp4", "video_nlf.h5")

TEASER — FLAME Face Parameters

Produces FLAME 3D face parameters: jaw pose, expression coefficients, shape, eyelid state, and head pose per frame. This is what drives NMS detection.

from sltk.extraction.teaser import TeaserExtractor, TeaserConfig

with TeaserExtractor(TeaserConfig(device="cuda:0")) as ext:
    ext.load_model()
    result = ext.extract_from_video("video.mp4", "video_teaser.h5")

Citation required. If you use TEASER face extraction you must cite:

@article{liu2025teaser,
    title   = {Teaser: Token Enhanced Spatial Modeling for Expressions Reconstruction},
    author  = {Liu, Yunfei and Zhu, Lei and Lin, Lijian and Zhu, Ye and Zhang, Ailing and Li, Yu},
    journal = {arXiv preprint arXiv:2502.10982},
    year    = {2025}
}

Batch Processing

All extractors support batch processing over a directory:

from pathlib import Path
from sltk.extraction.wilor import WiLoRExtractor

with WiLoRExtractor() as ext:
    ext.load_model()
    results = ext.process_batch(
        video_paths=list(Path("videos/").glob("*.mp4")),
        output_dir=Path("poses/"),
        skip_existing=True,
    )
    for path, result in results.items():
        print(f"{path}: {result.num_frames} frames, {result.num_detections} detections")
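
To preview which videos a run with skip_existing=True would still need to process, one option is to check for the expected output files first. The sketch below assumes outputs are named <video stem>_wilor.h5 inside output_dir, mirroring the file names used elsewhere in this README; the naming rule and the pending_videos helper are assumptions, not sltk API:

```python
from pathlib import Path
import tempfile


def pending_videos(video_paths, output_dir, suffix="_wilor.h5"):
    """Return the videos whose expected output file does not exist yet."""
    output_dir = Path(output_dir)
    return [
        Path(v) for v in video_paths
        if not (output_dir / f"{Path(v).stem}{suffix}").exists()
    ]


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp)
    (out / "a_wilor.h5").touch()          # pretend a.mp4 was already processed
    todo = pending_videos([Path("videos/a.mp4"), Path("videos/b.mp4")], out)
    todo_names = [p.name for p in todo]
```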

Model Weights

All weights are downloaded automatically from the Hugging Face Hub and cached at ~/.cache/sltk/weights/. To use a different checkpoint, set the matching environment variable:

Variable	Model
`SLTK_WILOR_CHECKPOINT`	WiLoR hand model
`SLTK_NLF_MODEL`	NLF body model
`SLTK_TEASER_CHECKPOINT`	TEASER face model
`SLTK_SIGNREP_CHECKPOINT`	SignRep embedding model
`SLTK_SEGMENTOR_V2_CHECKPOINT`	Segmenter model
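
The lookup order implied by these variables can be sketched as follows. This is an illustration only: the precedence (model-specific variable first, then SLTK_WEIGHTS_DIR, then the default cache) is an assumption, resolve_checkpoint is a hypothetical helper, and "wilor.ckpt" is a placeholder file name rather than the real checkpoint name.

```python
import os
from pathlib import Path

DEFAULT_WEIGHTS_DIR = Path("~/.cache/sltk/weights").expanduser()


def resolve_checkpoint(env_var, default_filename, environ=None):
    """Prefer the path in env_var; otherwise fall back to the cache directory."""
    environ = os.environ if environ is None else environ
    override = environ.get(env_var)
    if override:
        return Path(override).expanduser()
    weights_dir = Path(environ.get("SLTK_WEIGHTS_DIR", DEFAULT_WEIGHTS_DIR))
    return weights_dir.expanduser() / default_filename


# "wilor.ckpt" is a placeholder filename used only for this example.
custom = resolve_checkpoint(
    "SLTK_WILOR_CHECKPOINT", "wilor.ckpt",
    environ={"SLTK_WILOR_CHECKPOINT": "/models/wilor_custom.ckpt"},
)
fallback = resolve_checkpoint("SLTK_WILOR_CHECKPOINT", "wilor.ckpt", environ={})
```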

Segmentation: Finding Sign Boundaries

The segmenter is a 4-layer Transformer that reads WiLoR hand features and predicts per-frame BIO labels (0=OUT, 1=IN_SIGN, 2=BEGIN).

Citation required. If you use the sign segmentation model you must cite:

@inproceedings{he2025improving,
    title     = {Improving Continuous Sign Language Recognition with Adapted Image Models},
    author    = {He, Lianyu and Tian, Haocong and Fan, Shujing and Woll, Bencie and Bowden, Richard},
    booktitle = {IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    year      = {2025}
}

From WiLoR H5

from sltk.segmentation.runner import get_runner, segment_h5
from sltk.segmentation.output import OutputFormat
from sltk.segmentation.h5_loader import h5_to_features
from sltk.segmentation.postprocess import extract_segments

# High-level: segment a file → ELAN or JSON
segment_h5(
    "video_wilor.h5",
    output_path="video_segments.eaf",
    output_format=OutputFormat.ELAN,
    fps=25.0,
    media_path="video.mp4",
)

# Low-level: get raw predictions
features = h5_to_features("video_wilor.h5")  # (T, 192)
runner = get_runner()
labels = runner.predict(features)             # (T,) values 0/1/2
segments = extract_segments(labels)           # [(start_frame, end_frame), ...]
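
To make the BIO labelling concrete, here is a minimal decoder that turns a per-frame label sequence into segments. It is a sketch of the general technique, not the implementation behind extract_segments: bio_to_segments is a hypothetical name, the end frame is treated as exclusive, and an IN_SIGN label that follows OUT is assumed to open a segment even without a BEGIN.

```python
OUT, IN_SIGN, BEGIN = 0, 1, 2


def bio_to_segments(labels):
    """Decode per-frame BIO labels into (start_frame, end_frame) pairs.

    end_frame is exclusive. A BEGIN label always starts a new segment,
    closing any segment that is still open.
    """
    segments = []
    start = None
    for i, label in enumerate(labels):
        if label == BEGIN:
            if start is not None:
                segments.append((start, i))
            start = i
        elif label == IN_SIGN:
            if start is None:
                start = i
        else:  # OUT closes the current segment, if any
            if start is not None:
                segments.append((start, i))
                start = None
    if start is not None:
        segments.append((start, len(labels)))
    return segments


decoded = bio_to_segments([0, 2, 1, 1, 0, 0, 2, 1, 2, 1, 1])
```

The BEGIN label is what lets two back-to-back signs be separated even when no OUT frame lies between them, as in the second and third segments of the example.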

Gloss Spotting: Matching Signs to a Dictionary

SignRep extracts 768-dim visual features from 16-frame sliding windows, then matches each detected segment against a dictionary of known sign features using cosine similarity.

Citation required. If you use SignRep gloss spotting you must cite:

@inproceedings{wong2025signrep,
    title     = {SignRep: Enhancing Self-supervised Sign Representations},
    author    = {Wong, Mathew and Fish, Ed and Sherrah, Jamie and Sherwood, Thomas and Sherwood, Nathan},
    booktitle = {IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    year      = {2025}
}

from sltk.embedding.pipeline import SignRepPipeline

pipeline = SignRepPipeline()

# Extract dense features from the full video
continuous = pipeline.extract_continuous("video.mp4", stride=4)

# Load your sign dictionary (folder of .npz files, one per sign)
dictionary = pipeline.load_dictionary(["/data/dictionaries/bsldict/signrep/"])

# Match segments to dictionary entries
result = pipeline.spot(
    features=continuous,
    segments=[{"segment_id": 0, "start_frame": 12, "end_frame": 45}],
    dictionary=dictionary,
    top_k=10,
)

for seg in result.segments:
    print(f"Segment {seg.start_ms}ms-{seg.end_ms}ms:")
    for gl in seg.top_glosses:
        print(f"  {gl['gloss']} ({gl['similarity']:.3f})")

# Save as ELAN (creates Rank-1..N and Score-1..N tiers)
result.save_eaf("video_spotted.eaf", media_path="video.mp4")
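
The matching step is a cosine-similarity ranking. The following sketch shows the idea on toy 2-dimensional vectors using only the standard library. It does not reproduce how SignRepPipeline pools window features into a segment vector, and top_k_glosses is a hypothetical helper:

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_glosses(query, dictionary, k=3):
    """Rank dictionary entries by similarity to the query, best first."""
    scored = [(gloss, cosine(query, vec)) for gloss, vec in dictionary.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


toy_dictionary = {
    "HELLO": [1.0, 0.0],
    "WORLD": [0.0, 1.0],
    "THANKS": [1.0, 1.0],
}
ranked = top_k_glosses([0.9, 0.1], toy_dictionary, k=2)
```

When the features are already L2-normalized, the cosine reduces to a dot product, so a full dictionary can be scored with a single matrix multiplication.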

Building a Dictionary

Before spotting, build a dictionary from isolated sign videos:

from sltk.embedding.pipeline import SignRepPipeline
pipeline = SignRepPipeline()
result = pipeline.extract_dictionary("isolated_sign.mp4", method="middle")
result.save_npz("dictionary/HELLO.npz")

Non-Manual Signals: Face and Head Analysis

Detect blinks, head nods/shakes/tilts, mouth movements, eyebrow raises, gaze direction, and eye squints from TEASER face-tracking data.

from sltk.nms.runner import detect_nms, export_results

# Detect all NMS events from a TEASER H5 file
blinks, nms_events, quality = detect_nms(
    "video_teaser.h5",
    detectors={"all"},           # or specific: {"blink", "nod", "mouth"}
    smpl_path="video_nlf.h5",   # optional: enables gaze detection
)

# Export to ELAN (one tier per signal type)
export_results(
    blinks, nms_events, "video_teaser.h5",
    output_dir="output/",
    formats=["elan", "json", "csv"],
    media_path="video.mp4",
)

Available Detectors

Detector	ELAN Tier	Signal	Input
`blink`	`BLINK`	Eye closures	TEASER eyelid params
`nod`	`HEAD-NOD`	Vertical head oscillation	TEASER head pitch
`shake`	`HEAD-SHAKE`	Horizontal head oscillation	TEASER head yaw
`tilt`	`HEAD-TILT`	Side-to-side head tilt	TEASER head roll
`mouth`	`MOUTH-MOVEMENT`	Lip/mouth movement	FLAME expression
`eyebrow`	`EYEBROW-RAISE`	Eyebrow raise/furrow	FLAME expression
`gaze`	`EYE-GAZE`	Gaze direction	NLF eye pose
`squint`	`EYE-SQUINT`	Partial eye closure	TEASER eyelid
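
Several of these signals can be thought of as thresholded runs over a per-frame measurement. The sketch below shows that generic pattern for a blink-like event. It is an illustration under stated assumptions, not the detector SLTK ships: the closure trace, the threshold, the minimum length, and the detect_runs helper are all hypothetical.

```python
def detect_runs(signal, threshold, min_len):
    """Return (start, end) frame ranges, end exclusive, where the signal
    stays at or above threshold for at least min_len consecutive frames."""
    runs = []
    start = None
    for i, value in enumerate(signal):
        if value >= threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_len:
                runs.append((start, i))
            start = None
    if start is not None and len(signal) - start >= min_len:
        runs.append((start, len(signal)))
    return runs


# Hypothetical eyelid-closure trace in [0, 1]; higher means more closed.
closure = [0.1, 0.2, 0.8, 0.9, 0.85, 0.1, 0.9, 0.1]
blink_runs = detect_runs(closure, threshold=0.5, min_len=2)
```

The minimum length filters out single-frame spikes such as the one at frame 6, which are more likely tracking noise than real events.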

Feature Representations

SLTK computes several feature representations from the extracted H5 pose data.

WiLoR Segmenter Features (192-dim)

MANO rotation matrices converted to axis-angle, both hands concatenated. Used by the Transformer segmenter.

from sltk.segmentation.h5_loader import h5_to_features
features = h5_to_features("video_wilor.h5")  # (T, 192)

Angle Features (104-dim)

Body joint angles + hand Euler angles from MANO rotations.

from sltk.processing.features import compute_angle_features
angles = compute_angle_features(body_poses, right_hand, left_hand)  # (T, 104)
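
As background for what a joint angle is, the sketch below computes the angle at a joint from three 3D points. It is a generic geometric illustration, not how compute_angle_features derives its 104 values; joint_angle and the sample coordinates are made up for the example.

```python
import math


def joint_angle(a, b, c):
    """Angle in radians at point b, between the segments b->a and b->c."""
    u = [ai - bi for ai, bi in zip(a, b)]
    v = [ci - bi for ci, bi in zip(c, b)]
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # degenerate bone; a real pipeline might return NaN instead
    cos_theta = sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)
    return math.acos(max(-1.0, min(1.0, cos_theta)))  # clamp against rounding


shoulder, elbow, wrist = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)
elbow_angle = joint_angle(shoulder, elbow, wrist)
```

Angles of this kind are invariant to where the signer stands in the frame, which is one reason angle features are often preferred over raw coordinates.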

HaMeR Features (288-dim)

Flattened MANO rotation matrices for both hands.

from sltk.processing.features import compute_hamer_features
hamer = compute_hamer_features(
    mano_global_orient_right, mano_hand_pose_right,
    mano_global_orient_left, mano_hand_pose_left,
)  # (T, 288)
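
The 288 dimensions follow from the MANO shapes listed earlier: (1 + 15) joints × 9 matrix entries × 2 hands. The sketch below flattens one frame with plain lists to show the arithmetic. The ordering (right hand first, global orientation before finger joints) is an assumption taken from the argument order above, and flatten_hand is a hypothetical helper:

```python
def flatten_hand(global_orient, hand_pose):
    """Flatten (1, 3, 3) + (15, 3, 3) nested lists into 16 * 9 = 144 floats."""
    values = []
    for matrix in list(global_orient) + list(hand_pose):
        for row in matrix:
            values.extend(row)
    return values


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
global_orient = [IDENTITY]          # (1, 3, 3): wrist rotation
hand_pose = [IDENTITY] * 15         # (15, 3, 3): finger joint rotations

right_vec = flatten_hand(global_orient, hand_pose)
left_vec = flatten_hand(global_orient, hand_pose)
frame_features = right_vec + left_vec   # one frame's feature vector
```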

SignRep Embeddings (768-dim)

Dense visual features from the SignRep ViT model. 16-frame windows, L2-normalized.

from sltk.embedding.pipeline import SignRepPipeline
pipeline = SignRepPipeline()
continuous = pipeline.extract_continuous("video.mp4", stride=4)
# continuous.features: (num_windows, 768)
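
To relate window indices back to video frames, the following sketch enumerates 16-frame windows at stride 4 and picks the windows whose centre falls inside a segment. It assumes windows start at frame 0 and that the tail of the video is not padded. Both are assumptions about the pipeline, and the helper names are hypothetical.

```python
def window_starts(num_frames, window=16, stride=4):
    """Start frames of all full windows; no padding at the end of the video."""
    if num_frames < window:
        return []
    return list(range(0, num_frames - window + 1, stride))


def windows_in_segment(start_frame, end_frame, num_frames, window=16, stride=4):
    """Indices of windows whose centre frame falls in [start_frame, end_frame)."""
    return [
        i for i, s in enumerate(window_starts(num_frames, window, stride))
        if start_frame <= s + window // 2 < end_frame
    ]


starts = window_starts(40)
hits = windows_in_segment(12, 45, num_frames=100)
```

Under these assumptions a 40-frame clip yields 7 windows, and a segment spanning frames 12 to 45 is covered by windows 1 through 9.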

Combined Features from H5

from sltk.processing.features import load_features_from_nlf_wilor
angles, hamer = load_features_from_nlf_wilor("video_nlf.h5", "video_wilor.h5")

ELAN File I/O

Writing a New ELAN File

from sltk.io.elan_roundtrip import ElanDocument

doc = ElanDocument.new(video_path="video.mp4")
doc.add_tier("Gloss")
doc.add_tier("NMS")
doc.add_segment("Gloss", 0.0, 1.5, "HELLO")
doc.add_segment("Gloss", 1.5, 3.0, "WORLD")
doc.add_segment("NMS", 0.2, 0.8, "nod")
doc.save("output.eaf")

Reading and Modifying Existing ELAN Files

from sltk.io.elan_roundtrip import ElanDocument

doc = ElanDocument.open("annotations.eaf")
tiers = doc.get_tiers()         # list of TierInfo
segments = doc.get_segments()   # list of SegmentInfo (all tiers)

# Add new annotations from pipeline results
doc.add_tier("AutoSegmentation")
doc.add_segment("AutoSegmentation", 1.0, 2.5, "SIGN")
doc.save()  # preserves all original XML structure

Simple Read/Write

from sltk.io import read_eaf, write_eaf
from sltk.data import Segment, SegmentList

# Read
segments = read_eaf("annotations.eaf", tiers=["Gloss"])

# Write
new_segments = SegmentList([
    Segment(start=0.0, end=1.5, label="HELLO", tier="Gloss"),
    Segment(start=1.5, end=3.0, label="WORLD", tier="Gloss"),
])
write_eaf(new_segments, "output.eaf", video_path="video.mp4")
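
For readers unfamiliar with the EAF format, the sketch below parses a stripped-down EAF document with the standard library and recovers the time-aligned annotations of one tier. It illustrates the XML layout (time slots in milliseconds, referenced by annotations) and is not how read_eaf is implemented. The sample omits header elements that a real ELAN file contains, and read_tier is a hypothetical helper that ignores reference annotations.

```python
import xml.etree.ElementTree as ET

SAMPLE_EAF = """<ANNOTATION_DOCUMENT>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1500"/>
    <TIME_SLOT TIME_SLOT_ID="ts3" TIME_VALUE="3000"/>
  </TIME_ORDER>
  <TIER TIER_ID="Gloss" LINGUISTIC_TYPE_REF="default-lt">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1"
          TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>HELLO</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2"
          TIME_SLOT_REF1="ts2" TIME_SLOT_REF2="ts3">
        <ANNOTATION_VALUE>WORLD</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>
"""


def read_tier(eaf_xml, tier_id):
    """Return [(start_s, end_s, label), ...] for one tier of an EAF string."""
    root = ET.fromstring(eaf_xml)
    # Time slots map an ID to a time in milliseconds; TIME_VALUE is optional.
    slots = {
        ts.get("TIME_SLOT_ID"): int(ts.get("TIME_VALUE"))
        for ts in root.iter("TIME_SLOT")
        if ts.get("TIME_VALUE") is not None
    }
    rows = []
    for tier in root.iter("TIER"):
        if tier.get("TIER_ID") != tier_id:
            continue
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            if start is None or end is None:
                continue  # skip annotations without aligned times
            label = ann.findtext("ANNOTATION_VALUE", default="")
            rows.append((start / 1000.0, end / 1000.0, label))
    return rows


gloss_rows = read_tier(SAMPLE_EAF, "Gloss")
```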

Web Interface

SLTK includes a React frontend for browsing workspaces, running processing jobs, and exploring corpus data.

# Development: FastAPI (port 8000) + Vite (port 5173)
bash scripts/run_dev.sh

# Production
cd frontend && npm ci && npm run build && cd ..
sltk serve --host 0.0.0.0 --port 8000

Route	Page	Purpose
`/`	Workspaces	Create/switch workspaces, scan directories
`/process`	Process	Submit segmentation and spotting jobs
`/explore`	Explore	Search glosses, view video clips, corpus statistics
`/viewer`	Viewer	Video playback with annotation overlay
`/analysis/*`	Analysis	Vocabulary, concordance, n-grams, collocations, durations

CLI Reference

sltk convert input.npy output.h5 --from mediapipe --to wilor --fps 25
sltk evaluate predictions.txt references.txt --task translation
sltk to-elan segments.json --video source.mp4 --output annotations.eaf
sltk from-elan annotations.eaf --output segments.json --tier Gloss
sltk info video_wilor.h5
sltk serve --host 0.0.0.0 --port 8000 --reload

Configuration

Variable	Description	Default
`SLTK_CORS_ORIGINS`	Allowed CORS origins	`http://localhost:5173,http://localhost:3000`
`SLTK_ALLOWED_PATHS`	Filesystem whitelist for API	`/vol/research,/home`
`SLTK_WEIGHTS_DIR`	Override weight cache location	`~/.cache/sltk/weights/`
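
Both SLTK_CORS_ORIGINS and SLTK_ALLOWED_PATHS take comma-separated lists. The sketch below shows one way such a value could be parsed and how a path whitelist check might work. It is an assumption about the behaviour, not the server's actual code; parse_list and is_allowed are hypothetical, and the check works on normalised POSIX paths only.

```python
import posixpath
from pathlib import PurePosixPath


def parse_list(value):
    """Split a comma-separated setting into trimmed, non-empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def is_allowed(path, allowed_roots):
    """True if the normalised path lies inside one of the allowed roots.

    normpath collapses ".." segments but does not resolve symlinks, so a
    production check should resolve the real path on disk first.
    """
    candidate = PurePosixPath(posixpath.normpath(path))
    return any(
        candidate.is_relative_to(PurePosixPath(posixpath.normpath(root)))
        for root in allowed_roots
    )


roots = parse_list("/vol/research,/home")
```

Comparing whole path components rather than string prefixes keeps a path such as /homework/file from matching the /home root.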

Supported Pose Formats

Format	Extractor	Keypoints	Description
WiLoR	`WiLoRExtractor`	21 per hand	MANO 3D hand mesh with rotation matrices
NLF/SMPL-X	`NLFExtractor`	55 joints	Full body with axis-angle rotations
TEASER/FLAME	`TeaserExtractor`	FLAME params	Face parameters: jaw, expression, shape, eyelid
MediaPipe	`MediaPipeExtractor`	33+42+468	Fast 2D/3D holistic landmarks
RTMPose	`RTMPoseExtractor`	133	COCO-WholeBody, multi-person

All stored as HDF5 (.h5) files.

Testing

pytest                       # Full suite (1500+ tests)
pytest -m "not slow"         # Skip slow tests
pytest -m api                # API tests only

License

This project is licensed under CC-BY-NC-ND-4.0 (Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International).

Third-Party Licenses

SLTK bundles or depends on models that have their own license terms. By using SLTK you agree to comply with all applicable licenses:

Component	License	Link
MANO (hand model)	MANO License (non-commercial)	https://mano.is.tue.mpg.de/license.html
SMPL / SMPL-X (body model)	SMPL License (non-commercial)	https://smpl.is.tue.mpg.de/license.html
FLAME (face model)	FLAME License (non-commercial)	https://flame.is.tue.mpg.de/license.html
WiLoR	Apache 2.0	[Potamias et al., 2024]
TEASER	See paper	[Liu et al., 2025]
NLF	See repository	[Sarandi et al.]

Please refer to each project's license before using their models in your work.
