Sign Language Toolkit for sign language research
Sign Language Toolkit (SLTK)
A research toolkit for sign language video analysis. SLTK provides a complete pipeline from raw video to rich, multi-tier ELAN annotation files: 3D hand reconstruction, full-body pose, face tracking, automatic sign segmentation, gloss spotting, and non-manual signal detection.
Video (.mp4)
│
├─ WiLoR ──────► 3D hand keypoints + MANO rotations (_wilor.h5)
├─ NLF ────────► Full-body SMPL-X pose (_nlf.h5)
├─ TEASER ─────► FLAME face parameters (_teaser.h5)
│
├─ Segmenter ──► Sign boundaries (BIO labels)
├─ SignRep ────► Gloss spotting (dictionary matching)
├─ NMS ────────► Blinks, nods, shakes, mouth, gaze
│
└─ All results ► Multi-tier ELAN file (.eaf)
Installation
# Core library (ELAN I/O, CLI, corpus tools)
pip install signlangtk
# With web interface
pip install "signlangtk[api]"
# With hand extraction (WiLoR) — requires CUDA
pip install "signlangtk[wilor]"
# Full body (NLF) + face (TEASER)
pip install "signlangtk[nlf,teaser]"
# Everything (excludes rtmpose/smplfx which need mmcv via mim)
pip install "signlangtk[all]"
The PyPI package is signlangtk, the Python import is sltk:
import sltk
from sltk.extraction.wilor import WiLoRExtractor
from sltk.extraction.nlf import NLFExtractor
from sltk.extraction.teaser import TeaserExtractor
Requires Python 3.10+. For development from source:
git clone https://github.com/ed-fish/Sign-Language-Toolkit.git
cd Sign-Language-Toolkit
pip install -e ".[dev,api]"
Quick Start: Video to Multi-Tier ELAN
This end-to-end example takes a single video and produces an ELAN file with sign boundaries, spotted glosses, and non-manual signals on separate tiers.
from sltk.extraction.wilor import WiLoRExtractor
from sltk.extraction.teaser import TeaserExtractor
from sltk.segmentation.runner import get_runner
from sltk.segmentation.h5_loader import h5_to_features
from sltk.segmentation.postprocess import extract_segments
from sltk.nms.runner import detect_nms, export_results
from sltk.io.elan_roundtrip import ElanDocument
VIDEO = "recording.mp4"
FPS = 25.0
# ── Step 1: Extract poses ───────────────────────────────────────────
# Hands (WiLoR → MANO 3D)
with WiLoRExtractor() as ext:
    ext.load_model()
    ext.extract_from_video(VIDEO, "recording_wilor.h5")
# Face (TEASER → FLAME)
with TeaserExtractor() as ext:
    ext.load_model()
    ext.extract_from_video(VIDEO, "recording_teaser.h5")
# ── Step 2: Segment signs ──────────────────────────────────────────
features = h5_to_features("recording_wilor.h5") # (T, 192)
runner = get_runner()
labels = runner.predict(features) # 0=OUT, 1=IN, 2=BEGIN
segments = extract_segments(labels) # [(start, end), ...]
# ── Step 3: Detect non-manual signals ──────────────────────────────
blinks, nms_events, quality = detect_nms(
    "recording_teaser.h5",
    detectors={"all"},
)
# ── Step 4: Build multi-tier ELAN file ─────────────────────────────
doc = ElanDocument.new(video_path=VIDEO)
# Add segmentation tier
doc.add_tier("Segmentation")
for start_frame, end_frame in segments:
    doc.add_segment("Segmentation", start_frame / FPS, end_frame / FPS, "SIGN")
# Add NMS tiers (blinks, head movements, mouth, eyebrows, gaze, squints)
# Every tier that an event may reference is created up front.
for tier_name in ["BLINK", "HEAD-NOD", "HEAD-SHAKE", "HEAD-TILT",
                  "MOUTH-MOVEMENT", "EYEBROW-RAISE", "EYE-GAZE", "EYE-SQUINT"]:
    doc.add_tier(tier_name)
for b in blinks:
    doc.add_segment("BLINK", b.start_frame / FPS, b.end_frame / FPS, "blink")
for ev in nms_events:
    doc.add_segment(ev.tier, ev.start_frame / FPS, ev.end_frame / FPS, ev.label)
doc.save("recording_full.eaf")
Open recording_full.eaf in ELAN to see all tiers aligned with the video.
Extraction: Hands, Body, and Face
All extractors share the same interface: load_model(), extract_from_video(), process_batch(). Weights are auto-downloaded from HuggingFace Hub on first use.
WiLoR — 3D Hand Reconstruction
Produces 21 keypoints per hand + MANO rotation matrices per frame.
from sltk.extraction.wilor import WiLoRExtractor, WiLoRConfig
config = WiLoRConfig(
    device="cuda:0",
    img_batch_size=128,
    rescale_factor=2.0,
)
with WiLoRExtractor(config) as ext:
    ext.load_model()
    result = ext.extract_from_video("video.mp4", "video_wilor.h5")
Output H5 structure:
video_wilor.h5
├── attrs: fps, num_frames, resolution
├── frame_idx (num_frames, 2) # sparse: (start_idx, count)
├── kpts_3d (num_detections, 21, 3) # 3D hand keypoints
├── right (num_detections,) # True = right hand
└── mano/
    ├── hand_pose (num_detections, 15, 3, 3) # joint rotations
    └── global_orient (num_detections, 1, 3, 3) # wrist rotation
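The sparse frame_idx layout can be read back one frame at a time. The sketch below is a minimal illustration on plain Python lists, assuming each row of frame_idx is (start_idx, count) into the per-detection arrays, as the comment in the layout suggests. The helper name detections_for_frame is hypothetical and not part of SLTK.

```python
def detections_for_frame(frame_idx, frame):
    """Return the detection indices that belong to one video frame.

    Assumes frame_idx[frame] == (start_idx, count). A count of 0 means
    no hand was detected in that frame.
    """
    start, count = frame_idx[frame]
    return list(range(start, start + count))


# Toy data: 4 frames, 5 detections in total.
frame_idx = [(0, 2), (2, 0), (2, 1), (3, 2)]
right = [True, False, True, False, True]  # one flag per detection

for t in range(len(frame_idx)):
    ids = detections_for_frame(frame_idx, t)
    hands = ["right" if right[i] else "left" for i in ids]
    print(t, ids, hands)
```

Under the same assumption, the slice start:start + count would select the matching rows of kpts_3d and the mano/ datasets when the file is opened with an HDF5 reader.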
NLF — Full-Body SMPL-X
Produces 55 SMPL-X joints (body + hands + face landmarks) per frame.
from sltk.extraction.nlf import NLFExtractor, NLFConfig
with NLFExtractor(NLFConfig(device="cuda:0")) as ext:
    ext.load_model()
    result = ext.extract_from_video("video.mp4", "video_nlf.h5")
TEASER — FLAME Face Parameters
Produces FLAME 3D face parameters: jaw pose, expression coefficients, shape, eyelid state, and head pose per frame. This is what drives NMS detection.
from sltk.extraction.teaser import TeaserExtractor, TeaserConfig
with TeaserExtractor(TeaserConfig(device="cuda:0")) as ext:
    ext.load_model()
    result = ext.extract_from_video("video.mp4", "video_teaser.h5")
Batch Processing
All extractors support batch processing over a directory:
from pathlib import Path
from sltk.extraction.wilor import WiLoRExtractor
with WiLoRExtractor() as ext:
    ext.load_model()
    results = ext.process_batch(
        video_paths=list(Path("videos/").glob("*.mp4")),
        output_dir=Path("poses/"),
        skip_existing=True,
    )
for path, result in results.items():
    print(f"{path}: {result.num_frames} frames, {result.num_detections} detections")
Model Weights
All weights are auto-downloaded from HuggingFace Hub and cached at ~/.cache/sltk/weights/. Override with environment variables:
| Variable | Model |
|---|---|
| SLTK_WILOR_CHECKPOINT | WiLoR hand model |
| SLTK_NLF_MODEL | NLF body model |
| SLTK_TEASER_CHECKPOINT | TEASER face model |
| SLTK_SIGNREP_CHECKPOINT | SignRep embedding model |
| SLTK_SEGMENTOR_V2_CHECKPOINT | Segmenter model |
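As a rough illustration of how such an override could be resolved, the sketch below prefers the environment variable and otherwise falls back to the cache directory. This is not SLTK's actual lookup code: the function resolve_checkpoint and the file name wilor.ckpt are hypothetical, and only the variable name and the cache path come from the text above.

```python
import os
from pathlib import Path

# Cache location stated in this README.
DEFAULT_WEIGHTS_DIR = Path.home() / ".cache" / "sltk" / "weights"


def resolve_checkpoint(env_var, default_name, environ=None):
    """Prefer the path in env_var; otherwise fall back to the cache dir.

    default_name is a hypothetical file name used for illustration only.
    """
    environ = os.environ if environ is None else environ
    override = environ.get(env_var)
    if override:
        return Path(override).expanduser()
    return DEFAULT_WEIGHTS_DIR / default_name


print(resolve_checkpoint("SLTK_WILOR_CHECKPOINT", "wilor.ckpt"))
```

Setting the variable before starting Python (or before importing the extractor) is the usual way to make this kind of override take effect.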
Segmentation: Finding Sign Boundaries
The segmenter is a 4-layer Transformer that reads WiLoR hand features and predicts per-frame BIO labels (0=OUT, 1=IN_SIGN, 2=BEGIN).
From WiLoR H5
from sltk.segmentation.runner import get_runner, segment_h5
from sltk.segmentation.output import OutputFormat
from sltk.segmentation.h5_loader import h5_to_features
from sltk.segmentation.postprocess import extract_segments
# High-level: segment a file → ELAN or JSON
segment_h5(
    "video_wilor.h5",
    output_path="video_segments.eaf",
    output_format=OutputFormat.ELAN,
    fps=25.0,
    media_path="video.mp4",
)
# Low-level: get raw predictions
features = h5_to_features("video_wilor.h5") # (T, 192)
runner = get_runner()
labels = runner.predict(features) # (T,) values 0/1/2
segments = extract_segments(labels) # [(start_frame, end_frame), ...]
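To make the BIO labels concrete, here is a minimal sketch of how per-frame labels can be grouped into segments. It is not the implementation behind extract_segments. It assumes that end_frame is inclusive and that an IN_SIGN frame directly after OUT still opens a segment; the function name bio_to_segments is hypothetical.

```python
OUT, IN_SIGN, BEGIN = 0, 1, 2


def bio_to_segments(labels):
    """Group per-frame BIO labels into (start_frame, end_frame) pairs.

    Assumptions: end_frame is inclusive; BEGIN always starts a new
    segment, closing any segment that is still open.
    """
    segments = []
    start = None
    for t, label in enumerate(labels):
        if label == BEGIN:
            if start is not None:
                segments.append((start, t - 1))
            start = t
        elif label == IN_SIGN:
            if start is None:
                start = t
        else:  # OUT closes the open segment, if any
            if start is not None:
                segments.append((start, t - 1))
                start = None
    if start is not None:
        segments.append((start, len(labels) - 1))
    return segments


labels = [0, 2, 1, 1, 0, 0, 2, 1, 2, 1, 1]
print(bio_to_segments(labels))  # [(1, 3), (6, 7), (8, 10)]
```

The BEGIN label is what lets two back-to-back signs (frames 6-7 and 8-10 above) be separated even though no OUT frame lies between them.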
Gloss Spotting: Matching Signs to a Dictionary
SignRep extracts 768-dim visual features from 16-frame sliding windows, then matches each detected segment against a dictionary of known sign features using cosine similarity.
from sltk.embedding.pipeline import SignRepPipeline
pipeline = SignRepPipeline()
# Extract dense features from the full video
continuous = pipeline.extract_continuous("video.mp4", stride=4)
# Load your sign dictionary (folder of .npz files, one per sign)
dictionary = pipeline.load_dictionary(["/data/dictionaries/bsldict/signrep/"])
# Match segments to dictionary entries
result = pipeline.spot(
    features=continuous,
    segments=[{"segment_id": 0, "start_frame": 12, "end_frame": 45}],
    dictionary=dictionary,
    top_k=10,
)
for seg in result.segments:
    print(f"Segment {seg.start_ms}ms-{seg.end_ms}ms:")
    for gl in seg.top_glosses:
        print(f" {gl['gloss']} ({gl['similarity']:.3f})")
# Save as ELAN (creates Rank-1..N and Score-1..N tiers)
result.save_eaf("video_spotted.eaf", media_path="video.mp4")
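The ranking step can be pictured as a cosine-similarity lookup. The sketch below uses toy 3-dimensional vectors instead of 768-dimensional SignRep features and assumes each segment has already been reduced to a single query vector; how SLTK pools window features per segment is not described here. The names cosine and top_k_glosses are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k_glosses(query, dictionary, k=3):
    """Rank dictionary entries by similarity to the query, best first."""
    scored = [(gloss, cosine(query, vec)) for gloss, vec in dictionary.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy dictionary: one feature vector per gloss.
dictionary = {
    "HELLO": [1.0, 0.0, 0.0],
    "WORLD": [0.0, 1.0, 0.0],
    "THANKS": [0.7, 0.7, 0.0],
}
query = [0.9, 0.1, 0.0]

for gloss, score in top_k_glosses(query, dictionary, k=2):
    print(f"{gloss} ({score:.3f})")
```

When the stored features are already L2-normalized, the cosine reduces to a plain dot product, which is why normalized embeddings are convenient for this kind of matching.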
Building a Dictionary
Before spotting, build a dictionary from isolated sign videos:
result = pipeline.extract_dictionary("isolated_sign.mp4", method="middle")
result.save_npz("dictionary/HELLO.npz")
Non-Manual Signals: Face and Head Analysis
Detect blinks, head nods/shakes/tilts, mouth movements, eyebrow raises, gaze direction, and eye squints from TEASER face-tracking data.
from sltk.nms.runner import detect_nms, export_results
# Detect all NMS events from a TEASER H5 file
blinks, nms_events, quality = detect_nms(
    "video_teaser.h5",
    detectors={"all"},  # or specific: {"blink", "nod", "mouth"}
    smpl_path="video_nlf.h5",  # optional: enables gaze detection
)
# Export to ELAN (one tier per signal type)
export_results(
    blinks, nms_events, "video_teaser.h5",
    output_dir="output/",
    formats=["elan", "json", "csv"],
    media_path="video.mp4",
)
Available Detectors
| Detector | ELAN Tier | Signal | Input |
|---|---|---|---|
| blink | BLINK | Eye closures | TEASER eyelid params |
| nod | HEAD-NOD | Vertical head oscillation | TEASER head pitch |
| shake | HEAD-SHAKE | Horizontal head oscillation | TEASER head yaw |
| tilt | HEAD-TILT | Side-to-side head tilt | TEASER head roll |
| mouth | MOUTH-MOVEMENT | Lip/mouth movement | FLAME expression |
| eyebrow | EYEBROW-RAISE | Eyebrow raise/furrow | FLAME expression |
| gaze | EYE-GAZE | Gaze direction | NLF eye pose |
| squint | EYE-SQUINT | Partial eye closure | TEASER eyelid |
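For intuition only, a detector of this kind can be thought of as finding runs of frames in which a per-frame signal crosses a threshold. The sketch below is a generic threshold-and-run-length detector, not the algorithm SLTK uses. It assumes an eyelid signal in [0, 1] where higher means more closed; the function name closure_events, the threshold, and the minimum duration are all illustrative.

```python
def closure_events(signal, threshold=0.5, min_frames=2):
    """Return (start_frame, end_frame) runs where signal >= threshold.

    end_frame is inclusive. Runs shorter than min_frames are dropped
    to suppress single-frame noise.
    """
    events = []
    start = None
    for t, value in enumerate(signal):
        if value >= threshold:
            if start is None:
                start = t
        elif start is not None:
            if t - start >= min_frames:
                events.append((start, t - 1))
            start = None
    # Close a run that lasts until the final frame.
    if start is not None and len(signal) - start >= min_frames:
        events.append((start, len(signal) - 1))
    return events


eyelid = [0.1, 0.2, 0.8, 0.9, 0.7, 0.1, 0.6, 0.1, 0.9, 0.9]
print(closure_events(eyelid))  # [(2, 4), (8, 9)]
```

The single high frame at index 6 is discarded because it is shorter than min_frames. Oscillation-based signals such as nods and shakes would need a different approach, for example looking at direction changes in head pitch or yaw.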
Feature Representations
SLTK computes several feature representations from the extracted H5 pose data.
WiLoR Segmenter Features (192-dim)
MANO rotation matrices converted to axis-angle, both hands concatenated. Used by the Transformer segmenter.
from sltk.segmentation.h5_loader import h5_to_features
features = h5_to_features("video_wilor.h5") # (T, 192)
Angle Features (104-dim)
Body joint angles + hand Euler angles from MANO rotations.
from sltk.processing.features import compute_angle_features
angles = compute_angle_features(body_poses, right_hand, left_hand) # (T, 104)
HaMeR Features (288-dim)
Flattened MANO rotation matrices for both hands.
from sltk.processing.features import compute_hamer_features
hamer = compute_hamer_features(
    mano_global_orient_right, mano_hand_pose_right,
    mano_global_orient_left, mano_hand_pose_left,
)  # (T, 288)
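The 288 dimensions follow from the MANO shapes listed earlier: each hand has 1 global orientation matrix plus 15 joint matrices, each 3x3, so 16 × 9 = 144 values per hand and 288 for both. The sketch below checks that arithmetic for a single frame with plain lists. The ordering (global orientation first, right hand before left) mirrors the argument order above but is an assumption, and flatten_hand is a hypothetical helper.

```python
def flatten_hand(global_orient, hand_pose):
    """Flatten one hand's rotation matrices for a single frame.

    global_orient: list of 1 matrix (3x3); hand_pose: list of 15 matrices.
    Returns a flat list of 16 * 9 = 144 values.
    """
    matrices = list(global_orient) + list(hand_pose)
    return [value for matrix in matrices for row in matrix for value in row]


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

right = flatten_hand([IDENTITY], [IDENTITY] * 15)
left = flatten_hand([IDENTITY], [IDENTITY] * 15)
frame_features = right + left  # assumed order: right hand, then left

print(len(right), len(frame_features))  # 144 288
```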
SignRep Embeddings (768-dim)
Dense visual features from the SignRep ViT model. 16-frame windows, L2-normalized.
from sltk.embedding.pipeline import SignRepPipeline
pipeline = SignRepPipeline()
continuous = pipeline.extract_continuous("video.mp4", stride=4)
# continuous.features: (num_windows, 768)
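To relate num_windows to video length, the sketch below enumerates 16-frame windows at a stride of 4 and L2-normalizes a vector. It assumes windows are not padded at the start or end of the video, which may differ from what extract_continuous does; window_starts and l2_normalize are hypothetical helper names.

```python
import math


def window_starts(num_frames, window=16, stride=4):
    """First frame of each full window; assumes no padding at the edges."""
    if num_frames < window:
        return []
    return list(range(0, num_frames - window + 1, stride))


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length (zero vectors are returned as-is)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


starts = window_starts(40)
print(starts)                    # [0, 4, 8, 12, 16, 20, 24]
print(l2_normalize([3.0, 4.0]))  # [0.6, 0.8]
```

With these assumptions a 40-frame clip yields 7 windows, the last one covering frames 24 to 39.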
Combined Features from H5
from sltk.processing.features import load_features_from_nlf_wilor
angles, hamer = load_features_from_nlf_wilor("video_nlf.h5", "video_wilor.h5")
ELAN File I/O
Writing a New ELAN File
from sltk.io.elan_roundtrip import ElanDocument
doc = ElanDocument.new(video_path="video.mp4")
doc.add_tier("Gloss")
doc.add_tier("NMS")
doc.add_segment("Gloss", 0.0, 1.5, "HELLO")
doc.add_segment("Gloss", 1.5, 3.0, "WORLD")
doc.add_segment("NMS", 0.2, 0.8, "nod")
doc.save("output.eaf")
Reading and Modifying Existing ELAN Files
doc = ElanDocument.open("annotations.eaf")
tiers = doc.get_tiers() # list of TierInfo
segments = doc.get_segments() # list of SegmentInfo (all tiers)
# Add new annotations from pipeline results
doc.add_tier("AutoSegmentation")
doc.add_segment("AutoSegmentation", 1.0, 2.5, "SIGN")
doc.save() # preserves all original XML structure
Simple Read/Write
from sltk.io import read_eaf, write_eaf
from sltk.data import Segment, SegmentList
# Read
segments = read_eaf("annotations.eaf", tiers=["Gloss"])
# Write
new_segments = SegmentList([
    Segment(start=0.0, end=1.5, label="HELLO", tier="Gloss"),
    Segment(start=1.5, end=3.0, label="WORLD", tier="Gloss"),
])
write_eaf(new_segments, "output.eaf", video_path="video.mp4")
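For orientation, an .eaf file is XML in which annotations point at shared time slots measured in milliseconds. The sketch below parses a heavily simplified EAF document with the standard library only, independently of SLTK. Real files carry more header data, linguistic types, and may contain time slots without a TIME_VALUE, so this is an illustration of the structure rather than a robust reader; read_segments is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Simplified EAF: two aligned annotations on one tier.
EAF = """<ANNOTATION_DOCUMENT>
  <HEADER TIME_UNITS="milliseconds"/>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1500"/>
    <TIME_SLOT TIME_SLOT_ID="ts3" TIME_VALUE="3000"/>
  </TIME_ORDER>
  <TIER TIER_ID="Gloss" LINGUISTIC_TYPE_REF="default-lt">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>HELLO</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2" TIME_SLOT_REF1="ts2" TIME_SLOT_REF2="ts3">
        <ANNOTATION_VALUE>WORLD</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""


def read_segments(xml_text):
    """Return (tier, start_s, end_s, label) tuples from simplified EAF XML."""
    root = ET.fromstring(xml_text)
    # Map time-slot IDs to milliseconds.
    slots = {
        slot.get("TIME_SLOT_ID"): int(slot.get("TIME_VALUE"))
        for slot in root.iter("TIME_SLOT")
    }
    segments = []
    for tier in root.iter("TIER"):
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots[ann.get("TIME_SLOT_REF1")] / 1000.0
            end = slots[ann.get("TIME_SLOT_REF2")] / 1000.0
            label = ann.findtext("ANNOTATION_VALUE")
            segments.append((tier.get("TIER_ID"), start, end, label))
    return segments


for segment in read_segments(EAF):
    print(segment)
```

This is why the examples above pass times in seconds: the conversion to millisecond time slots happens when the file is written.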
Web Interface
SLTK includes a React frontend for browsing workspaces, running processing jobs, and exploring corpus data.
# Development: FastAPI (port 8000) + Vite (port 5173)
bash scripts/run_dev.sh
# Production
cd frontend && npm ci && npm run build && cd ..
sltk serve --host 0.0.0.0 --port 8000
| Route | Page | Purpose |
|---|---|---|
| / | Workspaces | Create/switch workspaces, scan directories |
| /process | Process | Submit segmentation and spotting jobs |
| /explore | Explore | Search glosses, view video clips, corpus statistics |
| /viewer | Viewer | Video playback with annotation overlay |
| /analysis/* | Analysis | Vocabulary, concordance, n-grams, collocations, durations |
CLI Reference
sltk convert input.npy output.h5 --from mediapipe --to wilor --fps 25
sltk evaluate predictions.txt references.txt --task translation
sltk to-elan segments.json --video source.mp4 --output annotations.eaf
sltk from-elan annotations.eaf --output segments.json --tier Gloss
sltk info video_wilor.h5
sltk serve --host 0.0.0.0 --port 8000 --reload
Configuration
| Variable | Description | Default |
|---|---|---|
| SLTK_CORS_ORIGINS | Allowed CORS origins | http://localhost:5173,http://localhost:3000 |
| SLTK_ALLOWED_PATHS | Filesystem whitelist for API | /vol/research,/home |
| SLTK_WEIGHTS_DIR | Override weight cache location | ~/.cache/sltk/weights/ |
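The list-valued settings are comma-separated. As a sketch of how such a value might be parsed and how a path allowlist check could work, the code below splits the string and tests whether a requested path sits under one of the allowed roots. It is not SLTK's server code: parse_list and is_allowed are hypothetical, and the check is purely lexical, so a real implementation would resolve symlinks and ".." components first.

```python
from pathlib import PurePosixPath


def parse_list(value):
    """Split a comma-separated setting into trimmed, non-empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def is_allowed(path, allowed_roots):
    """True if path lies under one of the allowed roots (lexical check only)."""
    target = PurePosixPath(path)
    return any(target.is_relative_to(PurePosixPath(root)) for root in allowed_roots)


roots = parse_list("/vol/research,/home")
print(roots)
print(is_allowed("/home/user/video.mp4", roots))  # True
print(is_allowed("/etc/passwd", roots))           # False
```

Comparing path components rather than string prefixes matters here: /homework/clip.mp4 starts with the text "/home" but is not inside the /home directory, and the component-wise check rejects it.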
Supported Pose Formats
| Format | Extractor | Keypoints | Description |
|---|---|---|---|
| WiLoR | WiLoRExtractor | 21 per hand | MANO 3D hand mesh with rotation matrices |
| NLF/SMPL-X | NLFExtractor | 55 joints | Full body with axis-angle rotations |
| TEASER/FLAME | TeaserExtractor | FLAME params | Face parameters: jaw, expression, shape, eyelid |
| MediaPipe | MediaPipeExtractor | 33+42+468 | Fast 2D/3D holistic landmarks |
| RTMPose | RTMPoseExtractor | 133 | COCO-WholeBody, multi-person |
All stored as HDF5 (.h5) files.
Testing
pytest # Full suite (1500+ tests)
pytest -m "not slow" # Skip slow tests
pytest -m api # API tests only
License
CC-BY-NC-4.0