Sign Language Toolkit for sign language research

Sign Language Toolkit (SLTK)

A research toolkit for sign language video analysis. SLTK provides a complete pipeline from raw video to rich, multi-tier ELAN annotation files: 3D hand reconstruction, full-body pose, face tracking, automatic sign segmentation, gloss spotting, and non-manual signal detection.

Video (.mp4)
    │
    ├─ WiLoR ──────► 3D hand keypoints + MANO rotations   (_wilor.h5)
    ├─ NLF ────────► Full-body SMPL-X pose                 (_nlf.h5)
    ├─ TEASER ─────► FLAME face parameters                 (_teaser.h5)
    │
    ├─ Segmenter ──► Sign boundaries (BIO labels)
    ├─ SignRep ────► Gloss spotting (dictionary matching)
    ├─ NMS ────────► Blinks, nods, shakes, mouth, gaze
    │
    └─ All results ► Multi-tier ELAN file (.eaf)

Installation

# Core library (ELAN I/O, CLI, corpus tools)
pip install signlangtk

# With web interface
pip install "signlangtk[api]"

# With hand extraction (WiLoR) — requires CUDA
pip install "signlangtk[wilor]"

# Full body (NLF) + face (TEASER)
pip install "signlangtk[nlf,teaser]"

# Everything (excludes rtmpose/smplfx which need mmcv via mim)
pip install "signlangtk[all]"

The PyPI package is named signlangtk; the Python import name is sltk:

import sltk
from sltk.extraction.wilor import WiLoRExtractor
from sltk.extraction.nlf import NLFExtractor
from sltk.extraction.teaser import TeaserExtractor

Requires Python 3.10+. For development from source:

git clone https://github.com/ed-fish/Sign-Language-Toolkit.git
cd Sign-Language-Toolkit
pip install -e ".[dev,api]"

Quick Start: Video to Multi-Tier ELAN

This end-to-end example takes a single video and produces an ELAN file with sign boundaries, spotted glosses, and non-manual signals on separate tiers.

from sltk.extraction.wilor import WiLoRExtractor
from sltk.extraction.teaser import TeaserExtractor
from sltk.segmentation.runner import get_runner
from sltk.segmentation.h5_loader import h5_to_features
from sltk.segmentation.postprocess import extract_segments
from sltk.nms.runner import detect_nms
from sltk.io.elan_roundtrip import ElanDocument

VIDEO = "recording.mp4"
FPS = 25.0

# ── Step 1: Extract poses ───────────────────────────────────────────
# Hands (WiLoR → MANO 3D)
with WiLoRExtractor() as ext:
    ext.load_model()
    ext.extract_from_video(VIDEO, "recording_wilor.h5")

# Face (TEASER → FLAME)
with TeaserExtractor() as ext:
    ext.load_model()
    ext.extract_from_video(VIDEO, "recording_teaser.h5")

# ── Step 2: Segment signs ──────────────────────────────────────────
features = h5_to_features("recording_wilor.h5")   # (T, 192)
runner = get_runner()
labels = runner.predict(features)                  # 0=OUT, 1=IN, 2=BEGIN
segments = extract_segments(labels)                # [(start, end), ...]

# ── Step 3: Detect non-manual signals ──────────────────────────────
blinks, nms_events, quality = detect_nms(
    "recording_teaser.h5",
    detectors={"all"},
)

# ── Step 4: Build multi-tier ELAN file ─────────────────────────────
doc = ElanDocument.new(video_path=VIDEO)

# Add segmentation tier
doc.add_tier("Segmentation")
for start_frame, end_frame in segments:
    doc.add_segment("Segmentation", start_frame / FPS, end_frame / FPS, "SIGN")

# Add NMS tiers (blinks, head movements, mouth, eyebrows, gaze, squint)
# so that every tier an "all" detector run can emit exists before use
for tier_name in ["BLINK", "HEAD-NOD", "HEAD-SHAKE", "HEAD-TILT",
                  "MOUTH-MOVEMENT", "EYEBROW-RAISE", "EYE-GAZE",
                  "EYE-SQUINT"]:
    doc.add_tier(tier_name)

for b in blinks:
    doc.add_segment("BLINK", b.start_frame / FPS, b.end_frame / FPS, "blink")

for ev in nms_events:
    doc.add_segment(ev.tier, ev.start_frame / FPS, ev.end_frame / FPS, ev.label)

doc.save("recording_full.eaf")

Open recording_full.eaf in ELAN to see all tiers aligned with the video.


Extraction: Hands, Body, and Face

All extractors share the same interface: load_model(), extract_from_video(), process_batch(). Weights are auto-downloaded from HuggingFace Hub on first use.

WiLoR — 3D Hand Reconstruction

Produces 21 keypoints per hand + MANO rotation matrices per frame.

from sltk.extraction.wilor import WiLoRExtractor, WiLoRConfig

config = WiLoRConfig(
    device="cuda:0",
    img_batch_size=128,
    rescale_factor=2.0,
)
with WiLoRExtractor(config) as ext:
    ext.load_model()
    result = ext.extract_from_video("video.mp4", "video_wilor.h5")

Output H5 structure:

video_wilor.h5
├── attrs: fps, num_frames, resolution
├── frame_idx      (num_frames, 2)           # sparse: (start_idx, count)
├── kpts_3d        (num_detections, 21, 3)   # 3D hand keypoints
├── right          (num_detections,)          # True = right hand
└── mano/
    ├── hand_pose      (num_detections, 15, 3, 3)  # joint rotations
    └── global_orient  (num_detections, 1, 3, 3)    # wrist rotation
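The frame_idx dataset can be read as a per-frame pointer into the per-detection arrays. The sketch below illustrates that lookup with plain lists standing in for the HDF5 datasets. The helper name detections_for_frame is hypothetical, and the reading of each row as (start_idx, count) is taken from the comment in the structure above rather than from the library source.

```python
# Toy stand-ins for the H5 datasets; real files would be opened with h5py.
frame_idx = [(0, 2), (2, 0), (2, 1)]   # one (start_idx, count) row per frame
right = [True, False, True]            # one flag per detection


def detections_for_frame(frame_idx, frame):
    """Return the detection indices that belong to one video frame."""
    start, count = frame_idx[frame]
    return list(range(start, start + count))


for t in range(len(frame_idx)):
    idx = detections_for_frame(frame_idx, t)
    hands = ["right" if right[i] else "left" for i in idx]
    print(t, idx, hands)
```

A frame with count 0 (no hands detected) simply yields an empty list, which is why the detection arrays can be shorter or longer than the number of frames.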

NLF — Full-Body SMPL-X

Produces 55 SMPL-X joints (body + hands + face landmarks) per frame.

from sltk.extraction.nlf import NLFExtractor, NLFConfig

with NLFExtractor(NLFConfig(device="cuda:0")) as ext:
    ext.load_model()
    result = ext.extract_from_video("video.mp4", "video_nlf.h5")

TEASER — FLAME Face Parameters

Produces FLAME 3D face parameters: jaw pose, expression coefficients, shape, eyelid state, and head pose per frame. This is what drives NMS detection.

from sltk.extraction.teaser import TeaserExtractor, TeaserConfig

with TeaserExtractor(TeaserConfig(device="cuda:0")) as ext:
    ext.load_model()
    result = ext.extract_from_video("video.mp4", "video_teaser.h5")

Batch Processing

All extractors support batch processing over a directory:

from pathlib import Path
from sltk.extraction.wilor import WiLoRExtractor

with WiLoRExtractor() as ext:
    ext.load_model()
    results = ext.process_batch(
        video_paths=list(Path("videos/").glob("*.mp4")),
        output_dir=Path("poses/"),
        skip_existing=True,
    )
    for path, result in results.items():
        print(f"{path}: {result.num_frames} frames, {result.num_detections} detections")

Model Weights

All weights are auto-downloaded from HuggingFace Hub and cached at ~/.cache/sltk/weights/. Override with environment variables:

Variable                  Model
SLTK_WILOR_MODEL_PATH     WiLoR hand model
SLTK_NLF_MODEL_PATH       NLF body model
SLTK_TEASER_CHECKPOINT    TEASER face model
SLTK_SIGNREP_CHECKPOINT   SignRep embedding model
SLTK_SEGMENTOR_PATH       Segmenter model
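An override of this kind usually amounts to "use the environment variable if set, otherwise fall back to the cache directory". The following is a minimal sketch of that pattern, not SLTK's own resolution code: the function resolve_weights and the file name wilor.ckpt are hypothetical and only serve the illustration.

```python
import os
from pathlib import Path

# Default cache location as stated in the text above.
DEFAULT_CACHE = Path.home() / ".cache" / "sltk" / "weights"


def resolve_weights(env_var, default_name, environ=os.environ):
    """Prefer an explicit path from the environment, else the cache location."""
    override = environ.get(env_var)
    if override:
        return Path(override).expanduser()
    return DEFAULT_CACHE / default_name


# "wilor.ckpt" is a made-up file name used only for this example.
print(resolve_weights("SLTK_WILOR_MODEL_PATH", "wilor.ckpt"))
```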

Segmentation: Finding Sign Boundaries

The segmenter is a 4-layer Transformer that reads WiLoR hand features and predicts per-frame BIO labels (0=OUT, 1=IN_SIGN, 2=BEGIN).

From WiLoR H5

from sltk.segmentation.runner import get_runner, segment_h5
from sltk.segmentation.output import OutputFormat
from sltk.segmentation.h5_loader import h5_to_features
from sltk.segmentation.postprocess import extract_segments

# High-level: segment a file → ELAN or JSON
segment_h5(
    "video_wilor.h5",
    output_path="video_segments.eaf",
    output_format=OutputFormat.ELAN,
    fps=25.0,
    media_path="video.mp4",
)

# Low-level: get raw predictions
features = h5_to_features("video_wilor.h5")  # (T, 192)
runner = get_runner()
labels = runner.predict(features)             # (T,) values 0/1/2
segments = extract_segments(labels)           # [(start_frame, end_frame), ...]
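To make the BIO scheme concrete, here is a minimal sketch of how per-frame labels could be turned into (start_frame, end_frame) pairs. It is an illustration and not the implementation behind extract_segments: the function name bio_to_segments is hypothetical, and two choices are assumptions (end frames are inclusive, and an IN_SIGN label with no preceding BEGIN still opens a segment).

```python
OUT, IN_SIGN, BEGIN = 0, 1, 2


def bio_to_segments(labels):
    """Convert per-frame BIO labels into inclusive (start, end) frame pairs."""
    segments = []
    start = None
    for t, label in enumerate(labels):
        if label == BEGIN:
            # BEGIN always starts a new sign, closing any sign still open.
            if start is not None:
                segments.append((start, t - 1))
            start = t
        elif label == IN_SIGN:
            if start is None:
                start = t
        else:
            # OUT closes the open sign, if there is one.
            if start is not None:
                segments.append((start, t - 1))
                start = None
    if start is not None:
        segments.append((start, len(labels) - 1))
    return segments


labels = [0, 2, 1, 1, 0, 0, 2, 1, 2, 1, 1]
print(bio_to_segments(labels))  # [(1, 3), (6, 7), (8, 10)]
```

The BEGIN label is what lets two back-to-back signs (frames 6-7 and 8-10 here) be separated even though no OUT frame falls between them.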

Gloss Spotting: Matching Signs to a Dictionary

SignRep extracts 768-dim visual features from 16-frame sliding windows, then matches each detected segment against a dictionary of known sign features using cosine similarity.

from sltk.embedding.pipeline import SignRepPipeline

pipeline = SignRepPipeline()

# Extract dense features from the full video
continuous = pipeline.extract_continuous("video.mp4", stride=4)

# Load your sign dictionary (folder of .npz files, one per sign)
dictionary = pipeline.load_dictionary(["/data/dictionaries/bsldict/signrep/"])

# Match segments to dictionary entries
result = pipeline.spot(
    features=continuous,
    segments=[{"segment_id": 0, "start_frame": 12, "end_frame": 45}],
    dictionary=dictionary,
    top_k=10,
)

for seg in result.segments:
    print(f"Segment {seg.start_ms}ms-{seg.end_ms}ms:")
    for gl in seg.top_glosses:
        print(f"  {gl['gloss']} ({gl['similarity']:.3f})")

# Save as ELAN (creates Rank-1..N and Score-1..N tiers)
result.save_eaf("video_spotted.eaf", media_path="video.mp4")
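The matching step can be pictured as a top-k cosine similarity search. The sketch below uses three-dimensional toy vectors instead of 768-dim SignRep features; the function names and the dictionary contents are made up for illustration, and how SLTK pools window features into one vector per segment is not covered here.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_k_glosses(query, dictionary, k=2):
    """Rank dictionary entries by similarity to the query, best first."""
    scored = [(gloss, cosine(query, vec)) for gloss, vec in dictionary.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy dictionary: one feature vector per gloss.
dictionary = {
    "HELLO": [1.0, 0.0, 0.0],
    "WORLD": [0.0, 1.0, 0.0],
    "THANKS": [0.7, 0.7, 0.0],
}
query = [0.9, 0.1, 0.0]  # stands in for one segment's feature vector

for gloss, score in top_k_glosses(query, dictionary):
    print(f"{gloss}: {score:.3f}")
```

When features are already L2-normalized, the cosine reduces to a plain dot product, so the whole ranking can be done with a single matrix multiplication.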

Building a Dictionary

Before spotting, build a dictionary from isolated sign videos:

from sltk.embedding.pipeline import SignRepPipeline
pipeline = SignRepPipeline()
result = pipeline.extract_dictionary("isolated_sign.mp4", method="middle")
result.save_npz("dictionary/HELLO.npz")

Non-Manual Signals: Face and Head Analysis

Detect blinks, head nods/shakes/tilts, mouth movements, eyebrow raises, gaze direction, and eye squints from TEASER face-tracking data.

from sltk.nms.runner import detect_nms, export_results

# Detect all NMS events from a TEASER H5 file
blinks, nms_events, quality = detect_nms(
    "video_teaser.h5",
    detectors={"all"},           # or specific: {"blink", "nod", "mouth"}
    smpl_path="video_nlf.h5",   # optional: enables gaze detection
)

# Export to ELAN (one tier per signal type)
export_results(
    blinks, nms_events, "video_teaser.h5",
    output_dir="output/",
    formats=["elan", "json", "csv"],
    media_path="video.mp4",
)

Available Detectors

Detector   ELAN Tier        Signal                        Input
blink      BLINK            Eye closures                  TEASER eyelid params
nod        HEAD-NOD         Vertical head oscillation     TEASER head pitch
shake      HEAD-SHAKE       Horizontal head oscillation   TEASER head yaw
tilt       HEAD-TILT        Side-to-side head tilt        TEASER head roll
mouth      MOUTH-MOVEMENT   Lip/mouth movement            FLAME expression
eyebrow    EYEBROW-RAISE    Eyebrow raise/furrow          FLAME expression
gaze       EYE-GAZE         Gaze direction                NLF eye pose
squint     EYE-SQUINT       Partial eye closure           TEASER eyelid
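The table says what each detector reads, not how events are derived from it. One simple way to turn a per-frame signal into events is to threshold it and keep runs that last long enough. The sketch below shows that idea on a made-up eyelid-closure trace; it is a generic illustration, not SLTK's blink detector, and the function name, threshold, and minimum duration are all assumptions.

```python
def threshold_events(signal, threshold, min_frames=2):
    """Return inclusive (start, end) frame runs where signal >= threshold."""
    events = []
    start = None
    for t, value in enumerate(signal):
        if value >= threshold:
            if start is None:
                start = t
        elif start is not None:
            # Run ended at t - 1; keep it only if it lasted long enough.
            if t - start >= min_frames:
                events.append((start, t - 1))
            start = None
    if start is not None and len(signal) - start >= min_frames:
        events.append((start, len(signal) - 1))
    return events


# Hypothetical per-frame closure values (0 = open, 1 = closed).
eyelid_closure = [0.0, 0.1, 0.8, 0.9, 0.7, 0.1, 0.0, 0.9, 0.0]
print(threshold_events(eyelid_closure, threshold=0.5))  # [(2, 4)]
```

The single-frame spike at frame 7 is discarded by the minimum-duration check, which is a common guard against tracking noise.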

Feature Representations

SLTK computes several feature representations from the extracted H5 pose data.

WiLoR Segmenter Features (192-dim)

MANO rotation matrices converted to axis-angle, both hands concatenated. Used by the Transformer segmenter.

from sltk.segmentation.h5_loader import h5_to_features
features = h5_to_features("video_wilor.h5")  # (T, 192)

Angle Features (104-dim)

Body joint angles + hand Euler angles from MANO rotations.

from sltk.processing.features import compute_angle_features
angles = compute_angle_features(body_poses, right_hand, left_hand)  # (T, 104)

HaMeR Features (288-dim)

Flattened MANO rotation matrices for both hands.

from sltk.processing.features import compute_hamer_features
hamer = compute_hamer_features(
    mano_global_orient_right, mano_hand_pose_right,
    mano_global_orient_left, mano_hand_pose_left,
)  # (T, 288)
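The 288 dimensions follow from the MANO shapes in the WiLoR output: per hand, 1 wrist rotation plus 15 joint rotations gives 16 matrices of 9 values each (144), and two hands give 288. The sketch below checks that arithmetic with identity matrices in plain Python. The helper names are hypothetical, and the ordering (right hand first, global_orient before hand_pose) is an assumption based on the argument order above.

```python
def flatten(nested):
    """Recursively flatten nested lists into one flat list of numbers."""
    if isinstance(nested, list):
        return [x for item in nested for x in flatten(item)]
    return [nested]


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]


def hand_vector(global_orient, hand_pose):
    """Concatenate one hand's flattened rotation matrices: (1 + 15) * 9 values."""
    return flatten(global_orient) + flatten(hand_pose)


global_orient = [IDENTITY]        # shape (1, 3, 3)
hand_pose = [IDENTITY] * 15       # shape (15, 3, 3)

right = hand_vector(global_orient, hand_pose)
left = hand_vector(global_orient, hand_pose)
frame_features = right + left
print(len(right), len(frame_features))  # 144 288
```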

SignRep Embeddings (768-dim)

Dense visual features from the SignRep ViT model. 16-frame windows, L2-normalized.

from sltk.embedding.pipeline import SignRepPipeline
pipeline = SignRepPipeline()
continuous = pipeline.extract_continuous("video.mp4", stride=4)
# continuous.features: (num_windows, 768)
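The relationship between video length, window size, and stride determines num_windows. As a rough guide, assuming windows are only taken where a full 16 frames are available (the padding behaviour of the real pipeline is not documented here), the window start frames can be sketched as follows. The function name window_starts is hypothetical.

```python
def window_starts(num_frames, window=16, stride=4):
    """Start frames of all full windows that fit into the video."""
    if num_frames < window:
        return []
    return list(range(0, num_frames - window + 1, stride))


starts = window_starts(100)
print(len(starts), starts[:3], starts[-1])  # 22 [0, 4, 8] 84
```

Under this assumption, window i covers frames starts[i] to starts[i] + 15, which is how a segment given in frames can be mapped to the windows that overlap it.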

Combined Features from H5

from sltk.processing.features import load_features_from_nlf_wilor
angles, hamer = load_features_from_nlf_wilor("video_nlf.h5", "video_wilor.h5")

ELAN File I/O

Writing a New ELAN File

from sltk.io.elan_roundtrip import ElanDocument

doc = ElanDocument.new(video_path="video.mp4")
doc.add_tier("Gloss")
doc.add_tier("NMS")
doc.add_segment("Gloss", 0.0, 1.5, "HELLO")
doc.add_segment("Gloss", 1.5, 3.0, "WORLD")
doc.add_segment("NMS", 0.2, 0.8, "nod")
doc.save("output.eaf")

Reading and Modifying Existing ELAN Files

from sltk.io.elan_roundtrip import ElanDocument

doc = ElanDocument.open("annotations.eaf")
tiers = doc.get_tiers()         # list of TierInfo
segments = doc.get_segments()   # list of SegmentInfo (all tiers)

# Add new annotations from pipeline results
doc.add_tier("AutoSegmentation")
doc.add_segment("AutoSegmentation", 1.0, 2.5, "SIGN")
doc.save()  # preserves all original XML structure

Simple Read/Write

from sltk.io import read_eaf, write_eaf
from sltk.data import Segment, SegmentList

# Read
segments = read_eaf("annotations.eaf", tiers=["Gloss"])

# Write
new_segments = SegmentList([
    Segment(start=0.0, end=1.5, label="HELLO", tier="Gloss"),
    Segment(start=1.5, end=3.0, label="WORLD", tier="Gloss"),
])
write_eaf(new_segments, "output.eaf", video_path="video.mp4")
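For orientation, an .eaf file is XML in which annotations point at shared time slots measured in milliseconds. The sketch below parses a heavily trimmed EAF-like document with the standard library to show that structure. It is not how sltk.io reads files, real ELAN files carry more header and type information than shown, and the helper read_tier is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed example: only the elements needed to recover timed annotations.
EAF = """<ANNOTATION_DOCUMENT>
  <HEADER TIME_UNITS="milliseconds"/>
  <TIME_ORDER>
    <TIME_SLOT TIME_SLOT_ID="ts1" TIME_VALUE="0"/>
    <TIME_SLOT TIME_SLOT_ID="ts2" TIME_VALUE="1500"/>
    <TIME_SLOT TIME_SLOT_ID="ts3" TIME_VALUE="3000"/>
  </TIME_ORDER>
  <TIER TIER_ID="Gloss" LINGUISTIC_TYPE_REF="default-lt">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>HELLO</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a2" TIME_SLOT_REF1="ts2" TIME_SLOT_REF2="ts3">
        <ANNOTATION_VALUE>WORLD</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""


def read_tier(root, tier_id):
    """Return (start_s, end_s, label) tuples for one tier."""
    # Map time slot IDs to millisecond values.
    slots = {ts.get("TIME_SLOT_ID"): int(ts.get("TIME_VALUE"))
             for ts in root.iter("TIME_SLOT")}
    out = []
    for tier in root.iter("TIER"):
        if tier.get("TIER_ID") != tier_id:
            continue
        for ann in tier.iter("ALIGNABLE_ANNOTATION"):
            start = slots[ann.get("TIME_SLOT_REF1")] / 1000.0
            end = slots[ann.get("TIME_SLOT_REF2")] / 1000.0
            out.append((start, end, ann.findtext("ANNOTATION_VALUE", default="")))
    return out


root = ET.fromstring(EAF)
print(read_tier(root, "Gloss"))  # [(0.0, 1.5, 'HELLO'), (1.5, 3.0, 'WORLD')]
```

Because adjacent annotations can share a time slot (ts2 here), editing tools that rewrite the XML need to keep slot references consistent, which is one reason a round-trip API is preferable to hand-editing.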

Web Interface

SLTK includes a React frontend for browsing workspaces, running processing jobs, and exploring corpus data.

# Development: FastAPI (port 8000) + Vite (port 5173)
bash scripts/run_dev.sh

# Production
cd frontend && npm ci && npm run build && cd ..
sltk serve --host 0.0.0.0 --port 8000

Route         Page         Purpose
/             Workspaces   Create/switch workspaces, scan directories
/process      Process      Submit segmentation and spotting jobs
/explore      Explore      Search glosses, view video clips, corpus statistics
/viewer       Viewer       Video playback with annotation overlay
/analysis/*   Analysis     Vocabulary, concordance, n-grams, collocations, durations

CLI Reference

sltk convert input.npy output.h5 --from mediapipe --to wilor --fps 25
sltk evaluate predictions.txt references.txt --task translation
sltk to-elan segments.json --video source.mp4 --output annotations.eaf
sltk from-elan annotations.eaf --output segments.json --tier Gloss
sltk info video_wilor.h5
sltk serve --host 0.0.0.0 --port 8000 --reload

Configuration

Variable             Description                      Default
SLTK_CORS_ORIGINS    Allowed CORS origins             http://localhost:5173,http://localhost:3000
SLTK_ALLOWED_PATHS   Filesystem whitelist for API     /vol/research,/home
SLTK_WEIGHTS_DIR     Override weight cache location   ~/.cache/sltk/weights/

Supported Pose Formats

Format         Extractor            Keypoints      Description
WiLoR          WiLoRExtractor       21 per hand    MANO 3D hand mesh with rotation matrices
NLF/SMPL-X     NLFExtractor         55 joints      Full body with axis-angle rotations
TEASER/FLAME   TeaserExtractor      FLAME params   Face parameters: jaw, expression, shape, eyelid
MediaPipe      MediaPipeExtractor   33+42+468      Fast 2D/3D holistic landmarks
RTMPose        RTMPoseExtractor     133            COCO-WholeBody, multi-person

All formats are stored as HDF5 (.h5) files.

Testing

pytest                       # Full suite (1500+ tests)
pytest -m "not slow"         # Skip slow tests
pytest -m api                # API tests only

License

CC-BY-NC-4.0
