Sign Language Toolkit for sign language research
Sign Language Toolkit (SLTK)
A research toolkit for sign language video analysis. SLTK provides a complete pipeline from raw video to linguistic annotations: pose extraction, automatic segmentation, gloss spotting, and corpus analysis — all accessible via Python, CLI, or a web interface.
Installation
# Core library (data loading, ELAN I/O, CLI)
pip install signlangtk
# With web interface
pip install "signlangtk[api]"
# With GPU-accelerated pose extraction (WiLoR hand model)
pip install "signlangtk[wilor]"
# Everything
pip install "signlangtk[all]"
The PyPI package is called signlangtk, but the Python import is sltk:
import sltk
from sltk.data import PoseSequence, Segment
Requires Python 3.10+. For development from source:
git clone https://github.com/ed-fish/Sign-Language-Toolkit.git
cd Sign-Language-Toolkit
pip install -e ".[dev,api]"
The Processing Pipeline
SLTK's core workflow is a three-stage pipeline that turns raw sign language video into searchable, annotated ELAN files:
Video (.mp4)
│
├── Stage 1: Pose Extraction ──► {stem}_wilor.h5
│ WiLoR hand model: MANO rotation matrices + 3D keypoints
│
├── Stage 2: Segmentation ──► {stem}_segments.eaf
│ Transformer model predicts sign boundaries (BIO labels)
│
└── Stage 3: Spotting ──► {stem}_spotted.eaf
SignRep model matches segments to a dictionary of known signs
Each stage can be run independently. If you already have H5 pose files, start at Stage 2. If you already have segment boundaries, start at Stage 3.
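Because each stage writes a predictably named file, the entry point can be chosen by checking which outputs already exist. The sketch below is not part of SLTK: `first_stage_to_run` is a hypothetical helper, and it assumes the outputs sit next to the video and follow the `{stem}_*` naming shown in the diagram.

```python
from pathlib import Path


def first_stage_to_run(video_path):
    """Return the stage (1, 2 or 3) to start from, or 0 if spotting output exists.

    Hypothetical helper: assumes outputs live in the same folder as the video.
    """
    video = Path(video_path)
    stem = video.stem
    if video.with_name(f"{stem}_spotted.eaf").exists():
        return 0  # Stage 3 output already present, nothing left to run
    if video.with_name(f"{stem}_segments.eaf").exists():
        return 3  # segment boundaries exist, go straight to spotting
    if video.with_name(f"{stem}_wilor.h5").exists():
        return 2  # pose file exists, start at segmentation
    return 1      # nothing yet, start with pose extraction
```

The check runs from the last stage backwards so that a later output takes priority over an earlier one, matching the rule that existing segment boundaries let you skip directly to Stage 3.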
Stage 1: Pose Extraction
Extract hand poses from video using the WiLoR hand model. This produces an HDF5 file containing MANO rotation matrices and 21 3D keypoints per detected hand, per frame.
Python
from sltk.extraction.wilor import WiLoRExtractor, WiLoRConfig
config = WiLoRConfig(
checkpoint_path="path/to/wilor_final.ckpt",
detector_path="path/to/detector.pt",
rescale_factor=2.0,
detection_confidence=0.3,
)
extractor = WiLoRExtractor(config)
extractor.load_model()
result = extractor.extract_from_video("video.mp4")
# Saves to video_wilor.h5
extractor.close()
API
# Start extraction job (runs in background on GPU)
curl -X POST http://localhost:8000/api/extraction/start \
-H "Content-Type: application/json" \
-d '{
"video_path": "/data/video.mp4",
"output_root": "/data/output",
"config": {"enable_wilor": true, "device": "cuda"}
}'
# Returns: {"job_id": "abc123", ...}
# Poll progress
curl http://localhost:8000/api/extraction/status/abc123
Output format
The WiLoR H5 file has this structure:
video_wilor.h5
├── attrs: fps, num_frames, resolution, extractor
├── frame_idx (num_frames, 2) # (start_idx, count) per frame
├── kpts_3d (num_detections, 21, 3) # 3D hand keypoints
├── right (num_detections,) # True = right hand
└── mano/
├── hand_pose (num_detections, 15, 3, 3) # joint rotations
└── global_orient (num_detections, 1, 3, 3) # wrist rotation
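The `frame_idx` table maps each video frame to a slice of the flat per-detection arrays. A minimal sketch of that lookup follows, using plain Python lists as stand-ins for the datasets (the same slicing works on arrays read from the file). `detections_for_frame` is an illustrative name, not an SLTK function, and the `start_idx` value stored for frames with no detections is an assumption.

```python
def detections_for_frame(frame_idx, kpts_3d, right, frame):
    """Return (keypoints, is_right) pairs for every hand detected in one frame."""
    start, count = frame_idx[frame]
    start, count = int(start), int(count)
    stop = start + count
    return list(zip(kpts_3d[start:stop], right[start:stop]))


# Toy data: frame 0 has two hands, frame 1 has none, frame 2 has one.
frame_idx = [(0, 2), (2, 0), (2, 1)]
kpts_3d = ["hand_a", "hand_b", "hand_c"]  # stand-ins for (21, 3) keypoint arrays
right = [True, False, True]

for frame in range(len(frame_idx)):
    print(frame, detections_for_frame(frame_idx, kpts_3d, right, frame))
```

The `mano/hand_pose` and `mano/global_orient` datasets share the same first axis, so the same `(start_idx, count)` slice selects the matching rotations.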
Model weights
Checkpoints are resolved in this order:
- Explicit path in WiLoRConfig
- Environment variable: SLTK_WILOR_CHECKPOINT, SLTK_WILOR_DETECTOR
- Bundled at sltk/weights/wilor/
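The resolution order above can be sketched as a simple first-match search. This is an illustration rather than SLTK's own code: `resolve_checkpoint` is a hypothetical function, and whether the library falls through to the next source or raises when an explicit path does not exist is an assumption.

```python
import os
from pathlib import Path


def resolve_checkpoint(explicit_path, env_var, bundled_path):
    """Return the first existing file among: explicit path, env var, bundled file."""
    candidates = [explicit_path, os.environ.get(env_var), bundled_path]
    for candidate in candidates:
        # Skip unset sources (None or empty string) and paths that do not exist.
        if candidate and Path(candidate).is_file():
            return Path(candidate)
    raise FileNotFoundError(
        f"No checkpoint found; set {env_var} or pass a path explicitly"
    )
```

A call such as `resolve_checkpoint(None, "SLTK_WILOR_CHECKPOINT", "sltk/weights/wilor/wilor_final.ckpt")` would then pick up the environment variable if it is set; the bundled file name in that call is only borrowed from the earlier example path.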
Other extractors (MediaPipe, NLF/SMPL-X, TEASER, RTMPose) are also available — see sltk/extraction/.
Stage 2: Segmentation
The segmenter is a 4-layer Transformer that reads WiLoR H5 files and predicts per-frame BIO labels (0=OUT, 1=IN_SIGN, 2=BEGIN), identifying where individual signs start and end.
Python — high level
from sltk.segmentation.runner import segment_h5
from sltk.segmentation.output import OutputFormat
# Segment a single file → ELAN output
segment_h5(
"video_wilor.h5",
output_path="video_segments.eaf",
output_format=OutputFormat.ELAN,
fps=25.0,
media_path="video.mp4", # links the video in the EAF
)
# Segment a single file → JSON output
segment_h5(
"video_wilor.h5",
output_path="video_segments.json",
output_format=OutputFormat.JSON,
fps=25.0,
)
# Segment an entire directory
segment_h5(
"/data/poses/",
output_path="/data/segments/output.json",
output_format=OutputFormat.JSON,
fps=25.0,
)
Python — low level
from sltk.segmentation.runner import get_runner
from sltk.segmentation.h5_loader import h5_to_features
from sltk.segmentation.postprocess import extract_segments
# Load H5 → 192-dim feature vectors (MANO rotations as axis-angle)
features = h5_to_features("video_wilor.h5") # shape: (num_frames, 192)
# Run the Transformer model
runner = get_runner() # singleton, loads checkpoint once
labels = runner.predict(features) # shape: (num_frames,) values 0/1/2
# Extract segment boundaries
segments = extract_segments(labels)
# [(12, 45), (50, 82), (90, 120), ...]
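To make the label-to-boundary step concrete, here is a small stand-alone sketch of turning BIO labels into segments. It is not the implementation of `extract_segments`; in particular, treating `end` as the last frame of the sign (inclusive) and letting a stray IN_SIGN label open a segment are assumptions.

```python
def bio_to_segments(labels):
    """Turn per-frame labels (0=OUT, 1=IN_SIGN, 2=BEGIN) into (start, end) pairs."""
    segments = []
    start = None
    for frame, label in enumerate(labels):
        if label == 2 or (label == 1 and start is None):
            # BEGIN always opens a new sign; close the previous one if still open.
            if start is not None:
                segments.append((start, frame - 1))
            start = frame
        elif label == 0 and start is not None:
            # OUT closes the sign that was open on the previous frame.
            segments.append((start, frame - 1))
            start = None
    if start is not None:
        # The sequence ended while a sign was still open.
        segments.append((start, len(labels) - 1))
    return segments


print(bio_to_segments([0, 2, 1, 1, 0, 0, 2, 1, 2, 1, 1]))
# [(1, 3), (6, 7), (8, 10)]
```

The separate BEGIN label is what lets two back-to-back signs (frames 6 to 7 and 8 to 10 above) be split without an OUT frame between them.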
API
# Segment a single file
curl -X POST http://localhost:8000/api/segmentation/segment \
-H "Content-Type: application/json" \
-d '{"h5_path": "/data/video_wilor.h5", "fps": 25.0}'
# Batch segment a directory
curl -X POST http://localhost:8000/api/segmentation/segment/batch \
-H "Content-Type: application/json" \
-d '{
"directory": "/data/poses/",
"fps": 25.0,
"output_path": "/data/segments/",
"output_format": "json"
}'
JSON output
{
"video_name": {
"fps": 25.0,
"num_frames": 3000,
"segments": [
{"start_frame": 12, "end_frame": 45, "start_sec": 0.48, "end_sec": 1.80},
{"start_frame": 50, "end_frame": 82, "start_sec": 2.00, "end_sec": 3.28}
]
}
}
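The second-based fields in this output are consistent with dividing the frame index by `fps`. The short check below recomputes them from the sample above and derives per-segment durations; rounding to two decimals is an assumption based on the values shown.

```python
import json

doc = json.loads("""
{"video_name": {"fps": 25.0, "num_frames": 3000, "segments": [
  {"start_frame": 12, "end_frame": 45, "start_sec": 0.48, "end_sec": 1.80},
  {"start_frame": 50, "end_frame": 82, "start_sec": 2.00, "end_sec": 3.28}
]}}
""")


def frame_to_sec(frame, fps):
    """Convert a frame index to seconds, rounded as in the sample output."""
    return round(frame / fps, 2)


durations = []
for name, video in doc.items():
    fps = video["fps"]
    for seg in video["segments"]:
        # The stored seconds should match frame / fps.
        assert frame_to_sec(seg["start_frame"], fps) == seg["start_sec"]
        assert frame_to_sec(seg["end_frame"], fps) == seg["end_sec"]
        durations.append(round((seg["end_frame"] - seg["start_frame"]) / fps, 2))

print(durations)
```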
ELAN output
Creates a tier named {video_name}_segmentation with each segment labelled SIGN, authored by segmenter_v2.
Model checkpoint
Set SLTK_SEGMENTOR_CHECKPOINT or place segmentor_v2.ckpt in sltk/weights/segmentor/.
Stage 3: Gloss Spotting
The spotter uses SignRep (a ViT-based model) to extract 768-dim visual features from 16-frame sliding windows, then matches each detected segment against a dictionary of known sign features using cosine similarity.
Prerequisites
- Segment boundaries from Stage 2 (or your own)
- A dictionary: a folder of .npz files (one per sign), each containing a best_latent key with a 768-dim feature vector
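The matching step itself is ordinary cosine-similarity ranking. The sketch below shows the idea on toy 3-dim vectors in place of the 768-dim features; `top_k_glosses` and the in-memory `dictionary` mapping are illustrative and do not mirror SLTK's internal data structures.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def top_k_glosses(query, dictionary, k=3):
    """Rank dictionary entries by cosine similarity to a query feature vector."""
    q = l2_normalize(query)
    scored = []
    for gloss, feature in dictionary.items():
        f = l2_normalize(feature)
        # For unit vectors the dot product equals the cosine similarity.
        scored.append((gloss, sum(a * b for a, b in zip(q, f))))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


dictionary = {
    "HELLO": [1.0, 0.0, 0.0],
    "WORLD": [0.0, 1.0, 0.0],
    "THANKS": [0.7, 0.7, 0.0],
}
for gloss, score in top_k_glosses([0.9, 0.1, 0.0], dictionary, k=2):
    print(f"{gloss}: {score:.3f}")
```

Because the features are L2-normalized up front, ranking a segment against a large dictionary reduces to a single matrix-vector product in practice.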
Python — full pipeline
from sltk.embedding.pipeline import SignRepPipeline
pipeline = SignRepPipeline()
# 1. Extract dense features from the full video
continuous = pipeline.extract_continuous("video.mp4", stride=4)
# continuous.features shape: (num_windows, 768), L2-normalized
# 2. Load dictionary
dictionary = pipeline.load_dictionary(
["/data/dictionaries/bsldict/signrep/"],
feature_key="best_latent",
)
# 3. Define segments (from Stage 2, or load from JSON/EAF)
segments = [
{"segment_id": 0, "start_frame": 12, "end_frame": 45},
{"segment_id": 1, "start_frame": 50, "end_frame": 82},
]
# 4. Match each segment against the dictionary
result = pipeline.spot(
features=continuous,
segments=segments,
dictionary=dictionary,
top_k=10,
segment_pooling="max", # or "mean", "softmax_weighted"
)
# 5. Inspect results
for seg in result.segments:
print(f"Segment {seg.start_ms}ms–{seg.end_ms}ms:")
for gl in seg.top_glosses:
print(f" Rank {gl['rank']}: {gl['gloss']} ({gl['similarity']:.3f})")
# 6. Save as ELAN (creates Rank-1..N and Score-1..N tiers)
result.save_eaf("video_spotted.eaf", media_path="video.mp4")
Python — one-shot
from sltk.embedding.pipeline import SignRepPipeline
pipeline = SignRepPipeline()
result = pipeline.spot_from_video(
video_path="video.mp4",
segments_json="video_segments.json",
dictionary_dirs=["/data/dictionaries/bsldict/signrep/"],
top_k=20,
stride=4,
)
API
# Extract continuous features (cached server-side)
curl -X POST http://localhost:8000/api/signrep/continuous/extract \
-H "Content-Type: application/json" \
-d '{"video_path": "/data/video.mp4", "stride": 4}'
# Returns: {"features_id": "abc123", "num_windows": 500, ...}
# Spot glosses
curl -X POST http://localhost:8000/api/signrep/spot \
-H "Content-Type: application/json" \
-d '{
"features_id": "abc123",
"segments": [{"segment_id": 0, "start_frame": 12, "end_frame": 45}],
"dictionary_dirs": ["/data/dictionaries/bsldict/signrep/"],
"top_k": 10
}'
Building a dictionary
Before spotting, you need a dictionary. Extract one feature per isolated sign video:
from sltk.embedding.pipeline import SignRepPipeline
pipeline = SignRepPipeline()
result = pipeline.extract_dictionary("isolated_sign.mp4", method="middle")
result.save_npz("dictionary/HELLO.npz")
Batch extraction via the API:
curl -X POST http://localhost:8000/api/signrep/dictionary/batch/job \
-H "Content-Type: application/json" \
-d '{
"video_dir": "/data/isolated_signs/",
"output_dir": "/data/dictionary/",
"method": "middle"
}'
Model checkpoint
Set SLTK_SIGNREP_CHECKPOINT or place ckpt.pt in sltk/weights/signrep/.
End-to-End Processing
The processing API combines Stages 2 and 3 into a single background job. It expects WiLoR H5 files to already exist alongside the videos ({stem}_wilor.h5).
API
# Segmentation only
curl -X POST http://localhost:8000/api/processing/submit \
-H "Content-Type: application/json" \
-d '{
"video_paths": ["/data/video1.mp4", "/data/video2.mp4"],
"type": "segments",
"fps": 25.0
}'
# Segmentation + spotting
curl -X POST http://localhost:8000/api/processing/submit \
-H "Content-Type: application/json" \
-d '{
"video_paths": ["/data/video1.mp4"],
"type": "spots",
"dictionary_dirs": ["/data/dictionaries/bsldict/signrep/"],
"top_k": 5,
"fps": 25.0,
"workspace": "my_workspace"
}'
# Poll job
curl http://localhost:8000/api/processing/status/{job_id}
# Download result
curl -O http://localhost:8000/api/processing/output/{job_id}/video1_spotted.eaf
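Polling can also be scripted. The sketch below uses only the standard library and the status URL shown above. The response schema is not documented here, so the `"status"` field and its terminal values `"completed"` and `"failed"` are assumptions; check the interactive API docs for the real field names.

```python
import json
import time
import urllib.request


def fetch_status(job_id, base_url="http://localhost:8000"):
    """GET the processing status endpoint and decode the JSON body."""
    url = f"{base_url}/api/processing/status/{job_id}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def wait_for_job(job_id, fetch=fetch_status, interval=2.0, timeout=600.0):
    """Poll until the job reports a terminal state, then return the last payload.

    Assumption: the payload has a "status" key ending in "completed" or "failed".
    """
    deadline = time.monotonic() + timeout
    while True:
        payload = fetch(job_id)
        if payload.get("status") in {"completed", "failed"}:
            return payload
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} did not finish within {timeout} s")
        time.sleep(interval)
```

Passing the fetch function as a parameter keeps the loop testable without a running server, for example `wait_for_job("abc123", fetch=my_fake_fetch, interval=0)`.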
Output files
| Job type | Output | Description |
|---|---|---|
| segments | {stem}_segments.eaf | Sign boundaries (SIGN labels) |
| spots | {stem}_segments.eaf + {stem}_spotted.eaf | Boundaries + ranked gloss labels |
When a workspace is specified, output EAF files are automatically ingested into the corpus database for search and analysis.
Feature Extraction
SLTK computes several feature representations from H5 pose data, used by the segmenter and available for your own research.
WiLoR segmenter features (192-dim)
Used by the Transformer segmenter (Stage 2). Converts MANO rotation matrices to axis-angle, concatenates left and right hand:
from sltk.segmentation.h5_loader import h5_to_features
features = h5_to_features("video_wilor.h5")
# shape: (num_frames, 192)
# = 2 hands × 96 dims (16 joints × 6 axis-angle params)
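For background, the textbook conversion from a single 3×3 rotation matrix to an axis-angle vector looks like the sketch below. It is not a reimplementation of `h5_to_features`: a plain axis-angle vector has three components per joint, so the 96 dims per hand quoted above imply that SLTK's actual per-joint layout carries more than this. Rotations close to 180 degrees need special handling that is omitted here.

```python
import math


def rotmat_to_axis_angle(R):
    """Convert a 3x3 rotation matrix (nested lists) to an axis-angle vector.

    The result points along the rotation axis; its length is the angle in radians.
    """
    trace = R[0][0] + R[1][1] + R[2][2]
    # Clamp so rounding error cannot push the cosine outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    angle = math.acos(cos_angle)
    if angle < 1e-8:
        return [0.0, 0.0, 0.0]  # no rotation
    # The antisymmetric part of R equals 2*sin(angle) times the unit axis.
    scale = angle / (2.0 * math.sin(angle))
    return [
        scale * (R[2][1] - R[1][2]),
        scale * (R[0][2] - R[2][0]),
        scale * (R[1][0] - R[0][1]),
    ]


# 90-degree rotation about the z axis.
Rz = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
print(rotmat_to_axis_angle(Rz))
```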
Angle features (104-dim)
Body joint angles and hand Euler angles from MANO rotation matrices:
from sltk.processing.features import compute_angle_features
angles = compute_angle_features(body_poses, left_hand_poses, right_hand_poses)
# shape: (num_frames, 104)
# = 22 body angles + 41 left hand + 41 right hand
HaMeR features (288-dim)
Flattened MANO rotation matrices:
from sltk.processing.features import load_features_from_h5
angles, hamer = load_features_from_h5("video_mediapipe.h5", "video_wilor.h5")
# angles: (T, 104)
# hamer: (T, 288) = 2 × (135 hand_pose + 9 global_orient)
SignRep embeddings (768-dim)
Dense visual features from the SignRep ViT model:
from sltk.embedding.pipeline import SignRepPipeline
pipeline = SignRepPipeline()
continuous = pipeline.extract_continuous("video.mp4", stride=4)
# continuous.features: (num_windows, 768), L2-normalized
# 16-frame windows at stride 4
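To relate window indices back to video frames, the following sketch assumes that window `i` starts at frame `i * stride` with no padding at the start of the video; that alignment is an assumption, not something stated above. Both function names are illustrative.

```python
def window_frame_range(window_index, window_size=16, stride=4):
    """Frames [start, end) covered by one sliding window."""
    start = window_index * stride
    return start, start + window_size


def windows_overlapping(start_frame, end_frame, num_windows, window_size=16, stride=4):
    """Indices of windows sharing at least one frame with [start_frame, end_frame]."""
    hits = []
    for i in range(num_windows):
        w_start, w_end = window_frame_range(i, window_size, stride)
        # w_end is exclusive, end_frame is inclusive.
        if w_start <= end_frame and w_end > start_frame:
            hits.append(i)
    return hits


print(window_frame_range(3))
print(windows_overlapping(50, 82, num_windows=30))
```

With a stride of 4 and a window of 16 frames, consecutive windows overlap by 12 frames, so a typical sign is covered by several windows whose features or scores can then be pooled per segment.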
Running the Web Interface
SLTK includes a React frontend for browsing workspaces, running processing jobs, and exploring corpus data.
Development mode
# Start FastAPI backend (port 8000) + Vite dev server (port 5173)
bash scripts/run_dev.sh
This launches both servers. The frontend is available at http://localhost:5173 and proxies API requests to the backend. Interactive API docs are at http://localhost:8000/docs.
Production mode
# Build the frontend
cd frontend && npm ci && npm run build && cd ..
# Serve everything from FastAPI
sltk serve --host 0.0.0.0 --port 8000
The built frontend is served as static files from FastAPI at http://localhost:8000.
Backend only
uvicorn sltk.api.main:app --host 0.0.0.0 --port 8000
Frontend pages
| Route | Page | Purpose |
|---|---|---|
| / | Workspaces | Create/switch workspaces, scan directories |
| /process | Process | Submit segmentation and spotting jobs |
| /explore | Explore | Search glosses, view video clips, corpus statistics |
| /viewer | Viewer | Video playback with annotation overlay |
| /analysis/* | Analysis | Vocabulary, concordance, n-grams, collocations, durations |
NMS Detection (Non-Manual Signals)
Detect blinks, head nods, shakes, tilts, and other non-manual signals from TEASER/FLAME face-tracking H5 files.
Python
from sltk.nms.runner import detect_nms, export_results
# Detect all NMS events
blinks, nms_events, quality = detect_nms(
"video_teaser.h5",
detectors={"all"}, # or {"blink", "nod", "shake", "tilt", "mouth", "eyebrow"}
)
# Export to ELAN, JSON, and CSV
export_results(blinks, nms_events, "video_teaser.h5",
output_dir="output/",
formats=["elan", "json", "csv"],
participant_id="P01",
)
Available detectors
| Detector | Tier | Signal |
|---|---|---|
| blink | BLINK | Eye closures from eyelid parameters |
| nod | HEAD-NOD | Vertical head oscillation (pitch) |
| shake | HEAD-SHAKE | Horizontal head oscillation (yaw) |
| tilt | HEAD-TILT | Side-to-side head tilt (roll) |
| mouth | MOUTH | Lip/mouth movement from FLAME expression |
| eyebrow | EYEBROW | Eyebrow raise/furrow from FLAME expression |
| gaze | EYE-GAZE | Gaze direction (requires NLF/SMPL file) |
| squint | EYE-SQUINT | Partial eye closure |
API
curl -X POST http://localhost:8000/api/nms/detect \
-H "Content-Type: application/json" \
-d '{
"h5_path": "/data/video_teaser.h5",
"detectors": ["all"],
"format": ["elan"],
"output_dir": "/data/output/"
}'
CLI Reference
sltk convert input.npy output.h5 --from mediapipe --to wilor --fps 25
sltk evaluate predictions.txt references.txt --task translation
sltk to-elan segments.json --video source.mp4 --output annotations.eaf
sltk from-elan annotations.eaf --output segments.json --tier Gloss
sltk info video_wilor.h5
sltk serve --host 0.0.0.0 --port 8000 --reload
sltk formats
Data Types
from sltk.data import PoseSequence, Segment, SegmentList
# Load poses
poses = PoseSequence.load("video_wilor.h5", format="wilor", fps=25)
poses.data # (num_frames, num_keypoints, 3)
poses.fps # 25.0
poses.format # "wilor"
# Load ELAN annotations
from sltk.io import read_eaf, write_eaf
segments = read_eaf("annotations.eaf", tiers=["Gloss"])
# Create and export segments
new_segments = SegmentList([
Segment(start=0.0, end=1.5, label="HELLO", tier="Gloss"),
Segment(start=1.5, end=3.0, label="WORLD", tier="Gloss"),
])
write_eaf(new_segments, "output.eaf", video_path="source.mp4")
Supported pose formats
| Format | Keypoints | Description |
|---|---|---|
| MediaPipe | 33 body + 21x2 hands + 468 face | Holistic pose estimation |
| WiLoR | 21 per hand | MANO hand model with rotation matrices |
| NLF/SMPL-X | 55 joints | Full body with axis-angle rotations |
All stored as HDF5 (.h5) files.
Configuration
| Variable | Description | Default |
|---|---|---|
| SLTK_CORS_ORIGINS | Allowed CORS origins | http://localhost:5173,http://localhost:3000 |
| SLTK_ALLOWED_PATHS | Filesystem whitelist for API | /vol/research,/home |
| SLTK_WILOR_CHECKPOINT | WiLoR model checkpoint | auto-resolved |
| SLTK_WILOR_DETECTOR | WiLoR hand detector | auto-resolved |
| SLTK_SIGNREP_CHECKPOINT | SignRep model checkpoint | auto-resolved |
| SLTK_SEGMENTOR_CHECKPOINT | Segmenter checkpoint | auto-resolved |
| SLTK_NLF_MODEL_PATH | NLF model path | — |
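The two list-valued variables use comma-separated values. As an illustration of how such settings are typically consumed, the sketch below parses one and applies a filesystem whitelist check. `env_list` and `is_path_allowed` are hypothetical helpers, not SLTK functions, and the exact matching rules the API applies are an assumption.

```python
import os
from pathlib import Path


def env_list(name, default):
    """Read a comma-separated environment variable, falling back to a default."""
    raw = os.environ.get(name, default)
    return [item.strip() for item in raw.split(",") if item.strip()]


def is_path_allowed(path, allowed_roots):
    """True if `path` resolves to a location inside one of the allowed roots."""
    # Resolving first collapses ".." components and symlinks before comparing.
    resolved = Path(path).resolve()
    return any(resolved.is_relative_to(Path(root).resolve()) for root in allowed_roots)


allowed = env_list("SLTK_ALLOWED_PATHS", "/vol/research,/home")
print(is_path_allowed("/home/user/video.mp4", allowed))
```

Resolving the path before the comparison matters: a request for `/home/../etc/passwd` starts with an allowed prefix as a string but points outside the whitelist once normalized.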
Testing
pytest # Full suite (1500+ tests)
pytest -m "not slow" # Skip slow tests
pytest -m api # API tests only
pytest --cov=sltk # With coverage report
License
CC-BY-NC-4.0