Sign Language Toolkit for sign language research
Sign Language Toolkit (SLTK)
A research toolkit for sign language video analysis: workspace management, pose extraction, automatic segmentation, ELAN annotation editing, and a REST API serving a React annotation workstation.
Installation
# Core (data loading, formats, ELAN I/O)
pip install -e .
# With pose extraction
pip install -e ".[mediapipe]"
# With web API + frontend
pip install -e ".[api]"
# Everything
pip install -e ".[all]"
# Development (includes pytest, black, ruff, mypy)
pip install -e ".[dev]"
Requires Python 3.10+.
Quick Start
Running the API
# Start FastAPI backend (port 8000) + Vite frontend (port 5173)
bash scripts/run_dev.sh
# Or run the backend only
uvicorn sltk.api.main:app --host 0.0.0.0 --port 8000
Interactive docs at http://localhost:8000/docs (Swagger UI).
Python Library
from sltk.data import PoseSequence, Segment, SegmentList
from sltk.io import read_eaf, write_eaf
# Load poses from H5 file
poses = PoseSequence.load("video_wilor.h5", format="wilor", fps=25)
# Load ELAN annotations
segments = read_eaf("annotations.eaf", tiers=["Gloss"])
# Create segments and export to ELAN
new_segments = SegmentList([
    Segment(start=0.0, end=1.5, label="HELLO", tier="Gloss"),
    Segment(start=1.5, end=3.0, label="WORLD", tier="Gloss"),
])
write_eaf(new_segments, "output.eaf", video_path="source.mp4")
CLI
sltk convert input.npy output.h5 --from mediapipe --to wilor --fps 25
sltk evaluate predictions.txt references.txt --task translation
sltk to-elan segments.json --video source.mp4 --output annotations.eaf
sltk from-elan annotations.eaf --output segments.json --tier Gloss
Processing Pipeline
SLTK provides a three-stage pipeline for processing sign language videos: pose extraction (video → H5), segmentation (H5 → sign boundaries), and spotting (segments → gloss labels). Each stage can be run independently.
Overview
Video (.mp4)
  │
  ├─► 1. Pose Extraction ──► {stem}_wilor.h5
  │       (WiLoR hand model: MANO params, 3D keypoints)
  │
  └─► 2. Segmentation ──► {stem}_segments.eaf / .json
          │   (Transformer BIO labelling: OUT/IN/BEGIN)
          │
          └─► 3. Spotting ──► {stem}_spotted.eaf
                  (SignRep: match segments to dictionary glosses)
Stage 1: Pose Extraction (Video → H5)
Extract hand poses from video using WiLoR. This produces an H5 file containing MANO rotation matrices and 3D keypoints per frame.
Python:
from sltk.extraction.wilor import WiLoRExtractor, WiLoRConfig
config = WiLoRConfig(
    checkpoint_path="path/to/wilor_final.ckpt",
    detector_path="path/to/detector.pt",
)
extractor = WiLoRExtractor(config)
extractor.load_model()
result = extractor.extract_from_video("video.mp4")
# Saves to video_wilor.h5
API:
# Start extraction job (runs in background)
curl -X POST http://localhost:8000/api/extraction/start \
-H "Content-Type: application/json" \
-d '{
"video_path": "/data/video.mp4",
"output_root": "/data/output",
"config": {"enable_wilor": true, "device": "cuda"}
}'
# Poll status
curl http://localhost:8000/api/extraction/status/{job_id}
H5 file structure (WiLoR):
video_wilor.h5
  attrs: fps, num_frames, resolution, extractor
  frame_idx: (num_frames, 2)                  # (start_idx, count) per frame
  kpts_3d: (num_detections, 21, 3)
  right: (num_detections,)                    # True = right hand
  mano/
    hand_pose: (num_detections, 15, 3, 3)     # rotation matrices
    global_orient: (num_detections, 1, 3, 3)
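The frame_idx layout can be read as a per-frame slice into the detection arrays. The following is a minimal sketch in plain Python. It assumes that each (start_idx, count) pair indexes rows of kpts_3d, right, and the mano/ arrays. The helper name detections_for_frame is hypothetical and not part of SLTK.

```python
def detections_for_frame(frame_idx, frame):
    """Return the detection row indices that belong to one video frame.

    frame_idx is a sequence of (start_idx, count) pairs, one per frame
    (assumed semantics of the frame_idx dataset described above).
    """
    start, count = frame_idx[frame]
    return list(range(start, start + count))


# Toy index: frame 0 has two hands, frame 1 has none, frame 2 has one.
frame_idx = [(0, 2), (2, 0), (2, 1)]
right = [True, False, True]  # one flag per detection row

for frame in range(len(frame_idx)):
    rows = detections_for_frame(frame_idx, frame)
    hands = ["right" if right[r] else "left" for r in rows]
    print(frame, rows, hands)
```

With real data, the same indices would select rows from arrays loaded out of the H5 file.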
Weight resolution — model checkpoints are found in this priority order:
- Explicit path argument
- Environment variable (SLTK_WILOR_CHECKPOINT, SLTK_WILOR_DETECTOR)
- Bundled at sltk/weights/wilor/
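The priority order above can be sketched as a small fallback chain. This is an illustration only, not SLTK's resolver: the function name resolve_checkpoint is hypothetical, the bundled file name wilor_final.ckpt is an assumption (only the directory is documented), and no existence check is performed.

```python
import os
from pathlib import Path


def resolve_checkpoint(explicit=None,
                       env_var="SLTK_WILOR_CHECKPOINT",
                       bundled="sltk/weights/wilor/wilor_final.ckpt"):
    """Pick a checkpoint path: explicit argument, then env var, then bundled.

    The bundled file name is an assumption made for this sketch.
    """
    if explicit:
        return Path(explicit)
    from_env = os.environ.get(env_var)
    if from_env:
        return Path(from_env)
    return Path(bundled)


# An explicit argument always wins over the environment and the bundle.
print(resolve_checkpoint("path/to/wilor_final.ckpt"))
```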
MediaPipe and NLF extractors are also available for body/face poses — see sltk/extraction/.
Stage 2: Segmentation (H5 → Segments)
The segmenter v2 is a Transformer that reads WiLoR H5 files and predicts per-frame BIO labels (0=OUT, 1=IN, 2=BEGIN), identifying where signs start and end.
If you already have H5 files, this is where you start.
Python:
from sltk.segmentation.runner import segment_h5
from sltk.segmentation.output import OutputFormat
# Segment a single H5 file → JSON output
segment_h5(
    "video_wilor.h5",
    output_path="video_segments.json",
    output_format=OutputFormat.JSON,
    fps=25.0,
)
# Segment a single H5 file → ELAN output
segment_h5(
    "video_wilor.h5",
    output_path="video_segments.eaf",
    output_format=OutputFormat.ELAN,
    fps=25.0,
    media_path="video.mp4",  # links video in the EAF file
)
# Segment an entire directory of H5 files
segment_h5(
    "/data/poses/",
    output_path="/data/segments/output.json",
    output_format=OutputFormat.JSON,
    fps=25.0,
)
Lower-level control:
from sltk.segmentation.runner import get_runner
from sltk.segmentation.h5_loader import h5_to_features
from sltk.segmentation.postprocess import extract_segments
# Load features from H5 (converts MANO rotations → 192-dim features)
features = h5_to_features("video_wilor.h5") # shape: (num_frames, 192)
# Get the inference runner (singleton, loads checkpoint once)
runner = get_runner()
# Predict BIO labels
labels = runner.predict(features) # shape: (num_frames,) with values 0/1/2
# Extract segment boundaries as (start_frame, end_frame) tuples
segments = extract_segments(labels)
# [(12, 45), (50, 82), (90, 120), ...]
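To illustrate how per-frame BIO labels turn into boundaries, here is a minimal decoder in plain Python. It is a sketch, not the code behind extract_segments. It assumes that BEGIN always opens a new segment, that an IN after OUT also opens one, and that end frames are inclusive. The name decode_bio is hypothetical.

```python
OUT, IN, BEGIN = 0, 1, 2


def decode_bio(labels):
    """Turn per-frame BIO labels into (start_frame, end_frame) tuples.

    End frames are inclusive (an assumption made for this sketch).
    """
    segments = []
    start = None
    for i, label in enumerate(labels):
        if label == BEGIN:
            # A BEGIN closes any open segment and opens a new one.
            if start is not None:
                segments.append((start, i - 1))
            start = i
        elif label == IN:
            # Tolerate an IN without a preceding BEGIN.
            if start is None:
                start = i
        else:
            # OUT closes the open segment, if any.
            if start is not None:
                segments.append((start, i - 1))
                start = None
    if start is not None:
        segments.append((start, len(labels) - 1))
    return segments


print(decode_bio([0, 2, 1, 1, 0, 0, 2, 1, 2, 1, 1]))
```

Two adjacent signs with no OUT frame between them are still separated, because the second BEGIN closes the first segment.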
API:
# Segment a single H5 file
curl -X POST http://localhost:8000/api/segmentation/segment \
-H "Content-Type: application/json" \
-d '{"h5_path": "/data/video_wilor.h5", "fps": 25.0}'
# Segment a directory (batch)
curl -X POST http://localhost:8000/api/segmentation/segment/batch \
-H "Content-Type: application/json" \
-d '{
"directory": "/data/poses/",
"fps": 25.0,
"output_path": "/data/segments/",
"output_format": "json"
}'
JSON output format:
{
  "video_name": {
    "fps": 25.0,
    "num_frames": 3000,
    "segments": [
      {"start_frame": 12, "end_frame": 45, "start_sec": 0.48, "end_sec": 1.80},
      {"start_frame": 50, "end_frame": 82, "start_sec": 2.00, "end_sec": 3.28}
    ]
  }
}
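The second-based fields in this output are consistent with dividing the frame indices by fps (for example, 12 / 25.0 = 0.48). A minimal sketch of that conversion follows. The helper name segment_record and the rounding to two decimals are assumptions, not confirmed SLTK behaviour.

```python
def segment_record(start_frame, end_frame, fps):
    """Build one segment entry with both frame and second timestamps."""
    return {
        "start_frame": start_frame,
        "end_frame": end_frame,
        "start_sec": round(start_frame / fps, 2),
        "end_sec": round(end_frame / fps, 2),
    }


# Reproduce the two example segments shown above at 25 fps.
records = [segment_record(s, e, 25.0) for s, e in [(12, 45), (50, 82)]]
for record in records:
    print(record)
```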
ELAN output: creates a tier {video_name}_segmentation with each segment labelled "SIGN", authored by segmenter_v2.
Checkpoint resolution: set SLTK_SEGMENTOR_CHECKPOINT or place segmentor_v2.ckpt in sltk/weights/segmentor/.
Stage 3: Spotting (Segments → Gloss Labels)
Spotting uses SignRep to extract 768-dim visual features from video frames, then matches each detected segment against a dictionary of known sign features to produce ranked gloss predictions.
Prerequisites:
- A segmented video (from Stage 2) with known segment boundaries
- A dictionary of sign features: .npz files with key best_latent, one per sign, typically stored at /vol/research/SignFeaturePool/features2/{dataset}/{method}/
Python — full pipeline:
from sltk.embedding.pipeline import SignRepPipeline
pipeline = SignRepPipeline()
# Step 1: Extract dense features from the full video (sliding 16-frame windows)
continuous = pipeline.extract_continuous("video.mp4", stride=4)
# continuous.features: (L, 768) L2-normalized
# Step 2: Load dictionary features
dictionary = pipeline.load_dictionary(
    ["/data/dictionaries/bsldict/signrep/"],
    feature_key="best_latent",
)
# Step 3: Define segments (from Stage 2 output, or load from JSON/EAF)
segments = [
    {"segment_id": 0, "start_frame": 12, "end_frame": 45},
    {"segment_id": 1, "start_frame": 50, "end_frame": 82},
]
# Step 4: Spot — match each segment against dictionary
result = pipeline.spot(
    features=continuous,
    segments=segments,
    dictionary=dictionary,
    top_k=10,
    segment_pooling="max",  # "max", "mean", or "softmax_weighted"
)
# Each spotted segment has ranked gloss matches
for seg in result.segments:
    print(f"Segment {seg.start_ms}ms–{seg.end_ms}ms:")
    for gl in seg.top_glosses:
        print(f"  Rank {gl['rank']}: {gl['gloss']} ({gl['similarity']:.3f})")
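The matching step can be pictured as cosine-similarity ranking. The sketch below is plain Python and is not SignRepPipeline's implementation. It assumes features are L2-normalized (so a dot product equals cosine similarity), that "max" pooling keeps the best-matching window inside the segment, and that "mean" averages over windows. The name rank_glosses and the toy 2-dim vectors are hypothetical.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def rank_glosses(window_feats, dictionary, top_k=3, pooling="max"):
    """Rank dictionary glosses for one segment.

    window_feats: list of L2-normalized feature vectors inside the segment.
    dictionary:   mapping gloss -> L2-normalized reference vector.
    """
    scored = []
    for gloss, ref in dictionary.items():
        sims = [dot(w, ref) for w in window_feats]
        score = max(sims) if pooling == "max" else sum(sims) / len(sims)
        scored.append((gloss, score))
    scored.sort(key=lambda item: item[1], reverse=True)
    return [
        {"rank": rank + 1, "gloss": gloss, "similarity": score}
        for rank, (gloss, score) in enumerate(scored[:top_k])
    ]


# Two windows in the segment, two dictionary entries (toy unit vectors).
windows = [[1.0, 0.0], [0.6, 0.8]]
toy_dictionary = {"HELLO": [1.0, 0.0], "WORLD": [0.0, 1.0]}
for match in rank_glosses(windows, toy_dictionary):
    print(match)
```

The returned dictionaries mirror the rank, gloss, and similarity keys printed in the loop above.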
Python — one-shot from video:
from sltk.embedding.pipeline import SignRepPipeline
pipeline = SignRepPipeline()
result = pipeline.spot_from_video(
    video_path="video.mp4",
    segments_json="video_segments.json",  # from Stage 2
    dictionary_dirs=["/data/dictionaries/bsldict/signrep/"],
    top_k=20,
    stride=4,
)
Save results as ELAN:
from sltk.segmentation.output import save_spotted_elan
save_spotted_elan(
    result,
    output_path="video_spotted.eaf",
    fps=25.0,
    media_path="video.mp4",
)
This creates an EAF file with tiers Rank-1 through Rank-N (gloss labels) and Score-1 through Score-N (similarity scores), authored by signrep_spotter.
API:
# Extract continuous features (cached server-side for 30 min)
curl -X POST http://localhost:8000/api/signrep/continuous/extract \
-H "Content-Type: application/json" \
-d '{"video_path": "/data/video.mp4", "stride": 4}'
# Returns: {"features_id": "abc123", ...}
# Spot glosses using cached features
curl -X POST http://localhost:8000/api/signrep/spot \
-H "Content-Type: application/json" \
-d '{
"features_id": "abc123",
"segments": [{"segment_id": 0, "start_frame": 12, "end_frame": 45}],
"dictionary_dirs": ["/data/dictionaries/bsldict/signrep/"],
"top_k": 10
}'
Checkpoint: set SLTK_SIGNREP_CHECKPOINT or place ckpt.pt in sltk/weights/signrep/.
End-to-End: Processing API
The processing API combines segmentation and spotting into a single background job. It expects WiLoR H5 files to already exist alongside the videos.
API:
# Segmentation only
curl -X POST http://localhost:8000/api/processing/submit \
-H "Content-Type: application/json" \
-d '{
"video_paths": ["/data/video1.mp4", "/data/video2.mp4"],
"type": "segments",
"fps": 25.0
}'
# Segmentation + spotting
curl -X POST http://localhost:8000/api/processing/submit \
-H "Content-Type: application/json" \
-d '{
"video_paths": ["/data/video1.mp4", "/data/video2.mp4"],
"type": "spots",
"dictionary_dirs": ["/data/dictionaries/bsldict/signrep/"],
"top_k": 5,
"fps": 25.0,
"workspace": "my_workspace"
}'
# Poll job status
curl http://localhost:8000/api/processing/status/{job_id}
# Download output EAF
curl -O http://localhost:8000/api/processing/output/{job_id}/video1_spotted.eaf
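Polling can also be scripted from Python. The sketch below uses only the standard library and keeps the HTTP call separate from the wait loop. The response schema is not documented here, so the "status" key and the terminal state names "completed" and "failed" are assumptions, and both helper names are hypothetical.

```python
import json
import time
from urllib.request import urlopen


def fetch_status(job_id, base="http://localhost:8000"):
    """GET the processing status endpoint and decode the JSON body."""
    with urlopen(f"{base}/api/processing/status/{job_id}") as resp:
        return json.load(resp)


def wait_for_job(fetch, job_id, interval=2.0, timeout=600.0,
                 done_states=("completed", "failed")):
    """Call fetch(job_id) repeatedly until a terminal state is reported.

    The "status" key and the state names are assumed, not documented.
    """
    deadline = time.monotonic() + timeout
    while True:
        payload = fetch(job_id)
        if payload.get("status") in done_states:
            return payload
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still running after {timeout}s")
        time.sleep(interval)
```

Calling wait_for_job(fetch_status, job_id) would block until the job finishes. Passing the fetch function in makes the loop easy to exercise without a running server.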
H5 file lookup: the processing API searches for {stem}_wilor.h5 next to the video, in a poses/ subdirectory, or in a {stem}/ subdirectory. If no H5 is found, the video is skipped.
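The lookup rule can be expressed as an ordered list of candidate paths. This is a minimal sketch with pathlib, assuming the three locations are tried in the order they are listed above. The function names are hypothetical.

```python
from pathlib import Path


def candidate_h5_paths(video_path):
    """List the locations where {stem}_wilor.h5 is looked for (assumed order)."""
    video = Path(video_path)
    name = f"{video.stem}_wilor.h5"
    return [
        video.parent / name,               # next to the video
        video.parent / "poses" / name,     # poses/ subdirectory
        video.parent / video.stem / name,  # {stem}/ subdirectory
    ]


def find_h5(video_path):
    """Return the first existing candidate, or None so the video is skipped."""
    for candidate in candidate_h5_paths(video_path):
        if candidate.is_file():
            return candidate
    return None


for path in candidate_h5_paths("/data/video1.mp4"):
    print(path)
```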
Output files:
| Type | Output file | Description |
|---|---|---|
| segments | {stem}_segments.eaf | Sign boundaries (SIGN labels) |
| spots | {stem}_segments.eaf + {stem}_spotted.eaf | Boundaries + ranked gloss labels |
When a workspace is specified, output EAF files are auto-ingested into the corpus database.
Building a Dictionary
Before spotting, you need a dictionary of sign features. Extract one feature per isolated sign video:
from sltk.embedding.pipeline import SignRepPipeline
pipeline = SignRepPipeline()
# Single sign video → 768-dim feature
result = pipeline.extract_dictionary("isolated_sign.mp4", method="middle")
result.save_npz("dictionary/HELLO.npz")
Batch extraction via the API:
curl -X POST http://localhost:8000/api/signrep/dictionary/batch/job \
-H "Content-Type: application/json" \
-d '{
"video_dir": "/data/isolated_signs/",
"output_dir": "/data/dictionary/",
"method": "middle"
}'
Each .npz file is named after the gloss (e.g., HELLO.npz) and contains a best_latent key with the 768-dim feature vector.
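Because the file stem carries the gloss, a dictionary directory can be indexed without opening any file. The sketch below uses only the standard library. The helper name index_dictionary is hypothetical, and the demo files are empty placeholders rather than real feature files.

```python
import tempfile
from pathlib import Path


def index_dictionary(directory):
    """Map gloss name -> .npz path, using the file stem as the gloss."""
    return {p.stem: p for p in sorted(Path(directory).glob("*.npz"))}


# Demo on a temporary directory with placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("HELLO.npz", "WORLD.npz", "notes.txt"):
        (Path(tmp) / name).touch()
    glosses = sorted(index_dictionary(tmp))
    print(glosses)
```

Files without the .npz extension are ignored, so notes or logs can sit in the same directory.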
Architecture
sltk/
├── api/ # FastAPI REST API (16 routers, 88+ endpoints)
│ ├── main.py # App init, middleware, router registration
│ ├── models.py # Pydantic request/response schemas
│ ├── routers/ # Route handlers (see API Reference below)
│ ├── dependencies.py # Path validation, security
│ └── security.py # Security headers, CORS
├── io/ # File I/O
│ ├── elan.py # ELAN .eaf read/write
│ ├── elan_roundtrip.py # XML-preserving ELAN editing (round-trip safe)
│ ├── h5.py # HDF5 pose data I/O
│ └── safe_write.py # Atomic file operations
├── extraction/ # Pose extraction (MediaPipe, WiLoR, NLF)
├── segmentation/ # Transformer-based sign segmentation
├── visualization/ # Skeleton overlay video generation
├── processing/ # Feature computation, normalization
├── analysis/ # Clustering, embeddings, statistics
├── data/ # Core types (PoseSequence, Segment, Sample)
│ └── datasets/ # Dataset loaders (BOBSL, How2Sign, BSLCP, etc.)
└── config.py # Configuration and environment
frontend/ # React/Vite annotation workstation
scripts/
└── run_dev.sh # Dev server launcher
API Reference
Base URL: http://localhost:8000
Workspace Management — /api/workspace
Multi-workspace system for organizing videos and annotation files. Persists to ~/.sltk/workspaces.json.
| Method | Endpoint | Description |
|---|---|---|
| GET | /list | List all workspaces |
| POST | /create | Create a new workspace |
| POST | /switch | Switch active workspace |
| POST | /scan | Scan directory for videos + ELAN files |
| GET | /status | Current workspace status |
| PUT | /rename | Rename workspace |
| DELETE | /clear | Clear active workspace |
| DELETE | /{name} | Delete workspace by name |
| PATCH | /match | Override video-ELAN matching |
| POST | /rescan | Rescan for new files |
Videos — /api/videos /api/video
| Method | Endpoint | Description |
|---|---|---|
| POST | /videos/discover | Discover videos in directory (recursive) |
| GET | /videos/info | Video metadata (fps, resolution, duration) |
| GET | /video/stream | Stream video with HTTP Range support |
Audio — /api/audio
| Method | Endpoint | Description |
|---|---|---|
| GET | /waveform | Extract waveform peaks. Params: path, samples (default 8000) |
Extraction — /api/extraction
Pose extraction jobs (MediaPipe, WiLoR, NLF/SMPL-X).
| Method | Endpoint | Description |
|---|---|---|
| GET | /status/{job_id} | Poll extraction progress |
| POST | /cancel/{job_id} | Cancel extraction |
| GET | /jobs | List all extraction jobs |
| GET | /logs/{job_id} | Job logs (last 1000 entries) |
Poses — /api/poses
| Method | Endpoint | Description |
|---|---|---|
| GET | /load | Load pose data from H5 file |
| GET | /metadata | Pose metadata (frames, keypoints, format) |
| GET | /frame | Single frame pose data |
| GET | /statistics | Pose statistics |
Visualization — /api/visualization
Skeleton overlay video generation from H5 pose data.
| Method | Endpoint | Description |
|---|---|---|
| POST | /generate | Generate overlay. Body: {video_path, h5_path, viz_type} |
| GET | /status/{job_id} | Poll generation progress |
| GET | /check | Check if cached overlay exists |
viz_type: "mediapipe", "wilor", or "nlf"
Datasets — /api/datasets
| Method | Endpoint | Description |
|---|---|---|
| POST | /connect | Register dataset connection |
| GET | /connections | List connected datasets |
| GET | /list | List available datasets |
| GET | /{name}/videos | Videos in dataset |
| DELETE | /connections/{name} | Remove connection |
Features — /api/features
| Method | Endpoint | Description |
|---|---|---|
| POST | /detect | Detect features in video |
| GET | /scan | Scan for feature files |
| GET | /datasets/{name}/features/summary | Feature summary for dataset |
Analysis — /api/analysis
Research-oriented endpoints for vocabulary, statistics, and linguistic analysis.
| Method | Endpoint | Description |
|---|---|---|
| GET | /vocabulary | Extract vocabulary from dataset |
| POST | /batch/statistics | Batch statistical analysis |
| POST | /research/vocabulary-mapping | Map glosses across datasets |
| POST | /research/compare-datasets | Compare two datasets |
| POST | /research/find-gloss-examples | Find gloss examples |
| POST | /linguistic/concordance | Gloss concordance |
| POST | /linguistic/cooccurrence | Co-occurrence analysis |
| POST | /linguistic/ngrams | N-gram frequency |
| POST | /linguistic/duration-analysis | Duration statistics |
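As an illustration of what an n-gram frequency analysis over glosses computes, here is a minimal standard-library sketch. It is not the endpoint's implementation. The function name ngram_counts and the toy gloss sequence are hypothetical.

```python
from collections import Counter


def ngram_counts(glosses, n=2):
    """Count contiguous n-grams in a sequence of gloss labels."""
    return Counter(
        tuple(glosses[i:i + n]) for i in range(len(glosses) - n + 1)
    )


sequence = ["HELLO", "WORLD", "HELLO", "WORLD"]
for ngram, count in ngram_counts(sequence, n=2).most_common():
    print(ngram, count)
```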
Embeddings — /api/embeddings
| Method | Endpoint | Description |
|---|---|---|
| GET | /status/{dataset} | Embedding generation status |
| DELETE | /cache/{dataset} | Clear embeddings |
| GET | /signrep/status | SignRep model status |
Linguistics — /api/linguistics
Inter-rater reliability and phonological analysis.
| Method | Endpoint | Description |
|---|---|---|
| POST | /reliability/kappa | Cohen's kappa |
| POST | /reliability/krippendorff | Krippendorff's alpha |
| POST | /reliability/boundary-agreement | Boundary agreement |
| POST | /reliability/confusion-matrix | Confusion matrix |
| POST | /phonological-form | Extract phonological form |
| POST | /phonological-distance | Phonological distance |
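For reference, Cohen's kappa compares observed agreement p_o with chance agreement p_e as (p_o - p_e) / (1 - p_e). The sketch below implements that textbook formula for two annotators labelling the same items. How the endpoint aligns annotations or handles edge cases is not documented here, and the function name cohen_kappa is hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each rater's label distribution.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label; kappa is undefined,
        # and this sketch returns 1.0 by convention.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


rater_a = ["A", "A", "B", "B"]
rater_b = ["A", "B", "B", "B"]
print(cohen_kappa(rater_a, rater_b))
```

In this toy case p_o is 0.75 and p_e is 0.5, so kappa is 0.5.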
Jobs — /api/jobs
| Method | Endpoint | Description |
|---|---|---|
| GET | /status | Job system status |
| GET | /gpu | GPU status and memory |
| GET | /list | List active jobs |
| POST | /{job_id}/cancel | Cancel job |
Settings — /api/settings
| Method | Endpoint | Description |
|---|---|---|
| GET | / | Get app settings |
| POST | / | Update settings |
| GET | /info | System info |
| GET | /system/weights | Model weights info |
Middleware & Security
The API applies the following middleware (in order):
- GZip: compresses responses larger than 500 bytes
- Security headers: X-Frame-Options: DENY, X-Content-Type-Options: nosniff, CSP, Permissions-Policy
- CORS: configurable via the SLTK_CORS_ORIGINS env var (default: localhost:5173, localhost:3000)
- Path validation: whitelist check against SLTK_ALLOWED_PATHS, directory traversal prevention
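A whitelist check with traversal prevention is commonly done by resolving the requested path first and then testing it against the allowed roots. The sketch below shows that general technique with pathlib (Python 3.9+ for is_relative_to). It is not the code in sltk/api/dependencies.py, and both function names are hypothetical.

```python
from pathlib import Path


def allowed_roots(env_value):
    """Parse a comma-separated SLTK_ALLOWED_PATHS-style value into roots."""
    return [Path(p.strip()).resolve() for p in env_value.split(",") if p.strip()]


def is_path_allowed(path, roots):
    """Resolve '..' segments and symlinks, then require an allowed root."""
    resolved = Path(path).resolve()
    return any(resolved.is_relative_to(root) for root in roots)


roots = allowed_roots("/vol/research,/home")
print(is_path_allowed("/vol/research/data/video.mp4", roots))
print(is_path_allowed("/vol/research/../etc/passwd", roots))
```

Resolving before comparing is what defeats "../" tricks: the second path normalizes to a location outside every allowed root.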
Configuration
Environment variables (set in .env or shell):
| Variable | Description | Default |
|---|---|---|
| SLTK_CORS_ORIGINS | Allowed CORS origins (comma-separated) | http://localhost:5173,http://localhost:3000 |
| SLTK_ALLOWED_PATHS | Allowed filesystem paths for API access | /vol/research,/home |
| SLTK_RESEARCH_DATA_ROOT | Root for research data | /vol/research |
| SLTK_DATASETS_ROOT | Root for raw datasets | /vol/research/datasets |
| SLTK_FEATURE_ROOT | Root for extracted features | /vol/research/SignFeaturePool/features2 |
| SLTK_NLF_MODEL_PATH | Path to NLF model weights | (none) |
| SLTK_WILOR_MODEL_PATH | Path to WiLoR model weights | (none) |
| SLTK_SIGNREP_CHECKPOINT | Path to SignRep checkpoint | (none) |
| SLTK_SEGMENTOR_PATH | Path to segmentor checkpoint | (none) |
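To show how these variables and their defaults might be consumed, here is a small standard-library sketch. It does not reproduce sltk/config.py: the Settings class, the load_settings function, and the choice of three variables are hypothetical, while the default values are copied from the table above.

```python
import os
from dataclasses import dataclass


def _split_csv(value):
    """Split a comma-separated value and drop empty entries."""
    return [item.strip() for item in value.split(",") if item.strip()]


@dataclass
class Settings:
    cors_origins: list
    allowed_paths: list
    feature_root: str


def load_settings(env=None):
    """Read settings from a mapping (os.environ by default) with defaults."""
    env = os.environ if env is None else env
    return Settings(
        cors_origins=_split_csv(env.get(
            "SLTK_CORS_ORIGINS",
            "http://localhost:5173,http://localhost:3000")),
        allowed_paths=_split_csv(env.get(
            "SLTK_ALLOWED_PATHS", "/vol/research,/home")),
        feature_root=env.get(
            "SLTK_FEATURE_ROOT", "/vol/research/SignFeaturePool/features2"),
    )


print(load_settings({"SLTK_ALLOWED_PATHS": "/data, /mnt/corpus"}))
```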
Supported Pose Formats
| Format | Joints | Description |
|---|---|---|
| MediaPipe | 33 body + 21x2 hands + 468 face | Holistic pose estimation |
| WiLoR | 21 per hand | MANO hand model with rotation matrices |
| NLF/SMPL-X | 55 joints | Full body with axis-angle rotations |
All stored as HDF5 (.h5) files.
Testing
# Run full suite (1429 tests)
pytest
# With coverage
pytest --cov=sltk --cov-report=html
# Specific markers
pytest -m api # API tests only
pytest -m "not slow" # Skip slow tests
pytest -m gpu # GPU tests only
CI
GitHub Actions runs on every push/PR to main:
- Lint: black + ruff
- Type check: mypy
- Tests: pytest across Python 3.10, 3.11, 3.12 (coverage threshold: 40%)
- Frontend: npm build
License
MIT