Sign Language Toolkit for sign language research

Sign Language Toolkit (SLTK)

A research toolkit for sign language video analysis. SLTK provides an end-to-end pipeline from raw video to linguistic annotations (pose extraction, automatic segmentation, gloss spotting, and corpus analysis), accessible from Python, the command line, or a web interface.

Installation

# Core library (data loading, ELAN I/O, CLI)
pip install signlangtk

# With web interface
pip install "signlangtk[api]"

# With GPU-accelerated pose extraction (WiLoR hand model)
pip install "signlangtk[wilor]"

# Everything
pip install "signlangtk[all]"

The PyPI package is called signlangtk, but the Python import is sltk:

import sltk
from sltk.data import PoseSequence, Segment

Requires Python 3.10+. For development from source:

git clone https://github.com/ed-fish/Sign-Language-Toolkit.git
cd Sign-Language-Toolkit
pip install -e ".[dev,api]"

The Processing Pipeline

SLTK's core workflow is a three-stage pipeline that turns raw sign language video into searchable, annotated ELAN files:

Video (.mp4)
    │
    ├── Stage 1: Pose Extraction ──► {stem}_wilor.h5
    │       WiLoR hand model: MANO rotation matrices + 3D keypoints
    │
    ├── Stage 2: Segmentation ──► {stem}_segments.eaf
    │       Transformer model predicts sign boundaries (BIO labels)
    │
    └── Stage 3: Spotting ──► {stem}_spotted.eaf
            SignRep model matches segments to a dictionary of known signs

Each stage can be run independently. If you already have H5 pose files, start at Stage 2. If you already have segment boundaries, start at Stage 3.

Stage 1: Pose Extraction

Extract hand poses from video using the WiLoR hand model. This produces an HDF5 file containing MANO rotation matrices and 21 3D keypoints per detected hand, per frame.

Python

from sltk.extraction.wilor import WiLoRExtractor

# Weights are auto-downloaded from HuggingFace Hub on first use
extractor = WiLoRExtractor()
extractor.load_model()  # downloads to ~/.cache/sltk/weights/ if needed
result = extractor.extract_from_video("video.mp4")
# Saves to video_wilor.h5
extractor.close()

API

# Start extraction job (runs in background on GPU)
curl -X POST http://localhost:8000/api/extraction/start \
  -H "Content-Type: application/json" \
  -d '{
    "video_path": "/data/video.mp4",
    "output_root": "/data/output",
    "config": {"enable_wilor": true, "device": "cuda"}
  }'
# Returns: {"job_id": "abc123", ...}

# Poll progress
curl http://localhost:8000/api/extraction/status/abc123

Output format

The WiLoR H5 file has this structure:

video_wilor.h5
├── attrs: fps, num_frames, resolution, extractor
├── frame_idx      (num_frames, 2)           # (start_idx, count) per frame
├── kpts_3d        (num_detections, 21, 3)   # 3D hand keypoints
├── right          (num_detections,)          # True = right hand
└── mano/
    ├── hand_pose      (num_detections, 15, 3, 3)   # joint rotations
    └── global_orient  (num_detections, 1, 3, 3)     # wrist rotation
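Detections are stored flat, so reading one frame means slicing the per-detection arrays with that frame's (start_idx, count) row. A minimal sketch, assuming `frame_idx[t]` is exactly (start, count) as in the layout above and that `right` holds one boolean per detection; `hands_at_frame` is a hypothetical helper, not part of SLTK.

```python
import numpy as np


def hands_at_frame(frame_idx, kpts_3d, right, t):
    """Return {"left": ..., "right": ...} keypoints for frame t.

    Hypothetical helper: assumes frame_idx[t] = (start_idx, count) into the
    flat per-detection arrays. A hand with no detection maps to None; if a
    frame holds two detections of the same side, the later one is kept.
    """
    start, count = (int(v) for v in frame_idx[t])
    hands = {"left": None, "right": None}
    for d in range(start, start + count):
        side = "right" if bool(right[d]) else "left"
        hands[side] = kpts_3d[d]  # (21, 3) keypoints for this detection
    return hands


# Synthetic example: 3 frames, 4 detections in total (frame 1 has none)
frame_idx = np.array([[0, 2], [2, 0], [2, 2]])
kpts_3d = np.arange(4 * 21 * 3, dtype=np.float32).reshape(4, 21, 3)
right = np.array([False, True, True, False])

frame0 = hands_at_frame(frame_idx, kpts_3d, right, 0)
frame1 = hands_at_frame(frame_idx, kpts_3d, right, 1)
```

With h5py, the three arrays can be read from an open `h5py.File` as `f["frame_idx"][:]`, `f["kpts_3d"][:]` and `f["right"][:]`.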

Model weights

All model weights are automatically downloaded from HuggingFace Hub on first use and cached at ~/.cache/sltk/weights/. No manual download needed.

Resolution order (first match wins):

1. Explicit path in config (e.g. WiLoRConfig(checkpoint_path="..."))
2. Environment variable (e.g. SLTK_WILOR_CHECKPOINT)
3. Bundled at sltk/weights/
4. HF Hub cache at ~/.cache/sltk/weights/
5. Auto-download from HuggingFace Hub

To override the cache location, set SLTK_WEIGHTS_DIR.
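The first-match-wins order can be pictured as a simple chain of existence checks. A hypothetical sketch of that logic, not SLTK's resolver: the function name, its parameters, and the checkpoint file name are assumptions, and unlike a real resolver it silently skips an explicit path that does not exist instead of raising an error.

```python
import os
from pathlib import Path


def resolve_checkpoint(filename, explicit_path=None, env_var=None, bundled_dir=None):
    """Return the first existing checkpoint path, or None to signal a download.

    Hypothetical sketch of the documented order; SLTK's own logic may differ.
    """
    cache_dir = Path(os.environ.get("SLTK_WEIGHTS_DIR",
                                    Path.home() / ".cache" / "sltk" / "weights"))
    candidates = [
        explicit_path,                                           # 1. explicit config path
        os.environ.get(env_var) if env_var else None,            # 2. environment variable
        Path(bundled_dir) / filename if bundled_dir else None,   # 3. bundled copy
        cache_dir / filename,                                    # 4. HF Hub cache
    ]
    for candidate in candidates:
        if candidate and Path(candidate).is_file():
            return Path(candidate)
    return None  # 5. nothing found locally: caller downloads from the Hub
```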

Other extractors (MediaPipe, NLF/SMPL-X, TEASER, RTMPose) are also available; see sltk/extraction/.

Stage 2: Segmentation

The segmenter is a 4-layer Transformer that reads WiLoR H5 files and predicts per-frame BIO labels (0=OUT, 1=IN_SIGN, 2=BEGIN), identifying where individual signs start and end.

Python — high level

from sltk.segmentation.runner import segment_h5
from sltk.segmentation.output import OutputFormat

# Segment a single file → ELAN output
segment_h5(
    "video_wilor.h5",
    output_path="video_segments.eaf",
    output_format=OutputFormat.ELAN,
    fps=25.0,
    media_path="video.mp4",  # links the video in the EAF
)

# Segment a single file → JSON output
segment_h5(
    "video_wilor.h5",
    output_path="video_segments.json",
    output_format=OutputFormat.JSON,
    fps=25.0,
)

# Segment an entire directory
segment_h5(
    "/data/poses/",
    output_path="/data/segments/output.json",
    output_format=OutputFormat.JSON,
    fps=25.0,
)

Python — low level

from sltk.segmentation.runner import get_runner
from sltk.segmentation.h5_loader import h5_to_features
from sltk.segmentation.postprocess import extract_segments

# Load H5 → 192-dim feature vectors (MANO rotations as axis-angle)
features = h5_to_features("video_wilor.h5")  # shape: (num_frames, 192)

# Run the Transformer model
runner = get_runner()  # singleton, loads checkpoint once
labels = runner.predict(features)  # shape: (num_frames,) values 0/1/2

# Extract segment boundaries
segments = extract_segments(labels)
# [(12, 45), (50, 82), (90, 120), ...]
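The post-processing step turns per-frame labels into (start, end) frame pairs. A minimal sketch of standard BIO decoding, assuming inclusive end frames and no minimum-length filtering; `bio_to_segments` is a hypothetical stand-in, and SLTK's `extract_segments` may apply extra rules such as dropping or merging very short segments.

```python
OUT, IN_SIGN, BEGIN = 0, 1, 2  # label values used by the segmenter


def bio_to_segments(labels):
    """Decode per-frame BIO labels into inclusive (start, end) frame pairs."""
    segments = []
    start = None
    for t, label in enumerate(labels):
        if label == BEGIN or (label == IN_SIGN and start is None):
            # A BEGIN always opens a new sign; a stray IN after OUT also opens one.
            if start is not None:
                segments.append((start, t - 1))
            start = t
        elif label == OUT and start is not None:
            segments.append((start, t - 1))
            start = None
    if start is not None:  # sign still open at the last frame
        segments.append((start, len(labels) - 1))
    return segments


print(bio_to_segments([0, 2, 1, 1, 0, 0, 2, 1, 2, 1, 1]))
# [(1, 3), (6, 7), (8, 10)]
```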

API

# Segment a single file
curl -X POST http://localhost:8000/api/segmentation/segment \
  -H "Content-Type: application/json" \
  -d '{"h5_path": "/data/video_wilor.h5", "fps": 25.0}'

# Batch segment a directory
curl -X POST http://localhost:8000/api/segmentation/segment/batch \
  -H "Content-Type: application/json" \
  -d '{
    "directory": "/data/poses/",
    "fps": 25.0,
    "output_path": "/data/segments/",
    "output_format": "json"
  }'
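The same request can be sent from Python with only the standard library. A minimal sketch that builds the single-file segmentation request shown above, with the payload copied from the curl example; the helper names and the `BASE_URL` default are assumptions, and since the response schema is not documented here, `send` simply decodes whatever JSON comes back.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed local server, as in the curl examples


def build_segment_request(h5_path, fps=25.0, base_url=BASE_URL):
    """Build a POST request for /api/segmentation/segment (hypothetical helper)."""
    payload = json.dumps({"h5_path": h5_path, "fps": fps}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/segmentation/segment",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60.0):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


req = build_segment_request("/data/video_wilor.h5")
# result = send(req)  # requires the SLTK API server to be running
```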

JSON output

{
  "video_name": {
    "fps": 25.0,
    "num_frames": 3000,
    "segments": [
      {"start_frame": 12, "end_frame": 45, "start_sec": 0.48, "end_sec": 1.80},
      {"start_frame": 50, "end_frame": 82, "start_sec": 2.00, "end_sec": 3.28}
    ]
  }
}
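In this example the second values are the frame indices divided by fps (12 / 25 = 0.48 s, 45 / 25 = 1.80 s). A sketch that rebuilds the structure from (start, end) frame pairs; `segments_to_json` is a hypothetical helper, and rounding to two decimals is an assumption made to match the example.

```python
import json


def segments_to_json(video_name, segments, fps, num_frames):
    """Build the documented JSON structure from (start_frame, end_frame) pairs."""
    return {
        video_name: {
            "fps": fps,
            "num_frames": num_frames,
            "segments": [
                {
                    "start_frame": start,
                    "end_frame": end,
                    "start_sec": round(start / fps, 2),  # seconds = frame / fps
                    "end_sec": round(end / fps, 2),
                }
                for start, end in segments
            ],
        }
    }


doc = segments_to_json("video_name", [(12, 45), (50, 82)], fps=25.0, num_frames=3000)
print(json.dumps(doc, indent=2))
```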

ELAN output

Creates a tier named {video_name}_segmentation with each segment labelled SIGN, authored by segmenter_v2.

Model checkpoint

Auto-downloaded from HuggingFace Hub on first use. Override with SLTK_SEGMENTOR_CHECKPOINT env var if needed.

Stage 3: Gloss Spotting

The spotter uses SignRep (a ViT-based model) to extract 768-dim visual features from 16-frame sliding windows, then matches each detected segment against a dictionary of known sign features using cosine similarity.
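The matching step can be pictured as pooling the window features that overlap a segment, normalising, and ranking dictionary entries by dot product, which equals cosine similarity for unit vectors. A minimal NumPy sketch, assuming window i covers frames [i × stride, i × stride + 16) and supporting only max or mean pooling; `spot_segment` is hypothetical, and SLTK's own window alignment and pooling (including `softmax_weighted`) may differ.

```python
import numpy as np

WINDOW = 16  # frames per SignRep window, as documented


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def spot_segment(window_feats, start_frame, end_frame, dict_feats, glosses,
                 stride=4, top_k=5, pooling="max"):
    """Rank dictionary glosses for one segment (hypothetical sketch).

    window_feats: (num_windows, D) features; dict_feats: (num_signs, D).
    """
    starts = np.arange(len(window_feats)) * stride
    # Windows overlapping the inclusive frame range [start_frame, end_frame]
    overlap = (starts <= end_frame) & (starts + WINDOW - 1 >= start_frame)
    if not overlap.any():
        return []
    selected = window_feats[overlap]
    pooled = selected.max(axis=0) if pooling == "max" else selected.mean(axis=0)
    sims = l2_normalize(dict_feats) @ l2_normalize(pooled)  # cosine similarities
    order = np.argsort(-sims)[:top_k]
    return [(glosses[i], float(sims[i])) for i in order]
```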

Prerequisites

Segment boundaries from Stage 2 (or your own)
A dictionary: a folder of .npz files (one per sign), each containing a best_latent key with a 768-dim feature vector
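A dictionary folder is one .npz file per sign, each holding a `best_latent` array. A minimal sketch of loading such a folder with NumPy, assuming the file stem is the gloss name (as in the `HELLO.npz` example in "Building a dictionary") and each vector has 768 values; `load_npz_dictionary` is hypothetical and is not `SignRepPipeline.load_dictionary`.

```python
from pathlib import Path

import numpy as np


def load_npz_dictionary(folder, feature_key="best_latent"):
    """Load {gloss}.npz files into (glosses, (num_signs, dim) matrix)."""
    glosses, vectors = [], []
    for path in sorted(Path(folder).glob("*.npz")):
        with np.load(path) as data:  # NpzFile supports the context-manager protocol
            vectors.append(np.asarray(data[feature_key], dtype=np.float32).reshape(-1))
        glosses.append(path.stem)  # e.g. "HELLO" from HELLO.npz
    if not vectors:
        return [], np.empty((0, 0), dtype=np.float32)
    return glosses, np.stack(vectors)
```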

Python — full pipeline

from sltk.embedding.pipeline import SignRepPipeline

pipeline = SignRepPipeline()

# 1. Extract dense features from the full video
continuous = pipeline.extract_continuous("video.mp4", stride=4)
# continuous.features shape: (num_windows, 768), L2-normalized

# 2. Load dictionary
dictionary = pipeline.load_dictionary(
    ["/data/dictionaries/bsldict/signrep/"],
    feature_key="best_latent",
)

# 3. Define segments (from Stage 2, or load from JSON/EAF)
segments = [
    {"segment_id": 0, "start_frame": 12, "end_frame": 45},
    {"segment_id": 1, "start_frame": 50, "end_frame": 82},
]

# 4. Match each segment against the dictionary
result = pipeline.spot(
    features=continuous,
    segments=segments,
    dictionary=dictionary,
    top_k=10,
    segment_pooling="max",  # or "mean", "softmax_weighted"
)

# 5. Inspect results
for seg in result.segments:
    print(f"Segment {seg.start_ms}ms–{seg.end_ms}ms:")
    for gl in seg.top_glosses:
        print(f"  Rank {gl['rank']}: {gl['gloss']} ({gl['similarity']:.3f})")

# 6. Save as ELAN (creates Rank-1..N and Score-1..N tiers)
result.save_eaf("video_spotted.eaf", media_path="video.mp4")

Python — one-shot

result = pipeline.spot_from_video(
    video_path="video.mp4",
    segments_json="video_segments.json",
    dictionary_dirs=["/data/dictionaries/bsldict/signrep/"],
    top_k=20,
    stride=4,
)

API

# Extract continuous features (cached server-side)
curl -X POST http://localhost:8000/api/signrep/continuous/extract \
  -H "Content-Type: application/json" \
  -d '{"video_path": "/data/video.mp4", "stride": 4}'
# Returns: {"features_id": "abc123", "num_windows": 500, ...}

# Spot glosses
curl -X POST http://localhost:8000/api/signrep/spot \
  -H "Content-Type: application/json" \
  -d '{
    "features_id": "abc123",
    "segments": [{"segment_id": 0, "start_frame": 12, "end_frame": 45}],
    "dictionary_dirs": ["/data/dictionaries/bsldict/signrep/"],
    "top_k": 10
  }'

Building a dictionary

Before spotting, you need a dictionary. Extract one feature per isolated sign video:

from sltk.embedding.pipeline import SignRepPipeline

pipeline = SignRepPipeline()
result = pipeline.extract_dictionary("isolated_sign.mp4", method="middle")
result.save_npz("dictionary/HELLO.npz")

Batch extraction via the API:

curl -X POST http://localhost:8000/api/signrep/dictionary/batch/job \
  -H "Content-Type: application/json" \
  -d '{
    "video_dir": "/data/isolated_signs/",
    "output_dir": "/data/dictionary/",
    "method": "middle"
  }'

Model checkpoint

Auto-downloaded from HuggingFace Hub on first use. Override with SLTK_SIGNREP_CHECKPOINT env var if needed.

End-to-End Processing

The processing API combines Stages 2 and 3 into a single background job. It expects WiLoR H5 files to already exist alongside the videos ({stem}_wilor.h5).
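Since the job expects a pose file next to every video, it can help to check before submitting. A minimal sketch, assuming the H5 file sits in the same directory as the video and follows the `{stem}_wilor.h5` naming; `missing_pose_files` is a hypothetical helper, not part of SLTK.

```python
from pathlib import Path


def missing_pose_files(video_paths):
    """Return the videos whose {stem}_wilor.h5 sibling does not exist."""
    missing = []
    for video in map(Path, video_paths):
        expected = video.with_name(f"{video.stem}_wilor.h5")
        if not expected.is_file():
            missing.append(str(video))
    return missing
```

Videos it reports can be run through Stage 1 first.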

API

# Segmentation only
curl -X POST http://localhost:8000/api/processing/submit \
  -H "Content-Type: application/json" \
  -d '{
    "video_paths": ["/data/video1.mp4", "/data/video2.mp4"],
    "type": "segments",
    "fps": 25.0
  }'

# Segmentation + spotting
curl -X POST http://localhost:8000/api/processing/submit \
  -H "Content-Type: application/json" \
  -d '{
    "video_paths": ["/data/video1.mp4"],
    "type": "spots",
    "dictionary_dirs": ["/data/dictionaries/bsldict/signrep/"],
    "top_k": 5,
    "fps": 25.0,
    "workspace": "my_workspace"
  }'

# Poll job
curl http://localhost:8000/api/processing/status/{job_id}

# Download result
curl -O http://localhost:8000/api/processing/output/{job_id}/video1_spotted.eaf
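Polling can also be scripted. A minimal sketch, assuming the status endpoint returns JSON with a `status` field whose terminal values are `completed` or `failed`; those names are guesses, so check the interactive API docs for the real schema. The status fetcher is injectable so the loop can be exercised without a server.

```python
import json
import time
import urllib.request

BASE_URL = "http://localhost:8000"
TERMINAL_STATES = {"completed", "failed"}  # assumed status values


def fetch_status(job_id, base_url=BASE_URL):
    """GET /api/processing/status/{job_id} and decode the JSON body."""
    url = f"{base_url}/api/processing/status/{job_id}"
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_for_job(job_id, fetch=fetch_status, interval=5.0, timeout=3600.0):
    """Poll until the job reaches a terminal state or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch(job_id)
        if status.get("status") in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still running after {timeout}s")
        time.sleep(interval)
```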

Output files

Job type	Output	Description
`segments`	`{stem}_segments.eaf`	Sign boundaries (SIGN labels)
`spots`	`{stem}_segments.eaf` + `{stem}_spotted.eaf`	Boundaries + ranked gloss labels

When a workspace is specified, output EAF files are automatically ingested into the corpus database for search and analysis.

Feature Extraction

SLTK computes several feature representations, from H5 pose data or, for SignRep, directly from video. Some are used by the segmenter; all are available for your own research.

WiLoR segmenter features (192-dim)

Used by the Transformer segmenter (Stage 2). Converts MANO rotation matrices to axis-angle and concatenates the left and right hands:

from sltk.segmentation.h5_loader import h5_to_features

features = h5_to_features("video_wilor.h5")
# shape: (num_frames, 192)
# = 2 hands × 96 dims (16 joints × 6 axis-angle params)
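For reference, the standard conversion from a rotation matrix to axis-angle (the logarithm map behind Rodrigues' formula) yields three values per joint. A minimal NumPy sketch applied to MANO-style (..., 3, 3) arrays such as `hand_pose` (15 joints) plus `global_orient` (1 joint); this is not SLTK's `h5_to_features`, and it does not reproduce how those values are arranged into the per-hand dimensions listed above.

```python
import numpy as np


def rotmat_to_axis_angle(R, eps=1e-6):
    """Convert one 3x3 rotation matrix to an axis-angle vector (unit axis * angle)."""
    R = np.asarray(R, dtype=np.float64)
    angle = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    if angle < eps:  # near identity: rotation vector is ~0
        return np.zeros(3)
    if np.pi - angle < eps:  # near 180 degrees: (R + I) / 2 ~= a a^T
        B = (R + np.eye(3)) / 2.0
        col = B[:, np.argmax(np.diag(B))]
        return col / np.linalg.norm(col) * angle
    # General case: the axis comes from the skew-symmetric part of R
    axis = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    return axis / (2.0 * np.sin(angle)) * angle


def joints_to_axis_angle(rotmats):
    """Map (..., 3, 3) rotation matrices to (..., 3) axis-angle vectors."""
    rotmats = np.asarray(rotmats)
    flat = rotmats.reshape(-1, 3, 3)
    out = np.stack([rotmat_to_axis_angle(R) for R in flat])
    return out.reshape(rotmats.shape[:-2] + (3,))
```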

Angle features (104-dim)

Body joint angles and hand Euler angles from MANO rotation matrices:

from sltk.processing.features import compute_angle_features

angles = compute_angle_features(body_poses, left_hand_poses, right_hand_poses)
# shape: (num_frames, 104)
# = 22 body angles + 41 left hand + 41 right hand
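The Euler-angle part of these features rests on decomposing a rotation matrix into three successive axis rotations. A minimal sketch assuming the Z-Y-X (yaw, pitch, roll) convention, i.e. R = Rz(yaw) Ry(pitch) Rx(roll); SLTK's actual convention, joint selection, and how it arrives at 41 angles per hand are not documented here, so this only illustrates the conversion.

```python
import numpy as np


def rotmat_to_euler_zyx(R):
    """Return (yaw, pitch, roll) in radians for R = Rz(yaw) @ Ry(pitch) @ Rx(roll)."""
    R = np.asarray(R, dtype=np.float64)
    pitch = np.arctan2(-R[2, 0], np.hypot(R[2, 1], R[2, 2]))
    if np.hypot(R[0, 0], R[1, 0]) < 1e-9:  # gimbal lock: yaw and roll are coupled
        yaw = 0.0
        roll = np.arctan2(-R[1, 2], R[1, 1])
    else:
        yaw = np.arctan2(R[1, 0], R[0, 0])
        roll = np.arctan2(R[2, 1], R[2, 2])
    return np.array([yaw, pitch, roll])
```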

HaMeR features (288-dim)

Flattened MANO rotation matrices:

from sltk.processing.features import load_features_from_h5

angles, hamer = load_features_from_h5("video_mediapipe.h5", "video_wilor.h5")
# angles: (T, 104)
# hamer:  (T, 288) = 2 × (135 hand_pose + 9 global_orient)
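The 288 figure follows from the H5 layout: 15 joint rotations × 9 matrix entries = 135, plus 9 for the wrist, gives 144 per hand. A sketch of that flattening for one frame, assuming left-then-right ordering and zero-filling for an undetected hand; both choices and the helper name `hamer_frame_vector` are hypothetical.

```python
import numpy as np


def hamer_frame_vector(left, right):
    """Flatten per-hand MANO rotations into one 288-dim frame vector.

    Each hand is a dict with "hand_pose" (15, 3, 3) and "global_orient" (1, 3, 3);
    a missing hand (None) is zero-filled.
    """
    parts = []
    for hand in (left, right):
        if hand is None:
            parts.append(np.zeros(15 * 9 + 9, dtype=np.float32))
        else:
            parts.append(np.concatenate([
                np.asarray(hand["hand_pose"], dtype=np.float32).reshape(-1),      # 135
                np.asarray(hand["global_orient"], dtype=np.float32).reshape(-1),  # 9
            ]))
    return np.concatenate(parts)  # 2 * 144 = 288
```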

SignRep embeddings (768-dim)

Dense visual features from the SignRep ViT model:

from sltk.embedding.pipeline import SignRepPipeline

pipeline = SignRepPipeline()
continuous = pipeline.extract_continuous("video.mp4", stride=4)
# continuous.features: (num_windows, 768), L2-normalized
# 16-frame windows at stride 4
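With 16-frame windows at stride 4, the number of windows and the frames each covers follow from simple arithmetic. A sketch assuming only full windows with no padding at the end of the video; SLTK may pad or handle the tail differently, so `num_windows` and `window_frames` are illustrative only.

```python
def num_windows(num_frames, window=16, stride=4):
    """Number of full windows when no padding is applied."""
    if num_frames < window:
        return 0
    return (num_frames - window) // stride + 1


def window_frames(i, window=16, stride=4):
    """Inclusive (first, last) frame covered by window i."""
    start = i * stride
    return start, start + window - 1


print(num_windows(3000))  # 747
print(window_frames(3))   # (12, 27)
```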

Running the Web Interface

SLTK includes a React frontend for browsing workspaces, running processing jobs, and exploring corpus data.

Development mode

# Start FastAPI backend (port 8000) + Vite dev server (port 5173)
bash scripts/run_dev.sh

This launches both servers. The frontend is available at http://localhost:5173 and proxies API requests to the backend. Interactive API docs are at http://localhost:8000/docs.

Production mode

# Build the frontend
cd frontend && npm ci && npm run build && cd ..

# Serve everything from FastAPI
sltk serve --host 0.0.0.0 --port 8000

The built frontend is served as static files from FastAPI at http://localhost:8000.

Backend only

uvicorn sltk.api.main:app --host 0.0.0.0 --port 8000

Frontend pages

Route	Page	Purpose
`/`	Workspaces	Create/switch workspaces, scan directories
`/process`	Process	Submit segmentation and spotting jobs
`/explore`	Explore	Search glosses, view video clips, corpus statistics
`/viewer`	Viewer	Video playback with annotation overlay
`/analysis/*`	Analysis	Vocabulary, concordance, n-grams, collocations, durations

NMS Detection (Non-Manual Signals)

Detect blinks, head nods, shakes, tilts, and other non-manual signals from TEASER/FLAME face-tracking H5 files.

Python

from sltk.nms.runner import detect_nms, export_results

# Detect all NMS events
blinks, nms_events, quality = detect_nms(
    "video_teaser.h5",
    detectors={"all"},  # or {"blink", "nod", "shake", "tilt", "mouth", "eyebrow"}
)

# Export to ELAN, JSON and CSV
export_results(
    blinks, nms_events, "video_teaser.h5",
    output_dir="output/",
    formats=["elan", "json", "csv"],
    participant_id="P01",
)

Available detectors

Detector	Tier	Signal
`blink`	`BLINK`	Eye closures from eyelid parameters
`nod`	`HEAD-NOD`	Vertical head oscillation (pitch)
`shake`	`HEAD-SHAKE`	Horizontal head oscillation (yaw)
`tilt`	`HEAD-TILT`	Side-to-side head tilt (roll)
`mouth`	`MOUTH`	Lip/mouth movement from FLAME expression
`eyebrow`	`EYEBROW`	Eyebrow raise/furrow from FLAME expression
`gaze`	`EYE-GAZE`	Gaze direction (requires NLF/SMPL file)
`squint`	`EYE-SQUINT`	Partial eye closure
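Blink-style detection generally reduces to finding runs where an eye-openness signal drops below a threshold. A generic sketch of that idea, not SLTK's detector: the openness signal, threshold, minimum duration and exclusive end time are all assumptions, and the real `blink` detector works from TEASER/FLAME eyelid parameters with its own logic. Times are converted to milliseconds, the unit ELAN uses.

```python
def detect_closures(openness, threshold=0.2, min_frames=2, fps=25.0):
    """Find runs where an eye-openness signal stays below a threshold.

    Returns a list of (start_ms, end_ms) events; the end is exclusive.
    """
    events = []
    start = None
    # Append a sentinel so a run that lasts until the final frame is closed
    for t, value in enumerate(list(openness) + [float("inf")]):
        if value < threshold and start is None:
            start = t
        elif value >= threshold and start is not None:
            if t - start >= min_frames:  # ignore single-frame dips
                events.append((round(start * 1000 / fps), round(t * 1000 / fps)))
            start = None
    return events


print(detect_closures([1.0, 1.0, 0.1, 0.05, 0.1, 1.0, 0.1, 1.0, 0.0, 0.0]))
# [(80, 200), (320, 400)]
```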

API

curl -X POST http://localhost:8000/api/nms/detect \
  -H "Content-Type: application/json" \
  -d '{
    "h5_path": "/data/video_teaser.h5",
    "detectors": ["all"],
    "format": ["elan"],
    "output_dir": "/data/output/"
  }'

CLI Reference

sltk convert input.npy output.h5 --from mediapipe --to wilor --fps 25
sltk evaluate predictions.txt references.txt --task translation
sltk to-elan segments.json --video source.mp4 --output annotations.eaf
sltk from-elan annotations.eaf --output segments.json --tier Gloss
sltk info video_wilor.h5
sltk serve --host 0.0.0.0 --port 8000 --reload
sltk formats

Data Types

from sltk.data import PoseSequence, Segment, SegmentList
from sltk.io import read_eaf, write_eaf

# Load poses
poses = PoseSequence.load("video_wilor.h5", format="wilor", fps=25)
poses.data     # (num_frames, num_keypoints, 3)
poses.fps      # 25.0
poses.format   # "wilor"

# Load ELAN annotations
segments = read_eaf("annotations.eaf", tiers=["Gloss"])

# Create and export segments
new_segments = SegmentList([
    Segment(start=0.0, end=1.5, label="HELLO", tier="Gloss"),
    Segment(start=1.5, end=3.0, label="WORLD", tier="Gloss"),
])
write_eaf(new_segments, "output.eaf", video_path="source.mp4")
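ELAN stores annotation times as integer milliseconds in a shared TIME_ORDER block that tier annotations reference by ID, so second-based times like those in the `Segment` objects above end up as milliseconds in the file. A minimal, unvalidated sketch of that structure using only the standard library; it is not SLTK's `write_eaf`, and the linguistic type name, format/version attributes and MIME type are assumptions.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def minimal_eaf(segments, tier_id="Gloss", media_url=None, mime_type="video/mp4"):
    """Build a minimal ELAN (.eaf) XML string from (start_sec, end_sec, label) tuples."""
    doc = ET.Element(
        "ANNOTATION_DOCUMENT",
        AUTHOR="",
        DATE=datetime.now(timezone.utc).isoformat(timespec="seconds"),
        FORMAT="3.0",
        VERSION="3.0",
    )
    header = ET.SubElement(doc, "HEADER", MEDIA_FILE="", TIME_UNITS="milliseconds")
    if media_url:
        ET.SubElement(header, "MEDIA_DESCRIPTOR", MEDIA_URL=media_url, MIME_TYPE=mime_type)
    time_order = ET.SubElement(doc, "TIME_ORDER")
    tier = ET.SubElement(doc, "TIER", TIER_ID=tier_id, LINGUISTIC_TYPE_REF="default-lt")
    for i, (start, end, label) in enumerate(segments):
        ts1, ts2 = f"ts{2 * i + 1}", f"ts{2 * i + 2}"
        # Each annotation points at two time slots holding integer milliseconds
        for slot_id, seconds in ((ts1, start), (ts2, end)):
            ET.SubElement(time_order, "TIME_SLOT", TIME_SLOT_ID=slot_id,
                          TIME_VALUE=str(round(seconds * 1000)))
        annotation = ET.SubElement(tier, "ANNOTATION")
        aligned = ET.SubElement(annotation, "ALIGNABLE_ANNOTATION",
                                ANNOTATION_ID=f"a{i + 1}",
                                TIME_SLOT_REF1=ts1, TIME_SLOT_REF2=ts2)
        ET.SubElement(aligned, "ANNOTATION_VALUE").text = label
    # The tier's LINGUISTIC_TYPE_REF must match a declared linguistic type
    ET.SubElement(doc, "LINGUISTIC_TYPE", LINGUISTIC_TYPE_ID="default-lt",
                  TIME_ALIGNABLE="true")
    return ET.tostring(doc, encoding="unicode")


print(minimal_eaf([(0.0, 1.5, "HELLO"), (1.5, 3.0, "WORLD")], media_url="source.mp4"))
```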

Supported pose formats

Format	Keypoints	Description
MediaPipe	33 body + 21x2 hands + 468 face	Holistic pose estimation
WiLoR	21 per hand	MANO hand model with rotation matrices
NLF/SMPL-X	55 joints	Full body with axis-angle rotations

All formats are stored as HDF5 (.h5) files.

Configuration

Variable	Description	Default
`SLTK_CORS_ORIGINS`	Allowed CORS origins	`http://localhost:5173,http://localhost:3000`
`SLTK_ALLOWED_PATHS`	Filesystem whitelist for API	`/vol/research,/home`
`SLTK_WILOR_CHECKPOINT`	WiLoR model checkpoint	auto-resolved
`SLTK_WILOR_DETECTOR`	WiLoR hand detector	auto-resolved
`SLTK_SIGNREP_CHECKPOINT`	SignRep model checkpoint	auto-resolved
`SLTK_SEGMENTOR_CHECKPOINT`	Segmenter checkpoint	auto-resolved
`SLTK_NLF_MODEL_PATH`	NLF model path	—

Testing

pytest                       # Full suite (1500+ tests)
pytest -m "not slow"         # Skip slow tests
pytest -m api                # API tests only
pytest --cov=sltk            # With coverage report

License

CC-BY-NC-4.0

Maintainers

cogvis-cvssp ed_fish

Release history

Version	Released
0.2.0	Jun 1, 2026
0.1.12	Mar 19, 2026
0.1.11	Mar 19, 2026
0.1.9	Mar 10, 2026
0.1.8	Mar 10, 2026
0.1.7	Mar 10, 2026
0.1.6	Mar 10, 2026
0.1.5	Mar 10, 2026
0.1.4	Mar 10, 2026
0.1.3 (this version)	Mar 9, 2026
0.1.1	Mar 9, 2026
0.1.0	Mar 9, 2026

Download files

Source Distribution

signlangtk-0.1.3.tar.gz (1.2 MB)

Uploaded Mar 9, 2026 Source

Built Distribution

signlangtk-0.1.3-py3-none-any.whl (1.2 MB)

Uploaded Mar 9, 2026 Python 3

File details

Details for the file signlangtk-0.1.3.tar.gz.

File metadata

Download URL: signlangtk-0.1.3.tar.gz
Upload date: Mar 9, 2026
Size: 1.2 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for signlangtk-0.1.3.tar.gz
Algorithm	Hash digest
SHA256	`d57ae280fe039902580f48c6627161b1887b3e7ec89af54dba358602ebaf2a6c`
MD5	`c7993c9d4f7c82997a62b81f4c152273`
BLAKE2b-256	`03b0d5adf6ef4facab56bb80f450d003bf03107f09f816d725a4c544da3a7d7d`

Provenance

The following attestation bundles were made for signlangtk-0.1.3.tar.gz:

Publisher: publish.yml on ed-fish/Sign-Language-Toolkit

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: signlangtk-0.1.3.tar.gz
- Subject digest: d57ae280fe039902580f48c6627161b1887b3e7ec89af54dba358602ebaf2a6c
- Sigstore transparency entry: 1067306965
- Sigstore integration time: Mar 9, 2026
Source repository:
- Permalink: ed-fish/Sign-Language-Toolkit@6f90262207d84eba34361de9b71c7c33001fad71
- Branch / Tag: refs/tags/v0.1.3
- Owner: https://github.com/ed-fish
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@6f90262207d84eba34361de9b71c7c33001fad71
- Trigger Event: push

File details

Details for the file signlangtk-0.1.3-py3-none-any.whl.

File metadata

Download URL: signlangtk-0.1.3-py3-none-any.whl
Upload date: Mar 9, 2026
Size: 1.2 MB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for signlangtk-0.1.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f1628a19b1b0271480690b7bcecffa92bafe80c6e638503caf64ea63c5505c53`
MD5	`49f2c007082dc89da56206cfc6e3cae1`
BLAKE2b-256	`3937b8f15fbce16043dfb1281c331edde68caa350cedf3cd1a0c6a175ac55438`

Provenance

The following attestation bundles were made for signlangtk-0.1.3-py3-none-any.whl:

Publisher: publish.yml on ed-fish/Sign-Language-Toolkit

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: signlangtk-0.1.3-py3-none-any.whl
- Subject digest: f1628a19b1b0271480690b7bcecffa92bafe80c6e638503caf64ea63c5505c53
- Sigstore transparency entry: 1067307026
- Sigstore integration time: Mar 9, 2026
Source repository:
- Permalink: ed-fish/Sign-Language-Toolkit@6f90262207d84eba34361de9b71c7c33001fad71
- Branch / Tag: refs/tags/v0.1.3
- Owner: https://github.com/ed-fish
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@6f90262207d84eba34361de9b71c7c33001fad71
- Trigger Event: push
